55
Stream Reasoning ttp://streamreasoning.org/ It's a Streaming World! Reasoning upon Rapidly Changing Information Emanuele Della Valle [email protected]

It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Embed Size (px)

DESCRIPTION

Reasoning on rapidly chancing information requires: a) semantic models for representing both data streams and continuous querying/reasoning tasks, and b) reasoning algorithms optimised for continuous reactive query-answering. This talk presents applications cases from which Stream Reasoning requirements were elicited, it briefly covers the findings of 5 year of research, it presents an optimised algorithm for Incremental Reasoning on RDF Streams (IMaRS), and offers an outlook on future research opportunities.

Citation preview

Page 1: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Stream Reasoning http://streamreasoning.org/

It's a Streaming World! Reasoning upon Rapidly Changing InformationEmanuele Della [email protected]

Page 2: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Agenda

Stream Reasoning in a nutshell

A sample of Stream Reasoning investigation: IMaRS

Conclusions

3

Page 3: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Agenda

Stream Reasoning in a nutshell

It's a streaming world

Requirements

State-of-the-art solutions

Stream Reasoning research questions

Example of findings

A sample of Stream Reasoning investigation: IMaRS

Conclusions

4

Page 4: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

It's a streaming World! 1/2

Off-shore oil operations

Smart Cities

Global Contact Center

Social networks

Generate data streams!

5

Page 5: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

It's a streaming World! 2/2

What is the expected time to failure when thatturbine's barring starts to vibrate as detected in the last 10 minutes?

Is public transportationwhere the people are?

Who are the best available agents to route all these unexpected contacts about the tariff plan launched yesterday?

Who is driving the discussion about the top 10 emerging topics ?

E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)

6

Page 6: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Requirements 1/8

A system able to answer those queries must be able to

handle massive datasets• A typical oil production platform is equipped

with about 400.000 sensors

• Telecom data is the most pervasive datasource in urban are, in Milano there are1.8 million mobile users

• A global contact centre of a Telecom operator counts 500 millions of clients

• Facebook alone has 1.1 billion of active users

7

Page 7: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Requirements 2/8

A system able to answer those queries must be able to

process data streams on the fly • The sensors on typical oil production

platform generates 10,000 observationsper minute with peaks of 100,000 o/m

• The mobile users in Milano generates20,000 call/sms/data connectionsper minute with peaks of 80,000 c/m

• A global contact centre receives10,000 contacts per minute withpeaks of 30,000 c/m

• Facebook, as of May 2013, observes3 millions "I like" per minute

8

Page 8: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Requirements 3/8

A system able to answer those queries must be able to

cope with heterogeneous dataset • The sensors on typical oil production

have been deployed over 10 yearsby 10s of different producers

• Tens of data sources are normallyneeded to make sense of an urbanphenomena

• A global contact centre consists in 100sof offices owned by different subsidiary companies engaged yearly

• Each social network has its owndata model, APIs, …

9

Page 9: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Requirements 4/8

A system able to answer those queries must be able to

cope with incomplete data • 10s of sensors and networking links

broke down daily

• Coverage is incomplete

• Only standard cases are covered byfully machine processable data records100s of contacts per minute are manage ad-hoc

• Conversations happen outside thesocial networks, too!

10

Page 10: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Requirements 5/8

A system able to answer those queries must be able to

cope with noisy data • Sensor out-of-operating range

• Faulty sensors

• Agents misunderstand, get tired, …

• Irony, sarcasm, …

11

Page 11: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Requirements 6/8

A system able to answer those queries must be able to

provide reactive answers • detection of dangerous situations

must occur within minutes

• recommendations to citizens mustbe performed in few seconds

• routing a contact through each step of the decision tree must take less than asecond

• Search autocompleting may needto be updated every few minutes

12

Page 12: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Requirements 7/8

A system able to answer those queries must be able to

support fine-grained information access • Identify a turbine among thousands

• Locate a bus among thousands

• Contact an agent among thousands

• Identify an opinion maker amongthousands of influencers for a topic

13

Page 13: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Requirements 8/8

A system able to answer those queries must be able to

integrate complex domain models of • operational and control process

• various city aspects

• contact management, contract types, agent skills, contactor profiles, …

• topics, user profiles, …

14

Page 14: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Requirements (wrap up)

A system able to answer those queries must be able to

handle massive datasets

process data streams on the fly

cope with heterogeneous dataset

cope with incomplete data

cope with noisy data

provide reactive answers

support fine-grained information access

integrate complex domain models

15

Page 15: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

What are data streams anyway?

Formally: • Data streams are unbounded sequences of time-varying data

elements

Less formally: • an (almost) “continuous” flow of information

Assumption• recent information is more relevant as it describes the

current state of a dynamic system

time

16

Page 16: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

The continuous nature of streams

The nature of streams requires a paradigmatic change*• from persistent data

– to be stored and queried on demand – a.k.a. one time semantics

• to transient data – to be consumed on the fly by continuous queries– a.k.a. continuous semantics

* This paradigmatic change first arose in DB community in the late '90s

17

Page 17: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Continuous Semantics

Continuous queries registered over streams thatare observed trough windows

window

input streams streams of answerRegistered Continuous Query

Dynamic System

18

Page 18: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

DSMS/CEP State of the Art

Gianpaolo Cugola, Alessandro Margara: Processingflows of information: From data stream to complex event processing. ACM Comput. Surv. 44(3): 15 (2012)

Content• Type of models compared

– Functional and processing– Deployment and interactions– Data, Time, and Rule– Language

• # of systems surveyed: – Academic: 24 – Industrial: 9– Total: 33

• To learn more:– http://home.dei.polimi.it/margara/papers/survey.pdf

19

Page 19: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Existing solutions

RequirementDSMS CEP

massive datasets

data streams

heterogeneous dataset

incomplete data

noisy data

reactive answers

fine-grained information access

complex domain models

20

Page 20: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Given ontology O and query Q, use O to rewrite Qas Q’ so that, for any set of ground facts A contained in multiple databases:• answer(Q,O,A) = answer(Q’,,A)

– The answer of the query Q using the ontology O for any set of ground facts A is equal to answer of a query Q’ without considering the ontology O

Use mapping M to map Q’ to multiple SQL queries to the various databases

Ontology Based Data Access

Rewrite

O

QQ’

MapSQL

M

answerA

21

Page 21: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Existing solutions

RequirementDSMS CEP

OBDA

massive datasets

data streams

heterogeneous dataset

incomplete data

noisy data

reactive answers

fine-grained information access

complex domain models

22

Page 22: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Research Question

is it possible to make sense in real time of multiple, heterogeneous, gigantic and inevitably noisy and incomplete data streams in order to support the decision process of extremely large numbers of concurrent user?

23

Page 23: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Stream Reasoning feasibility (intuition)

Many relevant reasoning methods are not able todeal with high frequency data streams

However, trade-off exists between the complexity of the reasoning method and the frequency of the data stream

ComplexityRaw Stream Processing

Semantic Streams

DL-Lite

DLAbstraction

Selection

Interpretation

Reasoning

Querying

Re-writing

Change Frequency

PTIME

NEXPTIME

104 Hz

1 Hz

Complexity vs. Dynamics

Heiner Stuckenschmidt, Stefano Ceri, Emanuele Della Valle, Frank van Harmelen: Towards Expressive Stream Reasoning. Proceedings of the Dagstuhl Seminar on Semantic Aspects of Sensor Networks, 2010.

AC0

24

Page 24: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Sub-research questions

Is it possible model at semantic levelheterogeneous data streams, continuous queries, and continuous reasoning tasks?

Does the ordered nature of data streams and the possibility to forget old enough information allow to optimize continuous querying and continuous reasoning tasks so to provide reactive answers to large number of concurrent users without forsaking correctness or completeness?

Can Semantic Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?

Are there practical cases where processing data stream at semantic level is the best choice?

25

Page 25: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

FindingsModel data stream at semantic level 1/2

State of the art: RDF• It allows to make statements about resources in the form

of subject-predicate-object expressions– In RDF terminology triples– E.g. @BarakObama posts "Four more years"

• A collection of RDF statements represents a labelled, directed graph– In RDF terminology a graph– E.g., the tweet above by Barak Obama is connected to

- 800,000+ twitter user profiles via retweets- 300,000+ twitter user profiles favorite- …

Contribution: RDF stream

subject predicate object

26

Page 26: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

FindingsModel data stream at semantic level 2/2

State of the art: RDF

Contribution: RDF Stream• Unbound sequence of time-varying triples• each represented by a pair made of an RDF triple and its

timestamp

@BarakObama posts "Four more years", 8:16PM 6 Nov 2012

@Alice posts "RT: Four more years", 8:17PM 6 Nov 2012

D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)

27

subject predicate object timestamp

Page 27: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

FindingsAdding continuous semantics to SPARQL

Who are the opinion makers? i.e., the users who arelikely to influence the behavior their followers

REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS

CONSTRUCT { ?opinionMaker sd:about ?resource }

FROM STREAM <http://…> [RANGE 30m STEP 5m]

WHERE {

?opinionMaker ?opinion ?resource .

?follower sioc:follows ?opinionMaker.

?follower ?opinion ?resource.

FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker) )

}

HAVING ( COUNT(DISTINCT ?follower) > 3 )28

Page 28: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

FindingsAdding continuous semantics to SPARQL

Who are the opinion makers? i.e., the users who arelikely to influence the behavior their followers

REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS

CONSTRUCT { ?opinionMaker sd:about ?resource }

FROM STREAM <http://…> [RANGE 30m STEP 5m]

WHERE {

?opinionMaker ?opinion ?resource .

?follower sioc:follows ?opinionMaker.

?follower ?opinion ?resource.

FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker) )

}

HAVING ( COUNT(DISTINCT ?follower) > 3 )

Query registration(for continuous execution)

FROM STREAM clause

WINDOW

RDF Stream added as new ouput format

Builtin to access timestamps

29

Page 29: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

FindingsCope with the noisy and incomplete data

Noise is reduced using DSMS techniques

Deductive stream reasoning copes with incompleteness deducing implicit facts

Inductive stream reasoning copes "irrepairable" incompleteness inducing missing facts

D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems 25(6): 32-41 (2010)

30

Page 30: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

FindingsPractical cases

10+ deployments in Sensor Networks,Social media analytics and Cloud Computing, e.g.

BOTTARI

Winner of Semantic Web Challenge 2011

City Data Fusion

Winner of IBM faculty award 2013

M. Balduini, I. Celino, D. Dell’Aglio, E. Della Valle, Y. Huang, T. Lee, S.-H. Kim, V. Tresp: BOTTARI: An augmented reality mobile application to deliver personalized and location-based recommendations by continuous analysis of social media streams. J. Web Sem. 16: 33-41 (2012)

31

Page 31: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

FindingsOptimize for reactive answers

C-SPARQL time window-based selections outperform SPARQL filter-based selections

Incremental Reasoning on RDF streams (IMaRS): new reasoning algorithm optimized for reactive query answering

D.F. Barbieri, D. Braga, S.Ceri, E. Della Valle, M. Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. ESWC (1) 2010: 1-15D. Dell'Aglio, E. Della Valle: Incremental Reasoning on RDF Streams. In A.Harth, K.Hose, R.Schenkel (Eds.) Linked Data Management, CRC Press 2014, ISBN 9781466582408

32

Page 32: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Agenda

Stream Reasoning in a nutshell

It's a streaming world

Requirements

State-of-the-art solutions

Stream Reasoning research questions

Affirmative answers collected in 5 years of investigations

A sample of Stream Reasoning investigation: IMaRS

Problem statement

State-of-the art solutions

Proposed solution

Experimental results

Conclusions 33

Page 33: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

The problem

Reasoning upon rapidly changing information

Example Context

Social Media Analytics: a set of methods and technologies for collecting, monitoring, analyzing, summarizing, and visualizing media generated and disseminated in a conversational, and distributed mode among online communities

Real query Which have been the most discussed posts in the last hour?

Simplified version How many posts have discussed A in the last 40 min?

Data

A B C

A, B, and C represent postsA B means B discusses AA C means C indirectly discusses ANOTE: discusses is transitive property

34

Page 34: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

TimeEnter

windowExit

windowExplicitly in

windowInferred in

window # Exp. # Inf. SUM

How many posts have discussed A in the last 40 min?

The problem

A B

A B C A C

A B C

E

A C

A B C

E

A C

A C

E

A C

A B C

E

A C

CE

10:00 AB

10:10 BC

10:20 AE

10:30 EC

10:40 AB

10:50 BC

11:00 AE

1 1

1 1 2

2 1 3

2 1 3

1 1 2

1 1 2

0 0 0

35

Page 35: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Is it an hard problem?

In the database community it is known as the incremental view maintenance problem

The state of the art solution is the DRed algorithm• Overestimation of deletion: Overestimates deletions by

computing all direct consequences of a deletion.

• Rederivation: Prunes those estimated deletions for which alternative derivations (via some other facts in the program) exist.

• Insertion: Adds the new derivations that are consequences of insertions

Deletion are twice as expensive as insertions

A B C

E

A C

A B C

E

A C

36

Page 36: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/ 37

Is it an hard problem?

1000

100

10

0

rematerialize the view after each update

0% 2% 4% 6% 8% 10% 12% 14%

% of deletions w.r.t the size of the DB

ms

Break-even point

In a normal DB incremental view maintenance is convenient in most of the cases:• Lot of inserts, few deletions• Small % of deletions w.r.t.

the size of the DB

incremental view maintenance

Page 37: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/ 38

Is it an hard problem?

1000

100

10

0

rematerialize the view after each update

0% 2% 4% 6% 8% 10% 12% 14%

% of deletions w.r.t the content of the window

ms

Break-even point

In a streaming setting • The number of inserts are equals

to the number of deletions• the data exiting the windows

causes large % of deletions w.r.t. the size of the window

incremental view maintenance

Page 38: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/ 39

What is the target behaviour?

1000

100

10

0

rematerialize the view after each update

incremental view maintenance

0% 2% 4% 6% 8% 10% 12% 14%

ms

Break-even point

Target behaviour

% of deletions w.r.t the content of the window

Page 39: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Observation

DRed works with random insertions and deletions

In a streaming setting, when a triple enters the window, given the size of the window, the reasoner knows already when it will be deleted!

E.g., • if the window is 40 minutes

long, and, • it is 10:00, the triple(s)

entering now• will exit on 10:40.

Conclusion• deletions are predictable

40

Page 40: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Optimized Stream Reasoning (IMaRS)

Idea: • add an expiration time to each triple and • use an hash table to index triples by their expiration time

The algorithm1. deletes expired triples 2. Adds the new derivations that are consequences of insertions

annotating each inferred triple with an expiration time (the min of those of the triple it is derived from), and

3. when multiple derivations occur, for each multiple derivation, it keeps the max expiration time.

41

Page 41: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

TimeEnter

windowExit

windowExplicitly in

windowInferred in

window # Exp. # Inf. SUM

How many posts have discussed A in the last 40 min?

The problem

A B

A B C A C

10:00 AB

10:10 BC

1 1

1 1 2

10:40

10:40 10:50 10:40

The min of the two expiration times

42

Page 42: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

TimeEnter

windowExit

windowExplicitly in

windowInferred in

window # Exp. # Inf. SUM

How many posts have discussed A in the last 40 min?

The problem

A B

A B C A C

A B C

E

A C

10:00 AB

10:10 BC

10:20 AE

1 1

1 1 2

2 1 3

10:40

10:40 10:50 10:40

10:40 10:50 10:4011:00

AC remains unchanged

43

Page 43: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

TimeEnter

windowExit

windowExplicitly in

windowInferred in

window # Exp. # Inf. SUM

How many posts have discussed A in the last 40 min?

The problem

A B

A B C A C

A B C

E

A C

A B C

E

A C

10:00 AB

10:10 BC

10:20 AE

10:30 EC

1 1

1 1 2

2 1 3

2 1 3

10:40

10:40 10:50 10:40

10:40 10:50 10:4011:00

10:40 10:50 11:0011:00 11:10

AC is inferred with a longer lasting expiration time, the max of the expiration time is used

44

Page 44: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

TimeEnter

windowExit

windowExplicitly in

windowInferred in

window # Exp. # Inf. SUM

How many posts have discussed A in the last 40 min?

The problem

A B

A B C A C

A B C

E

A C

A B C

E

A C

A B C

E

A C

10:00 AB

10:10 BC

10:20 AE

10:30 EC

10:40 AB

1 1

1 1 2

2 1 3

2 1 3

2 1 3

10:40

10:40 10:50 10:40

10:40 10:50 10:4011:00

10:40 10:50 11:0011:00 11:10

10:50 11:0011:00 11:10

This deletion causes no effect

45

Page 45: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

TimeEnter

windowExit

windowExplicitly in

windowInferred in

window # Exp. # Inf. SUM

How many posts have discussed A in the last 40 min?

The problem

A B

A B C A C

A B C

E

A C

A B C

E

A C

A C

E

A C

A B C

E

A C

10:00 AB

10:10 BC

10:20 AE

10:30 EC

10:40 AB

10:50 BC

1 1

1 1 2

2 1 3

2 1 3

2 1 3

2 1 3

10:40

10:40 10:50 10:40

10:40 10:50 10:4011:00

10:40 10:50 11:0011:00 11:10

10:50 11:0011:00 11:10

11:0011:00 11:10

46

This deletion causes no effect

Page 46: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

TimeEnter

windowExit

windowExplicitly in

windowInferred in

window # Exp. # Inf. SUM

How many posts have discussed A in the last 40 min?

The problem

A B

A B C A C

A B C

E

A C

A B C

E

A C

A C

E

A C

A B C

E

A C

CE

10:00 AB

10:10 BC

10:20 AE

10:30 EC

10:40 AB

10:50 BC

11:00 AE

1 1

1 1 2

2 1 3

2 1 3

2 1 3

2 1 3

0 0 0

10:40

10:40 10:50 10:40

10:40 10:50 10:4011:00

10:40 10:50 11:0011:00 11:10

10:50 11:0011:00 11:10

11:0011:00 11:10

Deletion of inferred triples is just a look up

47

Page 47: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Experimental results

Re-materialize after each window slide

Use DRed

IMaRS

% of deletions w.r.t. the content of the window

48

Page 48: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Implication of the experimental results

comparison of the average time needed to answera C-SPARQL query, when 2% of the content exits the window each time it slides, using • A backward reasoner on the window content• DRed + standard SPARQL on the materialization• IMaRS + standard SPARQL on the materialization

Backward DRed +SPARQL IMaRS + SPARQL

49

Page 49: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Agenda

Stream Reasoning in a nutshell

It's a streaming world

Requirements

State-of-the-art solutions

Stream Reasoning research questions

Affirmative answers collected in 5 years of investigations

A sample of Stream Reasoning investigation: IMaRS

Problem statement

State-of-the art solutions

Proposed solution

Experimental results

Conclusions 50

Page 50: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/ 51

Scholarly impact #

of c

itatio

ns

year

SR Vision

C-SPARQL

IMaRS

Inductive SR

BOTTARI

Page 51: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Recent WorkBenchmarks are needed

Stream Reasoning benchmarks are appearing• SRBench Ying Zhang, Minh-Duc Pham, Óscar Corcho, Jean-

Paul Calbimonte– http://www.w3.org/wiki/SRBench

• LSBench: Danh Le Phuoc, Minh Dao-Tran, Minh-Duc Pham, Peter A. Boncz, Thomas Eiter, Michael Fink– http://code.google.com/p/lsbench/

What shall we evaluate?• PeCK methodology: Sys Properties -> Challenges -> KPI

How?• Maximise throughput is OK, but do not forsake correctness!

52

T. Scharrenbach, J. Urbani, A. Margara, E. Della Valle, A. Bernstein: Seven Commandments for Benchmarking Semantic Flow Processing Systems. ESWC 2013: 305-319

D. Dell'Aglio, J.-P. Calbimonte, M. Balduini, Ó. Corcho, E. Della Valle: On Correctness in RDF Stream Processor Benchmarking. ISWC (2) 2013: 326-342

Page 52: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Recent WorkSupporting industrial take up

W3C's RDF stream processing Community Group • http://www.w3.org/community/rsp/ • Goals

– collect use cases– cense approaches– define RDF stream models– define SPARQL extension for RDF stream processing– define RESTful interfaces

• 52 members from 24 organization (6 industries)• Biweekly phone calls since September 2013

53

Page 53: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Next stepGeneralizing from time to any ordering

Observation: order reflects recency, relevance, …

Indexes

Recency

Relevance …

Combinations

Typ

es o

f o

rder

s

No Yes

Traditional solutions

DSMS/CEP

Top-k Q/A

Continuous top-k Q/A

Scalable reasoning

Stream reasoning

Order-aware reasoning

Top-k Reasoning

Reasoning

Emanuele Della Valle, Stefan Schlobach, Markus Krötzsch, Alessandro Bozzon, Stefano Ceri, Ian Horrocks: Order matters! Harnessing a world of orderings for reasoning over massive data. Semantic Web 4(2): 219-231 (2013)

Sara Magliacane, Alessandro Bozzon, Emanuele Della Valle: Efficient Execution of Top-K SPARQL Queries. International Semantic Web Conference (1) 2012: 344-360

54

Page 54: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Emanuele Della Valle - http://streamreasoning.org/

Credits

Politecnico di Milano’s colleagues Vision: Prof. S. Ceri C-SPARQL: M. Balduini, D. Barbieri, D. Braga, S. Ceri and M.

Grossniklaus Benchmarking: M. Balduini, D. Dell'Aglio, and A. Margara SPARQL-RANK: S. Magliacane and A. Bozzon People I directly worked with on the topic CEFRIEL: I. Celino Saltlux: T. Lee and S.-H. Kim SIEMENS: prof. V. Tresp and Y. Huang U-Innsbruck: prof. D. Fensel U-Oxford: prof. I. Horrocks and M. Krötzsch UP-Madrid: Ó. Corcho and J.-P. Calbimonte U-Mannheim: prof. H. Stuckenschmidt U-Zurich: prof. A. Bernstein and T. Scharrenbach VA-Amsterdam: prof. F. van Harmelen, S. Schlobach and J. UrbaniThe broader research community that showed interest in stream reasoning

55

Page 55: It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)

Stream Reasoning http://streamreasoning.org/

Thank you for paying attention!

Any question?Emanuele Della Valle

[email protected]