Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Page 1: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Stream Reasoning: mastering the velocity and the variety dimensions of Big Data at once

Emanuele Della Valle, DEIB - Politecnico di Milano

@manudellavalle [email protected] http://emanueledellavalle.org

University of Oslo, Norway - 3.11.2015

Page 2: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

It's a streaming world …
• Off-shore oil operations
• Smart Cities
• Global Contact Center
• Social networks
• Generate data streams!

E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel: It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)

UiO, Norway - 3.11.2015   @manudellavalle - http://emanueledellavalle.org

Page 3: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

… looking for reactive answers …
• What is the expected time to failure when that turbine's bearing starts to vibrate as detected in the last 10 minutes?
• Is public transportation where the people are?
• Who are the best available agents to route all these unexpected contacts about the tariff plan launched yesterday?
• Who is driving the discussion about the top 10 emerging topics?
• Require continuous processing and reactive answers

Page 4: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

…with conflicting requirements 1/8
A system able to answer those queries must be able to
• handle massive datasets
  – A typical oil production platform is equipped with about 400,000 sensors
  – Telecom data is the most pervasive data source in urban areas; in Milano there are 1.8 million mobile users
  – A global contact centre of a Telecom operator counts 500 million clients
  – Facebook alone has 1.1 billion active users

Page 5: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

…with conflicting requirements 2/8
A system able to answer those queries must be able to
• process data streams on the fly
  – The sensors on a typical oil production platform generate 10,000 observations per minute, with peaks of 100,000 o/m
  – The mobile users in Milano generate 20,000 call/sms/data connections per minute, with peaks of 80,000 c/m
  – A global contact centre receives 10,000 contacts per minute, with peaks of 30,000 c/m
  – Facebook, as of May 2013, observes 3 million "likes" per minute

Page 6: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

…with conflicting requirements 3/8
A system able to answer those queries must be able to
• cope with heterogeneous datasets
  – The sensors on a typical oil production platform have been deployed over 10 years by tens of different producers
  – Tens of data sources are normally needed to make sense of an urban phenomenon
  – A global contact centre consists of hundreds of offices owned by different subsidiary companies engaged yearly
  – Each social network has its own data model, APIs, …

Page 7: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

…with conflicting requirements 4/8
A system able to answer those queries must be able to
• cope with incomplete data
  – Tens of sensors and networking links break down daily
  – Coverage is incomplete
  – Only standard cases are covered by fully machine-processable data records; hundreds of contacts per minute are managed ad hoc
  – Conversations happen outside the social networks, too!

Page 8: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

…with conflicting requirements 5/8
A system able to answer those queries must be able to
• cope with noisy data
  – Sensors out of operating range
  – Faulty sensors
  – Agents misunderstand, get tired, …
  – Irony, sarcasm, …

Page 9: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

…with conflicting requirements 6/8
A system able to answer those queries must be able to
• provide reactive answers
  – detection of dangerous situations must occur within minutes
  – recommendations to citizens must be performed in a few seconds
  – routing a contact through each step of the decision tree must take less than a second
  – Search autocompletion may need to be updated every few minutes

Page 10: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

…with conflicting requirements 7/8
A system able to answer those queries must be able to
• support fine-grained information access
  – Identify a turbine among thousands
  – Locate a bus among thousands
  – Contact an agent among thousands
  – Identify an opinion maker among thousands of influencers for a topic

Page 11: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

…with conflicting requirements 8/8
A system able to answer those queries must be able to
• integrate complex domain models of
  – operational and control processes
  – various city aspects
  – contact management, contract types, agent skills, contactor profiles, …
  – topics, user profiles, …

Page 12: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Challenges
A system able to answer those queries must be able to
[Table: requirements as rows; columns, in Big Data terms: Volume, Velocity, Variety, Veracity; each x marks a dimension the requirement touches]
• handle massive datasets   x
• process data streams on the fly   x
• cope with heterogeneous datasets   x
• cope with incomplete data   x x
• cope with noisy data   x
• provide reactive answers   x
• support fine-grained access   x x
• integrate complex domain models   x

Page 13: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Grand challenge
• Volume + Velocity + Variety = hard deal
[Figure: three axes. Volume: ZB, EB, PB, TB, GB, MB, KB; velocity: months, days, hours, min., sec., ms.; Variety]

Page 14: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

A good reason to embrace it!
• ++ Variety → ++ value
[Figure: three axes. Value; velocity: ms., sec., min., hours, days, months, years; Variety]

Page 15: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

From challenges to opportunities
• Formally, data streams are:
  – unbounded sequences of time-varying data elements
• Less formally, in many application domains, they are:
  – a "continuous" flow of information
  – where recent information is more relevant as it describes the current state of a dynamic system
• Opportunities
  – Forget old enough information
  – Exploit the implicit ordering (by recency) in the data
[Figure: data elements along a time axis]
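The two opportunities listed on this slide can be sketched in a few lines of Python. This is only an illustration under the assumption that items arrive in timestamp order; the names `RecencyWindow`, `width`, and `push` are hypothetical and do not come from the talk.

```python
from collections import deque


class RecencyWindow:
    """Keep only the items whose timestamp is within `width` of the newest one."""

    def __init__(self, width):
        self.width = width
        self.items = deque()  # (timestamp, payload) pairs, oldest first

    def push(self, timestamp, payload):
        # Exploit the implicit ordering: new items always go to the right end.
        self.items.append((timestamp, payload))
        # Forget old enough information: eviction only ever touches the left end.
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()
        return [p for _, p in self.items]


window = RecencyWindow(width=10)
for t, reading in [(1, "a"), (4, "b"), (12, "c"), (15, "d")]:
    current = window.push(t, reading)
```

Because the data is ordered by recency, eviction never has to scan the whole buffer: it stops at the first item that is still recent enough.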

Page 16: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

State-of-the-art: DSMS and CEP
• A paradigmatic change!
• Continuous queries registered over streams that are observed through windows
[Figure labels: Dynamic System, input streams, window, Registered Continuous Query, streams of answer]
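A rough Python sketch of what a registered continuous query does: the same query is re-evaluated periodically over a window of the most recent stream items, and each evaluation contributes to a stream of answers. The function name, the `range_`/`step` parameters, and the sample readings are assumptions made for illustration; a finite list stands in for the unbounded stream.

```python
def run_continuous_query(stream, range_, step, query):
    """Evaluate `query` every `step` time units over the last `range_` time units.

    `stream` is a finite, timestamp-ordered list of (timestamp, item) pairs
    standing in for an unbounded stream.
    """
    answers = []
    if not stream:
        return answers
    last_timestamp = stream[-1][0]
    t = step
    while t - step < last_timestamp:
        # The window selects the items with timestamp in (t - range_, t].
        window = [item for ts, item in stream if t - range_ < ts <= t]
        answers.append((t, query(window)))
        t += step
    return answers


readings = [(1, 20.5), (2, 21.0), (6, 35.0), (7, 36.5), (11, 22.0)]
# Registered query: "how many readings were observed in the last 10 time units?"
answers = run_continuous_query(readings, range_=10, step=5, query=len)
```

A real DSMS or CEP engine would process items incrementally as they arrive instead of rescanning a list; the sketch only shows the window-then-evaluate pattern.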

Page 17: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

DSMS and CEP vs. requirements
[Table: column "DSMS / CEP"; one row per requirement: massive datasets, data streams, heterogeneous dataset, incomplete data, noisy data, reactive answers, fine-grained information access, complex domain models. Only two ✗ marks survive; their rows are not recoverable]

Page 18: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

State of the art: OBDA
• Given ontology O and query Q, use O to rewrite Q as Q' so that, for any set of ground facts A contained in multiple databases:
  – answer(Q, O, A) = answer(Q', ∅, A)
  The answer to the query Q using the ontology O over any set of ground facts A is equal to the answer to the query Q' without considering the ontology O
• Use mapping M to map Q' to multiple SQL queries to the various databases
[Figure labels: Q, Rewrite (using O), Q', Map (using M), SQL, A, answer]
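The rewriting equality on this slide can be illustrated with a toy Python sketch. It assumes the simplest possible setting: the ontology O contains only subclass axioms, the query Q asks for all instances of one class, and the facts A are (individual, class) pairs. All names and data are hypothetical, and the mapping step M to SQL is not covered.

```python
def rewrite(query_class, ontology):
    """Q -> Q': the set of classes whose instances answer Q, computed from O alone."""
    found, frontier = {query_class}, [query_class]
    while frontier:
        current = frontier.pop()
        for sub, sup in ontology:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def answer_rewritten(rewritten, facts):
    """answer(Q', ∅, A): evaluate Q' directly on the ground facts, ignoring O."""
    return {ind for ind, cls in facts if cls in rewritten}


def answer_with_ontology(query_class, ontology, facts):
    """answer(Q, O, A): saturate A with the subclass axioms of O, then evaluate Q."""
    saturated = set(facts)
    changed = True
    while changed:
        changed = False
        for ind, cls in list(saturated):
            for sub, sup in ontology:
                if cls == sub and (ind, sup) not in saturated:
                    saturated.add((ind, sup))
                    changed = True
    return {ind for ind, cls in saturated if cls == query_class}


# Hypothetical ontology (subclass, superclass) and ground facts (individual, class).
O = {("Turbine", "Equipment"), ("GasTurbine", "Turbine")}
A = {("t1", "GasTurbine"), ("t2", "Turbine"), ("p1", "Pump")}
Q_prime = rewrite("Equipment", O)
```

The point of the rewriting is that `Q_prime` depends only on the query and the ontology, so it can be computed once and then evaluated over any set of facts.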

Page 19: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

DSMS/CEP, OBDA vs. requirements
[Table: columns "DSMS / CEP" and "OBDA"; one row per requirement: massive datasets, data streams, heterogeneous dataset, incomplete data, noisy data, reactive answers, fine-grained information access, complex domain models. Only two ✗ marks per column survive; their rows are not recoverable]

Page 20: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Stream Reasoning
• Research question
  – Is it possible to make sense in real time of multiple, heterogeneous, gigantic and inevitably noisy and incomplete data streams in order to support the decision processes of extremely large numbers of concurrent users?
• Proposed approach
[Figure: Complexity vs. Dynamics. Layers: Raw Stream Processing, Semantic Streams, DL-Lite, DL; steps: Abstraction, Selection, Interpretation, Reasoning, Querying, Re-writing; complexity axis: AC0, PTIME, NEXPTIME; change-frequency axis: 10^4 Hz to 1 Hz]

H. Stuckenschmidt, S. Ceri, E. Della Valle, F. van Harmelen: Towards Expressive Stream Reasoning. Proceedings of the Dagstuhl Seminar on Semantic Aspects of Sensor Networks, 2010.

Page 21: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Sub-research questions
1. Is it possible to extend the Semantic Web stack in order to represent heterogeneous data streams, continuous queries, and continuous reasoning tasks?
2. Do the ordered nature of data streams and the possibility of forgetting old enough information allow optimizing continuous querying and continuous reasoning tasks, so as to provide reactive answers to large numbers of concurrent users without forsaking correctness or completeness?
3. Can Semantic Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4. Are there practical cases where processing data streams at the semantic level is the best choice?

Page 22: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Sub-research questions
1. Is it possible to extend the Semantic Web stack in order to represent heterogeneous data streams, continuous queries, and continuous reasoning tasks?
2. Do the ordered nature of data streams and the possibility of forgetting old enough information allow optimizing continuous querying and continuous reasoning tasks, so as to provide reactive answers to large numbers of concurrent users without forsaking correctness or completeness?
3. Can Semantic Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4. Are there practical cases where processing data streams at the semantic level is the best choice?

Page 23: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

State-of-the-art: RDF model
• RDF: Resource Description Framework
  – It allows making statements about resources in the form of subject-predicate-object expressions
    • In RDF terminology: triples
    • E.g.   @BarakObama   posts   "Four more years"   (subject, predicate, object)
  – A collection of RDF statements represents a labelled, directed graph
    • In RDF terminology: a graph
    • E.g., the tweet above by Barack Obama is connected to
      – 800,000+ Twitter user profiles via retweets
      – 300,000+ Twitter user profiles via favorites
      – …
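As a small illustration of the triple and graph views, the Python sketch below stores triples as plain tuples and indexes them as a labelled, directed graph. The identifiers `tweet1`, `hasText`, `retweets`, `favorites`, `@Alice`, and `@Bob` are made up for the example; only `@BarakObama`, `posts`, and the tweet text come from the slide.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
triples = [
    ("@BarakObama", "posts", "tweet1"),
    ("tweet1", "hasText", "Four more years"),
    ("@Alice", "retweets", "tweet1"),
    ("@Bob", "favorites", "tweet1"),
]

# A collection of triples is a labelled, directed graph:
# subjects and objects are nodes, predicates label the edges.
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))


def subjects_pointing_to(node):
    """Nodes that have an outgoing edge (with any label) to `node`."""
    return sorted(s for s, _, o in triples if o == node)
```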

Page 24: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: RDF stream models
• RDF Stream (the C-SPARQL way)
  – Unbounded sequence of time-varying triples
  – each represented by a pair made of an RDF triple and its timestamp
  – Timestamps are non-decreasing (allowing for simultaneity)

  subject        predicate   object                    timestamp
  …
  @BarakObama    posts       "Four more years",        8:16PM 6 Nov 2012
  @Alice         posts       "RT: Four more years",    8:17PM 6 Nov 2012
  …

D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)
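A minimal Python sketch of this stream model, assuming each element is a triple paired with a timestamp. The type name `TimestampedTriple` and the helper `is_valid_stream` are illustrative only; the check encodes the non-decreasing constraint, which allows two elements to share a timestamp.

```python
from datetime import datetime
from typing import NamedTuple


class TimestampedTriple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    timestamp: datetime


def is_valid_stream(elements):
    """True if timestamps never decrease; equal timestamps (simultaneity) are allowed."""
    return all(a.timestamp <= b.timestamp for a, b in zip(elements, elements[1:]))


stream = [
    TimestampedTriple("@BarakObama", "posts", "Four more years",
                      datetime(2012, 11, 6, 20, 16)),
    TimestampedTriple("@Alice", "posts", "RT: Four more years",
                      datetime(2012, 11, 6, 20, 17)),
]
```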

Page 25: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: RDF stream models
• RDF Stream (the Streaming Linked Data way)
  – Unbounded sequence of time-varying graphs
  – each represented by a pair made of an RDF graph and its timestamp
  – Timestamps (if present) are monotonically increasing
  – Graphs act as a form of punctuation (all triples in a graph are simultaneous)

D.F. Barbieri, E. Della Valle: A Proposal for Publishing Data Streams as Linked Data - A Position Paper. LDOW (2010)

Page 26: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

RDF streams time semantics 1/3
• An RDF stream without timestamps is an ordered sequence of data items
• The order can be exploited to perform queries
  – Does Alice meet Bob before Carl?
  – Who does Carl meet first?
[Figure: stream S as an ordered sequence]
  e1   :alice :isWith :bob
  e2   :alice :isWith :carl
  e3   :bob :isWith :diana
  e4   :diana :isWith :carl

Page 27: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

RDF streams time semantics 2/3
• One timestamp: the time instant at which the data item occurs
• We can start to compose queries taking the time into account
  – How many people has Alice met in the last 5m?
  – Does Diana meet Bob and then Carl within 5m?
[Figure: stream S with events e1, e2, e3, e4 on a time axis t with ticks 1, 3, 6, 9; the triples are :alice :isWith :bob, :alice :isWith :carl, :bob :isWith :diana, :diana :isWith :carl]
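The two example queries on this slide can be sketched in Python. The pairing of events with timestamps (e1 at 1, e2 at 3, e3 at 6, e4 at 9) is an assumption read off the figure, and `:isWith` is treated as symmetric; the function names are hypothetical.

```python
# (timestamp, subject, predicate, object); the timestamps are assumed.
events = [
    (1, ":alice", ":isWith", ":bob"),
    (3, ":alice", ":isWith", ":carl"),
    (6, ":bob", ":isWith", ":diana"),
    (9, ":diana", ":isWith", ":carl"),
]


def met_in_last(person, now, minutes):
    """People that `person` has met in the interval (now - minutes, now]."""
    met = set()
    for t, s, _, o in events:
        if now - minutes < t <= now:
            if s == person:
                met.add(o)
            elif o == person:
                met.add(s)
    return met


def meeting_times(a, b):
    """Timestamps at which a and b are together (in either direction)."""
    return [t for t, s, _, o in events if {s, o} == {a, b}]


def meets_then(person, first, second, within):
    """Does `person` meet `first` and then `second` no more than `within` later?"""
    return any(
        0 < t2 - t1 <= within
        for t1 in meeting_times(person, first)
        for t2 in meeting_times(person, second)
    )
```

With this data, `met_in_last(":alice", now=5, minutes=5)` counts both Bob and Carl, while the same query at `now=6` only sees Carl because the meeting with Bob has left the window.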

Page 28: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

RDF streams time semantics 3/3
• Two timestamps: the time range in which the data item is valid (from, to]
• It is possible to write even more complex constraints:
  – Which are the meetings that last less than 5m?
  – Which are the meetings with conflicts?
[Figure: stream S with events e1 to e4 on a time axis t with ticks 1, 3, 6, 9; the triples are :alice :isWith :bob, :alice :isWith :carl, :bob :isWith :diana, :diana :isWith :carl]

D. Anicic, P. Fodor, S. Rudolph, N. Stojanovic: EP-SPARQL: a unified language for event processing and stream reasoning. In WWW 2011, pages 635–644

Page 29: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Finding
• The Semantic Web stack can be extended so as to incorporate streaming data as a first-class citizen
  – RDF stream data model(s)
  – Continuous SPARQL syntax and semantics
  – Continuous deductive reasoning semantics

Page 30: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Work in progress
• In 2013, an RDF Stream Processing (RSP) community group was created at W3C
  – http://www.w3.org/community/rsp/
• RSP data model and serialization
  – https://github.com/streamreasoning/RSP-QL/blob/master/Serialization.md

Page 31: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

State-of-the-art: SPARQL

Page 32: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: Continuous-SPARQL

Page 33: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: Continuous-SPARQL
Who are the opinion makers? i.e., the users who are likely to influence the behavior of their followers

REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
CONSTRUCT { ?opinionMaker sd:about ?res }
FROM STREAM <http://…> [RANGE 30m STEP 5m]
WHERE {
  ?opinionMaker ?opinion ?res .
  ?follower sioc:follows ?opinionMaker .
  ?follower ?opinion ?res .
  FILTER ( cs:timestamp(?follower ?opinion ?res) > cs:timestamp(?opinionMaker ?opinion ?res) )
}
HAVING ( COUNT(DISTINCT ?follower) > 3 )
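To make the logic of the opinion-makers query concrete, here is a plain Python sketch that evaluates the same conditions over the contents of one window. It is not C-SPARQL and not how the engine works; the data layout, the function name, and the choice to count followers per (opinion maker, resource) pair are assumptions made for illustration.

```python
from collections import defaultdict


def opinion_makers(window, follows, threshold=3):
    """Return the (maker, resource) pairs with more than `threshold` distinct followers
    who expressed the same opinion on the same resource after the maker did.

    window:  list of (timestamp, user, opinion, resource) currently in the window
    follows: set of (follower, followee) pairs
    """
    followers = defaultdict(set)
    for t_m, maker, opinion_m, res_m in window:
        for t_f, follower, opinion_f, res_f in window:
            same_action = opinion_f == opinion_m and res_f == res_m
            if (follower, maker) in follows and same_action and t_f > t_m:
                followers[(maker, res_m)].add(follower)
    return {key for key, who in followers.items() if len(who) > threshold}


# Hypothetical window contents: ann likes a movie, then four followers do the same.
follows = {(f, "ann") for f in ["bob", "carl", "dina", "eve"]}
window = [
    (0, "ann", "likes", "movie1"),
    (1, "bob", "likes", "movie1"),
    (2, "carl", "likes", "movie1"),
    (3, "dina", "likes", "movie1"),
    (4, "eve", "likes", "movie1"),
]
result = opinion_makers(window, follows)
```

In the registered query this evaluation would be repeated every 5 minutes over the last 30 minutes of the stream.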

Page 34: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: Continuous-SPARQL
Who are the opinion makers? i.e., the users who are likely to influence the behavior of their followers

REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
CONSTRUCT { ?opinionMaker sd:about ?res }
FROM STREAM <http://…> [RANGE 30m STEP 5m]
WHERE {
  ?opinionMaker ?opinion ?res .
  ?follower sioc:follows ?opinionMaker .
  ?follower ?opinion ?res .
  FILTER ( cs:timestamp(?follower ?opinion ?res) > cs:timestamp(?opinionMaker ?opinion ?res) )
}
HAVING ( COUNT(DISTINCT ?follower) > 3 )

Annotations on the query:
• REGISTER … COMPUTED EVERY 5m: query registration (for continuous execution)
• REGISTER STREAM … CONSTRUCT: RDF Stream added as new output format
• FROM STREAM clause
• [RANGE 30m STEP 5m]: WINDOW
• cs:timestamp(…): built-in to access timestamps

D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)

Page 35: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Finding
• The Semantic Web stack can be extended so as to incorporate streaming data as a first-class citizen
  – RDF stream data model
  – Continuous SPARQL syntax and semantics
  – Continuous deductive reasoning semantics

Page 36: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Alternatives to C-SPARQL
• CQELS
  – What: STREAM clause, focus on new answers
  – Ref: Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J., & Hauswirth, M. A native and adaptive approach for unified processing of linked streams and linked data. In ISWC 2011, pages 370–388.
• SPARQLStream
  – What: window in the past, focus on RDF to Stream operators
  – Ref: Calbimonte, J.-P., Corcho, O., & Gray, A. J. G. Enabling ontology-based access to streaming data sources. In ISWC 2010, pages 96–111.
• EP-SPARQL
  – What: focus on event-specific operators
  – Ref: Anicic, D., Fodor, P., Rudolph, S., & Stojanovic, N. EP-SPARQL: a unified language for event processing and stream reasoning. In WWW 2011, pages 635–644.
• TEF-SPARQL
  – What: adds "facts" as first-class elements
  – Ref: https://www.merlin.uzh.ch/publication/show/8467

Page 37: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Alternatives to C-SPARQL
• Comparison between existing approaches

System | S2R | R2R | Time-aware | R2S
C-SPARQL Engine | Logical and triple-based | SPARQL 1.1 query | timestamp function | Batch only
Streaming Linked Data Framework | Logical and graph-based | SPARQL 1.1 | no | Batch only
SPARQLstream | Logical and triple-based | SPARQL 1.1 query | no | Ins, batch, del
CQELS | Logical and triple-based | SPARQL 1.1 query | no | Ins only
TEF-SPARQL | no | SPARQL-like | Temporarily Facts, BEFORE, SINCE, UNTIL, DURING | Batch only
EP-SPARQL | no | SPARQL 1.0 | SEQ, PAR, AND, OR, DURING, STARTS, EQUALS, NOT, MEETS, FINISHES | Ins only

Page 38: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Work in progress at RSP@W3C
• RSP-QL
  – Syntax
    • https://github.com/streamreasoning/RSP-QL/blob/master/RSP-QL%20Sample%20Queries.md
  – Proposed semantics
    • D. Dell'Aglio, E. Della Valle, J.-P. Calbimonte, Ó. Corcho: RSP-QL Semantics: A Unifying Query Model to Explain Heterogeneity of RDF Stream Processing Systems. Int. J. Semantic Web Inf. Syst. 10(4): 17-44 (2014)
  – Semantics (work in progress)
    • https://github.com/streamreasoning/RSP-QL/blob/master/Semantics.md
  – Quick ref.
    • D. Dell'Aglio, J.-P. Calbimonte, E. Della Valle, Ó. Corcho: Towards a Unified Language for RDF Stream Query Processing. ESWC (Satellite Events) 2015: 353-363

Page 39: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: continuous deductive reasoning
• DL Ontology Stream S_T
  – An ontology stream with respect to a static TBox T is a sequence of ABox axioms S_T(i)
• A Windowed Ontology Stream S_T(o,c]
  – A windowed ontology stream with respect to a static TBox T is the union of the ABox axioms S_T(i) where o < i ≤ c
• Reasoning on a Windowed Ontology Stream S_T(o,c] is the same as reasoning on a static DL KB

Emanuele Della Valle, Stefano Ceri, Davide Francesco Barbieri, Daniele Braga, Alessandro Campi: A First Step Towards Stream Reasoning. FIS 2008: 72-81
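The definition of a windowed ontology stream translates almost directly into code. In this Python sketch the stream is assumed to be a dictionary from the index i to the set of ABox axioms S_T(i); the function name and the sample axioms are hypothetical.

```python
def windowed_abox(stream, o, c):
    """Union of the ABox axiom sets S_T(i) for o < i <= c."""
    abox = set()
    for i, axioms in stream.items():
        if o < i <= c:
            abox |= axioms
    return abox


# Hypothetical ontology stream: index -> ABox axioms observed at that index.
stream = {
    1: {("p2", "discusses", "p1")},
    2: {("p3", "discusses", "p1")},
    3: {("p4", "discusses", "p2"), ("p5", "discusses", "p3")},
    4: {("p6", "discusses", "p4")},
}
abox = windowed_abox(stream, o=1, c=3)
```

The resulting set can be handed, together with the static TBox, to any reasoner built for static knowledge bases.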

Page 40: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Example of continuous deductive reasoning
What impact has my micropost p1 been creating in the last hour? Let's count the number of microposts that discuss it …

REGISTER STREAM ImpactMeter AS
SELECT (count(?p) AS ?impact)
FROM STREAM <http://…/fb> [RANGE 60m STEP 10m]
WHERE { :Alice posts [ sr:discusses ?p ] }

[Figure: microposts p1 to p8 linked by discusses edges; discusses is a transitive property; given "Alice posts p1 .", the answer is 7]
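The count of 7 in this example relies on `discusses` being transitive. The Python sketch below shows the difference between counting direct and transitive discussions; the concrete edges between p1 and p8 are invented, since the figure's exact links are not recoverable from the transcript.

```python
def discussing(target, edges):
    """Posts that discuss `target` directly or through a chain of discusses edges."""
    found, frontier = set(), [target]
    while frontier:
        current = frontier.pop()
        for src, dst in edges:
            if dst == current and src not in found:
                found.add(src)
                frontier.append(src)
    return found


# Hypothetical (post, discussed_post) edges inside the current window.
edges = [
    ("p2", "p1"), ("p3", "p1"),
    ("p4", "p2"), ("p5", "p3"),
    ("p6", "p4"), ("p7", "p4"),
    ("p8", "p5"),
]
direct = {src for src, dst in edges if dst == "p1"}
impact = len(discussing("p1", edges))
```

Without reasoning only the two direct replies would be counted; with the transitive property every post in the thread contributes to the impact.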

Page 41: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Finding
• The Semantic Web stack can be extended so as to incorporate streaming data as a first-class citizen
  – RDF stream data model
  – Continuous SPARQL syntax and semantics
  – Continuous deductive reasoning semantics

Page 42: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Alternatives to continuous deductive (RDFS++) reasoning
• ETALIS
  – What: RDFS + Allen Algebra
  – Ref: Anicic, D., Rudolph, S., Fodor, P., & Stojanovic, N. Stream reasoning and complex event processing in ETALIS. Semantic Web, 3(4), 2012, 397–407.
• STARQL
  – What:
    • DL-Lite + Conjunctive Query + time-series
    • SHI + Grounded Conjunctive Queries + time-series
  – Ref: Ö. L. Özçep, R. Möller. Ontology Based Data Access on Temporal and Streaming Data. Reasoning Web, 2014
• ASP-based
  – What: time-decaying ASP
  – Ref: http://arxiv.org/abs/1301.1392
• LARS
  – What: high-level unified formal foundation for stream reasoning
  – Ref: H. Beck, M. Dao-Tran, T. Eiter, M. Fink: LARS: A Logic-Based Framework for Analyzing Reasoning over Streams. AAAI 2015: 1431-1438

Page 43: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Sub-research questions
1. Is it possible to extend the Semantic Web stack in order to represent heterogeneous data streams, continuous queries, and continuous reasoning tasks?
2. Do the ordered nature of data streams and the possibility of forgetting old enough information allow optimizing continuous querying and continuous reasoning tasks, so as to provide reactive answers to large numbers of concurrent users without forsaking correctness or completeness?
3. Can Semantic Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4. Are there practical cases where processing data streams at the semantic level is the best choice?

Page 44: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: optimize querying for reactive answers
• C-SPARQL engine time window-based selection outperforms SPARQL filter-based selection (Jena-ARQ)
[Figure label: our in-memory RDF stream processing engine]

D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems, 30 Aug. 2010.

Page 45: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Finding
• The Stream Reasoning task is feasible, and the very nature of streaming data offers opportunities to optimise reasoning tasks where data is ordered by recency and can be forgotten after a while
  – C-SPARQL Engine prototype
  – IMaRS continuous incremental reasoning algorithm

Page 46: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Work in progress
• When volume also matters …
[Figure labels: Data Stream, Window, Join, Local View, Maintenance Policy, RSP engine, SPARQL endpoint, Web]

Soheila Dehghanzadeh, Daniele Dell'Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein: Approximate Continuous Query Answering over Streams and Dynamic Linked Data Sets. ICWE 2015: 307-325

Page 47: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

State-of-the-art deductive reasoning

•  Data-driven (a.k.a. forward reasoning)
•  Query-driven – backward reasoning
•  Query-driven – query rewriting (a.k.a. ontology-based data access)

[Figure: one pipeline per approach. Labels: RDF data, ontology, Reasoner, Inferred data, Rewritten query, data, SPARQL]
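The three strategies can be contrasted with a toy Python sketch. Everything in it is an assumption made for illustration: the two-axiom subclass ontology, the class and instance names, and the `query` function that stands in for a SPARQL query evaluated without reasoning. It is a sketch of the general idea, not of any particular reasoner.

```python
# Toy ontology: subclass -> superclass (hypothetical names).
SUBCLASS_OF = {"Tweet": "Post", "Post": "Document"}
# Explicit data: (instance, class) type assertions.
DATA = {("t1", "Tweet"), ("p1", "Post"), ("d1", "Document")}


def superclasses(cls):
    """All classes reachable from cls via SUBCLASS_OF, including cls."""
    out = [cls]
    while out[-1] in SUBCLASS_OF:
        out.append(SUBCLASS_OF[out[-1]])
    return out


def forward(data):
    """Data-driven: materialise every inferred type assertion up front."""
    return {(x, sup) for (x, cls) in data for sup in superclasses(cls)}


def query(data, cls):
    """Plain lookup, standing in for a SPARQL query without reasoning."""
    return {x for (x, c) in data if c == cls}


def backward(data, cls):
    """Query-driven: check entailment per candidate at query time."""
    return {x for (x, c) in data if cls in superclasses(c)}


def rewrite(cls):
    """Query rewriting: expand cls into itself plus all its subclasses."""
    classes = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in SUBCLASS_OF.items():
            if sup in classes and sub not in classes:
                classes.add(sub)
                changed = True
    return classes


by_forward = query(forward(DATA), "Post")
by_backward = backward(DATA, "Post")
by_rewriting = set().union(*(query(DATA, c) for c in rewrite("Post")))
```

All three return the same answers; they differ in where the reasoning cost is paid: when data arrives (forward), when the query runs (backward), or when the query is compiled (rewriting).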


Page 48: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Naïve approaches to Stream Reasoning: windowing then reasoning

•  Data-driven (a.k.a. forward reasoning)
•  Query-driven – backward reasoning
•  Query-driven – query rewriting (a.k.a. ontology-based data access)

[Figure: the same three pipelines with a Window added to each. Labels: Window, RDF data, ontology, Reasoner, Inferred data, Rewritten query, data, SPARQL]
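A minimal Python sketch of "windowing then reasoning" in its data-driven form: at every evaluation the window content is selected and then materialised from scratch. The single transitivity rule, the function names, and the minute-based timestamps are assumptions chosen for illustration.

```python
def transitive_closure(edges):
    """Forward-chain the rule (x,y),(y,z) => (x,z) until nothing new appears."""
    closure = set(edges)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new


def naive_stream_reasoning(stream, width, ticks):
    """At each tick, select the window content and re-materialise from scratch."""
    results = {}
    for now in ticks:
        window = {edge for (t, edge) in stream if now - width < t <= now}
        results[now] = transitive_closure(window)
    return results


# Timestamps are minutes after 10:00; the window is 40 minutes long.
stream = [(0, ("A", "B")), (10, ("B", "C")), (20, ("A", "E")), (30, ("E", "C"))]
results = naive_stream_reasoning(stream, width=40, ticks=[0, 10, 40, 50])
```

The cost of this approach is that every slide repeats the whole materialisation, even when only a small fraction of the window content has changed.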


Page 49: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Not so naïve approach to stream reasoning

•  The problem is that a materialization (the result of data-driven processing) is very difficult to decrement efficiently.
–  State-of-the-art: DRed algorithm
    •  Over delete
    •  Re-derive
    •  Insert
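The three steps can be sketched in Python for a single transitivity rule. This is a simplified illustration in the spirit of delete-and-rederive, not the published DRed algorithm or any system's implementation; the function names and the representation of facts as pairs are assumptions.

```python
def step(facts):
    """One application of the rule (x,y),(y,z) => (x,z)."""
    return {(x, z) for (x, y) in facts for (y2, z) in facts if y == y2}


def closure(facts):
    """Forward-chain to a fixpoint."""
    facts = set(facts)
    while True:
        new = step(facts) - facts
        if not new:
            return facts
        facts |= new


def dred(explicit, materialised, deletions, insertions):
    """Return the updated (explicit, materialised) sets."""
    kept = set(explicit) - set(deletions)

    # 1. Over delete: drop every fact with a derivation using a deleted fact.
    overdeleted = set(deletions)
    while True:
        more = {
            (x, z)
            for (x, y) in materialised
            for (y2, z) in materialised
            if y == y2 and ((x, y) in overdeleted or (y2, z) in overdeleted)
        } - overdeleted
        if not more:
            break
        overdeleted |= more
    remaining = set(materialised) - overdeleted

    # 2. Re-derive: restore overdeleted facts that are still explicit or
    #    still follow from what survived, until nothing more comes back.
    remaining |= overdeleted & kept
    while True:
        back = (step(remaining) & overdeleted) - remaining
        if not back:
            break
        remaining |= back

    # 3. Insert: add the new facts and forward-chain their consequences.
    return kept | set(insertions), closure(remaining | set(insertions))


explicit = {("A", "B"), ("B", "C"), ("A", "E"), ("E", "C")}
new_explicit, new_materialised = dred(
    explicit, closure(explicit), deletions={("A", "B")}, insertions=set()
)
```

In this run the inferred fact A→C is first over-deleted, because one of its derivations used A→B, and then re-derived from A→E and E→C. That detour is the work a streaming-aware algorithm tries to avoid.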

[Figure: incremental maintenance. Labels: window, insertions, deletions, ontology, Reasoner, Inferred data, SPARQL, "Incremental !!!"]

Y. Ren, J. Z. Pan. Optimising ontology stream reasoning with truth maintenance system. In CIKM (2011)


Page 50: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Is DRed needed?
•  DRed works with random insertions and deletions
•  In a streaming setting, when a triple enters the window, given the size of the window, the reasoner already knows when it will be deleted!
•  E.g.,
–  if the window is 40 minutes long, and
–  it is 10:00, the triple(s) entering now
–  will exit at 10:40.
•  Conclusion
–  deletions are predictable

Time   Enter window  Exit window  # Exp.  # Inf.  SUM
10:00  A→B                        1               1
10:10  B→C                        1       1       2
10:20  A→E                        2       1       3
10:30  E→C                        2       1       3
10:40                A→B          2       1       3
10:50                B→C          2       1       3
11:00                A→E          0       0       0

[The "Explicitly in window" and "Inferred in window" columns showed graphs over the nodes A, B, C and E.]
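The observation that deletions are predictable amounts to one addition, shown in this minimal Python sketch. The 40-minute window and the four arrival times come from the example above; the helper names `exit_time` and `hhmm` are illustrative.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=40)  # window length from the example


def exit_time(enter_time, window=WINDOW):
    """Deletion time is known on arrival: entry time plus window length."""
    return enter_time + window


def hhmm(text):
    """Parse an HH:MM clock time (the date part is irrelevant here)."""
    return datetime.strptime(text, "%H:%M")


arrivals = {("A", "B"): "10:00", ("B", "C"): "10:10",
            ("A", "E"): "10:20", ("E", "C"): "10:30"}
schedule = {triple: exit_time(hhmm(t)).strftime("%H:%M")
            for triple, t in arrivals.items()}
```

Because the schedule is fixed as soon as a triple arrives, the reasoner never has to discover deletions; it can look them up.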


Page 51: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: IMaRS algorithm

•  Idea:
–  add an expiration time to each triple and
–  use a hash table to index triples by their expiration time
•  The algorithm
1.  deletes expired triples,
2.  adds the new derivations that are consequences of insertions, annotating each inferred triple with an expiration time (the min of those of the triples it is derived from), and
3.  when multiple derivations occur, for each multiple derivation, it keeps the max expiration time.
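The three steps above can be sketched in Python as follows. This is a simplified illustration of the idea, restricted to a single transitivity rule and integer timestamps (minutes after 10:00); it is not the published IMaRS implementation, and the class and method names are assumptions.

```python
from collections import defaultdict


class ExpiringMaterialisation:
    """Transitive closure over a window, with an expiration time per triple."""

    def __init__(self, width):
        self.width = width
        self.expires = {}                       # triple -> expiration time
        self.by_expiration = defaultdict(set)   # expiration time -> triples

    def _set(self, triple, exp):
        """Record exp for triple only if it outlives the known expiration."""
        old = self.expires.get(triple)
        if old is not None and old >= exp:
            return False
        if old is not None:
            self.by_expiration[old].discard(triple)
        self.expires[triple] = exp
        self.by_expiration[exp].add(triple)
        return True

    def slide(self, now, arrivals):
        # Step 1: delete expired triples through the expiration index.
        for exp in [e for e in self.by_expiration if e <= now]:
            for triple in self.by_expiration.pop(exp):
                del self.expires[triple]
        # Step 2: insert arrivals and derive their consequences.
        changed = [t for t in arrivals if self._set(t, now + self.width)]
        while changed:
            x, y = changed.pop()
            exp = self.expires[(x, y)]
            for (a, b), other in list(self.expires.items()):
                candidates = []
                if b == x:
                    candidates.append((a, y))
                if a == y:
                    candidates.append((x, b))
                for derived in candidates:
                    # min over the premises; step 3: _set keeps the max
                    # over alternative derivations.
                    if self._set(derived, min(exp, other)):
                        changed.append(derived)


m = ExpiringMaterialisation(width=40)
for now, arrivals in [(0, [("A", "B")]), (10, [("B", "C")]),
                      (20, [("A", "E")]), (30, [("E", "C")]), (40, [])]:
    m.slide(now, arrivals)
```

In this run A→C is first derived from A→B and B→C with expiration 40, then derived again from A→E and E→C with expiration 60. Keeping the max means that when A→B expires at 40, A→C simply stays, with no over-deletion and no re-derivation.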


Page 52: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: IMaRS algorithm
•  Incremental Reasoning on RDF streams (IMaRS): new reasoning algorithm optimized for reactive query answering

D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. ESWC (1) 2010: 1-15
D. Dell'Aglio, E. Della Valle: Incremental Reasoning on RDF Streams. In A. Harth, K. Hose, R. Schenkel (Eds.) Linked Data Management, CRC Press 2014, ISBN 9781466582408

[Figure: comparison of three approaches (re-materialize after each window slide, use DRed, IMaRS). X-axis: % of deletions w.r.t. the content of the window]


Page 53: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: IMaRS algorithm
•  Comparison of the average time needed to answer a C-SPARQL query, when 2% of the content exits the window each time it slides, using
–  a backward reasoner on the window content
–  DRed + standard SPARQL on the materialization
–  IMaRS + standard SPARQL on the materialization


Page 54: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Finding
•  The Stream Reasoning task is feasible, and the very nature of streaming data offers opportunities to optimise reasoning tasks where data is ordered by recency and can be forgotten after a while
–  C-SPARQL Engine prototype
–  IMaRS continuous incremental reasoning algorithm


Page 55: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Optimizing for stream reasoning: alternative approaches

•  DyKnow
–  How: logical models of an observed dynamic system + metric temporal logics
–  Fredrik Heintz, Jonas Kvarnström, Patrick Doherty: Bridging the sense-reasoning gap: DyKnow - Stream-based middleware for knowledge processing. Advanced Engineering Informatics 24(1): 14-26 (2010)
•  MorphStream
–  How: rewriting in DSMS languages (one at a time)
–  Ref: Calbimonte, J.-P., Corcho, O., & Gray, A. J. G. Enabling ontology-based access to streaming data sources. In ISWC, 2010, pages 96–111.
•  TR-OWL
–  How: Truth maintenance for EL++ with syntactic approximations
–  Ref: Y. Ren, J. Z. Pan. Optimising ontology stream reasoning with truth maintenance system. In CIKM (2011)
•  ETALIS
–  How: rewriting in Prolog
–  Ref: Anicic, D., Rudolph, S., Fodor, P., & Stojanovic, N. Stream reasoning and complex event processing in ETALIS. Semantic Web, 3(4), 2012, 397–407.

(continues in the next slide)

SR 2015, Austria - 3.11.2015 @manudellavalle - http://emanueledellavalle.org

Page 56: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Optimizing for stream reasoning: alternative approaches

•  Sparkwave
–  How: extended RETE algorithm for windows and RDFS
–  Ref: Sparkwave: Continuous Schema-Enhanced Pattern Matching over RDF Data Streams. Komazec S, Cerri D. DEBS 2012
•  DynamiTE
–  How: Truth maintenance for ρDF (a fragment of RDFS)
–  J. Urbani, A. Margara, C. J. H. Jacobs, F. van Harmelen, H.E. Bal: DynamiTE: Parallel Materialization of Dynamic RDF Data. ISWC (1) 2013: 657-672
•  STARQL
–  How: rewriting on a scalable DSMS with time-series support
–  Ref: ÖL Özçep, R Möller. Ontology Based Data Access on Temporal and Streaming Data. Reasoning Web, 2014
•  ASP-based
–  How: optimizing ASP for incremental and time-decaying programs
–  Ref: http://arxiv.org/abs/1301.1392
•  The Backward/Forward Algorithm
–  How: optimizing DRed
–  B. Motik, Y. Nenov, R.E.F. Piro, I. Horrocks: Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015: 1560-1568


Page 57: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Sub-research questions
1.  Is it possible to extend the Semantic Web stack in order to represent heterogeneous data streams, continuous queries, and continuous reasoning tasks?
2.  Do the ordered nature of data streams and the possibility to forget old enough information allow optimizing continuous querying and continuous reasoning tasks so as to provide reactive answers to a large number of concurrent users without forsaking correctness or completeness?
3.  Can Semantic Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4.  Are there practical cases where processing data streams at the semantic level is the best choice?


Page 58: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Cope with noisy and incomplete data

•  "Noise" is reduced using DSMS techniques
•  Deductive stream reasoning copes with incompleteness by deducing implicit facts
•  Inductive stream reasoning copes with "irreparable" incompleteness by inducing missing facts

D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems 25(6): 32-41 (2010)


Page 59: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Findings
•  A combination of deductive and inductive stream reasoning techniques can cope with incomplete and noisy data


Page 60: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Alternative approaches
•  Stream Reasoning with Probabilistic Answer Set Programming
–  Matthias Nickles, Alessandra Mileo: Web Stream Reasoning Using Probabilistic Answer Set Programming. RR 2014: 197-205
–  Anastasios Skarlatidis, Georgios Paliouras, Alexander Artikis, George A. Vouros: Probabilistic Event Calculus for Event Recognition. ACM Trans. Comput. Log. 16(2): 11:1-11:37 (2015)
–  Anni-Yasmin Turhan, Erik Zenker: Towards Temporal Fuzzy Query Answering on Stream-based Data. HiDeSt@KI 2015: 56-69


Page 61: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Sub-research questions
1.  Is it possible to extend the Semantic Web stack in order to represent heterogeneous data streams, continuous queries, and continuous reasoning tasks?
2.  Do the ordered nature of data streams and the possibility to forget old enough information allow optimizing continuous querying and continuous reasoning tasks so as to provide reactive answers to a large number of concurrent users without forsaking correctness or completeness?
3.  Can Semantic Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4.  Are there practical cases where processing data streams at the semantic level is the best choice?


Page 62: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: Streaming Linked Data Framework


[Figure: Streaming Linked Data Framework architecture. Labels: Data Source, HTTP, Streaming Linked Data Server, Stream Bus, Adapter, Recorder, Re-player, Analyser, Decorator, Publisher, Stream, Visualizer, HTML5 Browser]

Marco Balduini, Emanuele Della Valle, Daniele Dell'Aglio, Mikalai Tsytsarau, Themis Palpanas, Cristian Confalonieri: Social Listening of City Scale Events Using the Streaming Linked Data Framework. International Semantic Web Conference (2) 2013: 1-16

Page 63: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Contribution: RSP services

•  RSP services: a RESTful interface for RSP engines
–  http://streamreasoning.org/download/rsp-services


Page 64: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Practical cases
•  10+ deployments in Sensor Networks & Social media analytics, e.g.
–  BOTTARI: winner of Semantic Web Challenge 2011
–  City Data Fusion: winner of IBM faculty award 2013
–  Social Listener

M. Balduini, I. Celino, D. Dell'Aglio, E. Della Valle, Y. Huang, T. Lee, S.-H. Kim, V. Tresp: BOTTARI: An augmented reality mobile application to deliver personalized and location-based recommendations by continuous analysis of social media streams. J. Web Sem. 16: 33-41 (2012)

M. Balduini, E. Della Valle, M. Azzi, R. Larcher, F. Antonelli, and P. Ciuccarelli: CitySensing: Fusing City Data for Visual Storytelling. IEEE MultiMedia 22(3): 44-53 (2015)


Page 65: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Findings
1.  The Semantic Web stack can be extended so as to incorporate streaming data as a first-class citizen
–  RDF stream data model
–  Continuous SPARQL syntax and semantics
–  Continuous deductive reasoning semantics
2.  The Stream Reasoning task is feasible, and the very nature of streaming data offers opportunities to optimise reasoning tasks where data is ordered by recency and can be forgotten after a while
–  IMaRS continuous incremental reasoning algorithm
–  C-SPARQL Engine prototype
3.  A combination of deductive and inductive stream reasoning techniques can cope with incomplete and noisy data
4.  There are application domains where Stream Reasoning offers an adequate solution


Page 66: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Open issues
1.  The Semantic Web stack can be extended
–  "Navigating the Chasm between the Scylla of Practical Applications and the Charybdis of Theoretical Approaches" A. Bernstein, 2015
2.  Stream Reasoning task is feasible
–  It's time to start removing assumptions
    •  knowledge does not change
    •  background data does not change
–  OBDA for SQL ≠ OBDA for continuous querying
3.  Stream reasoning can cope with incomplete and noisy data
–  Theory is needed!
4.  There are application domains where Stream Reasoning offers an adequate solution
–  Rigorous quantitative comparative research is needed


Page 67: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Advertisements :-P

•  Check out my PhD thesis
–  http://dare.ubvu.vu.nl/handle/1871/53293
–  Chapter 1: Introduction
    •  The content of this presentation
–  Chapter 8: conclusions
    •  A review of stream reasoning approaches updated in spring 2015
•  Put an "I like" to Stream Reasoning on Facebook
–  https://www.facebook.com/streamreasoning


Page 68: Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once

Thank you! Any questions?

Emanuele Della Valle DEIB - Politecnico di Milano [email protected] http://emanueledellavalle.org

University of Oslo, Norway - 3.11.2015