
RDF Stream Processing Implementations · PDF file · 2016-11-02 · Jean-Paul Calbimonte


  • Tutorial on RDF Stream Processing 2016

    M.I. Ali, J-P Calbimonte, D. Dell'Aglio, E. Della Valle, and A. Mauri

    http://streamreasoning.org/events/rsp2016

    RDF Stream Processing Implementations

    Jean-Paul Calbimonte

    http://jeanpi.org @jpcik

    http://streamreasoning.org/events/rsp2016 http://dellaglio.org/

  • http://streamreasoning.org/events/rsp2016

    Share, Remix, Reuse — Legally

     This work is licensed under the Creative Commons Attribution 3.0 Unported License.

     You are free:

    • to Share — to copy, distribute and transmit the work

    • to Remix — to adapt the work

     Under the following conditions

    • Attribution — You must attribute the work by inserting

    – “[source http://streamreasoning.org/rsp2014]” at the end of each reused slide

    – a credits slide stating - These slides are partially based on “RDF Stream Processing 2014”

    by M. Balduini, J-P Calbimonte, O. Corcho, D. Dell'Aglio, E. Della Valle http://streamreasoning.org/rsp2014

     To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/


  • RSP for developers

    • RDF Streams in practice

    • RSP Query Engines

    • Developing with an RSP Engine

    • Handling Results

    • RSP Services

  • RDF Streams in Practice

  • RSP: Keep the data moving

    • Process data in-stream

    • Not required to store

    • Active processing model

    [Diagram: input streams (RDF Streams) → RSP queries/rules → output streams/events]

  • RDF Stream

    Gi, Gi+1, Gi+2, …, Gi+n, … (unbounded sequence)

    Gi: {(s1,p1,o1), (s2,p2,o2), …} [ti]

    • 1+ triples

    • implicit/explicit timestamp/interval

    RDF streams in theory

    How do I code this?

    Use Web standards?
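One possible answer to "How do I code this?" is to model each stream element Gi as an immutable collection of triples paired with a timestamp. The following is a minimal sketch in plain Java under stated assumptions: the names `Triple`, `TimestampedGraph`, and `DemoGraphStream` are hypothetical and not taken from any RSP engine, and the timestamp is assumed to be epoch-style milliseconds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical types for illustration; not taken from any RSP library.
final class Triple {
    final String subject, predicate, object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// One stream element G_i: one or more triples plus a timestamp t_i.
final class TimestampedGraph {
    final List<Triple> triples;
    final long timestamp; // assumed unit: milliseconds

    TimestampedGraph(List<Triple> triples, long timestamp) {
        if (triples.isEmpty()) {
            throw new IllegalArgumentException("a stream element needs 1+ triples");
        }
        // Defensive copy so the element cannot be changed after creation.
        this.triples = Collections.unmodifiableList(new ArrayList<>(triples));
        this.timestamp = timestamp;
    }
}

// An unbounded sequence G_i, G_i+1, ... exposed as an iterator that never ends.
final class DemoGraphStream implements Iterator<TimestampedGraph> {
    private long i = 0;

    @Override
    public boolean hasNext() {
        return true; // unbounded
    }

    @Override
    public TimestampedGraph next() {
        i++;
        Triple t = new Triple(":observation" + i, "om-owl:floatValue", "35.4");
        // Explicit timestamp: here simply a counter scaled to milliseconds.
        return new TimestampedGraph(Collections.singletonList(t), i * 1000L);
    }
}
```

Keeping the timestamp outside the triples, as in the Gi [ti] notation, leaves open whether it is assigned by the application or by the system on arrival.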

  • Linked Data on the Web

    Web of Data

    Linked Data

    W3C Standards: RDF, SPARQL, etc.

  • Linked Data principles for RDF streams?

    e.g. publish sensor data as RDF/Linked Data?

    • URIs as names of things

    • HTTP URIs

    • Useful information when URI is dereferenced

    • Link to other URIs

    [Diagram labels: users, applications, Web]

    Use RDF model to continuously query real-time data streams?

    static vs. streams

    one-off vs. continuous

  • (Sensor) Data Streams on the Web

    http://mesowest.utah.edu/

    http://earthquake.usgs.gov/earthquakes/feed/v1.0/

    http://swiss-experiment.ch

    • Monitoring

    • Alerts

    • Notifications

    • Hourly/daily updates

    • Myriad of Formats

    • Ad-hoc access points

    • Informal description

    • Convention-semantics

    • Uneven use of standards

    • Manual exploration

  • RDF Streams before RDF Streams

    http://richard.cyganiak.de/2007/10/lod/

    [Figure: Linked Data cloud diagram, 2011; labeled datasets: Linked Sensor Data, MetOffice, AEMET]

  • Sensor Data & Linked Data

    An example: Linked Sensor Data

    http://wiki.knoesis.org/index.php/LinkedSensorData

    [Table residue: columns "Zip Files" and "Number of Triples"]

    Example: Nevada dataset

    • 7.86 GB in N-Triples format

    • 248 MB zipped

  • Sensor Data & Linked Data

    What do we get in these datasets?

    [N-Triples excerpt; the URIs were lost in extraction. Surviving literals:]

    "30.0"^^ .

    "2003-03-31T05:10:00-07:00^^http://www.w3.org/2001/XMLSchema#dateTime" .

    Nice triples:

    • What is measured

    • Measurement

    • Unit

    • Sensor

    • When is it measured

  • RDF Streams before RDF Streams

    i.e. just use RDF

    :observation1 rdf:type om-owl:Observation .
    :observation1 om-owl:observedProperty weather:_AirTemperature .
    :observation1 om-owl:procedure :sensor1 .
    :observation1 om-owl:result :obsresult1 .
    :observation1 om-owl:resultTime "2015-01-01T10:00:01" .
    :obsresult1 om-owl:floatValue 35.4 .

    :observation2 rdf:type om-owl:Observation .
    :observation2 om-owl:observedProperty weather:_AirTemperature .
    :observation2 om-owl:procedure :sensor1 .
    :observation2 om-owl:result :obsresult2 .
    :observation2 om-owl:resultTime "2015-01-01T10:00:02" .
    :obsresult2 om-owl:floatValue 36.4 .

    • Plain triples

    • Where is the timestamp?

    • What is the order in the RDF graph?

    • Appended to a file?

    • Or to some RDF dataset?

    • How to store it?

  • Feed an RDF Stream to an RSP engine

    What is currently done in most RSPs

    [Diagram labels: Live non-RDF streams; Ad-hoc conversion to RDF; RDF; RDF datasets; Continuous additions; Add (internal) timestamp on insertion; RDF + timestamps; RSP]

  • Feed an RDF Stream to C-SPARQL

    // the stream is "observable"
    public class SensorsStreamer extends RdfStream implements Runnable {

      // something to run on a thread
      public void run() {
        ..
        while(true){
          ...
          // timestamped triple
          RdfQuadruple q=new RdfQuadruple(subject,predicate,object,
              System.currentTimeMillis());
          this.put(q);
        }
      }
    }

    • Data structure, execution and callbacks are mixed

    • Observer pattern

    • Tightly coupled listener

    • Added timestamp

  • Actor Model

    [Diagram: Actor 1 sends a message m to Actor 2; diagram labels: mailbox, state, behavior]

    • No shared mutable state

    • Avoid blocking operators

    • Lightweight objects

    • Loose coupling

    • Communicate through messages

    • send: fire-forget

    • non-blocking response

    Implementations: e.g. Akka for Java/Scala
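The actor idea can be sketched without any framework. The following is a minimal illustration in plain Java and is not Akka's API: the class `MiniActor`, its `tell` and `stop` methods, and `CountingSink` are hypothetical names, and a single-threaded executor stands in for the mailbox so that messages are processed one at a time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical minimal actor; real implementations (e.g. Akka) offer far more.
final class MiniActor<M> {
    // The executor's internal queue plays the role of the mailbox.
    private final ExecutorService mailbox = Executors.newSingleThreadExecutor();
    private final Consumer<M> behavior;

    MiniActor(Consumer<M> behavior) {
        this.behavior = behavior;
    }

    // Fire-and-forget send: enqueue the message and return immediately.
    void tell(M message) {
        mailbox.execute(() -> behavior.accept(message));
    }

    // Stop accepting messages and wait until the queued ones are processed.
    boolean stop(long timeoutMillis) {
        mailbox.shutdown();
        try {
            return mailbox.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

// State is mutated only by the actor's single thread, so no locking is needed.
final class CountingSink {
    private int count = 0;

    void receive(String triple) {
        count++;
    }

    int count() {
        return count;
    }
}
```

A sender would create `new MiniActor<String>(sink::receive)` and call `tell(...)` for each triple; the sender never touches the sink's state directly, which is the loose coupling the slide refers to.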

  • RDF Stream

    Ideas using Akka actors

    Data structure: infinite triple iterator

    object DemoStreams {
      ...
      def streamTriples={
        Iterator.from(1) map{i=>
          ...
          new Triple(subject,predicate,obj)
        }
      }
    }

    Execution: asynchronous iteration

    val f=Future(DemoStreams.streamTriples)
    f.map{a=>a.foreach{triple=>
      //do something
    }}

    Message passing: send triple to actor

    f.map{a=>a.foreach{triple=>
      someSink ! triple
    }}

    Immutable RDF stream

    • avoid shared mutable state

    • avoid concurrent writes

    • unbounded sequence

    Futures

    • non-blocking composition

    • concurrent computations

    • work with not-yet-computed results

    Actors

    • message-based

    • share-nothing async

    • distributable
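For readers working in Java rather than Scala, the first two ideas (a lazy unbounded stream and asynchronous consumption) can be approximated with the standard library. This is a sketch under assumptions, not a translation of the slide's code: `DemoStreamsSketch`, `streamTriples`, and `takeAsync` are hypothetical names, and triples are represented as plain strings to keep the example self-contained.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DemoStreamsSketch {
    // Data structure: a lazy, unbounded sequence of (hypothetical) triple strings.
    static Stream<String> streamTriples() {
        return Stream.iterate(1, i -> i + 1)
                .map(i -> ":observation" + i + " om-owl:floatValue " + (35.0 + i) + " .");
    }

    // Execution: take the first n triples asynchronously, off the caller's thread.
    static CompletableFuture<List<String>> takeAsync(int n) {
        return CompletableFuture.supplyAsync(
                () -> streamTriples().limit(n).collect(Collectors.toList()));
    }
}
```

A caller can compose on the not-yet-computed result without blocking, for example `DemoStreamsSketch.takeAsync(3).thenAccept(ts -> ts.forEach(System.out::println));`.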

  • RDF Stream

    … other issues:

    • Graph implementation?

    • Timestamps: application vs system?

    • Serialization?

    Desired properties:

    • Loose coupling

    • Immutable data streams

    • Asynchronous message passing

    • Well defined input/output

  • Data stream characteristics

    Data regularity

    • Raw data typically collected as time series

    • Very regular structure

    • Patterns can be exploited

    E.g. mobile NO2 sensor readings (columns: Timestamp, Sensor, Observed Value, Coordinates)

    29-02-2016T16:41:24,47,369,46.52104,6.63579
    29-02-2016T16:41:34,47,358,46.52344,6.63595
    29-02-2016T16:41:44,47,354,46.52632,6.63634
    29-02-2016T16:41:54,47,355,46.52684,6.63729
    ...

    Data order

    • Order of data is crucial

    • Time is the key attribute for establishing an order among the data items

    • Important for indexing

    • Enables efficient time-based selection, filtering and windowing
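The regular structure of such readings makes parsing and time-based selection straightforward. The sketch below is a minimal illustration in plain Java: the class names `Reading` and `ReadingParser` are hypothetical, the field meanings are assumed from the column labels (timestamp, sensor, observed value, two coordinates), and the date pattern is inferred from the sample lines.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one reading; field meanings assumed from the column labels.
final class Reading {
    final LocalDateTime timestamp;
    final String sensor;
    final int value;
    final double lat, lon;

    Reading(LocalDateTime timestamp, String sensor, int value, double lat, double lon) {
        this.timestamp = timestamp;
        this.sensor = sensor;
        this.value = value;
        this.lat = lat;
        this.lon = lon;
    }
}

final class ReadingParser {
    // Pattern inferred from the samples: day-month-year, a literal 'T', then the time.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd-MM-uuuu'T'HH:mm:ss");

    static Reading parse(String line) {
        String[] f = line.split(",");
        return new Reading(LocalDateTime.parse(f[0], FORMAT), f[1],
                Integer.parseInt(f[2]), Double.parseDouble(f[3]), Double.parseDouble(f[4]));
    }

    // Time-based selection: keep readings with from <= timestamp < to.
    static List<Reading> window(List<Reading> readings, LocalDateTime from, LocalDateTime to) {
        List<Reading> out = new ArrayList<>();
        for (Reading r : readings) {
            if (!r.timestamp.isBefore(from) && r.timestamp.isBefore(to)) {
                out.add(r);
            }
        }
        return out;
    }
}
```

Because the readings arrive in timestamp order, a real engine could stop scanning as soon as a timestamp reaches `to`; the linear scan here is kept only for clarity.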

  • Feed an RDF Stream to an RSP engine

    Adding mappings to the data flow

    [Diagram labels: Live non-RDF streams; Conversion to RDF; RDF; RDF datasets; Continuous additions; Add (internal) timestamp on insertion; RSP]

    RDF +

    times