YABench: A Comprehensive Framework for RDF
Stream Processor Correctness and Performance Assessment
Maxim Kolchin, Peter Wetz, Elmar Kiesling, A Min Tjoa
ITMO University, Russia | TU Wien, Austria
The 16th International Conference on Web Engineering 2016, Lugano, Switzerland
RDF Stream Processing (RSP)
RDF Stream - a potentially infinite sequence of time-varying data elements encoded in RDF
Continuous query - a query registered over streams, which are in most cases observed through windows
Query results - as in SPARQL, results can be tuples, an RDF dataset, or a new RDF stream
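The window semantics sketched above can be illustrated in Python. This is a hypothetical sketch, not any engine's API: `Triple`, `sliding_windows`, and the integer time units are assumptions made for illustration.

```python
from collections import namedtuple

# A stream element: an RDF triple annotated with a timestamp.
Triple = namedtuple("Triple", "s p o t")

def sliding_windows(stream, size, slide):
    """Group timestamped triples into overlapping windows of
    `size` time units, advancing by `slide` units each step."""
    if not stream:
        return []
    end = stream[0].t + size
    last_t = stream[-1].t
    windows = []
    while end - size <= last_t:
        windows.append([e for e in stream if end - size <= e.t < end])
        end += slide
    return windows

# One station emitting a temperature observation per time unit.
stream = [Triple("st1", "hasTemp", 20 + i, i) for i in range(6)]
# Window size 3, slide 2: windows cover [0,3), [2,5), [4,7)
wins = sliding_windows(stream, size=3, slide=2)
```

A continuous query would be evaluated once per window over exactly these element sets, which is why window placement directly determines result correctness.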
State of the art
■ LSBench (2012)
■ SRBench (2012)
■ CSRBench (2013)
■ CityBench (2015)
Details can be found at W3C RSP Community Group’s Wiki: https://www.w3.org/community/rsp/wiki/RSP_Benchmarking
Our contribution
■ We propose a benchmarking framework for RDF Stream Processing engines that focuses on correctness and performance
○ Stream generator (generates configurable RDF streams)
○ Oracle (validates the correctness of the results)
○ Runner (measures the performance of an RSP engine)
■ We run the benchmark with two window-based RDF stream processing engines:
○ C-SPARQL
○ CQELS
Requirements
■ Scalable and configurable input
■ Comprehensive correctness checking
■ Flexible queries
■ Reproducibility
Architecture

1. Define tests,
2. Generate data streams,
3. Run the tests with a given engine,
   a. Performance metrics are collected in a separate process,
4. At the end, validate the results with the oracle.
Architecture: Reporting tool
Validation against CSRBench
We validated the correctness checking functionality of YABench by reproducing the CSRBench* benchmark.
CSRBench defines 7 queries for the C-SPARQL, CQELS, and SPARQLStream engines.
Datasets, test configurations and results are available online: github.com/YABench/csrbench-validation
*Daniele Dell’Aglio, et al. “On Correctness in RDF Stream Processor Benchmarking”, 2013
Validation against CSRBench (C-SPARQL)
Query    CSRBench    YABench
Q1       ✓           ✓
Q2       ✓           ✓
Q3       ✓           ✓*
Q4       ✓           ✓
Q5       ✗           ✗
Q6       ✓           ✓*
Q7       ✓           ✓*
* - the results are the same, but due to timing discrepancies some results occasionally appear in the subsequent window
Validation against CSRBench (CQELS)
Query    CSRBench    YABench
Q1       ✓           ✓
Q2       ✓           ✓
Q3       ✓           ✓
Q4       ✗           ✗
Q5       ✓           ✓
Q6       ✗           ✗
Q7       ✗           ✗**
** - the query did not execute successfully on the CQELS engine; the engine crashed before returning any results
Benchmark
We reuse the queries introduced by CSRBench, but we are able to parametrize them, e.g. by window size, window slide, and filter values.
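Such parametrization can be sketched as a query template. The template below is a hypothetical illustration written in C-SPARQL-style syntax; the URIs, property names, and parameter names are assumptions, not the actual CSRBench queries.

```python
# Hypothetical parametrized query template in the style of the
# CSRBench queries; size, slide, and filter_value are the
# tunable benchmark parameters.
QUERY_TEMPLATE = """
REGISTER QUERY q1 AS
SELECT ?station ?value
FROM STREAM <http://example.org/stream> [RANGE {size}s STEP {slide}s]
WHERE {{
  ?obs <http://example.org/observedBy> ?station ;
       <http://example.org/hasValue>   ?value .
  FILTER(?value > {filter_value})
}}
"""

def instantiate(size, slide, filter_value):
    """Fill in the window and filter parameters of the template."""
    return QUERY_TEMPLATE.format(size=size, slide=slide,
                                 filter_value=filter_value)

q = instantiate(size=5, slide=1, filter_value=30)
```

Varying these parameters lets one benchmark produce a whole family of workloads from a single query definition.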
Measure:
- Precision and recall,
- Window size, result size, and delay,
- Memory and CPU usage, number of threads
We run each test 10 times to compute the distribution of precision and recall.
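The per-window correctness check can be sketched as a simple set comparison between the oracle's expected results and the engine's actual output. This is a minimal illustration of the precision/recall metric, not YABench's actual implementation.

```python
def precision_recall(expected, actual):
    """Compare an engine's actual window results against the
    oracle's expected results:
      precision = |expected ∩ actual| / |actual|
      recall    = |expected ∩ actual| / |expected|
    Empty sets are treated as trivially correct."""
    expected, actual = set(expected), set(actual)
    tp = len(expected & actual)  # true positives
    precision = tp / len(actual) if actual else 1.0
    recall = tp / len(expected) if expected else 1.0
    return precision, recall

# One spurious result ("d") and one missing result ("a"):
p, r = precision_recall({"a", "b", "c"}, {"b", "c", "d"})
```

A window shifted by even one stream element lowers both metrics at once, which is the effect the timing-discrepancy footnotes above describe.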
Detailed results are available online: github.com/YABench/yabench-one
Benchmark: Data Stream Model
A data stream is generated based on:
■ Number of weather stations,
■ Time interval between two observations of a single station,
■ Duration of the stream,
■ A seed for the randomize function.
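The four parameters above can be sketched as a generator function. This is a hypothetical illustration of the stream model; the value range and tuple layout are assumptions, not YABench's actual generator.

```python
import random

def generate_stream(stations, interval, duration, seed):
    """Generate a reproducible observation stream: each of
    `stations` weather stations emits one temperature reading
    every `interval` time units over `duration` time units.
    The seed makes the randomized values reproducible."""
    rng = random.Random(seed)
    observations = []
    for t in range(0, duration, interval):
        for s in range(stations):
            temp = round(rng.uniform(-10.0, 35.0), 1)
            observations.append((f"station{s}", t, temp))
    return observations

# 3 stations, one reading every 5 time units, for 20 time units
# (t = 0, 5, 10, 15) -> 12 observations
obs = generate_stream(stations=3, interval=5, duration=20, seed=42)
```

Fixing the seed is what makes the reproducibility requirement from the earlier slide achievable: the same configuration always yields the same stream.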
Benchmark: Queries
Experiment 1: SELECT + FILTER
Experiment 2: SELECT + AVG + FILTER
Experiment 3: joining of triples from different timestamps
Experiment 4: demonstrates the use of gracious mode, which is implemented by the oracle to compensate for the engines' timing discrepancies
Experiment 1 (precision/recall): 50 stations
Experiment 1 (precision/recall): 1000 stations
Experiment 1 (precision/recall): 10000 stations
Experiment 1 (memory usage): 50 stations
Experiment 1 (memory usage): 1000 stations
Experiment 1 (memory usage): 10000 stations
Experiment 1 (delay): 50 stations
Experiment 1 (delay): 1000 stations
Experiment 1 (delay): 10000 stations
Experiment 1 (C-SPARQL): delay vs result size
Architecture: Gracious mode
In this mode, the oracle tries to adjust its window scope to match the scope of the engine's actual window by moving the left and right borders back and/or forth as long as precision and recall improve.
This allows us to:
(a) confirm our assumption about why precision and recall are low,
(b) reconstruct and visualize the actual window borders.
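The border-adjustment idea can be sketched as a small search. This is a simplified, hypothetical take on gracious mode: `f1`, the shift bounds, and the `(element, timestamp)` stream layout are assumptions made for illustration, not the oracle's actual algorithm.

```python
def f1(expected, actual):
    """Harmonic mean of precision and recall over two result sets."""
    expected, actual = set(expected), set(actual)
    tp = len(expected & actual)
    if not expected or not actual:
        return 0.0
    p, r = tp / len(actual), tp / len(expected)
    return 2 * p * r / (p + r) if p + r else 0.0

def gracious_window(stream, start, end, actual, step=1, max_shift=5):
    """Shift the oracle window's left and right borders by up to
    `max_shift` steps and keep the placement that best matches the
    engine's actual output (highest F1)."""
    def expected(s, e):
        return {x for (x, t) in stream if s <= t < e}
    best = (f1(expected(start, end), actual), start, end)
    for ds in range(-max_shift, max_shift + 1):
        for de in range(-max_shift, max_shift + 1):
            s, e = start + ds * step, end + de * step
            score = f1(expected(s, e), actual)
            if score > best[0]:
                best = (score, s, e)
    return best

stream = [(f"obs{t}", t) for t in range(20)]
# The engine's window actually covered [3, 8) instead of the
# oracle's expected [2, 7):
score, s, e = gracious_window(stream, 2, 7,
                              {"obs3", "obs4", "obs5", "obs6", "obs7"})
```

The size of the shift that maximizes the match is exactly the "extent of the timing discrepancy" mentioned in the conclusion.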
Experiment 4: gracious vs non-gracious modes
(a) In non-gracious (default) mode (b) In gracious mode
C-SPARQL
Experiment 4: gracious vs non-gracious modes
(a) In non-gracious (default) mode (b) In gracious mode
CQELS
Conclusion
■ We built a framework for benchmarking RSP engines that allows assessing their correctness and performance
■ We ran a benchmark which revealed some insights:
○ CQELS shows better precision/recall for simple queries,
○ C-SPARQL is slightly more memory-efficient than CQELS,
○ C-SPARQL outperforms CQELS in terms of delay for more complex queries, which is mainly caused by a different reporting strategy
■ By introducing gracious mode, we are able to estimate the extent of the timing discrepancies
Thank you!
github.com/YABench