
Optimizing the Performance for Concurrent RDF Stream Processing Queries

Chan Le Van, Feng Gao, Muhammad Intizar Ali

The INSIGHT Centre for Data Analytics – NUI Galway, Ireland

May, 2017

Outline

I. Introduction

II. Foundations

III. Optimization of Concurrent CQELS Queries

IV. Evaluation

V. Conclusion and Future Works

Data Streams Are Everywhere!

RDF Stream Processing

• RDF Stream Processing (RSP) engines: C-SPARQL, SPARQL-stream, CQELS

• Concurrent query processing is still a challenge for these engines

• CQELS+: an extension of CQELS aimed at optimizing multiple-query processing

II. Foundations

• CQELS – RDF Stream Processing Framework

• Multi-way Join Operator

• Shared Join Operator

• Network of Shared Join Operators (NSJO)

Reference: D. Le-Phuoc. A Native and Adaptive Approach for Linked Stream Data Processing. PhD thesis, National University of Ireland Galway, IDA Business Park, Lower Dangan, Galway, Ireland, 2012.

CQELS – RDF Stream Processing Framework

• Accepts the CQELS declarative language (an extension of SPARQL)

• Follows an eager-execution approach

• Can process both static RDF data and RDF streams

[Figure: a CQELS engine fed by streams S1, S2 and S3, serving queries Q1(S1, S2, S3), Q2(S2, S3) and Q3(S1, S3); each query has its own join operator over its own buffers: B11, B12, B13 for Q1; B22, B23 for Q2; B31, B33 for Q3.]
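The buffer labels in the figure suggest that each query keeps its own window buffer for every stream it reads. A minimal sketch, assuming exactly that (the naming scheme `B<q><s>` and both function names are illustrative, not CQELS code), counts the buffers the three queries need and compares them with one buffer per stream:

```python
# Hypothetical model of the figure: query q reading stream s owns a
# private window buffer named "B<q><s>".
queries = {
    1: [1, 2, 3],  # Q1(S1, S2, S3)
    2: [2, 3],     # Q2(S2, S3)
    3: [1, 3],     # Q3(S1, S3)
}


def private_buffers(queries):
    """Buffers allocated when every query keeps its own copy of each window."""
    return [f"B{q}{s}" for q, streams in queries.items() for s in streams]


def shared_buffers(queries):
    """Buffers needed if each stream window were stored once and shared
    (assumes all queries use the same window definition)."""
    streams = sorted({s for ss in queries.values() for s in ss})
    return [f"B{s}" for s in streams]


print(private_buffers(queries))  # ['B11', 'B12', 'B13', 'B22', 'B23', 'B31', 'B33']
print(shared_buffers(queries))   # ['B1', 'B2', 'B3']
```

Seven buffers hold data from only three streams, which hints at the duplication that sharing operators across queries can avoid.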

Multi-way Join Operator

• Incremental evaluation

[Figure: multi-way join operator over (indexed) input buffers]
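Incremental evaluation over indexed buffers can be illustrated with a small sketch. It assumes every input buffer is a hash index on a single shared join key, tuples are plain dicts, and window expiration is ignored; the class and method names are hypothetical and not the CQELS API. When a tuple arrives on one buffer, it is stored there and then probed against the indexes of the other buffers, so only the new result combinations are produced instead of recomputing the whole join:

```python
from collections import defaultdict
from itertools import product


class MultiWayJoin:
    """Hypothetical incremental n-way equi-join over hash-indexed buffers.

    Window expiration is omitted; every buffer is joined on the same key.
    """

    def __init__(self, buffer_names, key):
        self.key = key
        # One hash index per input buffer: join-key value -> tuples with that value.
        self.indexes = {name: defaultdict(list) for name in buffer_names}

    def insert(self, buffer_name, tup):
        """Store a new tuple and return only the join results it creates."""
        k = tup[self.key]
        self.indexes[buffer_name][k].append(tup)
        # The new tuple stands in for its own buffer; every other buffer is
        # probed through its index instead of being scanned.
        matches = [
            [tup] if name == buffer_name else index.get(k, [])
            for name, index in self.indexes.items()
        ]
        # product() yields nothing if any buffer has no match for this key.
        return [dict(zip(self.indexes, combo)) for combo in product(*matches)]


# Usage with three hypothetical buffers joined on "loc".
join = MultiWayJoin(["S1", "S2", "S3"], key="loc")
join.insert("S1", {"loc": "room1", "person": "alice"})  # [] (S2 and S3 empty)
join.insert("S2", {"loc": "room1", "reader": "r7"})     # [] (S3 still empty)
results = join.insert("S3", {"loc": "room1", "name": "Lab"})
print(len(results))  # 1
```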

Shared Join Operator

Network of Shared Join Operators (NSJO)

[Figure: a network of shared join operators composed of several join graphs]

• A join graph contains the best-cost join sequences of the involved queries

• Join sequence = the order in which data buffers are joined
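One way to picture how a join graph lets queries share work is a prefix tree over join sequences: queries whose best-cost sequences begin with the same buffers can share the intermediate result of that common prefix. The sketch below assumes this prefix-tree representation and uses hypothetical buffer names A to D; it only counts shared prefixes and join results and is an illustration, not the CQELS+ data structure:

```python
class JoinNode:
    """Hypothetical join-graph node: the result of joining a prefix of buffers."""

    def __init__(self):
        self.children = {}    # next buffer to join -> JoinNode
        self.queries = set()  # queries whose join sequence ends at this node


class JoinGraph:
    """Hypothetical join graph that merges join sequences on common prefixes."""

    def __init__(self):
        self.root = JoinNode()

    def add_query(self, query_id, join_sequence):
        """Insert a join sequence; return how many leading buffers were already shared."""
        node, shared = self.root, 0
        for buf in join_sequence:
            if buf in node.children:
                shared += 1  # this prefix (and its intermediate result) already exists
            else:
                node.children[buf] = JoinNode()
            node = node.children[buf]
        node.queries.add(query_id)
        return shared

    def join_operators(self):
        """Count join results; a node at depth >= 2 joins one more buffer into its parent."""
        def count(node, depth):
            own = 1 if depth >= 2 else 0
            return own + sum(count(child, depth + 1) for child in node.children.values())
        return count(self.root, 0)


# Usage with hypothetical buffers A-D.
g = JoinGraph()
print(g.add_query("Q1", ["A", "B", "C"]))  # 0: nothing to reuse yet
print(g.add_query("Q2", ["A", "B", "D"]))  # 2: reuses the A-B join result
print(g.join_operators())                  # 3, versus 4 without sharing
```

Because sharing depends on common prefixes, the order chosen for each join sequence affects how much output can be reused, which is why the join order matters beyond the cost of a single query.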

III. Optimization of Concurrent CQELS Queries

• CQELS+: Extending CQELS with the network of shared join operators

o Output reutilization heuristic over the join graph

o Join graph example

• Load balancing for parallel CQELS+ instances

o Rotation

o Minimum average latency

o Minimum average buffer size

Output Reutilization Heuristic

CQELS+: Join Graph – Example

QUERY 1: Inform a participant of the name and description of the location he has just entered.

QUERY 2: Notify two people when they can reach each other from two different, directly connected (nearby) locations.

QUERY 3: Notify an author of his co-authors who have been in his current location during the last 5 seconds.

QUERY 4: Count the number of co-authors appearing in nearby locations in the last 30 seconds, grouped by location.

Reference: D. Le-Phuoc, J. X. Parreira, M. Hauswirth. Linked Stream Data Processing. 2012. Conference scenario: integrating physical streams with online profiles.


Load Balancing for Parallel CQELS+ Instances

Queries are registered using one of the following load-balancing strategies:

1. Rotation: round-robin registration

2. Minimum Average Latency: register the query with the engine that has the lowest average latency

3. Minimum Average Buffer Size: register the query with the engine that has the lowest average buffer size
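The three strategies can be sketched as interchangeable selection functions. This assumes each CQELS+ instance reports its average latency and average buffer size; the `Engine` record, its field names, the metric units and the sample values are hypothetical and invented for illustration, and the slide does not show how CQELS+ actually collects these metrics:

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Engine:
    """Hypothetical view of one CQELS+ instance and the metrics it reports."""
    name: str
    avg_latency_ms: float
    avg_buffer_size: float
    queries: list = field(default_factory=list)


class Rotation:
    """Strategy 1: round-robin registration."""

    def __init__(self):
        self._counter = count()

    def choose(self, engines):
        return engines[next(self._counter) % len(engines)]


def min_average_latency(engines):
    """Strategy 2: pick the engine with the lowest average latency."""
    return min(engines, key=lambda e: e.avg_latency_ms)


def min_average_buffer_size(engines):
    """Strategy 3: pick the engine with the lowest average buffer size."""
    return min(engines, key=lambda e: e.avg_buffer_size)


def register(query, engines, choose):
    """Register a query on whichever engine the given strategy selects."""
    engine = choose(engines)
    engine.queries.append(query)
    return engine


# Invented metric values, for illustration only.
engines = [Engine("e1", 12.0, 300), Engine("e2", 8.5, 450), Engine("e3", 20.0, 120)]
rotation = Rotation()
print([register(q, engines, rotation.choose).name for q in ["q1", "q2", "q3", "q4"]])
# ['e1', 'e2', 'e3', 'e1']
print(register("q5", engines, min_average_latency).name)      # e2
print(register("q6", engines, min_average_buffer_size).name)  # e3
```

Here the metrics are static snapshots; in a running federation they would be refreshed between registrations, so the latency- and buffer-based strategies react to load while Rotation ignores it.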

IV. Evaluation

• Shared Join Operator Evaluation

• Load Balancing over CQELS+ Engines

• Query Registration Time

Join Performance between CQELS and CQELS+

[Figure: join performance of CQELS vs. CQELS+ for Query 2, Query 3, Query 5 and Query 6]

Experimentation: https://github.com/chanlevan/CqelsplusExperiment

Source code: https://github.com/chanlevan/CQELSPLUS

Load Balancing

[Figure: load-balancing results for Query 5; panels: Scale instances, Scale streams]

Experimentation: https://github.com/chanlevan/CqelsplusLoadBalancingExperiment

Source code: https://github.com/chanlevan/CPFederation

Query Registration Time

Experimentation: https://github.com/chanlevan/CqelsplusLoadBalancingExperiment

Source code: https://github.com/chanlevan/CPFederation

V. Conclusion and Future Works

Conclusion:

• Better performance when handling multiple queries

• Federation of CQELS+ engines with different load-balancing strategies

Future works:

• CQELS+: reduce query registration time

• Distributed model: more efficient load-balancing strategies
