2023-05-03 1
The benefits of fine-grained synchronization in deterministic and efficient stream processing
Vincenzo Gulisano, Yiannis Nikolakopoulos
Vincenzo Gulisano - Yiannis Nikolakopoulos
Chalmers University of Technology
Agenda
• Who we are
• Motivation and System Model
• ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
• The ScaleGate data structure
• Fine-grained synchronization and other use-cases
• Research in progress
• Conclusions
Who we are
Chalmers University of Technology
Computer Science and Engineering department
Distributed Computing and Systems research group
• 12 PhD degrees awarded
• ~20 researchers, 3 postdocs and 10 PhD students
• acknowledged internationally as a leading group in practical multicore synchronization algorithms and programming (results adopted by Java JDK, C++, Intel, NVIDIA)
• extensive network of academic and industrial collaborations, continuously supported by national and international projects
Who we are: Vincenzo Gulisano (PhD)
Assistant Professor at Chalmers University of Technology
Research focus:
• scalable big data analysis in cyber-physical systems (Smart Grids, Advanced Metering Infrastructures and Vehicular Networks).
• streaming operators’ parallelization, load balancing, provisioning, decommissioning and fault tolerance protocols.
• enhancement of parallel stream operators by means of streaming- and energy-aware concurrent data structures that boost analysis performance (both in throughput and latency).
• applied use of the streaming paradigm in data validation, intrusion detection systems, privacy-preserving online analysis, distributed embedded networks and cloud infrastructures.
Who we are: Yiannis Nikolakopoulos
PhD Student
Research interests:
• Parallel and distributed computing
• Shared memory concurrent data structures
• Multicore/Manycore systems
• Consistency problems
• Synchronization
• Data Streaming: operator parallelism & scalability
Motivation
Applications in sensor networks, cyber-physical systems:
• large and fluctuating volumes of data generated continuously
Demand for:
• Continuous processing of data streams
• In a real-time fashion
Store-then-process is not feasible
[Figure: store-then-process (1 Data, 2 Query, 3 Query results) versus continuous query processing in main memory (Continuous Query, Data, Query results)]
System Model
• Data Stream: unbounded sequence of tuples
– Example: Call Description Record (CDR)

Field        Type
Caller       text
Callee       text
Time (secs)  int
Price (€)    double

– Example tuples, in time order: <A, B, 8:00, 3>, <C, D, 8:20, 7>, <A, E, 8:35, 6>
System Model
• Operators:
– Stateless operator: 1 input tuple, 1 output tuple
– Stateful operator: 1+ input tuple(s), 1 output tuple
System Model
• Continuous Query: graph of operators and streams
– Map: convert € to $
– Filter: only > 10$
– Agg: count calls made by each Caller number
• Example: count the number of calls whose price is more than 10 dollars made by each caller
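The Map, Filter and Agg chain above can be sketched with plain Python generators. This is only an illustration: the `CDR` type, the function names, the exchange rate and the fourth call are assumptions made for the example, and the windowing of the aggregate is left out.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

class CDR(NamedTuple):
    caller: str
    callee: str
    time: str     # call start, written "H:MM" as on the slide
    price: float  # euros on input

EUR_TO_USD = 1.1  # assumed exchange rate, for illustration only

def convert(stream: Iterable[CDR]) -> Iterator[CDR]:
    """Map (stateless): emit each tuple with its price converted to dollars."""
    for t in stream:
        yield t._replace(price=t.price * EUR_TO_USD)

def only_expensive(stream: Iterable[CDR], threshold: float = 10.0) -> Iterator[CDR]:
    """Filter (stateless): forward only calls costing more than `threshold` dollars."""
    return (t for t in stream if t.price > threshold)

def count_per_caller(stream: Iterable[CDR]) -> Counter:
    """Aggregate (stateful): count calls per caller (no window in this sketch)."""
    return Counter(t.caller for t in stream)

calls = [
    CDR("A", "B", "8:00", 3.0),
    CDR("C", "D", "8:20", 7.0),
    CDR("A", "E", "8:35", 6.0),
    CDR("A", "F", "8:50", 12.0),  # hypothetical extra call, so that one tuple passes the filter
]
counts = count_per_caller(only_expensive(convert(calls)))
```

Only the hypothetical fourth call costs more than 10 dollars after conversion, so `counts` holds a single entry for caller A.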
System Model
• Infinite sequence of tuples, bounded memory: windows
• Example: 1 hour windows over time: [8:00,9:00), [8:20,9:20), [8:40,9:40)
• Example: count tuples, 1 hour windows
– Tuples arrive at 8:05, 8:15, 8:22, 8:45 and 9:05
– Window [8:00,9:00): Output: 4
– Next window: [8:20,9:20)
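The window count can be reproduced with a few lines of Python. This is a minimal sketch, assuming half-open windows that are 60 minutes long and advance by 20 minutes (the advance is read off the window boundaries shown earlier); the function names are illustrative.

```python
def minutes(hhmm: str) -> int:
    """Convert "H:MM" into minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def window_counts(timestamps, size=60, advance=20, start=minutes("8:00")):
    """Count the tuples that fall in each half-open window [s, s + size)."""
    counts = {}
    last = max(timestamps)
    s = start
    while s <= last:
        counts[(s, s + size)] = sum(1 for ts in timestamps if s <= ts < s + size)
        s += advance
    return counts

ts = [minutes(x) for x in ["8:05", "8:15", "8:22", "8:45", "9:05"]]
counts = window_counts(ts)
first_window = counts[(minutes("8:00"), minutes("9:00"))]   # 4, as on the slide
second_window = counts[(minutes("8:20"), minutes("9:20"))]  # 3
```

A real operator would not rescan all timestamps per window; it would update the windows incrementally as tuples arrive and expire.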
The evolution of SPEs
• Centralized SPEs
• Distributed SPEs: inter-operator parallelism
• Parallel-Distributed SPEs: intra-operator parallelism
• Elastic SPEs: scale up / down
What is a stream join?
[Figure: streams R and S with tuples t1 to t4; sliding windows WR and WS of a given window size; tuples are matched using predicate P]
Why parallel stream joins?
• Window size WS = 600 seconds
• R receives 500 tuples/second
• S receives 500 tuples/second
• WR will contain 300,000 tuples
• WS will contain 300,000 tuples
• Each new tuple from R gets compared with all the tuples in WS
• Each new tuple from S gets compared with all the tuples in WR
… 300,000,000 comparisons/second!
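The arithmetic behind these numbers, as a quick check (variable names are illustrative):

```python
window_size_s = 600   # seconds
rate_r = 500          # tuples/second arriving on R
rate_s = 500          # tuples/second arriving on S

tuples_in_wr = window_size_s * rate_r  # 300,000 tuples in WR
tuples_in_ws = window_size_s * rate_s  # 300,000 tuples in WS

# Every new R tuple scans WS and every new S tuple scans WR.
comparisons_per_second = rate_r * tuples_in_ws + rate_s * tuples_in_wr
```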
What are the challenges of a parallel stream join?
• Scalability
• High throughput
• Low latency
• Disjoint parallelism
• Skew resilience
• Determinism
The 3-step procedure (sequential stream join)
For each incoming tuple t:
1. compare t with all tuples in the opposite window given predicate P
2. add t to its window
3. remove stale tuples from t’s window
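The three steps can be sketched as a small Python function. This is an illustration under assumptions, not the authors' implementation: tuples are `(ts, value)` pairs, a tuple is considered stale once it is `window_size` or more time units older than the incoming one, and the age check in step 1 is an added guard because the opposite window is only purged when one of its own tuples arrives.

```python
from collections import deque

def make_sequential_join(window_size, predicate):
    """Return a function that runs the three steps for one incoming tuple."""
    windows = {"R": deque(), "S": deque()}  # (ts, value) pairs, oldest first

    def process(source, ts, value):
        own = windows[source]
        opposite = windows["S" if source == "R" else "R"]
        matches = []
        # Step 1: compare t with all tuples in the opposite window.
        for ots, ovalue in opposite:
            r, s = (value, ovalue) if source == "R" else (ovalue, value)
            if ts - ots < window_size and predicate(r, s):  # age check: added guard
                matches.append(((ts, value), (ots, ovalue)))
        # Step 2: add t to its own window.
        own.append((ts, value))
        # Step 3: remove stale tuples from t's window.
        while own and ts - own[0][0] >= window_size:
            own.popleft()
        return matches

    return process

join = make_sequential_join(window_size=10, predicate=lambda r, s: r == s)
events = [("R", 1, "a"), ("S", 2, "a"), ("S", 3, "b"), ("R", 12, "b"), ("R", 13, "a")]
results = []
for source, ts, value in events:
    results.extend(join(source, ts, value))
```

With these events, `results` contains two matches: the S tuple at time 2 with the R tuple at time 1, and the R tuple at time 12 with the S tuple at time 3.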
[Figure: producers Prod R and Prod S add tuples to R and S; one processing unit (PU) runs the procedure; the consumer (Cons) consumes the results]
We assume each producer delivers tuples in timestamp order
The 3-step procedure, is it enough?
Checklist: Scalability, High throughput, Low latency, Disjoint parallelism, Skew resilience, Determinism
[Figure: streams R and S with windows WR and WS holding t1 and t2; incoming tuples t3 and t4]
Enforcing determinism in sequential stream joins
• Next tuple to process = earliest(tS,tR)
• The earliest(tS,tR) tuple is referred to as the next ready tuple
• Process ready tuples in timestamp order: this gives Determinism
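A minimal sketch of picking the next ready tuple from two input queues. It assumes each queue holds `(ts, value)` pairs in timestamp order, that no tuple is ready while one queue is empty (the missing stream could still deliver an earlier tuple), and that ties go to R; the tie-break rule is an assumption made for the example.

```python
from collections import deque

def next_ready(queue_r, queue_s):
    """Return (source, tuple) for the earliest head, or None if nothing is ready."""
    if not queue_r or not queue_s:
        return None  # cannot decide yet: the empty stream may deliver an earlier tuple
    if queue_r[0][0] <= queue_s[0][0]:
        return "R", queue_r.popleft()
    return "S", queue_s.popleft()

queue_r = deque([(1, "r1"), (4, "r2")])
queue_s = deque([(2, "s1"), (3, "s2")])
processed = []
while True:
    item = next_ready(queue_r, queue_s)
    if item is None:
        break
    processed.append(item)
```

The tuple with timestamp 4 stays in `queue_r`: it is not ready until S delivers a tuple with a timestamp of at least 4.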
[Figure: the PU picks between the head tuples tS and tR]
Deterministic 3-step procedure
Pick the next ready tuple t:
1. compare t with all tuples in the opposite window given predicate P
2. add t to its window
3. remove stale tuples from t’s window
[Figure: Prod R and Prod S add tuples to R and S; one PU; Cons consumes the results]
Shared-nothing parallel stream join (state-of-the-art)
• Prod R and Prod S: choose a PU, then add the tuple to R or S of PUi
• Each PU (PU1 … PUN): pick the next ready tuple t, then
1. compare t with all tuples in the opposite window given P
2. add t to its window
3. remove stale tuples from t’s window
• Merge: take the next ready output tuple
• Cons: consume results
Checklist: Scalability, High throughput, Low latency, Disjoint parallelism, Skew resilience, Determinism
[Figure: in the shared-nothing design, Prod R, Prod S, PU1 … PUN, Merge and Cons exchange tuples through queues using enqueue() and dequeue()]
From coarse-grained to fine-grained synchronization
[Figure: Prod R, Prod S, PU1 … PUN and Cons]
ScaleGate
• addTuple(tuple,sourceID): allows a tuple from sourceID to be merged by ScaleGate into the resulting timestamp-sorted stream of ready tuples.
• getNextReadyTuple(readerID): provides readerID with the earliest ready tuple that it has not yet consumed.
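The semantics of the two methods can be illustrated with a coarse-grained, lock-based stand-in. This is a sketch of the interface only, not the lock-free ScaleGate and not the API of the ScaleGate_Java repository: the class name, the snake_case method names and the separate `ts` argument are assumptions. It treats a tuple as ready when its timestamp is at most the smallest "latest timestamp" over all sources, assumes each source adds tuples in timestamp order, and never frees consumed tuples.

```python
import bisect
import threading

class NaiveScaleGate:
    """Lock-based illustration of addTuple / getNextReadyTuple semantics."""

    def __init__(self, num_sources, num_readers):
        self._lock = threading.Lock()
        self._tuples = []                    # (ts, seq, payload), kept sorted
        self._seq = 0                        # insertion counter, tie-break only
        self._latest = [None] * num_sources  # latest ts added by each source
        self._next = [0] * num_readers       # per-reader position in _tuples

    def add_tuple(self, ts, payload, source_id):
        with self._lock:
            bisect.insort(self._tuples, (ts, self._seq, payload))
            self._seq += 1
            self._latest[source_id] = ts

    def get_next_ready_tuple(self, reader_id):
        with self._lock:
            if any(latest is None for latest in self._latest):
                return None  # some source has not delivered anything yet
            i = self._next[reader_id]
            if i == len(self._tuples):
                return None
            ts, _, payload = self._tuples[i]
            if ts > min(self._latest):
                return None  # an earlier tuple could still arrive
            self._next[reader_id] = i + 1
            return ts, payload

gate = NaiveScaleGate(num_sources=2, num_readers=1)
gate.add_tuple(6, "x", source_id=0)
gate.add_tuple(1, "y", source_id=1)
first = gate.get_next_ready_tuple(0)   # (1, "y")
second = gate.get_next_ready_tuple(0)  # None: 6 is not ready yet
gate.add_tuple(3, "z", source_id=1)
third = gate.get_next_ready_tuple(0)   # (3, "z")
```

The single lock makes every operation serialize, which is exactly the coarse-grained behaviour the talk argues against; the sketch is only meant to pin down what "ready" means.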
https://github.com/dcs-chalmers/ScaleGate_Java
ScaleJoin
• Prod R and Prod S: add tuple to SGin
• Cons: get next ready output tuple from SGout
• Steps for each PU (PU1 … PUN): get next ready input tuple t from SGin, then
1. compare t with all tuples in the opposite window given P
2. add t to its window in a round-robin fashion
3. remove stale tuples from t’s window
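The PU steps can be simulated sequentially to see why every pair of tuples is still compared exactly once. This is a sketch under assumptions: every PU receives every ready tuple in the same order (as it would from SGin), "round-robin" is modelled with a per-stream counter so that exactly one PU stores each tuple, tuples are `(ts, value)` pairs, and the age check in step 1 is an added guard. Class and variable names are illustrative.

```python
from collections import deque

class ProcessingUnit:
    def __init__(self, index, num_pus, window_size, predicate):
        self.index, self.num_pus = index, num_pus
        self.window_size, self.predicate = window_size, predicate
        self.windows = {"R": deque(), "S": deque()}  # this PU's share of WR and WS
        self.seen = {"R": 0, "S": 0}                 # tuples seen so far per stream

    def process(self, source, ts, value):
        own = self.windows[source]
        opposite = self.windows["S" if source == "R" else "R"]
        matches = []
        # Step 1: compare t with this PU's share of the opposite window.
        for ots, ovalue in opposite:
            r, s = (value, ovalue) if source == "R" else (ovalue, value)
            if ts - ots < self.window_size and self.predicate(r, s):
                matches.append(((ts, value), (ots, ovalue)))
        # Step 2: store t only when it is this PU's turn (round-robin).
        if self.seen[source] % self.num_pus == self.index:
            own.append((ts, value))
        self.seen[source] += 1
        # Step 3: remove stale tuples from this PU's share of t's window.
        while own and ts - own[0][0] >= self.window_size:
            own.popleft()
        return matches

pus = [ProcessingUnit(i, 3, window_size=10, predicate=lambda r, s: r == s)
       for i in range(3)]
events = [("R", 1, "a"), ("S", 2, "a"), ("S", 3, "b"), ("R", 12, "b"), ("R", 13, "a")]
results = []
for source, ts, value in events:
    for pu in pus:  # in ScaleJoin the PUs would run these steps in parallel
        results.extend(pu.process(source, ts, value))
```

Because the tuples are spread evenly over the PUs regardless of their values, the comparison work is balanced even when the join keys are skewed.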
ScaleJoin (example)
[Figure: how tuples t1 to t4 from R and S are stored in WR and compared, in a sequential stream join and in ScaleJoin with 3 PUs]
ScaleJoin
[Figure: the ScaleJoin architecture with additional Prod R and Prod S producers adding tuples to SGin; PUi run the same three steps; Cons gets the next ready output tuple from SGout]
Checklist: Scalability, High throughput, Low latency, Disjoint parallelism, Skew resilience, Determinism
ScaleJoin Scalability – comparisons/second
[Figure: comparisons/second versus number of PUs]
4 billion comparisons/second!!!
ScaleGate: Main Functionality
• addTuple(tuple,sourceID)
– Insert tuple from sourceID, effectively in ts order
– Example: <6,…>, <1,…> and <3,…> are inserted concurrently
ScaleGate: Main Functionality
• getNextReadyTuple(readerID)
– readerID gets the next ready tuple t in ts order, i.e. no tuple with ts < t.ts will arrive afterwards
– return null if no ready tuple
[Figure: tuples <1,…>, <3,…>, <6,…>, <7,…>, <8,…> and <9,…>; readers get the ready tuples]
Implementation?
• getNextReadyTuple(readerID)
– we get tuples in ts order, so maybe an extractMin()? (priority queue)
– still no guarantee that t is ready
• addTuple(tuple, sourceID)
– sorted insertion could maybe help (e.g. <6,…>, <1,…>, <3,…>)
Is there a concurrent data structure with cheap extractMin() + insert()?
• Binary Search Trees, Heaps: expensive rebalancing and maintenance of the heap property
ScaleGate Anatomy (1)
• Inspired by lock-free skip lists
– randomized height of nodes
– expected cost for search/insertion O(log N)
• Search by traversing from higher to lower levels
BigData’15 [GNPT]
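For readers unfamiliar with skip lists, here is a minimal sequential sketch of randomized node heights and top-down search. It is a plain, single-threaded skip list written for illustration; it has none of ScaleGate's lock-freedom, flagging or reader-local heads, and all names are assumptions.

```python
import random

class Node:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # next[i] is the successor at level i

class SkipList:
    MAX_HEIGHT = 16

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = Node(None, self.MAX_HEIGHT)  # sentinel, its key is never compared

    def _random_height(self):
        height = 1
        while height < self.MAX_HEIGHT and self._rng.random() < 0.5:
            height += 1
        return height

    def _predecessors(self, key):
        """Walk from the top level down, recording the last node with a smaller key."""
        preds = [self._head] * self.MAX_HEIGHT
        node = self._head
        for level in reversed(range(self.MAX_HEIGHT)):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            preds[level] = node
        return preds

    def insert(self, key):
        preds = self._predecessors(key)
        node = Node(key, self._random_height())
        for level in range(len(node.next)):
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node

    def contains(self, key):
        candidate = self._predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def __iter__(self):
        node = self._head.next[0]
        while node is not None:
            yield node.key
            node = node.next[0]

timestamps = SkipList(seed=1)
for ts in [6, 1, 3]:
    timestamps.insert(ts)
in_order = list(timestamps)  # keys come out sorted whatever heights were drawn
```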
ScaleGate Anatomy (2)
• Reader-local view of "head" (which also holds the minimum ts for that reader)
• Flagging mechanism:
– Flag the last written tuple of each source
– If "head" is not flagged, it can be safely returned
• Nodes are free to be garbage-collected after every reader has passed them (almost...)
[Figure: reader-local heads head0 and head1]
BigData’15 [GNPT]
ScaleGate Properties
• Safety: Linearizable
– Every operation appears to take effect instantaneously
• Progress: Lock-free
– At least one thread makes progress, independently of the state of others
• Basic building block for providing deterministic processing (through the ready semantics)
Why not a Skip List?
Fine-grained synchronization and other use-cases
– Streaming Aggregation
– Towards real-time analytics
– Some ongoing work
Multi-way Streaming Aggregation
• Input: timestamp-sorted tuples per stream (e.g. timestamps 1, 5 on one stream, 3, 7 on another and 5, 9 on a third)
• Tuples are not in TS order across streams, so they are not ready for processing at arrival…
• Merge & sort streams (e.g. <6,…>, <1,…>, <3,…>), then Aggregate F
• When to process a tuple?
Multi-way Streaming Aggregation (baseline [StreamCloud, Borealis])
• Merge & sort streams: <6,…>, <1,…>, <3,…>
• Aggregate F: update windows and output
• When to process a tuple? Keep checking all buffers!!!
Can Data Structures do more than insert and extract?
Data Structures to the Rescue
• Insert concurrently: <6,…>, <1,…>, <3,…>
• Get ready tuples
• Concurrently update windows
• Manage sorted windows
• Output tuples
ScaleGate objects allow concurrent access with:
- consistency/safety and progress guarantees
- determinism
- an appropriate interface, enabling parallelization of pipeline stages
Contribution: SPAA’14 [CGNPT], “Brief announcement: concurrent data structures for efficient streaming aggregation”
[Figure: Aggregate Scaling]
[Figure: Latency with Increasing # Input Streams]
Best Solution Award
Analyze taxi trip reports from NYC and compute:
ACM DEBS 2015 [GNWPT], “Deterministic Real-Time Analytics of Geospatial Data Streams through ScaleGate Objects”
[Figure: System Architecture Overview]
What’s happening in hardware?
• Networks on chip (NoC)
• Short distance between cores
• Message passing model support
• Shared memory support
• Eliminated cache coherency
• Limited support for synchronization primitives
Can we have Data Structures that are fast, scalable and have good progress guarantees?
[Figure: IA cores with caches; shared and local memory]
Adapteva’s Epiphany Architecture: 16 or 64 cores
• Off-chip DRAM
• On-chip memory: 32 KB per core
• Cores communicate through the mesh by reading and writing the on-chip memory
• Distributed shared memory
Ongoing work with Ericsson: ScaleGate on Epiphany
• Task-graph based processing
• (Baseband) signal processing applications
Can ScaleGate help?
Storm is a parallel-distributed Stream Processing Engine
• Data streaming operators: Spout / Bolt
• Splitting of work into tasks / workers: intra-operator parallelism
Source: http://storm.apache.org/
Parallelization
• General approach (LB: Load Balancer, IM: Input Merger)
– Operators OPA and OPB are each deployed on their own subcluster
– Subcluster A: Node 1 … Node m, each running OPA with an IM and an LB
– Subcluster B: Node 1 … Node n, each running OPB with an IM and an LB
Parallelization
• Stateful operators: Semantic awareness
– Aggregate: count within last hour, group-by caller number
[Figure: the LBs of the previous subcluster route tuples to the IMs of Agg1, Agg2 and Agg3]
– Example for Caller A: <A, B, 8:00, 3>, <A, C, 8:30, 5>, <A, D, 9:05, 2>
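A minimal sketch of the semantic-aware routing a load balancer needs for this aggregate: every tuple of a given caller must reach the same Agg instance, otherwise no instance can count that caller's calls correctly. Hashing the group-by key with `zlib.crc32` is an assumption chosen for the example (it is stable across runs, unlike Python's built-in `hash` for strings); the one-hour window is omitted.

```python
import zlib
from collections import Counter

def route(caller: str, num_instances: int) -> int:
    """Map the group-by key to one aggregate instance, always the same one."""
    return zlib.crc32(caller.encode("utf-8")) % num_instances

NUM_AGGS = 3
calls = [("A", "B", "8:00", 3), ("A", "C", "8:30", 5), ("A", "D", "9:05", 2)]

per_instance = [Counter() for _ in range(NUM_AGGS)]  # state of Agg1, Agg2, Agg3
for caller, callee, time, price in calls:
    per_instance[route(caller, NUM_AGGS)][caller] += 1

instances_with_state = [i for i, counts in enumerate(per_instance) if counts]
```

All three calls of Caller A end up in a single instance, so `instances_with_state` has exactly one element.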
Parallelization
[Figure: the same LB, IM and Agg1 to Agg3 layout, next to Storm’s groupings]
• Storm: fields grouping, or user-defined grouping
Source: http://storm.apache.org/
On-going research
Source: http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
Coarse-grained synchronization!!!
• Individual queues for executors
• 2 threads per executor
• Dedicated Receive and Send thread queues
… What about fine-grained synchronization?
Conclusions
1. Parallel stream joins
2. Parallel stream aggregation
3. “Ad-hoc” streaming applications
4. Determinism in Storm…
https://github.com/dcs-chalmers/ScaleGate_Java
Thank you! Questions?
References
• [GNPT] Vincenzo Gulisano, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas, “ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join”, to appear in IEEE BigData 2015
• [GNWPT] Vincenzo Gulisano, Yiannis Nikolakopoulos, Ivan Walulya, Marina Papatriantafilou, and Philippas Tsigas, “Deterministic Real-time Analytics of Geospatial Data Streams Through ScaleGate Objects”, 9th ACM International Conference on Distributed Event-Based Systems (DEBS '15)
• Yiannis Nikolakopoulos, Anders Gidenstam, Marina Papatriantafilou, and Philippas Tsigas, “A Consistency Framework for Iteration Operations in Concurrent Data Structures”, 2015 IEEE 29th International Symposium on Parallel & Distributed Processing (IPDPS)
• [CGNPT] Daniel Cederman, Vincenzo Gulisano, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas, “Brief announcement: concurrent data structures for efficient streaming aggregation”, 2014 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '14)