CS 410/510 Data Streams
Lecture 15: How Soccer Players Would do Stream Joins & Query-Aware Partitioning for Monitoring Massive Network Data Streams
Kristin Tufte, David Maier
3/8/2012


Page 1: Kristin Tufte, David Maier

Data Streams: Lecture 15, 3/8/2012

CS 410/510 Data Streams
Lecture 15: How Soccer Players Would do Stream Joins & Query-Aware Partitioning for Monitoring Massive Network Data Streams

Kristin Tufte, David Maier

Page 2: Kristin Tufte, David Maier

How Soccer Players Would do Stream Joins

Handshake Join
    Evaluate window-based stream joins
    Highly parallelizable
    Implementation on multi-core machine and FPGA

Previous stream join execution strategies
    Sequential execution based on operational semantics

Page 3: Kristin Tufte, David Maier

Let’s talk about stream joins

Join window of R with window of S
Focus on sliding windows here
    Scan, Insert, Invalidate
How might I parallelize?
    Partition and replicate
Time-based windows vs. tuple-based windows

Figure Credit: How Soccer Players Would do Stream Joins, Teubner and Mueller, SIGMOD 2011
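
The scan, insert, invalidate steps can be sketched for tuple-based sliding windows as below. This is a minimal sequential sketch, not the implementation from the paper; the class name `SlidingWindowJoin`, the `window_size` parameter, and the equality predicate are illustrative assumptions.

```python
from collections import deque


class SlidingWindowJoin:
    """Sequential tuple-based sliding-window join (illustrative sketch)."""

    def __init__(self, window_size, predicate):
        self.window_size = window_size  # max tuples kept per stream (assumed)
        self.predicate = predicate      # predicate(r, s) -> bool
        self.windows = {"R": deque(), "S": deque()}

    def on_arrival(self, stream, tup):
        """Process one arriving tuple and return the join results it produces."""
        # Scan: compare the new tuple against the other stream's window.
        if stream == "R":
            results = [(tup, s) for s in self.windows["S"] if self.predicate(tup, s)]
        else:
            results = [(r, tup) for r in self.windows["R"] if self.predicate(r, tup)]
        # Insert: add the new tuple to its own window.
        own = self.windows[stream]
        own.append(tup)
        # Invalidate: expire the oldest tuple once the window is over capacity.
        if len(own) > self.window_size:
            own.popleft()
        return results


join = SlidingWindowJoin(window_size=2, predicate=lambda r, s: r == s)
out = []
for stream, value in [("R", 1), ("S", 1), ("R", 2), ("R", 3), ("S", 1), ("S", 3)]:
    out.extend(join.on_arrival(stream, value))
print(out)  # [(1, 1), (3, 3)]
```

The second `S` tuple with value 1 finds no partner because the matching `R` tuple has already been invalidated. All three steps touch shared window state, which is why parallelizing this form needs partitioning and possibly replication.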

Page 4: Kristin Tufte, David Maier

So, Handshake Join…

Figure: Stream Join with Input A and Input B

Handshake Join
    Entering tuple pushes oldest tuple out
    No central coordination
    Same semantics
    May introduce disorder

Traditional Stream Join
    Parallelization needs partitioning; possibly replication
    Needs central coordination

Figure Credit: How Soccer Players Would do Stream Joins, Teubner and Mueller, SIGMOD 2011

Page 5: Kristin Tufte, David Maier

Parallelization

Each core gets a segment of each window
Data flow: act locally when new data arrives, then pass data on
Good for shared-nothing setups
Simple communication: interact with neighbors; avoid bottlenecks

Figure Credit: How Soccer Players Would do Stream Joins, Teubner and Mueller, SIGMOD 2011
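
A sequential simulation can illustrate the data flow: R tuples enter at the left end of a chain of cores, S tuples enter at the right end, and each core joins only against its local segments before pushing its oldest tuple to a neighbor. This is a sketch under simplifying assumptions (one tuple moves at a time, so the missed-pair problem of concurrent message passing cannot occur); the names `HandshakeJoin`, `num_cores`, and `segment_size` are hypothetical.

```python
from collections import deque


class HandshakeJoin:
    """Sequential simulation of a handshake join over a chain of cores."""

    def __init__(self, num_cores, segment_size, predicate):
        self.n = num_cores
        self.segment_size = segment_size  # per-core capacity for each stream
        self.predicate = predicate
        self.r_seg = [deque() for _ in range(num_cores)]  # R flows core 0 -> n-1
        self.s_seg = [deque() for _ in range(num_cores)]  # S flows core n-1 -> 0
        self.results = []

    def _enter(self, stream, core, tup):
        """A tuple enters a core: join locally, store, push overflow onward."""
        if stream == "R":
            self.results += [(tup, s) for s in self.s_seg[core] if self.predicate(tup, s)]
            seg, nxt = self.r_seg[core], core + 1
        else:
            self.results += [(r, tup) for r in self.r_seg[core] if self.predicate(r, tup)]
            seg, nxt = self.s_seg[core], core - 1
        seg.append(tup)
        if len(seg) > self.segment_size:
            oldest = seg.popleft()
            if 0 <= nxt < self.n:
                self._enter(stream, nxt, oldest)  # hand over to the neighbor
            # otherwise the tuple leaves the far end of the chain and expires

    def push_r(self, tup):
        self._enter("R", 0, tup)

    def push_s(self, tup):
        self._enter("S", self.n - 1, tup)


hj = HandshakeJoin(num_cores=2, segment_size=1, predicate=lambda r, s: r == s)
hj.push_r(1)
hj.push_s(1)
hj.push_s(2)  # pushes S=1 to core 0, where it meets R=1
hj.push_r(2)
hj.push_s(3)  # pushes S=2 to core 0, where it meets R=2
print(hj.results)  # [(1, 1), (2, 2)]
```

Note that a pair is only reported once the two tuples meet in the same core, so results can appear later than in the sequential join and in a different order.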

Page 6: Kristin Tufte, David Maier

Parallelization - Observations

Parallelizes tuple-based windows and non equi-join predicates
As written, compares all tuples; could hash at each node to optimize
Note data transfer costs between cores, and each tuple is processed at each core
Soccer players have short arms, hardware is NUMA

Figure Credit: How Soccer Players Would do Stream Joins, Teubner and Mueller, SIGMOD 2011

Page 7: Kristin Tufte, David Maier

Scalability

Data flow + point-to-point communication
Add’l cores: larger window sizes or reduce workload per core
“directly turn any degree of parallelism into higher throughput or larger supported window sizes”
“can trivially be scaled up to handle larger join windows, higher throughput rates, or more compute-intensive join predicates”

Figure Credit: How Soccer Players Would do Stream Joins, Teubner and Mueller, SIGMOD 2011

Page 8: Kristin Tufte, David Maier

Encountering Tuples

An item in either window encounters all current items in the other window
Immediate scan strategy
Flexible segment boundaries (cores)
Other local implementations

Figure Credit: How Soccer Players Would do Stream Joins, Teubner and Mueller, SIGMOD 2011

Page 9: Kristin Tufte, David Maier

Handshake Join with Message Passing

Lock-step processing (tuple-based windows)
FIFO queues with message passing
Missed join-pair

Page 10: Kristin Tufte, David Maier

Two-phase forwarding

Asymmetric synchronization (replication on one core only)
Keep copies of forwarded tuples until ack received
Ack for s4 must be processed between r5 and r6

Page 11: Kristin Tufte, David Maier

Load Balancing & Synchronization

Even distribution not needed for correctness
Maintain mostly even-sized local S windows
Synch at pipeline ends to manage windows

Page 12: Kristin Tufte, David Maier

FPGA Implementation

Tuple-based windows that fit into memory
Common clock signal; lock-step processing
Nested-loops join processing

Page 13: Kristin Tufte, David Maier

Performance

Scalability on Multi-Core CPU
Scalability on FPGAs; 8 tuples/window

Page 14: Kristin Tufte, David Maier

Before we move on…

The soccer-join paper focuses on sliding windows
How would their algorithm and implementation work for tumbling windows?
What if we did tumbling windows only?

Page 15: Kristin Tufte, David Maier

Query-Aware Partitioning for Monitoring Massive Network Data Streams

OC-768 Networks
    100 million packets/sec
    2x40 Gbit/sec
Query plan partitioning
    Issues: “heavy” operators, non-uniform resource consumption
Data stream partitioning

Page 16: Kristin Tufte, David Maier

Let’s partition the data…

Computes packet summaries between src and dest for network monitoring
Round robin partitioning -> worst case a single flow results in n partial flows

SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len),
       MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
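
The worst case for round robin can be reproduced with a small sketch: a single flow of 8 packets spread over 4 nodes yields 4 partial flows that a final aggregator must still merge. The packet dictionaries, field values, and helper names below are illustrative assumptions, and only COUNT(*) and SUM(len) are computed.

```python
from collections import defaultdict


def round_robin(packets, n):
    """Assign packet i to node i mod n, ignoring packet contents."""
    nodes = [[] for _ in range(n)]
    for i, pkt in enumerate(packets):
        nodes[i % n].append(pkt)
    return nodes


def flow_key(pkt):
    return (pkt["time"], pkt["srcIP"], pkt["destIP"], pkt["srcPort"], pkt["destPort"])


def local_aggregate(node_packets):
    """Per-node GROUP BY on the flow key: (COUNT(*), SUM(len))."""
    groups = defaultdict(lambda: [0, 0])
    for pkt in node_packets:
        group = groups[flow_key(pkt)]
        group[0] += 1
        group[1] += pkt["len"]
    return {key: tuple(value) for key, value in groups.items()}


# One flow of 8 packets (made-up addresses and lengths).
packets = [
    dict(time=0, srcIP="10.0.0.1", destIP="10.0.0.2", srcPort=1234, destPort=80, len=100)
    for _ in range(8)
]
partials = [local_aggregate(node) for node in round_robin(packets, 4)]
nodes_with_flow = sum(1 for partial in partials if partial)
print(nodes_with_flow)  # 4: every node holds a partial (2, 200) for the same flow
```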

Page 17: Kristin Tufte, David Maier

And, we might want a HAVING…

Round robin partitioning -> no node can apply HAVING
CPU and network load on final aggregator is high

SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len),
       MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
HAVING OR_AGGR(flags) = ATTACK_PATTERN

Page 18: Kristin Tufte, David Maier

So, let’s partition better…

What about partitioning on: srcIP, destIP, srcPort, destPort (partition flows)?
    Yeah! Nodes can compute and apply HAVING locally …
But, what if I have more than one query?

SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len),
       MIN(timestamp), MAX(timestamp) ...
FROM TCP
GROUP BY time, srcIP, destIP, srcPort, destPort
HAVING OR_AGGR(flags) = ATTACK_PATTERN
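
A small sketch of why flow partitioning lets each node apply HAVING locally: hashing on (srcIP, destIP, srcPort, destPort) sends every packet of a flow to the same node, so the node sees the complete OR of the flags. The value of `ATTACK_PATTERN`, the packet fields, and the use of `zlib.crc32` as the hash are assumptions made for illustration only.

```python
import zlib
from collections import defaultdict

ATTACK_PATTERN = 0b011  # hypothetical flag pattern, not taken from the paper


def node_for(pkt, n):
    """Hash the flow identifier so all packets of a flow reach one node."""
    key = f'{pkt["srcIP"]}|{pkt["destIP"]}|{pkt["srcPort"]}|{pkt["destPort"]}'
    return zlib.crc32(key.encode()) % n


def partition(packets, n):
    nodes = [[] for _ in range(n)]
    for pkt in packets:
        nodes[node_for(pkt, n)].append(pkt)
    return nodes


def local_query(node_packets):
    """GROUP BY time and flow, then HAVING OR_AGGR(flags) = ATTACK_PATTERN."""
    groups = defaultdict(lambda: {"count": 0, "flags": 0})
    for pkt in node_packets:
        key = (pkt["time"], pkt["srcIP"], pkt["destIP"], pkt["srcPort"], pkt["destPort"])
        groups[key]["count"] += 1
        groups[key]["flags"] |= pkt["flags"]
    return {k: g["count"] for k, g in groups.items() if g["flags"] == ATTACK_PATTERN}


def mk(src_port, flags):
    return dict(time=0, srcIP="10.0.0.1", destIP="10.0.0.2",
                srcPort=src_port, destPort=80, flags=flags)


# Flow 1111 carries both flag bits across two packets; flow 2222 only one bit.
packets = [mk(1111, 0b001), mk(2222, 0b001), mk(1111, 0b010), mk(2222, 0b001)]
alerts = {}
for node in partition(packets, 4):
    alerts.update(local_query(node))  # plain union of the local results
print(alerts)
```

Under round robin, the two packets of flow 1111 could land on different nodes, and neither node alone would see the full pattern.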

Page 19: Kristin Tufte, David Maier

But I need to run lots of queries…

A large number of simultaneous queries is common (e.g., 50)
Subqueries place different requirements on partitioning
Dynamic repartitioning for each query?
    That’s what the parallel DBs do…
    Splitting 80 Gbit/sec -> specialized network hardware
Partition stream once and only once…

Page 20: Kristin Tufte, David Maier

Partitioning Limitations

Program partitioning in FPGAs
    TCP fields (src, dest IP) - ok
    Fields from HTTP - not ok
Can’t re-partition every time the workload changes

Page 21: Kristin Tufte, David Maier

Query-Aware Partitioning

Analysis framework
    Determine optimal partitioning
Partition-aware distributed query optimizer
    Takes advantage of existing partitions

Page 22: Kristin Tufte, David Maier

Query-Aware Partitioning

Analysis framework
    Determine optimal partitioning
Partition-aware distributed query optimizer
    Takes advantage of existing partitions
Compatible partitioning
    Maximizes amount of data reduction done locally
    Formal definition of compatible partitioning
    Compatible partitioning: aggregations & joins

Page 23: Kristin Tufte, David Maier

GS Uses Tumbling Windows (only)

SELECT tb, srcIP, destIP, sum(len)
FROM PKT
GROUP BY time/60 as tb, srcIP, destIP

SELECT time, PKT1.srcIP, PKT1.destIP, PKT1.len + PKT2.len
FROM PKT1 JOIN PKT2
WHERE PKT1.time = PKT2.time and PKT1.srcIP = PKT2.srcIP and PKT1.destIP = PKT2.destIP

Time attribute is ordered (increasing)

Page 24: Kristin Tufte, David Maier

Query Example

flows:
SELECT tb, srcIP, destIP, COUNT(*) as cnt
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP

heavy_flows:
SELECT tb, srcIP, max(cnt) as max_cnt
FROM flows
GROUP BY tb, srcIP

flow_pairs:
SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt
FROM heavy_flows S1, heavy_flows S2
WHERE S1.srcIP = S2.srcIP and S1.tb = S2.tb+1

Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson et al., SIGMOD 2008

Page 25: Kristin Tufte, David Maier

Query Example

flows:
SELECT tb, srcIP, destIP, COUNT(*) as cnt
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP

heavy_flows:
SELECT tb, srcIP, max(cnt) as max_cnt
FROM flows
GROUP BY tb, srcIP

flow_pairs:
SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt
FROM heavy_flows S1, heavy_flows S2
WHERE S1.srcIP = S2.srcIP and S1.tb = S2.tb+1

Which partitioning scheme is optimal for each of the queries?

Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson et al., SIGMOD 2008

Page 26: Kristin Tufte, David Maier

Query Example

flows:
SELECT tb, srcIP, destIP, COUNT(*) as cnt
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP

heavy_flows:
SELECT tb, srcIP, max(cnt) as max_cnt
FROM flows
GROUP BY tb, srcIP

flow_pairs:
SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt
FROM heavy_flows S1, heavy_flows S2
WHERE S1.srcIP = S2.srcIP and S1.tb = S2.tb+1

How to reconcile potentially conflicting partitioning requirements?

Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson et al., SIGMOD 2008

Page 27: Kristin Tufte, David Maier

Query Example

flows:
SELECT tb, srcIP, destIP, COUNT(*) as cnt
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP

heavy_flows:
SELECT tb, srcIP, max(cnt) as max_cnt
FROM flows
GROUP BY tb, srcIP

flow_pairs:
SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt
FROM heavy_flows S1, heavy_flows S2
WHERE S1.srcIP = S2.srcIP and S1.tb = S2.tb+1

How can we use information about existing partitioning in a distributed query optimizer?

Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson et al., SIGMOD 2008

Page 28: Kristin Tufte, David Maier

What if we could only partition on destIP?

Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson et al., SIGMOD 2008

Page 29: Kristin Tufte, David Maier

Partition compatibility

Partitioning on (time/60, srcIP, destIP) -> execute aggregation locally then union
(srcIP, destIP, srcPort, destPort) can’t aggregate locally

SELECT tb, srcIP, destIP, sum(len)
FROM PKT
GROUP BY time/60 as tb, srcIP, destIP

Page 30: Kristin Tufte, David Maier

Partition compatibility

Partitioning on (time/60, srcIP, destIP) -> execute aggregation locally then union
(srcIP, destIP, srcPort, destPort) can’t aggregate locally

P is compatible with Q if, for every time window, the output of Q is equal to a stream union of the output of Q running on partitions produced by P

SELECT tb, srcIP, destIP, sum(len)
FROM PKT
GROUP BY time/60 as tb, srcIP, destIP
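
The definition can be tested empirically on a sample input: run Q on the whole stream, run Q on each partition, and compare the full output with the union of the partition outputs. The sketch below assumes integer-valued attributes and a toy partitioning function (sum of the chosen attributes modulo n) so that the outcome is deterministic; a mismatch disproves compatibility, while a match on one input proves nothing in general.

```python
from collections import Counter, defaultdict


def query(packets):
    """SELECT tb, srcIP, destIP, sum(len) GROUP BY time/60 as tb, srcIP, destIP."""
    sums = defaultdict(int)
    for p in packets:
        sums[(p["time"] // 60, p["srcIP"], p["destIP"])] += p["len"]
    return [key + (total,) for key, total in sums.items()]


def partition(packets, attrs, n):
    """Toy partitioner (assumption): sum of the attribute values modulo n."""
    parts = [[] for _ in range(n)]
    for p in packets:
        parts[sum(p[a] for a in attrs) % n].append(p)
    return parts


def looks_compatible(packets, attrs, n):
    """True if Q(all packets) equals the union of Q over each partition."""
    whole = Counter(query(packets))
    union = Counter()
    for part in partition(packets, attrs, n):
        union.update(query(part))
    return whole == union


packets = [
    dict(time=5, srcIP=1, destIP=2, srcPort=1000, destPort=80, len=100),
    dict(time=30, srcIP=1, destIP=2, srcPort=1001, destPort=80, len=50),
    dict(time=70, srcIP=1, destIP=2, srcPort=1000, destPort=80, len=10),
]
ok_pair = looks_compatible(packets, ("srcIP", "destIP"), 2)
ok_flow = looks_compatible(packets, ("srcIP", "destIP", "srcPort", "destPort"), 2)
print(ok_pair, ok_flow)  # True False
```

Partitioning on the full flow identifier splits the (srcIP, destIP) group of the first minute across two nodes, so each node emits only a partial sum.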

Page 31: Kristin Tufte, David Maier

Should we partition on temporal attributes?

If we partition on temporal atts:
    Processor allocation changes with time epochs
    May help avoid bad hash fcns
    Might lead to incorrect results if using panes
    Tuples correlated in time tend to be correlated on temporal attribute: bad for load balancing
Exclude temporal attr from partitioning

Page 32: Kristin Tufte, David Maier

What partitionings work for aggregation queries?

Group-bys on scalar expressions of source input attr
    Ignore grouping on aggregations in lower-level queries
Any subset of a compatible partitioning is also compatible

SELECT expr1, expr2, ..., exprn
FROM STREAM_NAME
WHERE tup_predicate
GROUP BY temp_var, gb_var1, ..., gb_varm
HAVING group_predicate

Page 33: Kristin Tufte, David Maier

What partitionings work for join queries?

Equality predicates on scalar expressions of source stream attrs
Any non-empty subset of a compatible partitioning is also compatible
Need to reconcile partitioning of S and R

SELECT expr1, expr2, ..., exprn
FROM STREAM1 AS S {LEFT|RIGHT|FULL} [OUTER] JOIN STREAM2 AS R
WHERE STREAM1.ts = STREAM2.ts and STREAM1.var11 = STREAM2.var21 and STREAM1.var1k = STREAM2.var2k and other_predicates
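
The need to reconcile the partitioning of S and R can be seen in a toy example: when both streams are partitioned on the same subset of the equality attributes, local joins followed by a union reproduce the global join, whereas mismatched partitionings lose results. The integer-valued tuples, the inner join, and the modulo partitioner below are assumptions for illustration.

```python
def join(left, right):
    """Inner equi-join on (ts, srcIP, destIP); outputs the summed lengths."""
    index = {}
    for s in right:
        index.setdefault((s["ts"], s["srcIP"], s["destIP"]), []).append(s)
    return [
        (r["ts"], r["srcIP"], r["destIP"], r["len"] + s["len"])
        for r in left
        for s in index.get((r["ts"], r["srcIP"], r["destIP"]), [])
    ]


def partition(stream, attrs, n):
    """Toy partitioner (assumption): sum of the attribute values modulo n."""
    parts = [[] for _ in range(n)]
    for t in stream:
        parts[sum(t[a] for a in attrs) % n].append(t)
    return parts


def partitioned_join(left, right, left_attrs, right_attrs, n):
    """Join partition i of the left stream with partition i of the right, then union."""
    out = []
    for lp, rp in zip(partition(left, left_attrs, n), partition(right, right_attrs, n)):
        out.extend(join(lp, rp))
    return sorted(out)


R = [dict(ts=1, srcIP=1, destIP=5, len=10), dict(ts=1, srcIP=2, destIP=5, len=20)]
S = [dict(ts=1, srcIP=1, destIP=5, len=1), dict(ts=1, srcIP=2, destIP=5, len=2),
     dict(ts=2, srcIP=2, destIP=5, len=3)]

global_result = sorted(join(R, S))
same = partitioned_join(R, S, ("srcIP",), ("srcIP",), 2)         # reconciled
mismatched = partitioned_join(R, S, ("srcIP",), ("destIP",), 2)  # not reconciled
print(same == global_result, mismatched == global_result)  # True False
```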

Page 34: Kristin Tufte, David Maier

Now, multiple queries…

tcp_flows:
SELECT tb, srcIP, destIP, srcPort, destPort, COUNT(*), sum(len)
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP, srcPort, destPort

flow_cnt:
SELECT tb, srcIP, destIP, count(*)
FROM tcp_flows
GROUP BY tb, srcIP, destIP

tcp_flows: {sc_exp(srcIP), sc_exp(destIP), sc_exp(srcPort), sc_exp(destPort)}
flow_cnt: {sc_exp(srcIP), sc_exp(destIP)}
Result: {sc_exp(srcIP), sc_exp(destIP)}
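
Because any subset of a compatible partitioning is also compatible, the result for the two queries can be approximated as the intersection of their attribute sets. This is a deliberately simplified sketch: it treats each `sc_exp(attr)` as the bare attribute name and does not reconcile different scalar expressions over the same attribute, which the paper's analysis handles; the function name `reconcile` is hypothetical.

```python
def reconcile(*compatible_sets):
    """Largest attribute set contained in every query's compatible set."""
    result = set(compatible_sets[0])
    for attrs in compatible_sets[1:]:
        result &= set(attrs)
    return result  # an empty set means no fully compatible partitioning exists


tcp_flows = {"srcIP", "destIP", "srcPort", "destPort"}
flow_cnt = {"srcIP", "destIP"}
result = reconcile(tcp_flows, flow_cnt)
print(sorted(result))  # ['destIP', 'srcIP']
```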

Page 35: Kristin Tufte, David Maier

Now, multiple queries…

tcp_flows:
SELECT tb, srcIP, destIP, srcPort, destPort, COUNT(*), sum(len)
FROM TCP
GROUP BY time/60 as tb, srcIP, destIP, srcPort, destPort

flow_cnt:
SELECT tb, srcIP, destIP, count(*)
FROM tcp_flows
GROUP BY tb, srcIP, destIP

tcp_flows: {sc_exp(srcIP), sc_exp(destIP), sc_exp(srcPort), sc_exp(destPort)}
flow_cnt: {sc_exp(srcIP), sc_exp(destIP)}

Fully compatible partitioning set likely to be empty
Partition to minimize cost of execution

Page 36: Kristin Tufte, David Maier

Query Plan Transformation

Main idea: push aggregation operator below merge to allow aggregations to execute independently on partitions
Main idea: partial aggregates (think panes)

Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson et al., SIGMOD 2008
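
The partial-aggregate idea can be sketched as a sub-aggregate that runs on each partition and a super-aggregate that runs above the merge and combines partial results for the same group. The grouping key, the choice of COUNT, SUM, and MAX, and the function names are illustrative assumptions; the sketch works for any partitioning because these aggregates are decomposable.

```python
def sub_aggregate(packets):
    """Runs on each partition: partial (COUNT(*), SUM(len), MAX(len)) per group."""
    partial = {}
    for p in packets:
        key = (p["time"] // 60, p["srcIP"], p["destIP"])
        cnt, total, mx = partial.get(key, (0, 0, p["len"]))
        partial[key] = (cnt + 1, total + p["len"], max(mx, p["len"]))
    return partial


def super_aggregate(partials):
    """Runs above the merge: combine partial aggregates of the same group."""
    final = {}
    for partial in partials:
        for key, (cnt, total, mx) in partial.items():
            if key in final:
                c, t, m = final[key]
                final[key] = (c + cnt, t + total, max(m, mx))
            else:
                final[key] = (cnt, total, mx)
    return final


packets = [dict(time=0, srcIP=1, destIP=2, len=n) for n in (10, 20, 30, 40)]
partitions = [packets[0::2], packets[1::2]]  # round robin over two nodes
merged = super_aggregate(sub_aggregate(part) for part in partitions)
print(merged)  # {(0, 1, 2): (4, 100, 40)}
```

Counts and sums are added and maxima are compared, so the merged result matches what a single node would compute over the whole stream.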

Page 37: Kristin Tufte, David Maier

Performance

Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson et al., SIGMOD 2008