
Heartbeat Mechanism and its Applications in Gigascope

Vladislav Shkapenyuk (speaker), Muthu S. Muthukrishnan

Rutgers University

Theodore Johnson, Oliver Spatscheck

AT&T Labs – Research

Unblocking streaming operators

• Data stream management systems (DSMS) work with infinite streams of tuples

• How to get answers out of join, aggregation, etc., before the end of time?
– limit the scope of output tuples that an input tuple can affect

• Two views
– define a window over the input streams for the blocking operators (STREAM, TelegraphCQ)
– use a pipelined operator, make use of an existing sort order (Gigascope, Tribeca)
  • most queries make reference to timestamps

Unblocking streaming operators

• Some stream attributes are labeled with temporal properties (e.g., monotone increasing)

• In an aggregation query, one grouping attribute must have timestamp-ness (a temporal property):

SELECT tb, srcIP, count(*) FROM TCP GROUP BY time/60 as tb, srcIP

tb is inferred to be monotone increasing too

• Similarly, stream merge (union) and join also need a set of attributes that have temporal properties
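To make the role of the monotone tb attribute concrete, here is a minimal Python sketch of a pipelined aggregation for the query above. It is an illustration only, not Gigascope code: the function name `pipelined_count`, the input format (pairs of time and srcIP) and the use of integer division for time/60 are assumptions.

```python
from collections import defaultdict

def pipelined_count(packets, bucket_seconds=60):
    """Yield (tb, srcIP, count) rows; a bucket is emitted once tb advances.

    Assumes `packets` yields (time, srcIP) pairs with non-decreasing time,
    so tb = time // bucket_seconds is non-decreasing as well.
    """
    current_tb = None
    counts = defaultdict(int)
    for time, src_ip in packets:
        tb = time // bucket_seconds
        if current_tb is not None and tb > current_tb:
            # No later tuple can belong to current_tb, so it is safe to emit.
            for ip in sorted(counts):
                yield (current_tb, ip, counts[ip])
            counts.clear()
        current_tb = tb
        counts[src_ip] += 1
    # The last bucket stays open: on an infinite stream only a later
    # tuple (or a punctuation) can close it.

packets = [(5, "10.0.0.1"), (30, "10.0.0.2"), (59, "10.0.0.1"), (61, "10.0.0.3")]
rows = list(pipelined_count(packets))
```

The operator never waits for the end of the stream; it only needs to see one tuple from the next time bucket before it can release the previous one.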

What if a data stream stalls?

• Consider a query that merges multiple streams

• Presence of tuples carries temporal information, absence doesn't
– memory overflow at merge

• Similar issues with every operator with multiple input streams (e.g. joins)

[Diagram labels: backup and main streams, Low-level Aggregation, Stream Merge, High-level Aggregation]

Stream Punctuations

• Unblock operators by embedding special marks in the stream
– indicate the end of a subset of the data

• Stalled stream can notify the parent about the end of the epoch

Lots of issues
- How can these punctuations be generated and propagated?
- How do we integrate such a mechanism into a high-performance DSMS?

Gigascope Architecture

[Diagram labels: App, two high-level query nodes, three low-level query nodes, ring buffer, NIC]

• DSMS designed for monitoring high-rate data streams
– pure stream database (no stored relations or continuous queries)
– pipelined operators that rely on temporal properties of the stream

• Two-layer architecture for early data reduction
– fast lightweight data reduction queries (LFTA)
– high-level queries for expensive processing (HFTA)

Pipelined Operators

• Aggregation:

• Merge operator performs a union of two streams R and S in a way that preserves timestamps:

MERGE R.tb : S.tb
FROM Inpackets R, Outpackets S

• A join query on streams R and S must contain a join predicate such as R.tb = S.tb:

SELECT R.sourceIP, R.tb, R.length_sum + S.length_sum
OUTER_JOIN FROM Inpackets R, Outpackets S
WHERE R.sourceIP = S.destIP and R.tb = S.tb

Gigascope heartbeats

• Initially designed to collect statistics about operator load

[Diagram labels: Low-level operators, High-level operators]

• Special messages propagated using regular tuple routing mechanism

- performance monitoring

- failure detection

Unblocking operators using heartbeats

• Stream punctuation mechanism
– injects special temporal update tuples into operator’s output stream
– notifies the operator about the end of a subset of the data (end of the time window on which aggregations, stream merges and joins operate)

• Heartbeats are the perfect vehicles for carrying the temporal update tuples
– regular propagation through operator DAG
– unblocks all operators on its way in a timely manner

Temporal update tuples

• Temporal update tuples generated by an operator have a schema identical to a regular tuple
– only values of temporal attributes are initialized (the rest are ignored)
– future tuples are guaranteed not to violate temporal properties of the stream

Operator output schema: (Timebucket, SrcIP, DestIP, PacketCount)

Timebucket is monotone increasing

Temporal tuple: (T, Uninitialized, Uninitialized, Uninitialized)

– guarantees that all future tuples will have value of Timebucket >= T
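As a rough illustration of this slide, the sketch below models a temporal update tuple for the schema (Timebucket, SrcIP, DestIP, PacketCount) in Python. The class `OutputTuple`, the flag `is_temporal_update`, the helper names, and the use of `None` for uninitialized fields are all assumptions made for the example, not Gigascope's actual representation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class OutputTuple:
    timebucket: int                      # the temporal attribute
    src_ip: Optional[str] = None         # None stands for "uninitialized"
    dest_ip: Optional[str] = None
    packet_count: Optional[int] = None
    is_temporal_update: bool = False     # hypothetical marker field

def temporal_update(t: int) -> OutputTuple:
    """Build a temporal update tuple: only Timebucket is initialized."""
    return OutputTuple(timebucket=t, is_temporal_update=True)

def respects_guarantee(update: OutputTuple, later: OutputTuple) -> bool:
    """True if a later regular tuple honours Timebucket >= T."""
    return later.timebucket >= update.timebucket

hb = temporal_update(120)
ok = respects_guarantee(hb, OutputTuple(121, "10.0.0.1", "10.0.0.2", 7))
violates = not respects_guarantee(hb, OutputTuple(119, "10.0.0.1", "10.0.0.2", 3))
```

Because the temporal tuple shares the regular schema, it can travel through the same routing path as ordinary tuples; consumers only read its temporal fields.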

Heartbeat generation

• Naïve solution
– operators emit last produced tuple cast as a temporal tuple
– too conservative to be useful: heartbeats don’t carry any additional information

• Goal: aggressively generate the values of temporal attributes
– set attributes to maximum values we can safely guarantee

Heartbeat generation

• Two approaches
– infer the values of temporal update tuples based on tuples the operator received so far
– infer based on system time

• Inference based on received tuples
– works when operators observe some tuples but they might be filtered out by selection predicates
– works on every level of query execution

• Inference based on system clock
– works even with completely stalled streams
– only for time-based temporal attributes
– potentially dangerous
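The first approach (inference from received tuples) can be sketched for a filtering operator as follows. This is a hedged illustration in Python, not the Gigascope implementation: `SelectionOperator`, the dictionary tuple format and the attribute names `time` and `port` are invented for the example, and input timestamps are assumed to be non-decreasing.

```python
class SelectionOperator:
    """Filter that remembers the last seen timestamp even for dropped tuples."""

    def __init__(self, predicate):
        self.predicate = predicate
        self.last_time = None  # last seen value of the temporal attribute

    def process(self, tup):
        """Return the tuple if it passes the predicate, else None."""
        # Record the timestamp before filtering, so that a run of
        # rejected tuples still advances the operator's notion of time.
        self.last_time = tup["time"]
        return tup if self.predicate(tup) else None

    def heartbeat(self):
        """Temporal update tuple, or None if no tuple was seen yet."""
        if self.last_time is None:
            return None
        return {"time": self.last_time}

op = SelectionOperator(lambda t: t["port"] == 80)
out = [op.process({"time": t, "port": p}) for t, p in [(10, 80), (20, 22), (30, 443)]]
passed = [t for t in out if t is not None]
hb = op.heartbeat()
```

Only the tuple at time 10 passes the filter, yet the heartbeat reports time 30, which is more informative than re-sending the last produced tuple.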

Inferring temporal attributes

• Every operator maintains state required to correctly generate temporal update tuples
– last seen values of all temporal attributes referenced in select clause
– operator-specific state

• Attribute values for temporal tuples are computed using inference rules


If last seen value of time is X, infer that the value of tb for temporal update tuple should be X/60
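A minimal sketch of such an inference rule in Python, assuming the select clause contains `time/60 as tb` and that the division is integer division. The rule table `INFERENCE_RULES` and the function `infer_temporal_update` are hypothetical names used only for illustration.

```python
# Hypothetical rule table: output temporal attribute -> function of the
# last seen input values. The rule mirrors "time/60 as tb"; integer
# division is monotone non-decreasing, so the inferred bound is safe.
INFERENCE_RULES = {
    "tb": lambda last_seen: last_seen["time"] // 60,
}

def infer_temporal_update(last_seen, rules=INFERENCE_RULES):
    """Compute the temporal attributes of a temporal update tuple."""
    return {attr: rule(last_seen) for attr, rule in rules.items()}

update = infer_temporal_update({"time": 3725})
```

With a last seen time of 3725, the temporal update tuple carries tb = 62, because 3725 // 60 = 62.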

Inferring temporal attributes

• What if the stream is completely stalled?
– cannot advance values of temporal attributes

• Inference based on system time
– works if the temporal attribute can be correlated with system clock (usually the case in network streams)
– unsafe for high-level operators (need to reason about propagation delays)
– need to be careful about clock skew

• Gigascope uses skew information entered by the admin to infer the values of temporal attributes
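One way to picture clock-based inference is the sketch below. It is an assumption-laden illustration rather than Gigascope's actual rule: the function names, the `max_skew` parameter (standing in for the admin-entered skew) and the `max_delay` parameter (a bound on propagation delay) are all hypothetical.

```python
def infer_from_clock(now, max_skew, max_delay=0.0):
    """Lower bound on the timestamps of tuples that may still arrive.

    now       -- local system time in seconds
    max_skew  -- assumed bound on |source clock - local clock|
    max_delay -- assumed bound on how long a tuple can be in flight
    """
    return now - max_skew - max_delay

def heartbeat_time(last_seen, now, max_skew, max_delay=0.0):
    """Temporal attribute value for a heartbeat on a possibly stalled stream."""
    bound = infer_from_clock(now, max_skew, max_delay)
    # Never report less than what was already observed on the stream.
    return bound if last_seen is None else max(last_seen, bound)

stalled = heartbeat_time(last_seen=None, now=1000.0, max_skew=2.0, max_delay=0.5)
active = heartbeat_time(last_seen=999.0, now=1000.0, max_skew=2.0, max_delay=0.5)
```

If either bound is underestimated, a late tuple can violate the guarantee carried by an earlier heartbeat, which is why the slide calls this approach potentially dangerous.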

Selection & merge operators

• Selection operator (filtering):
– save the last seen values of temporal attributes regardless of whether the tuple passes the selection predicate

• Merge (stream union):
– combines multiple streams while preserving ordering properties
  • requires buffering of input streams
– maintains the maximum timestamp value observed on every input
  • S1_max, S2_max, … Sn_max
– uses MIN(S1_max, S2_max, … Sn_max) to generate the temporal update tuple
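The merge logic above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Gigascope code: the class `MergeOperator`, its method names and the heap-based buffer are invented, and each input is assumed to deliver non-decreasing timestamps.

```python
import heapq

class MergeOperator:
    """Order-preserving union of n inputs, unblocked by heartbeats."""

    def __init__(self, n_inputs):
        self.max_seen = [None] * n_inputs  # S1_max ... Sn_max
        self.buffer = []                   # min-heap of (ts, seq, payload)
        self._seq = 0                      # tie-breaker for equal timestamps

    def _advance(self, i, ts):
        cur = self.max_seen[i]
        self.max_seen[i] = ts if cur is None else max(cur, ts)

    def heartbeat(self):
        """MIN(S1_max, ..., Sn_max), or None until every input reported."""
        if any(m is None for m in self.max_seen):
            return None
        return min(self.max_seen)

    def _release(self):
        bound = self.heartbeat()
        out = []
        while bound is not None and self.buffer and self.buffer[0][0] <= bound:
            ts, _, payload = heapq.heappop(self.buffer)
            out.append((ts, payload))
        return out

    def on_tuple(self, i, ts, payload):
        self._advance(i, ts)
        heapq.heappush(self.buffer, (ts, self._seq, payload))
        self._seq += 1
        return self._release()

    def on_heartbeat(self, i, ts):
        # A temporal update tuple advances time without adding data.
        self._advance(i, ts)
        return self._release()

m = MergeOperator(2)
a = m.on_tuple(0, 10, "main-a")
b = m.on_tuple(0, 20, "main-b")
buffered_while_stalled = len(m.buffer)   # input 1 is silent so far
released = m.on_heartbeat(1, 15)
```

While input 1 is silent, both tuples from input 0 stay buffered; the heartbeat with timestamp 15 lets the operator release the tuple at time 10 and emit its own temporal update with value 15.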

Aggregation & sampling operator

• Maintains a hash table of aggregates for the current time window
– when the time window advances, the table content is flushed
– uses traffic shaping (slow flush) to avoid flushing excessive amounts of data

• Slow flush can lead to incorrect generation of temporal tuples
– if there are unflushed tuples in the hash table, generate temporal tuples based on the unflushed tuples
– otherwise use the last seen values saved by the operator
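The interaction between slow flush and heartbeat generation might look like the sketch below. It is an illustrative Python model with invented names (`AggregationOperator`, `flush_batch`, a separate `pending` queue for closed-window rows); the real operator's data structures are not described in the slides.

```python
from collections import deque

class AggregationOperator:
    """Count per group and time bucket, flushing closed buckets gradually."""

    def __init__(self, bucket_seconds=10, flush_batch=1):
        self.bucket_seconds = bucket_seconds
        self.flush_batch = flush_batch   # rows emitted per flush step
        self.current_tb = None
        self.table = {}                  # group key -> count, current window
        self.pending = deque()           # closed-window rows not yet emitted
        self.last_time = None

    def on_tuple(self, time, key):
        self.last_time = time
        tb = time // self.bucket_seconds
        if self.current_tb is not None and tb > self.current_tb:
            # Window closed: queue its rows instead of emitting all at once.
            for k in sorted(self.table):
                self.pending.append((self.current_tb, k, self.table[k]))
            self.table = {}
        self.current_tb = tb
        self.table[key] = self.table.get(key, 0) + 1
        return self.slow_flush()

    def slow_flush(self):
        n = min(self.flush_batch, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]

    def heartbeat(self):
        """Time bucket that future output tuples are guaranteed to reach."""
        if self.pending:
            return self.pending[0][0]    # oldest unflushed bucket
        if self.last_time is None:
            return None
        return self.last_time // self.bucket_seconds

agg = AggregationOperator(bucket_seconds=10, flush_batch=1)
agg.on_tuple(1, "a")
agg.on_tuple(2, "b")
first = agg.on_tuple(25, "a")
hb_during_flush = agg.heartbeat()
rest = agg.slow_flush()
hb_after_flush = agg.heartbeat()
```

While a row of bucket 0 is still waiting to be flushed, the heartbeat must stay at 0; announcing bucket 2 at that point would be contradicted by the row emitted next.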

Join operators

• Stream join between R and S relates a timestamp from R to a timestamp in S (e.g. R.ts = S.ts)
– critical for guaranteeing bounded memory
– supports inner and left, right, and full outer equi-joins

• Maintains maximum values of timestamps observed on each stream (Rmax and Smax)
– Rmax and Smax can be composite structures storing max values of all attributes that are part of the timestamp

• Infers the values of attributes of temporal update tuples based on MIN(Rmax, Smax)
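A small Python sketch of the MIN(Rmax, Smax) rule follows. Several points are assumptions made for illustration: composite timestamps are modelled as tuples compared lexicographically, the helper names are invented, and the purge step assumes that matches against already seen tuples were emitted when the tuples arrived.

```python
def join_heartbeat(r_max, s_max):
    """MIN(Rmax, Smax); None until both inputs have reported a timestamp."""
    if r_max is None or s_max is None:
        return None
    return min(r_max, s_max)

def purge_older_than(buffer, other_side_max):
    """Keep only buffered tuples that could still match the other input.

    With the predicate R.ts = S.ts and non-decreasing timestamps, a
    buffered tuple whose timestamp is below the other input's maximum
    can never match a future tuple from that input.
    """
    return [(ts, row) for ts, row in buffer if ts >= other_side_max]

# Hypothetical composite timestamps: (time bucket, sequence number).
r_max = (12, 3)
s_max = (12, 1)
hb = join_heartbeat(r_max, s_max)

r_buffer = [((11, 9), "r1"), ((12, 0), "r2"), ((12, 3), "r3")]
kept = purge_older_than(r_buffer, s_max)
```

The same bound that feeds the temporal update tuple also drives buffer cleanup, which is how the timestamp predicate keeps join memory bounded.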

Experimental Evaluation

• Two main data feeds
– DAG4.3GE Gigabit Ethernet interfaces
– 100,000 packets/sec (about 400Mbit/sec)

• One low-rate control data feed
– 100Mbit interface
– good representative of a backup interface

• Dual 2.8 GHz P4 server w/ 4 GB of RAM, FreeBSD 4.8

Merge Query

SELECT tb, protocol, srcIP, destIP, srcPort, destPort, count(*)

FROM DataProtocol

GROUP BY time/10 as tb, protocol, srcIP, destIP, srcPort, destPort


[Diagram labels: control, main1 and main2 feeds; two Stream Merge operators]

Performance Evaluation

[Figure: Query memory usage. x-axis: heartbeat interval (sec), 0 to 35; y-axis: memory usage (MB), 0 to 500]

Outer Join Query

Query flow1:
SELECT tb, protocol, srcIP, destIP, srcPort, destPort, count(*) as cnt
FROM [main0_and_control].DataProtocol
GROUP BY time/10 as tb, protocol, srcIP, destIP, srcPort, destPort;

Query flow2:
SELECT tb, protocol, srcIP, destIP, srcPort, destPort, count(*) as cnt
FROM main1.DataProtocol
GROUP BY time/10 as tb, protocol, srcIP, destIP, srcPort, destPort;

Query full_flow:
SELECT flow1.tb, flow1.protocol, flow1.srcIP, flow1.destIP, flow1.srcPort, flow1.destPort, flow1.cnt, flow2.cnt
OUTER_JOIN FROM flow1, flow2
WHERE flow1.srcIP = flow2.srcIP and flow1.destIP = flow2.destIP and flow1.srcPort = flow2.srcPort and flow1.destPort = flow2.destPort and flow1.protocol = flow2.protocol and flow1.tb = flow2.tb

Outer Join Query


[Diagram labels: backup, main1 and main2 feeds; Stream Merge; Outer Join]



Performance Evaluation

[Figure: Query memory usage. x-axis: heartbeat interval (sec), 0 to 70; y-axis: memory usage (MB), 0 to 600]

CPU load w/ heartbeats enabled – 37.5%

CPU load w/ heartbeats disabled – 37.3%

Other heartbeat applications

• Fault tolerance
– heartbeats regularly propagate through query DAGs
– easy detection of failed nodes

• System performance analysis
– every heartbeat message is timestamped by the receiving node
– timestamp traces are perfect for analyzing queuing delays

• Distributed query optimization
– every heartbeat message carries runtime statistics (operator selectivities, sampling rates, in/out rates, memory footprint, etc.)
– collected statistics can be fed to a distributed query optimizer
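The first two applications can be illustrated with a short Python sketch. Everything here is hypothetical: the function names, the node names, the "three missed heartbeats" threshold and the trace format (node, receive time) are assumptions, and the delay computation ignores clock skew between nodes.

```python
def failed_nodes(last_heartbeat, now, interval, missed=3):
    """Nodes whose last heartbeat is older than `missed` intervals."""
    deadline = now - missed * interval
    return sorted(n for n, t in last_heartbeat.items() if t < deadline)

def hop_delays(trace):
    """Per-hop delay from one heartbeat's (node, receive_time) trace."""
    return [(b[0], b[1] - a[1]) for a, b in zip(trace, trace[1:])]

# Last time a heartbeat was received from each (hypothetical) node.
last_heartbeat = {"lfta1": 100.0, "lfta2": 91.0, "hfta1": 99.5}
down = failed_nodes(last_heartbeat, now=101.0, interval=1.0, missed=3)

# Receive timestamps stamped on one heartbeat as it moves through the DAG.
delays = hop_delays([("lfta1", 100.0), ("merge", 100.25), ("hfta1", 101.0)])
```

Because heartbeats arrive at a regular rate even on idle streams, a missing heartbeat separates a failed node from a stream that merely has no data.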

Conclusions

• Punctuation-carrying heartbeats
– effective at unblocking streaming operators on all levels
– significantly reduce query memory utilization
– capable of working at multiple Gigabit line speeds

• Variety of other uses
– fault tolerance, performance analysis, distributed query optimization

• Part of production version of Gigascope
