Heartbeat Mechanism and its Applications in Gigascope
Vladislav Shkapenyuk (speaker), Muthu S. Muthukrishnan
Rutgers University
Theodore Johnson, Oliver Spatscheck
AT&T Labs – Research
Unblocking streaming operators
• Data stream management systems (DSMS) work with infinite streams of tuples
• How do we get answers out of joins, aggregations, etc., before the end of time?
– limit the scope of output tuples that an input tuple can affect
• Two views
– define a window over the input streams for the blocking operators (STREAM, TelegraphCQ)
– use a pipelined operator that makes use of an existing sort order (Gigascope, Tribeca)
  • most queries make reference to timestamps
Unblocking streaming operators
• Some stream attributes are labeled with temporal properties (e.g., monotone increasing)
• In an aggregation query, one grouping attribute must have a temporal property:
SELECT tb, srcIP, count(*) FROM TCP GROUP BY time/60 as tb, srcIP
– tb is inferred to be monotone increasing too
• Similarly, stream merge (union) and join also need a set of attributes that have temporal properties
What if a data stream stalls?
• Consider a query that merges multiple streams
• Presence of tuples carries temporal information, absence doesn't
– tuples buffered at the merge can overflow memory
• Similar issues arise with every operator that has multiple input streams (e.g., joins)
[Figure: query plan with nodes Low-level Aggregation (backup), Low-level Aggregation (main), Stream Merge, High-level Aggregation]
Stream Punctuations
• Unblock operators by embedding special marks in the stream
– indicate the end of a subset of the data
• A stalled stream can notify the parent about the end of the epoch
Lots of issues
– How can these punctuations be generated and propagated?
– How do we integrate such a mechanism into a high-performance DSMS?
Gigascope Architecture
[Figure: Gigascope architecture, with labels App, high (x2), low (x3), NIC, ring buffer]
• DSMS designed for monitoring high-rate data streams
– pure stream database (no stored relations or continuous queries)
– pipelined operators that rely on temporal properties of the stream
• Two-layer architecture for early data reduction
– fast, lightweight data reduction queries (LFTA)
– high-level queries for expensive processing (HFTA)
Pipelined Operators
• Aggregation:
SELECT tb, srcIP, count(*) FROM TCP GROUP BY time/60 as tb, srcIP
• The merge operator performs a union of two streams R and S in a way that preserves timestamps:
MERGE R.tb : S.tb
FROM Inpackets R, Outpackets S
• A join query on streams R and S must contain a join predicate such as R.tb = S.tb:
SELECT R.sourceIP, R.tb, R.length_sum + S.length_sum
OUTER_JOIN FROM Inpackets R, Outpackets S
WHERE R.sourceIP = S.destIP and R.tb = S.tb
Gigascope heartbeats
• Initially designed to collect statistics about operator load
• Special messages propagated using the regular tuple routing mechanism
– performance monitoring
– failure detection
[Figure labels: Low-level operators, High-level operators]
Unblocking operators using heartbeats
• Stream punctuation mechanism
– injects special temporal update tuples into the operator’s output stream
– notifies the operator about the end of a subset of the data (the end of the time window on which aggregations, stream merges, and joins operate)
• Heartbeats are the perfect vehicles for carrying the temporal update tuples
– regular propagation through the operator DAG
– unblock all operators on their way in a timely manner
Temporal update tuples
• Temporal update tuples generated by an operator have a schema identical to that of a regular tuple
– only the values of temporal attributes are initialized (the rest are ignored)
– future tuples are guaranteed not to violate the temporal properties of the stream
Operator output schema: (Timebucket, SrcIP, DestIP, PacketCount)
Timebucket is monotone increasing
Temporal tuple: (T, Uninitialized, Uninitialized, Uninitialized)
– guarantees that all future tuples will have a value of Timebucket >= T
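A minimal sketch of how such a temporal update tuple could be modeled, assuming a Python dataclass for the output schema above. The names FlowTuple, is_temporal, make_temporal_tuple, and violates_guarantee are hypothetical and are not Gigascope's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowTuple:
    """Hypothetical model of the schema (Timebucket, SrcIP, DestIP, PacketCount)."""
    timebucket: int
    src_ip: Optional[str] = None
    dest_ip: Optional[str] = None
    packet_count: Optional[int] = None
    is_temporal: bool = False  # assumed flag marking a temporal update tuple


def make_temporal_tuple(t: int) -> FlowTuple:
    """Build a temporal update tuple: only the temporal attribute is initialized."""
    return FlowTuple(timebucket=t, is_temporal=True)


def violates_guarantee(punctuation: FlowTuple, later: FlowTuple) -> bool:
    """True if a later tuple breaks the promise Timebucket >= T made by the punctuation."""
    return later.timebucket < punctuation.timebucket
```

A receiver can treat make_temporal_tuple(T) as a promise and close every window with Timebucket < T.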
Heartbeat generation
• Naïve solution
– operators emit the last produced tuple cast as a temporal tuple
– too conservative to be useful: heartbeats don’t carry any additional information
• Goal: aggressively generate the values of temporal attributes
– set attributes to the maximum values we can safely guarantee
Heartbeat generation
• Two approaches
– infer the values of temporal update tuples based on the tuples the operator has received so far
– infer based on system time
• Inference based on received tuples
– works when operators observe some tuples, even if they are filtered out by selection predicates
– works on every level of query execution
• Inference based on system clock
– works even with completely stalled streams
– only for time-based temporal attributes
– potentially dangerous
Inferring temporal attributes
• Every operator maintains the state required to correctly generate temporal update tuples
– last seen values of all temporal attributes referenced in the select clause
– operator-specific state
• Attribute values for temporal tuples are computed using inference rules
SELECT tb, srcIP, count(*) FROM TCP GROUP BY time/60 as tb, srcIP
If the last seen value of time is X, infer that the value of tb for the temporal update tuple should be X/60
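The inference rule above can be sketched as follows, assuming time/60 is integer division and that the operator keeps only the last seen value of time. The class AggregationHeartbeatState and its method names are hypothetical.

```python
class AggregationHeartbeatState:
    """Hypothetical per-operator state for GROUP BY time/60 as tb."""

    def __init__(self, bucket_seconds: int = 60) -> None:
        self.bucket_seconds = bucket_seconds
        self.last_seen_time = None  # last value of `time` observed on the input

    def observe(self, time_value: int) -> None:
        # time is monotone increasing, so the newest value is also the largest.
        self.last_seen_time = time_value

    def infer_tb(self):
        """If the last seen time is X, the temporal update tuple carries tb = X // bucket."""
        if self.last_seen_time is None:
            return None  # nothing observed yet, so no bound can be given
        return self.last_seen_time // self.bucket_seconds
```

observe() is called for every input tuple, including tuples that later fail a selection predicate, so the bound advances even when no output is produced.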
Inferring temporal attributes
• What if the stream is completely stalled?
– cannot advance values of temporal attributes
• Inference based on system time
– works if the temporal attribute can be correlated with the system clock (usually the case in network streams)
– unsafe for high-level operators (need to reason about propagation delays)
– need to be careful about clock skew
• Gigascope uses skew information entered by the admin to infer the values of temporal attributes
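A rough sketch of clock-based inference for a low-level operator, assuming stream timestamps are in seconds on the same time base as the system clock and that the admin supplies an upper bound on skew. The function infer_time_from_clock and its parameters are hypothetical.

```python
import time


def infer_time_from_clock(last_seen, max_skew_seconds, now=None):
    """Return a lower bound on the timestamps of all future tuples.

    last_seen: last timestamp observed on the stream, or None if stalled from the start.
    max_skew_seconds: admin-entered bound on how far stream time may lag the system clock.
    now: system time in seconds; defaults to time.time() (parameter exists for testing).
    """
    if now is None:
        now = time.time()
    clock_bound = int(now) - max_skew_seconds
    if last_seen is None:
        return clock_bound
    # Never move the bound backwards past a timestamp already observed.
    return max(last_seen, clock_bound)
```

Subtracting the skew keeps the bound conservative. If the configured skew is smaller than the real one, the guarantee can be violated, which is why this approach is potentially dangerous.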
Selection & merge operators
• Selection operator (filtering):
– saves the last seen values of temporal attributes regardless of whether the tuple passes the selection predicate
• Merge (stream union):
– combines multiple streams while preserving ordering properties
– requires buffering of input streams
– maintains the maximum timestamp values observed on every input: S1_max, S2_max, …, Sn_max
– uses MIN(S1_max, S2_max, …, Sn_max) to generate the temporal update tuple
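A minimal sketch of a merge that is unblocked by heartbeats, assuming each input delivers timestamps in non-decreasing order. The class MergeOperator and its methods are hypothetical and only illustrate the MIN-of-maxima rule.

```python
import heapq


class MergeOperator:
    """Hypothetical timestamp-preserving merge over num_inputs streams."""

    def __init__(self, num_inputs):
        self.max_seen = [None] * num_inputs  # S1_max .. Sn_max
        self.buffer = []                     # min-heap of (timestamp, seq, payload)
        self._seq = 0                        # tie-breaker so payloads are never compared

    def _bound(self):
        # MIN(S1_max, ..., Sn_max); undefined until every input has reported a value.
        if any(m is None for m in self.max_seen):
            return None
        return min(self.max_seen)

    def _drain(self):
        # Buffered tuples at or below the bound can be emitted in timestamp order.
        bound = self._bound()
        out = []
        while bound is not None and self.buffer and self.buffer[0][0] <= bound:
            ts, _, payload = heapq.heappop(self.buffer)
            out.append((ts, payload))
        return out

    def on_tuple(self, input_id, ts, payload):
        self.max_seen[input_id] = ts  # inputs are assumed monotone
        heapq.heappush(self.buffer, (ts, self._seq, payload))
        self._seq += 1
        return self._drain()

    def on_heartbeat(self, input_id, ts):
        # A temporal update tuple advances the input's maximum without adding data.
        if self.max_seen[input_id] is None or ts > self.max_seen[input_id]:
            self.max_seen[input_id] = ts
        return self._drain()

    def temporal_update(self):
        """Timestamp to place in the merge operator's own temporal update tuple."""
        return self._bound()
```

Without on_heartbeat, tuples from the active input would pile up in the buffer for as long as the other input stays silent.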
Aggregation & sampling operator
• Maintains a hash table of aggregates for the current time window
– when the time window advances, the table content is flushed
– uses traffic shaping (slow flush) to avoid flushing excessive amounts of data
• Slow flush can lead to incorrect generation of temporal tuples
– if there are unflushed tuples in the hash table, generate temporal tuples based on the unflushed tuples
– otherwise use the last seen values saved by the operator
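A simplified sketch of the slow-flush rule, assuming closed-window results wait in a queue and are emitted a few at a time. The class SlowFlushAggregator, the separate pending queue, and the flush_batch parameter are hypothetical simplifications.

```python
from collections import deque


class SlowFlushAggregator:
    """Hypothetical count(*) aggregation with GROUP BY time/bucket as tb, key."""

    def __init__(self, bucket_seconds=60, flush_batch=2):
        self.bucket_seconds = bucket_seconds
        self.flush_batch = flush_batch
        self.current_tb = None
        self.table = {}          # group key -> count for the current window
        self.pending = deque()   # closed-window results not yet emitted: (tb, key, count)
        self.last_seen_time = None

    def on_tuple(self, time_value, key):
        self.last_seen_time = time_value
        tb = time_value // self.bucket_seconds
        if self.current_tb is not None and tb > self.current_tb:
            # Window advanced: queue the old window's results for slow flushing.
            for k, count in self.table.items():
                self.pending.append((self.current_tb, k, count))
            self.table = {}
        self.current_tb = tb
        self.table[key] = self.table.get(key, 0) + 1

    def flush_some(self):
        """Emit at most flush_batch queued results (traffic shaping)."""
        out = []
        while self.pending and len(out) < self.flush_batch:
            out.append(self.pending.popleft())
        return out

    def temporal_tb(self):
        """Value of tb that is safe to place in a temporal update tuple."""
        if self.pending:
            # Unflushed results will still be emitted with this older tb.
            return self.pending[0][0]
        if self.last_seen_time is None:
            return None
        return self.last_seen_time // self.bucket_seconds
```

Using the last seen value while results are still queued would promise a tb larger than that of tuples yet to be emitted, which is the incorrect case the slide warns about.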
Join operators
• A stream join between R and S relates a timestamp from R to a timestamp in S (e.g., R.ts = S.ts)
– critical for guaranteeing bounded memory
– supports inner, right, and full outer equi-joins
• Maintains the maximum values of timestamps observed on each stream (Rmax and Smax)
– Rmax and Smax can be composite structures storing the max values of all attributes that are part of the timestamp
• Infers the values of attributes of temporal update tuples based on MIN(Rmax, Smax)
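A small sketch of the MIN(Rmax, Smax) rule, assuming composite timestamps are tracked attribute by attribute and the minimum is taken per attribute. The class JoinTemporalState and the per-attribute interpretation are assumptions made for illustration.

```python
class JoinTemporalState:
    """Hypothetical temporal state for a stream join between R and S."""

    def __init__(self, temporal_attrs):
        self.temporal_attrs = tuple(temporal_attrs)
        self.r_max = {}  # attribute name -> max value seen on R
        self.s_max = {}  # attribute name -> max value seen on S

    @staticmethod
    def _update(maxes, values):
        for attr, value in values.items():
            if attr not in maxes or value > maxes[attr]:
                maxes[attr] = value

    def observe_r(self, values):
        self._update(self.r_max, values)

    def observe_s(self, values):
        self._update(self.s_max, values)

    def temporal_bound(self):
        """Per-attribute MIN(Rmax, Smax), or None until both inputs have reported."""
        bound = {}
        for attr in self.temporal_attrs:
            if attr not in self.r_max or attr not in self.s_max:
                return None
            bound[attr] = min(self.r_max[attr], self.s_max[attr])
        return bound
```

Tuples or heartbeats from either side feed observe_r and observe_s, so a heartbeat on a stalled input is enough to advance the bound and release buffered join state.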
Experimental Evaluation
• Two main data feeds
– DAG4.3GE Gigabit Ethernet interfaces
– 100,000 packets/sec (about 400 Mbit/sec)
• One low-rate control data feed
– 100 Mbit interface
– good representative of a backup interface
• Dual 2.8 GHz P4 server with 4 GB of RAM, FreeBSD 4.8
Merge Query
SELECT tb, protocol, srcIP, destIP, srcPort, destPort, count(*)
FROM DataProtocol
GROUP BY time/10 as tb, protocol, srcIP, destIP, srcPort, destPort
[Figure: query plan with nodes Low-level Aggregation (control), Low-level Aggregation (main1), Low-level Aggregation (main2), two Stream Merge nodes, High-level Aggregation]
Performance Evaluation
[Figure: query memory usage (MB, axis 0 to 500) versus heartbeat interval (sec, axis 0 to 35)]
Outer Join Query
Query flow1:
SELECT tb, protocol, srcIP, destIP, srcPort, destPort, count(*) as cnt
FROM [main0_and_control].DataProtocol
GROUP BY time/10 as tb, protocol, srcIP, destIP, srcPort, destPort;
Query flow2:
SELECT tb, protocol, srcIP, destIP, srcPort, destPort, count(*) as cnt
FROM main1.DataProtocol
GROUP BY time/10 as tb, protocol, srcIP, destIP, srcPort, destPort;
Query full_flow:
SELECT flow1.tb, flow1.protocol, flow1.srcIP, flow1.destIP, flow1.srcPort, flow1.destPort, flow1.cnt, flow2.cnt
OUTER_JOIN FROM flow1, flow2
WHERE flow1.srcIP = flow2.srcIP and flow1.destIP = flow2.destIP and flow1.srcPort = flow2.srcPort and flow1.destPort = flow2.destPort and flow1.protocol = flow2.protocol and flow1.tb = flow2.tb
Outer Join Query
[Figure: query plan with nodes Low-level Aggregation (backup), Low-level Aggregation (main1), Low-level Aggregation (main2), Stream Merge, two High-level Aggregation nodes, Outer Join]
Performance Evaluation
[Figure: query memory usage (MB, axis 0 to 600) versus heartbeat interval (sec, axis 0 to 70)]
CPU load with heartbeats enabled: 37.5%
CPU load with heartbeats disabled: 37.3%
Other heartbeat applications
• Fault tolerance
– heartbeats regularly propagate through query DAGs
– easy detection of failed nodes
• System performance analysis
– every heartbeat message is timestamped by the receiving node
– timestamp traces are perfect for analyzing queuing delays
• Distributed query optimization
– every heartbeat message carries runtime statistics (operator selectivities, sampling rates, in/out rates, memory footprint, etc.)
– collected statistics can be fed to a distributed query optimizer
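As an illustration of the fault-tolerance use, a simple timeout-based detector could look like the sketch below. The class HeartbeatMonitor, the node names, and the missed_allowed threshold are hypothetical. The slides do not describe how Gigascope detects failed nodes beyond the regular propagation of heartbeats.

```python
class HeartbeatMonitor:
    """Hypothetical detector: a node is suspected once its heartbeats stop arriving."""

    def __init__(self, interval_seconds, missed_allowed=3):
        # A node is suspected after more than missed_allowed intervals of silence.
        self.timeout = interval_seconds * missed_allowed
        self.last_arrival = {}  # node name -> arrival time of its latest heartbeat

    def on_heartbeat(self, node, arrival_time):
        # The receiving node timestamps every heartbeat on arrival.
        self.last_arrival[node] = arrival_time

    def suspected_failed(self, now):
        """Return the sorted names of nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, t in self.last_arrival.items() if now - t > self.timeout
        )
```

The same arrival timestamps, kept as a trace per node, are the raw material for the queuing-delay analysis mentioned above.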
Conclusions
• Punctuation-carrying heartbeats
– effective at unblocking streaming operators on all levels
– significantly reduce query memory utilization
– capable of working at multi-Gigabit line speeds
• Variety of other uses
– fault tolerance, performance analysis, distributed query optimization
• Part of the production version of Gigascope