Master’s Thesis (30 credits)
By: Morten Lindeberg
Supervisors: Vera Goebel and Jarle Søberg
Design, Implementation, and Evaluation of Network Monitoring
Tasks for the Borealis Stream Processing Engine
Slide no. 2
Outline
• Problem description
• Application domains
• Data stream management system (DSMS)
• Borealis
• Design
• Experiment Setup
• Implementation
• Evaluation
• Conclusion
• Future Work
Network monitoring tasks
Slide no. 3
Problem Description
• Design, Implementation, and Evaluation of Network Monitoring Tasks for the Borealis Stream Processing Engine
• Network Monitoring Tasks:
– Task-1: Verify Borealis load shedding mechanisms.
– Task-2: Measure the average load of packets and network load per second over a one minute interval.
– Task-3: How many packets have been sent to certain ports during the last five minutes?
– Task-4: How many bytes have been exchanged on each connection during the last ten seconds?
– Task-5: Identify possible SYN flood attacks.
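To make Task-4 concrete, the following is a minimal C++ sketch of the computation it asks for, written outside Borealis. The `PacketHeader` struct, its field names, and the helper names `make_key` and `bytes_per_connection` are assumptions made for illustration only. The sketch uses non-overlapping ten-second windows and treats both directions of a connection as the same connection; the thesis queries may define either of these differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical packet-header tuple; the field names are assumptions.
struct PacketHeader {
    std::uint32_t timestamp;  // seconds since the start of the capture
    std::string   src_ip;
    std::string   dst_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
    std::uint32_t length;     // bytes carried by this packet
};

using Endpoint      = std::pair<std::string, std::uint16_t>;
using ConnectionKey = std::pair<Endpoint, Endpoint>;
// (window index, connection) -> bytes exchanged in that window
using ByteCounts    = std::map<std::pair<std::uint32_t, ConnectionKey>, std::uint64_t>;

// Orders the two endpoints so that both directions map to the same key.
ConnectionKey make_key(const PacketHeader& p) {
    Endpoint a{p.src_ip, p.src_port};
    Endpoint b{p.dst_ip, p.dst_port};
    return a < b ? ConnectionKey{a, b} : ConnectionKey{b, a};
}

// Sums packet lengths per connection within each window of window_seconds.
ByteCounts bytes_per_connection(const std::vector<PacketHeader>& packets,
                                std::uint32_t window_seconds = 10) {
    ByteCounts totals;
    if (window_seconds == 0) return totals;  // avoid division by zero
    for (const PacketHeader& p : packets) {
        const std::uint32_t window = p.timestamp / window_seconds;
        totals[std::make_pair(window, make_key(p))] += p.length;
    }
    return totals;
}
```

In a stream engine the same grouping would typically be expressed as an aggregate with a ten-second window and a group-by on the connection fields, so the map here only stands in for the per-window state.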
Slide no. 4
Application Domains
• Network monitoring (controlling and measuring the Internet or parts of it)
– Challenges
  • Traffic volumes
  • Getting relevant data
  • Privacy
– On-line network measurements
  • Passive: Our network tasks
  • Active: E.g. Traceroute and Ping
– Off-line network measurements
  • Passive: E.g. InTraBase (Siekkinen, 2006)
  • Active: Pandora FMS (Pandora, 2007)
[Figure: a network monitor (N.M) on a private network, with a DB. The monitor looks at all passing packets (push-based).]
Slide no. 5
Cont. Application Domains
• Sensor networks
– TinyDB
• Financial tickers
– Traderbot
Pull-based
Push-based
Slide no. 6
DSMS
• Stream Data Model
– Definition: A data stream is a real-time, continuous, ordered sequence of items (Golab, 2003)
Slide no. 7
Cont. DSMS
• Requirements
– Continuous query language
– Data reduction techniques
  • Sampling
  • Load shedding
  • Aggregations with window techniques
    Without windows, an aggregation would be a blocking operator, since one will never see the whole stream at once.
– Adaptive
– Integration with a traditional database
– Low latency and high throughput
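As an illustration of load shedding by random dropping, here is a minimal C++ sketch. It is not the code of the Borealis Random Drop operator; the function name `random_drop` and its parameters are assumptions chosen for this example. Each tuple is dropped independently with probability `drop_rate`, which is the simplest form of the technique.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Drops each tuple independently with probability drop_rate (0.0 to 1.0)
// and returns the tuples that survive. Illustrative stand-in only.
template <typename Tuple>
std::vector<Tuple> random_drop(const std::vector<Tuple>& input,
                               double drop_rate,
                               std::mt19937& rng) {
    std::bernoulli_distribution drop(drop_rate);
    std::vector<Tuple> kept;
    for (const Tuple& t : input) {
        if (!drop(rng)) {
            kept.push_back(t);  // tuple passes through unchanged
        }
    }
    return kept;
}
```

Dropping reduces the work done by downstream operators at the cost of accuracy, so results such as counts need to be scaled or interpreted with the drop rate in mind.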
Window techniques:
• Hopping windows
• Tumbling windows
• Overlapping windows
Windows are either time-based or tuple-based.
Streaming tuples should only be kept in main memory, never written to disk (too slow).
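The window techniques can be sketched with a tuple-based window described by a size and an advance, the two parameters that also appear in the Borealis aggregate boxes. The C++ function `window_sums` below is a hypothetical helper, not part of Borealis. How the three window names map onto size and advance is an assumption noted in the comments, since terminology varies between systems.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sums the values in each tuple-based window.
//   size:    number of tuples per window
//   advance: number of tuples the window moves forward each step
// Assumed mapping to the window names:
//   advance == size -> tumbling windows (no overlap, no gaps)
//   advance <  size -> overlapping windows
//   advance >  size -> hopping windows (tuples between windows are skipped)
std::vector<long> window_sums(const std::vector<long>& values,
                              std::size_t size,
                              std::size_t advance) {
    std::vector<long> sums;
    if (size == 0 || advance == 0) {
        return sums;  // no meaningful window can be formed
    }
    // Only complete windows produce a result.
    for (std::size_t start = 0; start + size <= values.size(); start += advance) {
        sums.push_back(std::accumulate(values.begin() + start,
                                       values.begin() + start + size, 0L));
    }
    return sums;
}
```

Because each window is bounded, the aggregate can emit a result as soon as a window closes, which is what keeps it from blocking on an unbounded stream.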
Slide no. 8
Cont. DSMS
• Existing systems:
Name                            Language
TelegraphCQ (Berkeley Uni.)     SQL-like
STREAM (Stanford Uni.)          SQL-like
Aurora (Brown, M.I.T++)         Boxes and arrows
Medusa (Brown, M.I.T++)         Boxes and arrows
Borealis (Brown, M.I.T++)       Boxes and arrows
Gigascope ($ AT&T)              SQL-like
Slide no. 9
Borealis
• Stream processing engine (SPE)
– Academic research / Public domain
– Distributed queries
– General purpose
  • Multi-player first person shooter game
  • Network monitoring
• Continuous query language
– Operator boxes and stream arrows
– XML + GUI
– E.g., operators: Map, Aggregate, Join, Filter, Random Drop, and operators for integration with statically stored tables
[Figure: a distributed query running across nodes n1 to n6, with a data stream as input and result tuples as output. Label: High Availability]
Slide no. 10
Design
Task 2 - Version 1
– Average load and packet count
Task 1 - Version 1
– Mapping
Cont. Design
Task 3 - Version 2
– Port destination count
Task 4 - Version 2
– Exchanged bytes
Slide no. 12
Cont. Design
Task 5 - Version 1
– SYN flood attack (several hosts initiate half-open connections to a server so that it has to deny service to others)
– Identifies the relation between the count of SYN packets and the count of normal (non-SYN) packets. Joins the aggregated tuples if the SYN count is more than twice the normal packet count.
Slide no. 13
Cont. Design
<box name="synfilter" type="filter" >
    <in  stream="Packet" />
    <out stream="Syn" />
    <out stream="Normal" />
    <parameter name="expression.0" value="syn == 1" />
    <parameter name="pass-on-false-port" value="1" />
</box>

<box name="Normalcount" type="aggregate" >
    <in  stream="Normal" />
    <out stream="AggregateNormal" />
    <parameter name="aggregate-function.0" value="count()" />
    <parameter name="aggregate-function-output-name.0" value="count" />
    <parameter name="window-size-by" value="VALUES" />
    <parameter name="window-size" value="1" />
    <parameter name="advance" value="1" />
    <parameter name="order-by" value="FIELD" />
    <parameter name="order-on-field" value="timestamp" />
</box>

<box name="Syncount" type="aggregate" >
    <in  stream="Syn" />
    <out stream="AggregateSyn" />
    <parameter name="aggregate-function.0" value="count()" />
    <parameter name="aggregate-function-output-name.0" value="count" />
    <parameter name="window-size-by" value="VALUES" />
    <parameter name="window-size" value="1" />
    <parameter name="advance" value="1" />
    <parameter name="order-by" value="FIELD" />
    <parameter name="order-on-field" value="timestamp" />
</box>

<box name="SynfloodJoin" type="join" >
    <in  stream="AggregateNormal" />
    <in  stream="AggregateSyn" />
    <out stream="Result" />
    <parameter name="predicate" value="left.count * 2 &lt; right.count and left.count > 0" />
    <parameter name="left-buffer-size" value="1" />
    <parameter name="left-order-by" value="VALUES" />
    <parameter name="left-order-on-field" value="timestamp" />
    <parameter name="right-buffer-size" value="1" />
    <parameter name="right-order-by" value="VALUES" />
    <parameter name="right-order-on-field" value="timestamp" />
    <parameter name="out-field-name.0" value="timestamp" />
    <parameter name="out-field.0" value="left.timestamp" />
    <parameter name="out-field-name.1" value="ratio" />
    <parameter name="out-field.1" value="right.count / left.count" />
    <parameter name="out-field-name.2" value="syn" />
    <parameter name="out-field.2" value="right.count" />
    <parameter name="out-field-name.3" value="normal" />
    <parameter name="out-field.3" value="left.count" />
</box>
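The join predicate and output fields of the SynfloodJoin box can be restated as a small C++ function, which may make the detection rule easier to check by hand. The names `SynAlert` and `check_syn_flood` are hypothetical and not part of Borealis. One difference is an assumption on my part: the ratio is computed as a floating-point value here, whereas the query expression `right.count / left.count` may well use integer division.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Mirrors the four output fields of the join: timestamp, ratio, syn, normal.
struct SynAlert {
    std::uint32_t timestamp;
    double        ratio;   // SYN count divided by normal packet count
    std::uint64_t syn;
    std::uint64_t normal;
};

// Applies the join predicate to one pair of aggregated counts:
//   left.count * 2 < right.count and left.count > 0
// where left is the normal (non-SYN) count and right is the SYN count.
std::optional<SynAlert> check_syn_flood(std::uint32_t timestamp,
                                        std::uint64_t normal_count,
                                        std::uint64_t syn_count) {
    if (normal_count > 0 && normal_count * 2 < syn_count) {
        const double ratio = static_cast<double>(syn_count) /
                             static_cast<double>(normal_count);
        return SynAlert{timestamp, ratio, syn_count, normal_count};
    }
    return std::nullopt;  // no result tuple is produced
}
```

Note that the `left.count > 0` condition also guards the division, and that a window containing only SYN packets therefore never produces a result under this predicate.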
Slide no. 14
Experiment Setup
• Scripts execute the different stages of each experiment
• TG: Generates traffic
• fyaf: Filters packet headers from the NIC. Counts the number of packets retrieved by the C.A
• C.A (client application): Transforms the packet headers into tuples. I/O to the Q.P
• Q.P (query processor): Performs the query on the tuples retrieved from the C.A
System resource consumption is logged by the execution scripts.
fyaf calculates the number of lost packets.
TG controls the amount of generated traffic per second.
Implementation
• Client application main-method:
int main( int argc, const char *argv[] )
{
    ...
    sock = get_connection();
    NOTICE << "Socket opened: " << sock;
    status = marshal.open();

    if ( status )
    {
        WARN << "Could not deploy the network.";
    }
    else
    {
        // Start the timer.
        timer = Time::now();

        // Send the first batch of tuples. Queue up the next round with a delay.
        marshal.sentPacket();

        // Run the client event loop. Return only on an exception.
        marshal.runClient();
    }
    ...
}
[Figure: fyaf, the client application, and the Borealis query processor. Labels: Data stream, <xml-query>, Results]
Evaluation
Results for Task 1 (the map task)
CPU Maximums
Drop box can lead to increased CPU utilization
Slide no. 17
Cont. Evaluation
Results for Task 2 (the simple task)
(Lost packets at different network loads)
40 Mbit/s
Slide no. 18
Cont. Evaluation
Results for Task 2 (the simple task)
(Task result - Measured Load)
Ac 98%
Ac 93%
Ac 96%
Slide no. 19
Cont. Evaluation
Results for Task 3 - Memory Consumption
Low memory consumption (31 Mbyte). No change when increasing the load.
Static tables cause increased memory consumption, but not much.
Slide no. 20
Cont. Evaluation
Task      Network Load      Memory Consumption
Task 1    30, 40 Mbit/s     31 Mbyte
Task 2    40 Mbit/s         31 Mbyte
Task 3    10, 30 Mbit/s     31, 33 Mbyte
Task 4    20 Mbit/s         31 Mbyte
Task 5    20 Mbit/s         30, 50+ Mbyte
Slide no. 21
Conclusion
• Supports complex network monitoring queries
• Borealis can handle network loads:
– 40 Mbit/s for simple tasks
– 20 - 30 Mbit/s for complex tasks
– 10 Mbit/s when comparing input packets with several thousand statically stored tuples
• Load Shedding
– Not fully working, does not identify overload situations
– random_drop box does not significantly increase supported network load
• Low memory consumption
– System code parameters might affect performance
Slide no. 22
Future Work
• Distribution of queries
• Expand client application (fyaf and load shedding)
• Optimization of source code system parameters
• New version of Borealis (Winter 2007)
• Comparison with results from TelegraphCQ (Søberg, 2006) and STREAM (Hernes, 2006)
Slide no. 23
Bibliography
• (Søberg, 2006) - Design, implementation, and evaluation of network monitoring tasks with the TelegraphCQ data stream management system, Master’s Thesis, 2006.
• (Hernes, 2006) - Design, implementation, and evaluation of network monitoring tasks with the STREAM data stream management system, Master’s Thesis, 2006.
• (Siekkinen, 2006) - Root Cause Analysis of TCP Throughput: Methodology, Techniques, and Applications, Dr. Scient. Thesis, 2006.
• (Golab, 2003) - Issues in Data Stream Management, Lukasz Golab and M. Tamer Özsu, 2003.
• (Pandora, 2007) - http://pandora.sourceforge.net