Labelled data for training – MAWILab (http://www.fukuda-lab.org/mawilab/)
7 months of traces from 2016
Real-time analysis: 1-second time slots
High-dimensional data: n = 245 features describing each time-slot
Features include samples of empirical distributions (e.g., percentiles, entropy, min/avg/max)
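The per-slot summarization described above can be sketched as follows. This is a minimal illustration, not the Big-DAMA feature extractor: the function names, the choice of packet sizes and destination ports as inputs, and the six output features are assumptions made for the example (the poster reports n = 245 features without listing them).

```python
import math
from collections import Counter


def percentile(sorted_values, q):
    """Linear-interpolation percentile; q in [0, 100], input already sorted."""
    if not sorted_values:
        raise ValueError("empty sample")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def slot_features(packet_sizes, dst_ports):
    """Summarize one time slot (hypothetical inputs) as a small feature dict."""
    sizes = sorted(packet_sizes)
    return {
        "size_min": sizes[0],
        "size_avg": sum(sizes) / len(sizes),
        "size_max": sizes[-1],
        "size_p50": percentile(sizes, 50),
        "size_p95": percentile(sizes, 95),
        "dst_port_entropy": entropy(dst_ports),
    }


features = slot_features([40, 60, 1500, 40], [80, 80, 443, 53])
```

Applying the same summaries to several traffic attributes (ports, addresses, flags, sizes) is one plausible way a feature vector grows to a few hundred dimensions per slot.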
High–Dimensional Network Data (labeled)
Big Data Analytics for Network Traffic Monitoring and Analysis
F. Soro, P. Casas, A. D'Alconzo
AIT Austrian Institute of Technology
The increasing volume of network traffic makes it challenging to design scalable Network Traffic Monitoring and Analysis (NTMA) systems
The large number of possible data sources calls for a Big Data Analytics Framework (BDAF) suited for NTMA applications
Big-DAMA is a flexible BDAF designed for comprehensive network monitoring
Implementation and benchmarking of (semi-)supervised machine learning models on real network traffic datasets
The research leading to these results has been partially funded by the Vienna Science and Technology Fund (WWTF), project ICT15-129, Big-DAMA (http://bigdama.ait.ac.at/)
MOTIVATION
CHALLENGES
Big-DAMA SYSTEM OVERVIEW
CURRENT WORK
FUTURE WORK
10-fold cross-validation to reduce the risk of overfitting
Receiver Operating Characteristic (ROC) curves to evaluate True and False Positive Rates for each attack type
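As a rough sketch of the two evaluation steps above, the following shows how 10-fold splits and ROC points can be computed. The function names and the toy scores are illustrative assumptions; the poster does not describe the evaluation code, and a real setup would typically shuffle or stratify the samples before splitting.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def roc_points(scores, labels):
    """Return [(FPR, TPR), ...] while lowering the decision threshold.

    labels: 1 = attack (positive), 0 = normal traffic (negative).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


folds = list(k_fold_indices(25, k=10))
curve = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
```

Computing one such curve per attack type, with that type as the positive class, gives the per-attack view the poster refers to.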
Evaluation of the impact of feature selection on accuracy and execution time for each model
Performance comparison between the Big-DAMA framework and other existing platforms
Move from off-line batch processing to on-line stream analysis
Investigate solutions implementing on-line ML algorithms on distributed systems
Heterogeneous traffic sources may generate both structured and unstructured data
Network monitoring data usually come in the form of high-speed streams, which need to be rapidly and continuously analyzed
Despite the large number of Big Data platforms and frameworks, none of the existing solutions is tailored for NTMA
High data dimensionality leads to increased data processing overhead
Network attacks and traffic anomalies are generally difficult to detect automatically because they are constantly evolving targets
Automatic Network Anomaly Detection and Diagnosis system to characterize symptomatic and diagnostic features in network traffic
Supervised ML models to detect network attacks
ML-models trained to detect real network attacks & anomalies:
(D)DoS, flooding attacks, HTTP(S) flash crowds, network scans (UDP/TCP)
Benchmark different learning models (classifiers):
(1) Bayesian Learning – Naïve Bayes (NB)
(2) Neural Networks (MLP)
(3) Support Vector Machines (SVM)
(4) Decision Trees/Random Forest (C4.5/RF)
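To make the first model in the list concrete, here is a minimal Naïve Bayes sketch. It assumes a Gaussian likelihood per feature, which the poster does not specify, and the two-feature toy data and class names are invented for illustration; it is not the implementation benchmarked in Big-DAMA.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with an independent Gaussian per feature and class."""

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.priors, self.means, self.variances = {}, {}, {}
        for label, rows in rows_by_class.items():
            self.priors[label] = len(rows) / len(X)
            columns = list(zip(*rows))
            self.means[label] = [sum(c) / len(c) for c in columns]
            # Small constant avoids division by zero for constant features.
            self.variances[label] = [
                sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                for c, m in zip(columns, self.means[label])
            ]
        return self

    def _log_posterior(self, row, label):
        total = math.log(self.priors[label])
        for v, m, var in zip(row, self.means[label], self.variances[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [
            max(self.priors, key=lambda label: self._log_posterior(row, label))
            for row in X
        ]


# Toy training set: two hypothetical features per time slot.
X_train = [[1.0, 20.0], [1.2, 22.0], [0.9, 19.0], [9.0, 2.0], [8.5, 1.5], [9.5, 2.5]]
y_train = ["normal", "normal", "normal", "scan", "scan", "scan"]
model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[1.1, 21.0], [9.1, 2.1]])
```

The same `fit`/`predict` interface could wrap the other three model families, which keeps a benchmarking loop independent of the classifier being tested.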
Lambda architecture and Data Stream Warehousing paradigm: single system to handle massive quantities of data by taking advantage of both offline and online processing methods
Apache Spark Streaming for stream-based analysis, Apache Spark for batch analysis
Cassandra as a highly scalable, fully distributed NoSQL system for query and storage (no single point of failure, unlike the HDFS NameNode)
Future adoption of Apache Beam to achieve a unified programming model for both processing methods
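The Lambda idea of combining offline and online processing can be illustrated with a small in-memory sketch. It uses no Spark or Cassandra calls; the class name, the per-slot packet counter, and the method names are assumptions chosen only to show how a batch view and a speed view are merged at query time.

```python
from collections import defaultdict


class LambdaPacketCounter:
    """Toy Lambda architecture: batch view + speed view, merged on query."""

    def __init__(self):
        self.master = []                    # append-only log of all events
        self.batch_view = {}                # recomputed from scratch by the batch layer
        self.speed_view = defaultdict(int)  # incremental counts since the last batch run

    def ingest(self, slot, n_packets):
        """Record a new event in the master log and update the speed layer."""
        self.master.append((slot, n_packets))
        self.speed_view[slot] += n_packets

    def run_batch(self):
        """Rebuild the batch view from the full log, then reset the speed layer."""
        view = defaultdict(int)
        for slot, n_packets in self.master:
            view[slot] += n_packets
        self.batch_view = dict(view)
        self.speed_view.clear()

    def query(self, slot):
        """Serve the batch result plus whatever arrived after the last batch run."""
        return self.batch_view.get(slot, 0) + self.speed_view.get(slot, 0)


counter = LambdaPacketCounter()
counter.ingest(0, 5)
counter.ingest(0, 3)
before_batch = counter.query(0)
counter.run_batch()
after_batch = counter.query(0)
counter.ingest(0, 2)
counter.ingest(1, 7)
latest = (counter.query(0), counter.query(1))
```

The batch and speed paths here duplicate the aggregation logic, which is the kind of redundancy a unified programming model is meant to remove.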
Analytics
Data management
Machine Learning models
Models Evaluation
Submitted publications
• P. Casas, F. Soro, J. Vanerio, G. Settanni, A. D’Alconzo - "Network Security and Anomaly Detection with Big-DAMA, a Big Data Analytics Framework", submitted to CloudNET 2017
• F. Soro, P. Casas, A. D’Alconzo - "Big Data Analytics for Network Traffic Monitoring and Analysis" (extended abstract), submitted to SIGCOMM N2Women'17