Big Data Analytics for Network Traffic Monitoring and...


Labelled data for training – MAWILab (http://www.fukuda-lab.org/mawilab/)

7 months of traces from 2016

Real-time analysis: 1-second time slots

High-dimensional data: n = 245 features describing each time-slot

Features sample the empirical distributions of traffic attributes (e.g., percentiles, entropy, min/avg/max)
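A minimal sketch of how a few such per-slot features could be computed, assuming each 1-second slot is available as a list of (packet size, destination port) tuples. That record layout, the feature names, and the choice of attributes are assumptions for illustration; the poster does not list the 245 features.

```python
import math
from collections import Counter

def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    if not sorted_vals:
        raise ValueError("empty sample")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def slot_features(packets):
    """Summarize one time slot of (size_bytes, dst_port) tuples (hypothetical layout)."""
    sizes = sorted(p[0] for p in packets)
    ports = [p[1] for p in packets]
    return {
        "pkt_count": len(packets),
        "size_min": sizes[0],
        "size_avg": sum(sizes) / len(sizes),
        "size_max": sizes[-1],
        "size_p50": percentile(sizes, 50),
        "size_p90": percentile(sizes, 90),
        "dst_port_entropy": entropy(ports),
    }

# Toy slot with four packets (made-up values).
slot = [(60, 80), (1500, 80), (60, 443), (400, 53)]
features = slot_features(slot)
```

Low port entropy combined with a high packet count is the kind of pattern a flooding attack on a single service would produce, which is why distribution summaries are useful inputs for a classifier.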

High-Dimensional Network Data (labeled)

Big Data Analytics for Network Traffic Monitoring and Analysis

F. Soro, P. Casas, A. D'Alconzo

AIT Austrian Institute of Technology

The increasing volume of network traffic challenges the design of scalable Network Traffic Monitoring and Analysis (NTMA) systems

The large number of possible data sources calls for a Big Data Analytics Framework (BDAF) suited for NTMA applications

Big-DAMA is a flexible BDAF designed for comprehensive network monitoring

Implementation and benchmarking of (semi-)supervised Machine Learning models on real network traffic datasets

The research leading to these results has been partially funded by the Vienna Science and Technology Fund (WWTF), project ICT15-129, Big-DAMA (http://bigdama.ait.ac.at/)

MOTIVATION

CHALLENGES

Big-DAMA SYSTEM OVERVIEW

CURRENT WORK

FUTURE WORK

10-fold cross validation to reduce overfitting

Receiver Operating Characteristic (ROC) curves to evaluate True and False Positive Rates for each attack type
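A minimal sketch of the two evaluation steps in plain Python, with no ML library. The function names and the toy scores and labels are assumptions for illustration; the poster does not state which tooling produced its results.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs from sweeping a threshold over `scores`.

    labels: 1 = attack, 0 = normal. Assumes both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all samples sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# 10-fold split: each fold is held out once, the other nine are used for training.
folds = k_fold_indices(25, k=10)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(25) if i not in held_out]
    # A model would be fitted on train_idx and scored on test_idx here.

# Toy classifier scores for one attack type (made-up values).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

For a per-attack-type evaluation, the same routine would be run once per attack class, treating that class as the positive label and everything else as negative.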


Evaluation of the impact of feature selection on accuracy and execution time for each model

Performance comparison between the Big-DAMA framework and other existing platforms

Move from off-line batch processing to on-line stream analysis

Investigate solutions implementing on-line ML algorithms on distributed systems
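To illustrate what separates on-line from batch processing, the sketch below keeps a running mean and variance with Welford's single-pass update and flags values that deviate strongly from what has been seen so far. It is a stand-in for an on-line learner, not a model from the poster; the threshold, the warm-up length, and the toy stream are assumptions.

```python
import math

class RunningStats:
    """Single-pass mean and variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def detect(stream, threshold=3.0, warmup=5):
    """Return the positions whose z-score against the running statistics exceeds `threshold`."""
    stats = RunningStats()
    alerts = []
    for i, x in enumerate(stream):
        is_outlier = (stats.n >= warmup and stats.std > 0
                      and abs(x - stats.mean) / stats.std > threshold)
        if is_outlier:
            alerts.append(i)   # flagged values are not absorbed into the baseline
        else:
            stats.update(x)
    return alerts

# Packets per second over eight consecutive slots (made-up values).
stream = [100, 102, 98, 101, 99, 100, 5000, 101]
alerts = detect(stream)
```

Each value is processed once and then discarded, so memory use stays constant regardless of stream length, which is the property a distributed on-line implementation would need to preserve.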

Heterogeneous traffic sources may generate both structured and unstructured data

Network monitoring data usually come in the form of high-speed streams, which need to be rapidly and continuously analyzed

Despite the large number of Big Data platforms and frameworks, none of the existing solutions is tailored to NTMA

High data dimensionality leads to increased data processing overhead

Network attacks and traffic anomalies are generally difficult to predict automatically, because they are moving targets that evolve over time

Automatic Network Anomaly Detection and Diagnosis system to characterize symptomatic and diagnostic features in network traffic

Supervised ML models to detect network attacks

ML models trained to detect real network attacks and anomalies:

(D)DoS, flooding attacks, HTTP(S) flash crowds, network scans (UDP/TCP)

Benchmark different learning models (classifiers):

(1) Bayesian Learning – Naïve Bayes (NB)

(2) Neural Networks (MLP)

(3) Support Vector Machines (SVM)

(4) Decision Trees/Random Forest (C4.5/RF)
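As an illustration of the simplest of the four model families, here is a minimal Gaussian Naive Bayes classifier in plain Python. The Gaussian variant, the two feature names, and the toy training data are assumptions; the poster does not specify which Naive Bayes variant or library was benchmarked.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, feature) pair."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # A small constant keeps the variance strictly positive.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                         for c, m in zip(cols, means)]
            self.stats[label] = list(zip(means, variances))
        return self

    def _log_posterior(self, row, label):
        # log prior + sum of per-feature Gaussian log-likelihoods
        lp = math.log(self.priors[label])
        for v, (m, var) in zip(row, self.stats[label]):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return lp

    def predict(self, X):
        return [max(self.priors, key=lambda c: self._log_posterior(row, c))
                for row in X]

# Toy training set: [packets_per_second, dst_port_entropy] per time slot (made-up values).
X_train = [[100, 3.0], [120, 3.2], [90, 2.8],
           [5000, 0.2], [5200, 0.1], [4800, 0.3]]
y_train = ["normal", "normal", "normal", "ddos", "ddos", "ddos"]

model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[110, 3.1], [5100, 0.2]])
```

Giving every model the same `fit` and `predict` interface lets one evaluation loop compare all four classifiers on accuracy and execution time.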


Lambda architecture and Data Stream Warehousing paradigm: a single system that handles massive quantities of data by combining offline and online processing methods

Apache Spark Streaming for stream-based analysis, Apache Spark for batch analysis

Cassandra as a highly scalable, fully distributed NoSQL system for storage and querying (no single point of failure, unlike the HDFS name node)

Future adoption of Apache Beam to achieve a unified programming model for both processing methods
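The Lambda architecture idea can be sketched in a few lines: a batch layer periodically recomputes a view from the full master dataset, a speed layer covers the records that arrived since the last batch run, and queries merge the two. The toy class below is single-process plain Python and uses none of Spark, Spark Streaming, or Cassandra; its name and methods are hypothetical.

```python
from collections import Counter

class LambdaCounter:
    """Toy Lambda architecture serving per-key event counts."""

    def __init__(self):
        self.master = []             # append-only master dataset
        self.batch_view = Counter()  # recomputed from scratch by the batch layer
        self.speed_view = Counter()  # updated incrementally by the speed layer

    def ingest(self, key):
        """A new event is stored durably and reflected immediately in the speed view."""
        self.master.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Recompute the batch view over all data, then reset the speed view."""
        self.batch_view = Counter(self.master)
        self.speed_view.clear()

    def query(self, key):
        """Serving layer: merge the batch view with the real-time increments."""
        return self.batch_view[key] + self.speed_view[key]

# Count flows per source address (made-up addresses).
system = LambdaCounter()
for src in ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    system.ingest(src)
before_batch = system.query("10.0.0.1")

system.run_batch()
system.ingest("10.0.0.1")
after_batch = system.query("10.0.0.1")
```

Queries return the same totals whether or not a batch run has happened; the batch run only moves counts from the incremental view into the recomputed one. A unified model such as Apache Beam aims to remove the need to write the counting logic twice.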

[Figure labels: Analytics; Data management]

Machine Learning models

Models Evaluation

Submitted publications

• P. Casas, F. Soro, J. Vanerio, G. Settanni, A. D’Alconzo - "Network Security and Anomaly Detection with Big-DAMA, a Big Data Analytics Framework", submitted to CloudNET 2017

• F. Soro, P. Casas, A. D’Alconzo - "Big Data Analytics for Network Traffic Monitoring and Analysis" (extended abstract), submitted to SIGCOMM N2Women'17
