
Big Data Analytics for Network Traffic Monitoring and Analysis (tma.ifip.org/wordpress/wp-content/uploads/2016/08/Soro_poster.pdf)



High-Dimensional Network Data (labeled)

Labeled data for training: MAWILab (http://www.fukuda-lab.org/mawilab/)

7 months of traces from 2016

Real-time analysis: 1-second time slots

High-dimensional data: n = 245 features describing each time slot

Features include empirical distribution sampling (e.g., percentiles, entropy, min/avg/max)
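The per-slot feature extraction described above can be sketched roughly as follows. This is a minimal illustration and not the Big-DAMA feature set. It assumes a hypothetical packet record of (timestamp in seconds, packet size in bytes, destination port). For each 1-second slot it computes only a few statistics of the kinds listed (min/avg/max, quartiles, entropy), whereas the poster reports n = 245 features. The helper names `entropy` and `slot_features` are hypothetical.

```python
import math
import statistics
from collections import Counter, defaultdict

# Hypothetical packet record: (timestamp_s, size_bytes, dst_port).
# Illustrative subset of features only; not the 245-feature Big-DAMA set.


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def slot_features(packets, slot_len=1.0):
    """Group packets into fixed-length time slots and summarize each slot."""
    slots = defaultdict(list)
    for ts, size, port in packets:
        slots[int(ts // slot_len)].append((size, port))

    features = {}
    for slot, rows in sorted(slots.items()):
        sizes = [size for size, _ in rows]
        ports = [port for _, port in rows]
        # statistics.quantiles needs at least two points; n=4 yields quartiles
        q = statistics.quantiles(sizes, n=4) if len(sizes) > 1 else [sizes[0]] * 3
        features[slot] = {
            "pkts": len(rows),
            "size_min": min(sizes),
            "size_avg": statistics.fmean(sizes),
            "size_max": max(sizes),
            "size_p25": q[0],
            "size_p50": q[1],
            "size_p75": q[2],
            "port_entropy": entropy(ports),
        }
    return features
```

Each slot then becomes one row of the feature matrix that a classifier consumes. Adding more fields (other header values, more percentiles, per-protocol counts) is how such a vector grows toward the hundreds of dimensions mentioned above.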

Big Data Analytics for Network Traffic Monitoring and Analysis

F. Soro, P. Casas, A. D'Alconzo

AIT Austrian Institute of Technology

The increasing volume of network traffic makes it challenging to design scalable Network Traffic Monitoring and Analysis (NTMA) systems

The large number of possible data sources calls for a Big Data Analytics Framework (BDAF) suited for NTMA applications

Big-DAMA is a flexible BDAF designed for comprehensive network monitoring

Implementation and benchmarking of (semi-)supervised Machine Learning models on real network traffic datasets

The research leading to these results has been partially funded by the Vienna Science and Technology Fund (WWTF), project ICT15-129, Big-DAMA (http://bigdama.ait.ac.at/)

MOTIVATION

CHALLENGES

Big-DAMA SYSTEM OVERVIEW

CURRENT WORK

FUTURE WORK

10-fold cross-validation to detect overfitting and obtain robust accuracy estimates

Receiver Operating Characteristic (ROC) curves to evaluate True and False Positive Rates for each attack type
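The evaluation protocol above can be sketched in plain Python as a shuffled k-fold splitter plus a threshold-sweep ROC curve with a trapezoidal AUC. This is a minimal sketch under a one-vs-rest assumption: each attack type is scored separately, with label 1 for that attack and 0 otherwise. The classifier that produces the scores is left out, and the function names `k_fold_indices`, `roc_curve` and `auc` are hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        # every fold except the i-th one is used for training
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def roc_curve(scores, labels):
    """ROC points (FPR, TPR) obtained by sweeping a threshold over `scores`.

    labels: 1 for the attack type under evaluation, 0 otherwise (one-vs-rest).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both positive and negative samples")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # consume all samples sharing this score so ties produce a single point
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

In a 10-fold run, one option is to pool the held-out scores from all folds before computing one ROC curve per attack type.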


Evaluation of the impact of feature selection on accuracy and execution time for each model

Performance comparison between the Big-DAMA framework and other existing platforms

Move from off-line batch processing to on-line stream analysis

Investigate solutions implementing on-line ML algorithms on distributed systems
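On-line learning, as opposed to off-line batch training, updates a model one sample at a time as the stream arrives. The sketch below assumes one common instance of that idea: logistic regression trained by stochastic gradient descent, evaluated test-then-train (prequential). It runs on a single machine and does not address the distributed setting mentioned above. `OnlineLogisticRegression` and `prequential_accuracy` are hypothetical names.

```python
import math


class OnlineLogisticRegression:
    """Hypothetical on-line binary classifier: logistic regression updated by SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # numerically stable logistic function
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def partial_fit(self, x, y):
        """One SGD step on the log-loss for a single sample (x, y), y in {0, 1}."""
        err = self.predict_proba(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def prequential_accuracy(model, stream):
    """Test-then-train: score each sample before the model learns from it."""
    correct = total = 0
    for x, y in stream:
        correct += int((model.predict_proba(x) >= 0.5) == bool(y))
        model.partial_fit(x, y)
        total += 1
    return correct / total if total else 0.0
```

Because every sample is seen once and then discarded, memory stays constant regardless of stream length. That property is what makes this family of models a candidate for stream analysis.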

Heterogeneous traffic sources may generate both structured and unstructured data

Network monitoring data usually come in the form of high-speed streams, which need to be rapidly and continuously analyzed

Despite the large number of Big Data platforms and frameworks, none of the existing solutions is tailored to NTMA

High data dimensionality leads to increased data processing overhead

Network attacks and traffic anomalies are generally difficult to predict automatically, since they are constantly evolving targets
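One common way to reduce the processing overhead of high-dimensional data is filter-based feature selection. The poster does not state which method Big-DAMA uses. The sketch below assumes a simple filter that ranks features by absolute Pearson correlation with a binary attack label and keeps the top k. `pearson`, `select_top_k` and `project` are hypothetical helpers.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_top_k(X, y, k):
    """Indices of the k features most correlated (in absolute value) with y.

    X: list of samples, each a list of feature values; y: list of 0/1 labels.
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(X, keep):
    """Keep only the selected feature columns of every sample."""
    return [[row[j] for j in keep] for row in X]
```

A univariate filter like this is cheap, but it ignores interactions between features. Wrapper or embedded methods can capture those at a higher computational cost.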

Automatic Network Anomaly Detection and Diagnosis system to characterize symptomatic and diagnostic features in network traffic

Supervised ML models to detect network attacks

ML models trained to detect real network attacks and anomalies:

(D)DoS, flooding attacks, HTTP(S) flash crowds, network scans (UDP/TCP)

Benchmark different learning models (classifiers):

(1) Bayesian Learning – Naïve Bayes (NB)

(2) Neural Networks (MLP)

(3) Support Vector Machines (SVM)

(4) Decision Trees/Random Forest (C4.5/RF)
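A minimal benchmarking harness is sketched below. It assumes each model exposes `fit(X, y)` and `predict(X)`, and it records accuracy and training time. A from-scratch Gaussian Naive Bayes stands in for model (1). In practice the four model families would more likely come from a library such as scikit-learn (`GaussianNB`, `MLPClassifier`, `SVC`, `DecisionTreeClassifier`, `RandomForestClassifier`). Note that scikit-learn's decision trees implement an optimized CART rather than C4.5. The `GaussianNB` class and the `benchmark` function here are illustrative, not library code.

```python
import math
import time
from collections import defaultdict


class GaussianNB:
    """Illustrative Gaussian Naive Bayes: per-class feature means and variances."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.stats = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # small variance floor avoids division by zero for constant features
            variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                         for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def _log_posterior(self, row, label):
        log_prior, means, variances = self.stats[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(row, means, variances)
        )

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_posterior(row, c)) for row in X]


def benchmark(models, X_train, y_train, X_test, y_test):
    """Return {name: (accuracy, fit_seconds)} for each model with fit/predict."""
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        fit_seconds = time.perf_counter() - start
        preds = model.predict(X_test)
        accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
        results[name] = (accuracy, fit_seconds)
    return results
```

Any other model with the same two methods can be added to the `models` dictionary, so all classifiers are timed and scored on identical splits.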


Lambda architecture and Data Stream Warehousing paradigm: a single system that handles massive quantities of data by combining offline (batch) and online (stream) processing methods

Apache Spark Streaming for stream-based analysis, Apache Spark for batch analysis

Apache Cassandra as a highly scalable, fully distributed NoSQL system for querying and storage (no single point of failure, unlike the HDFS NameNode)

Future adoption of Apache Beam to achieve a unified programming model for both processing methods
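The Lambda architecture combines three parts: a batch layer that recomputes views from the full master dataset, a speed layer that updates views incrementally, and a serving layer that merges both at query time. The toy sketch below illustrates that pattern with in-memory per-key event counts. It is an illustration under simplifying assumptions (single process, no concurrency), not Big-DAMA code, and `LambdaCounter` is a hypothetical name. In the setup described above, the batch and speed roles would roughly correspond to Spark batch jobs and Spark Streaming, with storage in Cassandra.

```python
from collections import Counter


class LambdaCounter:
    """Hypothetical toy Lambda architecture for per-key event counts."""

    def __init__(self):
        self.master = []              # append-only master dataset
        self.batch_view = Counter()   # output of the last batch run
        self.realtime_view = Counter()  # events not yet covered by a batch run

    def ingest(self, key):
        self.master.append(key)
        # speed layer: low-latency incremental update
        self.realtime_view[key] += 1

    def run_batch(self):
        # batch layer: full recomputation over the immutable master dataset
        self.batch_view = Counter(self.master)
        # all events so far are now reflected in the batch view
        self.realtime_view = Counter()

    def query(self, key):
        # serving layer: merge batch and real-time views at query time
        return self.batch_view[key] + self.realtime_view[key]
```

The batch view is always recomputed from raw data, so errors in the speed layer are corrected at the next batch run. That trade-off is why the architecture keeps both paths.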

Analytics

Data management

Machine Learning models

Models Evaluation

Submitted publications

• P. Casas, F. Soro, J. Vanerio, G. Settanni, A. D’Alconzo - "Network Security and Anomaly Detection with Big-DAMA, a Big Data Analytics Framework", submitted to CloudNET 2017

• F. Soro, P. Casas, A. D’Alconzo - "Big Data Analytics for Network Traffic Monitoring and Analysis" (extended abstract), submitted to SIGCOMM N2Women'17
