
Deep in the Dark - Deep Learning-based Malware Traffic Detection without Expert Knowledge

Gonzalo Marín (1,2), Germán Capdehourat (2), Pedro Casas (3)

(1) Tryolabs (2) IIE–FING, UDELAR, Uruguay (3) AIT Austrian Institute of Technology

2nd Deep Learning and Security Workshop, IEEE S&P – May 23, 2019 – San Francisco, CA

Instituto de Ingeniería Eléctrica

Outline

1

Learning in NTMA

Learning in NTMA – why?

The analysis of network traffic measurements is an active research field. Machine learning models are appealing, since we have tons of data and several problems to solve. Some examples:

Traffic prediction and classification
Congestion control
Anomaly detection
Cybersecurity (e.g., malware detection, impersonation attacks)
QoE estimation
. . .

2

Learning in NTMA – which kind of models?

Traditional –shallow– machine learning models are commonly used.
Decision trees and random forests, SVM, k-NN, DBSCAN... the list is as vast as the associated literature.
Feature engineering needed!
Handcrafted, expert-domain features are critical to the success of the applied models.

3

Learning in NTMA – Expert knowledge

Features are heavily dependent on the expert's background and the specific problem.
Each paper in the literature defines its own set of input features for the considered problem, hindering generalization and benchmarking of different approaches.
Feature engineering is costly.
All in all, good results can be achieved.

4

Deep Learning in NTMA

Can Deep Learning overcome the presented limitations of traditional models?

5

This Work

This Work

Main goal: to explore the end-to-end application of deep learning models to complement traditional approaches for NTMA, using different representations of the input data.
To do this, the malware traffic detection and classification problem is addressed, using raw, bytestream-based data as input.

Research questions

1. Is it possible to achieve high detection accuracy with low false alarm rates using the raw-input, deep learning-based models?

2. Are the proposed models better than the commonly used shallow models, when feeding them all with raw inputs?

3. How good are these models compared to traditional approaches, where domain expert knowledge is used to build the set of features?

6

Input Representations

Raw Packets

n bytes

b1,1 | b2,1 | b3,1 | b4,1 | · · · | bn,1

The decimal normalized representation of each byte of each packet is considered as a different feature.
Each packet is considered as a different instance.
It is necessary to choose the number of bytes of the packet to be considered (n).

7
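The Raw Packets mapping above can be sketched in a few lines; the helper name and the zero-padding behavior for short packets are assumptions for illustration, not the authors' code:

```python
# Hypothetical sketch of the Raw Packets representation: the first n bytes of a
# packet become n features, each byte normalized from [0, 255] to [0, 1].
# Zero-padding of short packets is an assumption, not stated on the slide.

def packet_to_features(packet: bytes, n: int = 1024) -> list:
    """Truncate or zero-pad the packet to n bytes, then normalize each byte."""
    padded = packet[:n] + b"\x00" * max(0, n - len(packet))
    return [b / 255.0 for b in padded]

# A tiny 3-byte packet mapped to n = 8 features:
vec = packet_to_features(b"\x00\xff\x80", n=8)
```

Each resulting vector has a fixed length n, which is what lets every packet be treated as one instance with the same feature dimensionality.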

Raw Flows

n bytes × m packets

b1,1 | b2,1 | b3,1 | · · · | bn,1
b1,2 | b2,2 | b3,2 | · · · | bn,2
· · ·
b1,m−1 | b2,m−1 | b3,m−1 | · · · | bn,m−1
b1,m | b2,m | b3,m | · · · | bn,m

A group of bytes is considered as a different feature.
Each flow (group of packets) is considered as a different instance.
It is necessary to choose the number of bytes per packet to consider (n) and the number of packets per flow to consider (m).

8
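The flow representation can be sketched as building an m × n matrix per flow; padding short packets and short flows with zeros is an assumption here, not something the slide specifies:

```python
# Hedged sketch of the Raw Flows representation: each flow becomes an m x n
# matrix (m packets, n bytes per packet). Zero-padding for short packets and
# flows with fewer than m packets is an illustrative assumption.

def flow_to_matrix(packets, n=100, m=2):
    rows = []
    for pkt in packets[:m]:
        row = pkt[:n] + b"\x00" * max(0, n - len(pkt))
        rows.append([b / 255.0 for b in row])
    while len(rows) < m:           # flows shorter than m packets
        rows.append([0.0] * n)
    return rows

# A two-packet flow with small n and m for readability:
mat = flow_to_matrix([b"\x10\x20", b"\x30"], n=4, m=2)
```

The fixed m × n shape is what allows every flow, regardless of its real length, to feed the same model input layer.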

Building the Datasets

Building the Datasets

Malware and normal captures performed by the Stratosphere IPS Project at CTU University in Prague, Czech Republic, were considered.
Captures are gathered under controlled conditions: fixed scenario (IPs, ports, etc.).
Not in-the-wild network traffic.
Let's consider the payload as the key information to analyze and to build the datasets.

9

Datasets for Malware Detection

Representation | Dataset size | n (bytes) | m (packets)
Raw Packets    | 248,850      | 1024      | N/A
Raw Flows      | 67,494       | 100       | 2

Table 1: Parameter selection for building the input representations used to train the deep learning models.

10

Deep Learning Architectures

Deep Learning Architectures

Finding the right architecture.

11

Core layers

The core layers used for both models are basically two: convolutional and recurrent.
Convolutional layers build the feature representation of the spatial data inside the packets and flows.
Recurrent layers are used together with the convolutional ones to improve the performance of the Raw Packets architecture, allowing the model to keep track of temporal information.
Fully-connected layers deal with the different combinations of the features in order to arrive at the final decisions (i.e., classify).

12

Helper layers

Goals: reduce the generalization error and improve the learning process.

Batch Normalization: layer inputs are normalized for each mini-batch. As a result, higher learning rates can be used, the model is less sensitive to initialization, and it also adds regularization.
Dropout: randomly drop units (along with their connections) from the neural network during training. A very efficient way to perform model averaging: similar to training a huge number of different networks and averaging the results.

13
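The batch-normalization step described above can be written out directly; this is a minimal training-time sketch in NumPy (the running statistics used at inference time, and the learning of gamma and beta, are omitted):

```python
import numpy as np

# Illustrative batch-normalization forward pass for one mini-batch:
# normalize each feature over the batch, then scale and shift.
def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mean = x.mean(axis=0)               # per-feature mean over the mini-batch
    var = x.var(axis=0)                 # per-feature variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.array([[1.0, 2.0], [3.0, 4.0]])
out = batch_norm(batch)
# each feature column now has approximately zero mean and unit variance
```

Because each layer sees inputs with stable statistics, higher learning rates become usable, which is the benefit the slide points to.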

Raw Packets Architecture

[Architecture diagram: byte-vectorized packet of size 1024 → C1 (1D-CNN, 32 filters of size 5) → C2 (1D-CNN, 64 filters of size 5) → MP (max-pooling 1×8) → LSTM (200 units, returning the full sequence) → flatten → FC1 (200 units) → FC2 (200 units) → output]

2 1D-CNN layers of 32 and 64 filters of size 5, respectively.
A max-pooling layer of size 8.
An LSTM layer of 200 units, returning the outputs of each cell.
2 fully-connected layers of 200 units each.
Binary cross-entropy is used as the loss function.

14
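A back-of-the-envelope shape trace helps sanity-check the layer stack above. The 'valid' (no-padding) convolutions and non-overlapping pooling are assumptions here, since the slide does not state padding or strides:

```python
# Shape trace of the Raw Packets architecture, assuming stride-1 'valid'
# convolutions and non-overlapping max-pooling (both are assumptions).

def conv1d_len(length, kernel):       # output length of a valid 1-D convolution
    return length - kernel + 1

length = 1024                          # byte-vectorized packet
length = conv1d_len(length, 5)         # C1: 32 filters, size 5
length = conv1d_len(length, 5)         # C2: 64 filters, size 5
length = length // 8                   # MP: max-pooling 1x8
lstm_units = 200                       # LSTM returns the full sequence
flattened = length * lstm_units        # flatten feeding FC1/FC2 (200 units each)
```

Under these assumptions the flattened vector entering the fully-connected layers has 127 × 200 = 25,400 values.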

Raw Flows Architecture

[Architecture diagram: byte-vectorized flow of size 1×100×2 → C1 (1D-CNN, 32 filters of size 5) → flatten → FC1 (50 units) → FC2 (100 units) → output]

Smaller capacity than Raw Packets (fewer input features).
1 1D-CNN layer of 32 filters of size 5, and 2 fully-connected layers of 50 and 100 units, respectively.
Binary cross-entropy is again used as the loss function.

15
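The same kind of shape trace applies to the smaller Raw Flows network. Treating the 2 × 100 byte matrix as one length-200 1-D sequence with a 'valid' convolution is an assumption; the slide does not specify the exact input layout:

```python
# Rough shape trace of the Raw Flows architecture, under the assumption that
# the m = 2, n = 100 flow matrix is processed as a length-200 1-D sequence
# with a stride-1 'valid' convolution.

flow_len = 100 * 2                     # n = 100 bytes per packet, m = 2 packets
conv_len = flow_len - 5 + 1            # C1: 32 filters of size 5
flattened = conv_len * 32              # flatten before FC1 (50) and FC2 (100)
```

The much smaller flattened size compared to Raw Packets is consistent with the slide's note that this architecture has lower capacity.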

Experimental Evaluation & Results

Malware Detection

Malware Detection: A First Approach Using Deep Learning

16

Malware Detection: Raw Packets

Detect malware at the packet level.

The Raw Packets deep learning architecture is trained using the corresponding dataset version (∼250,000 samples).
Split using an 80/10/10 schema: 80% for training, 10% for validation and 10% for testing.
Training runs over 100 epochs.
Adam is used as the optimizer, annealing the learning rate over time.

17
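The 80/10/10 split can be sketched as follows; the shuffling and index arithmetic are illustrative, not the authors' exact pipeline:

```python
import random

# Minimal sketch of an 80/10/10 train/validation/test split over sample
# indices. The seeded shuffle is an assumption for reproducibility.
def split_80_10_10(samples, seed=0):
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    train = samples[: int(0.8 * n)]
    val = samples[int(0.8 * n): int(0.9 * n)]
    test = samples[int(0.9 * n):]
    return train, val, test

train, val, test = split_80_10_10(range(1000))
```

Every sample lands in exactly one partition, so the test metrics reported later are computed on data the model never saw during training or validation.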

Malware Detection: Raw Packets

[Figure: training and validation loss (roughly 0.65 down to 0.40) and accuracy (roughly 0.65 up to 0.80) over 100 epochs.]

Raw Packets learning process. 18


Malware Detection: Raw Packets

Accuracy: 77.6% on the test set.
Comparison with a random forest model (100 trees), using exactly the same input features.
The Raw Packets deep learning model outperforms the random forest one.

[Figure: ROC curves – Raw Packets (AUC = 0.84) vs. RF (AUC = 0.74).]

18
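The random forest baseline can be reproduced in spirit with scikit-learn; the data below is random noise purely to show the setup (100 trees on the same normalized raw-byte features), not the paper's dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the Raw Packets features: 200 "packets", each with
# 64 byte features already normalized to [0, 1]. Shapes and labels are made up.
rng = np.random.default_rng(0)
X = rng.random((200, 64))
y = rng.integers(0, 2, size=200)       # malware / normal labels

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
scores = rf.predict_proba(X)[:, 1]     # per-sample scores used for the ROC curve
```

Feeding both models identical raw inputs is what makes the AUC comparison on the slide a fair head-to-head of the architectures rather than of the features.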

Malware Detection: Raw Flows

Detect malware at the flow level.

The Raw Flows deep learning architecture is trained using the corresponding dataset version (∼68,000 samples).
Split using an 80/10/10 schema: 80% for training, 10% for validation and 10% for testing.
Training runs over 10 epochs.
Adam is used as the optimizer.

19

Malware Detection: Raw Flows

[Figure: training and validation loss (roughly 0.225 down to 0.025) and accuracy (roughly 0.92 up to 0.99) over 10 epochs.]

Raw Flows learning process. 20

Malware Detection: Raw Flows

Accuracy: 98.6% on the test set.
Comparison with a random forest: the data was flattened to fit the input.
The Raw Flows model can detect as much as 98% of all malware flows with an FPR as low as 0.2%.
This suggests that, operating at the flow level, Raw Flows can actually provide highly accurate results, applicable in practice.

[Figure: ROC curves – Raw Flows (AUC = 0.997) vs. RF (AUC = 0.936).]

20
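The operating point quoted above (TPR around 98%, FPR around 0.2%) is read directly off a confusion matrix; the counts below are made up to illustrate the arithmetic:

```python
# True positive rate and false positive rate from confusion-matrix counts.
# tp/fn/fp/tn values here are illustrative, not the paper's actual counts.
def tpr_fpr(tp, fn, fp, tn):
    tpr = tp / (tp + fn)    # fraction of malware flows that are caught
    fpr = fp / (fp + tn)    # fraction of normal flows raised as false alarms
    return tpr, fpr

tpr, fpr = tpr_fpr(tp=980, fn=20, fp=2, tn=998)
```

A low FPR matters operationally: at flow rates of thousands per minute, even 1% false positives would swamp an analyst, which is why 0.2% is highlighted as practically applicable.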

Domain knowledge vs. raw inputs

Domain knowledge vs. raw inputs

How good is Raw Flows compared to a random forest trained on a dataset of expert-handcrafted features?
Flow-level features, such as: traffic throughput, packet sizes, inter-arrival times, frequency of IP addresses and ports, transport protocols and share of specific flags (e.g., SYN packets).
∼200 of these features were built to feed a random forest model.

21
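Two of the handcrafted flow features mentioned above can be sketched to make the contrast with raw bytes concrete; the (timestamp, size) tuples and function names are assumptions for illustration:

```python
# Illustrative handcrafted flow-level features: mean packet size and mean
# inter-arrival time, computed over (timestamp_seconds, size_bytes) tuples.
# This is a sketch of the feature-engineering style, not the authors' code.

def mean_packet_size(packets):
    return sum(size for _, size in packets) / len(packets)

def mean_inter_arrival(packets):
    times = sorted(t for t, _ in packets)
    gaps = [b - a for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)

flow = [(0.0, 60), (0.1, 1500), (0.3, 1500)]
```

Each such feature encodes a piece of expert intuition; building and validating ∼200 of them is exactly the costly step the raw-input models aim to avoid.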

Domain knowledge vs. raw inputs

The random forest model using expert domain features achieves highly accurate detection performance: ∼97% with an FPR of less than 1%.
The deep learning-based model using the Raw Flows still outperforms this domain-expert-knowledge-based detector.
The deep learning model can perform as well as a more traditional shallow-model-based detector for the detection of malware flows, without requiring any sort of expert handcrafted inputs.

22

Other contributions

From Malware Detection to Malware Classification
Please refer to the paper for further details.

23

Conclusions

Conclusions

This work explores the power of deep learning models for the analysis of network traffic measurements.
The specific problem of malware network traffic detection and classification is addressed using raw representations of the input network data.
Using Raw Flows as input for the deep learning models achieves better results than using Raw Packets.
In all studied cases, the deep learning models outperform a strong random forest model using exactly the same input features.
The Raw Flows architecture slightly outperforms a random forest model trained using expert domain knowledge features.

24

THANKS for your attention!

Questions?

@stillyawning gonzalo@tryolabs.com

24
