Feature Selection for Detection of Peer-to-Peer Botnet Traffic
Pratik Narang, Jagan Mohan Reddy, Chittaranjan Hota. BITS Pilani, Hyderabad Campus. [email protected]. 23rd August 2013, ACM Compute 2013, Vellore.

Feature selection for detection of peer-to-peer botnet traffic

DESCRIPTION

Slides of my paper presentation at the ACM Compute 2013 conference (22-24 Aug 2013, Vellore, India).

Page 1: Feature selection for detection of peer to-peer botnet traffic

BITS Pilani, Hyderabad Campus

Pratik Narang, Jagan Mohan Reddy, Chittaranjan Hota

BITS Pilani, Hyderabad Campus

[email protected]

23rd August 2013

ACM Compute 2013, Vellore

Feature Selection for Detection of Peer-to-Peer Botnet Traffic

Page 2: Feature selection for detection of peer to-peer botnet traffic

Outline

• Introduction
    o P2P Networks
    o P2P Botnets
• Work overview
• Related Work
• Our work
    o Generating traffic
    o Feature extraction & selection
    o Evaluation of feature selection techniques
    o Future scope of work

Page 3: Feature selection for detection of peer to-peer botnet traffic

What is a P2P Network?

[Figure: peers A to H connected in a P2P overlay layer, mapped onto the native IP layer, which spans autonomous systems AS1 to AS6]

Page 4: Feature selection for detection of peer to-peer botnet traffic

Generic P2P Architecture

[Diagram: components of a generic P2P architecture]

• Capability & Configuration
• Peer Role Selection
• Operating System
• NAT/Firewall Traversal
• Routing and Forwarding
• Neighbor Discovery
• Join/Leave
• Bootstrap
• Overlay Messaging API
• Content Storage
• Search API

Page 6: Feature selection for detection of peer to-peer botnet traffic

Traditional Botnets

Bot-Master

Page 7: Feature selection for detection of peer to-peer botnet traffic

Peer-to-Peer Botnets

Bot-Master

Page 8: Feature selection for detection of peer to-peer botnet traffic

Work overview

• Evaluation of 3 feature selection algorithms:
    o Correlation-based Feature Selection
    o Consistency-based Subset Evaluation
    o Principal Component Analysis
• Models built with 3 machine learning algorithms:
    o Naïve Bayes classifier
    o Bayes Networks
    o C4.5 Decision trees
• Performance evaluation for the detection of some recent and well-known P2P botnets.

Page 9: Feature selection for detection of peer to-peer botnet traffic

Related work

• Early work using feature selection algorithms [1] [2] used the DARPA dataset, which is no longer suitable for today's security research.
• Early approaches for P2P botnet detection [3] applied static, port-based analysis, which is easily defeated by modern botnets.
• Recent work [4] [5] has employed machine learning and data mining techniques for detection of P2P botnets.

Page 10: Feature selection for detection of peer to-peer botnet traffic

Our work

[Diagram: processing pipeline, listed from the final stage down to the input]

• Machine Learning Algorithms: Bayes Network, Naïve Bayes, C4.5 Decision Trees
• Feature Selection: Correlation-based Feature Selection, Consistency-based Subset Evaluation, Principal Component Analysis
• Feature Extraction: source min. packet size, dest. TCP Push flag count, source avg. packet size, dest. total volume, duration, …
• Flow Extraction: <Source IP, Source port, Destination IP, Destination port, Protocol>
• Network captures

jNetPcap Library with Java module
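
As an illustration of the flow-extraction stage, the sketch below groups packets into bi-directional flows keyed by the 5-tuple. It is written in Python for brevity, whereas the slides mention the jNetPcap library with a Java module. The `Packet` record and the `group_into_flows` function are hypothetical names, and treating the sender of the first packet as the flow's "source" is an assumption, not something the slides state.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical, simplified packet record (not a jNetPcap type).
    ts: float          # capture timestamp in seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str         # e.g. "TCP" or "UDP"
    size: int          # packet size in bytes
    psh: bool = False  # True if the TCP Push flag is set


def group_into_flows(packets):
    """Group packets into bi-directional flows.

    The key is the 5-tuple of the first packet seen for the flow.
    Packets in that direction go to "src"; replies go to "dst".
    """
    flows = {}
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        fwd = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)
        rev = (p.dst_ip, p.dst_port, p.src_ip, p.src_port, p.proto)
        if fwd in flows:
            flows[fwd]["src"].append(p)
        elif rev in flows:
            flows[rev]["dst"].append(p)
        else:
            # First packet of a new flow defines the "source" direction.
            flows[fwd] = {"src": [p], "dst": []}
    return flows
```

Keeping the two directions in separate lists makes it straightforward to compute per-direction ("source" and "dest.") statistics later.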

Page 11: Feature selection for detection of peer to-peer botnet traffic

Generating Traffic

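[Diagram: campus network testbed used for traffic generation and capture]

• Network segments: Internet, Info. Sec. Lab, Dist. Sys. Lab, Multimedia Lab, Hostels Wing
• Network devices: Firewall, Core Switch 6509, Distribution Switch 4500, Access Switch 2500, IDS, Ethernet
• Servers: Content Mgmt., Application Servers, DB Cluster
• Functions: Botnet traffic generation; Data collection for P2P and web traffic; Anonymization (Anon tool); Botnet detection module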

Page 12: Feature selection for detection of peer to-peer botnet traffic

Dataset

Data | Application | Number of flows
Benign data | HTTP, HTTPS, SMTP, FTP, POP | 30,000
Benign data | P2P apps: eMule, BitTorrent, Mute, Gnutella etc. | 50,000
Botnet data [4, 5] | Zero Access | 720
Botnet data [4, 5] | SkyNet | 770
Botnet data [4, 5] | Waledac | 80,000
Botnet data [4, 5] | Storm | 220,000

Page 13: Feature selection for detection of peer to-peer botnet traffic

Feature Extraction & Selection

• A 'Flow' is defined by the 5-tuple:
    • <Source IP, Source port, Dest. IP, Dest. port, Protocol>
• Features extracted from each flow:
    • Packet count (bi-directional)
    • Packet size in bytes: min, max, mean and standard deviation (bi-directional)
    • Total volume in bytes (bi-directional)
    • Inter-arrival times: min, max, mean and standard deviation (bi-directional)
    • TCP Push flag count (bi-directional)
    • Duration of the flow (no context of direction)
• TOTAL: 23 features extracted from each flow (2 + 8 + 2 + 8 + 2 + 1)
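
The feature list on this slide can be sketched as follows. This is a minimal illustration, not the authors' Java implementation: the function and feature names are hypothetical, each packet is assumed to be a `(timestamp, size_bytes, push_flag)` tuple, and the use of the population standard deviation is an assumption, since the slides do not say which variant was used.

```python
from statistics import mean, pstdev


def _stats(values):
    """Return (min, max, mean, std) of a list, or zeros if it is empty."""
    if not values:
        return (0.0, 0.0, 0.0, 0.0)
    return (float(min(values)), float(max(values)),
            float(mean(values)), float(pstdev(values)))


def direction_features(prefix, pkts):
    """Features for one direction; pkts is sorted by timestamp."""
    times = [t for t, _, _ in pkts]
    sizes = [s for _, s, _ in pkts]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    feats = {
        f"{prefix}_pkt_count": len(pkts),
        f"{prefix}_volume": sum(sizes),
        f"{prefix}_psh_count": sum(1 for _, _, psh in pkts if psh),
    }
    for name, value in zip(("min", "max", "mean", "std"), _stats(sizes)):
        feats[f"{prefix}_size_{name}"] = value
    for name, value in zip(("min", "max", "mean", "std"), _stats(gaps)):
        feats[f"{prefix}_iat_{name}"] = value
    return feats


def flow_features(src_pkts, dst_pkts):
    """Return the 23 per-flow features: 11 per direction plus duration."""
    feats = {}
    feats.update(direction_features("src", src_pkts))
    feats.update(direction_features("dst", dst_pkts))
    all_times = [t for t, _, _ in src_pkts + dst_pkts]
    feats["duration"] = max(all_times) - min(all_times) if all_times else 0.0
    return feats
```

Each direction contributes 11 values (count, volume, Push flag count, four size statistics, four inter-arrival statistics), so two directions plus the flow duration give the 23 features stated on the slide.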

Page 14: Feature selection for detection of peer to-peer botnet traffic
Page 15: Feature selection for detection of peer to-peer botnet traffic

Feature Extraction & Selection

• Three Feature Selection techniques used:

1. Correlation-based Feature Selection (CFS)

2. Consistency-based Subset Evaluation (CSE)

3. Principal Component Analysis (PCA)

• Evaluated with three algorithms:

1. Naïve Bayes

2. Bayes Network

3. C4.5 Decision Trees
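
To make the idea behind CFS concrete, the sketch below computes the usual CFS merit score of a feature subset: the score rises with the average feature-to-class correlation and falls with the average feature-to-feature correlation. Pearson correlation is used here only as a simple stand-in, and the function names are hypothetical; the slides do not specify which correlation measure or tool was used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cfs_merit(columns, labels):
    """Merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    columns: list of feature columns in the candidate subset.
    labels:  numeric class labels (e.g. 0 = benign, 1 = botnet).
    """
    k = len(columns)
    r_cf = sum(abs(pearson(c, labels)) for c in columns) / k
    if k == 1:
        return r_cf
    pairs = list(combinations(columns, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

A search strategy such as best-first search would call a function like `cfs_merit` on candidate subsets and keep the subset with the highest score.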

Page 16: Feature selection for detection of peer to-peer botnet traffic

Feature Extraction & Selection

Feature Selection | Search method | No. of features | Description
CFS | Best first search | 5 | source packet count, source min. packet size, source max. packet size, dest. max. packet size, source inter-arrival time std.
CSE | Best first search | 8 | source min. packet size, source max. packet size, dest. max. packet size, source avg. packet size, dest. avg. packet size, source max. inter-arrival time, flow duration, source volume
PCA | - | 12 | A linear combination of features
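
Consistency-based subset evaluation scores a subset by how often instances that look identical on the selected features carry different class labels. The sketch below computes one common form of that inconsistency rate. It assumes the feature values have already been discretized, and the function name and data layout are hypothetical rather than taken from the slides.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Fraction of instances that disagree with the majority class of
    their feature-value pattern, restricted to the given feature indices.

    rows:   list of tuples of discrete feature values
    labels: class label per row
    subset: indices of the features being evaluated
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)
        groups[pattern][label] += 1
    # For each pattern, every instance outside the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)
```

A lower rate means the subset separates the classes more cleanly; a subset search would typically prefer the smallest subset whose rate matches that of the full feature set.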

Page 17: Feature selection for detection of peer to-peer botnet traffic

Evaluation of Feature Selection Techniques

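[Chart: Accuracy in % by classification algorithm for each feature set; values are the chart's data labels, assigned to feature sets in legend order]

Feature set | NaiveBayes | BayesNet | C4.5
Full | 85.2 | 97.08 | 98.23
CFS | 81.51 | 95.92 | 98.18
CSE | 80.24 | 96.2 | 98.23
PCA | 82.16 | 96.67 | 98.17

[Chart: Detection rate in % by classification algorithm for each feature set; values are the chart's data labels, assigned to feature sets in legend order]

Feature set | NaiveBayes | BayesNet | C4.5
Full | 98.9 | 96.9 | 98.9
CFS | 95.2 | 95.3 | 98.9
CSE | 96.1 | 95.7 | 99
PCA | 95.4 | 96.2 | 98.9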

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \]

\[ \text{Detection rate} = \frac{TP}{TP + FN} \]
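
The two metrics on this slide follow the standard confusion-matrix definitions and can be computed as in the sketch below. The function names and the counts in the usage example are hypothetical and are not results from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all flows that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def detection_rate(tp, fn):
    """Share of botnet flows that were flagged (true positive rate)."""
    return tp / (tp + fn)


# Hypothetical counts, for illustration only.
example_accuracy = accuracy(tp=90, tn=80, fp=20, fn=10)
example_detection = detection_rate(tp=90, fn=10)
```

A classifier can combine a high detection rate with lower accuracy when it raises many false positives, which is why the slides report both metrics.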

Page 18: Feature selection for detection of peer to-peer botnet traffic

Evaluation of Feature Selection Techniques

[Chart: Normalized classification speed (0 to 1) by classification algorithm (NaiveBayes, BayesNet, C4.5) for the Full, CFS, CSE and PCA feature sets]

[Chart: Normalized build times (0 to 1) by classification algorithm (NaiveBayes, BayesNet, C4.5) for the Full, CFS, CSE and PCA feature sets]

Page 19: Feature selection for detection of peer to-peer botnet traffic

Primary Observations

Page 20: Feature selection for detection of peer to-peer botnet traffic

Future Scope

• Ensemble of classifiers (work in progress; paper submitted to I-CARE 2013)
• Close-to-real-time detection tool (work in progress)
• Space-efficient data structures

Page 21: Feature selection for detection of peer to-peer botnet traffic

References

1. A. H. Sung and S. Mukkamala. The feature selection and intrusion detection problems. In Advances in Computer Science - ASIAN 2004. Higher-Level Decision Making, pages 468–482. Springer, 2005.
2. S. Chebrolu, A. Abraham, and J. P. Thomas. Feature deduction and ensemble design of intrusion detection systems. Computers & Security, 24(4):295–307, 2005.
3. R. Schoof and R. Koning. Detecting peer-to-peer botnets. University of Amsterdam, 2007.
4. S. Saad, I. Traore, A. Ghorbani, B. Sayed, D. Zhao, W. Lu, J. Felix, and P. Hakimian. Detecting P2P botnets through network behavior analysis and machine learning. In Privacy, Security and Trust (PST), 2011 Ninth Annual International Conference on, pages 174–180. IEEE, 2011.
5. B. Rahbarinia, R. Perdisci, A. Lanzi, and K. Li. PeerRush: Mining for unwanted P2P traffic. In DIMVA, 2013.

Page 22: Feature selection for detection of peer to-peer botnet traffic

[email protected]

Visit our Research Group: www.netclique.in