Acknowledgement
With a deep sense of gratitude, I wish to express my sincere thanks to Dr. Mario Garcia,
my Project Chair, and to my committee members for their immense help in planning and
executing the project on time. The confidence and dynamism with which Dr. Garcia
guided the work require no elaboration. His support and assurance in times of crisis
will be remembered lifelong. His valuable suggestions during the course of the work
are gratefully acknowledged.
I also want to thank my parents, who taught me the value of hard work by their
own example. I would like to share this moment of happiness with my mother and
brother. They gave me enormous support during the whole tenure of my project. The
encouragement and motivation given to me by my father to carry out my research are
also remembered.
Finally, I would like to thank all whose direct and indirect support helped
me complete my project in time.
ABSTRACT
In recent years, data mining techniques have been applied in many
different fields, including marketing, manufacturing, process control, fraud
detection, and network management. Over the past several years a growing
number of research projects have applied data mining to various problems in
intrusion detection. Anomaly detection as a mechanism for intrusion detection has
been an active area of research. The goal of my project is to design and implement
an anomaly detector using data mining. The project will include the use of open
source tools and/or modifications to existing tools to incorporate the goals of
collection, filtering, storage, archival, and attack detection in a cohesive software
system. This paper also surveys a representative cross section of these research
efforts. Conclusions are drawn and directions for future research are suggested.
DEFINITIONS AND ABBREVIATIONS
Definitions
Anomaly: A data point that, for some system, has characteristics significantly different from earlier observed ('Normal') data points from the same system.
Training Set: A set of sequences containing 'Normal' sequences, used for training a system.
Test Set: A set of sequences containing 'Normal' sequences as well as labeled 'Anomalous' sequences, used to calculate the performance of a system.
Key: A set of labels attached to a test set identifying the anomalies in the set.
Hit Ratio: The percentage of total anomalies that were detected by the system.
Miss Ratio: The percentage of total anomalies that were NOT detected by the system.
False Alarm Ratio: The percentage of total normal cases that were wrongly classified as anomalous.
Sequence: A finite arrangement of the input tokens.
Abbreviations
IDS: Intrusion Detection System
AD: Anomaly Detector
KDD: Knowledge Discovery in Databases
JAM: Java Agents for Meta-learning
MADAM ID: Mining Audit Data for Automated Models for Intrusion Detection
TCP/IP: Transmission Control Protocol / Internet Protocol
TABLE OF CONTENTS
Abstract…………………………………………………………………………………...iii
Definitions and Abbreviations……………………………………………………………iv
Table of Contents……………………………………………………………………….....v
List of Figures…………………………………………………………………………....vii
1. Introduction………………...……………………………………………..…………….1
1.1 Data Mining, KDD and related fields...…………………………...…………..4
1.1.1 Some Data Mining Techniques in Intrusion Detection……………..5
1.2 Intrusion Detection …………………………………………………………...7
1.2.1 Misuse Detection……………………………………………………8
1.2.2 Anomaly Detection………………………………………………...10
2. Narrative………………………………………………………...…………………….12
2.1 Building Intrusion Detection models using Data Mining…...……………….14
2.1.1 Data Collection and Enhancement…………………………………15
2.1.2 Analysis Tools……………...……………………………………...16
2.1.3 Data Preprocessing...………………………………………………17
2.1.4 Data Mining………..………………………………………………18
3. System Design……………………………………………………..…………………20
3.1 Data Collection and Enhancement…………………………………………...21
3.1.1 Collecting Attack free data on an Isolated Network………..………22
3.1.1.1 Setting up an Isolated Network…………………………….23
3.1.1.2 Traffic Generation…………..………………………….…..24
3.1.1.3 Data Collection on Isolated Network………….……….……25
3.1.2 Data Collection on the Router (with Attacks)……………………......25
3.2 Data Preprocessing and analysis……….…………………………………….27
3.2.1 Conn Analyser……………………………………………………….30
3.3 Mining Data for Complex Relationships…………….………………………32
3.3.1 Classification…………………………………………………………37
3.3.2 Clustering…………………………………………………………….41
3.3.3 Association-rule learners…………………………………………….47
4. Testing and Validation………………………………………………………………...49
4.1 System Performance Metrics……………………………………………….51
5. Conclusion…………………………………………………………………………….55
Appendix A……………………………………………………………………………....56
Appendix B………………………………………………………………………………58
Appendix C………………………………………………………………………………68
Appendix D………………………………………………………………………………72
Appendix E………………………………………………………………………………74
Bibliography …………………………………………………………………………….97
LIST OF FIGURES
Fig. 1 Data Mining based IDS………………………………………….………...………14
Fig. 2 System Architecture…………………………….………………..……...………..20
Fig. 3 Isolated Network with KVM switch and Hub……………………….……...…….23
Fig. 4 Screen shot of Traffic…...………………………………………………………...24
Fig. 5 Data Collection at a Network Tap on a router connected to the internet………....26
Fig. 6 Bro-IDS Architecture………………………………………………………..……31
Fig. 7 Screen shot of Weka GUI……………………………………………………...….34
Fig. 8 Screen shot of Weka Explorer showing data loading and preprocessing……..…..34
Fig. 9 Screen shot of Weka - Classification using Normal data as Trainer….…………..39
Fig. 10 Screen shot of Weka - Classification using Attack data as Trainer…..……….…40
Fig. 11 Screen shot of Weka - Clustering using Normal data as Trainer……………..…42
Fig. 12 Clustering Visualization – Normal data as Trainer……………………………...44
Fig. 13 Screen shot of Weka - Clustering using Attack data as Trainer………..……..…45
Fig. 14 Clustering Visualization – Attack data as Trainer…………………….…………46
Firoz Allahwali
1. INTRODUCTION
Intrusion detection is the process of monitoring and analyzing the events occurring in a
computer system in order to detect signs of security problems [Bace 2000]. Intrusion
detection has been an active area of research for more than a decade now. The importance of
Intrusion Detection Systems (IDS) has grown tremendously in recent years because of our
dependence on electronic forms of data. Sensitive information, which has to be kept secure,
is kept in electronic form on computers. The military and government have been the most
vocal IDS users, but more and more private organizations are realizing the importance
of such systems.
There is an increase in the amount of data for which confidentiality, integrity and
availability are critical characteristics. Information stored about buying patterns of
consumers, their financial status and banking habits as well as on-line browsing patterns are
subject to frequent attacks. It has been generally known that building a secure system will
definitely reduce the probability of a successful attack, but it will not completely eliminate
such an incident. IDS assume that some of the probable attacks will get through and will
have to be detected by some sophisticated techniques. Another threat that is becoming real is
insider attack. All the security built into the system will not be able to detect an attack that is
initiated by someone with authorized access. Such attacks are impossible to avoid, and it is
imperative that they be detected. Timely detection of an attack can lead a supervisor to stop
the attack in the middle, track down the possible intruder, or revert the system back to a
consistent state after the attack.
The problem is that current NIDSs are tuned specifically to detect known service-level
network attacks. Attempts to expand beyond this limited realm typically
result in an unacceptable level of false positives. At the same time, enough data
exists or could be collected to allow network administrators to detect these policy
violations. Unfortunately, the data is so voluminous, and the analysis process so time
consuming, that the administrators don’t have the resources to go through it all
and find the relevant knowledge, save for the most exceptional situations, such as
after the organization has taken a large loss and the analysis is done as part of a
legal investigation. In other words, network administrators don’t have the resources
to proactively analyze the data for policy violations, especially in the presence of a
high number of false positives that cause them to waste their limited resources.
Given the nature of this problem, the natural solution is data mining in an offline
environment. Such an approach would add depth to the administrator’s defenses,
and allow them to determine more accurately what the threats against their network are
through the use of multiple methods on data from multiple sources. Hence, activity that
cannot be detected efficiently in near real-time by an online NIDS, whether due to the
amount of state that would need to be retained or the amount of computational resources
that would need to be expended in a limited time window, can be more easily identified.
Some examples of what such a system could detect, but online NIDSs cannot detect
effectively, include certain types of malicious activity, such as low and slow scans,
a slowly propagating worm, unusual
activity of a user based on some new pattern of activity (rather than a single connection or
small number of connections, which are bound to produce a number of false positives), or
even new forms of attacks that online sensors are not tuned for. Additionally, such a system
could more easily allow for the introduction of new metrics, which can use the historical data
as a baseline for comparison with current activity. It also serves to aid network
administrators, security officers, and analysts in the performance of their duties by allowing
them to ask questions that would not have occurred to them a priori. Ideally, such a system
should be able to derive a threat level for the network activity that it analyzes, and predict
future attacks based on past activity.
Designing and implementing an anomaly detector is the major goal of this project. A
data-mining approach will be used to build a detector in this project. Data mining has become
a very useful technique to reduce information overload and improve decision making by
extracting and refining useful knowledge through a process of searching for relationships
and patterns from the extensive data collected by organizations. The extracted information is
used to predict, classify, model and summarize the data being mined. Data mining
technologies such as rule induction, neural networks, genetic algorithms, fuzzy logic, and
rough sets are used for classification and pattern recognition. They have been extensively
used in distinguishing abnormal behavior in a variety of contexts. In recent years data mining
techniques have been successfully used in the context of intrusion detection [Lee 2002].
1.1. Data Mining, KDD and Related fields
The term knowledge discovery in databases (KDD) is used to denote the process of
extracting useful knowledge from large data sets. Data mining, by contrast, refers to one
particular step in this process. Generally, data mining is the process of extracting useful and
previously unnoticed models or patterns from large data stores [Bace 2000; Stolfo 1998;
2000; Lee et al. 1999a; Mannila 2002; Fayyad et al. 1996]. Specifically, the data mining step
applies so-called data mining techniques to extract patterns from the data. Data mining is a
component of the Knowledge Discovery in Databases (KDD) process [Carbone 1997;
Fayyad et al. 1996]. It is preceded and followed by other KDD steps, which ensure that the
extracted patterns actually correspond to useful knowledge. Indeed, without these additional
KDD steps, there is a high risk of finding meaningless or uninteresting patterns. In other
words, the KDD process uses data mining techniques along with any required pre- and post-
processing to extract high-level knowledge from low-level data. In practice, the KDD process
is interactive and iterative, involving numerous steps with many decisions being made by the
user. Broadly some of the most basic KDD steps are [Goebel 1999]:
1. Understanding the application domain: Developing an understanding of the
application domain, the relevant background knowledge, and the specific goals of the
KDD endeavor.
2. Data integration and selection: The integration of multiple (potentially
heterogeneous) data sources and the selection of the subset of data that is relevant to
the analysis task.
3. Data mining: The application of specific algorithms for extracting patterns from data.
4. Pattern evaluation: The interpretation and validation of the discovered patterns. The
goal of this step is to guarantee that actual knowledge is being discovered.
5. Knowledge representation: This step involves documenting and using the
discovered knowledge.
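As a sketch, the data-flow portion of these steps (selection, mining, and pattern evaluation) can be wired together as a small pipeline. The step functions here are placeholders supplied by the analyst, not part of any particular KDD tool:

```python
def kdd_pipeline(raw_records, select, mine, evaluate):
    """Skeleton of the KDD data flow described above. Steps 1 and 5
    (domain understanding and knowledge representation) happen outside
    the code; select, mine, and evaluate are analyst-supplied callables."""
    data = [r for r in raw_records if select(r)]   # step 2: integration and selection
    patterns = mine(data)                          # step 3: data mining proper
    return [p for p in patterns if evaluate(p)]    # step 4: pattern evaluation
```

For instance, selecting only even-valued records and mining their sum yields a single pattern, which then passes or fails the evaluation step.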
Data mining extensively uses known techniques from machine learning, statistics, and
other fields. Nevertheless, several differences between data mining and related fields have
been identified. Specifically, one of the most frequently cited characteristics of data mining is
its focus on finding relatively simple, but interpretable models in an efficient and scalable
manner. In other words, data mining emphasizes the efficient discovery of simple, but
understandable models that can be interpreted as interesting or useful knowledge. Thus, for
example, neural networks (although a powerful modeling tool) are relatively difficult to
understand compared to rules, trees, sequential patterns, or associations. As a consequence,
neural networks are of less practical importance in data mining. This should not come as a
surprise. In fact, data mining is just a step in the KDD process. As such, it has to contribute
to the overall goal of knowledge discovery. Clearly, only understandable patterns can qualify
as knowledge; hence the importance of understandability in data mining.
1.1.1 Some Data Mining Techniques in Intrusion Detection
Data mining techniques essentially are pattern discovery algorithms. Some techniques
such as association rules (Agrawal et al., 1993) are unique to data mining, but most are
drawn from related fields such as machine learning or pattern recognition. In this section,
well-known data mining techniques that have been widely used in intrusion detection are
discussed (Stolfo, 1998).
• Classification categorizes the data records of a training data set into a predetermined
set of classes (data classes) used as attributes to label each record, distinguishing
elements belonging to the normal class or to an abnormal class (a specific kind of
intrusion) using decision trees or rules. This technique has been popular for detecting
individual attacks but has to be applied with complementary, finely tuned techniques
to reduce its demonstrated high false alarm rate. With support tools such as RIPPER
(a classification rule learning program) and a preliminary set of intrusion features,
accurate rules and temporal statistical indexes can be generated to recognize anomalous
activity. These rules have to be inspected, edited, and included in the desired model
(frequently misuse models).
• Association rules find unseen and/or unexpected attribute correlations within the data
records of a data set, as a basis for behavior profiles.
• Frequent episode rules analyze relationships in the data stream to find recurrent,
sequential patterns of simultaneous events. Their results have been useful for attacks
with arbitrary patterns of noise and for distributed attacks.
• Clustering discovers complex intrusions that occur over extended periods of time
and at different places, correlating independent network events. The sets of data
belonging to a cluster (attack or normal activity profile) are modeled according to
pre-defined metrics and their common features. It is especially effective at detecting
hybrid attacks within a cluster, showing high detection performance, but it is
computationally expensive.
• Meta-rules derive change rules over a period of time, comparing the status of two
data sets and describing their “evolution” in time based on their common, modified
and new features.
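The association-rule idea above starts from frequent itemsets: combinations of attribute values whose support (fraction of records containing them) clears a threshold. A brute-force sketch, with illustrative attribute names (real miners such as Apriori or FP-growth prune the candidate space instead of enumerating it):

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(records, min_support, max_size=2):
    """Return every itemset of up to max_size attribute-value items whose
    support across the records meets min_support. Each record is a set of
    (attribute, value) pairs; this exhaustive count is a teaching sketch,
    not a scalable miner."""
    counts = Counter()
    for record in records:
        for size in range(1, max_size + 1):
            for subset in combinations(sorted(record), size):
                counts[subset] += 1
    n = len(records)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}
```

Run over a handful of hypothetical connection records, an itemset like {service=http, flag=SF} appearing in two of three records has support 2/3 and would survive a 0.6 threshold.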
Typically, the best features of these techniques and detection models are combined to
obtain a high detection performance and a complex profile for intruders. Recognized
practices merge models for new activity (attacks or normal events) and the existing models,
to generate adaptive processes able to learn inductively the existing correlations: this Meta-
Learning capability and its adaptability with other techniques have been evaluated
empirically as effective and scalable (Stolfo, 1998). The rules substantially reduce the
impractical manual development process of patterns and profiles, computing statistical
patterns from the collected data.
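The model-merging idea can be illustrated with the simplest possible combiner, a majority vote over the base detectors' verdicts. Meta-learning systems such as JAM actually train a higher-level classifier on those outputs; this vote is only a sketch of the combining step:

```python
def meta_classify(classifier_outputs):
    """Combine several base detectors' verdicts ('Normal'/'Anomalous',
    illustrative label strings) by majority vote. A stand-in for the
    trained meta-classifier described in the text."""
    votes = sum(1 for v in classifier_outputs if v == 'Anomalous')
    return 'Anomalous' if votes > len(classifier_outputs) / 2 else 'Normal'
```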
1.2. Intrusion Detection
The goal of intrusion detection is to detect security violations in information systems.
Intrusion detection is a passive approach to security as it monitors information systems and
raises alarms when security violations are detected. Examples of security violations include
the abuse of privileges or the use of attacks to exploit software or protocol vulnerabilities.
Intrusion detection systems (IDSs) are categorized according to the kind of input information
they analyze into host-based and network-based IDSs. Host-based IDSs analyze host-bound
audit sources such as operating system audit trails, system logs, or application logs. Network-
based IDSs analyze network packets that are captured on a network. Traditionally, intrusion
detection techniques are classified into two broad categories: misuse detection and anomaly
detection (Zhu, 2001).
1.2.1 Misuse Detection
In misuse detection, each data record is classified and labeled as normal or anomalous
activity. This process is the basis for a learning algorithm able to detect known attacks,
and new ones if they are cataloged appropriately under a statistical process. The basic
step, known as outlier discovery, matches abnormal behavior against a knowledge base of
attack patterns that captures behavioral patterns of intrusions and typical activity. To do
this, each measure must be computed with random variables, implying more updating effort
as more audit records are analyzed, but also more accuracy with more mined data. Although
the activity needs to be analyzed individually, complementary visualization and data mining
techniques can be used to improve performance and reduce the computational requirements.
Some projects using this concept are JAM (Java Agents for Meta-learning), MADAM ID (Mining
Audit Data for Automated Models for Intrusion Detection), and Automated Discovery of Concise
Predictive Rules for Intrusion Detection.
JAM (developed at Columbia University) uses data mining techniques to discover
patterns of intrusions (Allen, 2000). It then applies a meta-learning classifier to learn the
signature of attacks. The association rules algorithm determines relationships between fields
in the audit trail records, and the frequent episodes algorithm models sequential patterns of
audit events. Features are then extracted from both algorithms and used to compute models
of intrusion behavior. The classifiers build the signature of attacks. So essentially, data
mining in JAM builds a misuse detection model. JAM generates classifiers using a rule
learning program on training data of system usage. After training, the resulting classification
rules are used to recognize anomalies and detect known intrusions. The system has been tested
with data from Sendmail-based attacks, and with network attacks using TCP dump data.
MADAM ID uses data mining to develop rules for misuse detection. The motivation is that
current systems require extensive manual effort to develop rules for misuse detection.
MADAM ID applies data mining to audit data to compute models that accurately capture
behavioral patterns of intrusions and normal activities. While MADAM ID performed well in
the 1998 DARPA evaluation of intrusion detection systems (Allen, 2000), it is ineffective in
detecting attacks that have not already been specified.
Researchers at Iowa State University report on Automated Discovery of Concise
Predictive Rules for Intrusion Detection (Allen, 2000). This system performs data mining to
provide global, temporal views of intrusions on a distributed system. The rules detect
intrusions against privileged programs (such as Sendmail) using feature vectors to describe
the system calls executed by each process. A genetic algorithm selects feature subsets to
reduce the number of observed features while maintaining or improving learning accuracy.
This is another example of data mining being used to develop rules for misuse detection.
1.2.2 Anomaly Detection
Anomaly detection, on the other hand, uses a model of normal user or system behavior
and flags significant deviations from this model as potentially malicious. This model of
normal user or system behavior is commonly known as the user or system profile. The strength
of anomaly detection is its ability to detect previously unknown attacks. The most popular
anomaly detection system using data mining is ADAM (Audit Data Analysis and Mining).
One of the most significant advantages of ADAM is the ability to detect novel attacks,
without depending on attack training data, through a novel application of the pseudo-Bayes
estimator (Barbara et al., 2001).
ADAM (developed at George Mason University Center for Secure Information
Systems) uses a combination of association rules mining and classification to discover
attacks in TCP dump data. The JAM system also combines association mining and
classification. But there are two significant differences between ADAM and JAM. First,
ADAM builds a repository of normal frequent itemsets that hold during attack-free periods.
It does so by mining data that is known to be free of attacks. Then, ADAM runs a sliding-
window algorithm that finds frequent itemsets in the most recent set of TCP connections,
and compares them with those stored in the normal itemset repository, discarding those that
are deemed normal. With the rest, ADAM uses a classifier that has been previously trained to
classify the suspicious connections as a known type of attack, an unknown type, or a false
alarm. The system performs especially well with denial of service and probe attacks.
2. NARRATIVE
The seductive vision of automation is that it can and will solve all problems, making
human involvement unnecessary. This is a mirage in intrusion detection. Human analysts will
always be needed to monitor that the automated system is performing as desired, to identify
new categories of attacks, and to analyze the more sophisticated attacks. Real-time automated
response is very desirable in some intrusion detection contexts. But this puts a large demand
on database performance. The database must be fast enough to record alarms and produce
query results simultaneously. Real time scoring of anomaly or classification models is
possible, but this should not be confused with real-time model building. There is research in
this area (Zhang, 2001), but data mining is not currently capable of learning from large
amounts of real-time, dynamically changing data. It is better suited to batch processing of a
number of collected records in a daily processing regime, rather than an hourly or minute-by-
minute scheme.
In this project, the focus is on implementing an off-line "Anomaly Detection
Intrusion Detection Systems" (AD-IDS) to periodically analyze or audit batches of TCP/IP
network log data. The act of detecting intrusions is, intuitively, a real-time task by necessity.
While offline processing would seem to be solely a compromise between efficiency and
timeliness, it provides for some unique functionality. For instance, periodic batch processing
allows the related results (such as activity from the same source) to be grouped together, and
all of the activity can be ranked in the report by the relative threat levels. Another feature
unique to the offline environment is the ability to transfer logs from remote sites to a central
site for correlation during off-peak times.
Offline processing also allows us to more easily overcome shortcomings in real-time
IDSs. For example, many IDSs will start to drop packets when flooded with data faster than
they can process it. Other forms of denial of service can also be launched against an active
(real-time) IDS system, such as flooding it with fragmented IP packets in order to cause it to
spend an inordinate amount of time and memory attempting to reconstruct bogus traffic.
Meanwhile, the attacker can break into the real target system without fear of detection. The
off-line environment is significantly less vulnerable to such a threat, especially if given a
high degree of assurance that any traffic admitted to the local network is logged (such as by
the firewall responsible for the admittance of such traffic).
In principle an AD-IDS "learns" what constitutes "normal" network traffic, developing
sets of models that are updated over time. These models are then applied against new traffic,
and traffic that doesn't match the model of "normal" is flagged as suspicious. Further focus
will be on how to apply an initial analysis to data collected in the future, so that
system administrators can use exception reports to identify suspected intrusions. It is
important to note that no software can monitor all network traffic because the data processing
becomes prohibitive. By including network intrusion detection in a comprehensive
security infrastructure, system administrators can provide the organization with a more
secure computing environment.
2.1 Building Intrusion Detection Models Using Data Mining
The general architecture of the anomaly detector is as follows. The training data enters
the anomaly detector, which once trained, will thereafter be tested on test data. The
system classifies the data as normal or anomalous. This classification is compared with
the key to calculate the performance statistics. Given a training set, which is a set of labeled
sequences, and a test set, which is another set of labeled sequences unseen by the system, it
classifies each of the test sequences as either ‘Normal’ or ‘Anomalous’. For any given
anomaly detector, its performance is measured by calculating its hit ratio, miss ratio and the
false alarm ratio. Figure 1 describes an IDS based on Data Mining.
Figure 1. Data Mining based IDS. The data collected by the IDS is stored in the
alarm warehouse on which the data mining tools are run and anomalous behavior is identified which is in turn fed to the IDS.
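The hit, miss, and false alarm ratios used to score the detector can be computed directly from the system's output and the key; the 'Normal'/'Anomalous' label strings here are placeholders for whatever labeling the test set actually uses:

```python
def detector_metrics(key, predicted):
    """Score a detector against the labeled key. Both arguments are lists
    of 'Normal'/'Anomalous' labels, one per test sequence. Returns the
    hit, miss, and false alarm ratios as percentages, per the definitions
    given earlier."""
    anomalies = sum(1 for k in key if k == 'Anomalous')
    normals = len(key) - anomalies
    hits = sum(1 for k, p in zip(key, predicted)
               if k == 'Anomalous' and p == 'Anomalous')
    false_alarms = sum(1 for k, p in zip(key, predicted)
                       if k == 'Normal' and p == 'Anomalous')
    hit_ratio = 100.0 * hits / anomalies if anomalies else 0.0
    miss_ratio = 100.0 - hit_ratio if anomalies else 0.0
    false_alarm_ratio = 100.0 * false_alarms / normals if normals else 0.0
    return hit_ratio, miss_ratio, false_alarm_ratio
```

Note that hit ratio and miss ratio are complements by definition, so only two of the three numbers are independent.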
2.1.1 Data Collection and Enhancement
TCP/IP traffic data is generated using a data collection application, which is referred
to as a collector. A collector can be a "home grown" application or purchased from one of
many network performance vendors available in the market. When network data is generated
it is stored in files commonly known as network data logs. These data logs can be
represented in a wide variety of formats (e.g., ASCII, binary, RDBMS tables), which can
pose a problem in reporting and analysis. Also, pre-processing of the network data
within the logs is usually required so that the data can be standardized. This will ensure that
the data is processed correctly and that there are no discrepancies in the representation of the
data.
Network audit data was collected on an isolated network and on a real network using
Snort 2.4.3. Snort is a cross-platform, lightweight network intrusion detection tool that can be
deployed to monitor small TCP/IP networks and detect a wide variety of suspicious network
traffic as well as outright attacks [Snort]. Snort provides administrators with enough data to
make informed decisions on the proper course of action in the face of suspicious activity.
Snort can also be deployed rapidly to fill the holes of security in a network.
The main distribution site for Snort is http://www.snort.org. Snort is distributed under
the GNU GPL license by the author Martin Roesch. It can perform protocol analysis, content
searching/matching. It can be used to detect a variety of attacks and probes, such as buffer
overflows, stealth port scans, CGI attacks, SMB probes, OS fingerprinting attempts, and
more. Snort uses a flexible rules language to describe traffic that it should collect or pass, and
includes a detection engine utilizing a modular plug-in architecture. Snort has real-time
alerting capability as well, incorporating alerting mechanisms for syslog, user-specified
files, a UNIX socket, or WinPopup messages to Windows clients using Samba's smbclient.
Snort has three primary uses. It can be used as a straight packet sniffer like tcpdump or as a
packet logger that is useful for network traffic debugging. It can also be used as a full blown
network intrusion detection system.
Snort logs packets in either tcpdump binary format or in Snort's decoded ASCII
format to logging directories that are named based on the IP address of the foreign host.
Plug-ins allow the detection and reporting subsystems to be extended. Available plug-ins
include database logging, small fragment detection, portscan detection, and HTTP URI
normalization. [Snort]
2.1.2 Analysis Tools
To solve the complex issues involved in turning TCP/IP network transactions into data
suitable for mining and exception reporting, a combination of products, namely tcpreplay
and Bro-IDS, was utilized. Tcpreplay is a suite of utilities for *NIX systems for editing
and replaying network traffic previously captured by tools like tcpdump and ethereal.
The goal of tcpreplay is to provide a reliable and repeatable means of testing a variety
of network devices such as switches, routers, firewalls, and network intrusion detection
and prevention systems (IDS and IPS).
Tcpreplay provides the ability to classify traffic as client or server, edit packets at
layers 2-4 and replay the traffic at arbitrary speeds onto a network for sniffing or through a
device. Some of the advantages of using tcpreplay over using ``exploit code'' are:
• Since tcpreplay emulates both the victim and the attacker, only a tcpreplay box and
the device under test (DUT) are needed
• Tests can include background traffic of entire networks without the cost and effort of
setting up dozens of hosts or costly emulators
• Uses the open standard pcap file format for which dozens of command line and GUI
utilities exist
• Tests are fully repeatable without complex test harnesses or network configuration
• Tests can be replayed at arbitrary speeds
• Actively developed and supported by its author [TCPreplay]
2.1.3 Data Preprocessing
Since the study of Internet traffic requires working with large quantities of data,
selecting an appropriate tool for data analysis is crucial. This project utilizes the Bro
intrusion detection system. Bro is an open-source, Unix-based Network Intrusion Detection
System (NIDS) that passively monitors network traffic and looks for suspicious traffic. Bro
detects intrusions by comparing network traffic against a customizable set of rules describing
events that are deemed troublesome. These rules might describe specific attacks (including
those defined by “signatures”) or unusual activities (e.g., certain hosts connecting to certain
services or patterns of failed connection attempts).
Bro uses a specialized policy language that allows a site to tailor Bro’s operation,
both as site policies evolve and as new attacks are discovered. If Bro detects something of
interest, it can be instructed to either generate a log entry, alert the operator in real-time, or
initiate the execution of an operating system command (e.g., to terminate a connection or
block a malicious host on-the-fly). In addition, Bro’s detailed log files can be particularly
useful for forensics. Bro targets high-speed (Gbps), high-volume intrusion detection. By
using packet-filtering techniques, Bro is able to achieve the necessary performance while
running on commercially available PC hardware, and thus can serve as a cost-effective
means of monitoring a site's Internet connection. Although its primary purpose is for
detection of network intrusions, Bro comes with powerful scripting capabilities which make
analyzing large volumes of data more manageable. [Bro-ids]
2.1.4 Data Mining
Data mining strategies fall into two broad categories: supervised learning and
unsupervised learning. Supervised learning methods are deployed when there exists a field
or variable (the target) with known values, about which predictions will be made using the
values of other fields or variables (the inputs). Unsupervised learning methods are typically
deployed on data for which no target field with known values exists, only input fields.
While unsupervised methods are most often used when no target field exists, they can also
be deployed on data for which a target field exists.
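To make the distinction concrete, here is a small illustrative sketch (not part of the project's toolchain; the toy data, class names, and function names are all hypothetical): a supervised learner is trained with known target labels, while an unsupervised learner must discover structure from the inputs alone.

```python
# Illustrative sketch: supervised vs. unsupervised learning on toy 1-D data.
# All values, class names, and functions here are hypothetical.

def supervised_predict(train, x):
    """Nearest-class-mean classifier: target labels are known for training."""
    means = {}
    for label in {lbl for _, lbl in train}:
        vals = [v for v, lbl in train if lbl == label]
        means[label] = sum(vals) / len(vals)
    return min(means, key=lambda lbl: abs(means[lbl] - x))

def unsupervised_groups(values, iters=10):
    """Tiny 1-D k-means with k=2: no labels; structure comes from inputs alone."""
    centers = [min(values), max(values)]          # two initial centers
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(centers[i] - v))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

train = [(1.0, "normal"), (1.2, "normal"), (9.0, "attack"), (9.5, "attack")]
print(supervised_predict(train, 8.0))               # uses the known labels
print(sorted(unsupervised_groups([1.0, 1.2, 9.0, 9.5])))  # discovers two groups
```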
WEKA (the Waikato Environment for Knowledge Analysis) is an open source data mining
package developed at the University of Waikato, New Zealand. It was used to analyze the
preprocessed TCP/IP traffic data for unauthorized activities and to build data mining
models. WEKA is extensible and has grown into a broad collection of machine learning
algorithms for solving real-world data mining problems. It is written in Java, runs on
almost every platform, and is easy to use and apply at several different levels.
There are three major categories of implemented schemes in WEKA:
(1) Implemented schemes for classification.
(2) Implemented schemes for numeric prediction.
(3) Implemented "meta-schemes".
Besides the actual learning schemes, WEKA also contains a large variety of tools for
pre-processing datasets, so that the focus can stay on the algorithm without worrying about
details such as reading the data from files, implementing filtering algorithms, and
providing code to evaluate the results. [Ian 2005]
3. SYSTEM DESIGN
Data Mining is a resource-consuming computing process. An efficient deployment of
an Intrusion Detection system employing Data Mining requires an adaptive and scalable
architecture and infrastructure able to support the storage of the audited data, its processing,
model generation and distribution, as well as interaction with the pre-existing
elements of the organizational security infrastructure. Figure 2 shows the system
architecture. Data collected by Snort is transformed into connection records by Bro-ids,
which in turn are used to generate models with the data mining tool WEKA.
Figure 2: System Architecture. Bro-IDS converts the tcpdump data collected by Snort into connection records, which are used for data mining with WEKA
3.1 Data Collection and Enhancement
This study uses data collected on an isolated network and at the main router
connected to the internet. Data is collected using a Snort sensor which runs on a dual-
processor Dell server with 2GB of RAM and 140GB of disk space. The operating system on
this machine is Red Hat Enterprise Linux 4. The system has a minimal number of packages
installed, sufficient for a usable system, with unessential services turned off. Hardened and
further secured in this way, the system is well suited to act as a Snort sensor. Snort ver.
2.4.3 is installed on this system along with Apache, SSL, PHP, MySQL, and BASE,
following the instructions provided on the www.snort.org website.

Since a Snort sensor is fundamentally passive, i.e., it receives data but does not send
any, it makes sense security-wise to run it in a stealthy mode. To achieve this, two network
interface cards are used, one for management and the other for sniffing. The NIC used for
sniffing in stealthy mode is not given any IP address, whereas the NIC used for management
is provided with one. The NIC with the IP address is connected to a network
different from the sniffing interface for administrative purposes.
Snort can be run in three different modes:
1. Packet Sniffer: Snort is a libpcap-based packet sniffer which simply reads the packets
off the network and displays them in a continuous stream on the console.
2. Packet Logger: Packet logger mode logs the packets to disk. When Snort runs in
this mode, it collects every packet it sees and places it in a directory hierarchy based
on the IP address of one of the hosts in the datagram. Once the packets have been
logged to the binary file, they can be read back with any sniffer that
supports the tcpdump binary format, such as tcpreplay or ethereal.
3. Network Intrusion Detection System: Running Snort in this mode allows it to analyze
network traffic for matches against a user-defined rule set and perform several
actions based upon what it sees.
Data for this project is collected on the isolated network for attack-free network data,
and on the main router for data with attacks, by running Snort in packet logging mode. The
packets are logged to a single log file in tcpdump format using the log_tcpdump module.
Log_tcpdump logs packets in the tcpdump file format. There is a wide assortment of
applications and tools designed to read tcpdump output.
Log_tcpdump has one configuration option.
Format: log_tcpdump [filename]
[filename] is the name of the output file. The [filename] will have the
<month><date>@<time> prepended to it. This keeps data from separate Snort runs
distinct. [Snort]
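In snort.conf this takes the form of an output directive; for example (the filename here is illustrative):

```
# snort.conf: log captured packets in tcpdump binary format
output log_tcpdump: snort.log
```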
3.1.1 Collecting Attack free data on an isolated network
This involves setting up an isolated network, running a traffic generator to generate
traffic and using snort to collect data.
3.1.1.1 Setting up an Isolated Network:
Normal data without any intrusions and attacks was collected on an isolated simulated
network which was set up solely for this purpose. Figure 3 shows the setup of the
network in detail. The network consisted of five Dell Pentium III machines, each with 256
MB RAM and 20 GB of disk space: a Windows 2003 server, three Windows XP
machines, and a Linux machine. The Windows 2003 server runs IIS, FTP, SMTP, and
other services. The Samba smb client was enabled on the Linux machine to communicate with the
Windows network. All the computers were connected to a 3Com® Baseline Dual Speed 16-
Port hub with CAT5 cables. The computer running Snort was connected to this hub in a
stealthy mode. An 8-port KVM switch was utilized to control the multiple computers from a
single keyboard, video monitor, and mouse, as shown in Figure 3.
Figure 3: Isolated Network with KVM switch and Hub
3.1.1.2 Traffic Generation:
Network traffic is generated on the isolated network using the open source network
traffic generator "Traffic", developed by Robert Sandilands. Traffic follows a server/client
model for generating high volumes of traffic on a network. The server module is run
on the Windows 2003 server and the client module is run on two Windows XP machines.
The client lets the user choose the protocol, the number of packets, and the time interval
between packets. [Robert]
Figure 4: Screen shot of the Traffic client showing the number of connections and the protocol
3.1.1.3 Data Collection on the Isolated Network:
Data was collected on the isolated network over a period of one week by running the
Network traffic generator “Traffic”. The Traffic client was installed on two Windows XP
machines. Snort was used in packet logging mode to collect the data. The data passing
through the 3Com hub is captured by Snort. Sample data collected by Snort in tcpdump
format is shown below:
Version 2.4.3 (Build 99) By Martin Roesch (www.snort.org)

07/23-00:07:00.426313 192.168.1.21:21 -> 192.168.1.20:1238
TCP TTL:64 TOS:0x0 ID:6666 IpLen:20 DgmLen:123 DF
***AP*** Seq: 0xD1DA08CC  Ack: 0xA12187  Win: 0x7D78  TcpLen: 32
TCP Options (3) => NOP NOP TS: 160418056 34614576
220 ProFTPD 1.2.0pre10 Server (Red Hat) [snort.delmar.edu]..

02:40:27.881867 192.168.238.1.1540 > 192.168.238.5.www: P 1:485(484) ack 1 win 64240 (DF)
0x0000  4500 020c 5af0 4000 8006 40a3 c0a8 ee01  E...Z.@...@.....
0x0010  c0a8 ee05 0604 0050 6a19 984a 87b4 aae9  .......Pj..J....
0x0020  5018 faf0 59b7 0000 4745 5420 2f6f 7267  P...Y...GET./firoz
0x0030  616e 2d65 6e68 616e 6365 6d65 6e74 2e68  pictures……………h
0x0040  746d 6c20 4854 5450 2f31 2e31 0d0a 486f  tml.HTTP/1.1..Ho
0x0050  7374 3a20 3139 322e 3136 382e 3233 382e  st:.192.168.238.
0x0060  350d 0a55 7365 722d 4167 656e 743a 204d  5..User-Agent:.M
0x0070  6f7a 696c 6c61 2f35 2e30 2028 5769 6e64  ozilla/5.0.(Wind
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
3.1.2 Data Collection on the Router (with attacks):
The internal network of the organization where the data was collected is connected to the
internet through a single high speed internet connection. Snort was deployed in such a way
that it can monitor all the traffic entering (and leaving) an otherwise isolated network. One
way to accomplish this is to deploy a passive tap with minimal effect on network operations.
The sensor is deployed on this tap between the external firewall and the internal network
such that it can monitor all the traffic that enters (and departs) over that connection. This
allows the sensor to examine all of the data associated with the external link so that it can be
effectively used to monitor for incoming (and outgoing) attacks. The snort machine is located
at the main router on campus, which is connected to the Internet by a 100Mb/s full-duplex
Ethernet link. Figure 5 shows the location of the Network tap behind the firewall on the main
router connected to the internet.
Figure 5: Data Collection at a Network Tap on a Router connected to Internet
Data was collected on the network tap over a period of one week by running Snort in a
stealthy packet logging mode. Sample data collected by Snort in tcpdump format is provided
in Appendix D.
3.2 Data preprocessing and analysis:
The goal of the analysis is to create descriptive information from the raw TCPDUMP
files, then to mine the data in order to determine likely intrusive TCP/IP connections. The
raw data consists of packet level transmission data including source and destination IP
address and ports; flags, acknowledgements and packet sequence numbers; and window,
buffer and optional information.
In the audit data no single record represents a complete conversation between two IP
addresses. In fact, each record is only a portion of the conversation: the source sending
information to the destination, or vice versa. In order to create useful inputs for data mining,
we must first determine which records are parts of the same conversation. After the
conversation reconciliation, we can compute meaningful variables such as the number of
connections made to one or more destination IP addresses within a two-second time
window. For data mining, then, the data preparation goal is to establish the final state of a
conversation; then to understand the behavior of each source IP address on the network in
relation to destination IP addresses and destination ports. Towards this goal, we summarize
the final states, number of destination IP addresses, and time differences to destination
address fields by the source IP address. This provides, for each source IP address, counts of
the number of times each final state occurred, the number of IP addresses to which
connections were made, and counts of the time-difference groupings. We also summarize the destination port
types and time differences to destination ports by the combined source and destination IP
addresses. This provides, for each unique IP source to destination IP connection, the number
of times each specific action was attempted, and the number of times port hits occurred
within the specified intervals.
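A minimal sketch of this per-source summarization (the toy records, field choices, and function name are illustrative assumptions, not the project's actual code):

```python
# Sketch: summarize connection records per source IP address.
# Each toy record is a (src_ip, dst_ip, final_state) tuple; the real records
# carry many more fields.
from collections import defaultdict

def summarize_by_source(records):
    """For each source IP: count each final state and the distinct destinations."""
    summary = defaultdict(lambda: {"states": defaultdict(int), "dsts": set()})
    for src, dst, state in records:
        summary[src]["states"][state] += 1
        summary[src]["dsts"].add(dst)
    return {src: {"state_counts": dict(v["states"]),
                  "num_destinations": len(v["dsts"])}
            for src, v in summary.items()}

records = [
    ("10.0.0.1", "10.0.0.9", "SF"),
    ("10.0.0.1", "10.0.0.7", "REJ"),
    ("10.0.0.1", "10.0.0.9", "SF"),
]
print(summarize_by_source(records))
```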
To get information on what action the user was attempting, we map the destination
ports to specific user actions. The destination port determines the function the user is trying
to access, while the source port is assigned randomly. Most network administrators use a
common set of port mappings. By transforming port numbers into action types, we can
determine basic information about what a user was attempting. For this purpose we create a
high-level grouping of the port types as login, email, system status check, SNMP, date, who,
chat, and other. This creates the destination port type field.
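A sketch of this port-to-action mapping follows. The group names come from the text; the specific port numbers are standard IANA assignments, but which ports the project actually mapped to each group is not stated, so this table is an illustrative assumption:

```python
# Hypothetical high-level grouping of destination ports into action types.
PORT_GROUPS = {
    22: "login",                  # ssh
    23: "login",                  # telnet
    25: "email",                  # smtp
    110: "email",                 # pop3
    143: "email",                 # imap
    15: "system status check",    # netstat
    161: "snmp",                  # snmp
    13: "date",                   # daytime
    513: "who",                   # who (UDP)
    194: "chat",                  # irc
}

def port_type(dst_port):
    """Map a destination port to its high-level action group ('other' if unknown)."""
    return PORT_GROUPS.get(dst_port, "other")

print(port_type(25), port_type(4444))
```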
In order to distinguish automated from manual input streams, we have to determine the
amount of time elapsed between connections to different destination IP addresses from a
single source IP address, and to different destination ports on a single destination IP address
from a single source IP address. We then create indicators for elapsed time within specific
ranges. For example, we indicate whether the elapsed time was less than 5 seconds, between 5 and
30 seconds, greater than 30 seconds, or undeterminable. This creates the time difference to
destination address and time difference to destination port fields.
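The elapsed-time indicator described above can be sketched as follows (the bucket names and function name are illustrative; the ranges follow the text):

```python
def time_diff_bucket(elapsed_seconds):
    """Bucket elapsed time between connections into the indicator ranges from
    the text; None means the gap could not be determined."""
    if elapsed_seconds is None:
        return "undeterminable"
    if elapsed_seconds < 5:
        return "lt_5s"
    if elapsed_seconds <= 30:
        return "5s_to_30s"
    return "gt_30s"

print(time_diff_bucket(2), time_diff_bucket(10), time_diff_bucket(45))
```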
Data captured by Snort is preprocessed utilizing the scripting capabilities of Bro-ids,
which make analyzing large volumes of data easier and more manageable.
Bro-ids was installed on a dual-processor Dell server with 1GB of
RAM and 120GB of disk space. The operating system on this machine is Red Hat Enterprise
Linux.
Tcpreplay, installed on the Snort sensor, was used to replay the data collected by Snort.
A network was set up using the 3Com hub with just the Snort sensor and the Bro-ids
machine. The network packets broadcast by tcpreplay onto the hub were captured by
Bro-ids and transformed into network connection data suitable for data mining.
Bro is an intrusion detection system that works by passively watching traffic seen on
a network link. It is built around an event engine that pieces network packets into events that
reflect different types of activity. Some events are quite low-level, such as the monitor seeing
a connection attempt; some are specific to a particular network protocol, such as an FTP
request or reply; and some reflect fairly high-level notions, such as a user having successfully
authenticated during a login session.
Bro runs the events produced by the event engine through a policy script supplied to it
by the administrator. Bro scripts are made up of event handlers that specify what to do
whenever a given event occurs. Event handlers can maintain and update global state
information, write arbitrary information to disk files, generate new events, call functions
(either user-defined or predefined), generate alerts that produce syslog messages, and invoke
arbitrary shell commands. [Bro-ids]
Figure 6: Bro-ids Architecture, describing the data flow through the Event engine,
Policy Script detector and the Scan detector
3.2.1 Conn analyzer:
Bro performs the generic connection analysis like the connection start time, duration,
sizes, hosts etc. using the conn analyzer. The Connection record data type associated with the
conn analyzer keeps track of the state associated with each connection in the form of one line
connection summaries. A connection is defined by an initial packet that attempts to set up a
session and all subsequent packets that take part in the session. Initial packets that fail to set
up a session are also recorded as connections and are tagged with a failure state that
designates the reason for failure. Each entry contains the following data describing the
connection: date/time, the duration of the connection, the local and remote ip addresses and
ports, bytes transferred in each direction, the transport protocol (udp, tcp), the final state of
the connection, and other information describing the connection. The conn analyzer also has
a list of callable connection functions that help in the generic connection analysis.
The data was extracted from the tcpdump data by making changes to the conn analyzer. The
feature list employed was obtained from the DARPA feature set.
type conn_id: record {
    orig_h: addr;   # Address of originating host.
    orig_p: port;   # Port used by originator.
    resp_h: addr;   # Address of responding host.
    resp_p: port;   # Port used by responder.
};

type endpoint: record {
    size: count;    # Bytes sent by this endpoint so far.
    state: count;   # The endpoint's current state.
};

type connection: record {
    duration; protocol_type; service; flag; src_bytes; dst_bytes;
    land; wrong_fragment; urgent; hot; num_failed_logins; logged_in;
    num_compromised; root_shell; su_attempted; num_root;
    num_file_creations; num_shells; num_access_files; num_outbound_cmds;
    is_host_login; is_guest_login; count; srv_count; serror_rate;
    srv_serror_rate; rerror_rate; srv_rerror_rate; same_srv_rate;
    diff_srv_rate; srv_diff_host_rate; dst_host_count; dst_host_srv_count;
    dst_host_same_srv_rate; dst_host_diff_srv_rate;
    dst_host_same_src_port_rate; dst_host_srv_diff_host_rate;
    dst_host_serror_rate; dst_host_srv_serror_rate; dst_host_rerror_rate;
    dst_host_srv_rerror_rate; conn_class_type;
};
The main output of the conn analyzer is a one-line ASCII summary of each connection. These
summaries are written to a file with the name conn.tag.log, where tag uniquely identifies the
Bro session generating the logs. [Bro-ids]
The output in conn.tag.log looks something like this:
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.01,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,0.01,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,domain_u,SF,29,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0.00,0.00,0.00,0.00,0.50,1.00,0.00,10,3,0.30,0.30,0.30,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,253,0.99,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,tcp,http,SF,223,185,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,4,4,0.00,0.00,0.00,0.00,1.00,0.00,0.00,71,255,1.00,0.00,0.01,0.01,0.00,0.00,0.00,0.00,normal.
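Each summary line can be split back into its feature fields; a minimal parsing sketch (the helper name and the subset of named fields are illustrative, following the record layout shown earlier):

```python
# Parse one conn.tag.log summary line: comma-separated features ending with a
# class label that carries a trailing period.
SAMPLE = ("0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,"
          "0.00,0.00,0.00,0.00,normal.")

def parse_conn_record(line):
    """Return a dict of a few leading fields plus the class label."""
    fields = line.split(",")
    return {
        "duration": int(fields[0]),
        "protocol_type": fields[1],
        "service": fields[2],
        "flag": fields[3],
        "src_bytes": int(fields[4]),
        "dst_bytes": int(fields[5]),
        "conn_class_type": fields[-1].rstrip("."),
    }

rec = parse_conn_record(SAMPLE)
print(rec["protocol_type"], rec["conn_class_type"])
```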
3.3 Mining Data for Complex Relationships
Data mining is the central step of knowledge acquisition. There are
many methods of data mining, such as statistical methods, cluster analysis, pattern
recognition, decision trees, association rules, artificial neural networks, genetic algorithms,
rough set theory, visualization technology, and so on. No single data mining technique
is appropriate for all data mining problems. Real data sets vary, and different experiments
have to be carried out to arrive at a suitable technique for a particular data mining problem.
For this study, the three data mining techniques of Classification, Clustering and Association
were examined to study their suitability for processing Intrusion Detection data.
Weka, an open source data mining toolset developed at the University of Waikato,
New Zealand, was used for this project. Weka is written in Java, can be run on any
platform, and is distributed under the GNU General Public License. The workbench includes
methods for all standard data mining problems: regression, classification, clustering,
association rule mining, and attribute selection. It has tools to preprocess a dataset, feed it
into a learning scheme, and analyze the resulting classifier and its performance. [Ian 2005]
Weka can be used in three different ways as shown in Figure 7: Explorer,
the Knowledge Flow interface, and the Experimenter. The Explorer provides a graphical
user interface giving access to all the features through menu selection and form filling. The
Knowledge flow interface allows designing configurations for data processing by connecting
components representing data sources, preprocessing tools, learning algorithms, evaluation
methods and visualization modules. Weka’s third interface the Experimenter provides an
environment to automate the process of comparing a variety of learning techniques.
Weka was installed on a Dell server, with 1GB of RAM and 80GB of disk space. The
operating system on this machine is Red Hat Enterprise Linux.
Figure 7: Screen shot of Weka GUI showing the Explorer
Figure 8: Screen shot of Weka Explorer showing data loading and preprocessing
Weka’s native data storage format is ARFF. An ARFF (Attribute-Relation File
Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.
Following is the ARFF file that was written for this experiment.
@RELATION ids

@ATTRIBUTE duration numeric
@ATTRIBUTE protocol_type {udp,tcp,icmp}
@ATTRIBUTE service {private,ecr_i,ntp_u,X11,IRC,courier,ctf,ssh,time,ldap,supdup,Z39_50,discard,bgp,kshell,urp_i,uucp,uucp_path,netstat,nnsp,http_443,iso_tsap,http,pop_3,other,domain_u,exec,telnet,ftp,sql_net,ftp_data,smtp,sunrpc,shell,vmnet,netbios_dgm,efs,finger,rje,mtp,imap4,link,auth,domain,nntp,daytime,gopher,echo,printer,whois,klogin,ecr,netbios_ssn,remote_job,hostnames,eco_i,login,pop_2,netbios_ns}
@ATTRIBUTE flag {REJ,SF,S1,S0,RSTO,S3,S2,RSTR,SH}
@ATTRIBUTE src_bytes numeric
@ATTRIBUTE dst_bytes numeric
@ATTRIBUTE land numeric
@ATTRIBUTE wrong_fragment numeric
@ATTRIBUTE urgent numeric
@ATTRIBUTE hot numeric
@ATTRIBUTE num_failed_logins numeric
@ATTRIBUTE logged_in numeric
@ATTRIBUTE num_compromised numeric
@ATTRIBUTE root_shell numeric
@ATTRIBUTE su_attempted numeric
@ATTRIBUTE num_root numeric
@ATTRIBUTE num_file_creations numeric
@ATTRIBUTE num_shells numeric
@ATTRIBUTE num_access_files numeric
@ATTRIBUTE num_outbound_cmds numeric
@ATTRIBUTE is_host_login numeric
@ATTRIBUTE is_guest_login numeric
@ATTRIBUTE count numeric
@ATTRIBUTE srv_count numeric
@ATTRIBUTE serror_rate numeric
@ATTRIBUTE srv_serror_rate numeric
@ATTRIBUTE rerror_rate numeric
@ATTRIBUTE srv_rerror_rate numeric
@ATTRIBUTE same_srv_rate numeric
@ATTRIBUTE diff_srv_rate numeric
@ATTRIBUTE srv_diff_host_rate numeric
@ATTRIBUTE dst_host_count numeric
@ATTRIBUTE dst_host_srv_count numeric
@ATTRIBUTE dst_host_same_srv_rate numeric
@ATTRIBUTE dst_host_diff_srv_rate numeric
@ATTRIBUTE dst_host_same_src_port_rate numeric
@ATTRIBUTE dst_host_srv_diff_host_rate numeric
@ATTRIBUTE dst_host_serror_rate numeric
@ATTRIBUTE dst_host_srv_serror_rate numeric
@ATTRIBUTE dst_host_rerror_rate numeric
@ATTRIBUTE dst_host_srv_rerror_rate numeric
@ATTRIBUTE class {'back.','snmpguess.','apache2.','httptunnel.','mscan.','saint.','snmpgetattack.','processtable.','mailbomb.','buffer_overflow.','ftp_write.','guess_passwd.','imap.','ipsweep.','land.','loadmodule.','multihop.','neptune.','nmap.','normal.','perl.','phf.','pod.','portsweep.','rootkit.','satan.','smurf.','spy.','teardrop.','warezclient.','warezmaster.'}

@DATA
0,tcp,private,REJ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,204,5,0.00,0.00,1.00,1.00,0.02,0.06,0.00,255,5,0.02,0.07,0.00,0.00,0.00,0.00,1.00,1.00,neptune.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,178,178,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,tcp,http,SF,208,15127,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,2,0.00,0.00,0.00,0.00,1.00,0.00,1.00,1,255,1.00,0.00,1.00,0.03,0.00,0.00,0.00,0.00,normal.
0,icmp,ecr_i,SF,520,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,tcp,http,SF,170,2003,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,2,255,1.00,0.00,0.50,0.04,0.00,0.00,0.00,0.00,normal.
0,icmp,ecr_i,SF,520,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,510,510,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,510,510,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,520,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
3.3.1 Classification:
Classification divides data into two or more groups, or classes, and predicts the class of
categorical data by building a model based on predictor variables.
Classification learning is sometimes called supervised because the method operates under
supervision: it is provided with the actual outcome for each of the training examples.
This outcome is called the class of the example. The classification algorithm inductively
constructs a model from the preclassified data set. Each data item is defined by the
values of its attributes, and classification may be viewed as a mapping from a set of attributes to a
particular class. A decision tree classifies a given data item using the values of its
attributes. The decision tree is initially constructed from a set of pre-classified data. The main
approach is to select the attribute which best divides the data items into their classes.
According to the values of this attribute the data items are partitioned, and this process is
applied recursively to each partitioned subset of the data items. The process terminates when
all the data items in the current subset belong to the same class. A node of a decision tree
specifies an attribute by which the data is to be partitioned. Each node has a number of edges,
which are labeled according to a possible value of the attribute in the parent node. An edge
connects either two nodes or a node and a leaf. Leaves are labeled with a decision value for
categorization of the data. The success of classification learning can be judged by trying out
the learned concept description on an independent set of test data. [Sandhya]
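The recursive partitioning described above can be sketched as a minimal ID3-style tree learner. The toy data and attribute names are hypothetical, and this is an illustration of the general technique, not WEKA's actual J4.8/C4.5 implementation:

```python
# Minimal ID3-style decision tree for categorical attributes (illustrative).
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attrs):
    """Pick the attribute whose split yields the lowest weighted entropy."""
    def split_entropy(a):
        return sum(
            len(subset) / len(rows) * entropy(subset)
            for v in {r[a] for r in rows}
            for subset in [[r["class"] for r in rows if r[a] == v]]
        )
    return min(attrs, key=split_entropy)

def build_tree(rows, attrs):
    labels = [r["class"] for r in rows]
    if len(set(labels)) == 1:            # all items in one class: leaf
        return labels[0]
    if not attrs:                        # no attributes left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, attrs)
    rest = [x for x in attrs if x != a]
    return {a: {v: build_tree([r for r in rows if r[a] == v], rest)
                for v in {r[a] for r in rows}}}

def classify(tree, row):
    """Walk the tree from the root, following the edge matching each value."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[row[attr]]
    return tree

data = [
    {"protocol": "tcp", "flag": "SF",  "class": "normal"},
    {"protocol": "tcp", "flag": "REJ", "class": "attack"},
    {"protocol": "udp", "flag": "SF",  "class": "normal"},
    {"protocol": "tcp", "flag": "REJ", "class": "attack"},
]
tree = build_tree(data, ["protocol", "flag"])
print(classify(tree, {"protocol": "tcp", "flag": "SF"}))
```

Here the `flag` attribute separates the toy classes perfectly, so the learner selects it at the root.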
WEKA has implementations of numerous classification and prediction algorithms.
The basic ideas behind using all of them are similar. The dataset in this experiment is
classified using the J4.8 algorithm, Weka's implementation of the C4.5 decision tree
learner. Since the J4.8 algorithm can handle numeric attributes, there is no need to discretize
any of the attributes.
The experiment was carried out first using the normal data as the training data and a
subset of attack data as the test data, and subsequently using attack data as both
training and test data. Weka cannot handle very large datasets, since it loads everything into
memory before it processes the data. To overcome this shortcoming, 5900 records were
generated randomly from the normal data, and two sets of 6600 and 3400 records were
generated randomly from the attack data. The subset of attack data with 3400 records was
used as test data in both cases. The results of using the J4.8 decision tree on the data are
listed below.
Normal data as training data and attack data as test data:
Here the normal data (attack-free subset) was used as training data and attack data (3400
records) was used as testing data as shown in Figure 9.
Figure 9. Screen shot of Weka - Classification using Normal data as Trainer
=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances          622              18.2941 %
Incorrectly Classified Instances       2778              81.7059 %
Kappa statistic                           0
Mean absolute error                       0.0527
Root mean squared error                   0.2296
Relative absolute error                  99.9038 %
Root relative squared error             100.2613 %
Total Number of Instances              3400
Attack Data as both training and testing data:
In this set up the subset of attack data with 6600 records was used as training data and the
subset with 3400 records was used as testing data.
Figure 10. Screen Shot of Weka - Classification using Attack data as trainer
=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances         3298              97      %
Incorrectly Classified Instances        102               3      %
Kappa statistic                           0.9533
Mean absolute error                       0.0022
Root mean squared error                   0.0364
Relative absolute error                   5.1806 %
Root relative squared error              25.2454 %
Total Number of Instances              3400
For definitions of results see Appendix A. For detailed results see Appendix B.

Results Analysis: By comparing the percentage of correctly classified instances, it is clear that using
attack data (97%) as the training set is far more effective than using the attack-free data
(18.2%) as the training set.
                          Training Time   Testing Time   Accuracy
Normal Data as Trainer    0.07 sec        0.01 sec       18.2%
Attack Data as Trainer    4.08 sec        3.98 sec       97%
3.3.2 Clustering:
Cluster analysis can be defined as "a wide variety of procedures that can be used to
create a classification. These procedures empirically form 'clusters' or groups of highly
similar entities." In other words, cluster analysis defines groups of cases, through a number
of procedures, that are more similar to each other than to the remaining cases. Clustering
algorithms segment the data into groups of records, or clusters, that have similar
characteristics. [Data Mining]
Clustering can discover complex intrusions that occur over extended periods of
time and at different places, by correlating independent network events. The sets of data
belonging to a cluster (attack or normal activity profile) are modeled according to pre-defined
metrics and their common features. Clustering has been shown to provide high detection
performance, but it is computationally expensive.
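As a concrete sketch of the clustering step described above, a minimal k-means in plain Python (not Weka's implementation) alternates nearest-centroid assignment with centroid recomputation; the two-dimensional toy points below stand in for the 41-feature connection records:

```python
import random

def euclidean(a, b):
    # Straight-line distance between two numeric feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def kmeans(points, k, iterations=10, seed=0):
    # Pick k initial centroids, then alternate between assigning each
    # point to its nearest centroid and recomputing each centroid as
    # the mean of its assigned points.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated groups of connection-like records (toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

On the toy data the two groups are recovered regardless of which points seed the centroids, since the assignment and update steps pull each centroid into one of the two well-separated groups.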
Weka’s SimpleKMeans algorithm was used for clustering the data. SimpleKMeans
clusters data using k-means, with the number of clusters specified as a parameter, and uses
the Euclidean distance measure to compute distances between instances and cluster
centroids. The results of applying the algorithm to the experimental data are
shown below:
Normal data as training data and attack data as test data:
Here the normal data (attack-free subset) was used as training data and attack data (3400
records) was used as testing data.
Figure 11. Screen shot of Weka - Clustering using Normal data as trainer
=== Model and evaluation on test set ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 4708.820136735875

Cluster centroids:

Cluster 0
Mean/Mode: 13.9458 tcp http SF 669.8249 4500.8885 0 0 0 0.0265 0 0.944 0.0015 0.0009 0 0.0027 0.0195 0 0.0015 0 0 0.004 6.968 10.1401 0.002 0.0012 0.0006 0.0006 0.9985 0.0026 0.1776 49.6375 212.0548 0.9271 0.0158 0.1123 0.0329 0.0012 0.0022 0.0033 0.0045 normal.
Std Devs:  361.8696 N/A N/A N/A 5292.9004 14974.2892 0 0 0 0.4808 0 0.23 0.0462 0.0302 0 0.1571 0.7411 0 0.039 0 0 0.0628 7.8685 10.7181 0.0379 0.0238 0.0247 0.0247 0.0291 0.0503 0.3045 42.7964 77.5284 0.1933 0.0557 0.2141 0.0379 0.0109 0.0379 0.0307 0.0434 N/A

Cluster 1
Mean/Mode: 11.4195 tcp http SF 588.2563 2679.55 0 0 0 0.0061 0 0.7162 0.0053 0 0.0004 0.0046 0.0092 0 0.0072 0 0 0.0004 8.8776 11.106 0.0018 0.0018 0.0004 0.0004 0.9959 0.0072 0.1126 242.0809 234.0984 0.9249 0.0097 0.0101 0.0031 0.0009 0.0007 0.015 0.0014 normal.
Std Devs:  365.0786 N/A N/A N/A 5095.1714 9146.5406 0 0 0 0.161 0 0.4509 0.1933 0 0.0195 0.1657 0.3314 0 0.0892 0 0 0.0195 9.6922 11.7241 0.0308 0.0307 0.0195 0.0195 0.0433 0.0752 0.221 29.7548 58.5857 0.219 0.0456 0.0514 0.0129 0.0175 0.0165 0.0958 0.0272 N/A

Clustered Instances
0      291 (  9%)
1     3109 ( 91%)
Figure 12. Clustering Visualization – Normal data as trainer

Attack Data as both training and testing data:
In this setup the subset of the attack data with 6600 records was used as training data and the
subset with 3400 records was used as testing data.
Figure 13. Screen shot of Weka - Clustering using Attack data as trainer
=== Model and evaluation on test set ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 13422.458547689479

Cluster centroids:

Cluster 0
Mean/Mode: 17.9759 tcp private REJ 278.2244 16.492 0 0 0 0.0161 0.0015 0.0124 0.0015 0.0007 0 0.0066 0.0007 0 0 0 0 0.0037 177.9196 10.242 0.2879 0.2869 0.6942 0.6945 0.1013 0.1117 0.0122 252.0892 14.0819 0.0552 0.1073 0.0128 0.0004 0.2825 0.283 0.6944 0.6941 neptune.
Std Devs:  319.4349 N/A N/A N/A 4068.0853 424.4576 0 0 0 0.3935 0.0382 0.1108 0.0541 0.027 0 0.2433 0.027 0 0 0 0 0.0604 89.6316 10.5083 0.4476 0.4504 0.4544 0.4573 0.1852 0.2014 0.1049 22.1166 28.7856 0.1133 0.1874 0.1077 0.0033 0.4429 0.4469 0.4484 0.4555 N/A

Cluster 1
Mean/Mode: 21.6137 icmp ecr_i SF 1706.66 919.7385 0 0.0013 0 0.0159 0.0025 0.2108 0.0048 0 0 0 0.0004 0 0.0011 0 0 0.0027 291.5721 292.0342 0.0004 0.0004 0.0009 0.0009 0.9983 0.0026 0.0295 231.5205 246.8259 0.9836 0.0048 0.6899 0.006 0.0018 0.0012 0.0022 0.0007 smurf.
Std Devs:  382.0624 N/A N/A N/A 14244.6981 13263.7434 0 0.0366 0 0.1924 0.0498 0.4079 0.069 0 0 0 0.0195 0 0.0391 0 0 0.0517 235.6648 235.0086 0.0151 0.0148 0.0253 0.0259 0.034 0.0471 0.1327 66.2879 37.3966 0.096 0.0397 0.455 0.0424 0.0318 0.0232 0.0244 0.0129 N/A

Clustered Instances
0      678 ( 20%)
1     2722 ( 80%)
Figure 14. Clustering Visualization using Attack data as trainer
Results Analysis:
Comparing the percentages of clustered instances, it is evident that using the attack data
(20%) for training is more effective than using the normal data (9%). For detailed
results see Appendix C.
3.3.3 Association-rule learners:
An association rule identifies a combination of attribute values or items that occur
together with greater frequency than would be expected if the values or items were
independent of one another. Association rules, like classification rules, attempt to predict the
class given a set of conditions on the LHS. The conditions are typically attribute-value pairs,
also referred to as item-sets. The main difference is that the prediction associated
with the RHS of the rule is not confined to a single class attribute; instead, it can involve
one or more attribute combinations.
Weka’s Apriori implements the Apriori algorithm. It starts with a minimum
support of 100% of the data items and decreases this in steps of 5% until there are at least 10
rules with the required minimum confidence of 0.9, or until the support has reached a lower
bound of 10%, whichever occurs first. There are four metrics for ranking rules: Confidence,
Lift, Leverage and Conviction.
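A rough sketch of this support-lowering schedule in plain Python (not Weka's Apriori) is shown below; the toy "transactions" are sets of nominal attribute values, and itemset sizes are capped at three for brevity:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    # Keep every itemset (up to max_size items) whose support -- the
    # fraction of transactions containing it -- meets min_support.
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            support = sum(1 for t in transactions if set(combo) <= t) / n
            if support >= min_support:
                freq[combo] = support
    return freq

def confident_rules(transactions, min_support, min_conf):
    # Form LHS -> RHS rules from each frequent itemset and keep those
    # whose confidence = support(itemset) / support(LHS) meets min_conf.
    freq = frequent_itemsets(transactions, min_support)
    found = []
    for itemset, supp in freq.items():
        for i in range(1, len(itemset)):
            for lhs in combinations(itemset, i):
                lhs_supp = freq.get(lhs)
                if lhs_supp and supp / lhs_supp >= min_conf:
                    rhs = tuple(x for x in itemset if x not in lhs)
                    found.append((lhs, rhs, supp / lhs_supp))
    return found

def apriori_schedule(transactions, target_rules=10, min_conf=0.9,
                     lower_bound=0.1, step=0.05):
    # Weka-style schedule: start at 100% support and lower it in 5%
    # steps until enough confident rules appear or the floor is hit.
    support = 1.0
    found = confident_rules(transactions, support, min_conf)
    while len(found) < target_rules and round(support - step, 2) >= lower_bound:
        support = round(support - step, 2)
        found = confident_rules(transactions, support, min_conf)
    return support, found

# Toy connection records described as sets of nominal attribute values.
transactions = [{"tcp", "http", "SF"}, {"tcp", "http", "SF"},
                {"tcp", "private", "REJ"}, {"icmp", "ecr_i", "SF"}]
support, rules = apriori_schedule(transactions, target_rules=5)
```

On this toy data the schedule stops once support reaches 0.5, where high-confidence rules such as {http} -> {tcp} first appear.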
The results of running the Apriori algorithm on the dataset containing 6600 records of the
attack data are listed below:
=== Run information ===
Scheme:     weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0
Relation:   ids
Instances:  6600
Attributes: 42
            duration, protocol_type, service, flag, src_bytes, dst_bytes,
            land, wrong_fragment,
            urgent, hot, num_failed_logins, logged_in, num_compromised,
            root_shell, su_attempted, num_root, num_file_creations,
            num_shells, num_access_files, num_outbound_cmds, is_host_login,
            is_guest_login, count, srv_count, serror_rate, srv_serror_rate,
            rerror_rate, srv_rerror_rate, same_srv_rate, diff_srv_rate,
            srv_diff_host_rate, dst_host_count, dst_host_srv_count,
            dst_host_same_srv_rate, dst_host_diff_srv_rate,
            dst_host_same_src_port_rate, dst_host_srv_diff_host_rate,
            dst_host_serror_rate, dst_host_srv_serror_rate,
            dst_host_rerror_rate, dst_host_srv_rerror_rate, class
4. TESTING AND VALIDATION
Evaluating detection systems is a difficult undertaking, complicated by several
common practices. For example, most evaluations are done according to a black-box testing
regime. While black-box testing can demonstrate the overall performance capabilities of a
detection system, it reveals almost nothing about the performance of components inside the
black box, such as how phenomena affecting individual components (e.g., a feature extractor
or an anomaly detector) or the interactions among them influence detection performance. If
the performance aspects of components like anomaly detectors are not fully understood, then
the performance aspects of any system composed of such elements cannot be understood
either.
Testing was carried out by running different attacks against the system trained with real-time
data using J48, Weka’s implementation of the C4.5 decision tree algorithm. The results are
described below.
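The per-attack evaluations below can be scripted outside Weka along the following lines. This is a sketch only: the record fields, labels, and the stub classifier are illustrative, not the actual J48 model.

```python
def accuracy_on_attacks(predict, test_records, attack_types):
    # Restrict the test set to records whose true label is one of the
    # chosen attack types, then score the classifier on that subset.
    subset = [r for r in test_records if r["label"] in attack_types]
    correct = sum(1 for r in subset if predict(r) == r["label"])
    return correct / len(subset), len(subset)

# Stub classifier: flag records with many connections to the same host
# as "neptune." and everything else as "mailbomb." (illustrative only).
def stub_predict(record):
    return "neptune." if record["count"] > 100 else "mailbomb."

records = [
    {"count": 2,   "label": "mailbomb."},
    {"count": 5,   "label": "mailbomb."},
    {"count": 300, "label": "neptune."},
    {"count": 1,   "label": "normal."},
]
acc, n = accuracy_on_attacks(stub_predict, records, {"mailbomb.", "neptune."})
```

Here the normal record is excluded from the subset, so the reported accuracy covers only the selected attack types, mirroring the one-, two-, and three-attack test runs below.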
Using one type of Attack (Mailbomb):
=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances        41      100 %
Incorrectly Classified Instances       0        0 %
Kappa statistic                        1
Mean absolute error                    0
Root mean squared error                0
Relative absolute error                0 %
Root relative squared error            0 %
Total Number of Instances             41
Using two types of attacks (Mailbomb and Neptune):
=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances       160      100 %
Incorrectly Classified Instances       0        0 %
Kappa statistic                        1
Mean absolute error                    0
Root mean squared error                0
Relative absolute error                0 %
Root relative squared error            0 %
Total Number of Instances            160

Using three types of attacks (Mailbomb, Neptune and Smurf):

=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances       236       99.5781 %
Incorrectly Classified Instances       1        0.4219 %
Kappa statistic                        0.9931
Mean absolute error                    0.0003
Root mean squared error                0.0165
Relative absolute error                0.5767 %
Root relative squared error           10.179  %
Total Number of Instances            237

Summary of the testing:
Test                               Time taken (sec)   Accuracy (%)
1. Using one type of attack        1.8                100
2. Using two types of attacks      1.59               100
3. Using three types of attacks    1.61               99.57
For complete results see Appendix E.
4.1 System Performance Metrics
Listed below is a partial set of measurements that can be made on IDSs. These measurements
are quantitative and relate to detection performance and accuracy (Lippmann, 1999).
• Coverage. This measurement determines which attacks an IDS can detect under ideal
conditions. For non-signature-based systems like anomaly based IDS, one would need
to determine which attacks out of the set of all known attacks could be detected by a
particular methodology. The number of dimensions that make up each attack makes
this measurement difficult.
• Probability of False Alarms. This measurement determines the rate of false
positives produced by an IDS in a given environment during a particular time frame.
A false positive, or false alarm, is an alert caused by normal, non-malicious background
traffic. Causes for a Network IDS (NIDS) include weak signatures that alert on all
traffic to a high-numbered port used by a backdoor, that search for the occurrence of a
common word such as "help" in the first 100 bytes of SNMP or other TCP connections,
or that detect common violations of the TCP protocol. False alarms can also be caused
by normal network monitoring and maintenance traffic generated by network
management tools. False alarms are difficult to measure because an IDS may have a
different false positive rate in each network environment, and there is no such thing
as a standard network.
• Probability of Detection. This measurement determines the rate of attacks detected
correctly by an IDS in a given environment during a particular time frame. The
difficulty in measuring the detection rate is that the success of an IDS is largely
dependent upon the set of attacks used during the test. Also, the probability of
detection varies with the false positive rate, and an IDS can be configured or tuned to
favor either the ability to detect attacks or to minimize false positives. One must be
careful to use the same configuration during testing for false positives and hit rates.
• Resistance to Attacks Directed at the IDS. This measurement demonstrates how
resistant an IDS is to an attacker's attempt to disrupt the correct operation of the IDS.
One example is sending a large amount of non-attack traffic with volume exceeding
the processing capability of the IDS. With too much traffic to process, an IDS may
drop packets and be unable to detect attacks. Another example is sending to the IDS
non-attack packets that are specially crafted to trigger many signatures within the
IDS, thereby overwhelming the human operator of the IDS with false positives or
crashing alert processing or display tools.
• Ability to Handle High Bandwidth Traffic. This measurement demonstrates how
well an IDS will function when presented with a large volume of traffic. Most
network-based IDSs will begin to drop packets as the traffic volume increases,
thereby causing the IDS to miss a percentage of the attacks. At a certain threshold,
most IDSs will stop detecting any attacks.
• Ability to Correlate Events. This measurement demonstrates how well an IDS
correlates attack events. These events may be gathered from IDSs, routers, firewalls,
application logs, or a wide variety of other devices. One of the primary goals of this
correlation is to identify staged penetration attacks. Currently, IDSs have only limited
capabilities in this area.
• Ability to Detect Never-Before-Seen Attacks. This measurement demonstrates how
well an IDS can detect attacks that have not occurred before.
• Ability to Identify an Attack. This measurement demonstrates how well an IDS can
identify the attack that it has detected by labeling each attack with a common name or
vulnerability name or by assigning the attack to a category.
• Ability to Determine Attack Success. This measurement demonstrates whether the IDS
can determine the success of attacks from remote sites that give the attacker higher-level
privileges on the attacked system. In current network environments, many
remote privilege-gaining attacks (or probes) fail and do not damage the attacked
system. Many IDSs, however, do not distinguish failed from successful attacks.
• Other Measurements. There are other measurements, such as ease of use, ease of
maintenance, deployment issues, resource requirements, availability and quality of
support, etc. These measurements are not directly related to IDS performance but
may be more significant in many commercial situations.
Once the mining analysis is producing meaningful results, a strategy to automate the
scoring of new log files should be developed. Simple exception reports can then alert system
administrators to new IP addresses and ports involved in potentially intrusive activity.
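Such an exception report might be sketched as follows. The field names and log layout are illustrative; a real deployment would parse actual log formats.

```python
def exception_report(baseline_events, new_events):
    # Flag (ip, port) pairs in the new log that never appeared in the
    # baseline period -- candidates for potentially intrusive activity.
    known = {(e["ip"], e["port"]) for e in baseline_events}
    return [e for e in new_events if (e["ip"], e["port"]) not in known]

baseline = [{"ip": "10.0.0.5", "port": 80}, {"ip": "10.0.0.5", "port": 443}]
new_log  = [{"ip": "10.0.0.5", "port": 80},
            {"ip": "192.0.2.9", "port": 31337}]
alerts = exception_report(baseline, new_log)
```

Only the previously unseen address/port pair is reported, keeping the exception report short enough for an administrator to review.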
5. CONCLUSION
The integration of data mining into intrusion detection is in an interesting phase of
development, where data mining has made it possible to obtain performance improvements
over current commercial products, with features such as:
• Better extraction of meaningful information from large amounts of raw data, and
more accurate models obtained by applying DM techniques and correlation
algorithms. These practices reduce the analysis overload on human operators and the
associated issues.
• Potential for predictive analysis of suspicious activity, enabling defensive action
before severe damage occurs or the system is completely compromised.
• The adaptability of data mining to specific environments, which allows individual
trends to be recognized more efficiently and enables statistical recognition of new or
hidden malicious activity.
The complexity involved in setting up this kind of system presents new challenges for
implementing scalable and incremental deployments. The goal is to ensure accuracy with
efficient investments of time and resources, specifically by improving data selection, data
preparation, and data quality (from all information sources), resulting in greater precision
and reduced computing costs.
Appendix A
Some of the terms that we come across in the results are defined below.
Accuracy
It gives a measure of the overall accuracy of the classifier:

               Number of correctly classified instances
    Accuracy = ----------------------------------------
               Number of instances

Precision
                   Number of correctly classified instances of class X
    Precision(X) = ------------------------------------------------------
                   Number of instances classified as belonging to class X

Recall
                Number of correctly classified instances of class X
    Recall(X) = ---------------------------------------------------
                Number of instances in class X
Confusion matrix:
Confusion matrices are very useful for evaluating classifiers, as they provide
an efficient snapshot of a classifier's performance, displaying the distribution of
correct and incorrect predictions. Typical Weka output contains the following:
=== Confusion Matrix ===
a b <-- classified as
7 2 | a = yes
3 2 | b = no
Weka was trying to classify instances into two possible classes: yes or no.
For the sake of simplicity, Weka abbreviates ‘yes’ as a and ‘no’ as b. The
columns represent the instances classified as each class: the first
column shows that in total 10 instances were classified as a by Weka, and the
second that 4 were classified as b. The rows represent the instances that actually
belong to each class.
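The Appendix A formulas can be applied directly to such a matrix. A small plain-Python sketch, using the example matrix above (rows are actual classes, columns are predicted classes):

```python
def metrics(matrix, labels):
    # Rows are actual classes, columns are predicted classes.
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
    precision, recall = {}, {}
    for i, label in enumerate(labels):
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision[label] = matrix[i][i] / predicted if predicted else 0.0
        recall[label] = matrix[i][i] / actual if actual else 0.0
    return accuracy, precision, recall

# The example matrix from above: 7+3 instances classified "yes", 2+2 "no".
acc, prec, rec = metrics([[7, 2], [3, 2]], ["yes", "no"])
```

For the example, 9 of 14 instances lie on the diagonal, precision for "yes" is 7/10, and recall for "no" is 2/5, matching the Appendix A definitions.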
Appendix B

Classification Results:

Attack free data as Trainer:

=== Run information ===
Scheme:     weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:   ids
Instances:  5905
Attributes: 42
            duration, protocol_type, service, flag, src_bytes, dst_bytes,
            land, wrong_fragment, urgent, hot, num_failed_logins, logged_in,
            num_compromised, root_shell, su_attempted, num_root,
            num_file_creations, num_shells, num_access_files,
            num_outbound_cmds, is_host_login, is_guest_login, count,
            srv_count, serror_rate, srv_serror_rate, rerror_rate,
            srv_rerror_rate, same_srv_rate, diff_srv_rate,
            srv_diff_host_rate, dst_host_count, dst_host_srv_count,
            dst_host_same_srv_rate, dst_host_diff_srv_rate,
            dst_host_same_src_port_rate, dst_host_srv_diff_host_rate,
            dst_host_serror_rate, dst_host_srv_serror_rate,
            dst_host_rerror_rate, dst_host_srv_rerror_rate, class
Test mode:  user supplied test set: 3400 instances
=== Classifier model (full training set) ===
J48 pruned tree
------------------
: normal. (5905.0)

Number of Leaves  : 1
Size of the tree  : 1
Time taken to build model: 0.01 seconds

=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances       622       18.2941 %
Incorrectly Classified Instances    2778       81.7059 %
Kappa statistic                        0
Mean absolute error                    0.0527
Root mean squared error                0.2296
Relative absolute error               99.9038 %
Root relative squared error          100.2613 %
Total Number of Instances           3400

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0        0        0          0       0          back.
0        0        0          0       0          snmpguess.
0        0        0          0       0          apache2.
0        0        0          0       0          httptunnel.
0        0        0          0       0          mscan.
0        0        0          0       0          saint.
0        0        0          0       0          snmpgetattack.
0        0        0          0       0          processtable.
0        0        0          0       0          mailbomb.
0        0        0          0       0          buffer_overflow.
0        0        0          0       0          ftp_write.
0        0        0          0       0          guess_passwd.
0        0        0          0       0          imap.
0        0        0          0       0          ipsweep.
0        0        0          0       0          land.
0        0        0          0       0          loadmodule.
0        0        0          0       0          multihop.
0        0        0          0       0          neptune.
0        0        0          0       0          nmap.
1        1        0.183      1       0.309      normal.
0        0        0          0       0          perl.
0        0        0          0       0          phf.
0        0        0          0       0          pod.
0        0        0          0       0          portsweep.
0        0        0          0       0          rootkit.
0        0        0          0       0          satan.
0        0        0          0       0          smurf.
0        0        0          0       0          spy.
0        0        0          0       0          teardrop.
0        0        0          0       0          warezclient.
0        0        0          0       0          warezmaster.

=== Confusion Matrix ===
a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae   <-- classified as
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 0 0 0 0 0 0 0 0 0 0 0 |  a = back.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26 0 0 0 0 0 0 0 0 0 0 0 |  b = snmpguess.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 0 |  c = apache2.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 |  d = httptunnel.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 |  e = mscan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 |  f = saint.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 79 0 0 0 0 0 0 0 0 0 0 0 |  g = snmpgetattack.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 |  h = processtable.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 59 0 0 0 0 0 0 0 0 0 0 0 |  i = mailbomb.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  j = buffer_overflow.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  k = ftp_write.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 59 0 0 0 0 0 0 0 0 0 0 0 |  l = guess_passwd.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  m = imap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 |  n = ipsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  o = land.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 |  p = loadmodule.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 |  q = multihop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 610 0 0 0 0 0 0 0 0 0 0 0 |  r = neptune.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  s = nmap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 622 0 0 0 0 0 0 0 0 0 0 0 |  t = normal.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  u = perl.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  v = phf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 |  w = pod.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 |  x = portsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  y = rootkit.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 27 0 0 0 0 0 0 0 0 0 0 0 |  z = satan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1827 0 0 0 0 0 0 0 0 0 0 0 | aa = smurf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ab = spy.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ac = teardrop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ad = warezclient.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 | ae = warezmaster.
Attack data as Trainer:
=== Run information ===
Scheme:     weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:   ids
Instances:  6600
Attributes: 42
            duration, protocol_type, service, flag, src_bytes,
            dst_bytes, land, wrong_fragment, urgent, hot, num_failed_logins,
            logged_in, num_compromised, root_shell, su_attempted, num_root,
            num_file_creations, num_shells, num_access_files,
            num_outbound_cmds, is_host_login, is_guest_login, count,
            srv_count, serror_rate, srv_serror_rate, rerror_rate,
            srv_rerror_rate, same_srv_rate, diff_srv_rate,
            srv_diff_host_rate, dst_host_count, dst_host_srv_count,
            dst_host_same_srv_rate, dst_host_diff_srv_rate,
            dst_host_same_src_port_rate, dst_host_srv_diff_host_rate,
            dst_host_serror_rate, dst_host_srv_serror_rate,
            dst_host_rerror_rate, dst_host_srv_rerror_rate, class
Test mode:  user supplied test set: 3400 instances

=== Classifier model (full training set) ===
J48 pruned tree
------------------
same_srv_rate <= 0.21 | diff_srv_rate <= 0.23: neptune. (1244.0) | diff_srv_rate > 0.23 | | dst_host_same_srv_rate <= 0.01 | | | dst_host_rerror_rate <= 0.99 | | | | dst_host_serror_rate <= 0.11 | | | | | serror_rate <= 0.03: satan. (18.0/2.0) | | | | | serror_rate > 0.03: saint. (14.0) | | | | dst_host_serror_rate > 0.11: satan. (9.0) | | | dst_host_rerror_rate > 0.99: satan. (11.0) | | dst_host_same_srv_rate > 0.01: normal. (4.0/1.0) same_srv_rate > 0.21 | count <= 72 | | src_bytes <= 2564
| | | duration <= 2 | | | | flag = REJ | | | | | srv_diff_host_rate <= 0.4 | | | | | | srv_count <= 5: httptunnel. (2.0/1.0) | | | | | | srv_count > 5: neptune. (2.0/1.0) | | | | | srv_diff_host_rate > 0.4: mscan. (7.0) | | | | flag = SF | | | | | protocol_type = udp | | | | | | src_bytes <= 55 | | | | | | | service = private | | | | | | | | srv_count <= 8: snmpguess. (49.0) | | | | | | | | srv_count > 8: normal. (3.0) | | | | | | | service = ecr_i: normal. (0.0) | | | | | | | service = ntp_u: normal. (1.0) | | | | | | | service = X11: normal. (0.0) | | | | | | | service = IRC: normal. (0.0) | | | | | | | service = courier: normal. (0.0) | | | | | | | service = ctf: normal. (0.0) | | | | | | | service = ssh: normal. (0.0) | | | | | | | service = time: normal. (0.0) | | | | | | | service = ldap: normal. (0.0) | | | | | | | service = supdup: normal. (0.0) | | | | | | | service = Z39_50: normal. (0.0) | | | | | | | service = discard: normal. (0.0) | | | | | | | service = bgp: normal. (0.0) | | | | | | | service = kshell: normal. (0.0) | | | | | | | service = urp_i: normal. (0.0) | | | | | | | service = uucp: normal. (0.0) | | | | | | | service = uucp_path: normal. (0.0) | | | | | | | service = netstat: normal. (0.0) | | | | | | | service = nnsp: normal. (0.0) | | | | | | | service = http_443: normal. (0.0) | | | | | | | service = iso_tsap: normal. (0.0) | | | | | | | service = http: normal. (0.0) | | | | | | | service = pop_3: normal. (0.0) | | | | | | | service = other: normal. (0.0) | | | | | | | service = domain_u: normal. (54.0) | | | | | | | service = exec: normal. (0.0) | | | | | | | service = telnet: normal. (0.0) | | | | | | | service = ftp: normal. (0.0) | | | | | | | service = sql_net: normal. (0.0) | | | | | | | service = ftp_data: normal. (0.0) | | | | | | | service = smtp: normal. (0.0) | | | | | | | service = sunrpc: normal. (0.0) | | | | | | | service = shell: normal. (0.0) | | | | | | | service = vmnet: normal. 
(0.0) | | | | | | | service = netbios_dgm: normal. (0.0) | | | | | | | service = efs: normal. (0.0) | | | | | | | service = finger: normal. (0.0) | | | | | | | service = rje: normal. (0.0) | | | | | | | service = mtp: normal. (0.0) | | | | | | | service = imap4: normal. (0.0) | | | | | | | service = link: normal. (0.0) | | | | | | | service = auth: normal. (0.0) | | | | | | | service = domain: normal. (0.0) | | | | | | | service = nntp: normal. (0.0) | | | | | | | service = daytime: normal. (0.0) | | | | | | | service = gopher: normal. (0.0) | | | | | | | service = echo: normal. (0.0) | | | | | | | service = printer: normal. (0.0) | | | | | | | service = whois: normal. (0.0) | | | | | | | service = klogin: normal. (0.0)
| | | | | | | service = ecr: normal. (0.0) | | | | | | | service = netbios_ssn: normal. (0.0) | | | | | | | service = remote_job: normal. (0.0) | | | | | | | service = hostnames: normal. (0.0) | | | | | | | service = eco_i: normal. (0.0) | | | | | | | service = login: normal. (0.0) | | | | | | | service = pop_2: normal. (0.0) | | | | | | | service = netbios_ns: normal. (0.0) | | | | | | src_bytes > 55 | | | | | | | count <= 1 | | | | | | | | dst_host_srv_count <= 253 | | | | | | | | | dst_bytes <= 124: normal. (79.0/3.0) | | | | | | | | | dst_bytes > 124: snmpgetattack. (51.0/15.0) | | | | | | | | dst_host_srv_count > 253: normal. (106.0/48.0) | | | | | | | count > 1: normal. (175.0/57.0) | | | | | protocol_type = tcp | | | | | | num_failed_logins <= 0 | | | | | | | dst_host_srv_diff_host_rate <= 0.11 | | | | | | | | dst_host_same_srv_rate <= 0.04 | | | | | | | | | src_bytes <= 20: normal. (2.0) | | | | | | | | | src_bytes > 20: guess_passwd. (3.0) | | | | | | | | dst_host_same_srv_rate > 0.04 | | | | | | | | | dst_host_srv_count <= 5: warezmaster. (3.0/1.0) | | | | | | | | | dst_host_srv_count > 5: normal. (882.0) | | | | | | | dst_host_srv_diff_host_rate > 0.11 | | | | | | | | dst_bytes <= 15: warezmaster. (7.0) | | | | | | | | dst_bytes > 15: normal. (5.0) | | | | | | num_failed_logins > 0: guess_passwd. (13.0) | | | | | protocol_type = icmp | | | | | | srv_diff_host_rate <= 0.57 | | | | | | | dst_host_srv_diff_host_rate <= 0.67 | | | | | | | | count <= 13: normal. (5.0) | | | | | | | | count > 13: smurf. (2.0) | | | | | | | dst_host_srv_diff_host_rate > 0.67: ipsweep. (5.0) | | | | | | srv_diff_host_rate > 0.57 | | | | | | | service = private: pod. (0.0) | | | | | | | service = ecr_i: pod. (5.0) | | | | | | | service = ntp_u: pod. (0.0) | | | | | | | service = X11: pod. (0.0) | | | | | | | service = IRC: pod. (0.0) | | | | | | | service = courier: pod. (0.0) | | | | | | | service = ctf: pod. (0.0) | | | | | | | service = ssh: pod. 
(0.0) | | | | | | | service = time: pod. (0.0) | | | | | | | service = ldap: pod. (0.0) | | | | | | | service = supdup: pod. (0.0) | | | | | | | service = Z39_50: pod. (0.0) | | | | | | | service = discard: pod. (0.0) | | | | | | | service = bgp: pod. (0.0) | | | | | | | service = kshell: pod. (0.0) | | | | | | | service = urp_i: pod. (0.0) | | | | | | | service = uucp: pod. (0.0) | | | | | | | service = uucp_path: pod. (0.0) | | | | | | | service = netstat: pod. (0.0) | | | | | | | service = nnsp: pod. (0.0) | | | | | | | service = http_443: pod. (0.0) | | | | | | | service = iso_tsap: pod. (0.0) | | | | | | | service = http: pod. (0.0) | | | | | | | service = pop_3: pod. (0.0) | | | | | | | service = other: pod. (0.0) | | | | | | | service = domain_u: pod. (0.0) | | | | | | | service = exec: pod. (0.0)
| | | | | | | service = telnet: pod. (0.0) | | | | | | | service = ftp: pod. (0.0) | | | | | | | service = sql_net: pod. (0.0) | | | | | | | service = ftp_data: pod. (0.0) | | | | | | | service = smtp: pod. (0.0) | | | | | | | service = sunrpc: pod. (0.0) | | | | | | | service = shell: pod. (0.0) | | | | | | | service = vmnet: pod. (0.0) | | | | | | | service = netbios_dgm: pod. (0.0) | | | | | | | service = efs: pod. (0.0) | | | | | | | service = finger: pod. (0.0) | | | | | | | service = rje: pod. (0.0) | | | | | | | service = mtp: pod. (0.0) | | | | | | | service = imap4: pod. (0.0) | | | | | | | service = link: pod. (0.0) | | | | | | | service = auth: pod. (0.0) | | | | | | | service = domain: pod. (0.0) | | | | | | | service = nntp: pod. (0.0) | | | | | | | service = daytime: pod. (0.0) | | | | | | | service = gopher: pod. (0.0) | | | | | | | service = echo: pod. (0.0) | | | | | | | service = printer: pod. (0.0) | | | | | | | service = whois: pod. (0.0) | | | | | | | service = klogin: pod. (0.0) | | | | | | | service = ecr: pod. (0.0) | | | | | | | service = netbios_ssn: pod. (0.0) | | | | | | | service = remote_job: pod. (0.0) | | | | | | | service = hostnames: pod. (0.0) | | | | | | | service = eco_i: saint. (2.0) | | | | | | | service = login: pod. (0.0) | | | | | | | service = pop_2: pod. (0.0) | | | | | | | service = netbios_ns: pod. (0.0) | | | | flag = S1: normal. (0.0) | | | | flag = S0 | | | | | dst_host_rerror_rate <= 0.28: mscan. (6.0/1.0) | | | | | dst_host_rerror_rate > 0.28: apache2. (3.0) | | | | flag = RSTO | | | | | src_bytes <= 55: mscan. (6.0) | | | | | src_bytes > 55: guess_passwd. (2.0) | | | | flag = S3: processtable. (6.0) | | | | flag = S2: normal. (0.0) | | | | flag = RSTR: portsweep. (7.0/2.0) | | | | flag = SH: nmap. (1.0) | | | duration > 2 | | | | src_bytes <= 55 | | | | | src_bytes <= 24 | | | | | | srv_rerror_rate <= 0.17: processtable. (16.0/1.0) | | | | | | srv_rerror_rate > 0.17: mscan. 
(5.0) | | | | | src_bytes > 24: guess_passwd. (73.0) | | | | src_bytes > 55 | | | | | duration <= 261: normal. (14.0) | | | | | duration > 261: warezmaster. (15.0/1.0) | | src_bytes > 2564 | | | num_compromised <= 0 | | | | src_bytes <= 2599: mailbomb. (99.0) | | | | src_bytes > 2599 | | | | | flag = REJ: normal. (0.0) | | | | | flag = SF | | | | | | duration <= 38: normal. (17.0/1.0) | | | | | | duration > 38: warezmaster. (12.0) | | | | | flag = S1: normal. (0.0) | | | | | flag = S0: normal. (0.0)
| | | | | flag = RSTO: normal. (0.0) | | | | | flag = S3: normal. (0.0) | | | | | flag = S2: normal. (0.0) | | | | | flag = RSTR: apache2. (10.0) | | | | | flag = SH: normal. (0.0) | | | num_compromised > 0: back. (25.0) | count > 72 | | protocol_type = udp: normal. (28.0) | | protocol_type = tcp | | | service = private: neptune. (0.0) | | | service = ecr_i: neptune. (0.0) | | | service = ntp_u: neptune. (0.0) | | | service = X11: neptune. (0.0) | | | service = IRC: neptune. (0.0) | | | service = courier: neptune. (0.0) | | | service = ctf: neptune. (0.0) | | | service = ssh: neptune. (0.0) | | | service = time: neptune. (0.0) | | | service = ldap: neptune. (0.0) | | | service = supdup: neptune. (0.0) | | | service = Z39_50: neptune. (0.0) | | | service = discard: neptune. (0.0) | | | service = bgp: neptune. (0.0) | | | service = kshell: neptune. (0.0) | | | service = urp_i: neptune. (0.0) | | | service = uucp: neptune. (0.0) | | | service = uucp_path: neptune. (0.0) | | | service = netstat: neptune. (0.0) | | | service = nnsp: neptune. (0.0) | | | service = http_443: neptune. (0.0) | | | service = iso_tsap: neptune. (0.0) | | | service = http: apache2. (2.0) | | | service = pop_3: neptune. (0.0) | | | service = other: neptune. (0.0) | | | service = domain_u: neptune. (0.0) | | | service = exec: neptune. (0.0) | | | service = telnet: neptune. (5.0) | | | service = ftp: neptune. (0.0) | | | service = sql_net: neptune. (0.0) | | | service = ftp_data: neptune. (0.0) | | | service = smtp: neptune. (0.0) | | | service = sunrpc: neptune. (0.0) | | | service = shell: neptune. (0.0) | | | service = vmnet: neptune. (0.0) | | | service = netbios_dgm: neptune. (0.0) | | | service = efs: neptune. (0.0) | | | service = finger: neptune. (0.0) | | | service = rje: neptune. (0.0) | | | service = mtp: neptune. (0.0) | | | service = imap4: neptune. (0.0) | | | service = link: neptune. (0.0) | | | service = auth: neptune. (0.0) | | | service = domain: neptune. 
(0.0) | | | service = nntp: neptune. (0.0) | | | service = daytime: neptune. (0.0) | | | service = gopher: neptune. (0.0) | | | service = echo: neptune. (0.0) | | | service = printer: neptune. (0.0) | | | service = whois: neptune. (0.0) | | | service = klogin: neptune. (0.0) | | | service = ecr: neptune. (0.0) | | | service = netbios_ssn: neptune. (0.0)
| | | service = remote_job: neptune. (0.0) | | | service = hostnames: neptune. (0.0) | | | service = eco_i: neptune. (0.0) | | | service = login: neptune. (0.0) | | | service = pop_2: neptune. (0.0) | | | service = netbios_ns: neptune. (0.0) | | protocol_type = icmp: smurf. (3485.0)

Number of Leaves  : 229
Size of the tree  : 270
Time taken to build model: 4.24 seconds

=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances      3298       97      %
Incorrectly Classified Instances     102        3      %
Kappa statistic                        0.9533
Mean absolute error                    0.0022
Root mean squared error                0.0364
Relative absolute error                5.1806 %
Root relative squared error           25.2454 %
Total Number of Instances           3400

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
1        0        1          1       1          back.
1        0        1          1       1          snmpguess.
1        0        1          1       1          apache2.
0        0.001    0          0       0          httptunnel.
0.909    0.001    0.833      0.909   0.87       mscan.
0.857    0.001    0.545      0.857   0.667      saint.
0.19     0.003    0.6        0.19    0.288      snmpgetattack.
0.917    0        1          0.917   0.957      processtable.
1        0        1          1       1          mailbomb.
0        0        0          0       0          buffer_overflow.
0        0        0          0       0          ftp_write.
0.966    0.001    0.966      0.966   0.966      guess_passwd.
0        0        0          0       0          imap.
1        0        1          1       1          ipsweep.
0        0        0          0       0          land.
0        0        0          0       0          loadmodule.
0        0        0          0       0          multihop.
0.998    0        0.998      0.998   0.998      neptune.
0        0        0          0       0          nmap.
0.971    0.027    0.891      0.971   0.929      normal.
0        0        0          0       0          perl.
0        0        0          0       0          phf.
0.5      0        1          0.5     0.667      pod.
0.833    0.001    0.625      0.833   0.714      portsweep.
0        0        0          0       0          rootkit.
0.778    0        1          0.778   0.875      satan.
0.999    0        1          0.999   1          smurf.
0        0        0          0       0          spy.
0        0        0          0       0          teardrop.
0        0        0          0       0          warezclient.
0.905    0.001    0.864      0.905   0.884      warezmaster.
=== Confusion Matrix === a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae <-- classified as 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = back. 0 26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = snmpguess. 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = apache2. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 | d = httptunnel. 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 | e = mscan. 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 | f = saint. 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 64 0 0 0 0 0 0 0 0 0 0 0 | g = snmpgetattack. 0 0 0 0 1 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | h = processtable. 0 0 0 0 0 0 0 0 59 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | i = mailbomb. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | j = buffer_overflow. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | k = ftp_write. 0 0 0 0 0 0 0 0 0 0 0 57 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 | l = guess_passwd. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | m = imap. 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | n = ipsweep. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | o = land. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 | p = loadmodule. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 | q = multihop. 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 609 0 0 0 0 0 0 0 0 0 0 0 0 0 | r = neptune. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | s = nmap. 0 0 0 1 0 0 10 0 0 0 0 2 0 0 0 0 0 0 0 604 0 0 0 2 0 0 0 0 0 0 3 | t = normal. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | u = perl. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | v = phf. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 | w = pod. 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 | x = portsweep. 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | y = rootkit. 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 21 0 0 0 0 0 | z = satan. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1826 0 0 0 0 | aa = smurf. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ab = spy. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ac = teardrop. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ad = warezclient. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 19 | ae = warezmaster.
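The Definitions section measures performance with Hit Ratio, Miss Ratio, and False Alarm Ratio. A minimal sketch of how those ratios follow from a confusion matrix, assuming the 31 classes are first collapsed into a single attack-versus-normal split (the counts below are hypothetical, not taken from the matrix above):

```python
# Sketch: Hit Ratio, Miss Ratio, and False Alarm Ratio (as defined
# in the Definitions section) from a collapsed 2x2 confusion matrix.
# All counts here are hypothetical, for illustration only.

def detection_ratios(tp, fn, fp, tn):
    """tp/fn count attack instances; fp/tn count normal instances."""
    hit_ratio = tp / (tp + fn)          # attacks detected
    miss_ratio = fn / (tp + fn)         # attacks missed
    false_alarm_ratio = fp / (fp + tn)  # normals flagged as attacks
    return hit_ratio, miss_ratio, false_alarm_ratio

hit, miss, fa = detection_ratios(tp=950, fn=50, fp=20, tn=980)
print(hit, miss, fa)  # 0.95 0.05 0.02
```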
Appendix C
Clustering Results:
Normal Data as Trainer:
=== Run information === Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10 Relation: ids Instances: 5905 Attributes: 42 duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_host_login is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class
Test mode: user supplied test set: 3400 instances === Model and evaluation on test set === kMeans Number of iterations: 6 Within cluster sum of squared errors: 4708.820136735875 Cluster centroids: Cluster 0 Mean/Mode: 13.9458 tcp http SF 669.8249 4500.8885 0 0 0 0.0265 0 0.944 0.0015 0.0009 0 0.0027 0.0195 0 0.0015 0 0 0.004 6.968 10.1401 0.002 0.0012 0.0006 0.0006 0.9985 0.0026 0.1776 49.6375 212.0548 0.9271 0.0158 0.1123 0.0329 0.0012 0.0022 0.0033 0.0045 normal. Std Devs: 361.8696 N/A N/A N/A 5292.9004 14974.2892 0 0 0 0.4808 0 0.23 0.0462 0.0302 0 0.1571 0.7411 0 0.039 0 0 0.0628 7.8685 10.7181 0.0379 0.0238 0.0247 0.0247 0.0291 0.0503 0.3045 42.7964 77.5284 0.1933 0.0557 0.2141 0.0379 0.0109 0.0379 0.0307 0.0434 N/A Cluster 1 Mean/Mode: 11.4195 tcp http SF 588.2563 2679.55 0 0 0 0.0061 0 0.7162 0.0053 0 0.0004 0.0046 0.0092 0 0.0072 0 0 0.0004 8.8776 11.106 0.0018 0.0018 0.0004 0.0004 0.9959 0.0072 0.1126 242.0809 234.0984 0.9249 0.0097 0.0101 0.0031 0.0009 0.0007 0.015 0.0014 normal. Std Devs: 365.0786 N/A N/A N/A 5095.1714 9146.5406 0 0 0 0.161 0 0.4509 0.1933 0 0.0195 0.1657 0.3314 0 0.0892 0 0 0.0195 9.6922 11.7241 0.0308 0.0307 0.0195 0.0195 0.0433 0.0752 0.221 29.7548 58.5857 0.219 0.0456 0.0514 0.0129 0.0175 0.0165 0.0958 0.0272 N/A Clustered Instances 0 291 ( 9%) 1 3109 ( 91%)
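SimpleKMeans assigns each instance to the centroid at the smallest Euclidean distance, then recomputes the centroids until the assignments stabilize (six iterations in the run above). The assignment step can be sketched as follows, using hypothetical 2-dimensional points rather than the 41-attribute vectors of the actual run:

```python
import math

# Sketch of the k-means assignment step underlying SimpleKMeans:
# each instance goes to the centroid with the smallest Euclidean
# distance. Points and centroids here are hypothetical 2-D values.

def nearest_centroid(point, centroids):
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists))

centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 8.0), (0.5, 2.0)]
labels = [nearest_centroid(p, centroids) for p in points]
print(labels)  # [0, 1, 0]
```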
Attack Data as Trainer:
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10 Relation: ids Instances: 6600 Attributes: 42 duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent
hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_host_login is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class Test mode: user supplied test set: 3400 instances === Model and evaluation on test set === kMeans ====== Number of iterations: 6 Within cluster sum of squared errors: 13422.458547689479 Cluster centroids: Cluster 0 Mean/Mode: 17.9759 tcp private REJ 278.2244 16.492 0 0 0 0.0161 0.0015 0.0124 0.0015 0.0007 0 0.0066 0.0007 0 0 0 0 0.0037 177.9196 10.242 0.2879 0.2869 0.6942 0.6945 0.1013 0.1117 0.0122 252.0892 14.0819 0.0552 0.1073 0.0128 0.0004 0.2825 0.283 0.6944 0.6941 neptune. Std Devs: 319.4349 N/A N/A N/A 4068.0853 424.4576 0 0 0 0.3935 0.0382 0.1108 0.0541 0.027 0 0.2433 0.027 0 0 0 0 0.0604 89.6316 10.5083 0.4476 0.4504 0.4544 0.4573 0.1852 0.2014 0.1049 22.1166 28.7856 0.1133 0.1874 0.1077 0.0033 0.4429 0.4469 0.4484 0.4555 N/A Cluster 1 Mean/Mode: 21.6137 icmp ecr_i SF 1706.66 919.7385 0 0.0013 0 0.0159 0.0025 0.2108 0.0048 0 0 0 0.0004 0 0.0011 0 0 0.0027 291.5721 292.0342 0.0004 0.0004
0.0009 0.0009 0.9983 0.0026 0.0295 231.5205 246.8259 0.9836 0.0048 0.6899 0.006 0.0018 0.0012 0.0022 0.0007 smurf. Std Devs: 382.0624 N/A N/A N/A 14244.6981 13263.7434 0 0.0366 0 0.1924 0.0498 0.4079 0.069 0 0 0 0.0195 0 0.0391 0 0 0.0517 235.6648 235.0086 0.0151 0.0148 0.0253 0.0259 0.034 0.0471 0.1327 66.2879 37.3966 0.096 0.0397 0.455 0.0424 0.0318 0.0232 0.0244 0.0129 N/A Clustered Instances 0 678 ( 20%) 1 2722 ( 80%)
Appendix D Sample data collected on the main router on the network tap: 11/13-20:08:02.807867 0:E0:81:2F:FE:2C -> 0:0:C:7:AC:2 type:0x800 len:0x5EA 66.179.164.20:22 -> 24.136.161.188:62456 TCP TTL:64 TOS:0x10 ID:27401 IpLen:20 DgmLen:1500 DF ***A**** Seq: 0x50692152 Ack: 0xDD1E2B42 Win: 0x2180 TcpLen: 20 AA A8 5A 92 A7 BF DF 32 7D BF F7 7B 1B 5C 35 47 ..Z....2}..{.\5G D6 52 B3 E2 97 6D 68 41 A6 53 2B 89 92 8E 10 8D .R...mhA.S+..... 1B E0 C9 87 A8 71 91 EA D0 F4 1C 6C B4 DC D7 C4 .....q.....l.... B5 22 84 40 E5 09 0F B5 E9 1F 4E AC 96 6C A5 9D ."[email protected].. D9 AF 38 88 5F 2B 4B 8D 32 FC 4C 37 AB DA E1 EA ..8._+K.2.L7.... 61 D9 23 22 FD DA 20 32 E7 6C 48 60 2C 55 CB 99 a.#".. 2.lH`,U.. 74 3A F5 7D 30 77 75 58 6B AE 80 2F 48 A5 FD F4 t:.}0wuXk../H... B5 C7 CA 9D 42 EA 9B BF B6 74 E5 19 8F EF F1 8A ....B....t...... 2D 7E D0 55 0A 92 3E 72 CF 5F 89 53 FC 85 2F 25 -~.U..>r._.S../% 72 8F B4 DD 40 BF 33 46 82 6D 21 98 8E A7 A0 A5 [email protected]!..... 2E 32 54 ED 41 D4 2F C4 B6 6E BE 55 C0 95 78 7C .2T.A./..n.U..x| 35 CA 06 4D 59 32 68 C8 3D 77 7A 73 FA 6A 78 1C 5..MY2h.=wzs.jx. 90 C7 CD 48 6D AE 0E 74 39 A0 4C F4 4E AC 49 06 ...Hm..t9.L.N.I. A2 3F F3 BB 24 B7 05 7C B3 00 70 2E 65 E1 ED 1A .?..$..|..p.e... 96 4C 93 CB A6 F5 68 B5 83 F8 08 F1 5C F2 9F 32 .L....h.....\..2 E1 F7 47 CF 2D 0B 35 DA 6A B5 D0 6D 49 9D 61 63 ..G.-.5.j..mI.ac 75 F2 4B 18 1F 02 C6 E4 9A 23 95 FE 21 6B A4 3E u.K......#..!k.> 06 40 CB 23 34 68 8F A1 C7 3C 98 20 14 8F 20 63 .@.#4h...<. .. c F7 FB 37 2B CC B9 2F 97 ED 5B 92 8D 96 84 0C 08 ..7+../..[...... E5 D4 29 A1 DF 4D 5B 33 EE 68 D3 F1 29 54 DF 0C ..)..M[3.h..)T.. F0 37 44 4A DF 2F 07 68 49 9B 09 0A C1 C7 EC 89 .7DJ./.hI....... 50 CA 40 D3 5B A5 27 69 12 7E 49 34 1A F8 26 9C P.@.[.'i.~I4..&. 44 A0 87 C7 BC CB 46 8A 33 25 94 F6 89 72 64 E0 D.....F.3%...rd. F0 AB 16 DB 52 A1 BE AC 3C 8B D6 CC 22 C7 0F B4 ....R...<..."... 86 6B BF EE A8 7E 1F 74 C7 34 14 AF 7C 50 BC 7F .k...~.t.4..|P.. 
42 0C B8 98 8C C3 EC D6 FC 51 CE 1F B3 7D A1 48 B........Q...}.H 1D 89 96 AB 79 AA E0 A5 B8 F5 39 7C 27 4C 25 D0 ....y.....9|'L%. 5A 0C 81 13 07 19 6E 81 1C 3C 9F E5 1A 6D BA 18 Z.....n..<...m.. DC 35 51 90 A1 1D 8E 57 7A 0A 56 BB 09 CB 3D 81 .5Q....Wz.V...=. 8F C5 84 83 88 ED CD 89 DB 81 4D F6 C7 04 A9 71 ..........M....q 43 65 FB 05 A4 56 E4 91 21 B1 AB 44 85 D8 12 BA Ce...V..!..D.... CD 65 AA BA 32 D1 B7 FA 84 0E 18 56 BF 2E A5 10 .e..2......V.... 72 C8 89 B8 6A 3B 75 33 3F 5F E4 77 24 EF 0C 13 r...j;u3?_.w$... A8 56 BB 68 E3 88 D8 AF 18 83 02 B9 B1 2A E8 83 .V.h.........*.. 33 2C 72 B4 49 9C F8 F3 92 03 2A 34 FB 4B 88 D6 3,r.I.....*4.K.. A3 FC C2 3D 14 2D 40 4C 4F A6 26 9F 17 22 F9 F3 ...=.-@LO.&..".. EE 7E 3F 5D 5E DE B5 D3 55 D7 CE 9B A5 68 DB 81 .~?]^...U....h.. C9 B1 16 96 11 59 6C D7 19 22 F1 62 D3 24 EB E1 .....Yl..".b.$.. D1 51 9F 4E 6C B9 0F 7A 61 FE 4F 00 7E 88 9B EE .Q.Nl..za.O.~... 3E 27 7E 18 07 D9 27 F2 90 17 AA 11 7A 48 C5 57 >'~...'.....zH.W 81 62 77 B6 A1 DF 72 AF E0 43 46 12 91 F1 5C FA .bw...r..CF...\. 86 DF 7D 45 CF FC 45 63 21 A0 F7 6D 16 79 9F 14 ..}E..Ec!..m.y.. 91 92 09 FB 33 E0 89 93 EF 95 F4 35 F3 B4 32 30 ....3......5..20 9A 0C 97 EE CF 9B 5D 73 07 E9 DC 74 B8 ED 48 00 ......]s...t..H. DF 00 0A 69 6B F3 88 30 73 ED 98 8E 7C C8 FC 2C ...ik..0s...|.., 0E 0C 84 74 3F 7A B2 CA 93 2F 21 AF 4F 62 D7 61 ...t?z.../!.Ob.a
04 56 28 30 61 91 C2 78 2D 04 63 2A E0 86 9C 84 .V(0a..x-.c*.... 72 36 49 6E B7 91 F4 43 C2 A2 4C 03 6C F4 5B 14 r6In...C..L.l.[. 99 A2 12 3C A0 E3 18 CD BA 11 DF 0F 03 E0 A7 34 ...<...........4 F9 7A 22 EE 09 62 1C 7B 24 DA 73 A8 5D 41 92 77 .z"..b.{$.s.]A.w E0 7D 44 E1 C0 27 A0 14 48 BE 5C 7B 89 39 25 34 .}D..'..H.\{.9%4 08 6E D6 0C 47 72 1B 96 DF 06 7E 9D 39 FE 3D 5E .n..Gr....~.9.=^ 04 D9 4F 96 4A E1 C8 B9 D5 33 26 AC E7 13 A2 F6 ..O.J....3&..... F2 4C 0F 22 E5 89 45 32 7E 03 CF 3A 53 F0 0E A6 .L."..E2~..:S... 8C 01 D3 FB 5B 0A 44 BF 7A 81 78 81 D7 63 AA 5F ....[.D.z.x..c._ 23 B0 23 7A B0 5C 12 75 E5 80 CD 47 AE FF 83 AE #.#z.\.u...G.... 46 B0 E9 3B 76 44 09 43 31 22 94 FE 1E 36 F7 40 F..;vD.C1"...6.@ A7 20 A4 80 04 E1 23 25 B9 1E 63 A2 11 4C 12 57 . ....#%..c..L.W 16 AC E2 00 A1 4B C9 24 C1 60 7C 4C 5C 7A 7E F7 .....K.$.`|L\z~. 6D 99 03 26 58 B4 DB EF A7 CE BE 68 EA 5A 4C F2 m..&X......h.ZL. 0F 07 7B 2E A2 7C A3 DD 71 0A AF 96 2A 47 9D D3 ..{..|..q...*G.. 54 42 5B 38 03 4A 4C CB 65 BE A2 C3 6B ED DD EB TB[8.JL.e...k... F6 D0 37 9D 00 66 E1 CA 8A 89 A5 03 5E A2 62 66 ..7..f......^.bf 07 EB F4 21 88 19 8C 06 44 E5 34 9D 9B 3D 6B 6E ...!....D.4..=kn CA 84 97 98 79 C1 EF 6A E9 7B 26 5B 03 73 61 6F ....y..j.{&[.sao 68 D1 03 E3 D6 D9 71 4E 08 BE 16 CE 6A 27 6E BE h.....qN....j'n. 4F 5E E4 28 61 D9 55 FA 67 26 90 C5 52 76 D6 2D O^.(a.U.g&..Rv.- 9E 6E F5 C7 0C 87 A2 7B BA 4A 26 0C FB 4F 65 1A .n.....{.J&..Oe. 70 2F 44 98 8C 24 B6 91 60 91 39 FB D0 B7 7A E9 p/D..$..`.9...z. 24 0D D5 51 14 49 7D 0F 11 39 94 87 5D C8 7F 63 $..Q.I}..9..]..c 7C 8D C0 C8 6E C1 C5 D5 CD 39 9F 61 4A 76 9A 07 |...n....9.aJv.. 9D 7B 03 2B 80 4F 30 48 F1 F1 AF 2F AB 9B CC 88 .{.+.O0H.../.... 8D 51 3B A6 A0 C3 99 77 BF 56 86 36 3F 9E D9 94 .Q;....w.V.6?... 67 17 9C B7 3E C0 B0 16 85 21 61 78 BE 2B 4C DC g...>....!ax.+L. 71 A2 9A C9 8D 2F 60 D5 EA CD E1 D8 05 8D FA 4F q..../`........O D1 33 54 88 D1 73 47 AA 65 F2 30 DD 61 01 82 DC .3T..sG.e.0.a... 
2E 17 62 5D 87 F2 D7 88 4D E8 CD 50 BB 67 67 E3 ..b]....M..P.gg. D7 D0 96 89 A2 9C 7F AB 56 F6 BF FD 88 CA 0B 95 ........V....... 3C B9 85 65 7C 0F D9 89 76 8F 74 F6 DE 1A 7B 99 <..e|...v.t...{. 06 4F 18 AF DC DE 18 D0 75 FD 80 AD 0E 8B 9A D0 .O......u....... DD F6 A7 E3 55 95 E8 FB 5A A9 AE 17 D7 0D DA B2 ....U...Z....... FF 1D B0 0A AD 38 6C C0 1B BB 50 2E 85 49 F3 20 .....8l...P..I. 21 C2 A8 17 EF 70 1D EA EC E4 99 C0 DC 6F A5 96 !....p.......o.. DC D9 FD 90 73 FF 22 03 F0 C1 7E 2F 75 5F 6F 36 ....s."...~/u_o6 A5 8E 1C FE C1 CB B1 CC D4 C6 2C 0E FA 51 15 43 ..........,..Q.C B0 70 2F E9 E5 A2 23 75 63 D8 2C D5 2B AD 36 EB .p/...#uc.,.+.6. 8A 52 7D EE FA C0 15 F5 1B 21 9C 18 D0 76 06 52 .R}......!...v.R FC 48 E2 D2 4F FD 0E 7C 85 C8 A4 C2 8E 7A 5A 27 .H..O..|.....zZ' 37 D8 4C E5 1A E6 94 9B A6 30 A3 BB 9C EC 59 ED 7.L......0....Y. F6 94 49 51 46 1B D8 CE 98 F2 D1 0A 2F C2 07 3C ..IQF......./..< 87 58 FC EB .X.. 4A D5 ED AE 36 5C DA 65 2D BF 11 5B 5D B3 B6 08 J...6\.e-..[]...
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
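Each payload line in the dump above holds up to 16 bytes as space-separated hex pairs, followed by an ASCII rendering. A sketch of recovering the raw bytes from one such line; the termination rule (stop at the first token that is not a two-character hex pair) is an assumption about the layout, not something the capture tool guarantees:

```python
import string

# Sketch: parse one Snort-style payload line into raw bytes.
# Assumption about the format: the line is a run of two-character
# hex pairs followed by an ASCII column; we consume tokens only
# while they still look like hex pairs.

HEX = set(string.hexdigits)

def parse_payload_line(line):
    out = bytearray()
    for tok in line.split():
        if len(tok) == 2 and set(tok) <= HEX:
            out.append(int(tok, 16))
        else:
            break  # reached the ASCII rendering column
    return bytes(out)

# First payload line from the dump above.
sample = "AA A8 5A 92 A7 BF DF 32 7D BF F7 7B 1B 5C 35 47 ..Z....2}..{.\\5G"
data = parse_payload_line(sample)
print(len(data))  # 16
```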
Appendix E
Testing using one type of attack (mailbomb):
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2 Relation: ids Instances: 6600 Attributes: 42 duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_host_login is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class Test mode: user supplied test set: 41 instances
=== Classifier model (full training set) === J48 pruned tree ------------------ same_srv_rate <= 0.21 | diff_srv_rate <= 0.23: neptune. (1244.0) | diff_srv_rate > 0.23 | | dst_host_same_srv_rate <= 0.01 | | | dst_host_rerror_rate <= 0.99 | | | | dst_host_serror_rate <= 0.11 | | | | | serror_rate <= 0.03: satan. (18.0/2.0) | | | | | serror_rate > 0.03: saint. (14.0) | | | | dst_host_serror_rate > 0.11: satan. (9.0) | | | dst_host_rerror_rate > 0.99: satan. (11.0) | | dst_host_same_srv_rate > 0.01: normal. (4.0/1.0) same_srv_rate > 0.21 | count <= 72 | | src_bytes <= 2564 | | | duration <= 2 | | | | flag = REJ | | | | | srv_diff_host_rate <= 0.4 | | | | | | srv_count <= 5: httptunnel. (2.0/1.0) | | | | | | srv_count > 5: neptune. (2.0/1.0) | | | | | srv_diff_host_rate > 0.4: mscan. (7.0) | | | | flag = SF | | | | | protocol_type = udp | | | | | | src_bytes <= 55 | | | | | | | service = private | | | | | | | | srv_count <= 8: snmpguess. (49.0) | | | | | | | | srv_count > 8: normal. (3.0) | | | | | | | service = ecr_i: normal. (0.0) | | | | | | | service = ntp_u: normal. (1.0) | | | | | | | service = X11: normal. (0.0) | | | | | | | service = IRC: normal. (0.0) | | | | | | | service = courier: normal. (0.0) | | | | | | | service = ctf: normal. (0.0) | | | | | | | service = ssh: normal. (0.0) | | | | | | | service = time: normal. (0.0) | | | | | | | service = ldap: normal. (0.0) | | | | | | | service = supdup: normal. (0.0) | | | | | | | service = Z39_50: normal. (0.0) | | | | | | | service = discard: normal. (0.0) | | | | | | | service = bgp: normal. (0.0) | | | | | | | service = kshell: normal. (0.0) | | | | | | | service = urp_i: normal. (0.0) | | | | | | | service = uucp: normal. (0.0) | | | | | | | service = uucp_path: normal. (0.0) | | | | | | | service = netstat: normal. (0.0) | | | | | | | service = nnsp: normal. (0.0) | | | | | | | service = http_443: normal. (0.0) | | | | | | | service = iso_tsap: normal. (0.0) | | | | | | | service = http: normal. 
(0.0) | | | | | | | service = pop_3: normal. (0.0) | | | | | | | service = other: normal. (0.0)
| | | | | | | service = domain_u: normal. (54.0) | | | | | | | service = exec: normal. (0.0) | | | | | | | service = telnet: normal. (0.0) | | | | | | | service = ftp: normal. (0.0) | | | | | | | service = sql_net: normal. (0.0) | | | | | | | service = ftp_data: normal. (0.0) | | | | | | | service = smtp: normal. (0.0) | | | | | | | service = sunrpc: normal. (0.0) | | | | | | | service = shell: normal. (0.0) | | | | | | | service = vmnet: normal. (0.0) | | | | | | | service = netbios_dgm: normal. (0.0) | | | | | | | service = efs: normal. (0.0) | | | | | | | service = finger: normal. (0.0) | | | | | | | service = rje: normal. (0.0) | | | | | | | service = mtp: normal. (0.0) | | | | | | | service = imap4: normal. (0.0) | | | | | | | service = link: normal. (0.0) | | | | | | | service = auth: normal. (0.0) | | | | | | | service = domain: normal. (0.0) | | | | | | | service = nntp: normal. (0.0) | | | | | | | service = daytime: normal. (0.0) | | | | | | | service = gopher: normal. (0.0) | | | | | | | service = echo: normal. (0.0) | | | | | | | service = printer: normal. (0.0) | | | | | | | service = whois: normal. (0.0) | | | | | | | service = klogin: normal. (0.0) | | | | | | | service = ecr: normal. (0.0) | | | | | | | service = netbios_ssn: normal. (0.0) | | | | | | | service = remote_job: normal. (0.0) | | | | | | | service = hostnames: normal. (0.0) | | | | | | | service = eco_i: normal. (0.0) | | | | | | | service = login: normal. (0.0) | | | | | | | service = pop_2: normal. (0.0) | | | | | | | service = netbios_ns: normal. (0.0) | | | | | | src_bytes > 55 | | | | | | | count <= 1 | | | | | | | | dst_host_srv_count <= 253 | | | | | | | | | dst_bytes <= 124: normal. (79.0/3.0) | | | | | | | | | dst_bytes > 124: snmpgetattack. (51.0/15.0) | | | | | | | | dst_host_srv_count > 253: normal. (106.0/48.0) | | | | | | | count > 1: normal. 
(175.0/57.0) | | | | | protocol_type = tcp | | | | | | num_failed_logins <= 0 | | | | | | | dst_host_srv_diff_host_rate <= 0.11 | | | | | | | | dst_host_same_srv_rate <= 0.04 | | | | | | | | | src_bytes <= 20: normal. (2.0) | | | | | | | | | src_bytes > 20: guess_passwd. (3.0) | | | | | | | | dst_host_same_srv_rate > 0.04 | | | | | | | | | dst_host_srv_count <= 5: warezmaster. (3.0/1.0) | | | | | | | | | dst_host_srv_count > 5: normal. (882.0) | | | | | | | dst_host_srv_diff_host_rate > 0.11 | | | | | | | | dst_bytes <= 15: warezmaster. (7.0) | | | | | | | | dst_bytes > 15: normal. (5.0) | | | | | | num_failed_logins > 0: guess_passwd. (13.0) | | | | | protocol_type = icmp | | | | | | srv_diff_host_rate <= 0.57
| | | | | | | dst_host_srv_diff_host_rate <= 0.67 | | | | | | | | count <= 13: normal. (5.0) | | | | | | | | count > 13: smurf. (2.0) | | | | | | | dst_host_srv_diff_host_rate > 0.67: ipsweep. (5.0) | | | | | | srv_diff_host_rate > 0.57 | | | | | | | service = private: pod. (0.0) | | | | | | | service = ecr_i: pod. (5.0) | | | | | | | service = ntp_u: pod. (0.0) | | | | | | | service = X11: pod. (0.0) | | | | | | | service = IRC: pod. (0.0) | | | | | | | service = courier: pod. (0.0) | | | | | | | service = ctf: pod. (0.0) | | | | | | | service = ssh: pod. (0.0) | | | | | | | service = time: pod. (0.0) | | | | | | | service = ldap: pod. (0.0) | | | | | | | service = supdup: pod. (0.0) | | | | | | | service = Z39_50: pod. (0.0) | | | | | | | service = discard: pod. (0.0) | | | | | | | service = bgp: pod. (0.0) | | | | | | | service = kshell: pod. (0.0) | | | | | | | service = urp_i: pod. (0.0) | | | | | | | service = uucp: pod. (0.0) | | | | | | | service = uucp_path: pod. (0.0) | | | | | | | service = netstat: pod. (0.0) | | | | | | | service = nnsp: pod. (0.0) | | | | | | | service = http_443: pod. (0.0) | | | | | | | service = iso_tsap: pod. (0.0) | | | | | | | service = http: pod. (0.0) | | | | | | | service = pop_3: pod. (0.0) | | | | | | | service = other: pod. (0.0) | | | | | | | service = domain_u: pod. (0.0) | | | | | | | service = exec: pod. (0.0) | | | | | | | service = telnet: pod. (0.0) | | | | | | | service = ftp: pod. (0.0) | | | | | | | service = sql_net: pod. (0.0) | | | | | | | service = ftp_data: pod. (0.0) | | | | | | | service = smtp: pod. (0.0) | | | | | | | service = sunrpc: pod. (0.0) | | | | | | | service = shell: pod. (0.0) | | | | | | | service = vmnet: pod. (0.0) | | | | | | | service = netbios_dgm: pod. (0.0) | | | | | | | service = efs: pod. (0.0) | | | | | | | service = finger: pod. (0.0) | | | | | | | service = rje: pod. (0.0) | | | | | | | service = mtp: pod. (0.0) | | | | | | | service = imap4: pod. 
(0.0) | | | | | | | service = link: pod. (0.0) | | | | | | | service = auth: pod. (0.0) | | | | | | | service = domain: pod. (0.0) | | | | | | | service = nntp: pod. (0.0) | | | | | | | service = daytime: pod. (0.0) | | | | | | | service = gopher: pod. (0.0) | | | | | | | service = echo: pod. (0.0) | | | | | | | service = printer: pod. (0.0) | | | | | | | service = whois: pod. (0.0) | | | | | | | service = klogin: pod. (0.0)
| | | | | | | service = ecr: pod. (0.0) | | | | | | | service = netbios_ssn: pod. (0.0) | | | | | | | service = remote_job: pod. (0.0) | | | | | | | service = hostnames: pod. (0.0) | | | | | | | service = eco_i: saint. (2.0) | | | | | | | service = login: pod. (0.0) | | | | | | | service = pop_2: pod. (0.0) | | | | | | | service = netbios_ns: pod. (0.0) | | | | flag = S1: normal. (0.0) | | | | flag = S0 | | | | | dst_host_rerror_rate <= 0.28: mscan. (6.0/1.0) | | | | | dst_host_rerror_rate > 0.28: apache2. (3.0) | | | | flag = RSTO | | | | | src_bytes <= 55: mscan. (6.0) | | | | | src_bytes > 55: guess_passwd. (2.0) | | | | flag = S3: processtable. (6.0) | | | | flag = S2: normal. (0.0) | | | | flag = RSTR: portsweep. (7.0/2.0) | | | | flag = SH: nmap. (1.0) | | | duration > 2 | | | | src_bytes <= 55 | | | | | src_bytes <= 24 | | | | | | srv_rerror_rate <= 0.17: processtable. (16.0/1.0) | | | | | | srv_rerror_rate > 0.17: mscan. (5.0) | | | | | src_bytes > 24: guess_passwd. (73.0) | | | | src_bytes > 55 | | | | | duration <= 261: normal. (14.0) | | | | | duration > 261: warezmaster. (15.0/1.0) | | src_bytes > 2564 | | | num_compromised <= 0 | | | | src_bytes <= 2599: mailbomb. (99.0) | | | | src_bytes > 2599 | | | | | flag = REJ: normal. (0.0) | | | | | flag = SF | | | | | | duration <= 38: normal. (17.0/1.0) | | | | | | duration > 38: warezmaster. (12.0) | | | | | flag = S1: normal. (0.0) | | | | | flag = S0: normal. (0.0) | | | | | flag = RSTO: normal. (0.0) | | | | | flag = S3: normal. (0.0) | | | | | flag = S2: normal. (0.0) | | | | | flag = RSTR: apache2. (10.0) | | | | | flag = SH: normal. (0.0) | | | num_compromised > 0: back. (25.0) | count > 72 | | protocol_type = udp: normal. (28.0) | | protocol_type = tcp | | | service = private: neptune. (0.0) | | | service = ecr_i: neptune. (0.0) | | | service = ntp_u: neptune. (0.0) | | | service = X11: neptune. (0.0) | | | service = IRC: neptune. (0.0) | | | service = courier: neptune. 
(0.0) | | | service = ctf: neptune. (0.0) | | | service = ssh: neptune. (0.0) | | | service = time: neptune. (0.0)
| | | service = ldap: neptune. (0.0) | | | service = supdup: neptune. (0.0) | | | service = Z39_50: neptune. (0.0) | | | service = discard: neptune. (0.0) | | | service = bgp: neptune. (0.0) | | | service = kshell: neptune. (0.0) | | | service = urp_i: neptune. (0.0) | | | service = uucp: neptune. (0.0) | | | service = uucp_path: neptune. (0.0) | | | service = netstat: neptune. (0.0) | | | service = nnsp: neptune. (0.0) | | | service = http_443: neptune. (0.0) | | | service = iso_tsap: neptune. (0.0) | | | service = http: apache2. (2.0) | | | service = pop_3: neptune. (0.0) | | | service = other: neptune. (0.0) | | | service = domain_u: neptune. (0.0) | | | service = exec: neptune. (0.0) | | | service = telnet: neptune. (5.0) | | | service = ftp: neptune. (0.0) | | | service = sql_net: neptune. (0.0) | | | service = ftp_data: neptune. (0.0) | | | service = smtp: neptune. (0.0) | | | service = sunrpc: neptune. (0.0) | | | service = shell: neptune. (0.0) | | | service = vmnet: neptune. (0.0) | | | service = netbios_dgm: neptune. (0.0) | | | service = efs: neptune. (0.0) | | | service = finger: neptune. (0.0) | | | service = rje: neptune. (0.0) | | | service = mtp: neptune. (0.0) | | | service = imap4: neptune. (0.0) | | | service = link: neptune. (0.0) | | | service = auth: neptune. (0.0) | | | service = domain: neptune. (0.0) | | | service = nntp: neptune. (0.0) | | | service = daytime: neptune. (0.0) | | | service = gopher: neptune. (0.0) | | | service = echo: neptune. (0.0) | | | service = printer: neptune. (0.0) | | | service = whois: neptune. (0.0) | | | service = klogin: neptune. (0.0) | | | service = ecr: neptune. (0.0) | | | service = netbios_ssn: neptune. (0.0) | | | service = remote_job: neptune. (0.0) | | | service = hostnames: neptune. (0.0) | | | service = eco_i: neptune. (0.0) | | | service = login: neptune. (0.0) | | | service = pop_2: neptune. (0.0) | | | service = netbios_ns: neptune. (0.0) | | protocol_type = icmp: smurf. 
(3485.0) Number of Leaves : 229 Size of the tree : 270
Time taken to build model: 1.8 seconds
=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances 41 100 % Incorrectly Classified Instances 0 0 % Kappa statistic 1 Mean absolute error 0 Root mean squared error 0 Relative absolute error 0 % Root relative squared error 0 % Total Number of Instances 41
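The Kappa statistic reported in these summaries compares the observed agreement against the agreement expected by chance; a value of 1 indicates perfect classification. A minimal sketch of the computation with hypothetical counts (not the 41-instance run above):

```python
# Sketch: Cohen's kappa from a square confusion matrix given as a
# list of rows. The example matrix is hypothetical, for illustration.

def cohens_kappa(matrix):
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    expected = sum(
        sum(row) * sum(m[i] for m in matrix) / total**2
        for i, row in enumerate(matrix)
    )
    return (observed - expected) / (1 - expected)

print(round(cohens_kappa([[20, 5], [10, 65]]), 3))  # 0.625
```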
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure Class 0 0 0 0 0 back. 0 0 0 0 0 snmpguess. 0 0 0 0 0 apache2. 0 0 0 0 0 httptunnel. 0 0 0 0 0 mscan. 0 0 0 0 0 saint. 0 0 0 0 0 snmpgetattack. 0 0 0 0 0 processtable. 1 0 1 1 1 mailbomb. 0 0 0 0 0 buffer_overflow. 0 0 0 0 0 ftp_write. 0 0 0 0 0 guess_passwd. 0 0 0 0 0 imap. 0 0 0 0 0 ipsweep. 0 0 0 0 0 land. 0 0 0 0 0 loadmodule. 0 0 0 0 0 multihop. 0 0 0 0 0 neptune. 0 0 0 0 0 nmap. 0 0 0 0 0 normal. 0 0 0 0 0 perl. 0 0 0 0 0 phf. 0 0 0 0 0 pod. 0 0 0 0 0 portsweep. 0 0 0 0 0 rootkit. 0 0 0 0 0 satan. 0 0 0 0 0 smurf.
0 0 0 0 0 spy. 0 0 0 0 0 teardrop. 0 0 0 0 0 warezclient. 0 0 0 0 0 warezmaster.
=== Confusion Matrix ===
a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae <-- classified as 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = back. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = snmpguess. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = apache2. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = httptunnel. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = mscan. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = saint. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | g = snmpgetattack. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | h = processtable. 0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | i = mailbomb. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | j = buffer_overflow. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | k = ftp_write. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | l = guess_passwd. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | m = imap. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | n = ipsweep. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | o = land. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | p = loadmodule. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | q = multihop. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | r = neptune. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | s = nmap. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | t = normal. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | u = perl. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | v = phf. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | w = pod. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | x = portsweep. 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | y = rootkit. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | z = satan. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | aa = smurf. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ab = spy. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ac = teardrop. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ad = warezclient. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ae = warezmaster.
Testing using two types of attacks (mailbomb and neptune):
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2 Relation: ids Instances: 6600 Attributes: 42 duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_host_login is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class Test mode: user supplied test set: 160 instances
=== Classifier model (full training set) === J48 pruned tree ------------------ same_srv_rate <= 0.21 | diff_srv_rate <= 0.23: neptune. (1244.0) | diff_srv_rate > 0.23 | | dst_host_same_srv_rate <= 0.01 | | | dst_host_rerror_rate <= 0.99 | | | | dst_host_serror_rate <= 0.11 | | | | | serror_rate <= 0.03: satan. (18.0/2.0) | | | | | serror_rate > 0.03: saint. (14.0) | | | | dst_host_serror_rate > 0.11: satan. (9.0) | | | dst_host_rerror_rate > 0.99: satan. (11.0) | | dst_host_same_srv_rate > 0.01: normal. (4.0/1.0) same_srv_rate > 0.21 | count <= 72 | | src_bytes <= 2564 | | | duration <= 2 | | | | flag = REJ | | | | | srv_diff_host_rate <= 0.4 | | | | | | srv_count <= 5: httptunnel. (2.0/1.0) | | | | | | srv_count > 5: neptune. (2.0/1.0) | | | | | srv_diff_host_rate > 0.4: mscan. (7.0) | | | | flag = SF | | | | | protocol_type = udp | | | | | | src_bytes <= 55 | | | | | | | service = private | | | | | | | | srv_count <= 8: snmpguess. (49.0) | | | | | | | | srv_count > 8: normal. (3.0) | | | | | | | service = ecr_i: normal. (0.0) | | | | | | | service = ntp_u: normal. (1.0) | | | | | | | service = X11: normal. (0.0) | | | | | | | service = IRC: normal. (0.0) | | | | | | | service = courier: normal. (0.0) | | | | | | | service = ctf: normal. (0.0) | | | | | | | service = ssh: normal. (0.0) | | | | | | | service = time: normal. (0.0) | | | | | | | service = ldap: normal. (0.0) | | | | | | | service = supdup: normal. (0.0) | | | | | | | service = Z39_50: normal. (0.0) | | | | | | | service = discard: normal. (0.0) | | | | | | | service = bgp: normal. (0.0) | | | | | | | service = kshell: normal. (0.0) | | | | | | | service = urp_i: normal. (0.0) | | | | | | | service = uucp: normal. (0.0) | | | | | | | service = uucp_path: normal. (0.0) | | | | | | | service = netstat: normal. (0.0) | | | | | | | service = nnsp: normal. (0.0) | | | | | | | service = http_443: normal. (0.0) | | | | | | | service = iso_tsap: normal. (0.0) | | | | | | | service = http: normal. 
(0.0) | | | | | | | service = pop_3: normal. (0.0) | | | | | | | service = other: normal. (0.0) | | | | | | | service = domain_u: normal. (54.0) | | | | | | | service = exec: normal. (0.0) | | | | | | | service = telnet: normal. (0.0) | | | | | | | service = ftp: normal. (0.0) | | | | | | | service = sql_net: normal. (0.0)
83
Firoz Allahwali
| | | | | | | service = ftp_data: normal. (0.0) | | | | | | | service = smtp: normal. (0.0) | | | | | | | service = sunrpc: normal. (0.0) | | | | | | | service = shell: normal. (0.0) | | | | | | | service = vmnet: normal. (0.0) | | | | | | | service = netbios_dgm: normal. (0.0) | | | | | | | service = efs: normal. (0.0) | | | | | | | service = finger: normal. (0.0) | | | | | | | service = rje: normal. (0.0) | | | | | | | service = mtp: normal. (0.0) | | | | | | | service = imap4: normal. (0.0) | | | | | | | service = link: normal. (0.0) | | | | | | | service = auth: normal. (0.0) | | | | | | | service = domain: normal. (0.0) | | | | | | | service = nntp: normal. (0.0) | | | | | | | service = daytime: normal. (0.0) | | | | | | | service = gopher: normal. (0.0) | | | | | | | service = echo: normal. (0.0) | | | | | | | service = printer: normal. (0.0) | | | | | | | service = whois: normal. (0.0) | | | | | | | service = klogin: normal. (0.0) | | | | | | | service = ecr: normal. (0.0) | | | | | | | service = netbios_ssn: normal. (0.0) | | | | | | | service = remote_job: normal. (0.0) | | | | | | | service = hostnames: normal. (0.0) | | | | | | | service = eco_i: normal. (0.0) | | | | | | | service = login: normal. (0.0) | | | | | | | service = pop_2: normal. (0.0) | | | | | | | service = netbios_ns: normal. (0.0) | | | | | | src_bytes > 55 | | | | | | | count <= 1 | | | | | | | | dst_host_srv_count <= 253 | | | | | | | | | dst_bytes <= 124: normal. (79.0/3.0) | | | | | | | | | dst_bytes > 124: snmpgetattack. (51.0/15.0) | | | | | | | | dst_host_srv_count > 253: normal. (106.0/48.0) | | | | | | | count > 1: normal. (175.0/57.0) | | | | | protocol_type = tcp | | | | | | num_failed_logins <= 0 | | | | | | | dst_host_srv_diff_host_rate <= 0.11 | | | | | | | | dst_host_same_srv_rate <= 0.04 | | | | | | | | | src_bytes <= 20: normal. (2.0) | | | | | | | | | src_bytes > 20: guess_passwd. 
(3.0) | | | | | | | | dst_host_same_srv_rate > 0.04 | | | | | | | | | dst_host_srv_count <= 5: warezmaster. (3.0/1.0) | | | | | | | | | dst_host_srv_count > 5: normal. (882.0) | | | | | | | dst_host_srv_diff_host_rate > 0.11 | | | | | | | | dst_bytes <= 15: warezmaster. (7.0) | | | | | | | | dst_bytes > 15: normal. (5.0) | | | | | | num_failed_logins > 0: guess_passwd. (13.0) | | | | | protocol_type = icmp | | | | | | srv_diff_host_rate <= 0.57 | | | | | | | dst_host_srv_diff_host_rate <= 0.67 | | | | | | | | count <= 13: normal. (5.0) | | | | | | | | count > 13: smurf. (2.0) | | | | | | | dst_host_srv_diff_host_rate > 0.67: ipsweep. (5.0) | | | | | | srv_diff_host_rate > 0.57 | | | | | | | service = private: pod. (0.0) | | | | | | | service = ecr_i: pod. (5.0) | | | | | | | service = ntp_u: pod. (0.0) | | | | | | | service = X11: pod. (0.0) | | | | | | | service = IRC: pod. (0.0) | | | | | | | service = courier: pod. (0.0)
| | | | | | | service = ctf: pod. (0.0) | | | | | | | service = ssh: pod. (0.0) | | | | | | | service = time: pod. (0.0) | | | | | | | service = ldap: pod. (0.0) | | | | | | | service = supdup: pod. (0.0) | | | | | | | service = Z39_50: pod. (0.0) | | | | | | | service = discard: pod. (0.0) | | | | | | | service = bgp: pod. (0.0) | | | | | | | service = kshell: pod. (0.0) | | | | | | | service = urp_i: pod. (0.0) | | | | | | | service = uucp: pod. (0.0) | | | | | | | service = uucp_path: pod. (0.0) | | | | | | | service = netstat: pod. (0.0) | | | | | | | service = nnsp: pod. (0.0) | | | | | | | service = http_443: pod. (0.0) | | | | | | | service = iso_tsap: pod. (0.0) | | | | | | | service = http: pod. (0.0) | | | | | | | service = pop_3: pod. (0.0) | | | | | | | service = other: pod. (0.0) | | | | | | | service = domain_u: pod. (0.0) | | | | | | | service = exec: pod. (0.0) | | | | | | | service = telnet: pod. (0.0) | | | | | | | service = ftp: pod. (0.0) | | | | | | | service = sql_net: pod. (0.0) | | | | | | | service = ftp_data: pod. (0.0) | | | | | | | service = smtp: pod. (0.0) | | | | | | | service = sunrpc: pod. (0.0) | | | | | | | service = shell: pod. (0.0) | | | | | | | service = vmnet: pod. (0.0) | | | | | | | service = netbios_dgm: pod. (0.0) | | | | | | | service = efs: pod. (0.0) | | | | | | | service = finger: pod. (0.0) | | | | | | | service = rje: pod. (0.0) | | | | | | | service = mtp: pod. (0.0) | | | | | | | service = imap4: pod. (0.0) | | | | | | | service = link: pod. (0.0) | | | | | | | service = auth: pod. (0.0) | | | | | | | service = domain: pod. (0.0) | | | | | | | service = nntp: pod. (0.0) | | | | | | | service = daytime: pod. (0.0) | | | | | | | service = gopher: pod. (0.0) | | | | | | | service = echo: pod. (0.0) | | | | | | | service = printer: pod. (0.0) | | | | | | | service = whois: pod. (0.0) | | | | | | | service = klogin: pod. (0.0) | | | | | | | service = ecr: pod. (0.0) | | | | | | | service = netbios_ssn: pod. 
(0.0) | | | | | | | service = remote_job: pod. (0.0) | | | | | | | service = hostnames: pod. (0.0) | | | | | | | service = eco_i: saint. (2.0) | | | | | | | service = login: pod. (0.0) | | | | | | | service = pop_2: pod. (0.0) | | | | | | | service = netbios_ns: pod. (0.0) | | | | flag = S1: normal. (0.0) | | | | flag = S0 | | | | | dst_host_rerror_rate <= 0.28: mscan. (6.0/1.0) | | | | | dst_host_rerror_rate > 0.28: apache2. (3.0) | | | | flag = RSTO | | | | | src_bytes <= 55: mscan. (6.0) | | | | | src_bytes > 55: guess_passwd. (2.0) | | | | flag = S3: processtable. (6.0) | | | | flag = S2: normal. (0.0)
| | | | flag = RSTR: portsweep. (7.0/2.0) | | | | flag = SH: nmap. (1.0) | | | duration > 2 | | | | src_bytes <= 55 | | | | | src_bytes <= 24 | | | | | | srv_rerror_rate <= 0.17: processtable. (16.0/1.0) | | | | | | srv_rerror_rate > 0.17: mscan. (5.0) | | | | | src_bytes > 24: guess_passwd. (73.0) | | | | src_bytes > 55 | | | | | duration <= 261: normal. (14.0) | | | | | duration > 261: warezmaster. (15.0/1.0) | | src_bytes > 2564 | | | num_compromised <= 0 | | | | src_bytes <= 2599: mailbomb. (99.0) | | | | src_bytes > 2599 | | | | | flag = REJ: normal. (0.0) | | | | | flag = SF | | | | | | duration <= 38: normal. (17.0/1.0) | | | | | | duration > 38: warezmaster. (12.0) | | | | | flag = S1: normal. (0.0) | | | | | flag = S0: normal. (0.0) | | | | | flag = RSTO: normal. (0.0) | | | | | flag = S3: normal. (0.0) | | | | | flag = S2: normal. (0.0) | | | | | flag = RSTR: apache2. (10.0) | | | | | flag = SH: normal. (0.0) | | | num_compromised > 0: back. (25.0) | count > 72 | | protocol_type = udp: normal. (28.0) | | protocol_type = tcp | | | service = private: neptune. (0.0) | | | service = ecr_i: neptune. (0.0) | | | service = ntp_u: neptune. (0.0) | | | service = X11: neptune. (0.0) | | | service = IRC: neptune. (0.0) | | | service = courier: neptune. (0.0) | | | service = ctf: neptune. (0.0) | | | service = ssh: neptune. (0.0) | | | service = time: neptune. (0.0) | | | service = ldap: neptune. (0.0) | | | service = supdup: neptune. (0.0) | | | service = Z39_50: neptune. (0.0) | | | service = discard: neptune. (0.0) | | | service = bgp: neptune. (0.0) | | | service = kshell: neptune. (0.0) | | | service = urp_i: neptune. (0.0) | | | service = uucp: neptune. (0.0) | | | service = uucp_path: neptune. (0.0) | | | service = netstat: neptune. (0.0) | | | service = nnsp: neptune. (0.0) | | | service = http_443: neptune. (0.0) | | | service = iso_tsap: neptune. (0.0) | | | service = http: apache2. (2.0) | | | service = pop_3: neptune. (0.0) | | | service = other: neptune. 
(0.0) | | | service = domain_u: neptune. (0.0) | | | service = exec: neptune. (0.0) | | | service = telnet: neptune. (5.0) | | | service = ftp: neptune. (0.0) | | | service = sql_net: neptune. (0.0) | | | service = ftp_data: neptune. (0.0) | | | service = smtp: neptune. (0.0)
| | | service = sunrpc: neptune. (0.0) | | | service = shell: neptune. (0.0) | | | service = vmnet: neptune. (0.0) | | | service = netbios_dgm: neptune. (0.0) | | | service = efs: neptune. (0.0) | | | service = finger: neptune. (0.0) | | | service = rje: neptune. (0.0) | | | service = mtp: neptune. (0.0) | | | service = imap4: neptune. (0.0) | | | service = link: neptune. (0.0) | | | service = auth: neptune. (0.0) | | | service = domain: neptune. (0.0) | | | service = nntp: neptune. (0.0) | | | service = daytime: neptune. (0.0) | | | service = gopher: neptune. (0.0) | | | service = echo: neptune. (0.0) | | | service = printer: neptune. (0.0) | | | service = whois: neptune. (0.0) | | | service = klogin: neptune. (0.0) | | | service = ecr: neptune. (0.0) | | | service = netbios_ssn: neptune. (0.0) | | | service = remote_job: neptune. (0.0) | | | service = hostnames: neptune. (0.0) | | | service = eco_i: neptune. (0.0) | | | service = login: neptune. (0.0) | | | service = pop_2: neptune. (0.0) | | | service = netbios_ns: neptune. (0.0) | | protocol_type = icmp: smurf. (3485.0)

Number of Leaves : 229
Size of the tree : 270

Time taken to build model: 1.59 seconds

=== Evaluation on test set ===
=== Summary ===

Correctly Classified Instances 160 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 160

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure Class
0 0 0 0 0 back.
0 0 0 0 0 snmpguess.
0 0 0 0 0 apache2.
0 0 0 0 0 httptunnel.
0 0 0 0 0 mscan.
0 0 0 0 0 saint.
0 0 0 0 0 snmpgetattack.
0 0 0 0 0 processtable.
1 0 1 1 1 mailbomb.
0 0 0 0 0 buffer_overflow.
0 0 0 0 0 ftp_write.
0 0 0 0 0 guess_passwd.
0 0 0 0 0 imap.
0 0 0 0 0 ipsweep.
0 0 0 0 0 land.
0 0 0 0 0 loadmodule.
0 0 0 0 0 multihop.
1 0 1 1 1 neptune.
0 0 0 0 0 nmap.
0 0 0 0 0 normal.
0 0 0 0 0 perl.
0 0 0 0 0 phf.
0 0 0 0 0 pod.
0 0 0 0 0 portsweep.
0 0 0 0 0 rootkit.
0 0 0 0 0 satan.
0 0 0 0 0 smurf.
0 0 0 0 0 spy.
0 0 0 0 0 teardrop.
0 0 0 0 0 warezclient.
0 0 0 0 0 warezmaster.

=== Confusion Matrix ===

a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae <-- classified as
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = back.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = snmpguess.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = apache2.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = httptunnel.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = mscan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = saint.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | g = snmpgetattack.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | h = processtable.
0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | i = mailbomb.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | j = buffer_overflow.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | k = ftp_write.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | l = guess_passwd.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | m = imap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | n = ipsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | o = land.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | p = loadmodule.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | q = multihop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 119 0 0 0 0 0 0 0 0 0 0 0 0 0 | r = neptune.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | s = nmap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | t = normal.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | u = perl.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | v = phf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | w = pod.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | x = portsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | y = rootkit.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | z = satan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | aa = smurf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ab = spy.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ac = teardrop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ad = warezclient.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ae = warezmaster.
Testing using three types of attacks (Mailbomb, Neptune, and Smurf):
=== Run information === Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2 Relation: ids Instances: 6600 Attributes: 42 duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_host_login is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate
dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class Test mode: user supplied test set: 237 instances === Classifier model (full training set) === J48 pruned tree ------------------ same_srv_rate <= 0.21 | diff_srv_rate <= 0.23: neptune. (1244.0) | diff_srv_rate > 0.23 | | dst_host_same_srv_rate <= 0.01 | | | dst_host_rerror_rate <= 0.99 | | | | dst_host_serror_rate <= 0.11 | | | | | serror_rate <= 0.03: satan. (18.0/2.0) | | | | | serror_rate > 0.03: saint. (14.0) | | | | dst_host_serror_rate > 0.11: satan. (9.0) | | | dst_host_rerror_rate > 0.99: satan. (11.0) | | dst_host_same_srv_rate > 0.01: normal. (4.0/1.0) same_srv_rate > 0.21 | count <= 72 | | src_bytes <= 2564 | | | duration <= 2 | | | | flag = REJ | | | | | srv_diff_host_rate <= 0.4 | | | | | | srv_count <= 5: httptunnel. (2.0/1.0) | | | | | | srv_count > 5: neptune. (2.0/1.0) | | | | | srv_diff_host_rate > 0.4: mscan. (7.0) | | | | flag = SF | | | | | protocol_type = udp | | | | | | src_bytes <= 55 | | | | | | | service = private | | | | | | | | srv_count <= 8: snmpguess. (49.0) | | | | | | | | srv_count > 8: normal. (3.0) | | | | | | | service = ecr_i: normal. (0.0) | | | | | | | service = ntp_u: normal. (1.0) | | | | | | | service = X11: normal. (0.0) | | | | | | | service = IRC: normal. (0.0) | | | | | | | service = courier: normal. (0.0) | | | | | | | service = ctf: normal. (0.0) | | | | | | | service = ssh: normal. (0.0) | | | | | | | service = time: normal. (0.0) | | | | | | | service = ldap: normal. (0.0) | | | | | | | service = supdup: normal. (0.0) | | | | | | | service = Z39_50: normal. (0.0) | | | | | | | service = discard: normal. (0.0) | | | | | | | service = bgp: normal. (0.0) | | | | | | | service = kshell: normal. (0.0) | | | | | | | service = urp_i: normal. (0.0) | | | | | | | service = uucp: normal. (0.0) | | | | | | | service = uucp_path: normal. (0.0) | | | | | | | service = netstat: normal. (0.0) | | | | | | | service = nnsp: normal. (0.0)
| | | | | | | service = http_443: normal. (0.0) | | | | | | | service = iso_tsap: normal. (0.0) | | | | | | | service = http: normal. (0.0) | | | | | | | service = pop_3: normal. (0.0) | | | | | | | service = other: normal. (0.0) | | | | | | | service = domain_u: normal. (54.0) | | | | | | | service = exec: normal. (0.0) | | | | | | | service = telnet: normal. (0.0) | | | | | | | service = ftp: normal. (0.0) | | | | | | | service = sql_net: normal. (0.0) | | | | | | | service = ftp_data: normal. (0.0) | | | | | | | service = smtp: normal. (0.0) | | | | | | | service = sunrpc: normal. (0.0) | | | | | | | service = shell: normal. (0.0) | | | | | | | service = vmnet: normal. (0.0) | | | | | | | service = netbios_dgm: normal. (0.0) | | | | | | | service = efs: normal. (0.0) | | | | | | | service = finger: normal. (0.0) | | | | | | | service = rje: normal. (0.0) | | | | | | | service = mtp: normal. (0.0) | | | | | | | service = imap4: normal. (0.0) | | | | | | | service = link: normal. (0.0) | | | | | | | service = auth: normal. (0.0) | | | | | | | service = domain: normal. (0.0) | | | | | | | service = nntp: normal. (0.0) | | | | | | | service = daytime: normal. (0.0) | | | | | | | service = gopher: normal. (0.0) | | | | | | | service = echo: normal. (0.0) | | | | | | | service = printer: normal. (0.0) | | | | | | | service = whois: normal. (0.0) | | | | | | | service = klogin: normal. (0.0) | | | | | | | service = ecr: normal. (0.0) | | | | | | | service = netbios_ssn: normal. (0.0) | | | | | | | service = remote_job: normal. (0.0) | | | | | | | service = hostnames: normal. (0.0) | | | | | | | service = eco_i: normal. (0.0) | | | | | | | service = login: normal. (0.0) | | | | | | | service = pop_2: normal. (0.0) | | | | | | | service = netbios_ns: normal. (0.0) | | | | | | src_bytes > 55 | | | | | | | count <= 1 | | | | | | | | dst_host_srv_count <= 253 | | | | | | | | | dst_bytes <= 124: normal. (79.0/3.0) | | | | | | | | | dst_bytes > 124: snmpgetattack. 
(51.0/15.0) | | | | | | | | dst_host_srv_count > 253: normal. (106.0/48.0) | | | | | | | count > 1: normal. (175.0/57.0) | | | | | protocol_type = tcp | | | | | | num_failed_logins <= 0 | | | | | | | dst_host_srv_diff_host_rate <= 0.11 | | | | | | | | dst_host_same_srv_rate <= 0.04 | | | | | | | | | src_bytes <= 20: normal. (2.0) | | | | | | | | | src_bytes > 20: guess_passwd. (3.0) | | | | | | | | dst_host_same_srv_rate > 0.04 | | | | | | | | | dst_host_srv_count <= 5: warezmaster. (3.0/1.0) | | | | | | | | | dst_host_srv_count > 5: normal. (882.0) | | | | | | | dst_host_srv_diff_host_rate > 0.11
| | | | | | | | dst_bytes <= 15: warezmaster. (7.0) | | | | | | | | dst_bytes > 15: normal. (5.0) | | | | | | num_failed_logins > 0: guess_passwd. (13.0) | | | | | protocol_type = icmp | | | | | | srv_diff_host_rate <= 0.57 | | | | | | | dst_host_srv_diff_host_rate <= 0.67 | | | | | | | | count <= 13: normal. (5.0) | | | | | | | | count > 13: smurf. (2.0) | | | | | | | dst_host_srv_diff_host_rate > 0.67: ipsweep. (5.0) | | | | | | srv_diff_host_rate > 0.57 | | | | | | | service = private: pod. (0.0) | | | | | | | service = ecr_i: pod. (5.0) | | | | | | | service = ntp_u: pod. (0.0) | | | | | | | service = X11: pod. (0.0) | | | | | | | service = IRC: pod. (0.0) | | | | | | | service = courier: pod. (0.0) | | | | | | | service = ctf: pod. (0.0) | | | | | | | service = ssh: pod. (0.0) | | | | | | | service = time: pod. (0.0) | | | | | | | service = ldap: pod. (0.0) | | | | | | | service = supdup: pod. (0.0) | | | | | | | service = Z39_50: pod. (0.0) | | | | | | | service = discard: pod. (0.0) | | | | | | | service = bgp: pod. (0.0) | | | | | | | service = kshell: pod. (0.0) | | | | | | | service = urp_i: pod. (0.0) | | | | | | | service = uucp: pod. (0.0) | | | | | | | service = uucp_path: pod. (0.0) | | | | | | | service = netstat: pod. (0.0) | | | | | | | service = nnsp: pod. (0.0) | | | | | | | service = http_443: pod. (0.0) | | | | | | | service = iso_tsap: pod. (0.0) | | | | | | | service = http: pod. (0.0) | | | | | | | service = pop_3: pod. (0.0) | | | | | | | service = other: pod. (0.0) | | | | | | | service = domain_u: pod. (0.0) | | | | | | | service = exec: pod. (0.0) | | | | | | | service = telnet: pod. (0.0) | | | | | | | service = ftp: pod. (0.0) | | | | | | | service = sql_net: pod. (0.0) | | | | | | | service = ftp_data: pod. (0.0) | | | | | | | service = smtp: pod. (0.0) | | | | | | | service = sunrpc: pod. (0.0) | | | | | | | service = shell: pod. (0.0) | | | | | | | service = vmnet: pod. (0.0) | | | | | | | service = netbios_dgm: pod. 
(0.0) | | | | | | | service = efs: pod. (0.0) | | | | | | | service = finger: pod. (0.0) | | | | | | | service = rje: pod. (0.0) | | | | | | | service = mtp: pod. (0.0) | | | | | | | service = imap4: pod. (0.0) | | | | | | | service = link: pod. (0.0) | | | | | | | service = auth: pod. (0.0) | | | | | | | service = domain: pod. (0.0) | | | | | | | service = nntp: pod. (0.0) | | | | | | | service = daytime: pod. (0.0)
| | | | | | | service = gopher: pod. (0.0) | | | | | | | service = echo: pod. (0.0) | | | | | | | service = printer: pod. (0.0) | | | | | | | service = whois: pod. (0.0) | | | | | | | service = klogin: pod. (0.0) | | | | | | | service = ecr: pod. (0.0) | | | | | | | service = netbios_ssn: pod. (0.0) | | | | | | | service = remote_job: pod. (0.0) | | | | | | | service = hostnames: pod. (0.0) | | | | | | | service = eco_i: saint. (2.0) | | | | | | | service = login: pod. (0.0) | | | | | | | service = pop_2: pod. (0.0) | | | | | | | service = netbios_ns: pod. (0.0) | | | | flag = S1: normal. (0.0) | | | | flag = S0 | | | | | dst_host_rerror_rate <= 0.28: mscan. (6.0/1.0) | | | | | dst_host_rerror_rate > 0.28: apache2. (3.0) | | | | flag = RSTO | | | | | src_bytes <= 55: mscan. (6.0) | | | | | src_bytes > 55: guess_passwd. (2.0) | | | | flag = S3: processtable. (6.0) | | | | flag = S2: normal. (0.0) | | | | flag = RSTR: portsweep. (7.0/2.0) | | | | flag = SH: nmap. (1.0) | | | duration > 2 | | | | src_bytes <= 55 | | | | | src_bytes <= 24 | | | | | | srv_rerror_rate <= 0.17: processtable. (16.0/1.0) | | | | | | srv_rerror_rate > 0.17: mscan. (5.0) | | | | | src_bytes > 24: guess_passwd. (73.0) | | | | src_bytes > 55 | | | | | duration <= 261: normal. (14.0) | | | | | duration > 261: warezmaster. (15.0/1.0) | | src_bytes > 2564 | | | num_compromised <= 0 | | | | src_bytes <= 2599: mailbomb. (99.0) | | | | src_bytes > 2599 | | | | | flag = REJ: normal. (0.0) | | | | | flag = SF | | | | | | duration <= 38: normal. (17.0/1.0) | | | | | | duration > 38: warezmaster. (12.0) | | | | | flag = S1: normal. (0.0) | | | | | flag = S0: normal. (0.0) | | | | | flag = RSTO: normal. (0.0) | | | | | flag = S3: normal. (0.0) | | | | | flag = S2: normal. (0.0) | | | | | flag = RSTR: apache2. (10.0) | | | | | flag = SH: normal. (0.0) | | | num_compromised > 0: back. (25.0) | count > 72 | | protocol_type = udp: normal. (28.0) | | protocol_type = tcp | | | service = private: neptune. 
(0.0) | | | service = ecr_i: neptune. (0.0) | | | service = ntp_u: neptune. (0.0) | | | service = X11: neptune. (0.0)
| | | service = IRC: neptune. (0.0) | | | service = courier: neptune. (0.0) | | | service = ctf: neptune. (0.0) | | | service = ssh: neptune. (0.0) | | | service = time: neptune. (0.0) | | | service = ldap: neptune. (0.0) | | | service = supdup: neptune. (0.0) | | | service = Z39_50: neptune. (0.0) | | | service = discard: neptune. (0.0) | | | service = bgp: neptune. (0.0) | | | service = kshell: neptune. (0.0) | | | service = urp_i: neptune. (0.0) | | | service = uucp: neptune. (0.0) | | | service = uucp_path: neptune. (0.0) | | | service = netstat: neptune. (0.0) | | | service = nnsp: neptune. (0.0) | | | service = http_443: neptune. (0.0) | | | service = iso_tsap: neptune. (0.0) | | | service = http: apache2. (2.0) | | | service = pop_3: neptune. (0.0) | | | service = other: neptune. (0.0) | | | service = domain_u: neptune. (0.0) | | | service = exec: neptune. (0.0) | | | service = telnet: neptune. (5.0) | | | service = ftp: neptune. (0.0) | | | service = sql_net: neptune. (0.0) | | | service = ftp_data: neptune. (0.0) | | | service = smtp: neptune. (0.0) | | | service = sunrpc: neptune. (0.0) | | | service = shell: neptune. (0.0) | | | service = vmnet: neptune. (0.0) | | | service = netbios_dgm: neptune. (0.0) | | | service = efs: neptune. (0.0) | | | service = finger: neptune. (0.0) | | | service = rje: neptune. (0.0) | | | service = mtp: neptune. (0.0) | | | service = imap4: neptune. (0.0) | | | service = link: neptune. (0.0) | | | service = auth: neptune. (0.0) | | | service = domain: neptune. (0.0) | | | service = nntp: neptune. (0.0) | | | service = daytime: neptune. (0.0) | | | service = gopher: neptune. (0.0) | | | service = echo: neptune. (0.0) | | | service = printer: neptune. (0.0) | | | service = whois: neptune. (0.0) | | | service = klogin: neptune. (0.0) | | | service = ecr: neptune. (0.0) | | | service = netbios_ssn: neptune. (0.0) | | | service = remote_job: neptune. (0.0) | | | service = hostnames: neptune. (0.0) | | | service = eco_i: neptune. 
(0.0) | | | service = login: neptune. (0.0) | | | service = pop_2: neptune. (0.0) | | | service = netbios_ns: neptune. (0.0) | | protocol_type = icmp: smurf. (3485.0)
Number of Leaves : 229
Size of the tree : 270

Time taken to build model: 1.61 seconds

=== Evaluation on test set ===
=== Summary ===

Correctly Classified Instances 236 99.5781 %
Incorrectly Classified Instances 1 0.4219 %
Kappa statistic 0.9931
Mean absolute error 0.0003
Root mean squared error 0.0165
Relative absolute error 0.5767 %
Root relative squared error 10.179 %
Total Number of Instances 237

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure Class
0 0 0 0 0 back.
0 0 0 0 0 snmpguess.
0 0 0 0 0 apache2.
0 0 0 0 0 httptunnel.
0 0 0 0 0 mscan.
0 0 0 0 0 saint.
0 0 0 0 0 snmpgetattack.
0 0 0 0 0 processtable.
1 0 1 1 1 mailbomb.
0 0 0 0 0 buffer_overflow.
0 0 0 0 0 ftp_write.
0 0 0 0 0 guess_passwd.
0 0 0 0 0 imap.
0 0 0 0 0 ipsweep.
0 0 0 0 0 land.
0 0 0 0 0 loadmodule.
0 0 0 0 0 multihop.
1 0 1 1 1 neptune.
0 0 0 0 0 nmap.
0 0.004 0 0 0 normal.
0 0 0 0 0 perl.
0 0 0 0 0 phf.
0 0 0 0 0 pod.
0 0 0 0 0 portsweep.
0 0 0 0 0 rootkit.
0 0 0 0 0 satan.
0.987 0 1 0.987 0.993 smurf.
0 0 0 0 0 spy.
0 0 0 0 0 teardrop.
0 0 0 0 0 warezclient.
0 0 0 0 0 warezmaster.

=== Confusion Matrix ===

a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae <-- classified as
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = back.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = snmpguess.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = apache2.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = httptunnel.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = mscan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = saint.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | g = snmpgetattack.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | h = processtable.
0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | i = mailbomb.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | j = buffer_overflow.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | k = ftp_write.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | l = guess_passwd.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | m = imap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | n = ipsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | o = land.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | p = loadmodule.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | q = multihop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 119 0 0 0 0 0 0 0 0 0 0 0 0 0 | r = neptune.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | s = nmap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | t = normal.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | u = perl.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | v = phf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | w = pod.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | x = portsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | y = rootkit.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | z = satan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 76 0 0 0 0 | aa = smurf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ab = spy.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ac = teardrop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ad = warezclient.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ae = warezmaster.
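As a sanity check on the Weka summary above, the reported accuracy and Kappa statistic for the three-attack test set can be recomputed directly from the non-zero confusion-matrix entries (41 mailbomb, 119 neptune, and 76 smurf instances classified correctly, plus one smurf instance misclassified as normal). A minimal sketch in Python:

```python
# Worked check of the Weka evaluation summary for the Mailbomb/Neptune/Smurf run.
# The only non-zero confusion-matrix entries are:
#   mailbomb -> mailbomb: 41, neptune -> neptune: 119,
#   smurf -> smurf: 76, and smurf -> normal: 1.
correct = 41 + 119 + 76
total = correct + 1  # one smurf instance misclassified as normal

accuracy = correct / total
print(f"accuracy = {accuracy:.4%}")  # matches "Correctly Classified Instances 236 99.5781 %"

# Cohen's kappa: p_o is the observed agreement, p_e the agreement
# expected by chance from the row (actual) and column (predicted) totals.
rows = {"mailbomb": 41, "neptune": 119, "smurf": 77}                # actual-class totals
cols = {"mailbomb": 41, "neptune": 119, "smurf": 76, "normal": 1}   # predicted-class totals
p_o = correct / total
p_e = sum(rows.get(c, 0) * cols[c] for c in cols) / total ** 2
kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.4f}")  # matches the reported Kappa statistic of 0.9931
```

The same bookkeeping applied to the two-attack run (41 mailbomb and 119 neptune instances, all correct) gives the 100 % accuracy and Kappa of 1 reported in the first summary.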
BIBLIOGRAPHY
[Agarwal 1993] Agrawal, R., Imielinski, T., and Swami, A. Mining Association Rules
between Sets of Items in Large Databases. In Proceedings of the 1993 ACM SIGMOD
International Conference on Management of Data, pages 207-216.
[Allen 2000] Allen, J., Christie, A., Fithen, W., McHugh, J., Pickel, J., and Stoner, E. State of
the Practice of Intrusion Detection Technologies. Technical report, Carnegie Mellon
University, 2000.
[Bace 2000] Rebecca Gurley Bace. Intrusion Detection. Macmillan Computer Publishing
(MCP), Indianapolis, IN, 2000.
[Barbara 2001] Daniel Barbara and Sushil Jajodia. Applications of Data Mining in Computer
Security. Kluwer Academic Publishers, Norwell, MA, 2002.
[Bro-ids.org] bro-ids.org. Home page. Available from www.bro-ids.org (visited on Jan 24,
2006).
[Fayyad et al. 1996] Fayyad, U. M., Piatetsky-Shapiro, G., and Smyth, P. From Data Mining
to Knowledge Discovery: An Overview. In Fayyad, U. M., Piatetsky-Shapiro, G.,
Smyth, P., and Uthurusamy, R., editors, Advances in Knowledge Discovery and Data
Mining, pages 1-34. AAAI Press / MIT Press, Menlo Park, CA, 1996.
97
Firoz Allahwali
[Goebel 1999] Michael Goebel and Le Gruenwald. A Survey of Data Mining and Knowledge
Discovery Software Tools. SIGKDD Explorations, 1(1):20-33, 1999.
[Ian 2005] Witten, Ian H. and Frank, Eibe. Data Mining: Practical Machine Learning Tools
and Techniques. Morgan Kaufmann Publishers, San Francisco, CA, 2005.
[Lee 2002] Wenke Lee. Applying Data Mining to Intrusion Detection: The Quest for
Automation, Efficiency, and Credibility. SIGKDD Explorations, 4(2), December
2002.
[Mannila 2002] Mannila, H. Local and Global Methods in Data Mining: Basic Techniques
and Open Problems. In Proceedings of the 29th International Colloquium on
Automata, Languages and Programming, Lecture Notes in Computer Science, pages
57-68. Springer-Verlag, 2002.
[Robert] Traffic. Home page. Available from http://robert.rsa3.com/traffic.html.
[Sandhya] Peddabachigari, Sandhya, Abraham, Ajith, and Thomas, Johnson. Intrusion
Detection Systems Using Decision Trees and Support Vector Machines. Information
Management Journal, v. 4:635-660, 2001.
[Snort] Snort.org. Home page. Available from www.snort.org (visited on Jan 15, 2005).
[Stolfo 1998] Wenke Lee and Sal Stolfo. Data Mining Approaches for Intrusion Detection.
Proceedings of the Seventh USENIX Security Symposium (SECURITY '98), San
Antonio, TX, January 1998.
[TCPreplay] TCPreplay. Home page. Available from http://tcpreplay.synfin.net/.
[Weka] Weka. Home page. Available from http://www.cs.waikato.ac.nz/ml/weka/
[Zhang 2001] Zhang, Junxin, Lee, Wenke, Stolfo, Sal, Chan, Phil, Eskin, Eleazar, Fan, Wei,
Miller, Matt, and Hershkop, Shlomo. Real Time Data Mining-based Intrusion
Detection. In Proceedings of the 2001 DARPA Information Survivability Conference
and Exposition II (DISCEX II), Anaheim, CA, June 2001.