Acknowledgement
With a deep sense of gratitude, I wish to express my sincere thanks to Dr. Mario Garcia,
my Project Chair, and to my committee members for their immense help in planning and
executing the project on time. The confidence and dynamism with which Dr. Garcia
guided the work require no elaboration. His support and assurance in times of crisis
will be remembered lifelong. His valuable suggestions during the course of the work
are gratefully acknowledged.
I also want to thank my parents, who taught me the value of hard work by their
own example. I would like to share this moment of happiness with my mother and
brother. They gave me enormous support during the whole tenure of my project. The
encouragement and motivation given to me by my father to carry out my research are
also remembered.
Finally, I would like to thank all whose direct and indirect support helped
me complete my project in time.
ABSTRACT
In recent years, data mining techniques have been applied in many
different fields, including marketing, manufacturing, process control, fraud
detection, and network management. Over the past several years a growing
number of research projects have applied data mining to various problems in
intrusion detection. Anomaly detection as a mechanism for intrusion detection has
been an active area of research. The goal of my project is to design and implement
an anomaly detector using data mining. The project will include the use of open
source tools and/or modifications to existing tools to incorporate the goals of
collection, filtering, storage, archival, and attack detection in a cohesive software
system. This paper also surveys a representative cross section of these research
efforts. Conclusions are drawn and directions for future research are suggested.
DEFINITIONS AND ABBREVIATIONS
Definitions
Anomaly: A data point that, for some system, has characteristics significantly different from earlier observed ('Normal') data points from the same system.
Training Set: A set of sequences containing 'Normal' sequences, used for training a system.
Test Set: A set of sequences containing 'Normal' sequences as well as labeled 'Anomalous' sequences, used to calculate the performance of a system.
Key: A set of labels attached to a test set identifying the anomalies in the set.
Hit Ratio: The percentage of total anomalies that were detected by the system.
Miss Ratio: The percentage of total anomalies that were NOT detected by the system.
False Alarm Ratio: The percentage of total normal cases that were wrongly classified as anomalous.
Sequence: A finite arrangement of the input tokens.
Abbreviations
IDS: Intrusion Detection System
AD: Anomaly Detector
KDD: Knowledge Discovery in Databases
JAM: Java Agents for Meta-learning
MADAM ID: Mining Audit Data for Automated Models for Intrusion Detection
TCP/IP: Transmission Control Protocol / Internet Protocol
TABLE OF CONTENTS
Abstract…………………………………………………………………………………...iii
Definitions and Abbreviations……………………………………………………………iv
Table of Contents……………………………………………………………………….....v
List of Figures…………………………………………………………………………....vii
1. Introduction………………...……………………………………………..…………….1
1.1 Data Mining, KDD and related fields...…………………………...…………..4
1.1.1 Some Data Mining Techniques in Intrusion Detection……………..5
1.2 Intrusion Detection …………………………………………………………...7
1.2.1 Misuse Detection……………………………………………………8
1.2.2 Anomaly Detection………………………………………………...10
2. Narrative………………………………………………………...…………………….12
2.1 Building Intrusion Detection models using Data Mining…...……………….14
2.1.1 Data Collection and Enhancement…………………………………15
2.1.2 Analysis Tools……………...……………………………………...16
2.1.3 Data Preprocessing...………………………………………………17
2.1.4 Data Mining………..………………………………………………18
3. System Design……………………………………………………..…………………20
3.1 Data Collection and Enhancement…………………………………………...21
3.1.1 Collecting Attack free data on an Isolated Network………..………22
3.1.1.1 Setting up an Isolated Network…………………………….23
3.1.1.2 Traffic Generation…………..………………………….…..24
3.1.1.3 Data Collection on Isolated Network………….……….……25
3.1.2 Data Collection on the Router (with Attacks)……………………......25
3.2 Data Preprocessing and analysis……….…………………………………….27
3.2.1 Conn Analyser……………………………………………………….30
3.3 Mining Data for Complex Relationships…………….………………………32
3.3.1 Classification…………………………………………………………37
3.3.2 Clustering…………………………………………………………….41
3.3.3 Association-rule learners…………………………………………….47
4. Testing and Validation………………………………………………………………...49
4.1 System Performance Metrics……………………………………………….51
5. Conclusion…………………………………………………………………………….55
Appendix A……………………………………………………………………………....56
Appendix B………………………………………………………………………………58
Appendix C………………………………………………………………………………68
Appendix D………………………………………………………………………………72
Appendix E………………………………………………………………………………74
Bibliography …………………………………………………………………………….97
LIST OF FIGURES
Fig. 1 Data Mining based IDS………………………………………….………...………14
Fig. 2 System Architecture…………………………….………………..……...………..20
Fig. 3 Isolated Network with KVM switch and Hub……………………….……...…….23
Fig. 4 Screen shot of Traffic…...………………………………………………………...24
Fig. 5 Data Collection at a Network Tap on a router connected to the internet………....26
Fig. 6 Bro-IDS Architecture………………………………………………………..……31
Fig. 7 Screen shot of Weka GUI……………………………………………………...….34
Fig. 8 Screen shot of Weka Explorer showing data loading and preprocessing……..…..34
Fig. 9 Screen shot of Weka - Classification using Normal data as Trainer….…………..39
Fig. 10 Screen shot of Weka - Classification using Attack data as Trainer…..……….…40
Fig. 11 Screen shot of Weka - Clustering using Normal data as Trainer……………..…42
Fig. 12 Clustering Visualization – Normal data as Trainer……………………………...44
Fig. 13 Screen shot of Weka - Clustering using Attack data as Trainer………..……..…45
Fig. 14 Clustering Visualization – Attack data as Trainer…………………….…………46
Firoz Allahwali
1. INTRODUCTION
Intrusion detection is the process of monitoring and analyzing the events occurring in a
computer system in order to detect signs of security problems [Bace 2000]. Intrusion
detection has been an active area of research for more than a decade now. The importance of
Intrusion Detection Systems (IDS) has grown tremendously in recent years because of our
dependence on electronic forms of data. Sensitive information, which has to be kept secure,
is kept in electronic form on computers. The military and government have been the most
vocal IDS users, but more and more private organizations are realizing the importance
of such systems.
There is an increase in the amount of data for which confidentiality, integrity and
availability are critical characteristics. Information stored about buying patterns of
consumers, their financial status and banking habits as well as on-line browsing patterns are
subject to frequent attacks. It has been generally known that building a secure system will
definitely reduce the probability of a successful attack, but it will not completely eliminate
such an incident. IDS assume that some of the probable attacks will get through and will
have to be detected by some sophisticated techniques. Another threat that is becoming real is
insider attack. All the security built into the system will not be able to detect an attack that is
initiated by someone with authorized access. Such attacks are impossible to avoid, and it is
imperative that they be detected. Timely detection of an attack can lead a supervisor to stop
the attack in the middle, track down the possible intruder, or revert the system back to a
consistent state after the attack.
The problem is that current NIDSs are tuned specifically to detect known service-level
network attacks. Attempts to expand beyond this limited realm typically
result in an unacceptable level of false positives. At the same time, enough data
exists or could be collected to allow network administrators to detect these policy
violations. Unfortunately, the data is so voluminous, and the analysis process so time
consuming, that the administrators don’t have the resources to go through it all
and find the relevant knowledge, save for the most exceptional situations, such as
after the organization has taken a large loss and the analysis is done as part of a
legal investigation. In other words, network administrators don’t have the resources
to proactively analyze the data for policy violations, especially in the presence of a
high number of false positives that cause them to waste their limited resources.
Given the nature of this problem, the natural solution is data mining in an offline
environment. Such an approach would add depth to the administrator’s defenses,
and allow them to determine more accurately what the threats against their network are
through the use of multiple methods on data from multiple sources. Hence, activity that
cannot be detected efficiently in near real-time by an online NIDS, whether due to the
amount of state that would need to be retained or the amount of computational resources
that would need to be expended in a limited time window, can be more easily identified.
Some examples of what such a system could detect, but online NIDSs cannot detect
effectively, include certain types of malicious activity, such as low and slow scans,
a slowly propagating worm, unusual
activity of a user based on some new pattern of activity (rather than a single connection or
small number of connections, which are bound to produce a number of false positives), or
even new forms of attacks that online sensors are not tuned for. Additionally, such a system
could more easily allow for the introduction of new metrics, which can use the historical data
as a baseline for comparison with current activity. It also serves to aid network
administrators, security officers, and analysts in the performance of their duties by allowing
them to ask questions that would not have occurred to them a priori. Ideally, such a system
should be able to derive a threat level for the network activity that it analyzes, and predict
future attacks based on past activity.
Designing and implementing an anomaly detector is the major goal of this project. A
data-mining approach will be used to build a detector in this project. Data mining has become
a very useful technique to reduce information overload and improve decision making by
extracting and refining useful knowledge through a process of searching for relationships
and patterns from the extensive data collected by organizations. The extracted information is
used to predict, classify, model and summarize the data being mined. Data mining
technologies such as rule induction, neural networks, genetic algorithms, fuzzy logic, and
rough sets are used for classification and pattern recognition. They have been extensively
used in distinguishing abnormal behavior in a variety of contexts. In recent years data mining
techniques have been successfully used in the context of intrusion detection [Lee 2002].
1.1. Data Mining, KDD and Related fields
The term knowledge discovery in databases (KDD) is used to denote the process of
extracting useful knowledge from large data sets. Data mining, by contrast, refers to one
particular step in this process. Generally, data mining is the process of extracting useful and
previously unnoticed models or patterns from large data stores [Bace 2000; Stolfo 1998;
2000; Lee et al. 1999a; Mannila 2002; Fayyad et al. 1996]. Specifically, the data mining step
applies so-called data mining techniques to extract patterns from the data. Data mining is a
component of the Knowledge Discovery in Databases (KDD) process [Carbone 1997;
Fayyad et al. 1996]. It is preceded and followed by other KDD steps, which ensure that the
extracted patterns actually correspond to useful knowledge. Indeed, without these additional
KDD steps, there is a high risk of finding meaningless or uninteresting patterns. In other
words, the KDD process uses data mining techniques along with any required pre- and post-
processing to extract high-level knowledge from low-level data. In practice, the KDD process
is interactive and iterative, involving numerous steps with many decisions being made by the
user. Broadly some of the most basic KDD steps are [Goebel 1999]:
1. Understanding the application domain: Developing an understanding of the
application domain, the relevant background knowledge, and the specific goals of the
KDD endeavor.
2. Data integration and selection: The integration of multiple (potentially
heterogeneous) data sources and the selection of the subset of data that is relevant to
the analysis task.
3. Data mining: The application of specific algorithms for extracting patterns from data.
4. Pattern evaluation: The interpretation and validation of the discovered patterns. The
goal of this step is to guarantee that actual knowledge is being discovered.
5. Knowledge representation: This step involves documenting and using the
discovered knowledge.
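As a sketch, the data-flow portion of these steps (selection, mining, and pattern evaluation) can be wired together as a small pipeline. The step functions here are placeholders supplied by the analyst, not part of any particular KDD tool:

```python
def kdd_pipeline(raw_records, select, mine, evaluate):
    """Skeleton of the KDD data flow described above. Steps 1 and 5
    (domain understanding and knowledge representation) happen outside
    the code; select, mine, and evaluate are analyst-supplied callables."""
    data = [r for r in raw_records if select(r)]   # step 2: integration and selection
    patterns = mine(data)                          # step 3: data mining proper
    return [p for p in patterns if evaluate(p)]    # step 4: pattern evaluation
```

For instance, selecting only even-valued records and mining their sum yields a single pattern, which then passes or fails the evaluation step.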
Data mining extensively uses known techniques from machine learning, statistics, and
other fields. Nevertheless, several differences between data mining and related fields have
been identified. Specifically, one of the most frequently cited characteristics of data mining is
its focus on finding relatively simple, but interpretable models in an efficient and scalable
manner. In other words, data mining emphasizes the efficient discovery of simple, but
understandable models that can be interpreted as interesting or useful knowledge. Thus, for
example, neural networks (although a powerful modeling tool) are relatively difficult to
understand compared to rules, trees, sequential patterns, or associations. As a consequence,
neural networks are of less practical importance in data mining. This should not come as a
surprise. In fact, data mining is just a step in the KDD process. As such, it has to contribute
to the overall goal of knowledge discovery. Clearly, only understandable patterns can qualify
as knowledge; hence the importance of understandability in data mining.
1.1.1 Some Data Mining Techniques in Intrusion Detection
Data mining techniques essentially are pattern discovery algorithms. Some techniques
such as association rules (Agrawal et al., 1993) are unique to data mining, but most are
drawn from related fields such as machine learning or pattern recognition. In this section,
well-known data mining techniques that have been widely used in intrusion detection are
discussed (Stolfo, 1998).
• Classification categorizes the data records of a training data set into a predetermined
set of classes (data classes) used as attributes to label each record, distinguishing
elements belonging to the normal class or to an abnormal class (a specific kind of
intrusion) using decision trees or rules. This technique has been popular for detecting
individual attacks but has to be applied with complementary, finely tuned techniques
to reduce its demonstrated high false alarm rate. With support tools such as RIPPER
(a classification rule learning program) and a preliminary set of intrusion features,
accurate rules and temporal statistical indexes can be generated to recognize anomalous
activity. These rules have to be inspected, edited, and included in the desired model
(frequently misuse models).
• Association rules find unseen and/or unexpected attribute correlations within the data
records of a data set, as a basis for behavior profiles.
• Frequent episode rules analyze relationships in the data stream to find recurrent,
sequential patterns of simultaneous events. Their results have been useful for attacks
with arbitrary patterns of noise and for distributed attacks.
• Clustering discovers complex intrusions that occur over extended periods of time
and at different places, correlating independent network events. The sets of data
belonging to a cluster (attack or normal activity profile) are modeled according to
pre-defined metrics and their common features. It is especially effective at detecting
hybrid attacks within a cluster, showing high detection performance, but it is
computationally expensive.
• Meta-rules derive change rules over a period of time, comparing the status of two
data sets and describing their “evolution” in time based on their common, modified
and new features.
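The association-rule idea above starts from frequent itemsets: combinations of attribute values whose support (fraction of records containing them) clears a threshold. A brute-force sketch, with illustrative attribute names (real miners such as Apriori or FP-growth prune the candidate space instead of enumerating it):

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(records, min_support, max_size=2):
    """Return every itemset of up to max_size attribute-value items whose
    support across the records meets min_support. Each record is a set of
    (attribute, value) pairs; this exhaustive count is a teaching sketch,
    not a scalable miner."""
    counts = Counter()
    for record in records:
        for size in range(1, max_size + 1):
            for subset in combinations(sorted(record), size):
                counts[subset] += 1
    n = len(records)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}
```

Run over a handful of hypothetical connection records, an itemset like {service=http, flag=SF} appearing in two of three records has support 2/3 and would survive a 0.6 threshold.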
Typically, the best features of these techniques and detection models are combined to
obtain a high detection performance and a complex profile for intruders. Recognized
practices merge models for new activity (attacks or normal events) and the existing models,
to generate adaptive processes able to learn inductively the existing correlations: this Meta-
Learning capability and its adaptability with other techniques have been evaluated
empirically as effective and scalable (Stolfo, 1998). The rules substantially reduce the
impractical manual development process of patterns and profiles, computing statistical
patterns from the collected data.
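The model-merging idea can be illustrated with the simplest possible combiner, a majority vote over the base detectors' verdicts. Meta-learning systems such as JAM actually train a higher-level classifier on those outputs; this vote is only a sketch of the combining step:

```python
def meta_classify(classifier_outputs):
    """Combine several base detectors' verdicts ('Normal'/'Anomalous',
    illustrative label strings) by majority vote. A stand-in for the
    trained meta-classifier described in the text."""
    votes = sum(1 for v in classifier_outputs if v == 'Anomalous')
    return 'Anomalous' if votes > len(classifier_outputs) / 2 else 'Normal'
```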
1.2. Intrusion Detection
The goal of intrusion detection is to detect security violations in information systems.
Intrusion detection is a passive approach to security as it monitors information systems and
raises alarms when security violations are detected. Examples of security violations include
the abuse of privileges or the use of attacks to exploit software or protocol vulnerabilities.
Intrusion detection systems (IDSs) are categorized according to the kind of input information
they analyze into host-based and network-based IDSs. Host-based IDSs analyze host-bound
audit sources such as operating system audit trails, system logs, or application logs. Network-
based IDSs analyze network packets that are captured on a network. Traditionally, intrusion
detection techniques are classified into two broad categories: misuse detection and anomaly
detection (Zhu, 2001).
1.2.1 Misuse Detection
In misuse detection, each data record is classified and labeled as normal or anomalous
activity. This process is the basis for a learning algorithm able to detect known attacks,
and new ones if they are cataloged appropriately under a statistical process. The basic
step, known as outlier discovery, matches abnormal behavior against a knowledge base of
attack patterns that captures behavioral patterns of intrusions and typical activity. To do
this, each measure must be computed with random variables, implying more updating effort
as more audit records are analyzed, but also more accuracy with more mined data. Although
the activity needs to be analyzed individually, complementary visualization and data mining
techniques can be used to improve performance and reduce the computational requirements.
Some projects using this concept are JAM (Java Agents for Meta-learning), MADAM ID (Mining
Audit Data for Automated Models for Intrusion Detection), and Automated Discovery of Concise
Predictive Rules for Intrusion Detection.
JAM (developed at Columbia University) uses data mining techniques to discover
patterns of intrusions (Allen, 2000). It then applies a meta-learning classifier to learn the
signature of attacks. The association rules algorithm determines relationships between fields
in the audit trail records, and the frequent episodes algorithm models sequential patterns of
audit events. Features are then extracted from both algorithms and used to compute models
of intrusion behavior. The classifiers build the signature of attacks. So essentially, data
mining in JAM builds a misuse detection model. JAM generates classifiers using a rule
learning program on training data of system usage. After training, the resulting classification
rules are used to recognize anomalies and detect known intrusions. The system has been tested
with data from Sendmail-based attacks, and with network attacks using TCP dump data.
MADAM ID uses data mining to develop rules for misuse detection. The motivation is that
current systems require extensive manual effort to develop rules for misuse detection.
MADAM ID applies data mining to audit data to compute models that accurately capture
behavioral patterns of intrusions and normal activities. While MADAM ID performed well in
the 1998 DARPA evaluation of intrusion detection systems (Allen, 2000), it is ineffective in
detecting attacks that have not already been specified.
Researchers at Iowa State University report on Automated Discovery of Concise
Predictive Rules for Intrusion Detection (Allen, 2000). This system performs data mining to
provide global, temporal views of intrusions on a distributed system. The rules detect
intrusions against privileged programs (such as Sendmail) using feature vectors to describe
the system calls executed by each process. A genetic algorithm selects feature subsets to
reduce the number of observed features while maintaining or improving learning accuracy.
This is another example of data mining being used to develop rules for misuse detection.
1.2.2 Anomaly Detection
Anomaly detection, on the other hand, uses a model of normal user or system behavior
and flags significant deviations from this model as potentially malicious. This model of
normal user or system behavior is commonly known as the user or system profile. The strength
of anomaly detection is its ability to detect previously unknown attacks. The most popular
anomaly detection system using data mining is ADAM (Audit Data Analysis and Mining).
One of the most significant advantages of ADAM is the ability to detect novel attacks,
without depending on attack training data, through a novel application of the pseudo-Bayes
estimator (Barbara et al., 2001).
ADAM (developed at George Mason University Center for Secure Information
Systems) uses a combination of association rules mining and classification to discover
attacks in TCP dump data. The JAM system also combines association mining and
classification. But there are two significant differences between ADAM and JAM. First,
ADAM builds a repository of normal frequent itemsets that hold during attack-free periods.
It does so by mining data that is known to be free of attacks. Then, ADAM runs a sliding-
window algorithm that finds frequent itemsets in the most recent set of TCP connections,
and compares them with those stored in the normal itemset repository, discarding those that
are deemed normal. With the rest, ADAM uses a classifier that has been previously trained to
classify the suspicious connections as a known type of attack, an unknown type, or a false
alarm. The system performs especially well with denial of service and probe attacks.
2. NARRATIVE
The seductive vision of automation is that it can and will solve all problems, making
human involvement unnecessary. This is a mirage in intrusion detection. Human analysts will
always be needed to monitor that the automated system is performing as desired, to identify
new categories of attacks, and to analyze the more sophisticated attacks. Real-time automated
response is very desirable in some intrusion detection contexts. But this puts a large demand
on database performance. The database must be fast enough to record alarms and produce
query results simultaneously. Real time scoring of anomaly or classification models is
possible, but this should not be confused with real-time model building. There is research in
this area (Zhang, 2001), but data mining is not currently capable of learning from large
amounts of real-time, dynamically changing data. It is better suited to batch processing of a
number of collected records in a daily processing regime, rather than an hourly or minute-by-
minute scheme.
In this project, the focus is on implementing an off-line "Anomaly Detection
Intrusion Detection Systems" (AD-IDS) to periodically analyze or audit batches of TCP/IP
network log data. The act of detecting intrusions is, intuitively, a real-time task by necessity.
While offline processing would seem to be solely a compromise between efficiency and
timeliness, it provides for some unique functionality. For instance, periodic batch processing
allows the related results (such as activity from the same source) to be grouped together, and
all of the activity can be ranked in the report by the relative threat levels. Another feature
unique to the offline environment is the ability to transfer logs from remote sites to a central
site for correlation during off-peak times.
Offline processing also allows us to more easily overcome shortcomings in real-time
IDSs. For example, many IDSs will start to drop packets when flooded with data faster than
they can process it. Other forms of denial of service can also be launched against an active
(real-time) IDS system, such as flooding it with fragmented IP packets in order to cause it to
spend an inordinate amount of time and memory attempting to reconstruct bogus traffic.
Meanwhile, the attacker can break into the real target system without fear of detection. The
off-line environment is significantly less vulnerable to such a threat, especially if given a
high degree of assurance that any traffic admitted to the local network is logged (such as by
the firewall responsible for the admittance of such traffic).
In principle an AD-IDS "learns" what constitutes "normal" network traffic, developing
sets of models that are updated over time. These models are then applied against new traffic,
and traffic that doesn't match the model of "normal" is flagged as suspicious. Further focus
will be on how to apply an initial analysis to data collected in the future, so that
system administrators can use exception reports to identify suspected intrusions. It is
important to note that no software can monitor all network traffic because the data processing
becomes prohibitive. By including network intrusion detection in a comprehensive
security infrastructure, system administrators can provide the organization with a more
secure computing environment.
2.1 Building Intrusion Detection Models Using Data Mining
The general architecture of the anomaly detector is as follows. The training data enters
the anomaly detector, which once trained, will thereafter be tested on test data. The
system classifies the data as normal or anomalous. This classification is compared with
the key to calculate the performance statistics. Given a training set, which is a set of labeled
sequences, and a test set, which is another set of labeled sequences unseen by the system, it
classifies each of the test sequences as either ‘Normal’ or ‘Anomalous’. For any given
anomaly detector, its performance is measured by calculating its hit ratio, miss ratio and the
false alarm ratio. Figure 1 describes an IDS based on Data Mining.
Figure 1. Data Mining based IDS. The data collected by the IDS is stored in the
alarm warehouse on which the data mining tools are run and anomalous behavior is identified which is in turn fed to the IDS.
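The hit, miss, and false alarm ratios used to score the detector can be computed directly from the system's output and the key; the 'Normal'/'Anomalous' label strings here are placeholders for whatever labeling the test set actually uses:

```python
def detector_metrics(key, predicted):
    """Score a detector against the labeled key. Both arguments are lists
    of 'Normal'/'Anomalous' labels, one per test sequence. Returns the
    hit, miss, and false alarm ratios as percentages, per the definitions
    given earlier."""
    anomalies = sum(1 for k in key if k == 'Anomalous')
    normals = len(key) - anomalies
    hits = sum(1 for k, p in zip(key, predicted)
               if k == 'Anomalous' and p == 'Anomalous')
    false_alarms = sum(1 for k, p in zip(key, predicted)
                       if k == 'Normal' and p == 'Anomalous')
    hit_ratio = 100.0 * hits / anomalies if anomalies else 0.0
    miss_ratio = 100.0 - hit_ratio if anomalies else 0.0
    false_alarm_ratio = 100.0 * false_alarms / normals if normals else 0.0
    return hit_ratio, miss_ratio, false_alarm_ratio
```

Note that hit ratio and miss ratio are complements by definition, so only two of the three numbers are independent.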
2.1.1 Data Collection and Enhancement
TCP/IP traffic data is generated using a data collection application, which is referred
to as a collector. A collector can be a "home grown" application or purchased from one of
many network performance vendors available in the market. When network data is generated
it is stored in files commonly known as network data logs. These data logs can be
represented in a wide variety of formats (e.g., ASCII, binary, RDBMS tables), which can
pose a problem in reporting and analysis. Also, pre-processing of the network data
within the logs is usually required so that the data can be standardized. This will ensure that
the data is processed correctly and that there are no discrepancies in the representation of the
data.
Network audit data was collected on an isolated network and on a real network using
Snort 2.4.3. Snort is a cross-platform, lightweight network intrusion detection tool that can be
deployed to monitor small TCP/IP networks and detect a wide variety of suspicious network
traffic as well as outright attacks [Snort]. Snort provides administrators with enough data to
make informed decisions on the proper course of action in the face of suspicious activity.
Snort can also be deployed rapidly to fill the holes of security in a network.
The main distribution site for Snort is http://www.snort.org. Snort is distributed under
the GNU GPL license by the author Martin Roesch. It can perform protocol analysis, content
searching/matching. It can be used to detect a variety of attacks and probes, such as buffer
overflows, stealth port scans, CGI attacks, SMB probes, OS fingerprinting attempts, and
more. Snort uses a flexible rules language to describe traffic that it should collect or pass, and
includes a detection engine utilizing a modular plug-in architecture. Snort has real-time
alerting capability as well, incorporating alerting mechanisms for syslog, user-specified
files, a UNIX socket, or WinPopup messages to Windows clients using Samba's smbclient.
Snort has three primary uses. It can be used as a straight packet sniffer like tcpdump or as a
packet logger that is useful for network traffic debugging. It can also be used as a full blown
network intrusion detection system.
Snort logs packets in either tcpdump binary format or in Snort's decoded ASCII
format to logging directories that are named based on the IP address of the foreign host.
Plug-ins allow the detection and reporting subsystems to be extended. Available plug-ins
include database logging, small fragment detection, portscan detection, and HTTP URI
normalization. [Snort]
2.1.2 Analysis Tools
To solve the complex issues involved in turning TCP/IP network transactions into data
suitable for mining and exception reporting, a combination of products, namely tcpreplay
and Bro-IDS, was utilized. Tcpreplay is a suite of utilities for *NIX systems for editing
and replaying network traffic previously captured by tools like tcpdump and ethereal.
The goal of tcpreplay is to provide a reliable and repeatable means of testing a variety
of network devices such as switches, routers, firewalls, and network intrusion detection
and prevention systems (IDS and IPS).
Tcpreplay provides the ability to classify traffic as client or server, edit packets at
layers 2-4 and replay the traffic at arbitrary speeds onto a network for sniffing or through a
device. Some of the advantages of using tcpreplay over using ``exploit code'' are:
• Since tcpreplay emulates both the victim and the attacker, only a tcpreplay box and
the device under test (DUT) are needed
• Tests can include background traffic of entire networks without the cost and effort of
setting up dozens of hosts or costly emulators
• Uses the open standard pcap file format for which dozens of command line and GUI
utilities exist
• Tests are fully repeatable without complex test harnesses or network configuration
• Tests can be replayed at arbitrary speeds
• Actively developed and supported by its author [TCPreplay]
2.1.3 Data Preprocessing
Since the study of Internet traffic requires working with large quantities of data,
selecting an appropriate tool for data analysis is crucial. This project utilizes the Bro
intrusion detection system. Bro is an open-source, Unix-based Network Intrusion Detection
System (NIDS) that passively monitors network traffic and looks for suspicious traffic. Bro
detects intrusions by comparing network traffic against a customizable set of rules describing
events that are deemed troublesome. These rules might describe specific attacks (including
those defined by “signatures”) or unusual activities (e.g., certain hosts connecting to certain
services or patterns of failed connection attempts).
Bro uses a specialized policy language that allows a site to tailor Bro’s operation,
both as site policies evolve and as new attacks are discovered. If Bro detects something of
interest, it can be instructed to either generate a log entry, alert the operator in real-time, or
initiate the execution of an operating system command (e.g., to terminate a connection or
block a malicious host on-the-fly). In addition, Bro’s detailed log files can be particularly
useful for forensics. Bro targets high-speed (Gbps), high-volume intrusion detection. By
using packet-filtering techniques, Bro is able to achieve the necessary performance while
running on commercially available PC hardware, and thus can serve as a cost-effective
means of monitoring a site's Internet connection. Although its primary purpose is for
detection of network intrusions, Bro comes with powerful scripting capabilities which make
analyzing large volumes of data more manageable. [Bro-ids]
2.1.4 Data Mining
Data mining strategies fall into two broad categories: supervised learning and
unsupervised learning. Supervised learning methods are deployed when there exists a field
or variable (the target) with known values, about which predictions will be made using the
values of other fields or variables (the inputs). Unsupervised learning methods are typically
deployed on data for which no target field with known values exists, only input fields.
While unsupervised methods are most often used when no target field exists, they can also
be deployed on data for which a target field exists.
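To make the distinction concrete, here is a small illustrative sketch (not part of the project's toolchain; the toy data, class names, and function names are all hypothetical): a supervised learner is trained with known target labels, while an unsupervised learner must discover structure from the inputs alone.

```python
# Illustrative sketch: supervised vs. unsupervised learning on toy 1-D data.
# All values, class names, and functions here are hypothetical.

def supervised_predict(train, x):
    """Nearest-class-mean classifier: target labels are known for training."""
    means = {}
    for label in {lbl for _, lbl in train}:
        vals = [v for v, lbl in train if lbl == label]
        means[label] = sum(vals) / len(vals)
    return min(means, key=lambda lbl: abs(means[lbl] - x))

def unsupervised_groups(values, iters=10):
    """Tiny 1-D k-means with k=2: no labels; structure comes from inputs alone."""
    centers = [min(values), max(values)]          # two initial centers
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(centers[i] - v))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

train = [(1.0, "normal"), (1.2, "normal"), (9.0, "attack"), (9.5, "attack")]
print(supervised_predict(train, 8.0))               # uses the known labels
print(sorted(unsupervised_groups([1.0, 1.2, 9.0, 9.5])))  # discovers two groups
```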
WEKA (the Waikato Environment for Knowledge Analysis) is an open source data mining
package developed at the University of Waikato, New Zealand. It was used to analyze the
preprocessed TCP/IP traffic data for unauthorized activities and to build data mining
models. WEKA is extensible and has grown into a broad collection of machine learning
algorithms for solving real-world data mining problems. It is written in Java, runs on
almost every platform, and is easy to use and apply at several different levels.
There are three major categories of implemented schemes in WEKA:
(1) Implemented schemes for classification.
(2) Implemented schemes for numeric prediction.
(3) Implemented "meta-schemes".
Besides the actual learning schemes, WEKA also contains a large variety of tools for
pre-processing datasets, so that the focus can stay on the algorithm without worrying about
details such as reading the data from files, implementing filtering algorithms, and
providing code to evaluate the results. [Ian 2005]
3. SYSTEM DESIGN
Data Mining is a resource-consuming computing process. An efficient deployment of
an Intrusion Detection system employing Data Mining requires an adaptive and scalable
architecture and infrastructure able to support the storage of the audited data, its processing,
model generation and distribution, as well as interaction with the pre-existing
elements of the organizational security infrastructure. Figure 2 shows the system
architecture. Data collected by Snort is transformed into connection records by Bro-ids,
which in turn are used to generate models with the data mining tool WEKA.
Figure 2: System Architecture. Bro-IDS converts the tcpdump data collected by Snort into connection records, which are used for data mining with WEKA
3.1 Data Collection and Enhancement
This study uses data collected on an isolated network and at the main router
connected to the internet. Data is collected using a Snort sensor which runs on a dual-
processor Dell server with 2GB of RAM and 140GB of disk space. The operating system on
this machine is Red Hat Enterprise Linux 4. The system has a minimal number of packages
installed, sufficient for a usable system, with unessential services turned off. Hardened and
further secured in this way, the system is well suited to act as a Snort sensor. Snort ver.
2.4.3 is installed on this system along with Apache, SSL, PHP, MySQL, and BASE,
following the instructions provided on the www.snort.org website.

Since a Snort sensor is fundamentally passive, i.e., it receives data but does not send
any, it makes sense security-wise to run it in a stealthy mode. To achieve this, two network
interface cards are used, one for management and the other for sniffing. The NIC used for
sniffing in stealthy mode is not given any IP address, whereas the NIC used for management
is provided with one. The NIC with the IP address is connected to a network
different from the sniffing interface for administrative purposes.
Snort can be run in three different modes:
1. Packet Sniffer: Snort is a libpcap-based packet sniffer which simply reads the packets
off the network and displays them in a continuous stream on the console.
2. Packet Logger: Packet logger mode logs the packets to disk. When Snort runs in
this mode, it collects every packet it sees and places it in a directory hierarchy based
on the IP address of one of the hosts in the datagram. Once the packets have been
logged to the binary file, they can be read back with any sniffer that
supports the tcpdump binary format, such as tcpreplay or ethereal.
3. Network Intrusion Detection System: Running Snort in this mode allows it to analyze
network traffic for matches against a user-defined rule set and perform several
actions based upon what it sees.
Data for this project is collected on the isolated network for attack-free network data,
and on the main router for data with attacks, by running Snort in packet logging mode. The
packets are logged to a single log file in tcpdump format using the log_tcpdump module.
Log_tcpdump logs packets in the tcpdump file format. There is a wide assortment of
applications and tools designed to read tcpdump output.
Log_tcpdump has one configuration option.
Format: log_tcpdump [filename]
[filename] is the name of the output file. The [filename] will have the
<month><date>@<time> prepended to it. This keeps data from separate Snort runs
distinct. [Snort]
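In snort.conf this takes the form of an output directive; for example (the filename here is illustrative):

```
# snort.conf: log captured packets in tcpdump binary format
output log_tcpdump: snort.log
```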
3.1.1 Collecting Attack free data on an isolated network
This involves setting up an isolated network, running a traffic generator to generate
traffic and using snort to collect data.
3.1.1.1 Setting up an Isolated Network:
Normal data without any intrusions and attacks was collected on an isolated simulated
network which was set up solely for this purpose. Figure 3 shows the setup of the
network in detail. The network consisted of five Dell Pentium III machines, each with 256
MB RAM and 20 GB of disk space: a Windows 2003 server, three Windows XP
machines, and a Linux machine. The Windows 2003 server runs IIS, FTP, SMTP, and
other services. The Samba smb client was enabled on the Linux machine to communicate with the
Windows network. All the computers were connected to a 3Com® Baseline Dual Speed 16-
Port hub with CAT5 cables. The computer running Snort was connected to this hub in a
stealthy mode. An 8-port KVM switch was utilized to control the multiple computers from a
single keyboard, video monitor, and mouse, as shown in Figure 3.
Figure 3: Isolated Network with KVM switch and Hub
3.1.1.2 Traffic Generation:
Network traffic is generated on the isolated network using the open source network
traffic generator "Traffic", developed by Robert Sandilands. Traffic follows a server/client
model for generating high volumes of traffic on a network. The server module is run
on the Windows 2003 server and the client module is run on two Windows XP machines.
The client lets the user choose the protocol, the number of packets, and the time interval
between packets. [Robert]
Figure 4: Screen shot of the Traffic client showing the number of connections and the protocol
3.1.1.3 Data Collection on the Isolated Network:
Data was collected on the isolated network over a period of one week by running the
Network traffic generator “Traffic”. The Traffic client was installed on two Windows XP
machines. Snort was used in packet logging mode to collect the data. The data passing
through the 3Com hub is captured by Snort. Sample data collected by Snort in tcpdump
format is shown below:
Version 2.4.3 (Build 99) By Martin Roesch (www.snort.org)

07/23-00:07:00.426313 192.168.1.21:21 -> 192.168.1.20:1238
TCP TTL:64 TOS:0x0 ID:6666 IpLen:20 DgmLen:123 DF
***AP*** Seq: 0xD1DA08CC  Ack: 0xA12187  Win: 0x7D78  TcpLen: 32
TCP Options (3) => NOP NOP TS: 160418056 34614576
220 ProFTPD 1.2.0pre10 Server (Red Hat) [snort.delmar.edu]..

02:40:27.881867 192.168.238.1.1540 > 192.168.238.5.www: P 1:485(484) ack 1 win 64240 (DF)
0x0000  4500 020c 5af0 4000 8006 40a3 c0a8 ee01  E...Z.@...@.....
0x0010  c0a8 ee05 0604 0050 6a19 984a 87b4 aae9  .......Pj..J....
0x0020  5018 faf0 59b7 0000 4745 5420 2f6f 7267  P...Y...GET./firoz
0x0030  616e 2d65 6e68 616e 6365 6d65 6e74 2e68  pictures……………h
0x0040  746d 6c20 4854 5450 2f31 2e31 0d0a 486f  tml.HTTP/1.1..Ho
0x0050  7374 3a20 3139 322e 3136 382e 3233 382e  st:.192.168.238.
0x0060  350d 0a55 7365 722d 4167 656e 743a 204d  5..User-Agent:.M
0x0070  6f7a 696c 6c61 2f35 2e30 2028 5769 6e64  ozilla/5.0.(Wind
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
3.1.2 Data Collection on the Router (with attacks):
The internal network of the organization where the data was collected is connected to the
internet through a single high speed internet connection. Snort was deployed in such a way
that it can monitor all the traffic entering (and leaving) an otherwise isolated network. One
way to accomplish this is to deploy a passive tap with minimal effect on network operations.
The sensor is deployed on this tap between the external firewall and the internal network
such that it can monitor all the traffic that enters (and departs) over that connection. This
allows the sensor to examine all of the data associated with the external link so that it can be
effectively used to monitor for incoming (and outgoing) attacks. The snort machine is located
at the main router on campus, which is connected to the Internet by a 100Mb/s full-duplex
Ethernet link. Figure 5 shows the location of the Network tap behind the firewall on the main
router connected to the internet.
Figure 5: Data Collection at a Network Tap on a Router connected to Internet
Data was collected on the network tap over a period of one week by running Snort in a
stealthy packet logging mode. Sample data collected by Snort in tcpdump format is provided
in Appendix D.
3.2 Data preprocessing and analysis:
The goal of the analysis is to create descriptive information from the raw TCPDUMP
files, then to mine the data in order to determine likely intrusive TCP/IP connections. The
raw data consists of packet level transmission data including source and destination IP
address and ports; flags, acknowledgements and packet sequence numbers; and window,
buffer and optional information.
In the audit data no single record represents a complete conversation between two IP
addresses. In fact, each record is only a portion of the conversation: the source sending
information to the destination, or vice versa. In order to create useful inputs for data mining,
we must first determine which records are parts of the same conversation. After the
conversation reconciliation, we can compute meaningful variables such as the number of
connections made to one or more destination IP addresses within a two-second time
window. For data mining, then, the data preparation goal is to establish the final state of a
conversation; then to understand the behavior of each source IP address on the network in
relation to destination IP addresses and destination ports. Towards this goal, we summarize
the final states, number of destination IP addresses, and time differences to destination
address fields by the source IP address. This provides, for each source IP address, counts of
the number of times each final state occurred, the number of IP addresses to which
connections were made, and counts of the time-difference groupings. We also summarize the destination port
types and time differences to destination ports by the combined source and destination IP
addresses. This provides, for each unique IP source to destination IP connection, the number
of times each specific action was attempted, and the number of times port hits occurred
within the specified intervals.
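A minimal sketch of this per-source summarization (the toy records, field choices, and function name are illustrative assumptions, not the project's actual code):

```python
# Sketch: summarize connection records per source IP address.
# Each toy record is a (src_ip, dst_ip, final_state) tuple; the real records
# carry many more fields.
from collections import defaultdict

def summarize_by_source(records):
    """For each source IP: count each final state and the distinct destinations."""
    summary = defaultdict(lambda: {"states": defaultdict(int), "dsts": set()})
    for src, dst, state in records:
        summary[src]["states"][state] += 1
        summary[src]["dsts"].add(dst)
    return {src: {"state_counts": dict(v["states"]),
                  "num_destinations": len(v["dsts"])}
            for src, v in summary.items()}

records = [
    ("10.0.0.1", "10.0.0.9", "SF"),
    ("10.0.0.1", "10.0.0.7", "REJ"),
    ("10.0.0.1", "10.0.0.9", "SF"),
]
print(summarize_by_source(records))
```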
To get information on what action the user was attempting, we map the destination
ports to specific user actions. The destination port determines the function the user is trying
to access, while the source port is assigned randomly. Most network administrators use a
common set of port mappings. By transforming port numbers into action types, we can
determine basic information about what a user was attempting. For this purpose we create a
high-level grouping of the port types as login, email, system status check, SNMP, date, who,
chat, and other. This creates the destination port type field.
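A sketch of this port-to-action mapping follows. The group names come from the text; the specific port numbers are standard IANA assignments, but which ports the project actually mapped to each group is not stated, so this table is an illustrative assumption:

```python
# Hypothetical high-level grouping of destination ports into action types.
PORT_GROUPS = {
    22: "login",                  # ssh
    23: "login",                  # telnet
    25: "email",                  # smtp
    110: "email",                 # pop3
    143: "email",                 # imap
    15: "system status check",    # netstat
    161: "snmp",                  # snmp
    13: "date",                   # daytime
    513: "who",                   # who (UDP)
    194: "chat",                  # irc
}

def port_type(dst_port):
    """Map a destination port to its high-level action group ('other' if unknown)."""
    return PORT_GROUPS.get(dst_port, "other")

print(port_type(25), port_type(4444))
```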
In order to distinguish automated from manual input streams, we have to determine the
amount of time elapsed between connections to different destination IP addresses from a
single source IP address, and to different destination ports on a single destination IP address
from a single source IP address. We then create indicators for elapsed time within specific
ranges. For example, we indicate whether the elapsed time was less than 5 seconds, between 5 and
30 seconds, greater than 30 seconds, or undeterminable. This creates the time difference to
destination address and time difference to destination port fields.
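The elapsed-time indicator described above can be sketched as follows (the bucket names and function name are illustrative; the ranges follow the text):

```python
def time_diff_bucket(elapsed_seconds):
    """Bucket elapsed time between connections into the indicator ranges from
    the text; None means the gap could not be determined."""
    if elapsed_seconds is None:
        return "undeterminable"
    if elapsed_seconds < 5:
        return "lt_5s"
    if elapsed_seconds <= 30:
        return "5s_to_30s"
    return "gt_30s"

print(time_diff_bucket(2), time_diff_bucket(10), time_diff_bucket(45))
```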
Data captured by Snort is preprocessed utilizing the scripting capabilities of Bro-ids,
which make analyzing large volumes of data easier and more manageable.
Bro-ids was installed on a dual-processor Dell server with 1GB of
RAM and 120GB of disk space. The operating system on this machine is Red Hat Enterprise
Linux.
Tcpreplay, installed on the Snort sensor, was used to replay the data collected by Snort.
A network was set up using the 3Com hub with just the Snort sensor and the Bro-ids
machine. The network packets broadcast by tcpreplay onto the hub were captured by
Bro-ids and transformed into network connection data suitable for data mining.
Bro is an intrusion detection system that works by passively watching traffic seen on
a network link. It is built around an event engine that pieces network packets into events that
reflect different types of activity. Some events are quite low-level, such as the monitor seeing
a connection attempt; some are specific to a particular network protocol, such as an FTP
request or reply; and some reflect fairly high-level notions, such as a user having successfully
authenticated during a login session.
Bro runs the events produced by the event engine through a policy script supplied to it
by the administrator. Bro scripts are made up of event handlers that specify what to do
whenever a given event occurs. Event handlers can maintain and update global state
information, write arbitrary information to disk files, generate new events, call functions
(either user-defined or predefined), generate alerts that produce syslog messages, and invoke
arbitrary shell commands. [Bro-ids]
Figure 6: Bro-ids Architecture, describing the data flow through the Event engine,
Policy Script detector and the Scan detector
3.2.1 Conn analyzer:
Bro performs the generic connection analysis like the connection start time, duration,
sizes, hosts etc. using the conn analyzer. The Connection record data type associated with the
conn analyzer keeps track of the state associated with each connection in the form of one line
connection summaries. A connection is defined by an initial packet that attempts to set up a
session and all subsequent packets that take part in the session. Initial packets that fail to set
up a session are also recorded as connections and are tagged with a failure state that
designates the reason for failure. Each entry contains the following data describing the
connection: date/time, the duration of the connection, the local and remote ip addresses and
ports, bytes transferred in each direction, the transport protocol (udp, tcp), the final state of
the connection, and other information describing the connection. The conn analyzer also has
a list of callable connection functions that help in the generic connection analysis.
The data was extracted from the tcpdump data by making changes to the conn analyzer. The
feature list employed was obtained from the DARPA feature set.
type conn_id: record {
    orig_h: addr;   # Address of originating host.
    orig_p: port;   # Port used by originator.
    resp_h: addr;   # Address of responding host.
    resp_p: port;   # Port used by responder.
};

type endpoint: record {
    size: count;    # Bytes sent by this endpoint so far.
    state: count;   # The endpoint's current state.
};

type connection: record {
    duration; protocol_type; service; flag; src_bytes; dst_bytes;
    land; wrong_fragment; urgent; hot; num_failed_logins; logged_in;
    num_compromised; root_shell; su_attempted; num_root;
    num_file_creations; num_shells; num_access_files; num_outbound_cmds;
    is_host_login; is_guest_login; count; srv_count; serror_rate;
    srv_serror_rate; rerror_rate; srv_rerror_rate; same_srv_rate;
    diff_srv_rate; srv_diff_host_rate; dst_host_count; dst_host_srv_count;
    dst_host_same_srv_rate; dst_host_diff_srv_rate;
    dst_host_same_src_port_rate; dst_host_srv_diff_host_rate;
    dst_host_serror_rate; dst_host_srv_serror_rate; dst_host_rerror_rate;
    dst_host_srv_rerror_rate; conn_class_type;
};
The main output of the conn analyzer is a one-line ASCII summary of each connection. These
summaries are written to a file with the name conn.tag.log, where tag uniquely identifies the
Bro session generating the logs. [Bro-ids]
The output in conn.tag.log looks something like this:
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.01,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,0.01,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,udp,domain_u,SF,29,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0.00,0.00,0.00,0.00,0.50,1.00,0.00,10,3,0.30,0.30,0.30,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,253,0.99,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
0,tcp,http,SF,223,185,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,4,4,0.00,0.00,0.00,0.00,1.00,0.00,0.00,71,255,1.00,0.00,0.01,0.01,0.00,0.00,0.00,0.00,normal.
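Each summary line can be split back into its feature fields; a minimal parsing sketch (the helper name and the subset of named fields are illustrative, following the record layout shown earlier):

```python
# Parse one conn.tag.log summary line: comma-separated features ending with a
# class label that carries a trailing period.
SAMPLE = ("0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,"
          "0.00,0.00,0.00,0.00,normal.")

def parse_conn_record(line):
    """Return a dict of a few leading fields plus the class label."""
    fields = line.split(",")
    return {
        "duration": int(fields[0]),
        "protocol_type": fields[1],
        "service": fields[2],
        "flag": fields[3],
        "src_bytes": int(fields[4]),
        "dst_bytes": int(fields[5]),
        "conn_class_type": fields[-1].rstrip("."),
    }

rec = parse_conn_record(SAMPLE)
print(rec["protocol_type"], rec["conn_class_type"])
```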
3.3 Mining Data for Complex Relationships
Data mining is the central step of knowledge acquisition. There are
many methods of data mining, such as statistical methods, cluster analysis, pattern
recognition, decision trees, association rules, artificial neural networks, genetic algorithms,
rough set theory, visualization technology, and so on. No single data mining technique
is appropriate for all data mining problems. Real data sets vary, and different experiments
have to be carried out to arrive at a suitable technique for a particular data mining problem.
For this study, the three data mining techniques of Classification, Clustering and Association
were examined to study their suitability for processing Intrusion Detection data.
Weka, an open source data mining toolset developed at the University of Waikato,
New Zealand, was used for this project. Weka is written in Java, can be run on any
platform, and is distributed under the GNU General Public License. The workbench includes
methods for all standard data mining problems: regression, classification, clustering,
association rule mining, and attribute selection. It has tools to preprocess a dataset, feed it
into a learning scheme, and analyze the resulting classifier and its performance. [Ian 2005]
Weka can be used in three different ways as shown in Figure 7: Explorer,
the Knowledge Flow interface, and the Experimenter. The Explorer provides a graphical
user interface giving access to all the features through menu selection and form filling. The
Knowledge flow interface allows designing configurations for data processing by connecting
components representing data sources, preprocessing tools, learning algorithms, evaluation
methods and visualization modules. Weka’s third interface the Experimenter provides an
environment to automate the process of comparing a variety of learning techniques.
Weka was installed on a Dell server, with 1GB of RAM and 80GB of disk space. The
operating system on this machine is Red Hat Enterprise Linux.
Figure 7: Screen shot of Weka GUI showing the Explorer
Figure 8: Screen shot of Weka Explorer showing data loading and preprocessing
Weka’s native data storage format is ARFF. An ARFF (Attribute-Relation File
Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.
Following is the ARFF file that was written for this experiment.
@RELATION ids

@ATTRIBUTE duration numeric
@ATTRIBUTE protocol_type {udp,tcp,icmp}
@ATTRIBUTE service {private,ecr_i,ntp_u,X11,IRC,courier,ctf,ssh,time,ldap,supdup,Z39_50,discard,bgp,kshell,urp_i,uucp,uucp_path,netstat,nnsp,http_443,iso_tsap,http,pop_3,other,domain_u,exec,telnet,ftp,sql_net,ftp_data,smtp,sunrpc,shell,vmnet,netbios_dgm,efs,finger,rje,mtp,imap4,link,auth,domain,nntp,daytime,gopher,echo,printer,whois,klogin,ecr,netbios_ssn,remote_job,hostnames,eco_i,login,pop_2,netbios_ns}
@ATTRIBUTE flag {REJ,SF,S1,S0,RSTO,S3,S2,RSTR,SH}
@ATTRIBUTE src_bytes numeric
@ATTRIBUTE dst_bytes numeric
@ATTRIBUTE land numeric
@ATTRIBUTE wrong_fragment numeric
@ATTRIBUTE urgent numeric
@ATTRIBUTE hot numeric
@ATTRIBUTE num_failed_logins numeric
@ATTRIBUTE logged_in numeric
@ATTRIBUTE num_compromised numeric
@ATTRIBUTE root_shell numeric
@ATTRIBUTE su_attempted numeric
@ATTRIBUTE num_root numeric
@ATTRIBUTE num_file_creations numeric
@ATTRIBUTE num_shells numeric
@ATTRIBUTE num_access_files numeric
@ATTRIBUTE num_outbound_cmds numeric
@ATTRIBUTE is_host_login numeric
@ATTRIBUTE is_guest_login numeric
@ATTRIBUTE count numeric
@ATTRIBUTE srv_count numeric
@ATTRIBUTE serror_rate numeric
@ATTRIBUTE srv_serror_rate numeric
@ATTRIBUTE rerror_rate numeric
@ATTRIBUTE srv_rerror_rate numeric
@ATTRIBUTE same_srv_rate numeric
@ATTRIBUTE diff_srv_rate numeric
@ATTRIBUTE srv_diff_host_rate numeric
@ATTRIBUTE dst_host_count numeric
@ATTRIBUTE dst_host_srv_count numeric
@ATTRIBUTE dst_host_same_srv_rate numeric
@ATTRIBUTE dst_host_diff_srv_rate numeric
@ATTRIBUTE dst_host_same_src_port_rate numeric
@ATTRIBUTE dst_host_srv_diff_host_rate numeric
@ATTRIBUTE dst_host_serror_rate numeric
@ATTRIBUTE dst_host_srv_serror_rate numeric
@ATTRIBUTE dst_host_rerror_rate numeric
@ATTRIBUTE dst_host_srv_rerror_rate numeric
@ATTRIBUTE class {'back.','snmpguess.','apache2.','httptunnel.','mscan.','saint.','snmpgetattack.','processtable.','mailbomb.','buffer_overflow.','ftp_write.','guess_passwd.','imap.','ipsweep.','land.','loadmodule.','multihop.','neptune.','nmap.','normal.','perl.','phf.','pod.','portsweep.','rootkit.','satan.','smurf.','spy.','teardrop.','warezclient.','warezmaster.'}

@DATA
0,tcp,private,REJ,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,204,5,0.00,0.00,1.00,1.00,0.02,0.06,0.00,255,5,0.02,0.07,0.00,0.00,0.00,0.00,1.00,1.00,neptune.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,178,178,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,tcp,http,SF,208,15127,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,2,0.00,0.00,0.00,0.00,1.00,0.00,1.00,1,255,1.00,0.00,1.00,0.03,0.00,0.00,0.00,0.00,normal.
0,icmp,ecr_i,SF,520,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,tcp,http,SF,170,2003,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,2,255,1.00,0.00,0.50,0.04,0.00,0.00,0.00,0.00,normal.
0,icmp,ecr_i,SF,520,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,510,510,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,510,510,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,520,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.
3.3.1 Classification:
Classification divides data into two or more groups, or classes, and predicts the class of
categorical data by building a model based on predictor variables.
Classification learning is sometimes called supervised because the method operates under
supervision: it is provided with the actual outcome for each of the training examples.
This outcome is called the class of the example. The classification algorithm inductively
constructs a model from the preclassified data set. Each data item is defined by the
values of its attributes, and classification may be viewed as a mapping from a set of attributes to a
particular class. A decision tree classifies a given data item using the values of its
attributes. The decision tree is initially constructed from a set of pre-classified data. The main
approach is to select the attribute which best divides the data items into their classes.
According to the values of this attribute the data items are partitioned, and this process is
applied recursively to each partitioned subset of the data items. The process terminates when
all the data items in the current subset belong to the same class. A node of a decision tree
specifies an attribute by which the data is to be partitioned. Each node has a number of edges,
which are labeled according to a possible value of the attribute in the parent node. An edge
connects either two nodes or a node and a leaf. Leaves are labeled with a decision value for
categorization of the data. The success of classification learning can be judged by trying out
the learned concept description on an independent set of test data. [Sandhya]
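The recursive partitioning described above can be sketched as a minimal ID3-style tree learner. The toy data and attribute names are hypothetical, and this is an illustration of the general technique, not WEKA's actual J4.8/C4.5 implementation:

```python
# Minimal ID3-style decision tree for categorical attributes (illustrative).
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, attrs):
    """Pick the attribute whose split yields the lowest weighted entropy."""
    def split_entropy(a):
        return sum(
            len(subset) / len(rows) * entropy(subset)
            for v in {r[a] for r in rows}
            for subset in [[r["class"] for r in rows if r[a] == v]]
        )
    return min(attrs, key=split_entropy)

def build_tree(rows, attrs):
    labels = [r["class"] for r in rows]
    if len(set(labels)) == 1:            # all items in one class: leaf
        return labels[0]
    if not attrs:                        # no attributes left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    a = best_attribute(rows, attrs)
    rest = [x for x in attrs if x != a]
    return {a: {v: build_tree([r for r in rows if r[a] == v], rest)
                for v in {r[a] for r in rows}}}

def classify(tree, row):
    """Walk the tree from the root, following the edge matching each value."""
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[row[attr]]
    return tree

data = [
    {"protocol": "tcp", "flag": "SF",  "class": "normal"},
    {"protocol": "tcp", "flag": "REJ", "class": "attack"},
    {"protocol": "udp", "flag": "SF",  "class": "normal"},
    {"protocol": "tcp", "flag": "REJ", "class": "attack"},
]
tree = build_tree(data, ["protocol", "flag"])
print(classify(tree, {"protocol": "tcp", "flag": "SF"}))
```

Here the `flag` attribute separates the toy classes perfectly, so the learner selects it at the root.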
WEKA has implementations of numerous classification and prediction algorithms.
The basic ideas behind using all of them are similar. The dataset in this experiment is
classified using the J4.8 algorithm, Weka's implementation of the C4.5 decision tree
learner. Since the J4.8 algorithm can handle numeric attributes, there is no need to discretize
any of the attributes.
The experiment was carried out first using the normal data as the training data and a
subset of attack data as the test data, and subsequently using attack data as both
training and test data. Weka cannot handle very large datasets, since it loads everything into
memory before it processes the data. To overcome this shortcoming, 5900 records were
generated randomly from the normal data, and two sets of 6600 and 3400 records were
generated randomly from the attack data. The subset of attack data with 3400 records was
used as test data in both cases. The results of using the J4.8 decision tree on the data are
listed below.
Normal data as training data and attack data as test data:
Here the normal data (attack-free subset) was used as training data and attack data (3400
records) was used as testing data as shown in Figure 9.
Figure 9. Screen shot of Weka - Classification using Normal data as Trainer
=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances          622              18.2941 %
Incorrectly Classified Instances       2778              81.7059 %
Kappa statistic                           0
Mean absolute error                       0.0527
Root mean squared error                   0.2296
Relative absolute error                  99.9038 %
Root relative squared error             100.2613 %
Total Number of Instances              3400
Attack Data as both training and testing data:
In this set up the subset of attack data with 6600 records was used as training data and the
subset with 3400 records was used as testing data.
Figure 10. Screen Shot of Weka - Classification using Attack data as trainer
=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances         3298              97      %
Incorrectly Classified Instances        102               3      %
Kappa statistic                           0.9533
Mean absolute error                       0.0022
Root mean squared error                   0.0364
Relative absolute error                   5.1806 %
Root relative squared error              25.2454 %
Total Number of Instances              3400
For definitions of results see Appendix A. For detailed results see Appendix B.

Results Analysis: By comparing the percentage of correctly classified instances, it is clear that using
attack data (97%) as the training set is far more effective than using the attack-free data
(18.2%) as the training set.
                          Training Time   Testing Time   Accuracy
Normal Data as Trainer    0.07 sec        0.01 sec       18.2%
Attack Data as Trainer    4.08 sec        3.98 sec       97%
3.3.2 Clustering:
Cluster analysis can be defined as "a wide variety of procedures that can be used to
create a classification. These procedures empirically form 'clusters' or groups of highly
similar entities." In other words, cluster analysis defines groups of cases, through a number
of procedures, that are more similar to each other than to the remaining cases. Clustering
algorithms segment the data into groups of records, or clusters, that have similar
characteristics. [Data Mining]
Clustering can discover complex intrusions that occur over extended periods of
time and at different places, by correlating independent network events. The sets of data
belonging to a cluster (attack or normal activity profile) are modeled according to pre-defined
metrics and their common features. Clustering has been shown to provide high detection
performance, but it is computationally expensive.
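As a concrete sketch of the clustering step described above, a minimal k-means in plain Python (not Weka's implementation) alternates nearest-centroid assignment with centroid recomputation; the two-dimensional toy points below stand in for the 41-feature connection records:

```python
import random

def euclidean(a, b):
    # Straight-line distance between two numeric feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def kmeans(points, k, iterations=10, seed=0):
    # Pick k initial centroids, then alternate between assigning each
    # point to its nearest centroid and recomputing each centroid as
    # the mean of its assigned points.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Two well-separated groups of connection-like records (toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

On the toy data the two groups are recovered regardless of which points seed the centroids, since the assignment and update steps pull each centroid into one of the two well-separated groups.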
Weka’s SimpleKMeans algorithm was used for clustering the data. SimpleKMeans
clusters data using k-means, with the number of clusters specified as a parameter, and uses
the Euclidean distance measure to compute distances between instances and cluster
centroids. The results of applying the algorithm to the experimental data are
shown below:
Normal data as training data and attack data as test data:
Here the normal data (attack-free subset) was used as training data and attack data (3400
records) was used as testing data.
Figure 11. Screen shot of Weka - Clustering using Normal data as trainer
=== Model and evaluation on test set ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 4708.820136735875

Cluster centroids:

Cluster 0
Mean/Mode: 13.9458 tcp http SF 669.8249 4500.8885 0 0 0 0.0265 0 0.944 0.0015 0.0009 0 0.0027 0.0195 0 0.0015 0 0 0.004 6.968 10.1401 0.002 0.0012 0.0006 0.0006 0.9985 0.0026 0.1776 49.6375 212.0548 0.9271 0.0158 0.1123 0.0329 0.0012 0.0022 0.0033 0.0045 normal.
Std Devs:  361.8696 N/A N/A N/A 5292.9004 14974.2892 0 0 0 0.4808 0 0.23 0.0462 0.0302 0 0.1571 0.7411 0 0.039 0 0 0.0628 7.8685 10.7181 0.0379 0.0238 0.0247 0.0247 0.0291 0.0503 0.3045 42.7964 77.5284 0.1933 0.0557 0.2141 0.0379 0.0109 0.0379 0.0307 0.0434 N/A

Cluster 1
Mean/Mode: 11.4195 tcp http SF 588.2563 2679.55 0 0 0 0.0061 0 0.7162 0.0053 0 0.0004 0.0046 0.0092 0 0.0072 0 0 0.0004 8.8776 11.106 0.0018 0.0018 0.0004 0.0004 0.9959 0.0072 0.1126 242.0809 234.0984 0.9249 0.0097 0.0101 0.0031 0.0009 0.0007 0.015 0.0014 normal.
Std Devs:  365.0786 N/A N/A N/A 5095.1714 9146.5406 0 0 0 0.161 0 0.4509 0.1933 0 0.0195 0.1657 0.3314 0 0.0892 0 0 0.0195 9.6922 11.7241 0.0308 0.0307 0.0195 0.0195 0.0433 0.0752 0.221 29.7548 58.5857 0.219 0.0456 0.0514 0.0129 0.0175 0.0165 0.0958 0.0272 N/A

Clustered Instances
0      291 (  9%)
1     3109 ( 91%)
Figure 12. Clustering Visualization – Normal data as trainer

Attack Data as both training and testing data:
In this setup the subset of the attack data with 6600 records was used as training data and the
subset with 3400 records was used as testing data.
Figure 13. Screen shot of Weka - Clustering using Attack data as trainer
=== Model and evaluation on test set ===
kMeans
======
Number of iterations: 6
Within cluster sum of squared errors: 13422.458547689479

Cluster centroids:

Cluster 0
Mean/Mode: 17.9759 tcp private REJ 278.2244 16.492 0 0 0 0.0161 0.0015 0.0124 0.0015 0.0007 0 0.0066 0.0007 0 0 0 0 0.0037 177.9196 10.242 0.2879 0.2869 0.6942 0.6945 0.1013 0.1117 0.0122 252.0892 14.0819 0.0552 0.1073 0.0128 0.0004 0.2825 0.283 0.6944 0.6941 neptune.
Std Devs:  319.4349 N/A N/A N/A 4068.0853 424.4576 0 0 0 0.3935 0.0382 0.1108 0.0541 0.027 0 0.2433 0.027 0 0 0 0 0.0604 89.6316 10.5083 0.4476 0.4504 0.4544 0.4573 0.1852 0.2014 0.1049 22.1166 28.7856 0.1133 0.1874 0.1077 0.0033 0.4429 0.4469 0.4484 0.4555 N/A

Cluster 1
Mean/Mode: 21.6137 icmp ecr_i SF 1706.66 919.7385 0 0.0013 0 0.0159 0.0025 0.2108 0.0048 0 0 0 0.0004 0 0.0011 0 0 0.0027 291.5721 292.0342 0.0004 0.0004 0.0009 0.0009 0.9983 0.0026 0.0295 231.5205 246.8259 0.9836 0.0048 0.6899 0.006 0.0018 0.0012 0.0022 0.0007 smurf.
Std Devs:  382.0624 N/A N/A N/A 14244.6981 13263.7434 0 0.0366 0 0.1924 0.0498 0.4079 0.069 0 0 0 0.0195 0 0.0391 0 0 0.0517 235.6648 235.0086 0.0151 0.0148 0.0253 0.0259 0.034 0.0471 0.1327 66.2879 37.3966 0.096 0.0397 0.455 0.0424 0.0318 0.0232 0.0244 0.0129 N/A

Clustered Instances
0      678 ( 20%)
1     2722 ( 80%)
Figure 14. Clustering Visualization using Attack data as trainer
Results Analysis:
Comparing the percentages of clustered instances, it is evident that using the attack data
(20%) for training is more effective than using the normal data (9%). For detailed
results see Appendix C.
3.3.3 Association-rule learners:
An association rule identifies a combination of attribute values or items that occur
together with greater frequency than would be expected if the values or items were
independent of one another. Association rules, like classification rules, attempt to predict the
class given a set of conditions on the LHS. The conditions are typically attribute-value pairs,
also referred to as item-sets. The main difference is that the prediction associated
with the RHS of the rule is not confined to a single class attribute; instead, it can involve
one or more attribute combinations.
Weka’s Apriori implements the Apriori algorithm. It starts with a minimum
support of 100% of the data items and decreases this in steps of 5% until there are at least 10
rules with the required minimum confidence of 0.9, or until the support has reached a lower
bound of 10%, whichever occurs first. There are four metrics for ranking rules: Confidence,
Lift, Leverage and Conviction.
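A rough sketch of this support-lowering schedule in plain Python (not Weka's Apriori) is shown below; the toy "transactions" are sets of nominal attribute values, and itemset sizes are capped at three for brevity:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    # Keep every itemset (up to max_size items) whose support -- the
    # fraction of transactions containing it -- meets min_support.
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            support = sum(1 for t in transactions if set(combo) <= t) / n
            if support >= min_support:
                freq[combo] = support
    return freq

def confident_rules(transactions, min_support, min_conf):
    # Form LHS -> RHS rules from each frequent itemset and keep those
    # whose confidence = support(itemset) / support(LHS) meets min_conf.
    freq = frequent_itemsets(transactions, min_support)
    found = []
    for itemset, supp in freq.items():
        for i in range(1, len(itemset)):
            for lhs in combinations(itemset, i):
                lhs_supp = freq.get(lhs)
                if lhs_supp and supp / lhs_supp >= min_conf:
                    rhs = tuple(x for x in itemset if x not in lhs)
                    found.append((lhs, rhs, supp / lhs_supp))
    return found

def apriori_schedule(transactions, target_rules=10, min_conf=0.9,
                     lower_bound=0.1, step=0.05):
    # Weka-style schedule: start at 100% support and lower it in 5%
    # steps until enough confident rules appear or the floor is hit.
    support = 1.0
    found = confident_rules(transactions, support, min_conf)
    while len(found) < target_rules and round(support - step, 2) >= lower_bound:
        support = round(support - step, 2)
        found = confident_rules(transactions, support, min_conf)
    return support, found

# Toy connection records described as sets of nominal attribute values.
transactions = [{"tcp", "http", "SF"}, {"tcp", "http", "SF"},
                {"tcp", "private", "REJ"}, {"icmp", "ecr_i", "SF"}]
support, rules = apriori_schedule(transactions, target_rules=5)
```

On this toy data the schedule stops once support reaches 0.5, where high-confidence rules such as {http} -> {tcp} first appear.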
The results of running the Apriori algorithm on the dataset containing 6600 records of the
attack data are listed below:
=== Run information ===
Scheme:     weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0
Relation:   ids
Instances:  6600
Attributes: 42
            duration, protocol_type, service, flag, src_bytes, dst_bytes,
            land, wrong_fragment,
            urgent, hot, num_failed_logins, logged_in, num_compromised,
            root_shell, su_attempted, num_root, num_file_creations,
            num_shells, num_access_files, num_outbound_cmds, is_host_login,
            is_guest_login, count, srv_count, serror_rate, srv_serror_rate,
            rerror_rate, srv_rerror_rate, same_srv_rate, diff_srv_rate,
            srv_diff_host_rate, dst_host_count, dst_host_srv_count,
            dst_host_same_srv_rate, dst_host_diff_srv_rate,
            dst_host_same_src_port_rate, dst_host_srv_diff_host_rate,
            dst_host_serror_rate, dst_host_srv_serror_rate,
            dst_host_rerror_rate, dst_host_srv_rerror_rate, class
4. TESTING AND VALIDATION
Evaluating detection systems is a difficult undertaking, complicated by several
common practices. For example, most evaluations are done according to a black-box testing
regime. While black-box testing can demonstrate the overall performance capabilities of a
detection system, it reveals almost nothing about the performance of components inside the
black box, such as how phenomena affecting individual components (e.g., a feature extractor
or an anomaly detector) or the interactions among them influence detection performance. If
the performance aspects of components like anomaly detectors are not fully understood, then
the performance aspects of any system composed of such elements cannot be understood
either.
Testing was carried out by running different attacks against the system trained with real-time
data using J48, Weka’s implementation of the C4.5 decision tree algorithm. The results are
described below.
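The per-attack evaluations below can be scripted outside Weka along the following lines. This is a sketch only: the record fields, labels, and the stub classifier are illustrative, not the actual J48 model.

```python
def accuracy_on_attacks(predict, test_records, attack_types):
    # Restrict the test set to records whose true label is one of the
    # chosen attack types, then score the classifier on that subset.
    subset = [r for r in test_records if r["label"] in attack_types]
    correct = sum(1 for r in subset if predict(r) == r["label"])
    return correct / len(subset), len(subset)

# Stub classifier: flag records with many connections to the same host
# as "neptune." and everything else as "mailbomb." (illustrative only).
def stub_predict(record):
    return "neptune." if record["count"] > 100 else "mailbomb."

records = [
    {"count": 2,   "label": "mailbomb."},
    {"count": 5,   "label": "mailbomb."},
    {"count": 300, "label": "neptune."},
    {"count": 1,   "label": "normal."},
]
acc, n = accuracy_on_attacks(stub_predict, records, {"mailbomb.", "neptune."})
```

Here the normal record is excluded from the subset, so the reported accuracy covers only the selected attack types, mirroring the one-, two-, and three-attack test runs below.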
Using one type of Attack (Mailbomb):
=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances        41      100 %
Incorrectly Classified Instances       0        0 %
Kappa statistic                        1
Mean absolute error                    0
Root mean squared error                0
Relative absolute error                0 %
Root relative squared error            0 %
Total Number of Instances             41
Using two types of attacks (Mailbomb and Neptune):
=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances       160      100 %
Incorrectly Classified Instances       0        0 %
Kappa statistic                        1
Mean absolute error                    0
Root mean squared error                0
Relative absolute error                0 %
Root relative squared error            0 %
Total Number of Instances            160

Using three types of attacks (Mailbomb, Neptune and Smurf):

=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances       236       99.5781 %
Incorrectly Classified Instances       1        0.4219 %
Kappa statistic                        0.9931
Mean absolute error                    0.0003
Root mean squared error                0.0165
Relative absolute error                0.5767 %
Root relative squared error           10.179  %
Total Number of Instances            237

Summary of the testing:
Test                               Time taken (sec)   Accuracy (%)
1. Using one type of attack        1.8                100
2. Using two types of attacks      1.59               100
3. Using three types of attacks    1.61               99.57
For complete results see Appendix E.
4.1 System Performance Metrics
Listed below is a partial set of measurements that can be made on IDSs. These measurements
are quantitative and relate to detection performance and accuracy (Lippmann, 1999).
• Coverage. This measurement determines which attacks an IDS can detect under ideal
conditions. For non-signature-based systems like anomaly based IDS, one would need
to determine which attacks out of the set of all known attacks could be detected by a
particular methodology. The number of dimensions that make up each attack makes
this measurement difficult.
• Probability of False Alarms. This measurement determines the rate of false
positives produced by an IDS in a given environment during a particular time frame.
A false positive, or false alarm, is an alert caused by normal, non-malicious background
traffic. Causes for a Network IDS (NIDS) include weak signatures that alert on all
traffic to a high-numbered port used by a backdoor, that search for the occurrence of a
common word such as "help" in the first 100 bytes of SNMP or other TCP connections,
or that detect common violations of the TCP protocol. False alarms can also be caused
by normal network monitoring and maintenance traffic generated by network
management tools. False alarms are difficult to measure because an IDS may have a
different false positive rate in each network environment, and there is no such thing
as a standard network.
• Probability of Detection. This measurement determines the rate of attacks detected
correctly by an IDS in a given environment during a particular time frame. The
difficulty in measuring the detection rate is that the success of an IDS is largely
dependent upon the set of attacks used during the test. Also, the probability of
detection varies with the false positive rate, and an IDS can be configured or tuned to
favor either the ability to detect attacks or to minimize false positives. One must be
careful to use the same configuration during testing for false positives and hit rates.
• Resistance to Attacks Directed at the IDS. This measurement demonstrates how
resistant an IDS is to an attacker's attempt to disrupt the correct operation of the IDS.
One example is sending a large amount of non-attack traffic with volume exceeding
the processing capability of the IDS. With too much traffic to process, an IDS may
drop packets and be unable to detect attacks. Another example is sending to the IDS
non-attack packets that are specially crafted to trigger many signatures within the
IDS, thereby overwhelming the human operator of the IDS with false positives or
crashing alert processing or display tools.
• Ability to Handle High Bandwidth Traffic. This measurement demonstrates how
well an IDS will function when presented with a large volume of traffic. Most
network-based IDSs will begin to drop packets as the traffic volume increases,
thereby causing the IDS to miss a percentage of the attacks. At a certain threshold,
most IDSs will stop detecting any attacks.
• Ability to Correlate Events. This measurement demonstrates how well an IDS
correlates attack events. These events may be gathered from IDSs, routers, firewalls,
application logs, or a wide variety of other devices. One of the primary goals of this
correlation is to identify staged penetration attacks. Currently, IDSs have only limited
capabilities in this area.
• Ability to Detect Never-Before-Seen Attacks. This measurement demonstrates how
well an IDS can detect attacks that have not occurred before.
• Ability to Identify an Attack. This measurement demonstrates how well an IDS can
identify the attack that it has detected by labeling each attack with a common name or
vulnerability name or by assigning the attack to a category.
• Ability to Determine Attack Success. This measurement demonstrates whether the IDS
can determine the success of attacks from remote sites that give the attacker higher-level
privileges on the attacked system. In current network environments, many
remote privilege-gaining attacks (or probes) fail and do not damage the attacked
system. Many IDSs, however, do not distinguish failed from successful attacks.
• Other Measurements. There are other measurements, such as ease of use, ease of
maintenance, deployment issues, resource requirements, availability and quality of
support, etc. These measurements are not directly related to IDS performance but
may be more significant in many commercial situations.
Once the mining analysis is producing meaningful results, a strategy to automate the
scoring of new log files should be developed. Simple exception reports can then alert system
administrators to new IP addresses and ports involved in potentially intrusive activity.
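Such an exception report might be sketched as follows. The field names and log layout are illustrative; a real deployment would parse actual log formats.

```python
def exception_report(baseline_events, new_events):
    # Flag (ip, port) pairs in the new log that never appeared in the
    # baseline period -- candidates for potentially intrusive activity.
    known = {(e["ip"], e["port"]) for e in baseline_events}
    return [e for e in new_events if (e["ip"], e["port"]) not in known]

baseline = [{"ip": "10.0.0.5", "port": 80}, {"ip": "10.0.0.5", "port": 443}]
new_log  = [{"ip": "10.0.0.5", "port": 80},
            {"ip": "192.0.2.9", "port": 31337}]
alerts = exception_report(baseline, new_log)
```

Only the previously unseen address/port pair is reported, keeping the exception report short enough for an administrator to review.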
5. CONCLUSION
The integration of data mining into intrusion detection is in an interesting phase of
development, where data mining has made it possible to obtain performance improvements
over current commercial products, with features such as:
• Better extraction of meaningful information from large amounts of raw data, and
more accurate models obtained by applying DM techniques and correlation
algorithms. These practices reduce the analysis overload on human operators and the
associated issues.
• Potential for predictive analysis of suspicious activity, enabling defensive action
before severe damage occurs or the system is completely compromised.
• The adaptability of data mining to specific environments, which allows individual
trends to be recognized more efficiently and enables statistical recognition of new or
hidden malicious activity.
The complexity involved in setting up this kind of system presents new challenges for
implementing scalable and incremental deployments. The goal is to ensure accuracy with
efficient investments of time and resources, specifically by improving data selection, data
preparation, and data quality (from all information sources), resulting in greater precision
and reduced computing costs.
Appendix A
Some of the terms that we come across in the results are defined below.
Accuracy
It gives a measure of the overall accuracy of the classifier:

               Number of correctly classified instances
    Accuracy = ----------------------------------------
               Number of instances

Precision
                   Number of correctly classified instances of class X
    Precision(X) = ------------------------------------------------------
                   Number of instances classified as belonging to class X

Recall
                Number of correctly classified instances of class X
    Recall(X) = ---------------------------------------------------
                Number of instances in class X
Confusion matrix:
Confusion matrices are very useful for evaluating classifiers, as they provide
an efficient snapshot of a classifier's performance, displaying the distribution of
correct and incorrect predictions. Typical Weka output contains the following:
=== Confusion Matrix ===
a b <-- classified as
7 2 | a = yes
3 2 | b = no
Weka was trying to classify instances into two possible classes: yes or no.
For the sake of simplicity, Weka abbreviates ‘yes’ as a and ‘no’ as b. The
columns represent the instances classified as each class: the first
column shows that in total 10 instances were classified as a by Weka, and the
second that 4 were classified as b. The rows represent the instances that actually
belong to each class.
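The Appendix A formulas can be applied directly to such a matrix. A small plain-Python sketch, using the example matrix above (rows are actual classes, columns are predicted classes):

```python
def metrics(matrix, labels):
    # Rows are actual classes, columns are predicted classes.
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
    precision, recall = {}, {}
    for i, label in enumerate(labels):
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision[label] = matrix[i][i] / predicted if predicted else 0.0
        recall[label] = matrix[i][i] / actual if actual else 0.0
    return accuracy, precision, recall

# The example matrix from above: 7+3 instances classified "yes", 2+2 "no".
acc, prec, rec = metrics([[7, 2], [3, 2]], ["yes", "no"])
```

For the example, 9 of 14 instances lie on the diagonal, precision for "yes" is 7/10, and recall for "no" is 2/5, matching the Appendix A definitions.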
Appendix B

Classification Results:

Attack free data as Trainer:

=== Run information ===
Scheme:     weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:   ids
Instances:  5905
Attributes: 42
            duration, protocol_type, service, flag, src_bytes, dst_bytes,
            land, wrong_fragment, urgent, hot, num_failed_logins, logged_in,
            num_compromised, root_shell, su_attempted, num_root,
            num_file_creations, num_shells, num_access_files,
            num_outbound_cmds, is_host_login, is_guest_login, count,
            srv_count, serror_rate, srv_serror_rate, rerror_rate,
            srv_rerror_rate, same_srv_rate, diff_srv_rate,
            srv_diff_host_rate, dst_host_count, dst_host_srv_count,
            dst_host_same_srv_rate, dst_host_diff_srv_rate,
            dst_host_same_src_port_rate, dst_host_srv_diff_host_rate,
            dst_host_serror_rate, dst_host_srv_serror_rate,
            dst_host_rerror_rate, dst_host_srv_rerror_rate, class
Test mode:  user supplied test set: 3400 instances
=== Classifier model (full training set) ===
J48 pruned tree
------------------
: normal. (5905.0)

Number of Leaves  : 1
Size of the tree  : 1
Time taken to build model: 0.01 seconds

=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances       622       18.2941 %
Incorrectly Classified Instances    2778       81.7059 %
Kappa statistic                        0
Mean absolute error                    0.0527
Root mean squared error                0.2296
Relative absolute error               99.9038 %
Root relative squared error          100.2613 %
Total Number of Instances           3400

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
0        0        0          0       0          back.
0        0        0          0       0          snmpguess.
0        0        0          0       0          apache2.
0        0        0          0       0          httptunnel.
0        0        0          0       0          mscan.
0        0        0          0       0          saint.
0        0        0          0       0          snmpgetattack.
0        0        0          0       0          processtable.
0        0        0          0       0          mailbomb.
0        0        0          0       0          buffer_overflow.
0        0        0          0       0          ftp_write.
0        0        0          0       0          guess_passwd.
0        0        0          0       0          imap.
0        0        0          0       0          ipsweep.
0        0        0          0       0          land.
0        0        0          0       0          loadmodule.
0        0        0          0       0          multihop.
0        0        0          0       0          neptune.
0        0        0          0       0          nmap.
1        1        0.183      1       0.309      normal.
0        0        0          0       0          perl.
0        0        0          0       0          phf.
0        0        0          0       0          pod.
0        0        0          0       0          portsweep.
0        0        0          0       0          rootkit.
0        0        0          0       0          satan.
0        0        0          0       0          smurf.
0        0        0          0       0          spy.
0        0        0          0       0          teardrop.
0        0        0          0       0          warezclient.
0        0        0          0       0          warezmaster.

=== Confusion Matrix ===
a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae   <-- classified as
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 0 0 0 0 0 0 0 0 0 0 0 |  a = back.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26 0 0 0 0 0 0 0 0 0 0 0 |  b = snmpguess.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 0 0 0 0 0 0 0 0 0 |  c = apache2.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 |  d = httptunnel.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 0 0 0 0 |  e = mscan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 |  f = saint.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 79 0 0 0 0 0 0 0 0 0 0 0 |  g = snmpgetattack.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 0 0 0 0 0 0 0 0 0 0 0 |  h = processtable.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 59 0 0 0 0 0 0 0 0 0 0 0 |  i = mailbomb.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  j = buffer_overflow.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  k = ftp_write.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 59 0 0 0 0 0 0 0 0 0 0 0 |  l = guess_passwd.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  m = imap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 |  n = ipsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  o = land.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 |  p = loadmodule.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 |  q = multihop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 610 0 0 0 0 0 0 0 0 0 0 0 |  r = neptune.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  s = nmap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 622 0 0 0 0 0 0 0 0 0 0 0 |  t = normal.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  u = perl.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  v = phf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 |  w = pod.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 |  x = portsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |  y = rootkit.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 27 0 0 0 0 0 0 0 0 0 0 0 |  z = satan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1827 0 0 0 0 0 0 0 0 0 0 0 | aa = smurf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ab = spy.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ac = teardrop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ad = warezclient.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 0 0 0 0 0 0 0 0 0 0 0 | ae = warezmaster.
Attack data as Trainer:
=== Run information ===
Scheme:     weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:   ids
Instances:  6600
Attributes: 42
            duration, protocol_type, service, flag, src_bytes,
            dst_bytes, land, wrong_fragment, urgent, hot, num_failed_logins,
            logged_in, num_compromised, root_shell, su_attempted, num_root,
            num_file_creations, num_shells, num_access_files,
            num_outbound_cmds, is_host_login, is_guest_login, count,
            srv_count, serror_rate, srv_serror_rate, rerror_rate,
            srv_rerror_rate, same_srv_rate, diff_srv_rate,
            srv_diff_host_rate, dst_host_count, dst_host_srv_count,
            dst_host_same_srv_rate, dst_host_diff_srv_rate,
            dst_host_same_src_port_rate, dst_host_srv_diff_host_rate,
            dst_host_serror_rate, dst_host_srv_serror_rate,
            dst_host_rerror_rate, dst_host_srv_rerror_rate, class
Test mode:  user supplied test set: 3400 instances

=== Classifier model (full training set) ===
J48 pruned tree
------------------
same_srv_rate <= 0.21 | diff_srv_rate <= 0.23: neptune. (1244.0) | diff_srv_rate > 0.23 | | dst_host_same_srv_rate <= 0.01 | | | dst_host_rerror_rate <= 0.99 | | | | dst_host_serror_rate <= 0.11 | | | | | serror_rate <= 0.03: satan. (18.0/2.0) | | | | | serror_rate > 0.03: saint. (14.0) | | | | dst_host_serror_rate > 0.11: satan. (9.0) | | | dst_host_rerror_rate > 0.99: satan. (11.0) | | dst_host_same_srv_rate > 0.01: normal. (4.0/1.0) same_srv_rate > 0.21 | count <= 72 | | src_bytes <= 2564
| | | duration <= 2 | | | | flag = REJ | | | | | srv_diff_host_rate <= 0.4 | | | | | | srv_count <= 5: httptunnel. (2.0/1.0) | | | | | | srv_count > 5: neptune. (2.0/1.0) | | | | | srv_diff_host_rate > 0.4: mscan. (7.0) | | | | flag = SF | | | | | protocol_type = udp | | | | | | src_bytes <= 55 | | | | | | | service = private | | | | | | | | srv_count <= 8: snmpguess. (49.0) | | | | | | | | srv_count > 8: normal. (3.0) | | | | | | | service = ecr_i: normal. (0.0) | | | | | | | service = ntp_u: normal. (1.0) | | | | | | | service = X11: normal. (0.0) | | | | | | | service = IRC: normal. (0.0) | | | | | | | service = courier: normal. (0.0) | | | | | | | service = ctf: normal. (0.0) | | | | | | | service = ssh: normal. (0.0) | | | | | | | service = time: normal. (0.0) | | | | | | | service = ldap: normal. (0.0) | | | | | | | service = supdup: normal. (0.0) | | | | | | | service = Z39_50: normal. (0.0) | | | | | | | service = discard: normal. (0.0) | | | | | | | service = bgp: normal. (0.0) | | | | | | | service = kshell: normal. (0.0) | | | | | | | service = urp_i: normal. (0.0) | | | | | | | service = uucp: normal. (0.0) | | | | | | | service = uucp_path: normal. (0.0) | | | | | | | service = netstat: normal. (0.0) | | | | | | | service = nnsp: normal. (0.0) | | | | | | | service = http_443: normal. (0.0) | | | | | | | service = iso_tsap: normal. (0.0) | | | | | | | service = http: normal. (0.0) | | | | | | | service = pop_3: normal. (0.0) | | | | | | | service = other: normal. (0.0) | | | | | | | service = domain_u: normal. (54.0) | | | | | | | service = exec: normal. (0.0) | | | | | | | service = telnet: normal. (0.0) | | | | | | | service = ftp: normal. (0.0) | | | | | | | service = sql_net: normal. (0.0) | | | | | | | service = ftp_data: normal. (0.0) | | | | | | | service = smtp: normal. (0.0) | | | | | | | service = sunrpc: normal. (0.0) | | | | | | | service = shell: normal. (0.0) | | | | | | | service = vmnet: normal. 
(0.0) | | | | | | | service = netbios_dgm: normal. (0.0) | | | | | | | service = efs: normal. (0.0) | | | | | | | service = finger: normal. (0.0) | | | | | | | service = rje: normal. (0.0) | | | | | | | service = mtp: normal. (0.0) | | | | | | | service = imap4: normal. (0.0) | | | | | | | service = link: normal. (0.0) | | | | | | | service = auth: normal. (0.0) | | | | | | | service = domain: normal. (0.0) | | | | | | | service = nntp: normal. (0.0) | | | | | | | service = daytime: normal. (0.0) | | | | | | | service = gopher: normal. (0.0) | | | | | | | service = echo: normal. (0.0) | | | | | | | service = printer: normal. (0.0) | | | | | | | service = whois: normal. (0.0) | | | | | | | service = klogin: normal. (0.0)
| | | | | | | service = ecr: normal. (0.0) | | | | | | | service = netbios_ssn: normal. (0.0) | | | | | | | service = remote_job: normal. (0.0) | | | | | | | service = hostnames: normal. (0.0) | | | | | | | service = eco_i: normal. (0.0) | | | | | | | service = login: normal. (0.0) | | | | | | | service = pop_2: normal. (0.0) | | | | | | | service = netbios_ns: normal. (0.0) | | | | | | src_bytes > 55 | | | | | | | count <= 1 | | | | | | | | dst_host_srv_count <= 253 | | | | | | | | | dst_bytes <= 124: normal. (79.0/3.0) | | | | | | | | | dst_bytes > 124: snmpgetattack. (51.0/15.0) | | | | | | | | dst_host_srv_count > 253: normal. (106.0/48.0) | | | | | | | count > 1: normal. (175.0/57.0) | | | | | protocol_type = tcp | | | | | | num_failed_logins <= 0 | | | | | | | dst_host_srv_diff_host_rate <= 0.11 | | | | | | | | dst_host_same_srv_rate <= 0.04 | | | | | | | | | src_bytes <= 20: normal. (2.0) | | | | | | | | | src_bytes > 20: guess_passwd. (3.0) | | | | | | | | dst_host_same_srv_rate > 0.04 | | | | | | | | | dst_host_srv_count <= 5: warezmaster. (3.0/1.0) | | | | | | | | | dst_host_srv_count > 5: normal. (882.0) | | | | | | | dst_host_srv_diff_host_rate > 0.11 | | | | | | | | dst_bytes <= 15: warezmaster. (7.0) | | | | | | | | dst_bytes > 15: normal. (5.0) | | | | | | num_failed_logins > 0: guess_passwd. (13.0) | | | | | protocol_type = icmp | | | | | | srv_diff_host_rate <= 0.57 | | | | | | | dst_host_srv_diff_host_rate <= 0.67 | | | | | | | | count <= 13: normal. (5.0) | | | | | | | | count > 13: smurf. (2.0) | | | | | | | dst_host_srv_diff_host_rate > 0.67: ipsweep. (5.0) | | | | | | srv_diff_host_rate > 0.57 | | | | | | | service = private: pod. (0.0) | | | | | | | service = ecr_i: pod. (5.0) | | | | | | | service = ntp_u: pod. (0.0) | | | | | | | service = X11: pod. (0.0) | | | | | | | service = IRC: pod. (0.0) | | | | | | | service = courier: pod. (0.0) | | | | | | | service = ctf: pod. (0.0) | | | | | | | service = ssh: pod. 
(0.0) | | | | | | | service = time: pod. (0.0) | | | | | | | service = ldap: pod. (0.0) | | | | | | | service = supdup: pod. (0.0) | | | | | | | service = Z39_50: pod. (0.0) | | | | | | | service = discard: pod. (0.0) | | | | | | | service = bgp: pod. (0.0) | | | | | | | service = kshell: pod. (0.0) | | | | | | | service = urp_i: pod. (0.0) | | | | | | | service = uucp: pod. (0.0) | | | | | | | service = uucp_path: pod. (0.0) | | | | | | | service = netstat: pod. (0.0) | | | | | | | service = nnsp: pod. (0.0) | | | | | | | service = http_443: pod. (0.0) | | | | | | | service = iso_tsap: pod. (0.0) | | | | | | | service = http: pod. (0.0) | | | | | | | service = pop_3: pod. (0.0) | | | | | | | service = other: pod. (0.0) | | | | | | | service = domain_u: pod. (0.0) | | | | | | | service = exec: pod. (0.0)
| | | | | | | service = telnet: pod. (0.0) | | | | | | | service = ftp: pod. (0.0) | | | | | | | service = sql_net: pod. (0.0) | | | | | | | service = ftp_data: pod. (0.0) | | | | | | | service = smtp: pod. (0.0) | | | | | | | service = sunrpc: pod. (0.0) | | | | | | | service = shell: pod. (0.0) | | | | | | | service = vmnet: pod. (0.0) | | | | | | | service = netbios_dgm: pod. (0.0) | | | | | | | service = efs: pod. (0.0) | | | | | | | service = finger: pod. (0.0) | | | | | | | service = rje: pod. (0.0) | | | | | | | service = mtp: pod. (0.0) | | | | | | | service = imap4: pod. (0.0) | | | | | | | service = link: pod. (0.0) | | | | | | | service = auth: pod. (0.0) | | | | | | | service = domain: pod. (0.0) | | | | | | | service = nntp: pod. (0.0) | | | | | | | service = daytime: pod. (0.0) | | | | | | | service = gopher: pod. (0.0) | | | | | | | service = echo: pod. (0.0) | | | | | | | service = printer: pod. (0.0) | | | | | | | service = whois: pod. (0.0) | | | | | | | service = klogin: pod. (0.0) | | | | | | | service = ecr: pod. (0.0) | | | | | | | service = netbios_ssn: pod. (0.0) | | | | | | | service = remote_job: pod. (0.0) | | | | | | | service = hostnames: pod. (0.0) | | | | | | | service = eco_i: saint. (2.0) | | | | | | | service = login: pod. (0.0) | | | | | | | service = pop_2: pod. (0.0) | | | | | | | service = netbios_ns: pod. (0.0) | | | | flag = S1: normal. (0.0) | | | | flag = S0 | | | | | dst_host_rerror_rate <= 0.28: mscan. (6.0/1.0) | | | | | dst_host_rerror_rate > 0.28: apache2. (3.0) | | | | flag = RSTO | | | | | src_bytes <= 55: mscan. (6.0) | | | | | src_bytes > 55: guess_passwd. (2.0) | | | | flag = S3: processtable. (6.0) | | | | flag = S2: normal. (0.0) | | | | flag = RSTR: portsweep. (7.0/2.0) | | | | flag = SH: nmap. (1.0) | | | duration > 2 | | | | src_bytes <= 55 | | | | | src_bytes <= 24 | | | | | | srv_rerror_rate <= 0.17: processtable. (16.0/1.0) | | | | | | srv_rerror_rate > 0.17: mscan. 
(5.0) | | | | | src_bytes > 24: guess_passwd. (73.0) | | | | src_bytes > 55 | | | | | duration <= 261: normal. (14.0) | | | | | duration > 261: warezmaster. (15.0/1.0) | | src_bytes > 2564 | | | num_compromised <= 0 | | | | src_bytes <= 2599: mailbomb. (99.0) | | | | src_bytes > 2599 | | | | | flag = REJ: normal. (0.0) | | | | | flag = SF | | | | | | duration <= 38: normal. (17.0/1.0) | | | | | | duration > 38: warezmaster. (12.0) | | | | | flag = S1: normal. (0.0) | | | | | flag = S0: normal. (0.0)
| | | | | flag = RSTO: normal. (0.0) | | | | | flag = S3: normal. (0.0) | | | | | flag = S2: normal. (0.0) | | | | | flag = RSTR: apache2. (10.0) | | | | | flag = SH: normal. (0.0) | | | num_compromised > 0: back. (25.0) | count > 72 | | protocol_type = udp: normal. (28.0) | | protocol_type = tcp | | | service = private: neptune. (0.0) | | | service = ecr_i: neptune. (0.0) | | | service = ntp_u: neptune. (0.0) | | | service = X11: neptune. (0.0) | | | service = IRC: neptune. (0.0) | | | service = courier: neptune. (0.0) | | | service = ctf: neptune. (0.0) | | | service = ssh: neptune. (0.0) | | | service = time: neptune. (0.0) | | | service = ldap: neptune. (0.0) | | | service = supdup: neptune. (0.0) | | | service = Z39_50: neptune. (0.0) | | | service = discard: neptune. (0.0) | | | service = bgp: neptune. (0.0) | | | service = kshell: neptune. (0.0) | | | service = urp_i: neptune. (0.0) | | | service = uucp: neptune. (0.0) | | | service = uucp_path: neptune. (0.0) | | | service = netstat: neptune. (0.0) | | | service = nnsp: neptune. (0.0) | | | service = http_443: neptune. (0.0) | | | service = iso_tsap: neptune. (0.0) | | | service = http: apache2. (2.0) | | | service = pop_3: neptune. (0.0) | | | service = other: neptune. (0.0) | | | service = domain_u: neptune. (0.0) | | | service = exec: neptune. (0.0) | | | service = telnet: neptune. (5.0) | | | service = ftp: neptune. (0.0) | | | service = sql_net: neptune. (0.0) | | | service = ftp_data: neptune. (0.0) | | | service = smtp: neptune. (0.0) | | | service = sunrpc: neptune. (0.0) | | | service = shell: neptune. (0.0) | | | service = vmnet: neptune. (0.0) | | | service = netbios_dgm: neptune. (0.0) | | | service = efs: neptune. (0.0) | | | service = finger: neptune. (0.0) | | | service = rje: neptune. (0.0) | | | service = mtp: neptune. (0.0) | | | service = imap4: neptune. (0.0) | | | service = link: neptune. (0.0) | | | service = auth: neptune. (0.0) | | | service = domain: neptune. 
(0.0) | | | service = nntp: neptune. (0.0) | | | service = daytime: neptune. (0.0) | | | service = gopher: neptune. (0.0) | | | service = echo: neptune. (0.0) | | | service = printer: neptune. (0.0) | | | service = whois: neptune. (0.0) | | | service = klogin: neptune. (0.0) | | | service = ecr: neptune. (0.0) | | | service = netbios_ssn: neptune. (0.0)
| | | service = remote_job: neptune. (0.0) | | | service = hostnames: neptune. (0.0) | | | service = eco_i: neptune. (0.0) | | | service = login: neptune. (0.0) | | | service = pop_2: neptune. (0.0) | | | service = netbios_ns: neptune. (0.0) | | protocol_type = icmp: smurf. (3485.0)

Number of Leaves  : 229
Size of the tree  : 270
Time taken to build model: 4.24 seconds

=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances      3298       97      %
Incorrectly Classified Instances     102        3      %
Kappa statistic                        0.9533
Mean absolute error                    0.0022
Root mean squared error                0.0364
Relative absolute error                5.1806 %
Root relative squared error           25.2454 %
Total Number of Instances           3400

=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  Class
1        0        1          1       1          back.
1        0        1          1       1          snmpguess.
1        0        1          1       1          apache2.
0        0.001    0          0       0          httptunnel.
0.909    0.001    0.833      0.909   0.87       mscan.
0.857    0.001    0.545      0.857   0.667      saint.
0.19     0.003    0.6        0.19    0.288      snmpgetattack.
0.917    0        1          0.917   0.957      processtable.
1        0        1          1       1          mailbomb.
0        0        0          0       0          buffer_overflow.
0        0        0          0       0          ftp_write.
0.966    0.001    0.966      0.966   0.966      guess_passwd.
0        0        0          0       0          imap.
1        0        1          1       1          ipsweep.
0        0        0          0       0          land.
0        0        0          0       0          loadmodule.
0        0        0          0       0          multihop.
0.998    0        0.998      0.998   0.998      neptune.
0        0        0          0       0          nmap.
0.971    0.027    0.891      0.971   0.929      normal.
0        0        0          0       0          perl.
0        0        0          0       0          phf.
0.5      0        1          0.5     0.667      pod.
0.833    0.001    0.625      0.833   0.714      portsweep.
0        0        0          0       0          rootkit.
0.778    0        1          0.778   0.875      satan.
0.999    0        1          0.999   1          smurf.
0        0        0          0       0          spy.
0        0        0          0       0          teardrop.
0        0        0          0       0          warezclient.
0.905    0.001    0.864      0.905   0.884      warezmaster.
=== Confusion Matrix === a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae <-- classified as 13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = back. 0 26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = snmpguess. 0 0 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = apache2. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 | d = httptunnel. 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 | e = mscan. 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 | f = saint. 0 0 0 0 0 0 15 0 0 0 0 0 0 0 0 0 0 0 0 64 0 0 0 0 0 0 0 0 0 0 0 | g = snmpgetattack. 0 0 0 0 1 0 0 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | h = processtable. 0 0 0 0 0 0 0 0 59 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | i = mailbomb. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | j = buffer_overflow. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | k = ftp_write. 0 0 0 0 0 0 0 0 0 0 0 57 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 | l = guess_passwd. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | m = imap. 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | n = ipsweep. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | o = land. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 | p = loadmodule. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 | q = multihop. 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 609 0 0 0 0 0 0 0 0 0 0 0 0 0 | r = neptune. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | s = nmap. 0 0 0 1 0 0 10 0 0 0 0 2 0 0 0 0 0 0 0 604 0 0 0 2 0 0 0 0 0 0 3 | t = normal. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | u = perl. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | v = phf. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 | w = pod. 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 | x = portsweep. 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | y = rootkit. 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 21 0 0 0 0 0 | z = satan. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1826 0 0 0 0 | aa = smurf. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ab = spy. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ac = teardrop. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ad = warezclient. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 19 | ae = warezmaster.
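The Definitions section measures performance with Hit Ratio, Miss Ratio, and False Alarm Ratio. A minimal sketch of how those ratios follow from a confusion matrix, assuming the 31 classes are first collapsed into a single attack-versus-normal split (the counts below are hypothetical, not taken from the matrix above):

```python
# Sketch: Hit Ratio, Miss Ratio, and False Alarm Ratio (as defined
# in the Definitions section) from a collapsed 2x2 confusion matrix.
# All counts here are hypothetical, for illustration only.

def detection_ratios(tp, fn, fp, tn):
    """tp/fn count attack instances; fp/tn count normal instances."""
    hit_ratio = tp / (tp + fn)          # attacks detected
    miss_ratio = fn / (tp + fn)         # attacks missed
    false_alarm_ratio = fp / (fp + tn)  # normals flagged as attacks
    return hit_ratio, miss_ratio, false_alarm_ratio

hit, miss, fa = detection_ratios(tp=950, fn=50, fp=20, tn=980)
print(hit, miss, fa)  # 0.95 0.05 0.02
```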
Appendix C
Clustering Results:
Normal Data as Trainer:
=== Run information === Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10 Relation: ids Instances: 5905 Attributes: 42 duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_host_login is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class
Test mode: user supplied test set: 3400 instances === Model and evaluation on test set === kMeans Number of iterations: 6 Within cluster sum of squared errors: 4708.820136735875 Cluster centroids: Cluster 0 Mean/Mode: 13.9458 tcp http SF 669.8249 4500.8885 0 0 0 0.0265 0 0.944 0.0015 0.0009 0 0.0027 0.0195 0 0.0015 0 0 0.004 6.968 10.1401 0.002 0.0012 0.0006 0.0006 0.9985 0.0026 0.1776 49.6375 212.0548 0.9271 0.0158 0.1123 0.0329 0.0012 0.0022 0.0033 0.0045 normal. Std Devs: 361.8696 N/A N/A N/A 5292.9004 14974.2892 0 0 0 0.4808 0 0.23 0.0462 0.0302 0 0.1571 0.7411 0 0.039 0 0 0.0628 7.8685 10.7181 0.0379 0.0238 0.0247 0.0247 0.0291 0.0503 0.3045 42.7964 77.5284 0.1933 0.0557 0.2141 0.0379 0.0109 0.0379 0.0307 0.0434 N/A Cluster 1 Mean/Mode: 11.4195 tcp http SF 588.2563 2679.55 0 0 0 0.0061 0 0.7162 0.0053 0 0.0004 0.0046 0.0092 0 0.0072 0 0 0.0004 8.8776 11.106 0.0018 0.0018 0.0004 0.0004 0.9959 0.0072 0.1126 242.0809 234.0984 0.9249 0.0097 0.0101 0.0031 0.0009 0.0007 0.015 0.0014 normal. Std Devs: 365.0786 N/A N/A N/A 5095.1714 9146.5406 0 0 0 0.161 0 0.4509 0.1933 0 0.0195 0.1657 0.3314 0 0.0892 0 0 0.0195 9.6922 11.7241 0.0308 0.0307 0.0195 0.0195 0.0433 0.0752 0.221 29.7548 58.5857 0.219 0.0456 0.0514 0.0129 0.0175 0.0165 0.0958 0.0272 N/A Clustered Instances 0 291 ( 9%) 1 3109 ( 91%)
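SimpleKMeans assigns each instance to the centroid at the smallest Euclidean distance, then recomputes the centroids until the assignments stabilize (six iterations in the run above). The assignment step can be sketched as follows, using hypothetical 2-dimensional points rather than the 41-attribute vectors of the actual run:

```python
import math

# Sketch of the k-means assignment step underlying SimpleKMeans:
# each instance goes to the centroid with the smallest Euclidean
# distance. Points and centroids here are hypothetical 2-D values.

def nearest_centroid(point, centroids):
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists))

centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 8.0), (0.5, 2.0)]
labels = [nearest_centroid(p, centroids) for p in points]
print(labels)  # [0, 1, 0]
```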
Attack Data as Trainer:
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10 Relation: ids Instances: 6600 Attributes: 42 duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent
hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_host_login is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class Test mode: user supplied test set: 3400 instances === Model and evaluation on test set === kMeans ====== Number of iterations: 6 Within cluster sum of squared errors: 13422.458547689479 Cluster centroids: Cluster 0 Mean/Mode: 17.9759 tcp private REJ 278.2244 16.492 0 0 0 0.0161 0.0015 0.0124 0.0015 0.0007 0 0.0066 0.0007 0 0 0 0 0.0037 177.9196 10.242 0.2879 0.2869 0.6942 0.6945 0.1013 0.1117 0.0122 252.0892 14.0819 0.0552 0.1073 0.0128 0.0004 0.2825 0.283 0.6944 0.6941 neptune. Std Devs: 319.4349 N/A N/A N/A 4068.0853 424.4576 0 0 0 0.3935 0.0382 0.1108 0.0541 0.027 0 0.2433 0.027 0 0 0 0 0.0604 89.6316 10.5083 0.4476 0.4504 0.4544 0.4573 0.1852 0.2014 0.1049 22.1166 28.7856 0.1133 0.1874 0.1077 0.0033 0.4429 0.4469 0.4484 0.4555 N/A Cluster 1 Mean/Mode: 21.6137 icmp ecr_i SF 1706.66 919.7385 0 0.0013 0 0.0159 0.0025 0.2108 0.0048 0 0 0 0.0004 0 0.0011 0 0 0.0027 291.5721 292.0342 0.0004 0.0004
0.0009 0.0009 0.9983 0.0026 0.0295 231.5205 246.8259 0.9836 0.0048 0.6899 0.006 0.0018 0.0012 0.0022 0.0007 smurf. Std Devs: 382.0624 N/A N/A N/A 14244.6981 13263.7434 0 0.0366 0 0.1924 0.0498 0.4079 0.069 0 0 0 0.0195 0 0.0391 0 0 0.0517 235.6648 235.0086 0.0151 0.0148 0.0253 0.0259 0.034 0.0471 0.1327 66.2879 37.3966 0.096 0.0397 0.455 0.0424 0.0318 0.0232 0.0244 0.0129 N/A Clustered Instances 0 678 ( 20%) 1 2722 ( 80%)
Appendix D Sample data collected on the main router on the network tap: 11/13-20:08:02.807867 0:E0:81:2F:FE:2C -> 0:0:C:7:AC:2 type:0x800 len:0x5EA 66.179.164.20:22 -> 24.136.161.188:62456 TCP TTL:64 TOS:0x10 ID:27401 IpLen:20 DgmLen:1500 DF ***A**** Seq: 0x50692152 Ack: 0xDD1E2B42 Win: 0x2180 TcpLen: 20 AA A8 5A 92 A7 BF DF 32 7D BF F7 7B 1B 5C 35 47 ..Z....2}..{.\5G D6 52 B3 E2 97 6D 68 41 A6 53 2B 89 92 8E 10 8D .R...mhA.S+..... 1B E0 C9 87 A8 71 91 EA D0 F4 1C 6C B4 DC D7 C4 .....q.....l.... B5 22 84 40 E5 09 0F B5 E9 1F 4E AC 96 6C A5 9D ."[email protected].. D9 AF 38 88 5F 2B 4B 8D 32 FC 4C 37 AB DA E1 EA ..8._+K.2.L7.... 61 D9 23 22 FD DA 20 32 E7 6C 48 60 2C 55 CB 99 a.#".. 2.lH`,U.. 74 3A F5 7D 30 77 75 58 6B AE 80 2F 48 A5 FD F4 t:.}0wuXk../H... B5 C7 CA 9D 42 EA 9B BF B6 74 E5 19 8F EF F1 8A ....B....t...... 2D 7E D0 55 0A 92 3E 72 CF 5F 89 53 FC 85 2F 25 -~.U..>r._.S../% 72 8F B4 DD 40 BF 33 46 82 6D 21 98 8E A7 A0 A5 [email protected]!..... 2E 32 54 ED 41 D4 2F C4 B6 6E BE 55 C0 95 78 7C .2T.A./..n.U..x| 35 CA 06 4D 59 32 68 C8 3D 77 7A 73 FA 6A 78 1C 5..MY2h.=wzs.jx. 90 C7 CD 48 6D AE 0E 74 39 A0 4C F4 4E AC 49 06 ...Hm..t9.L.N.I. A2 3F F3 BB 24 B7 05 7C B3 00 70 2E 65 E1 ED 1A .?..$..|..p.e... 96 4C 93 CB A6 F5 68 B5 83 F8 08 F1 5C F2 9F 32 .L....h.....\..2 E1 F7 47 CF 2D 0B 35 DA 6A B5 D0 6D 49 9D 61 63 ..G.-.5.j..mI.ac 75 F2 4B 18 1F 02 C6 E4 9A 23 95 FE 21 6B A4 3E u.K......#..!k.> 06 40 CB 23 34 68 8F A1 C7 3C 98 20 14 8F 20 63 .@.#4h...<. .. c F7 FB 37 2B CC B9 2F 97 ED 5B 92 8D 96 84 0C 08 ..7+../..[...... E5 D4 29 A1 DF 4D 5B 33 EE 68 D3 F1 29 54 DF 0C ..)..M[3.h..)T.. F0 37 44 4A DF 2F 07 68 49 9B 09 0A C1 C7 EC 89 .7DJ./.hI....... 50 CA 40 D3 5B A5 27 69 12 7E 49 34 1A F8 26 9C P.@.[.'i.~I4..&. 44 A0 87 C7 BC CB 46 8A 33 25 94 F6 89 72 64 E0 D.....F.3%...rd. F0 AB 16 DB 52 A1 BE AC 3C 8B D6 CC 22 C7 0F B4 ....R...<..."... 86 6B BF EE A8 7E 1F 74 C7 34 14 AF 7C 50 BC 7F .k...~.t.4..|P.. 
42 0C B8 98 8C C3 EC D6 FC 51 CE 1F B3 7D A1 48 B........Q...}.H 1D 89 96 AB 79 AA E0 A5 B8 F5 39 7C 27 4C 25 D0 ....y.....9|'L%. 5A 0C 81 13 07 19 6E 81 1C 3C 9F E5 1A 6D BA 18 Z.....n..<...m.. DC 35 51 90 A1 1D 8E 57 7A 0A 56 BB 09 CB 3D 81 .5Q....Wz.V...=. 8F C5 84 83 88 ED CD 89 DB 81 4D F6 C7 04 A9 71 ..........M....q 43 65 FB 05 A4 56 E4 91 21 B1 AB 44 85 D8 12 BA Ce...V..!..D.... CD 65 AA BA 32 D1 B7 FA 84 0E 18 56 BF 2E A5 10 .e..2......V.... 72 C8 89 B8 6A 3B 75 33 3F 5F E4 77 24 EF 0C 13 r...j;u3?_.w$... A8 56 BB 68 E3 88 D8 AF 18 83 02 B9 B1 2A E8 83 .V.h.........*.. 33 2C 72 B4 49 9C F8 F3 92 03 2A 34 FB 4B 88 D6 3,r.I.....*4.K.. A3 FC C2 3D 14 2D 40 4C 4F A6 26 9F 17 22 F9 F3 ...=.-@LO.&..".. EE 7E 3F 5D 5E DE B5 D3 55 D7 CE 9B A5 68 DB 81 .~?]^...U....h.. C9 B1 16 96 11 59 6C D7 19 22 F1 62 D3 24 EB E1 .....Yl..".b.$.. D1 51 9F 4E 6C B9 0F 7A 61 FE 4F 00 7E 88 9B EE .Q.Nl..za.O.~... 3E 27 7E 18 07 D9 27 F2 90 17 AA 11 7A 48 C5 57 >'~...'.....zH.W 81 62 77 B6 A1 DF 72 AF E0 43 46 12 91 F1 5C FA .bw...r..CF...\. 86 DF 7D 45 CF FC 45 63 21 A0 F7 6D 16 79 9F 14 ..}E..Ec!..m.y.. 91 92 09 FB 33 E0 89 93 EF 95 F4 35 F3 B4 32 30 ....3......5..20 9A 0C 97 EE CF 9B 5D 73 07 E9 DC 74 B8 ED 48 00 ......]s...t..H. DF 00 0A 69 6B F3 88 30 73 ED 98 8E 7C C8 FC 2C ...ik..0s...|.., 0E 0C 84 74 3F 7A B2 CA 93 2F 21 AF 4F 62 D7 61 ...t?z.../!.Ob.a
04 56 28 30 61 91 C2 78 2D 04 63 2A E0 86 9C 84 .V(0a..x-.c*.... 72 36 49 6E B7 91 F4 43 C2 A2 4C 03 6C F4 5B 14 r6In...C..L.l.[. 99 A2 12 3C A0 E3 18 CD BA 11 DF 0F 03 E0 A7 34 ...<...........4 F9 7A 22 EE 09 62 1C 7B 24 DA 73 A8 5D 41 92 77 .z"..b.{$.s.]A.w E0 7D 44 E1 C0 27 A0 14 48 BE 5C 7B 89 39 25 34 .}D..'..H.\{.9%4 08 6E D6 0C 47 72 1B 96 DF 06 7E 9D 39 FE 3D 5E .n..Gr....~.9.=^ 04 D9 4F 96 4A E1 C8 B9 D5 33 26 AC E7 13 A2 F6 ..O.J....3&..... F2 4C 0F 22 E5 89 45 32 7E 03 CF 3A 53 F0 0E A6 .L."..E2~..:S... 8C 01 D3 FB 5B 0A 44 BF 7A 81 78 81 D7 63 AA 5F ....[.D.z.x..c._ 23 B0 23 7A B0 5C 12 75 E5 80 CD 47 AE FF 83 AE #.#z.\.u...G.... 46 B0 E9 3B 76 44 09 43 31 22 94 FE 1E 36 F7 40 F..;vD.C1"...6.@ A7 20 A4 80 04 E1 23 25 B9 1E 63 A2 11 4C 12 57 . ....#%..c..L.W 16 AC E2 00 A1 4B C9 24 C1 60 7C 4C 5C 7A 7E F7 .....K.$.`|L\z~. 6D 99 03 26 58 B4 DB EF A7 CE BE 68 EA 5A 4C F2 m..&X......h.ZL. 0F 07 7B 2E A2 7C A3 DD 71 0A AF 96 2A 47 9D D3 ..{..|..q...*G.. 54 42 5B 38 03 4A 4C CB 65 BE A2 C3 6B ED DD EB TB[8.JL.e...k... F6 D0 37 9D 00 66 E1 CA 8A 89 A5 03 5E A2 62 66 ..7..f......^.bf 07 EB F4 21 88 19 8C 06 44 E5 34 9D 9B 3D 6B 6E ...!....D.4..=kn CA 84 97 98 79 C1 EF 6A E9 7B 26 5B 03 73 61 6F ....y..j.{&[.sao 68 D1 03 E3 D6 D9 71 4E 08 BE 16 CE 6A 27 6E BE h.....qN....j'n. 4F 5E E4 28 61 D9 55 FA 67 26 90 C5 52 76 D6 2D O^.(a.U.g&..Rv.- 9E 6E F5 C7 0C 87 A2 7B BA 4A 26 0C FB 4F 65 1A .n.....{.J&..Oe. 70 2F 44 98 8C 24 B6 91 60 91 39 FB D0 B7 7A E9 p/D..$..`.9...z. 24 0D D5 51 14 49 7D 0F 11 39 94 87 5D C8 7F 63 $..Q.I}..9..]..c 7C 8D C0 C8 6E C1 C5 D5 CD 39 9F 61 4A 76 9A 07 |...n....9.aJv.. 9D 7B 03 2B 80 4F 30 48 F1 F1 AF 2F AB 9B CC 88 .{.+.O0H.../.... 8D 51 3B A6 A0 C3 99 77 BF 56 86 36 3F 9E D9 94 .Q;....w.V.6?... 67 17 9C B7 3E C0 B0 16 85 21 61 78 BE 2B 4C DC g...>....!ax.+L. 71 A2 9A C9 8D 2F 60 D5 EA CD E1 D8 05 8D FA 4F q..../`........O D1 33 54 88 D1 73 47 AA 65 F2 30 DD 61 01 82 DC .3T..sG.e.0.a... 
2E 17 62 5D 87 F2 D7 88 4D E8 CD 50 BB 67 67 E3 ..b]....M..P.gg. D7 D0 96 89 A2 9C 7F AB 56 F6 BF FD 88 CA 0B 95 ........V....... 3C B9 85 65 7C 0F D9 89 76 8F 74 F6 DE 1A 7B 99 <..e|...v.t...{. 06 4F 18 AF DC DE 18 D0 75 FD 80 AD 0E 8B 9A D0 .O......u....... DD F6 A7 E3 55 95 E8 FB 5A A9 AE 17 D7 0D DA B2 ....U...Z....... FF 1D B0 0A AD 38 6C C0 1B BB 50 2E 85 49 F3 20 .....8l...P..I. 21 C2 A8 17 EF 70 1D EA EC E4 99 C0 DC 6F A5 96 !....p.......o.. DC D9 FD 90 73 FF 22 03 F0 C1 7E 2F 75 5F 6F 36 ....s."...~/u_o6 A5 8E 1C FE C1 CB B1 CC D4 C6 2C 0E FA 51 15 43 ..........,..Q.C B0 70 2F E9 E5 A2 23 75 63 D8 2C D5 2B AD 36 EB .p/...#uc.,.+.6. 8A 52 7D EE FA C0 15 F5 1B 21 9C 18 D0 76 06 52 .R}......!...v.R FC 48 E2 D2 4F FD 0E 7C 85 C8 A4 C2 8E 7A 5A 27 .H..O..|.....zZ' 37 D8 4C E5 1A E6 94 9B A6 30 A3 BB 9C EC 59 ED 7.L......0....Y. F6 94 49 51 46 1B D8 CE 98 F2 D1 0A 2F C2 07 3C ..IQF......./..< 87 58 FC EB .X.. 4A D5 ED AE 36 5C DA 65 2D BF 11 5B 5D B3 B6 08 J...6\.e-..[]...
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
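Each payload line in the dump above holds up to 16 bytes as space-separated hex pairs, followed by an ASCII rendering. A sketch of recovering the raw bytes from one such line; the termination rule (stop at the first token that is not a two-character hex pair) is an assumption about the layout, not something the capture tool guarantees:

```python
import string

# Sketch: parse one Snort-style payload line into raw bytes.
# Assumption about the format: the line is a run of two-character
# hex pairs followed by an ASCII column; we consume tokens only
# while they still look like hex pairs.

HEX = set(string.hexdigits)

def parse_payload_line(line):
    out = bytearray()
    for tok in line.split():
        if len(tok) == 2 and set(tok) <= HEX:
            out.append(int(tok, 16))
        else:
            break  # reached the ASCII rendering column
    return bytes(out)

# First payload line from the dump above.
sample = "AA A8 5A 92 A7 BF DF 32 7D BF F7 7B 1B 5C 35 47 ..Z....2}..{.\\5G"
data = parse_payload_line(sample)
print(len(data))  # 16
```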
Appendix E
Testing using one type of attack (mailbomb):
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2 Relation: ids Instances: 6600 Attributes: 42 duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_host_login is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class Test mode: user supplied test set: 41 instances
=== Classifier model (full training set) === J48 pruned tree ------------------ same_srv_rate <= 0.21 | diff_srv_rate <= 0.23: neptune. (1244.0) | diff_srv_rate > 0.23 | | dst_host_same_srv_rate <= 0.01 | | | dst_host_rerror_rate <= 0.99 | | | | dst_host_serror_rate <= 0.11 | | | | | serror_rate <= 0.03: satan. (18.0/2.0) | | | | | serror_rate > 0.03: saint. (14.0) | | | | dst_host_serror_rate > 0.11: satan. (9.0) | | | dst_host_rerror_rate > 0.99: satan. (11.0) | | dst_host_same_srv_rate > 0.01: normal. (4.0/1.0) same_srv_rate > 0.21 | count <= 72 | | src_bytes <= 2564 | | | duration <= 2 | | | | flag = REJ | | | | | srv_diff_host_rate <= 0.4 | | | | | | srv_count <= 5: httptunnel. (2.0/1.0) | | | | | | srv_count > 5: neptune. (2.0/1.0) | | | | | srv_diff_host_rate > 0.4: mscan. (7.0) | | | | flag = SF | | | | | protocol_type = udp | | | | | | src_bytes <= 55 | | | | | | | service = private | | | | | | | | srv_count <= 8: snmpguess. (49.0) | | | | | | | | srv_count > 8: normal. (3.0) | | | | | | | service = ecr_i: normal. (0.0) | | | | | | | service = ntp_u: normal. (1.0) | | | | | | | service = X11: normal. (0.0) | | | | | | | service = IRC: normal. (0.0) | | | | | | | service = courier: normal. (0.0) | | | | | | | service = ctf: normal. (0.0) | | | | | | | service = ssh: normal. (0.0) | | | | | | | service = time: normal. (0.0) | | | | | | | service = ldap: normal. (0.0) | | | | | | | service = supdup: normal. (0.0) | | | | | | | service = Z39_50: normal. (0.0) | | | | | | | service = discard: normal. (0.0) | | | | | | | service = bgp: normal. (0.0) | | | | | | | service = kshell: normal. (0.0) | | | | | | | service = urp_i: normal. (0.0) | | | | | | | service = uucp: normal. (0.0) | | | | | | | service = uucp_path: normal. (0.0) | | | | | | | service = netstat: normal. (0.0) | | | | | | | service = nnsp: normal. (0.0) | | | | | | | service = http_443: normal. (0.0) | | | | | | | service = iso_tsap: normal. (0.0) | | | | | | | service = http: normal. 
(0.0) | | | | | | | service = pop_3: normal. (0.0) | | | | | | | service = other: normal. (0.0)
| | | | | | | service = domain_u: normal. (54.0) | | | | | | | service = exec: normal. (0.0) | | | | | | | service = telnet: normal. (0.0) | | | | | | | service = ftp: normal. (0.0) | | | | | | | service = sql_net: normal. (0.0) | | | | | | | service = ftp_data: normal. (0.0) | | | | | | | service = smtp: normal. (0.0) | | | | | | | service = sunrpc: normal. (0.0) | | | | | | | service = shell: normal. (0.0) | | | | | | | service = vmnet: normal. (0.0) | | | | | | | service = netbios_dgm: normal. (0.0) | | | | | | | service = efs: normal. (0.0) | | | | | | | service = finger: normal. (0.0) | | | | | | | service = rje: normal. (0.0) | | | | | | | service = mtp: normal. (0.0) | | | | | | | service = imap4: normal. (0.0) | | | | | | | service = link: normal. (0.0) | | | | | | | service = auth: normal. (0.0) | | | | | | | service = domain: normal. (0.0) | | | | | | | service = nntp: normal. (0.0) | | | | | | | service = daytime: normal. (0.0) | | | | | | | service = gopher: normal. (0.0) | | | | | | | service = echo: normal. (0.0) | | | | | | | service = printer: normal. (0.0) | | | | | | | service = whois: normal. (0.0) | | | | | | | service = klogin: normal. (0.0) | | | | | | | service = ecr: normal. (0.0) | | | | | | | service = netbios_ssn: normal. (0.0) | | | | | | | service = remote_job: normal. (0.0) | | | | | | | service = hostnames: normal. (0.0) | | | | | | | service = eco_i: normal. (0.0) | | | | | | | service = login: normal. (0.0) | | | | | | | service = pop_2: normal. (0.0) | | | | | | | service = netbios_ns: normal. (0.0) | | | | | | src_bytes > 55 | | | | | | | count <= 1 | | | | | | | | dst_host_srv_count <= 253 | | | | | | | | | dst_bytes <= 124: normal. (79.0/3.0) | | | | | | | | | dst_bytes > 124: snmpgetattack. (51.0/15.0) | | | | | | | | dst_host_srv_count > 253: normal. (106.0/48.0) | | | | | | | count > 1: normal. 
(175.0/57.0) | | | | | protocol_type = tcp | | | | | | num_failed_logins <= 0 | | | | | | | dst_host_srv_diff_host_rate <= 0.11 | | | | | | | | dst_host_same_srv_rate <= 0.04 | | | | | | | | | src_bytes <= 20: normal. (2.0) | | | | | | | | | src_bytes > 20: guess_passwd. (3.0) | | | | | | | | dst_host_same_srv_rate > 0.04 | | | | | | | | | dst_host_srv_count <= 5: warezmaster. (3.0/1.0) | | | | | | | | | dst_host_srv_count > 5: normal. (882.0) | | | | | | | dst_host_srv_diff_host_rate > 0.11 | | | | | | | | dst_bytes <= 15: warezmaster. (7.0) | | | | | | | | dst_bytes > 15: normal. (5.0) | | | | | | num_failed_logins > 0: guess_passwd. (13.0) | | | | | protocol_type = icmp | | | | | | srv_diff_host_rate <= 0.57
| | | | | | | dst_host_srv_diff_host_rate <= 0.67 | | | | | | | | count <= 13: normal. (5.0) | | | | | | | | count > 13: smurf. (2.0) | | | | | | | dst_host_srv_diff_host_rate > 0.67: ipsweep. (5.0) | | | | | | srv_diff_host_rate > 0.57 | | | | | | | service = private: pod. (0.0) | | | | | | | service = ecr_i: pod. (5.0) | | | | | | | service = ntp_u: pod. (0.0) | | | | | | | service = X11: pod. (0.0) | | | | | | | service = IRC: pod. (0.0) | | | | | | | service = courier: pod. (0.0) | | | | | | | service = ctf: pod. (0.0) | | | | | | | service = ssh: pod. (0.0) | | | | | | | service = time: pod. (0.0) | | | | | | | service = ldap: pod. (0.0) | | | | | | | service = supdup: pod. (0.0) | | | | | | | service = Z39_50: pod. (0.0) | | | | | | | service = discard: pod. (0.0) | | | | | | | service = bgp: pod. (0.0) | | | | | | | service = kshell: pod. (0.0) | | | | | | | service = urp_i: pod. (0.0) | | | | | | | service = uucp: pod. (0.0) | | | | | | | service = uucp_path: pod. (0.0) | | | | | | | service = netstat: pod. (0.0) | | | | | | | service = nnsp: pod. (0.0) | | | | | | | service = http_443: pod. (0.0) | | | | | | | service = iso_tsap: pod. (0.0) | | | | | | | service = http: pod. (0.0) | | | | | | | service = pop_3: pod. (0.0) | | | | | | | service = other: pod. (0.0) | | | | | | | service = domain_u: pod. (0.0) | | | | | | | service = exec: pod. (0.0) | | | | | | | service = telnet: pod. (0.0) | | | | | | | service = ftp: pod. (0.0) | | | | | | | service = sql_net: pod. (0.0) | | | | | | | service = ftp_data: pod. (0.0) | | | | | | | service = smtp: pod. (0.0) | | | | | | | service = sunrpc: pod. (0.0) | | | | | | | service = shell: pod. (0.0) | | | | | | | service = vmnet: pod. (0.0) | | | | | | | service = netbios_dgm: pod. (0.0) | | | | | | | service = efs: pod. (0.0) | | | | | | | service = finger: pod. (0.0) | | | | | | | service = rje: pod. (0.0) | | | | | | | service = mtp: pod. (0.0) | | | | | | | service = imap4: pod. 
(0.0) | | | | | | | service = link: pod. (0.0) | | | | | | | service = auth: pod. (0.0) | | | | | | | service = domain: pod. (0.0) | | | | | | | service = nntp: pod. (0.0) | | | | | | | service = daytime: pod. (0.0) | | | | | | | service = gopher: pod. (0.0) | | | | | | | service = echo: pod. (0.0) | | | | | | | service = printer: pod. (0.0) | | | | | | | service = whois: pod. (0.0) | | | | | | | service = klogin: pod. (0.0)
| | | | | | | service = ecr: pod. (0.0) | | | | | | | service = netbios_ssn: pod. (0.0) | | | | | | | service = remote_job: pod. (0.0) | | | | | | | service = hostnames: pod. (0.0) | | | | | | | service = eco_i: saint. (2.0) | | | | | | | service = login: pod. (0.0) | | | | | | | service = pop_2: pod. (0.0) | | | | | | | service = netbios_ns: pod. (0.0) | | | | flag = S1: normal. (0.0) | | | | flag = S0 | | | | | dst_host_rerror_rate <= 0.28: mscan. (6.0/1.0) | | | | | dst_host_rerror_rate > 0.28: apache2. (3.0) | | | | flag = RSTO | | | | | src_bytes <= 55: mscan. (6.0) | | | | | src_bytes > 55: guess_passwd. (2.0) | | | | flag = S3: processtable. (6.0) | | | | flag = S2: normal. (0.0) | | | | flag = RSTR: portsweep. (7.0/2.0) | | | | flag = SH: nmap. (1.0) | | | duration > 2 | | | | src_bytes <= 55 | | | | | src_bytes <= 24 | | | | | | srv_rerror_rate <= 0.17: processtable. (16.0/1.0) | | | | | | srv_rerror_rate > 0.17: mscan. (5.0) | | | | | src_bytes > 24: guess_passwd. (73.0) | | | | src_bytes > 55 | | | | | duration <= 261: normal. (14.0) | | | | | duration > 261: warezmaster. (15.0/1.0) | | src_bytes > 2564 | | | num_compromised <= 0 | | | | src_bytes <= 2599: mailbomb. (99.0) | | | | src_bytes > 2599 | | | | | flag = REJ: normal. (0.0) | | | | | flag = SF | | | | | | duration <= 38: normal. (17.0/1.0) | | | | | | duration > 38: warezmaster. (12.0) | | | | | flag = S1: normal. (0.0) | | | | | flag = S0: normal. (0.0) | | | | | flag = RSTO: normal. (0.0) | | | | | flag = S3: normal. (0.0) | | | | | flag = S2: normal. (0.0) | | | | | flag = RSTR: apache2. (10.0) | | | | | flag = SH: normal. (0.0) | | | num_compromised > 0: back. (25.0) | count > 72 | | protocol_type = udp: normal. (28.0) | | protocol_type = tcp | | | service = private: neptune. (0.0) | | | service = ecr_i: neptune. (0.0) | | | service = ntp_u: neptune. (0.0) | | | service = X11: neptune. (0.0) | | | service = IRC: neptune. (0.0) | | | service = courier: neptune. 
(0.0) | | | service = ctf: neptune. (0.0) | | | service = ssh: neptune. (0.0) | | | service = time: neptune. (0.0)
| | | service = ldap: neptune. (0.0) | | | service = supdup: neptune. (0.0) | | | service = Z39_50: neptune. (0.0) | | | service = discard: neptune. (0.0) | | | service = bgp: neptune. (0.0) | | | service = kshell: neptune. (0.0) | | | service = urp_i: neptune. (0.0) | | | service = uucp: neptune. (0.0) | | | service = uucp_path: neptune. (0.0) | | | service = netstat: neptune. (0.0) | | | service = nnsp: neptune. (0.0) | | | service = http_443: neptune. (0.0) | | | service = iso_tsap: neptune. (0.0) | | | service = http: apache2. (2.0) | | | service = pop_3: neptune. (0.0) | | | service = other: neptune. (0.0) | | | service = domain_u: neptune. (0.0) | | | service = exec: neptune. (0.0) | | | service = telnet: neptune. (5.0) | | | service = ftp: neptune. (0.0) | | | service = sql_net: neptune. (0.0) | | | service = ftp_data: neptune. (0.0) | | | service = smtp: neptune. (0.0) | | | service = sunrpc: neptune. (0.0) | | | service = shell: neptune. (0.0) | | | service = vmnet: neptune. (0.0) | | | service = netbios_dgm: neptune. (0.0) | | | service = efs: neptune. (0.0) | | | service = finger: neptune. (0.0) | | | service = rje: neptune. (0.0) | | | service = mtp: neptune. (0.0) | | | service = imap4: neptune. (0.0) | | | service = link: neptune. (0.0) | | | service = auth: neptune. (0.0) | | | service = domain: neptune. (0.0) | | | service = nntp: neptune. (0.0) | | | service = daytime: neptune. (0.0) | | | service = gopher: neptune. (0.0) | | | service = echo: neptune. (0.0) | | | service = printer: neptune. (0.0) | | | service = whois: neptune. (0.0) | | | service = klogin: neptune. (0.0) | | | service = ecr: neptune. (0.0) | | | service = netbios_ssn: neptune. (0.0) | | | service = remote_job: neptune. (0.0) | | | service = hostnames: neptune. (0.0) | | | service = eco_i: neptune. (0.0) | | | service = login: neptune. (0.0) | | | service = pop_2: neptune. (0.0) | | | service = netbios_ns: neptune. (0.0) | | protocol_type = icmp: smurf. 
(3485.0) Number of Leaves : 229 Size of the tree : 270
Time taken to build model: 1.8 seconds
=== Evaluation on test set ===
=== Summary ===
Correctly Classified Instances 41 100 % Incorrectly Classified Instances 0 0 % Kappa statistic 1 Mean absolute error 0 Root mean squared error 0 Relative absolute error 0 % Root relative squared error 0 % Total Number of Instances 41
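The Kappa statistic reported in these summaries compares the observed agreement against the agreement expected by chance; a value of 1 indicates perfect classification. A minimal sketch of the computation with hypothetical counts (not the 41-instance run above):

```python
# Sketch: Cohen's kappa from a square confusion matrix given as a
# list of rows. The example matrix is hypothetical, for illustration.

def cohens_kappa(matrix):
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    expected = sum(
        sum(row) * sum(m[i] for m in matrix) / total**2
        for i, row in enumerate(matrix)
    )
    return (observed - expected) / (1 - expected)

print(round(cohens_kappa([[20, 5], [10, 65]]), 3))  # 0.625
```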
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure Class 0 0 0 0 0 back. 0 0 0 0 0 snmpguess. 0 0 0 0 0 apache2. 0 0 0 0 0 httptunnel. 0 0 0 0 0 mscan. 0 0 0 0 0 saint. 0 0 0 0 0 snmpgetattack. 0 0 0 0 0 processtable. 1 0 1 1 1 mailbomb. 0 0 0 0 0 buffer_overflow. 0 0 0 0 0 ftp_write. 0 0 0 0 0 guess_passwd. 0 0 0 0 0 imap. 0 0 0 0 0 ipsweep. 0 0 0 0 0 land. 0 0 0 0 0 loadmodule. 0 0 0 0 0 multihop. 0 0 0 0 0 neptune. 0 0 0 0 0 nmap. 0 0 0 0 0 normal. 0 0 0 0 0 perl. 0 0 0 0 0 phf. 0 0 0 0 0 pod. 0 0 0 0 0 portsweep. 0 0 0 0 0 rootkit. 0 0 0 0 0 satan. 0 0 0 0 0 smurf.
0 0 0 0 0 spy. 0 0 0 0 0 teardrop. 0 0 0 0 0 warezclient. 0 0 0 0 0 warezmaster.
=== Confusion Matrix ===
a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae <-- classified as 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = back. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = snmpguess. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = apache2. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = httptunnel. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = mscan. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = saint. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | g = snmpgetattack. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | h = processtable. 0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | i = mailbomb. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | j = buffer_overflow. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | k = ftp_write. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | l = guess_passwd. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | m = imap. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | n = ipsweep. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | o = land. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | p = loadmodule. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | q = multihop. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | r = neptune. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | s = nmap. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | t = normal. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | u = perl. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | v = phf. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | w = pod. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | x = portsweep. 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | y = rootkit. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | z = satan. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | aa = smurf. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ab = spy. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ac = teardrop. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ad = warezclient. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ae = warezmaster.
Testing using two types of attacks (mailbomb and neptune):
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2 Relation: ids Instances: 6600 Attributes: 42 duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_host_login is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class Test mode: user supplied test set: 160 instances
=== Classifier model (full training set) === J48 pruned tree ------------------ same_srv_rate <= 0.21 | diff_srv_rate <= 0.23: neptune. (1244.0) | diff_srv_rate > 0.23 | | dst_host_same_srv_rate <= 0.01 | | | dst_host_rerror_rate <= 0.99 | | | | dst_host_serror_rate <= 0.11 | | | | | serror_rate <= 0.03: satan. (18.0/2.0) | | | | | serror_rate > 0.03: saint. (14.0) | | | | dst_host_serror_rate > 0.11: satan. (9.0) | | | dst_host_rerror_rate > 0.99: satan. (11.0) | | dst_host_same_srv_rate > 0.01: normal. (4.0/1.0) same_srv_rate > 0.21 | count <= 72 | | src_bytes <= 2564 | | | duration <= 2 | | | | flag = REJ | | | | | srv_diff_host_rate <= 0.4 | | | | | | srv_count <= 5: httptunnel. (2.0/1.0) | | | | | | srv_count > 5: neptune. (2.0/1.0) | | | | | srv_diff_host_rate > 0.4: mscan. (7.0) | | | | flag = SF | | | | | protocol_type = udp | | | | | | src_bytes <= 55 | | | | | | | service = private | | | | | | | | srv_count <= 8: snmpguess. (49.0) | | | | | | | | srv_count > 8: normal. (3.0) | | | | | | | service = ecr_i: normal. (0.0) | | | | | | | service = ntp_u: normal. (1.0) | | | | | | | service = X11: normal. (0.0) | | | | | | | service = IRC: normal. (0.0) | | | | | | | service = courier: normal. (0.0) | | | | | | | service = ctf: normal. (0.0) | | | | | | | service = ssh: normal. (0.0) | | | | | | | service = time: normal. (0.0) | | | | | | | service = ldap: normal. (0.0) | | | | | | | service = supdup: normal. (0.0) | | | | | | | service = Z39_50: normal. (0.0) | | | | | | | service = discard: normal. (0.0) | | | | | | | service = bgp: normal. (0.0) | | | | | | | service = kshell: normal. (0.0) | | | | | | | service = urp_i: normal. (0.0) | | | | | | | service = uucp: normal. (0.0) | | | | | | | service = uucp_path: normal. (0.0) | | | | | | | service = netstat: normal. (0.0) | | | | | | | service = nnsp: normal. (0.0) | | | | | | | service = http_443: normal. (0.0) | | | | | | | service = iso_tsap: normal. (0.0) | | | | | | | service = http: normal. 
(0.0) | | | | | | | service = pop_3: normal. (0.0) | | | | | | | service = other: normal. (0.0) | | | | | | | service = domain_u: normal. (54.0) | | | | | | | service = exec: normal. (0.0) | | | | | | | service = telnet: normal. (0.0) | | | | | | | service = ftp: normal. (0.0) | | | | | | | service = sql_net: normal. (0.0)
83
Firoz Allahwali
| | | | | | | service = ftp_data: normal. (0.0) | | | | | | | service = smtp: normal. (0.0) | | | | | | | service = sunrpc: normal. (0.0) | | | | | | | service = shell: normal. (0.0) | | | | | | | service = vmnet: normal. (0.0) | | | | | | | service = netbios_dgm: normal. (0.0) | | | | | | | service = efs: normal. (0.0) | | | | | | | service = finger: normal. (0.0) | | | | | | | service = rje: normal. (0.0) | | | | | | | service = mtp: normal. (0.0) | | | | | | | service = imap4: normal. (0.0) | | | | | | | service = link: normal. (0.0) | | | | | | | service = auth: normal. (0.0) | | | | | | | service = domain: normal. (0.0) | | | | | | | service = nntp: normal. (0.0) | | | | | | | service = daytime: normal. (0.0) | | | | | | | service = gopher: normal. (0.0) | | | | | | | service = echo: normal. (0.0) | | | | | | | service = printer: normal. (0.0) | | | | | | | service = whois: normal. (0.0) | | | | | | | service = klogin: normal. (0.0) | | | | | | | service = ecr: normal. (0.0) | | | | | | | service = netbios_ssn: normal. (0.0) | | | | | | | service = remote_job: normal. (0.0) | | | | | | | service = hostnames: normal. (0.0) | | | | | | | service = eco_i: normal. (0.0) | | | | | | | service = login: normal. (0.0) | | | | | | | service = pop_2: normal. (0.0) | | | | | | | service = netbios_ns: normal. (0.0) | | | | | | src_bytes > 55 | | | | | | | count <= 1 | | | | | | | | dst_host_srv_count <= 253 | | | | | | | | | dst_bytes <= 124: normal. (79.0/3.0) | | | | | | | | | dst_bytes > 124: snmpgetattack. (51.0/15.0) | | | | | | | | dst_host_srv_count > 253: normal. (106.0/48.0) | | | | | | | count > 1: normal. (175.0/57.0) | | | | | protocol_type = tcp | | | | | | num_failed_logins <= 0 | | | | | | | dst_host_srv_diff_host_rate <= 0.11 | | | | | | | | dst_host_same_srv_rate <= 0.04 | | | | | | | | | src_bytes <= 20: normal. (2.0) | | | | | | | | | src_bytes > 20: guess_passwd. 
(3.0) | | | | | | | | dst_host_same_srv_rate > 0.04 | | | | | | | | | dst_host_srv_count <= 5: warezmaster. (3.0/1.0) | | | | | | | | | dst_host_srv_count > 5: normal. (882.0) | | | | | | | dst_host_srv_diff_host_rate > 0.11 | | | | | | | | dst_bytes <= 15: warezmaster. (7.0) | | | | | | | | dst_bytes > 15: normal. (5.0) | | | | | | num_failed_logins > 0: guess_passwd. (13.0) | | | | | protocol_type = icmp | | | | | | srv_diff_host_rate <= 0.57 | | | | | | | dst_host_srv_diff_host_rate <= 0.67 | | | | | | | | count <= 13: normal. (5.0) | | | | | | | | count > 13: smurf. (2.0) | | | | | | | dst_host_srv_diff_host_rate > 0.67: ipsweep. (5.0) | | | | | | srv_diff_host_rate > 0.57 | | | | | | | service = private: pod. (0.0) | | | | | | | service = ecr_i: pod. (5.0) | | | | | | | service = ntp_u: pod. (0.0) | | | | | | | service = X11: pod. (0.0) | | | | | | | service = IRC: pod. (0.0) | | | | | | | service = courier: pod. (0.0)
| | | | | | | service = ctf: pod. (0.0) | | | | | | | service = ssh: pod. (0.0) | | | | | | | service = time: pod. (0.0) | | | | | | | service = ldap: pod. (0.0) | | | | | | | service = supdup: pod. (0.0) | | | | | | | service = Z39_50: pod. (0.0) | | | | | | | service = discard: pod. (0.0) | | | | | | | service = bgp: pod. (0.0) | | | | | | | service = kshell: pod. (0.0) | | | | | | | service = urp_i: pod. (0.0) | | | | | | | service = uucp: pod. (0.0) | | | | | | | service = uucp_path: pod. (0.0) | | | | | | | service = netstat: pod. (0.0) | | | | | | | service = nnsp: pod. (0.0) | | | | | | | service = http_443: pod. (0.0) | | | | | | | service = iso_tsap: pod. (0.0) | | | | | | | service = http: pod. (0.0) | | | | | | | service = pop_3: pod. (0.0) | | | | | | | service = other: pod. (0.0) | | | | | | | service = domain_u: pod. (0.0) | | | | | | | service = exec: pod. (0.0) | | | | | | | service = telnet: pod. (0.0) | | | | | | | service = ftp: pod. (0.0) | | | | | | | service = sql_net: pod. (0.0) | | | | | | | service = ftp_data: pod. (0.0) | | | | | | | service = smtp: pod. (0.0) | | | | | | | service = sunrpc: pod. (0.0) | | | | | | | service = shell: pod. (0.0) | | | | | | | service = vmnet: pod. (0.0) | | | | | | | service = netbios_dgm: pod. (0.0) | | | | | | | service = efs: pod. (0.0) | | | | | | | service = finger: pod. (0.0) | | | | | | | service = rje: pod. (0.0) | | | | | | | service = mtp: pod. (0.0) | | | | | | | service = imap4: pod. (0.0) | | | | | | | service = link: pod. (0.0) | | | | | | | service = auth: pod. (0.0) | | | | | | | service = domain: pod. (0.0) | | | | | | | service = nntp: pod. (0.0) | | | | | | | service = daytime: pod. (0.0) | | | | | | | service = gopher: pod. (0.0) | | | | | | | service = echo: pod. (0.0) | | | | | | | service = printer: pod. (0.0) | | | | | | | service = whois: pod. (0.0) | | | | | | | service = klogin: pod. (0.0) | | | | | | | service = ecr: pod. (0.0) | | | | | | | service = netbios_ssn: pod. 
(0.0) | | | | | | | service = remote_job: pod. (0.0) | | | | | | | service = hostnames: pod. (0.0) | | | | | | | service = eco_i: saint. (2.0) | | | | | | | service = login: pod. (0.0) | | | | | | | service = pop_2: pod. (0.0) | | | | | | | service = netbios_ns: pod. (0.0) | | | | flag = S1: normal. (0.0) | | | | flag = S0 | | | | | dst_host_rerror_rate <= 0.28: mscan. (6.0/1.0) | | | | | dst_host_rerror_rate > 0.28: apache2. (3.0) | | | | flag = RSTO | | | | | src_bytes <= 55: mscan. (6.0) | | | | | src_bytes > 55: guess_passwd. (2.0) | | | | flag = S3: processtable. (6.0) | | | | flag = S2: normal. (0.0)
| | | | flag = RSTR: portsweep. (7.0/2.0) | | | | flag = SH: nmap. (1.0) | | | duration > 2 | | | | src_bytes <= 55 | | | | | src_bytes <= 24 | | | | | | srv_rerror_rate <= 0.17: processtable. (16.0/1.0) | | | | | | srv_rerror_rate > 0.17: mscan. (5.0) | | | | | src_bytes > 24: guess_passwd. (73.0) | | | | src_bytes > 55 | | | | | duration <= 261: normal. (14.0) | | | | | duration > 261: warezmaster. (15.0/1.0) | | src_bytes > 2564 | | | num_compromised <= 0 | | | | src_bytes <= 2599: mailbomb. (99.0) | | | | src_bytes > 2599 | | | | | flag = REJ: normal. (0.0) | | | | | flag = SF | | | | | | duration <= 38: normal. (17.0/1.0) | | | | | | duration > 38: warezmaster. (12.0) | | | | | flag = S1: normal. (0.0) | | | | | flag = S0: normal. (0.0) | | | | | flag = RSTO: normal. (0.0) | | | | | flag = S3: normal. (0.0) | | | | | flag = S2: normal. (0.0) | | | | | flag = RSTR: apache2. (10.0) | | | | | flag = SH: normal. (0.0) | | | num_compromised > 0: back. (25.0) | count > 72 | | protocol_type = udp: normal. (28.0) | | protocol_type = tcp | | | service = private: neptune. (0.0) | | | service = ecr_i: neptune. (0.0) | | | service = ntp_u: neptune. (0.0) | | | service = X11: neptune. (0.0) | | | service = IRC: neptune. (0.0) | | | service = courier: neptune. (0.0) | | | service = ctf: neptune. (0.0) | | | service = ssh: neptune. (0.0) | | | service = time: neptune. (0.0) | | | service = ldap: neptune. (0.0) | | | service = supdup: neptune. (0.0) | | | service = Z39_50: neptune. (0.0) | | | service = discard: neptune. (0.0) | | | service = bgp: neptune. (0.0) | | | service = kshell: neptune. (0.0) | | | service = urp_i: neptune. (0.0) | | | service = uucp: neptune. (0.0) | | | service = uucp_path: neptune. (0.0) | | | service = netstat: neptune. (0.0) | | | service = nnsp: neptune. (0.0) | | | service = http_443: neptune. (0.0) | | | service = iso_tsap: neptune. (0.0) | | | service = http: apache2. (2.0) | | | service = pop_3: neptune. (0.0) | | | service = other: neptune. 
(0.0) | | | service = domain_u: neptune. (0.0) | | | service = exec: neptune. (0.0) | | | service = telnet: neptune. (5.0) | | | service = ftp: neptune. (0.0) | | | service = sql_net: neptune. (0.0) | | | service = ftp_data: neptune. (0.0) | | | service = smtp: neptune. (0.0)
| | | service = sunrpc: neptune. (0.0) | | | service = shell: neptune. (0.0) | | | service = vmnet: neptune. (0.0) | | | service = netbios_dgm: neptune. (0.0) | | | service = efs: neptune. (0.0) | | | service = finger: neptune. (0.0) | | | service = rje: neptune. (0.0) | | | service = mtp: neptune. (0.0) | | | service = imap4: neptune. (0.0) | | | service = link: neptune. (0.0) | | | service = auth: neptune. (0.0) | | | service = domain: neptune. (0.0) | | | service = nntp: neptune. (0.0) | | | service = daytime: neptune. (0.0) | | | service = gopher: neptune. (0.0) | | | service = echo: neptune. (0.0) | | | service = printer: neptune. (0.0) | | | service = whois: neptune. (0.0) | | | service = klogin: neptune. (0.0) | | | service = ecr: neptune. (0.0) | | | service = netbios_ssn: neptune. (0.0) | | | service = remote_job: neptune. (0.0) | | | service = hostnames: neptune. (0.0) | | | service = eco_i: neptune. (0.0) | | | service = login: neptune. (0.0) | | | service = pop_2: neptune. (0.0) | | | service = netbios_ns: neptune. (0.0) | | protocol_type = icmp: smurf. (3485.0)

Number of Leaves : 229
Size of the tree : 270

Time taken to build model: 1.59 seconds

=== Evaluation on test set ===
=== Summary ===

Correctly Classified Instances 160 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 160

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure Class
0 0 0 0 0 back.
0 0 0 0 0 snmpguess.
0 0 0 0 0 apache2.
0 0 0 0 0 httptunnel.
0 0 0 0 0 mscan.
0 0 0 0 0 saint.
0 0 0 0 0 snmpgetattack.
0 0 0 0 0 processtable.
1 0 1 1 1 mailbomb.
0 0 0 0 0 buffer_overflow.
0 0 0 0 0 ftp_write.
0 0 0 0 0 guess_passwd.
0 0 0 0 0 imap.
0 0 0 0 0 ipsweep.
0 0 0 0 0 land.
0 0 0 0 0 loadmodule.
0 0 0 0 0 multihop.
1 0 1 1 1 neptune.
0 0 0 0 0 nmap.
0 0 0 0 0 normal.
0 0 0 0 0 perl.
0 0 0 0 0 phf.
0 0 0 0 0 pod.
0 0 0 0 0 portsweep.
0 0 0 0 0 rootkit.
0 0 0 0 0 satan.
0 0 0 0 0 smurf.
0 0 0 0 0 spy.
0 0 0 0 0 teardrop.
0 0 0 0 0 warezclient.
0 0 0 0 0 warezmaster.

=== Confusion Matrix ===

a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae <-- classified as
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = back.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = snmpguess.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = apache2.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = httptunnel.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = mscan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = saint.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | g = snmpgetattack.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | h = processtable.
0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | i = mailbomb.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | j = buffer_overflow.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | k = ftp_write.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | l = guess_passwd.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | m = imap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | n = ipsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | o = land.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | p = loadmodule.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | q = multihop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 119 0 0 0 0 0 0 0 0 0 0 0 0 0 | r = neptune.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | s = nmap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | t = normal.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | u = perl.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | v = phf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | w = pod.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | x = portsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | y = rootkit.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | z = satan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | aa = smurf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ab = spy.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ac = teardrop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ad = warezclient.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ae = warezmaster.
Testing using three types of attacks (Mailbomb, Neptune, and Smurf):
=== Run information === Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2 Relation: ids Instances: 6600 Attributes: 42 duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root num_file_creations num_shells num_access_files num_outbound_cmds is_host_login is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate dst_host_srv_diff_host_rate dst_host_serror_rate
dst_host_srv_serror_rate dst_host_rerror_rate dst_host_srv_rerror_rate class Test mode: user supplied test set: 237 instances === Classifier model (full training set) === J48 pruned tree ------------------ same_srv_rate <= 0.21 | diff_srv_rate <= 0.23: neptune. (1244.0) | diff_srv_rate > 0.23 | | dst_host_same_srv_rate <= 0.01 | | | dst_host_rerror_rate <= 0.99 | | | | dst_host_serror_rate <= 0.11 | | | | | serror_rate <= 0.03: satan. (18.0/2.0) | | | | | serror_rate > 0.03: saint. (14.0) | | | | dst_host_serror_rate > 0.11: satan. (9.0) | | | dst_host_rerror_rate > 0.99: satan. (11.0) | | dst_host_same_srv_rate > 0.01: normal. (4.0/1.0) same_srv_rate > 0.21 | count <= 72 | | src_bytes <= 2564 | | | duration <= 2 | | | | flag = REJ | | | | | srv_diff_host_rate <= 0.4 | | | | | | srv_count <= 5: httptunnel. (2.0/1.0) | | | | | | srv_count > 5: neptune. (2.0/1.0) | | | | | srv_diff_host_rate > 0.4: mscan. (7.0) | | | | flag = SF | | | | | protocol_type = udp | | | | | | src_bytes <= 55 | | | | | | | service = private | | | | | | | | srv_count <= 8: snmpguess. (49.0) | | | | | | | | srv_count > 8: normal. (3.0) | | | | | | | service = ecr_i: normal. (0.0) | | | | | | | service = ntp_u: normal. (1.0) | | | | | | | service = X11: normal. (0.0) | | | | | | | service = IRC: normal. (0.0) | | | | | | | service = courier: normal. (0.0) | | | | | | | service = ctf: normal. (0.0) | | | | | | | service = ssh: normal. (0.0) | | | | | | | service = time: normal. (0.0) | | | | | | | service = ldap: normal. (0.0) | | | | | | | service = supdup: normal. (0.0) | | | | | | | service = Z39_50: normal. (0.0) | | | | | | | service = discard: normal. (0.0) | | | | | | | service = bgp: normal. (0.0) | | | | | | | service = kshell: normal. (0.0) | | | | | | | service = urp_i: normal. (0.0) | | | | | | | service = uucp: normal. (0.0) | | | | | | | service = uucp_path: normal. (0.0) | | | | | | | service = netstat: normal. (0.0) | | | | | | | service = nnsp: normal. (0.0)
| | | | | | | service = http_443: normal. (0.0) | | | | | | | service = iso_tsap: normal. (0.0) | | | | | | | service = http: normal. (0.0) | | | | | | | service = pop_3: normal. (0.0) | | | | | | | service = other: normal. (0.0) | | | | | | | service = domain_u: normal. (54.0) | | | | | | | service = exec: normal. (0.0) | | | | | | | service = telnet: normal. (0.0) | | | | | | | service = ftp: normal. (0.0) | | | | | | | service = sql_net: normal. (0.0) | | | | | | | service = ftp_data: normal. (0.0) | | | | | | | service = smtp: normal. (0.0) | | | | | | | service = sunrpc: normal. (0.0) | | | | | | | service = shell: normal. (0.0) | | | | | | | service = vmnet: normal. (0.0) | | | | | | | service = netbios_dgm: normal. (0.0) | | | | | | | service = efs: normal. (0.0) | | | | | | | service = finger: normal. (0.0) | | | | | | | service = rje: normal. (0.0) | | | | | | | service = mtp: normal. (0.0) | | | | | | | service = imap4: normal. (0.0) | | | | | | | service = link: normal. (0.0) | | | | | | | service = auth: normal. (0.0) | | | | | | | service = domain: normal. (0.0) | | | | | | | service = nntp: normal. (0.0) | | | | | | | service = daytime: normal. (0.0) | | | | | | | service = gopher: normal. (0.0) | | | | | | | service = echo: normal. (0.0) | | | | | | | service = printer: normal. (0.0) | | | | | | | service = whois: normal. (0.0) | | | | | | | service = klogin: normal. (0.0) | | | | | | | service = ecr: normal. (0.0) | | | | | | | service = netbios_ssn: normal. (0.0) | | | | | | | service = remote_job: normal. (0.0) | | | | | | | service = hostnames: normal. (0.0) | | | | | | | service = eco_i: normal. (0.0) | | | | | | | service = login: normal. (0.0) | | | | | | | service = pop_2: normal. (0.0) | | | | | | | service = netbios_ns: normal. (0.0) | | | | | | src_bytes > 55 | | | | | | | count <= 1 | | | | | | | | dst_host_srv_count <= 253 | | | | | | | | | dst_bytes <= 124: normal. (79.0/3.0) | | | | | | | | | dst_bytes > 124: snmpgetattack. 
(51.0/15.0) | | | | | | | | dst_host_srv_count > 253: normal. (106.0/48.0) | | | | | | | count > 1: normal. (175.0/57.0) | | | | | protocol_type = tcp | | | | | | num_failed_logins <= 0 | | | | | | | dst_host_srv_diff_host_rate <= 0.11 | | | | | | | | dst_host_same_srv_rate <= 0.04 | | | | | | | | | src_bytes <= 20: normal. (2.0) | | | | | | | | | src_bytes > 20: guess_passwd. (3.0) | | | | | | | | dst_host_same_srv_rate > 0.04 | | | | | | | | | dst_host_srv_count <= 5: warezmaster. (3.0/1.0) | | | | | | | | | dst_host_srv_count > 5: normal. (882.0) | | | | | | | dst_host_srv_diff_host_rate > 0.11
| | | | | | | | dst_bytes <= 15: warezmaster. (7.0) | | | | | | | | dst_bytes > 15: normal. (5.0) | | | | | | num_failed_logins > 0: guess_passwd. (13.0) | | | | | protocol_type = icmp | | | | | | srv_diff_host_rate <= 0.57 | | | | | | | dst_host_srv_diff_host_rate <= 0.67 | | | | | | | | count <= 13: normal. (5.0) | | | | | | | | count > 13: smurf. (2.0) | | | | | | | dst_host_srv_diff_host_rate > 0.67: ipsweep. (5.0) | | | | | | srv_diff_host_rate > 0.57 | | | | | | | service = private: pod. (0.0) | | | | | | | service = ecr_i: pod. (5.0) | | | | | | | service = ntp_u: pod. (0.0) | | | | | | | service = X11: pod. (0.0) | | | | | | | service = IRC: pod. (0.0) | | | | | | | service = courier: pod. (0.0) | | | | | | | service = ctf: pod. (0.0) | | | | | | | service = ssh: pod. (0.0) | | | | | | | service = time: pod. (0.0) | | | | | | | service = ldap: pod. (0.0) | | | | | | | service = supdup: pod. (0.0) | | | | | | | service = Z39_50: pod. (0.0) | | | | | | | service = discard: pod. (0.0) | | | | | | | service = bgp: pod. (0.0) | | | | | | | service = kshell: pod. (0.0) | | | | | | | service = urp_i: pod. (0.0) | | | | | | | service = uucp: pod. (0.0) | | | | | | | service = uucp_path: pod. (0.0) | | | | | | | service = netstat: pod. (0.0) | | | | | | | service = nnsp: pod. (0.0) | | | | | | | service = http_443: pod. (0.0) | | | | | | | service = iso_tsap: pod. (0.0) | | | | | | | service = http: pod. (0.0) | | | | | | | service = pop_3: pod. (0.0) | | | | | | | service = other: pod. (0.0) | | | | | | | service = domain_u: pod. (0.0) | | | | | | | service = exec: pod. (0.0) | | | | | | | service = telnet: pod. (0.0) | | | | | | | service = ftp: pod. (0.0) | | | | | | | service = sql_net: pod. (0.0) | | | | | | | service = ftp_data: pod. (0.0) | | | | | | | service = smtp: pod. (0.0) | | | | | | | service = sunrpc: pod. (0.0) | | | | | | | service = shell: pod. (0.0) | | | | | | | service = vmnet: pod. (0.0) | | | | | | | service = netbios_dgm: pod. 
(0.0) | | | | | | | service = efs: pod. (0.0) | | | | | | | service = finger: pod. (0.0) | | | | | | | service = rje: pod. (0.0) | | | | | | | service = mtp: pod. (0.0) | | | | | | | service = imap4: pod. (0.0) | | | | | | | service = link: pod. (0.0) | | | | | | | service = auth: pod. (0.0) | | | | | | | service = domain: pod. (0.0) | | | | | | | service = nntp: pod. (0.0) | | | | | | | service = daytime: pod. (0.0)
| | | | | | | service = gopher: pod. (0.0) | | | | | | | service = echo: pod. (0.0) | | | | | | | service = printer: pod. (0.0) | | | | | | | service = whois: pod. (0.0) | | | | | | | service = klogin: pod. (0.0) | | | | | | | service = ecr: pod. (0.0) | | | | | | | service = netbios_ssn: pod. (0.0) | | | | | | | service = remote_job: pod. (0.0) | | | | | | | service = hostnames: pod. (0.0) | | | | | | | service = eco_i: saint. (2.0) | | | | | | | service = login: pod. (0.0) | | | | | | | service = pop_2: pod. (0.0) | | | | | | | service = netbios_ns: pod. (0.0) | | | | flag = S1: normal. (0.0) | | | | flag = S0 | | | | | dst_host_rerror_rate <= 0.28: mscan. (6.0/1.0) | | | | | dst_host_rerror_rate > 0.28: apache2. (3.0) | | | | flag = RSTO | | | | | src_bytes <= 55: mscan. (6.0) | | | | | src_bytes > 55: guess_passwd. (2.0) | | | | flag = S3: processtable. (6.0) | | | | flag = S2: normal. (0.0) | | | | flag = RSTR: portsweep. (7.0/2.0) | | | | flag = SH: nmap. (1.0) | | | duration > 2 | | | | src_bytes <= 55 | | | | | src_bytes <= 24 | | | | | | srv_rerror_rate <= 0.17: processtable. (16.0/1.0) | | | | | | srv_rerror_rate > 0.17: mscan. (5.0) | | | | | src_bytes > 24: guess_passwd. (73.0) | | | | src_bytes > 55 | | | | | duration <= 261: normal. (14.0) | | | | | duration > 261: warezmaster. (15.0/1.0) | | src_bytes > 2564 | | | num_compromised <= 0 | | | | src_bytes <= 2599: mailbomb. (99.0) | | | | src_bytes > 2599 | | | | | flag = REJ: normal. (0.0) | | | | | flag = SF | | | | | | duration <= 38: normal. (17.0/1.0) | | | | | | duration > 38: warezmaster. (12.0) | | | | | flag = S1: normal. (0.0) | | | | | flag = S0: normal. (0.0) | | | | | flag = RSTO: normal. (0.0) | | | | | flag = S3: normal. (0.0) | | | | | flag = S2: normal. (0.0) | | | | | flag = RSTR: apache2. (10.0) | | | | | flag = SH: normal. (0.0) | | | num_compromised > 0: back. (25.0) | count > 72 | | protocol_type = udp: normal. (28.0) | | protocol_type = tcp | | | service = private: neptune. 
(0.0) | | | service = ecr_i: neptune. (0.0) | | | service = ntp_u: neptune. (0.0) | | | service = X11: neptune. (0.0)
| | | service = IRC: neptune. (0.0) | | | service = courier: neptune. (0.0) | | | service = ctf: neptune. (0.0) | | | service = ssh: neptune. (0.0) | | | service = time: neptune. (0.0) | | | service = ldap: neptune. (0.0) | | | service = supdup: neptune. (0.0) | | | service = Z39_50: neptune. (0.0) | | | service = discard: neptune. (0.0) | | | service = bgp: neptune. (0.0) | | | service = kshell: neptune. (0.0) | | | service = urp_i: neptune. (0.0) | | | service = uucp: neptune. (0.0) | | | service = uucp_path: neptune. (0.0) | | | service = netstat: neptune. (0.0) | | | service = nnsp: neptune. (0.0) | | | service = http_443: neptune. (0.0) | | | service = iso_tsap: neptune. (0.0) | | | service = http: apache2. (2.0) | | | service = pop_3: neptune. (0.0) | | | service = other: neptune. (0.0) | | | service = domain_u: neptune. (0.0) | | | service = exec: neptune. (0.0) | | | service = telnet: neptune. (5.0) | | | service = ftp: neptune. (0.0) | | | service = sql_net: neptune. (0.0) | | | service = ftp_data: neptune. (0.0) | | | service = smtp: neptune. (0.0) | | | service = sunrpc: neptune. (0.0) | | | service = shell: neptune. (0.0) | | | service = vmnet: neptune. (0.0) | | | service = netbios_dgm: neptune. (0.0) | | | service = efs: neptune. (0.0) | | | service = finger: neptune. (0.0) | | | service = rje: neptune. (0.0) | | | service = mtp: neptune. (0.0) | | | service = imap4: neptune. (0.0) | | | service = link: neptune. (0.0) | | | service = auth: neptune. (0.0) | | | service = domain: neptune. (0.0) | | | service = nntp: neptune. (0.0) | | | service = daytime: neptune. (0.0) | | | service = gopher: neptune. (0.0) | | | service = echo: neptune. (0.0) | | | service = printer: neptune. (0.0) | | | service = whois: neptune. (0.0) | | | service = klogin: neptune. (0.0) | | | service = ecr: neptune. (0.0) | | | service = netbios_ssn: neptune. (0.0) | | | service = remote_job: neptune. (0.0) | | | service = hostnames: neptune. (0.0) | | | service = eco_i: neptune. 
(0.0) | | | service = login: neptune. (0.0) | | | service = pop_2: neptune. (0.0) | | | service = netbios_ns: neptune. (0.0) | | protocol_type = icmp: smurf. (3485.0)
Number of Leaves : 229
Size of the tree : 270

Time taken to build model: 1.61 seconds

=== Evaluation on test set ===
=== Summary ===

Correctly Classified Instances 236 99.5781 %
Incorrectly Classified Instances 1 0.4219 %
Kappa statistic 0.9931
Mean absolute error 0.0003
Root mean squared error 0.0165
Relative absolute error 0.5767 %
Root relative squared error 10.179 %
Total Number of Instances 237

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure Class
0 0 0 0 0 back.
0 0 0 0 0 snmpguess.
0 0 0 0 0 apache2.
0 0 0 0 0 httptunnel.
0 0 0 0 0 mscan.
0 0 0 0 0 saint.
0 0 0 0 0 snmpgetattack.
0 0 0 0 0 processtable.
1 0 1 1 1 mailbomb.
0 0 0 0 0 buffer_overflow.
0 0 0 0 0 ftp_write.
0 0 0 0 0 guess_passwd.
0 0 0 0 0 imap.
0 0 0 0 0 ipsweep.
0 0 0 0 0 land.
0 0 0 0 0 loadmodule.
0 0 0 0 0 multihop.
1 0 1 1 1 neptune.
0 0 0 0 0 nmap.
0 0.004 0 0 0 normal.
0 0 0 0 0 perl.
0 0 0 0 0 phf.
0 0 0 0 0 pod.
0 0 0 0 0 portsweep.
0 0 0 0 0 rootkit.
0 0 0 0 0 satan.
0.987 0 1 0.987 0.993 smurf.
0 0 0 0 0 spy.
0 0 0 0 0 teardrop.
0 0 0 0 0 warezclient.
0 0 0 0 0 warezmaster.

=== Confusion Matrix ===

a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae <-- classified as
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | a = back.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | b = snmpguess.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | c = apache2.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | d = httptunnel.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | e = mscan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | f = saint.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | g = snmpgetattack.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | h = processtable.
0 0 0 0 0 0 0 0 41 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | i = mailbomb.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | j = buffer_overflow.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | k = ftp_write.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | l = guess_passwd.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | m = imap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | n = ipsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | o = land.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | p = loadmodule.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | q = multihop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 119 0 0 0 0 0 0 0 0 0 0 0 0 0 | r = neptune.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | s = nmap.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | t = normal.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | u = perl.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | v = phf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | w = pod.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | x = portsweep.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | y = rootkit.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | z = satan.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 76 0 0 0 0 | aa = smurf.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ab = spy.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ac = teardrop.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ad = warezclient.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | ae = warezmaster.
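As a sanity check on the Weka summary above, the reported accuracy and Kappa statistic for the three-attack test set can be recomputed directly from the non-zero confusion-matrix entries (41 mailbomb, 119 neptune, and 76 smurf instances classified correctly, plus one smurf instance misclassified as normal). A minimal sketch in Python:

```python
# Worked check of the Weka evaluation summary for the Mailbomb/Neptune/Smurf run.
# The only non-zero confusion-matrix entries are:
#   mailbomb -> mailbomb: 41, neptune -> neptune: 119,
#   smurf -> smurf: 76, and smurf -> normal: 1.
correct = 41 + 119 + 76
total = correct + 1  # one smurf instance misclassified as normal

accuracy = correct / total
print(f"accuracy = {accuracy:.4%}")  # matches "Correctly Classified Instances 236 99.5781 %"

# Cohen's kappa: p_o is the observed agreement, p_e the agreement
# expected by chance from the row (actual) and column (predicted) totals.
rows = {"mailbomb": 41, "neptune": 119, "smurf": 77}                # actual-class totals
cols = {"mailbomb": 41, "neptune": 119, "smurf": 76, "normal": 1}   # predicted-class totals
p_o = correct / total
p_e = sum(rows.get(c, 0) * cols[c] for c in cols) / total ** 2
kappa = (p_o - p_e) / (1 - p_e)
print(f"kappa = {kappa:.4f}")  # matches the reported Kappa statistic of 0.9931
```

The same bookkeeping applied to the two-attack run (41 mailbomb and 119 neptune instances, all correct) gives the 100 % accuracy and Kappa of 1 reported in the first summary.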
BIBLIOGRAPHY
[Agarwal 1993] Agrawal, R., Imielinski, T., and Swami, A. Mining Association Rules
between Sets of Items in Large Databases. In Proceedings of the 1993 ACM SIGMOD
International Conference on Management of Data, pages 207-216.
[Allen 2000] Allen, J., Christie, A., Fithen, W., McHugh, J., Pickel, J., and Stoner, E. State of
the Practice of Intrusion Detection Technologies. Technical report, Carnegie Mellon
University, 2000.
[Bace 2000] Rebecca Gurley Bace. Intrusion Detection. Macmillan Computer Publishing
(MCP), Indianapolis, IN, 2000.
[Barbara 2001] Daniel Barbara and Sushil Jajodia. Applications of Data Mining in Computer
Security. Kluwer Academic Publishers, Norwell, MA, 2002.
[Bro-ids.org] bro-ids.org. Home page. Available from www.bro-ids.org (visited on Jan 24,
2006).
[Fayyad et al. 1996] Fayyad, U. M., Piatetsky-Shapiro, G., and Smyth, P. From Data Mining
to Knowledge Discovery: An Overview. In Fayyad, U. M., Piatetsky-Shapiro, G.,
Smyth, P., and Uthurusamy, R., editors, Advances in Knowledge Discovery and Data
Mining, pages 1-34. AAAI Press / MIT Press, Menlo Park, CA, 1996.
97
Firoz Allahwali
[Goebel 1999] Michael Goebel and Le Gruenwald. A Survey of Data Mining and Knowledge
Discovery Software Tools. SIGKDD Explorations, 1(1):20-33, 1999.
[Ian 2005] Witten, Ian H. and Frank, Eibe. Data Mining: Practical Machine Learning Tools
and Techniques. Morgan Kaufmann Publishers, San Francisco, CA, 2005.
[Lee 2002] Wenke Lee. Applying Data Mining to Intrusion Detection: The Quest for
Automation, Efficiency, and Credibility. SIGKDD Explorations, 4(2), December
2002.
[Mannila 2002] Mannila, H. Local and Global Methods in Data Mining: Basic Techniques
and Open Problems. In Proceedings of the 29th International Colloquium on
Automata, Languages and Programming, Lecture Notes in Computer Science, pages
57-68. Springer-Verlag, 2002.
[Robert] Traffic. Home page. Available from http://robert.rsa3.com/traffic.html.
[Sandhya] Peddabachigari, Sandhya, Abraham, Ajith, and Thomas, Johnson. Intrusion
Detection Systems Using Decision Trees and Support Vector Machines. Information
Management Journal, v. 4:635-660, 2001.
[Snort] Snort.org. Home page. Available from www.snort.org (visited on Jan 15, 2005).
[Stolfo 1998] Wenke Lee and Sal Stolfo. Data Mining Approaches for Intrusion Detection.
Proceedings of the Seventh USENIX Security Symposium (SECURITY '98), San
Antonio, TX, January 1998.
[TCPreplay] TCPreplay. Home page. Available from http://tcpreplay.synfin.net/.
[Weka] Weka. Home page. Available from http://www.cs.waikato.ac.nz/ml/weka/
[Zhang 2001] Zhang, Junxin, Lee, Wenke, Stolfo, Sal, Chan, Phil, Eskin, Eleazar, Fan, Wei,
Miller, Matt, and Hershkop, Shlomo. Real Time Data Mining-based Intrusion
Detection. In Proceedings of the 2001 DARPA Information Survivability Conference
and Exposition II (DISCEX II), Anaheim, CA, June 2001.