Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
1
Synopsis on
Anomaly based Network Intrusion
Detection System
Submitted by : Dinakara K (06CS6026) MTech (CSE) 2nd Year
Under the guidance of : Prof. Jayanta Mukhopadhyay
Dept. of CSE
Prof. S K Ghosh School of IT
Indian Institute Of Technology, Kharagpur
Version 1.00 Date: 30 April 2008
2
Table of Contents 1 INTRODUCTION................................................................................................ 3
1.1 ORGANIZATION OF THESIS .............................................................................................. 4
1.2 RELATED WORK................................................................................................................. 4
2 MOTIVATION AND OBJECTIVE ..................................................................... 5
2.1 OBJECTIVE .......................................................................................................................... 5
3 SYSTEM ARCHITECTURE ................................................................................ 6
3.1 OPERATING ENVIRONMENT............................................................................................. 8
4 APPROACH........................................................................................................... 9
4.1 STATISTICAL MOMENTS OR “MEAN AND STANDARD DEVIATION MODEL”.......... 11
4.2 HOTELLING’S T2 HYPOTHESIS, A MULTIVARIATE STATISTICAL TECHNIQUE[10]..... 11
4.3 BAYESIAN CLASSIFICATION, A PROBABILISTIC TECHNIQUE ..................................... 12
5 EXPERIMENTAL RESULTS AND DISCUSSION...........................................13
5.1 DISCUSSION....................................................................................................................... 15
5.2 CONCLUSION .................................................................................................................... 15
6 BIBLIOGRAPHY.................................................................................................16
3
1 INTRODUCTION
Internet is forcing organizations into an era of open and trusted communications. This
openness at the same time brings its share of vulnerabilities and problems such as financial
losses, damage to reputation, maintaining availability of services, protecting the personal and
customer data and many more, pushing both enterprises and service providers to take steps
to guard their valuable data from intruders, hackers and insiders. Intrusion Detection System
has become the fundamental need for the successful content networking.
IDS provide two primary benefits: Visibility and Control. It is the combination of
these two benefits that makes it possible to create and enforce an enterprise security policy to
make the private computer network secure. Visibility is the ability to see and understand the
nature of the network and the traffic on the network while Control is the ability to affect
network traffic including access to the network or parts thereof. Visibility is paramount to
decision making and makes it possible to create a security policy based on quantifiable, real
world data. Control is paramount to enforcement and makes it possible to enforce
compliance with security policy.
There are two general approaches to detecting intrusions: anomaly detection (also
called behavior-based) and signature based (also named misuse or pattern based) [1]. Signature
based techniques identify and store signature patterns of known intrusions, match activities
in an information system with known patterns of intrusion signatures, and signal intrusions
when there is a match. Pattern recognition techniques are efficient and accurate in detecting
known intrusions, but cannot detect novel intrusions whose signature patterns are unknown.
Anomaly detection techniques can detect both novel and known attacks if they demonstrate
large differences from the norm profile. Since anomaly detection techniques signal all
anomalies as intrusions, false alarms are expected when anomalies are caused by behavioral
irregularity instead of intrusions. Hence, pattern recognition techniques and anomaly
detection techniques are often used together to complement each other.
In the research work, a Anomaly based IDS is designed and developed which is
integrated with the open source signature based network IDS, called SNORT [2] to give best
results. In the Anomaly based IDS, three different techniques are used for detecting the
network based attacks leading to abnormal network activities. The IDS is evaluated with
MIT_LL DARPA 1999 Data set [3] and a comparative analysis of the techniques is given in
the later section of the synopsis.
4
1.1 ORGANIZATION OF THESIS
The synopsis covers the work accomplished so far in the realization of the Anomaly
based network intrusion detection system. It is organized as follows. Section 2 gives
Motivation and Objective for taking up the project. Section 3 deals with the system
architecture of the Anomaly based Network IDS. Section 4 presents the Approach followed
in executing the project and Section 5 gives the experimental results and conclusion.
1.2 RELATED WORK
Network intrusion detection systems like snort (2001) or Bro (Paxson, 1998) typically
use signature detection, matching patterns in network traffic to the patterns of known attacks.
This works well, but has the obvious disadvantage of being vulnerable to novel attacks. An
alternative approach is anomaly detection, which models normal traffic and signals any
deviation from this model as suspicious. The idea is based on work by Forrest et al. (1996),
who found that most UNIX processes make (mostly) highly predictable sequences of system
calls in normal use.
Network anomaly detectors look for unusual traffic rather than unusual system calls.
ADAM (Audit Data and Mining) [4] is an anomaly detector trained on both attack-free traffic
and traffic with labeled attacks. It monitors port numbers, IP addresses and subnets, and
TCP state. ADAM uses a naive Bayes classifier which means that the probability that a
packet belongs to some class (normal, known attack, or unknown) depends on the a-priori
probability of the class, and the combined probabilities of a large collection of rules under
the assumption that they are independent.
Matthew V. Mahoney and Philip K. Chan developed “Packet Header Anomaly
detection for identifying Hostile Network (PHAD)” [5],[7] that learns the normal ranges of
values for each packet header field at the data link (Ethernet), network (IP), and
transport/control layers (TCP, UDP, ICMP). PHAD detects some of the attacks in the
DARPA data set that involve exploits at the transport layer and below.
The paper, “Detecting Novel Network Intrusions Using Bayes Estimators” [6] authored
by Daniel Barbara and et al suggests a method called pseudo-Bayes estimators as a means to
estimate the prior and posterior probabilities of new attacks. Then a Naive Bayes classifier is
used to classify the instances into normal instances, known attacks and new attacks.
5
2 MOTIVATION AND OBJECTIVE
Despite the fact that intrusion detection systems are commercially developed and used
for more than a decade, there still exist many issues around IDS. Some of the shortcomings
of the current IDS which handicap its effectiveness are discussed below.
a) Only the known attacks are detected in signature based techniques which simply means
no protection is offered against novel attacks or new variants of existing intrusions.
b) How well a signature captures the attacks in its string is again a matter of concern.
There is quite some number of such poorly written signature codes. So the actual attack
pattern may stretch across multiple packets, easily evading the detection system
c) In order to perform an exhaustive signature based search, the processing and memory
needs are very high and in the real time scenario, there is quite likely hood of missing genuine
attacks. Also, there is the problem of ever increasing attack signature databases.
d) In anomaly approach, though new kinds of intrusions are detected, this benefit is
paralyzed by high number of false alarms. More over improper/insufficient training to
anomaly module results in showing the genuine changes in the network traffic pattern as
suspicious activities only to raise the false positives and false negatives.
2.1 OBJECTIVE
The aim of the present work was to design and develop of a Anomaly or behavioral
based Network Intrusion Detection System which can detect intrusions based on behavioral
patterns (i.e. without the use of signatures) and can also detect novel attacks which are
anomalous in nature.
The work also aimed at reducing number of false alarms by characterizing the target
network with appropriate network parameters and analyzing them with mathematical
models.
Literature survey reveals that, the Bayesian Analysis is successfully used in the SPAM
filters but in the area of IDS it is still not explored to great extent. So in this work, Bayesian
classification technique is used for discriminating the anomalous attacks from that of normal
activities. Hotelling’s Multivariate statistical hypothesis technique is also being used.
The project is integrated with a open source signature based IDS called SNORT so
that it forms a complete package having both signature and anomaly techniques for effective
defense against the Network attacks
6
3 SYSTEM ARCHITECTURE
The proposed architecture of Network IDS has various components as depicted in the
figure 1. This architecture is based on SNORT, which is a open source Network IDS [8]. The
components execute different functionalities which are discussed below.
Figure 1. The overall system architecture
Sensor/Decoder The NIC is put in promiscuous mode to sniff all the packets in the network
irrespective of their target. The decoder receives the packets from the libpcap packet
capturing library and processes them. This module executes following functionalities.
- Sniffs all the network packets visible to it in real time.
- Extract the header and payload information from the Ethernet packet.
- Updates the Ethernet/ARP/RARP/IP/TCP/UDP and ICMP counter as and when
the respective packets are received
- Perform necessary checks on header and payload information.
- Sniffed packets sent to the Preprocessor
Preprocessor This module takes the packets from the decoder and performs the functions like IP
defragmentation, building the sessions for reassembly of packets etc. Several preprocessor
are available with SNORT to execute the necessary tasks. This module also hosts the
Anomaly learning and detection preprocessor used for detecting the intrusions leading to
anomalies. Figure 2 shows the structure of Anomaly detection preprocessor.
7
Figure 2. Anomaly Detection preprocessor
The preprocessor has following responsibilites :
- Defragments the fragmented IP packets
- Reassembles the TCP packets into streams
- Normalizes Application Layer protocols like Telnet/HTTP
- Detects Port scans/Evasion Attacks
- Preprocessed packets sent to Detection Engine
- Anomaly Detection preprocessor detects the intrusive activities in the network
Detection Engine It is the main part of the entire system which is responsible for detecting the attack
signatures in the preprocessed packets. The overall system performance directly depends on
this module. Some of the main functions handled by this module are listed below.
- Parse the rules and build an internal data structure that holds the rules in a
customized tree structure. Once the tree is built, load it into memory.
- Pass traffic through this rule tree for comparing the packet header and data against
the rules. (Uses strings matching algorithms)
- Report to the alert module on packets that have found to be carrying malicious data.
- If any new rules have been added or if existing rules are modified or deleted then
update the same to the detection engine tree structure.
- When the application is exited this will clean up all memory allocated for building the
detection engine.
Alert Module - Sends the alerts triggered by the Detection Engine to Alert Console in real time.
- Stores the alerts into alert file or into a Database as per the configuration
8
Open source php based console, called Basic Analysis and Security Engine (BASE) is
integrated with the Alert Module to enhance the user friendliness. The figure 3 shows a
screenshot of the BASE console.
Figure 3. Screenshot of BASE console showing the generated alerts
3.1 OPERATING ENVIRONMENT
The development work is carried out in C language on linux platform to comply with
the SNORT program. The following softwares/tools are used for the development and
execution of the project
ANJUTA - Open source IDE
BASE - Basic Analysis and Security Engine
GCC - GNU C Compiler to compile the components.
Libpcap - Linux Packet capturing library
MYSQL - Centralized database storage.
RHEL4 - Redhat Enterprise Linux 4
SNORT - Open Source Network Intrusion Detection System
The IDS works efficiently on a system with the following configuration:
Pentium IV 2.0GHz
512MBRAM
40 GB Hard Disk or higher
10/100 Mbps Ethernet Interface Card.
9
4 APPROACH
The primary task was to characterize the target network in terms of suitable network
parameters. The parameters are chosen such that their values will change perceivably in
normal and intrusive conditions. The features considered are the commonly seen protocols
in the network traffic, the traffic data rate and the flow direction. Once the network behavior
is quantified with these parameters, the next step would be to observe how they vary with
time. The observation has to be made on different days of a week because the network
behavior changes over working days and non working days of a week and also on general
holidays. The Anomaly based IDS has two operational modes.
Learning(or training) mode : In this mode, the IDS logs the selected network parameters
to a file. The frequency of packet sampling is set as per requirement; it is set by default to 10
minutes. IDS is put in learning mode for some time, say over a month to learn the normal
network behavior. Sufficient training period is the key factor in reducing the false alarms.
Once the learning is over, profile for the target network is generated using another
program. The IDS is also trained to learn the network behavior in the face of intrusions.
Intrusions are simulated using the MIT-DARPA training data set (Week 2). Network profile
is also generated for this condition. The logic for profile generation is given in figure 4.
Figure 4. Algorithm for generating the profile
10
Detection mode: In this mode, IDS detects in real time, the network based attacks leading
to abnormal traffic pattern. The abnormality is decided on the basis of the network profile.
The flow chart in figure 6 shows the working of Anomaly Detection technique.
Figure 6. Flow chart for Anomaly Detection technique
Three techniques are applied for the classifying the events as a intrusive or non intrusive.
11
4.1 STATISTICAL MOMENTS OR “MEAN AND STANDARD DEVIATION
MODEL”
Statistical based anomaly detection techniques use statistical properties (mean and
variance) of normal activities to build a statistical based normal profile and employ statistical
tests to determine whether observed activities deviate significantly from the normal profile [9].
A confidence interval is chosen suitably based on the experimentation [12].
σμσμ ** nxn +<<− ‘ x ’ denotes the value of a network parameter. If the value of x goes beyond ( σμ *n± ), it simply indicates an anomalous situation and can be flagged as alert. 4.2 HOTELLING’S T2 HYPOTHESIS, A MULTIVARIATE STATISTICAL
TECHNIQUE[10]
Hotelling’s 2Τ statistic for an observation X is determined by )()( 12 μμ −Σ−=Τ − XX T Where
).......,,( 321 pXXXXX = , denotes an observation of p variables at time t
),.......,,( 321 pμμμμμ = , denotes a vector of mean values of p variables at time t
and
∑ −−−
=Σn
Txxn 1
))(()1(
1 μμ , where n is the data sample size
12
The computed 2Τ value is small if the data point conforms to the norm profile.
Hotellings 2Τ test provides a complete data model of multivariate data. Since it uses the
covariance matrix Σ of p variables, it detects both mean shifts and their interrelationship in a
multivariate manner which is important in finding the network anomalies.
4.3 BAYESIAN CLASSIFICATION, A PROBABILISTIC TECHNIQUE
Bayesian statistics, in the most general form, provides a framework for combining observed
data with prior assumptions in order to model stochastic systems [11].
)()(
)/()/( IpApIApAIp = --------------(1)
Where
)/( AIp is the posterior probability of an event being intrusion given the evidence A
)(Ip is the prior probability of the intrusion, before the observation of evidence A
)/( IAp =Likelihood function of A given I
)(Ap = Total probability of evidence
The likelihood function )/( IAp denotes a probability density function of the vector
samples A given a particular estimate I of the underlying probability distribution generating
that data. A multivariate normal distribution is assumed for )/( IAp .
A Gaussian or multivariate normal distribution is characterized by its mean value vector μ
and its covariance matrix Σ and has the distribution function,
)}()(exp{||||)2(
1),( 121
2/1μμ
πμ −Σ−−
Σ=Σ − XXf T
p ----------(2)
Here X is a p dimensional pattern vector of real valued attributes The discriminant function gi(X) can be derived by using the equations (1) and (2) .
)(ln)()(21||ln
21)( 1 IpXXXg T
i +−Σ−−Σ−= − μμ
The values of )(Xgi can distinguish the intrusions from the normal events.
13
5 EXPERIMENTAL RESULTS AND DISCUSSION
To evaluate the system, two major indicators of performances are chosen.
- Detection rate
- False positive rate
Detection rate is defined as the number of intrusion instances detected by the system
divided by the total number of intrusion instances present in the test set. The false positive
rate is defined as the total number of instances that were wrongly detected as intrusions
divided by the total number of normal instances. These are good measures of performances
since they measure what percentage of intrusions the system is able to detect and how many
incorrect classifications it makes in the process.
The System was evaluated against the MIT_LL DARPA 1999 Data set and the results
are given in the Table 1 below.
Attack Name Tools/Data set used Count Detection using different Techniques
Probabilistic (Bayesian Classifier)
Statistical (Hotelliing's Hypothesis)
Statistical (Mean ± 2*SD)
ping flood ping tool 15 15 15 15
DoS attack ddos open source tool 5 5 5 5
TCP RST attack
neti open source code 5 5 5 5
TCP Syn flood attack
neti open source code 7 7 7 6
UDP attack neti open source code 10 10 10 10
X mas scan nmap tool 5 5 4 4
NTinfoscan
MIT_LL DARPA 1999 Data set(2nd week) 1 0 0 0
pod " " 2 2 2 2
back '' " 2 0 0 0
httptunnel " " 2 0 0 0
land " " 2 2 2 2
14
secret " " 3 0 0 0
portsweep " " 3 3 3 2
eject " " 3 0 0 0
mailbomb " " 2 2 2 2
ipsweep " " 3 3 2 2
satan " " 2 1 1 1
neptune " " 2 2 2 2
Total 74 62 60 58Detection Accuracy(%) 83.78 81.08 78.38Total Alerts generated 65 64 67No. of Attacks missed 12 16 20False Positive rate(%) 4.62 6.25 13.43False Negative rate(%) 16.22 21.62 27.03positive prediction rate(%) 95.40 90.63 78.30
Table 1. Experimental results Table 2. given below shows the results obtained by Daniel Barbara et al using pseudo-Bayes estimators [6]
Table 2. Experimental results on MIT_LL DARPA 1999 Data set. Source: http://www.cs.ubc.ca/local/reading/proceedings/siam_datamining2001/pdf/sdm01_29.pdf
15
5.1 DISCUSSION
The experiment clearly revealed that the Bayesian classification method gives better
detection rate and less false positives in detecting the intrusions among the three techniques
used in the project. The detection accuracy of ≈ 84 % is achieved using the Bayesian method
with the false positive rate of 4.6%. Hotelling’s statistical method gave a hit rate of ≈ 81% at
6.2% false positive rate. The performance metrics for statistical Moments(mean and standard
deviation) model yielded hit rate of ≈ 78% while the false positive rate was 13%. The
comparative analysis with the previous works also reveals that the Bayesian approach is a
superior technique.
5.2 CONCLUSION
Network Intrusion Detection System has a major role to play in safeguarding the
network resources against various kinds of attacks. With the advent of new vulnerabilities
and sophistications in the nature of attacks, new techniques for intrusion detection have
evolved. The main objectives of the research being increasing the detection accuracy while
keeping the false positive rate low.
In the present framework of project, discussed the design and development of
“Anomaly based intrusion Detection system” which is built on top of a existing open source
signature based network IDS, called SNORT so to have both the analysis techniques in a
single package.
The system is trained for more than four weeks to learn the profile of the target
network. It is also exposed to the simulated network intrusions using DARPA intrusion
detection data set 1999 and other open source tools for more than three weeks.
The thesis presented three techniques for detecting anomaly based intrusions at the
network level. They are evaluated with the DARPA IDS evaluation Data sets and the results
are compared. Bayesian approach proved to be a better solution than the Hotelling’s
Multivariate technique and the method of Statistical Moments.
Presently, the work caters only to identify the events into normal and attack classes. It
can be extended to detect and classify the attacks into multiple attack classes. Dynamic
updation of the Anomaly Model using Bayesian Network can also be considered for future
enhancement.
16
6 BIBLIOGRAPHY
[1]. R.Coolen, “Intrusion Detection: Generics and State of the Art”, RTO Technical
Report 49, http://www.tno.nl/instit/fel/div2/resources/rto-tr-049-ids.pdf
[2]. Martin Roesch : “Snort Documents” , http://www.snort.org/docs/
[3]. DARPA Intrusion Detection Evaluation, Data Sets and Documentation, 1999
http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/docs/detections
_1999.html
[4]. D. Barbar´a and S. Jajodia and N. Wu and B. Speegle ,, “The ADAM project”,
http://www.isse.gmu.edu/dbarbara/adam.html
[5]. M. Mahoney and P. Chan,. “PHAD: Packet header anomaly detection for
identifying hostile network traffic”, Technical report, Florida Tech., technical
report CS-2001-4, April 2001, http://citeseer.ist.psu.edu/mahoney01phad.html
[6]. Daniel Barbara, Ningning Wu and Sushil Jajodia : “Detecting Novel Network
Intrusions Using Bayes Estimators”,
http://www.cs.ubc.ca/local/reading/proceedings/siam_datamining2001/pdf/sdm01_
29.pdf
[7]. Mahoney, M., P. K. Chan, "Learning Models of Network Traffic for Detectin
Novel Attacks",Florida Tech. technical report 2002-03,
http://cs.fit.edu/~mmahoney/paper5.pdf
[8]. Jack Koziol, “Intrusion Detection with Snort”, Pearson publications, 2003
[9]. R. Dan Reid & Nada R. Sanders : “Operations Management”, 3rd Edition ©
Wiley ,2007
[10]. Ye, N., Li, X., Chen, Q., Emran, S. M., and Xu, M. “Probabilistic Techniques for
Intrusion Detection Based on Computer Audit Data”, IEEE Transactions on
Systems,Man, and Cybernetics, vol. 31(4), pp. 266--274, July 2001.
[11]. R. Puttini, Z. Marrakchi, and L. Me. “Bayesian Classification Model for Real-Time
Intrusion Detection”, in 22th International Workshop on Bayesian Inference and
Maximum Entropy Methods in Science and Engineering, 2002.
[12]. Petar Cisar , Sanja Maravic Cisar, “Quality Control in Function of Statistical
Anomaly Detection in Intrusion Detection Systems”, SISY 2006 - 4th Serbian-
Hungarian Joint Symposium on Intelligent Systems
www.bmf.hu/conferences/sisy2006/19_Cisar.pdf