8/3/2019 Z-Intrusion Detection System Using Rule-based Systems
http://slidepdf.com/reader/full/z-intrusion-detection-system-using-rule-based-systems 1/8
INTRUSION DETECTION SYSTEM USING RULE-BASED SYSTEMS
S. Jeya,
Assistant Professor, Department of Computer
Applications (M. C. A), K.S.R. College of Engineering, Thiruchengode.
ABSTRACT
This paper describes a technique of applying Genetic Algorithm
to network Intrusion Detection Systems. A brief overview of the
Intrusion Detection System, genetic algorithm, rule based
system and related detection techniques is presented. As the
transmission of data over the internet increases, the need to
protect connected systems also increases. Intrusion Detection
Systems are the latest technology used for this purpose. Although
the field of IDSs is still developing, the systems that do exist are
still not complete, in the sense that they are not able to detect all
types of intrusions. Some attacks which are detected by various
tools available today cannot be detected by other products,
depending on the types and methods that they are built on. Using
a Genetic Algorithm is one of the methods that IDSs use to detect
intrusions. They incorporate the concept of Darwin's theory and
natural selection to detect intrusions. The focus of this paper is
to introduce the application of GA, in order to improve the
effectiveness of IDSs.
1. INTRODUCTION
In recent years, intrusion detection has become one of the hottest research areas in computer security. It is an important
detection technology and is used as a countermeasure to preserve
data integrity and system availability during an intrusion. When
an intruder attempts to break into an information system or
performs an action not legally allowed, we refer to this activity
as an intrusion. Intruders can be divided into two groups,
external and internal. The former refers to those who do not have
authorized access to the system and who attack by using various
penetration techniques. The latter refers to those with access
permission who wish to perform unauthorized activities.
Intrusion techniques may include exploiting software bugs and
system misconfigurations, password cracking, sniffing unsecured
traffic, or exploiting the design flaws of specific protocols. An
Intrusion Detection System is a system for detecting intrusions.
IDSs can also be divided into two groups
depending on where they look for intrusive behavior: network-based
IDS and host-based IDS. The former refers to systems
that identify intrusions by monitoring traffic through network
devices. A host-based IDS monitors file and process activities
related to a software environment associated with a specific host.
The architecture combines a number of different approaches to
the IDS problem, and includes different AI techniques to help
identify intrusive behavior. It uses both anomaly detection and misuse detection techniques and is both a network-based and
host-based system. Genetic algorithms have been used in different
ways in IDSs. One network connection and its related behavior
can be translated to represent a rule to judge whether or not a
real-time connection is considered an intrusion. These rules can
be modeled as chromosomes inside the population. The
population evolves until the evaluation criteria are met. The
generated rule set can be used as knowledge inside the IDS for
judging whether the network connection and related behaviors
are potential intrusions. In the approaches described above, the
IDS can be viewed as a rule-based system (RBS), and the GA can be
viewed as a tool to help generate knowledge for the RBS. This
paper shows how network connection information can be
modeled as chromosomes and how the parameters in genetic
algorithm can be defined in this respect. Some examples are
used to show the implementation.
2. MOTIVATION
One approach to computer security is to attempt to create a
completely-secure system. Unfortunately, in many
environments, it may not be feasible to render the computer
system immune to intrusions, for several reasons. First, system
software is becoming more complex. A major challenge
programmers face in software design is the difficulty in
anticipating all conditions that may occur during program
execution and understanding precisely the implications of even small deviations in such conditions. Thus, system software often
contains flaws that may create security problems, and software
• Information Sources - the different sources of event
information used to determine whether an intrusion has taken
place. These sources can be drawn from different levels of the
system, with network, host, and application monitoring most
common.
• Analysis - the part of intrusion detection systems that actually
organizes and makes sense of the events derived from the information sources, deciding when those events indicate that
intrusions are occurring or have already taken place. The most
common analysis approaches are misuse detection and
anomaly detection.
• Response - the set of actions that the system takes once it
detects intrusions. These are typically grouped into active and
passive measures, with active measures involving some
automated intervention on the part of the system, and passive
measures involving reporting IDS findings to humans, who
are then expected to take action based on those reports.
3.1. Deployment strategy for IDS
Organizations should consider a staged deployment of IDSs to
allow personnel to gain experience and to ascertain how many
monitoring and maintenance resources they will require. The
resource requirements for each type of IDS vary widely,
depending on the organization and systems environment. IDSs
require significant preparation and ongoing human interaction.
Organizations must have appropriate security policies, plans, and
procedures in place so that personnel know how to handle the
many and varied alarms IDSs produce. We recommend
consideration of a combination of network-based IDSs and host-based
IDSs to protect an enterprise-wide network. We
furthermore recommend a staged deployment, starting with network-based IDSs as they are usually the simplest to install
and maintain. Next, protect critical servers with host-based IDSs.
Utilize vulnerability analysis products on a regular schedule to
test IDSs and other security mechanisms for proper function and
configuration.
Honeypots and related technologies should be used
conservatively and only by organizations with a highly skilled
technical staff that is willing to experiment with leading-edge
technology. Furthermore, such techniques should be used only
after seeking guidance from legal counsel.
Protecting a full-time, Internet-connected system is becoming
more important than ever. An evaluation of needs should be
conducted before selecting a product, as concepts, methods, and features vary. Firewalls act as a barrier between internal local
networks and the outside world (Internet). It can keep the most
detection systems, not Firewalls, are capable of detecting this
category of security violation. To enhance security, an intrusion
detection system can be run against the connection.
3.2. Strengths of IDS
Intrusion detection systems perform:
• Monitoring and analysis of system events and user behaviors
• Testing the security states of system configurations
• Baselining the security state of a system, then tracking any changes to that baseline
• Recognizing patterns of system events that correspond to known attacks
• Recognizing patterns of activity that statistically vary from normal activity
• Managing operating system audit and logging mechanisms and the data they generate
• Alerting appropriate staff by appropriate means when attacks are detected
• Measuring enforcement of security policies encoded in the analysis engine
• Providing default information security policies
• Allowing non-security experts to perform important security monitoring functions
[Figure 1. Genetic algorithm theory. Recoverable labels: for every individual, increase fitness for records correctly classified; evaluate real-world performance.]
3.3. Limitations of IDS
Intrusion detection systems cannot perform:
• Compensating for weak or missing security mechanisms in the protection infrastructure. Such mechanisms include firewalls, identification and authentication, link encryption, access control mechanisms, and virus detection and eradication
• Instantaneously detecting, reporting, and responding to an attack when there is a heavy network or processing load
• Detecting newly published attacks or variants of existing attacks
• Effectively responding to attacks launched by sophisticated attackers
• Automatically investigating attacks without human intervention
• Resisting attacks that are intended to defeat or circumvent them
• Compensating for problems with the fidelity of information sources
• Dealing effectively with switched networks
4. INTRODUCTION TO GENETIC ALGORITHM
A genetic algorithm is a method of data analysis that works analogously to Darwinian evolution. Within a computer simulation, a population of many individuals is created, each individual representing a possible mathematical model. Each individual has one or more chromosomes that function as basic instructions to the individual in a cause-and-effect manner. An individual is measured by the aggregate performance of its chromosomes.
An initial population is created by complete randomization of the chromosomes, and individuals of subsequent generations go through mutations, which are also randomized. As in Darwinism, a population that goes through many generations eliminates poorly performing individuals and allows better performing individuals to replicate and mutate during each generation. This genetic algorithm was designed so that each individual represented a possible behavioral model.
In this algorithm, chromosomes represent rules. A set of rules forms the population, and each possible mathematical model is known as an individual. The fitness is generally expressed within the algorithm as a floating-point number with a predefined range of values, from best performing to worst performing.
The algorithm is as follows:
1. Randomly generate an initial population M(0).
2. Compute and save the fitness u(m) for each individual m in the current population M(t).
3. Define selection probabilities p(m) for each individual m in M(t) so that p(m) is proportional to u(m).
4. Generate M(t+1) by probabilistically selecting individuals from M(t) to produce offspring via genetic operators.
5. Repeat from step 2 until a satisfactory solution is obtained.
[Figure 2. Simple Genetic Algorithm. Flowchart: start; generate initial population; evaluate objective function; are optimization criteria met? If yes, the best individuals form the result; if no, generate a new population through selection, recombination, and mutation.]
$$F(\delta_i) = \frac{\alpha}{A} - \frac{\beta}{B}$$
where $\alpha$ is the number of correctly detected attacks, $A$ is the total number of attacks, $\beta$ is the number of false positives, and $B$ is the total number of normal connections. The range of fitness values for this function was over the closed interval [-1, 1], with -1 being the poorest possible fitness and 1 being the ideal. A high correct detection rate and a low false positive rate yielded a high fitness value.
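The simple genetic algorithm listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiment: the bit-string toy problem, the helper names (`evolve`, `random_bits`, `one_point_crossover`, `flip_mutation`), the parameter values, and the fixed generation count are all hypothetical. Fitness scores are shifted before selection so that every selection weight is positive, which is a small departure from strict proportionality.

```python
import random

def evolve(fitness, random_individual, crossover, mutate,
           pop_size=20, generations=40, seed=0):
    """Simple GA loop; returns the fittest individual seen in any generation."""
    rng = random.Random(seed)
    # Step 1: randomly generate the initial population M(0).
    population = [random_individual(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Step 2: compute and save the fitness u(m) of each individual.
        scores = [fitness(m) for m in population]
        # Step 3: selection weights follow u(m). Scores are shifted so that
        # every weight is positive (an assumption of this sketch).
        lowest = min(scores)
        weights = [score - lowest + 1e-9 for score in scores]
        # Step 4: build M(t+1) from probabilistically selected parents.
        offspring = []
        while len(offspring) < pop_size:
            parent_a, parent_b = rng.choices(population, weights=weights, k=2)
            offspring.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = offspring
        best = max(population + [best], key=fitness)
    # Step 5 is simplified here to a fixed number of generations.
    return best

# Hypothetical toy problem: maximize the number of ones in a bit string.
def random_bits(rng, length=16):
    return [rng.randint(0, 1) for _ in range(length)]

def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_mutation(bits, rng, rate=0.05):
    return [1 - bit if rng.random() < rate else bit for bit in bits]

best = evolve(sum, random_bits, one_point_crossover, flip_mutation)
print(sum(best), "ones out of", len(best))
```

In an IDS setting the individuals would be candidate rules or models and the fitness function would score them against labeled connection records; the loop itself stays the same.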
specific to mathematical modeling. These numbers' slight
change in value was the basis for mutation in this genetic
algorithm. For symbolic connection attributes (e.g., connection
type), different weights were established for each symbol based
on an ERC. For continuous connection attributes (e.g., bytes
sent), ERC coefficients were randomly established for the data.
In continuous attributes that contained data spanning several orders of magnitude,
such as bytes sent, separate ERC coefficients were established
for each magnitude of data. The certainty formula developed for
this experiment, $C_i$, of whether record $c$ was classified as an
attack by model $i$ was:
$$C_i(c) = \sum_{j=1}^{n} \left( R_{i,j} \times c_j \right)$$
where $R_{i,j}$ is the Ephemeral Random Constant-based coefficient for
attribute $c_j$ and $n$ is the number of attributes. An arbitrary
threshold value was established, and any certainty values which
exceeded this threshold value were classified as malicious
attacks. The genetic algorithm was run for one hundred
generations with one hundred individuals.
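The certainty computation and threshold test described above can be illustrated with a short Python sketch. It assumes every attribute has already been encoded as a number (the text assigns weights to symbolic attributes, which is not modeled here); the function names, the three-attribute record, the coefficients, and the threshold are hypothetical values chosen for the example.

```python
def certainty(coefficients, record):
    """Weighted sum of a record's attributes using one model's coefficients."""
    if len(coefficients) != len(record):
        raise ValueError("one coefficient is required per attribute")
    return sum(r * c for r, c in zip(coefficients, record))

def is_attack(coefficients, record, threshold):
    """A record is classified as an attack when its certainty exceeds the threshold."""
    return certainty(coefficients, record) > threshold

# Hypothetical model with three ERC-based coefficients and one encoded record.
model_coefficients = [0.5, -0.25, 2.0]
encoded_record = [2.0, 4.0, 1.0]

score = certainty(model_coefficients, encoded_record)  # 1.0 - 1.0 + 2.0 = 2.0
print(score, is_attack(model_coefficients, encoded_record, threshold=1.5))
```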
Forty-one different types of nodes were established, one for
each of the forty-one connection record attributes. The genetic
algorithm package ECJ 7 was used for this research (Luke,
2001). It provided the necessary population breeding,
randomizing, and statistics gathering functions, from which this
genetic algorithm was written. The genetic algorithm was
written in Java, and the Webgain Visual Cafe 4.1 Expert Edition
integrated development environment was used to run the
experiment. This experiment was run on a Dell computer with
an Intel Pentium III 800 megahertz microprocessor and 256
megabytes of random access memory on Microsoft Windows
2000 using Sun Microsystems' Java Development Kit (JDK)
version 1.3.1.
Information collected on each generation consisted of the mean
fitness of all of the individuals within the generation, the fitness
of the best performing individual, the correct detection rate and
the false positive rate.
5. GENETIC ALGORITHM APPLIED TO INTRUSION DETECTION SYSTEM
Applying genetic algorithms to intrusion detection seems to be a
promising area. We discuss the motivation and implementation
details in this section.
5.1. Overview
Genetic algorithms can be used to evolve simple rules for
network traffic. These rules are used to differentiate normal
network connections from anomalous connections.
administrator, stopping the connection, logging a message into
system audit files, or all of the above.
For example, a rule can be defined as:
If {the connection has the following information: source IP
address: 124.12.5.18; destination IP address: 130.18.206.55;
destination port number: 21; connection time: 10.1 seconds}
then {stop the connection}
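The example rule above can be expressed as a small data structure with a matching function. The sketch below is only an illustration: the class names, field names, and exact-equality matching are assumptions made for this example, not a format defined in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Connection:
    source_ip: str
    destination_ip: str
    destination_port: int
    duration_seconds: float

@dataclass
class Rule:
    condition: dict  # field name -> value the connection must have
    action: str      # what the IDS does when the condition matches

    def matches(self, connection):
        """True when every field named in the condition has the required value."""
        return all(getattr(connection, field) == value
                   for field, value in self.condition.items())

# The rule from the text: if {condition} then {stop the connection}.
rule = Rule(
    condition={
        "source_ip": "124.12.5.18",
        "destination_ip": "130.18.206.55",
        "destination_port": 21,
        "duration_seconds": 10.1,
    },
    action="stop the connection",
)

suspicious = Connection("124.12.5.18", "130.18.206.55", 21, 10.1)
ordinary = Connection("124.12.5.18", "130.18.206.55", 80, 10.1)

for connection in (suspicious, ordinary):
    if rule.matches(connection):
        print(rule.action)
```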
The final goal of applying GA is to generate rules that match
only the anomalous connections. These rules are tested on historical connections and are used to filter new
connections to find suspicious network traffic. In this
implementation, the network traffic used for GA is a pre-
classified data set that differentiates normal network
connections from anomalous ones.
The genetic algorithm was run over a ten percent subset of the
data, called the training data, and then tested over the entire data
set to test real-world performance. In the real world, an
empirical behavior model would rarely see any data which
directly corresponds to training data.
5.1. Data Representation
To fully exploit the suspicious level, we need to
examine all fields related to a specific network connection.
For simplicity, we only consider some obvious attributes for
each connection. Altogether there are fifty-seven genes in each
chromosome. If the rule is able to find an anomalous behavior, a
bonus will be given to the current chromosome. If the rule
matches a normal connection, a penalty will be applied to the
chromosome. Clearly no single rule can be used to separate all
anomalous connections from normal connections.
The genetic algorithm starts with a population of
randomly selected rules. The population can evolve by using the
crossover and mutation operators. Due to the effectiveness of
the evaluation function, the succeeding populations are biased
toward rules that match intrusive connections. Ultimately as the
algorithm stops, rules are selected and added into the IDS rule
base.
5.2. Parameters in Genetic Algorithm
There are many parameters to consider for the application of
GA. Each of these parameters heavily influences the
effectiveness of the genetic algorithm. We discuss the
methodology and related parameters in the following. The
evaluation function is one of the most important parameters in a
genetic algorithm. The proposed implementation differs from earlier schemes in that the definition and calculation of
outcome and fitness are different. The following steps are used to
The destination port number indicates which applications the target
system is running. Some IP addresses are more probable targets
for intrusions — for example, IP addresses for military domains.
Domain-specific information is less important compared with
the source IP addresses. Other parameters like duration, bytes
sent by the originator, bytes sent by the receiver, and state are
usually less important than the above fields but are still useful.
The protocol and source port number fields are commonly
dispensable and are used for identifying some specific
intrusions.
The absolute difference between the outcome of the
chromosome and the actual suspicious level is then computed
using the following equation. The suspicious level is a threshold
that indicates the extent to which two network connections are
considered a "match." The actual value of the suspicious level
reflects observations from historical data.
$$\Delta = \left| \text{outcome} - \text{suspicious\_level} \right|$$
Once a mismatch happens, the penalty value is computed using
the absolute difference. The ranking in the equation indicates
whether or not an intrusion is easy to identify.
$$\text{penalty} = \frac{\Delta \times \text{ranking}}{100}$$
The fitness of a chromosome is computed using the above
penalty:
$$\text{fitness} = 1 - \text{penalty}$$
Obviously, the range of the fitness value is between 0 and 1. By
defining evaluation, we have incorporated both temporal and
spatial information needed for identification of network
intrusion.
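The three steps above (absolute difference, penalty, fitness) chain together as in the following Python sketch. The function name and the sample numbers are hypothetical; the outcome of the chromosome is taken as a given input because its calculation is not reproduced in this section.

```python
def chromosome_fitness(outcome, suspicious_level, ranking):
    """Fitness of one chromosome from its outcome, following the three steps above."""
    delta = abs(outcome - suspicious_level)  # absolute difference
    penalty = delta * ranking / 100          # scaled by how hard the intrusion is to identify
    return 1 - penalty

# Hypothetical values: the outcome misses the suspicious level by 0.25, ranking is 50.
print(chromosome_fitness(outcome=0.75, suspicious_level=0.5, ranking=50))  # 0.875
# An outcome equal to the suspicious level incurs no penalty.
print(chromosome_fitness(outcome=0.5, suspicious_level=0.5, ranking=50))   # 1.0
```

Note that the fitness stays within 0 and 1 only while the penalty does, that is, while the product of the difference and the ranking does not exceed 100.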
5.3. Crossover and Mutation
Traditional genetic algorithms have been used to identify and
converge populations of candidate hypotheses to a single global optimum. For this problem, a set of rules is needed as a basis for
the IDS. As mentioned earlier, there is no way to clearly identify
whether a network connection is normal or anomalous just using
one rule. Multiple rules are needed to identify unrelated
anomalies, which means that several good rules are more
effective than a single best rule. Another reason for finding
multiple rules is that because there are so many network
connection possibilities, a small set of rules will be far from
enough.
The mutation operation should be meaningful during evolution.
For example, each segment of the IP address should not exceed
255. Mutations should be done following the requirements
specified in Table 1. These limitations can be enforced by
defining proper mutation rules.
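One way to keep mutation meaningful is to draw each mutated gene from the valid range of its field, as in the sketch below. Only the 0 to 255 limit on IP address segments comes from the text; the port range (0 to 65535, the standard 16-bit range), the gene layout, the mutation rate, and all names are assumptions made for this illustration.

```python
import random

# Valid value range per gene kind. The port range is an assumption of this sketch.
GENE_RANGES = {
    "ip_octet": (0, 255),
    "port": (0, 65535),
}

def mutate_gene(kind, rng):
    """Draw a replacement value that is valid for this kind of gene."""
    low, high = GENE_RANGES[kind]
    return rng.randint(low, high)

def mutate_chromosome(chromosome, kinds, rate, rng):
    """Mutate each gene with probability `rate`, always staying within its range."""
    return [mutate_gene(kind, rng) if rng.random() < rate else gene
            for gene, kind in zip(chromosome, kinds)]

# Hypothetical layout: four octets of a destination IP address, then a port.
kinds = ["ip_octet"] * 4 + ["port"]
chromosome = [130, 18, 206, 55, 21]

rng = random.Random(42)
mutated = mutate_chromosome(chromosome, kinds, rate=1.0, rng=rng)
print(mutated)
```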
5.5 Other Parameters
There are also other parameters that need to be considered, such
as mutation rate, crossover rate, number of populations, and
number of generations. These parameters should be adjusted
according to the application environment of the system and the
organization's security policy.
[Figure: connection fields ranked from high to low: destination IP address, source IP address, destination port number, duration, bytes sent by originator, bytes sent by the receiver, state, protocol, source port number.]
6. SYSTEM ARCHITECTURE
[Figure 4. Architecture of applying GA into intrusion detection. Recoverable labels: network sniffer, dataset, GA and AI, rule set, rule base.]
We need to collect enough historical data that includes both normal and anomalous network connections. This is the first
all the associated variables with the rule are consistent with the binding. The rules with rule-fact bindings that meet the binding analysis requirements are then gathered into a set from which the "best" rule is picked, through a process called conflict resolution. The rule then fires. It may cause an alert to be raised for a system administrator. Alternatively, some automated response, such as terminating that user's session, will be taken. Normally, a rule firing will result in additional assertions being added to the fact base. They, in turn, may lead to additional rule-fact bindings. This process continues until there are no more rules to be fired.
Consider the intrusion scenario in which two or more unsuccessful login attempts are made in a period of time shorter than it would take a human to type in the login information at a conventional keyboard. If the rule or rules of this scenario fire, then a specific user's suspicion level can be increased. The system may raise an alarm or freeze the named user's account. Account freeze would be entered into the fact database.
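The rule-firing cycle and the failed-login scenario can be sketched as a tiny forward-chaining loop in Python. This is a simplified illustration, not the system described in the paper: facts are plain tuples, every applicable rule fires in each pass (there is no conflict resolution step), and the one-second typing limit, the fact names, and the function names are hypothetical.

```python
MIN_HUMAN_SECONDS = 1.0  # assumed lower bound on how fast a human can retype a login

def rapid_failed_logins(facts):
    """Mark a user as suspicious when two failed logins are implausibly close."""
    times_by_user = {}
    for fact in facts:
        if fact[0] == "failed_login":
            _, user, timestamp = fact
            times_by_user.setdefault(user, []).append(timestamp)
    new_facts = set()
    for user, times in times_by_user.items():
        times.sort()
        if any(later - earlier < MIN_HUMAN_SECONDS
               for earlier, later in zip(times, times[1:])):
            new_facts.add(("suspicious", user))
    return new_facts

def freeze_suspicious_accounts(facts):
    """Record an account freeze for every user marked as suspicious."""
    return {("account_frozen", fact[1]) for fact in facts if fact[0] == "suspicious"}

def run_rules(initial_facts, rules):
    """Fire rules until no rule adds a new assertion to the fact base."""
    facts = set(initial_facts)
    while True:
        new_facts = set()
        for rule in rules:
            new_facts |= rule(facts) - facts
        if not new_facts:
            return facts
        facts |= new_facts

facts = run_rules(
    [("failed_login", "alice", 0.0), ("failed_login", "alice", 0.2),
     ("failed_login", "bob", 0.0), ("failed_login", "bob", 30.0)],
    [rapid_failed_logins, freeze_suspicious_accounts],
)
print(sorted(fact for fact in facts if fact[0] == "account_frozen"))
```

The second rule only becomes applicable after the first has added its assertion, which mirrors how one rule firing can lead to further rule-fact bindings.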
7. CONCLUSION
In this paper, we discussed a methodology for applying genetic
algorithms to network intrusion detection. A brief
overview of Intrusion Detection Systems (IDS), genetic
algorithms, and related detection techniques is given. The
system architecture is also introduced. Factors affecting the GA
are addressed in detail. This implementation of the genetic
algorithm is unique as it considers both temporal and spatial
information of network connections during the encoding of the
problem; therefore, it should be more helpful for identification
of anomalous network behavior.
AUTHOR PROFILE
S. Jeya
Assistant Professor, M.C.A. Dept., K.S.R. College of Engineering, Tiruchengode - 637 209, Tamil Nadu.
Educational details: B.Sc. Computer Science, Sivanthi Adithanar College, Nagercoil; M.C.A. Computer Applications, Sivanthi Adithanar College, Nagercoil; M.Phil. Computer Science, M.S. University, Tirunelveli; Ph.D. Computer Science (pursuing), Mother Teresa Women's University, Kodaikanal.
Employment details: 1. Zyne Technology, Bangalore, Software Engineer, 1 year 6 months; 2. Rajaas Engineering College, Tirunelveli, Assistant Professor (HOD MCA), 7 years 2 months; 3. K.S.R. College of Engineering, Tiruchengode - 9, Assistant Professor.
Membership details: Life Member of ISTE; Life Member of Oxford International Journal.