Z-Intrusion Detection System Using Rule-based Systems

8/3/2019 Z-Intrusion Detection System Using Rule-based Systems


INTRUSION DETECTION SYSTEM USING RULE-BASED SYSTEMS

S. Jeya,

Assistant Professor, Department of Computer

Applications (M. C. A), K.S.R. College of Engineering, Thiruchengode.

ABSTRACT

This paper describes a technique of applying Genetic Algorithm

to network Intrusion Detection Systems. A brief overview of the

Intrusion Detection System, genetic algorithm, rule based

system and related detection techniques is presented. As the

transmission of data over the internet increases, the need to

protect connected systems also increases. Intrusion Detection

Systems are the latest technology used for this purpose. Although

the field of IDSs is still developing, the systems that do exist are

still not complete, in the sense that they are not able to detect all

types of intrusions. Some attacks which are detected by various

tools available today cannot be detected by other products,

depending on the types and methods that they are built on. Using

a Genetic Algorithm is one of the methods that IDSs use to detect

intrusions. They incorporate the concept of Darwin's theory and

natural selection to detect intrusions. The focus of this paper is

to introduce the application of GA, in order to improve the

effectiveness of IDSs.

1. INTRODUCTION

In recent years, intrusion detection has become one of the hottest research areas in computer security. It is an important

detection technology and is used as a countermeasure to preserve

data integrity and system availability during an intrusion. When

an intruder attempts to break into an information system or

performs an action not legally allowed, we refer to this activity

as an intrusion. Intruders can be divided into two groups,

external and internal. The former refers to those who do not have

authorized access to the system and who attack by using various

penetration techniques. The latter refers to those with access

permission who wish to perform unauthorized activities.

Intrusion techniques may include exploiting software bugs and

system misconfigurations, password cracking, sniffing unsecured

traffic, or exploiting the design flaws of specific protocols. An
Intrusion Detection System (IDS) is a system for detecting intrusions
[...] considered intrusions. IDSs can also be divided into two groups
depending on where they look for intrusive behavior: network-based
IDS and host-based IDS. The former refers to systems

that identify intrusions by monitoring traffic through network

devices. A host-based IDS monitors file and process activities

related to a software environment associated with a specific host.

The architecture combines a number of different approaches to

the IDS problem, and includes different AI techniques to help

identify intrusive behavior. It uses both anomaly detection and misuse detection techniques and is both a network-based and

host-based system. Genetic Algorithm has been used in different

ways in IDSs. One network connection and its related behavior

can be translated to represent a rule to judge whether or not a

real-time connection is considered an intrusion. These rules can

be modeled as chromosomes inside the population. The

population evolves until the evaluation criteria are met. The

generated rule set can be used as knowledge inside the IDS for

judging whether the network connection and related behaviors

are potential intrusions. In the approaches described above, the

IDS can be viewed as a rule-based system (RBS) and GA can be

viewed as a tool to help generate knowledge for the RBS. This

paper shows how network connection information can be

modeled as chromosomes and how the parameters in genetic

algorithm can be defined in this respect. Some examples are

used to show the implementation.

2. MOTIVATION

One approach to computer security is to attempt to create a

completely-secure system. Unfortunately, in many

environments, it may not be feasible to render the computer

system immune to intrusions, for several reasons. First, system

software is becoming more complex. A major challenge
programmers face in software design is the difficulty in
anticipating all conditions that may occur during program
execution and understanding precisely the implications of even small deviations in such conditions. Thus, system software often

contains flaws that may create security problems, and software




• Information Sources - the different sources of event information used to determine whether an intrusion has taken place. These sources can be drawn from different levels of the system, with network, host, and application monitoring most common.

• Analysis - the part of intrusion detection systems that actually organizes and makes sense of the events derived from the information sources, deciding when those events indicate that intrusions are occurring or have already taken place. The most common analysis approaches are misuse detection and anomaly detection.

• Response - the set of actions that the system takes once it detects intrusions. These are typically grouped into active and passive measures, with active measures involving some automated intervention on the part of the system, and passive measures involving reporting IDS findings to humans, who are then expected to take action based on those reports.

3.1. Deployment strategy for IDS

Organizations should consider a staged deployment of IDSs to

allow personnel to gain experience and to ascertain how many

monitoring and maintenance resources they will require. The

resource requirements for each type of IDS vary widely,

depending on the organization and systems environment. IDSs

require significant preparation and ongoing human interaction.

Organizations must have appropriate security policies, plans, and

procedures in place so that personnel know how to handle the

many and varied alarms IDSs produce. We recommend
considering a combination of network-based IDSs and host-based
IDSs to protect an enterprise-wide network. We
furthermore recommend a staged deployment, starting with network-based IDSs as they are usually the simplest to install

and maintain. Next, protect critical servers with host-based IDSs.

Utilize vulnerability analysis products on a regular schedule to

test IDSs and other security mechanisms for proper function and

configuration.

Honey pots and related technologies should be used

conservatively and only by organizations with a highly skilled

technical staff that are willing to experiment with leading-edge

technology. Furthermore, such techniques should be used only

after seeking guidance from legal counsel.

Protecting a full-time Internet-connected system is becoming
more important than ever. An evaluation of needs should be
conducted before selecting a product, as concepts, methods, and features vary. Firewalls act as a barrier between internal local
networks and the outside world (the Internet). It can keep the most [...]
detection systems, not firewalls, are capable of detecting this

category of security violation. To enhance security, an intrusion

detection system can be run against the connection.

3.2. Strengths of IDS

Intrusion detection systems perform the following functions:

• Monitoring and analysis of system events and user behaviors
• Testing the security states of system configurations
• Baselining the security state of a system, then tracking any changes to that baseline
• Recognizing patterns of system events that correspond to known attacks
• Recognizing patterns of activity that statistically vary from normal activity
• Managing operating system audit and logging mechanisms and the data they generate
• Alerting appropriate staff by appropriate means when attacks are detected
• Measuring enforcement of security policies encoded in the analysis engine
• Providing default information security policies
• Allowing non-security experts to perform important security monitoring functions

[Figure 1. Genetic algorithm theory. Recoverable labels: For Every Individual; Increase Fitness For Records Correctly Classified; Evaluate Real World Performance.]

3.3. Limitations of IDS

Intrusion detection systems cannot perform the following functions:

• Compensating for weak or missing security mechanisms in the protection infrastructure. Such mechanisms include firewalls, identification and authentication, link encryption, access control mechanisms, and virus detection and eradication
• Instantaneously detecting, reporting, and responding to an attack when there is a heavy network or processing load
• Detecting newly published attacks or variants of existing attacks
• Effectively responding to attacks launched by sophisticated attackers
• Automatically investigating attacks without human intervention
• Resisting attacks that are intended to defeat or circumvent them
• Compensating for problems with the fidelity of information sources
• Dealing effectively with switched networks

4. INTRODUCTION TO GENETIC ALGORITHM

A genetic algorithm is a method of data analysis that works analogously to Darwinian evolution. Within a computer simulation, a population of many individuals is created, each individual representing a possible mathematical model. Each individual has one or more chromosomes that function as basic instructions to the individual in a cause and effect manner. An individual is measured by the aggregate performance of its chromosomes.

An initial population is created by complete randomization of the chromosomes, and individuals of subsequent generations go through mutations, which are also randomized. As in Darwinism, a population that goes through many generations eliminates poor performing individuals and allows better performing individuals to replicate and mutate themselves during each generation. This genetic algorithm was designed so that each individual represented a possible behavioral model.

In this algorithm, chromosomes represent rules. A set of rules forms an individual, which is a possible mathematical model, and the individuals together make up the population. The fitness is generally expressed within the algorithm as a floating point number with a predefined range of values, from best performing to worst performing.

The algorithm is as follows:

1. Randomly generate an initial population M(0).
2. Compute and save the fitness u(m) for each individual m in the current population M(t).
3. Define selection probabilities p(m) for each individual m in M(t) so that p(m) is proportional to u(m).
4. Generate M(t+1) by probabilistically selecting individuals from M(t) to produce offspring via genetic operators.
5. Repeat from step 2 until a satisfying solution is obtained.

[Figure 2. Simple Genetic Algorithm. Flowchart labels: Start; Generate Initial Population; Evaluate Objective Function; Are Optimization Criteria Met? (Yes/No); Best Individuals; Result; Selection; Recombination; Mutation; Generate New Population; Mutate Population; Reproduce Fit Individuals.]

$$F(\delta_i) = \frac{\alpha}{A} - \frac{\beta}{B}$$

where $\alpha$ is the number of correctly detected attacks, $A$ is the total number of attacks, $\beta$ is the number of false positives, and $B$ is the total number of normal connections. The range of fitness values for this function was over the closed interval [-1, 1], with -1 being the poorest possible fitness and 1 being the ideal. A high correct detection rate and a low false positive rate yielded a high fitness value.
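As a rough illustration of the algorithm steps listed in this section, the following Python sketch runs fitness-proportional selection on a toy bit-string problem. The names `evolve`, `random_bits`, `flip_bits`, and `one_point`, the bit-string encoding, and all parameter values are assumptions made for this example and are not taken from the experiment described in the paper. The sketch also assumes non-negative fitness values so that they can serve directly as selection weights.

```python
import random


def evolve(fitness, random_individual, mutate, crossover,
           pop_size=20, generations=50, target=None, rng=None):
    """Toy genetic algorithm with fitness-proportional selection."""
    rng = rng or random.Random(0)
    # Step 1: randomly generate the initial population M(0).
    population = [random_individual(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Step 2: compute the fitness u(m) of every individual in M(t).
        scores = [fitness(m) for m in population]
        best = max(population + [best], key=fitness)
        if target is not None and fitness(best) >= target:
            break  # a satisfying solution has been found
        # Step 3: selection weights proportional to u(m) (assumed >= 0).
        # The tiny offset only avoids an all-zero weight vector.
        weights = [s + 1e-9 for s in scores]
        # Step 4: build M(t+1) from offspring of selected parents.
        next_population = []
        while len(next_population) < pop_size:
            a, b = rng.choices(population, weights=weights, k=2)
            next_population.append(mutate(crossover(a, b, rng), rng))
        population = next_population
    # Keep the best individual seen so far (simple elitism, an assumption).
    return max(population + [best], key=fitness)


def random_bits(rng, n=16):
    return [rng.randint(0, 1) for _ in range(n)]


def flip_bits(bits, rng, rate=0.05):
    # Flip each bit independently with probability `rate`.
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def one_point(a, b, rng):
    # Single-point crossover: head of parent a, tail of parent b.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


best = evolve(sum, random_bits, flip_bits, one_point)
print(best, sum(best))
```

Here the fitness is simply the number of ones in the bit string; in an IDS setting the individual would instead encode rules and the fitness would score their detection quality.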




specific to mathematical modeling. These numbers' slight

change in value was the basis for mutation in this genetic

algorithm. For symbolic connection attributes (e.g., connection

type), different weights were established for each symbol based

on an ERC. For continuous connection attributes (e.g., bytes

sent), ERC coefficients were randomly established for the data.

In continuous attributes that contained data of magnitudes apart,

such as bytes sent, separate ERC coefficients were established

for each magnitude of data. The certainty formula developed for
this experiment, $C_i$, of whether record $\chi$ was classified as an
attack by model $i$ was:

$$C_i(\chi) = \sum_{j=1}^{n} \left( R_{i,j} \times \chi_j \right)$$

where $R_{i,j}$ is the Ephemeral Random Constant-based coefficient for
attribute $\chi_j$ and $n$ is the number of attributes. An arbitrary

threshold value was established, and any certainty values which

exceeded this threshold value were classified as malicious

attacks. The genetic algorithm was run for one hundred

generations with one hundred individuals.
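The certainty computation described above is a weighted sum of a record's attributes compared against a threshold. A minimal Python sketch follows; the function names, the three-attribute record, the coefficient values, and the threshold are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def certainty(coefficients, record):
    """Weighted sum of attribute values: sum over j of coefficient_j * attribute_j."""
    if len(coefficients) != len(record):
        raise ValueError("one coefficient is required per attribute")
    return sum(r * x for r, x in zip(coefficients, record))


def is_attack(coefficients, record, threshold):
    """Classify the record as an attack when its certainty exceeds the threshold."""
    return certainty(coefficients, record) > threshold


# Hypothetical model with three attributes (the experiment used forty-one).
model = [0.5, -0.2, 0.1]
record = [2.0, 1.0, 10.0]

print(certainty(model, record))                 # 0.5*2.0 - 0.2*1.0 + 0.1*10.0
print(is_attack(model, record, threshold=1.0))
```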

Forty-one different types of nodes were established, one for

each of the forty-one connection record attributes. The genetic

algorithm package ECJ 7 was used for this research (Luke,

2001). It provided the necessary population breeding,

randomizing, and statistics gathering functions, from which this

genetic algorithm was written. The genetic algorithm was

written in Java, and the Webgain Visual Cafe 4.1 Expert Edition

integrated development environment was used to run the

experiment. This experiment was run on a Dell computer with

an Intel Pentium III 800 megahertz microprocessor and 256

megabytes of random access memory on Microsoft Windows

2000 using Sun Microsystems' Java Development Kit (JDK)

version 1.3.1.

Information collected on each generation consisted of the mean

fitness of all of the individuals within the generation, the fitness

of the best performing individual, the correct detection rate and

the false positive rate.
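The two rates collected per generation can be computed from a model's predictions and the pre-classified labels. The sketch below is an illustration only; the function name `rates` and the boolean encoding of predictions and labels (True for attack) are assumptions.

```python
def rates(predicted, actual):
    """Return (correct detection rate, false positive rate).

    predicted[i] is True if the model flagged record i as an attack;
    actual[i] is True if record i really is an attack.
    """
    attacks = sum(actual)
    normals = len(actual) - attacks
    if attacks == 0 or normals == 0:
        raise ValueError("need at least one attack and one normal record")
    detected = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_alarms = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return detected / attacks, false_alarms / normals


predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
print(rates(predicted, actual))  # 2 of 3 attacks detected, 1 of 2 normals flagged
```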

5. GENETIC ALGORITHM APPLIED TO INTRUSION DETECTION SYSTEM

Applying genetic algorithm to intrusion detection seems to be a

promising area. We discuss the motivation and implementation

details in this section.

5.1. Overview

Genetic algorithms can be used to evolve simple rules for

network traffic. These rules are used to differentiate normal
network connections from anomalous connections. [...]
administrator, stopping the connection, logging a message into

system audit files, or all of the above.

For example, a rule can be defined as:

if {the connection has the following information: source IP address 124.12.5.18; destination IP address 130.18.206.55; destination port number 21; connection time 10.1 seconds} then {stop the connection}
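The if/then rule above can be represented as a condition over connection fields plus an action. The following Python sketch is one possible encoding; the class names `Connection` and `Rule`, the field names, and the exact-match comparison are assumptions for illustration, since the paper does not specify a data structure.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    src_ip: str
    dst_ip: str
    dst_port: int
    duration: float  # connection time in seconds


@dataclass
class Rule:
    condition: dict  # field name -> required value
    action: str

    def matches(self, conn):
        # The rule matches only when every listed field equals the required value.
        return all(getattr(conn, name) == value
                   for name, value in self.condition.items())


# The example rule from the text.
rule = Rule(
    condition={"src_ip": "124.12.5.18", "dst_ip": "130.18.206.55",
               "dst_port": 21, "duration": 10.1},
    action="stop the connection",
)

conn = Connection("124.12.5.18", "130.18.206.55", 21, 10.1)
if rule.matches(conn):
    print(rule.action)
```

Exact matching on every field is the simplest choice; a practical rule would more likely allow wildcards or ranges for fields such as the duration.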

The final goal of applying GA is to generate rules that match

only the anomalous connections. These rules are tested on historical connections and are used to filter new

connections to find suspicious network traffic. In this

implementation, the network traffic used for GA is a pre-

classified data set that differentiates normal network

connections from anomalous ones.

The genetic algorithm was run over a ten percent subset of the

data, called the training data, and then tested over the entire data

set to test real-world performance. In the real world, an

empirical behavior model would rarely see any data which

directly corresponds to training data.

5.2. Data Representation

In order to fully exploit the suspicious level, we need to

examine all fields related with a specific network connection.

For simplicity, we only consider some obvious attributes for

each connection. Altogether there are fifty-seven genes in each

chromosome. If the rule is able to find an anomalous behavior, a

bonus will be given to the current chromosome. If the rule

matches a normal connection, a penalty will be applied to the

chromosome. Clearly no single rule can be used to separate all

anomalous connections from normal connections.

The genetic algorithm starts with a population that has

randomly selected rules. The population can evolve by using the

crossover and mutation operators. Due to the effectiveness of

the evaluation function, the succeeding populations are biased

toward rules that match intrusive connections. Ultimately as the

algorithm stops, rules are selected and added into the IDS rule

base.

5.3. Parameters in Genetic Algorithm

There are many parameters to consider for the application of
GA. Each of these parameters heavily influences the
effectiveness of the genetic algorithm. We will discuss the
methodology and related parameters in the following. The
evaluation function is one of the most important parameters in
genetic algorithm. The proposed implementation differs from the scheme used by [...] in that the definition and calculation of

outcome and fitness is different. The following steps are used to




The destination port number indicates which applications the target

system is running. Some IP addresses are more probable targets

for intrusions — for example, IP addresses for military domains.

Domain-specific information is less important compared with

the source IP addresses. Other parameters like duration, bytes

sent by the originator, bytes sent by the receiver, and state are

usually less important than the above fields but are still useful.

The protocol and source port number fields are commonly

dispensable and are used for identifying some specific

intrusions.

The absolute difference between the outcome of the

chromosome and the actual suspicious level is then computed

using the following equation. The suspicious level is a threshold

that indicates the extent to which two network connections are

considered a "match." The actual value of suspicious level

reflects observations from historical data.

$$\Delta = |\text{outcome} - \text{suspicious level}|$$

Once a mismatch happens, the penalty value is computed using
the absolute difference. The ranking in the equation indicates
whether or not an intrusion is easy to identify.

$$\text{penalty} = \frac{\Delta \times \text{ranking}}{100}$$

The fitness of a chromosome is computed using the above
penalty:

$$\text{fitness} = 1 - \text{penalty}$$

Obviously, the range of the fitness value is between 0 and 1. By

defining evaluation, we have incorporated both temporal and

spatial information needed for identification of network

intrusion.
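The three equations in this subsection chain together directly. The Python sketch below follows them step by step; the function name and the sample values for the outcome, suspicious level, and ranking are hypothetical, and the way the outcome itself is computed is not shown here, so it is passed in as a parameter.

```python
def chromosome_fitness(outcome, suspicious_level, ranking):
    """Fitness from the absolute difference, penalty, and ranking."""
    delta = abs(outcome - suspicious_level)   # absolute difference
    penalty = delta * ranking / 100           # scaled by the ranking
    return 1 - penalty


# A chromosome whose outcome equals the suspicious level is not penalized.
print(chromosome_fitness(0.5, 0.5, ranking=3))
# A mismatch lowers the fitness in proportion to the difference and ranking.
print(chromosome_fitness(0.9, 0.8, ranking=10))
```

Note that the fitness stays within 0 and 1 only while the penalty does, that is, while the product of the difference and the ranking does not exceed 100.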

5.4. Crossover and Mutation

Traditional genetic algorithms have been used to identify and
converge populations of candidate hypotheses to a single global optimum. For this problem, a set of rules is needed as a basis for
the IDS. As mentioned earlier, there is no way to clearly identify
whether a network connection is normal or anomalous just using
one rule. Multiple rules are needed to identify unrelated
anomalies, which means that several good rules are more

effective than a single best rule. Another reason for finding

multiple rules is that because there are so many network

connection possibilities, a small set of rules will be far from

enough.

The mutation operation should be meaningful during evolution.

For example, each segment of the IP address should not exceed

255. Mutations should be done following the requirements

specified in Table 1. These limitations can be enforced by

defining proper mutation rules.
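A constrained mutation operator can enforce such limits by drawing replacement values only from the valid range of each gene. The sketch below covers only the IP address constraint mentioned above (each segment at most 255); the function name, the list-of-integers encoding, and the mutation rate are assumptions, and the other requirements of Table 1 are not modeled.

```python
import random


def mutate_ip(octets, rate, rng):
    """Mutate IP address segments while keeping each one in 0..255.

    Each segment is replaced, with probability `rate`, by a fresh value
    drawn from the valid range, so the result is always a legal address.
    """
    return [rng.randint(0, 255) if rng.random() < rate else o
            for o in octets]


rng = random.Random(42)
source_ip = [124, 12, 5, 18]
print(mutate_ip(source_ip, rate=0.25, rng=rng))
```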

5.5. Other Parameters

There are also other parameters that need to be considered, such

as mutation rate, crossover rate, number of populations, and

number of generations. These parameters should be adjusted

according to the application environment of the system and the

organization's security policy.

[Figure: connection fields ranked from High to Low. Labels in order of appearance: Destination IP Address, Source IP Address, Destination Port Number, Duration, Bytes sent by originator, Bytes sent by the receiver, State, Protocol, Source Port Number.]

6. SYSTEM ARCHITECTURE

[Figure 4. Architecture of applying GA into intrusion detection. Component labels: Network Sniffer, Dataset, GA & AI, Rule Set, Rule Base.]

We need to collect enough historical data that includes both normal and anomalous network connections. This is the first

all the associated variables with the rule are consistent with the binding. The rules with rule-fact bindings that meet the binding analysis requirements are then gathered into a set from which the "best" rule is picked, through a process called conflict resolution. The rule then fires. It may cause an alert to be raised for a system administrator. Alternatively, some automated response, such as terminating that user's session, will be taken. Normally, a rule firing will result in additional assertions being added to the fact base. They, in turn, may lead to additional rule-fact bindings. This process continues until there are no more rules to be fired.

Consider the intrusion scenario in which two or more unsuccessful login attempts are made in a period of time shorter than it would take a human to type in the login information at a conventional keyboard. If the rule or rules of this scenario fire, then a specific user's suspicion level can be increased. The system may raise an alarm or freeze the named user's account. Account freeze would be entered into the fact database.
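The rule-firing cycle described in this section (binding rules to facts, picking the "best" rule by conflict resolution, firing it, and adding new assertions to the fact base) can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the tuple layout of facts, the rule names and priorities, the `HUMAN_TYPING_SECONDS` threshold, and the simplification that each rule fires at most once.

```python
HUMAN_TYPING_SECONDS = 2.0  # hypothetical lower bound on manual login time


def users_with_rapid_failures(facts):
    """Users with two failed logins closer together than a human could type."""
    times_by_user = {}
    for fact in facts:
        if fact[0] == "failed_login":
            times_by_user.setdefault(fact[1], []).append(fact[2])
    hits = set()
    for user, times in times_by_user.items():
        times.sort()
        if any(b - a < HUMAN_TYPING_SECONDS for a, b in zip(times, times[1:])):
            hits.add(user)
    return hits


RULES = [
    {"name": "rapid-failed-logins", "priority": 2,
     "when": lambda facts: bool(users_with_rapid_failures(facts)),
     "then": lambda facts: {("suspicious", u)
                            for u in users_with_rapid_failures(facts)}},
    {"name": "freeze-suspicious-account", "priority": 1,
     "when": lambda facts: any(f[0] == "suspicious" for f in facts),
     "then": lambda facts: {("account_frozen", f[1])
                            for f in facts if f[0] == "suspicious"}},
]


def run_rules(initial_facts, rules):
    facts = set(initial_facts)
    fired = set()
    while True:
        # Binding: rules whose conditions hold against the current fact base.
        candidates = [r for r in rules
                      if r["name"] not in fired and r["when"](facts)]
        if not candidates:
            return facts  # no more rules to fire
        # Conflict resolution: pick the highest-priority candidate.
        best = max(candidates, key=lambda r: r["priority"])
        fired.add(best["name"])
        facts |= best["then"](facts)  # firing adds assertions to the fact base


facts = run_rules({("failed_login", "alice", 10.0),
                   ("failed_login", "alice", 10.3),
                   ("failed_login", "bob", 5.0)}, RULES)
print(sorted(f for f in facts if f[0] != "failed_login"))
```

In this run the first rule marks alice as suspicious, which in turn lets the second rule bind and record the account freeze in the fact base, mirroring the chained firing described in the text.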

7. CONCLUSION

In this paper, we discussed a methodology for applying a genetic
algorithm to network intrusion detection. A brief
overview of Intrusion Detection Systems (IDS), genetic
algorithms, and related detection techniques is presented. The

system architecture is also introduced. Factors affecting the GA

are addressed in detail. This implementation of genetic

algorithm is unique as it considers both temporal and spatial

information of network connections during the encoding of the

problem; therefore, it should be more helpful for identification

of network anomalous behavior.

AUTHOR PROFILE

S. Jeya

Assistant Professor, M.C.A. Dept., K.S.R. College of Engineering, Tiruchengode - 637 209, Tamil Nadu.

Educational details: B.Sc. Computer Science, Sivanthi Adithanar College, Nagercoil; M.C.A. Computer Applications, Sivanthi Adithanar College, Nagercoil; M.Phil. Computer Science, M.S. University, Tirunelveli; Ph.D. Computer Science (pursuing), Mother Teresa Women's University, Kodaikanal.

Employment details: 1. Zyne Technology, Bangalore, Software Engineer, 1 year 6 months; 2. Rajaas Engineering College, Tirunelveli, Assistant Professor (HOD, MCA), 7 years 2 months; 3. K.S.R. College of Engineering, Tiruchengode - 9, Assistant Professor.

Membership details: Life Member of ISTE, Life Member of Oxford International Journal.
