
[IEEE 2012 Fourth International Conference on Knowledge and Systems Engineering (KSE) - Danang, Vietnam (2012.08.17-2012.08.19)] 2012 Fourth International Conference on Knowledge and


Evolving Block-based Neural Network and Field Programmable Gate Arrays for Host-based Intrusion Detection System

Quang Anh Tran Faculty of Information Technology

Hanoi University Hanoi, Vietnam

e-mail: [email protected]

Frank Jiang School of Engineering and IT

University of New South Wales Canberra, Australia

e-mails: [email protected]

Quang Minh Ha Faculty of Information Technology

Hanoi University Hanoi, Vietnam

e-mail: [email protected]

Abstract: In this paper, we design a prototype with a hybrid software-enabled detection engine based on an evolving block-based neural network (BBNN) and integrate it with a field-programmable gate array (FPGA) board to enable a real-time host-based intrusion detection system (IDS). The prototype can feed sequences of system calls obtained from a server directly into the BBNN-based IDS. The structure and weights of the BBNN are evolved by a genetic algorithm. Experimental performance comparisons have been conducted against support vector machines (SVMs) with four major kernels, using leave-one-out cross-validation. The results show that the improved BBNN matches or outperforms these algorithms in classification and detection performance. The false alarm rate is reduced to 2.22% while a detection rate of 100% is maintained. The running times of the proposed hardware-based IDS versus other software-based systems are also discussed.

Keywords- intrusion detection systems (IDSs); block-based neural network (BBNN); field programmable gate arrays (FPGA)

I. INTRODUCTION

Large-scale distributed computing infrastructure poses new challenges for conventional intrusion detection, prevention, and self-healing security systems. In the past few years, Australian businesses have lost over 600 million dollars annually to computer security incidents (Australian Institute of Criminology Media Release 09 June 2009. http://www.aic.gov.au/media/2009/june/20090609.aspx). Because it is impossible to prevent computer attacks completely, intrusion detection systems (IDSs) are expected to minimize the damage caused by potential threats (or attackers). Generally, there are two types of IDSs, host-based and network-based, and their detection approaches can be classified into the following categories: (1) statistical feature analysis approaches, correlation analysis [1], and signal processing techniques [2]; (2) artificial intelligence, rule-based systems, and agent-based approaches [3]; and (3) data mining-based approaches [4] [5].

However, apart from the very limited number of public IDS datasets, current IDSs face two main difficulties: (1) a comparatively high detection error rate, especially for new or unknown (i.e., evolving) attacks; and (2) low detection speed, especially in large-scale real-time environments such as cloud computing.

A large volume of recent work uses computational intelligence approaches to classify and detect intrusions. Computational intelligence tools such as support vector machines (SVMs), fuzzy systems, and rule-based systems are commonly used to produce software components for IDSs [6]. Furthermore, evolutionary algorithms such as particle swarm optimization (PSO) [7], genetic algorithms (GA) [8], differential evolution (DE) [9], and ant colony optimization (ACO) [10] are newer approaches in this research area.

A purely software-cored IDS can adapt quickly to new attacks; however, its detection speed is limited. In other words, it cannot meet the requirements of large-scale distributed environments in future computing scenarios such as cloud computing, where enormous amounts of data merge into a centralized infrastructure. By comparison, a hardware-cored IDS achieves a higher detection speed than existing software IDSs, but it cannot keep up with new, evolving attacks. Forrest [11] first applied immune theory to anomaly detection in 1994. Since then, many researchers have proposed different malware detection models and achieved some success.

Hence, the authors believe that a hardware-cored IDS with a suitable evolving ability can cope with requirements such as real-time processing of high data volumes. In this paper, an IDS prototype is built on a field-programmable gate array (FPGA) equipped with an internally evolving block-based neural network (BBNN) as a software-driven engine for attack detection. It offers higher speed and the ability to adapt to new samples, which reduces the error rate. The internal block structure of the BBNN matches the reconfigurable logic array structure of FPGA hardware very well.

The major contributions of this paper are threefold. First, an incremental method is proposed to perform real-time feature extraction. It increases the efficiency of feature extraction and preserves the processing power of the FPGA. Second, a real-time hardware-cored IDS has been designed, which is seamlessly integrated with the evolving model-free BBNN. This combination improves the IDS's learning and adaptation capabilities so that it detects unknown or new attacks with better detection performance in a real-time environment. Third, the key BBNN-based algorithm in the proposed system has been tested and validated on a standard public IDS dataset.

2012 Fourth International Conference on Knowledge and Systems Engineering

978-0-7695-4760-2/12 $26.00 © 2012 IEEE

DOI 10.1109/KSE.2012.31

This paper is organized as follows. Section II introduces the preliminaries of block-based neural networks, their tuning algorithms, and a host-based IDS dataset. Section III discusses the proposed framework for host-based intrusion detection using BBNN and FPGA, including an incremental method for feature extraction. Section IV presents the experimental results and discussion. Finally, Section V concludes the paper and discusses directions for future research.

II. PRELIMINARIES

Artificial neural networks (ANNs) have been widely researched over the decades and have been successfully applied in many areas, such as pattern recognition, heartbeat monitoring, classification, and parallel distributed complex problems [12]. Their data-driven, model-free structure supports complex decision processes. Although ANNs have been adopted for a variety of practical problems, each problem still requires a specific ANN design. As Moon and Kong stated in [13], designing a suitable network structure or configuration and optimizing the corresponding parameters (e.g., weights) remain typical problems in ANN research. Hardware realization of ANNs has been attempted mainly with analog devices, particularly in IDSs. A well-posed BBNN in an FPGA structure improves detection capabilities against ever-changing dynamics in a complex environment, which is a target of this paper.

A. Block-based Neural Networks

A block-based neural network (BBNN), proposed by Moon and Kong in [13], is a network whose structure can be changed flexibly according to the signal flows between blocks, as shown in Fig. 1. A BBNN can be represented by a 2-D array of blocks, where each block is a basic signal processing unit composed of a feed-forward neural network with four variable input/output nodes. As shown in Fig. 1, the BBNN is organized as m rows and n columns of blocks labeled $B_{ij}$, where i = 1, …, m and j = 1, …, n.

In this paper, to simplify the process, we consider only feed-forward connections between layers. The structure of the BBNN is determined by the internal configurations, i.e., the input-output connections, of the basic blocks. According to its input-output connections, a block has one of four types of internal configuration. In Fig. 2, (a) shows a block with one input and three outputs; (b) shows a block with three inputs and one output; and (c) and (d) show blocks with two inputs and two outputs in different internal configurations. In general, the capability of the BBNN is further improved by the variety of internal block configurations [12] [13]. We have applied these flexible configurations in the algorithm.

The output of a block is calculated as the sum of the weighted inputs and a bias, corresponding to a feed-forward NN, as follows:

$$v_j = \sum_{i=1}^{4} w_{ij} u_i + w_{0j} \tag{1}$$

where $u_i$ is the input to node i, $v_j$ is the output of node j, $w_{ij}$ is the weight of the connection from node i to node j, and $w_{0j}$ is the bias of node j.

Each node is characterized by an activation function. Input nodes use a linear activation function, while output nodes use a symmetric saturating linear activation function, which suits hardware implementation. In this paper, we followed the description in [13] to implement the BBNN in the Perl programming language. The inputs were scaled to the range from -1 to 1, and the saturating linear activation function kept the output values of each basic block in the range from -1 to 1. The system was tested on a Linux platform.
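The block computation in (1), followed by the saturating activation at the output nodes, can be sketched as follows. This is only an illustrative Python sketch, not the authors' Perl implementation; the function names `satlins` and `block_outputs`, the dictionary layout, and the example weights are assumptions made here, and the sum is taken only over the nodes that act as inputs in the chosen block configuration.

```python
def satlins(x):
    """Symmetric saturating linear activation: clip x to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def block_outputs(u, w, bias, input_nodes, output_nodes):
    """Evaluate one basic block.

    u:    dict mapping input node i -> input value u_i
    w:    dict mapping (i, j) -> weight w_ij
    bias: dict mapping output node j -> bias w_0j
    """
    v = {}
    for j in output_nodes:
        # Eq. (1): weighted sum of the inputs plus the bias of node j
        total = bias[j] + sum(w[(i, j)] * u[i] for i in input_nodes)
        v[j] = satlins(total)
    return v


# Hypothetical block with inputs at nodes 1 and 2 and outputs at nodes 3 and 4
u = {1: 0.5, 2: -1.0}
w = {(1, 3): 0.5, (2, 3): 0.25, (1, 4): 1.0, (2, 4): -1.0}
bias = {3: 0.0, 4: 0.5}
v = block_outputs(u, w, bias, input_nodes=[1, 2], output_nodes=[3, 4])
print(v)
```

In this example the raw sum at node 4 exceeds 1, so the activation saturates it at the upper bound, which is the behavior that keeps every block output inside the range from -1 to 1.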

Figure 1. Internal structure of the BBNN network (adapted from [13])

Figure 2. Internal settings for blocks (adapted from [13])


B. Evolving BBNN by Genetic Algorithm

Optimization of the BBNN consists of two tasks: structure optimization and parameter optimization. Structure optimization determines the type of each block. Parameter optimization, or learning, determines the connection weights in each block. Both the types of the basic blocks and the weights are tuned by a genetic algorithm (GA). Because conventional binary encoding has inherent shortcomings [14] [15], we adopt an enhanced encoding scheme and genetic operators similar to those in [13]. The GA searches for a global optimum of the given fitness function over the possible combinations of structure and weights. Each chromosome in the search space represents one possible setting of the structure and weights of the BBNN. The overall performance could be further improved by using other algorithms to tune the parameters when the given problem has varying levels of complexity and computing workload. Recently, the authors developed a new tuning algorithm, hybrid particle swarm optimization with wavelet mutation [16], which has demonstrated promising performance in parameter optimization.

The network structure and weights are assigned to a two-dimensional chromosome. A block is connected to its neighboring blocks with signal flows represented by arrows, each encoded as a single bit: two of the arrow directions represent 0, while the other two represent 1. Conventional genetic operators such as crossover, mutation, and selection were used to produce offspring from parents. In this paper, we follow the improved GA of [13], with minor alterations that are necessary for the generation and population settings.
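To make the idea of mutating structure bits and weight bits at different rates concrete, here is a minimal Python sketch of bit-flip mutation on a two-dimensional binary chromosome. The layout is an assumption: the source does not specify which positions hold structure bits, so the `is_structure` mask, the function name `mutate`, and the toy chromosome are all hypothetical, and the enhanced operators of [13] are not reproduced.

```python
import random


def mutate(chromosome, is_structure, p_weight, p_structure, rng):
    """Return a mutated copy of a 2-D bit chromosome.

    Each bit is flipped independently. Bits marked True in is_structure
    use p_structure; all other bits (weight bits) use p_weight.
    """
    child = []
    for row, mask_row in zip(chromosome, is_structure):
        new_row = []
        for bit, structural in zip(row, mask_row):
            p = p_structure if structural else p_weight
            new_row.append(1 - bit if rng.random() < p else bit)
        child.append(new_row)
    return child


# Hypothetical 2 x 3 chromosome with one structure bit per row
chromosome = [[0, 1, 0], [1, 1, 0]]
is_structure = [[True, False, False], [False, True, False]]
rng = random.Random(0)  # fixed seed so the run is repeatable
child = mutate(chromosome, is_structure, p_weight=0.0005, p_structure=0.00005, rng=rng)
```

Keeping the structure mutation probability below the weight mutation probability, as in the settings of Table I, means that the network topology changes rarely while the weights are explored more often.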

C. Dataset

In this paper, sequences of system calls produced by a sendmail program on a UNIX server were used for the experiments. The dataset is publicly available at http://www.cs.unm.edu/~immsec/data/SM/UNM. It includes normal data and intrusion data, organized as several files. Each file contains a sequence of system calls (represented by integers) recorded while sendmail operates normally or abnormally. The lengths of the sequences vary widely; some are very large, such as the file sendmail.daemon.int, which contains a sequence of 1571583 system calls.

The normal dataset includes the following files: bounce-1.int, bounce.int, queue.int, sendmail.daemon.int, bounce-2.int, plus.int, and sendmail.log.int. The intrusion dataset includes the following files: sm-280.int, sm-314.int, fwd-loops-1.int, fwd-loops-2.int, fwd-loops-3.int, fwd-loops-4.int, fwd-loops-5.int, sm-10763.int, sm-10801.int, and sm-10814.int.

The sequences of system calls in the normal dataset were recorded from a sendmail process running normally; those in the intrusion dataset were recorded from a sendmail process running abnormally. For the normal dataset, we can treat each file (each sequence) as a set of many sub-sequences, each of which is a normal sample. For the intrusion dataset, however, we must treat each file (each sequence) as a single intrusion sample, because we do not know exactly in which part of the file the intrusion happened. Therefore, in this problem we have very many normal samples but only 10 intrusion samples.

We divided the dataset into a training dataset and a testing dataset. For normal data, the training dataset used the files bounce-1.int, bounce.int, queue.int, and sendmail.daemon.int, and the testing dataset used the files bounce-2.int, plus.int, and sendmail.log.int.

Because there are very few intrusion samples (10 samples), we used a leave-one-out method for training and testing on the intrusion data. The leave-one-out method holds out one intrusion sample for testing and uses the rest for training; this is repeated for every intrusion sample, and the average performance is computed [17].
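The leave-one-out split over the ten intrusion files can be sketched as below. This is a minimal Python illustration of the general technique; the helper name `leave_one_out` is an assumption, and the training and scoring of the BBNN in each fold are omitted.

```python
INTRUSION_FILES = [
    "sm-280.int", "sm-314.int",
    "fwd-loops-1.int", "fwd-loops-2.int", "fwd-loops-3.int",
    "fwd-loops-4.int", "fwd-loops-5.int",
    "sm-10763.int", "sm-10801.int", "sm-10814.int",
]


def leave_one_out(samples):
    """Yield (training, held_out) pairs, holding out each sample once."""
    for i, held_out in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        yield training, held_out


folds = list(leave_one_out(INTRUSION_FILES))
for training, held_out in folds:
    # A model would be trained on `training` (plus the normal training
    # files) and tested on `held_out`; the per-fold scores are averaged.
    print(held_out, len(training))
```

With ten intrusion samples this gives ten folds, each training on nine intrusion files and testing on the remaining one.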

D. Definition of Sample

Defining samples clearly is important because they are the objects to be classified as normal or intrusion. We scan the dataset with a sliding window to produce samples. Each time the sliding window moves forward by one step, it produces one sample. For example, if the dataset is

(66, 66, 4, 138, 66, 5, 23, 45, 4),

then a sliding window of size 4 produces 6 samples:

(66, 66, 4, 138), (66, 4, 138, 66), (4, 138, 66, 5), (138, 66, 5, 23), (66, 5, 23, 45), (5, 23, 45, 4)

There are two discussion points regarding the sliding window:

• The smaller the sliding window, the closer the detection is to real time; however, a smaller window also gives a higher detection error rate, because each sample carries less information.

• The dataset is stored in several files (several large sequences). The problem is that in an intrusion file (e.g., fwd-loops-1.int), we do not know which part of the sequence is the intrusion; we only know that the sequence as a whole is an intrusion. Therefore, if we scan an intrusion file with a sliding window, it produces both normal samples and intrusion samples, and we do not know exactly which ones are intrusions. (This is not a problem for a normal file, because all the samples it produces are normal.)
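The sliding-window example above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `sliding_windows` is an assumption made for illustration.

```python
def sliding_windows(sequence, size):
    """Return every contiguous window of `size` items, moving one step at a time."""
    return [tuple(sequence[i:i + size]) for i in range(len(sequence) - size + 1)]


calls = [66, 66, 4, 138, 66, 5, 23, 45, 4]
samples = sliding_windows(calls, 4)
for sample in samples:
    print(sample)
```

A sequence of length L yields L - size + 1 samples, which is why the nine system calls in the example give six samples for a window of size 4.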

In this paper, we used a sliding window of size 30. For each sample, we extracted features and represented the sample as a vector (e.g., [0 1 0 1]). A sample from a normal data file was labeled as normal. A sample from an intrusion data file was labeled as intrusion if its vector did not appear in the set of normal samples; otherwise, it was considered a normal sample and removed from the intrusion training dataset.
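The labeling rule can be sketched as a set-membership filter. This Python snippet is only an illustration under the assumption that feature vectors are stored as tuples; the function name `label_samples` and the toy vectors are hypothetical.

```python
def label_samples(normal_vectors, intrusion_file_vectors):
    """Label feature vectors according to the rule described above.

    Vectors from normal files are labeled "normal". A vector from an
    intrusion file is labeled "intrusion" only if it never occurs among
    the normal vectors; otherwise it is dropped from the intrusion set.
    """
    normal_set = set(normal_vectors)
    labeled = [(vec, "normal") for vec in normal_vectors]
    labeled += [
        (vec, "intrusion")
        for vec in intrusion_file_vectors
        if vec not in normal_set
    ]
    return labeled


normal_vectors = [(0, 1, 0, 1), (1, 1, 0, 0)]
intrusion_file_vectors = [(0, 1, 0, 1), (0, 0, 1, 1)]
labeled = label_samples(normal_vectors, intrusion_file_vectors)
print(labeled)
```

Here the vector (0, 1, 0, 1) from the intrusion file also occurs in the normal data, so only (0, 0, 1, 1) is kept as an intrusion sample.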


III. THEORETICAL FRAMEWORK

A. Descriptions of the Framework

The framework of our host-based IDS, which integrates the BBNN with an FPGA implementation, is depicted in Fig. 3. A real-time feature extraction component receives the sequence of system calls in real time and produces a vector for each sample.

The working mechanism of the framework can be summarized as follows:

• The FPGA device classifies each vector in real time as intrusion or normal, and the vector and its label are stored in a database.

• The BBNN continuously trains its structure and configuration off-line on the up-to-date database.

• The FPGA updates its configuration from the BBNN.

One of the main obstacles to applying BBNN-FPGA in a real-time, large-scale IDS is the performance of the feature extraction component. Its outputs are fed directly as inputs to the FPGA, a hardware device; therefore, the performance of this component is critical. We therefore propose an incremental method for feature extraction, as discussed later in this paper.

B. Incremental method for feature extraction

Fig. 4 shows the feature selection diagram. Its components are described as follows:

• Training dataset: The dataset used for training. For intrusion data, we used the leave-one-out method described above, so the training dataset changes each time a file is left out.

• Get words: We scanned the training dataset, recorded all 2-word features (pairs of consecutive system calls, e.g., “66 4”), and stored them in the word sets.

• Word sets: The word sets contain words from both normal and intrusion data.

Figure 3. Host-based Intrusion Detection System based on BBNN-FPGA Framework

Figure 4. Feature selection diagram

• Select words: We selected words (features) based on the Bayesian theory.

For each word w, we computed the conditional probability that a sample is an intrusion given that it contains w, as follows.

Let H be the event that a sample contains word w, and let E be the event that a sample is an intrusion. We computed the conditional probability Pr[E | H] as shown in (2).

$$\Pr[E \mid H] = \frac{\Pr[E \cap H]}{\Pr[H]} \tag{2}$$

Pr[E ∩ H] and Pr[H] were computed from the training dataset. Let A be the number of intrusion samples, B the number of normal samples, C the number of intrusion samples that contain word w, and D the number of normal samples that contain word w. Then Pr[E ∩ H] and Pr[H] were approximated by (3) and (4):

$$\Pr[E \cap H] = \frac{C}{A + B} \tag{3}$$

$$\Pr[H] = \frac{C + D}{A + B} \tag{4}$$

Substituting (3) and (4) into (2) gives Pr[E | H] = C / (C + D).

After computing Pr[E | H] for all words, we selected the best words for the feature sets as follows: if Pr[E | H] ≥ 0.75, the word is selected as an intrusion feature; if Pr[E | H] ≤ 0.25, the word is selected as a normal feature.
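The Get words and Select words steps can be sketched together in Python. This is an illustrative sketch rather than the authors' code: the names `words_in` and `select_words` are assumptions, the thresholds are treated as inclusive, and the probability is computed as C / (C + D), which follows from (2) to (4) because the common factor A + B cancels.

```python
from collections import Counter


def words_in(sample):
    """Return the set of distinct 2-word features (adjacent call pairs) in a sample."""
    return set(zip(sample, sample[1:]))


def select_words(normal_samples, intrusion_samples, high=0.75, low=0.25):
    """Split words into normal-like and intrusion-like sets by Pr[E | H]."""
    # C: number of intrusion samples containing each word
    c = Counter(w for s in intrusion_samples for w in words_in(s))
    # D: number of normal samples containing each word
    d = Counter(w for s in normal_samples for w in words_in(s))
    normal_words, intrusion_words = set(), set()
    for w in set(c) | set(d):
        p = c[w] / (c[w] + d[w])  # Pr[E | H] = C / (C + D)
        if p >= high:
            intrusion_words.add(w)
        elif p <= low:
            normal_words.add(w)
    return normal_words, intrusion_words


# Toy data, not taken from the sendmail dataset
normal_samples = [(1, 2, 3), (1, 2, 4)]
intrusion_samples = [(9, 9, 1), (1, 2, 9)]
normal_words, intrusion_words = select_words(normal_samples, intrusion_samples)
print(sorted(normal_words), sorted(intrusion_words))
```

In the toy data the word (1, 2) occurs in two normal samples and one intrusion sample, so its probability of 1/3 falls between the thresholds and it is not selected for either feature set.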

For example, when the file sm-10814.int is left out of the intrusion dataset, the word sets after the Get words step include 339 words from the normal dataset and 244 words from the intrusion dataset (counting only distinct words). After the Select words step, we obtained 267 words as normal features (most-normal-like words) and 60 words as intrusion features (most-intrusion-like words).

Ideally, we would use all 267 most-normal-like words and 60 most-intrusion-like words as the feature set; however, using all words would make the vector of each sample too long. Therefore, we divided the words equally into 2 groups of most-normal-like words and 2 groups of most-intrusion-like words. Words with high Pr[E | H] were grouped together.

For each training file, we used a sliding window to produce sample points as well as feature vectors. We used an incremental method to perform real-time feature extraction. Each vector was generated from the previous vector by the following procedure.

Procedure Extract-Feature(s) returns a vector
    input: s, a new system call
    global: s_1, s_2, ..., s_k, the current sample (sliding window), where k is the size of the sliding window and s_i is the number representing the i-th system call
            f_1, f_2, ..., f_n, the feature words, where n is the number of feature words
            N_1, N_2, ..., N_n, where N_i is the number of times feature word f_i appears in the current sample
            F_1, F_2, ..., F_t, the current feature vector, where t is the number of feature groups
    for i := 1 to n
        if f_i equals [s_1 s_2], then
            N_i := N_i - 1
            if N_i equals 0, then F_g := 0, where g is the group that contains f_i
        if f_i equals [s_k s], then
            N_i := N_i + 1
            if N_i equals 1, then F_g := 1, where g is the group that contains f_i
    shift the window: (s_1, ..., s_k) := (s_2, ..., s_k, s)
    return F_1, F_2, ..., F_t

As shown in the above procedure, the difference between a new sample and the previous sample is just the removal of the old word [s_1 s_2] and the addition of the new word [s_k s]. Assume f_u and f_v represent the feature words [s_1 s_2] and [s_k s], respectively. We only need to update N_u := N_u - 1 and N_v := N_v + 1, keeping all other values unchanged. This incremental method can significantly improve the efficiency of feature extraction.
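The incremental update can be sketched in Python as follows. Two choices here are assumptions rather than part of the printed procedure: a dictionary lookup replaces the loop over all n feature words, and the counters are kept per group instead of per word, on the assumption that a group's entry should stay 1 for as long as any of its words remains in the window. The class name `IncrementalExtractor` and the toy word-to-group mapping are hypothetical.

```python
from collections import deque


class IncrementalExtractor:
    """Maintain a group-level feature vector over a sliding window of calls."""

    def __init__(self, window, word_group, n_groups):
        self.window = deque(window)         # current sample s_1 .. s_k (k >= 2)
        self.word_group = word_group        # feature word -> group index
        self.group_count = [0] * n_groups   # feature-word occurrences per group
        for word in zip(window, window[1:]):
            if word in word_group:
                self.group_count[word_group[word]] += 1

    def vector(self):
        """Entry g is 1 when any feature word of group g is in the window."""
        return [1 if n > 0 else 0 for n in self.group_count]

    def push(self, s):
        """Slide the window by one system call and return the new vector."""
        old = (self.window[0], self.window[1])  # word [s_1 s_2] leaves
        new = (self.window[-1], s)              # word [s_k s] enters
        if old in self.word_group:
            self.group_count[self.word_group[old]] -= 1
        if new in self.word_group:
            self.group_count[self.word_group[new]] += 1
        self.window.popleft()
        self.window.append(s)
        return self.vector()


calls = [66, 66, 4, 138, 66, 5, 23, 45, 4]
word_group = {(66, 4): 0, (5, 23): 1}  # hypothetical feature words and groups
extractor = IncrementalExtractor(calls[:4], word_group, n_groups=2)
for s in calls[4:]:
    print(extractor.push(s))
```

Each call to `push` touches at most two counters regardless of the window size, whereas recomputing the vector from scratch would rescan every word in the window.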

C. Altera FPGA

We used an FPGA board manufactured by Altera, the Altera Cyclone III FPGA Starter Board with a USB interface. As a proof of concept, we adopted an easy-to-configure model for the BBNN structure. The board has a 50 MHz clock, 32 MB of Flash, and 4 LEDs. We developed the FPGA design in Verilog HDL (a hardware description language) using the Quartus II v9.1 Web Edition development software.

IV. EXPERIMENTS

A. Experiment Settings

When normal data is detected as an attack, it is called a false alarm or false positive. We use FAR to denote the false alarm rate. When attack data is detected as normal, it is called a false negative (FN). Normally, a false negative is much more dangerous than a false alarm. The detection rate (DR) is the rate at which attacks are detected correctly, i.e., 1 minus the false negative rate. Detection rate and false alarm rate are the most popular measures for evaluating an intrusion detection system [18].
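The two measures can be computed from labeled predictions as in the following Python sketch. The function name `detection_metrics` and the 0/1 label convention (1 for intrusion, 0 for normal) are assumptions chosen for illustration.

```python
def detection_metrics(true_labels, predicted_labels):
    """Return (detection rate, false alarm rate) for 0/1 labels, 1 = intrusion."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1   # attack detected correctly
        elif truth == 1 and pred == 0:
            fn += 1   # attack missed (false negative)
        elif truth == 0 and pred == 1:
            fp += 1   # normal data flagged as attack (false alarm)
        else:
            tn += 1   # normal data passed correctly
    dr = tp / (tp + fn)    # equals 1 minus the false negative rate
    far = fp / (fp + tn)
    return dr, far


true_labels = [1, 1, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 0, 0]
dr, far = detection_metrics(true_labels, predicted_labels)
print(dr, far)
```

In this toy example both attacks are detected and one of four normal samples raises an alarm, giving a detection rate of 1.0 and a false alarm rate of 0.25.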

In this paper, we aimed to increase the rate of correct detection, which was the return value of the fitness function in the GA. The parameter settings of the BBNN and GA are shown in Table I. To estimate the generalization performance, we used leave-one-out evaluation.

B. BBNN performance

During evolution, the fitness of a chromosome (BBNN) is the correct detection rate obtained when the BBNN classifies the training dataset. The improvement of the best fitness during the evolution process is shown in Fig. 5.

Table II shows the detection rate and false alarm rate for each leave-one-out run. On average, the BBNN achieves a detection rate of 1 with a false alarm rate of 0.0222.

Fig. 6 shows the 2-D encoding of the solution obtained for the intrusion detection problem in fold 1 (the case where the file sm-10814.int is left out).

TABLE I. BBNN AND GA PARAMETER SETTINGS

| Parameter | Value | Justification |
| --- | --- | --- |
| Number of inputs in BBNN | 4 | There are 4 features of each sample |
| Number of stages in BBNN | 3 | N/A |
| Number of bits to encode each weight | 2 | N/A |
| Size of population in GA | 100 | N/A |
| Probability of crossover in GA | 0.0005 | N/A |
| Probability of mutation of weight in GA | 0.0005 | N/A |
| Probability of mutation of structure in GA | 0.00005 | The mutation of structure should be less than the mutation of weight. |
| Number of generations | 50 | Normally, the more generations in the GA, the higher the accuracy of the BBNN. In this experiment, we stopped the GA optimization after 50 generations for the purpose of comparison with other algorithms. |

[Plot: the best fitness (0.0 to 1.0) versus generation (0 to 50)]

Figure 5. The improvement of fitness during evolution

1 0 1 0 1 0 0 0 0 1 1 0 0 0 1 0 1 1 0 1 1 1 1 0 1 0 0 0

0 1 1 1 0 1 0 1 0 0 1 0 0 0 1 1 1 1 0 1 0 1 1 0 0 1 1 0

1 0 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 0

0 0 1 0 1 1 0 1 0 0 1 1 0 0 0 1 0 1 1 0 0 1 1 0 0 0 0 0

0 1 1 0 1 0 0 0 1 1 1 1 1 0 0 0 1 1 0 0 1 1 1 0 1 0 0 0

0 1 1 1 0 1 0 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 1 0 0 0 0

Figure 6. 2-D encoding solution for the sendmail intrusion detection problem (for the case of leave out the file sm-10814.int)

TABLE II. BBNN PERFORMANCE OF EACH TIME LEAVE-ONE-OUT.

| Sample left out | DR | FAR |
| --- | --- | --- |
| sm-280.int | 1 | 0.02221 |
| sm-314.int | 1 | 0.02221 |
| fwd-loops-1.int | 1 | 0.02223 |
| fwd-loops-2.int | 1 | 0.02218 |
| fwd-loops-3.int | 1 | 0.02218 |
| fwd-loops-4.int | 1 | 0.02218 |
| fwd-loops-5.int | 1 | 0.02223 |
| sm-10763.int | 1 | 0.02223 |
| sm-10801.int | 1 | 0.02223 |
| sm-10814.int | 1 | 0.02218 |
| Average | 1 | 0.02221 |

C. Performance Comparisons with Other Algorithms

Support vector machines (SVMs) were selected as benchmarks because of their well-known performance and because they are popular classification and error detection algorithms in the recent literature. SVMs are learning machines that implement the structural risk minimization principle from statistical learning theory [19]. In this paper, we used LIBSVM version 3.11 (open-source software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/) to implement the SVM algorithms. The choice of kernel function may lead to different SVM performance. The four typical SVM kernels are as follows.

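• Linear kernel

$$K_{linear}(\mathbf{x}_1, \mathbf{x}_2) = \mathbf{x}_1^{T}\mathbf{x}_2 + c \tag{5}$$

• Polynomial kernel

$$K_{poly}(\mathbf{x}_1, \mathbf{x}_2) = \left(\alpha\,\mathbf{x}_1^{T}\mathbf{x}_2 + c\right)^{d} \tag{6}$$

• RBF (radial basis function) kernel

$$K_{RBF}(\mathbf{x}_1, \mathbf{x}_2) = e^{-\gamma \lVert \mathbf{x}_1 - \mathbf{x}_2 \rVert^{2}} \tag{7}$$

• Sigmoid kernel

$$K_{sigmoid}(\mathbf{x}_1, \mathbf{x}_2) = \tanh\left(a\,\mathbf{x}_1^{T}\mathbf{x}_2 + c\right) \tag{8}$$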

In this experiment, we used the default settings of LIBSVM: c = 0, a = 0.25, γ = 0.25, and d = 3.
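For reference, the four kernels in (5) to (8) can be written directly in plain Python. This sketch is independent of LIBSVM and only evaluates the formulas; the function names are assumptions, and because the source does not state a value for α in (6), it is left as a required argument.

```python
import math


def dot(x1, x2):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x1, x2))


def k_linear(x1, x2, c=0.0):
    return dot(x1, x2) + c                      # Eq. (5)


def k_poly(x1, x2, alpha, c=0.0, d=3):
    return (alpha * dot(x1, x2) + c) ** d       # Eq. (6)


def k_rbf(x1, x2, gamma=0.25):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * squared_distance)  # Eq. (7)


def k_sigmoid(x1, x2, a=0.25, c=0.0):
    return math.tanh(a * dot(x1, x2) + c)       # Eq. (8)


# Two example 4-feature vectors, matching the 4-input setting of the BBNN
x1 = [1, 0, 1, 0]
x2 = [1, 1, 0, 0]
print(k_linear(x1, x2), k_poly(x1, x2, alpha=0.25), k_rbf(x1, x2), k_sigmoid(x1, x2))
```

The default arguments mirror the values of c, a, γ, and d quoted above; the value alpha=0.25 in the usage line is only an example.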

We compare the performance of BBNN with these four typical classification algorithms in SVM. All comparison used the same dataset and the same way of leave-one-out evaluation. The results are shown in Table III.

Table III compares the BBNN with the SVMs (four variations with different kernel functions). The proposed BBNN system achieves a false alarm rate of 2.22% while maintaining a detection rate of 100%. This matches the SVM with the RBF kernel and is better than the other three SVM variants, whose false alarm rates range from 3.29% to 3.36%.
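The evaluation protocol used for Tables II and III can be outlined as follows. This is a minimal sketch under stated assumptions: DR and FAR follow the common definitions (detected intrusions over all intrusions, and normal samples flagged as intrusions over all normal samples), and the `train` and `evaluate` callables are hypothetical stand-ins for whichever classifier (BBNN or SVM) is being tested.

```python
def detection_and_false_alarm_rates(labels, predictions):
    """Return (DR, FAR) for binary labels where 1 = intrusion and 0 = normal."""
    pairs = list(zip(labels, predictions))
    true_pos = sum(1 for y, p in pairs if y == 1 and p == 1)
    false_neg = sum(1 for y, p in pairs if y == 1 and p == 0)
    false_pos = sum(1 for y, p in pairs if y == 0 and p == 1)
    true_neg = sum(1 for y, p in pairs if y == 0 and p == 0)
    dr = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    far = false_pos / (false_pos + true_neg) if false_pos + true_neg else 0.0
    return dr, far


def leave_one_out(samples, train, evaluate):
    """Hold out each sample once, train on the rest, score the held-out one."""
    results = {}
    for held_out in samples:
        training_set = [s for s in samples if s != held_out]
        model = train(training_set)
        results[held_out] = evaluate(model, held_out)
    return results


# Toy usage: the stand-in "model" is just the size of the training set.
files = ["sm-280.int", "sm-314.int", "fwd-loops-1.int"]
scores = leave_one_out(
    files,
    train=lambda training_set: len(training_set),
    evaluate=lambda model, held_out: model,
)
print(scores)
print(detection_and_false_alarm_rates([1, 1, 0, 0, 0], [1, 1, 1, 0, 0]))
```

In an actual experiment, `evaluate` would return the (DR, FAR) pair for the held-out file, and the per-file results would then be averaged.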

D. Performance of BBNN-FPGA IDS

We ran the BBNN-FPGA IDS on an experimental platform in which the Altera Cyclone III FPGA Starter Board was connected to a server via a USB port. We designed the FPGA logic in Verilog HDL using Quartus II v9.1 Web Edition. The FPGA design matches the 4-input, 3-stage BBNN used in our host-based IDS problem. Its configuration, i.e. the type of each basic block and the weights, is configurable. The BBNN was trained on the server to produce the best BBNN configuration (structure and weights). This configuration was then written to the on-board Flash of the Altera Cyclone III FPGA Starter Board, and the FPGA IDS loaded its configuration from the Flash. We also uploaded the testing data to the Flash, from which the FPGA IDS read it for testing. For testing purposes, we generated 18,000 data samples. (We could generate more data, but the size of the Flash is limited.) The output of the IDS was an LED that lit up to signal an intrusion.

Table IV compares the running time of the BBNN-FPGA IDS with that of the software IDSs. The software IDSs ran on a server with 512 MB of RAM, an Intel(R) Pentium(R) 4 CPU at 2.40 GHz, and a Hitachi 80 GB, ATA100, 7200 rpm hard disk.



TABLE III. COMPARISON BETWEEN BBNN AND OTHER ALGORITHMS

        BBNN      SVM (linear)    SVM (polynomial)    SVM (RBF)    SVM (sigmoid)
DR      1         1               1                   1            1
FAR     0.0222    0.0336          0.0329              0.0222       0.0336

TABLE IV. RUNNING TIME OF BBNN-FPGA IDS VS OTHER SOFTWARE IDS

Approach            SVM      BBNN (software)    BBNN-FPGA (hardware)
Running time (s)    8.436    6.623              ≈ 0.0054

We used the ‘time’ command in Linux to measure the running time of the software IDSs. Table IV shows that, among the software versions, the BBNN IDS is slightly faster than the SVM IDS (6.623 s versus 8.436 s). In our theoretical computation, the FPGA IDS takes about 15 clock cycles to process one data sample. With the 50 MHz clock of our experimental FPGA board, the time to process 18,000 samples is approximately 18,000 × 15 / 50,000,000 = 0.0054 second, which is about 1,000 times faster than a software IDS.
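The theoretical FPGA running time and the resulting speed-up follow from simple arithmetic on the figures quoted above. The sketch below reproduces that calculation; the constant and variable names are our own, and the estimate covers only the clock cycles spent on classification, as in the theoretical computation.

```python
CYCLES_PER_SAMPLE = 15      # approximate cycles per data sample (from the text)
CLOCK_HZ = 50_000_000       # 50 MHz clock of the experimental FPGA board
NUM_SAMPLES = 18_000        # number of generated test samples

# Theoretical processing time of the FPGA IDS for the whole test set.
fpga_seconds = CYCLES_PER_SAMPLE * NUM_SAMPLES / CLOCK_HZ

# Measured software running times from Table IV, in seconds.
software_seconds = {"SVM": 8.436, "BBNN (software)": 6.623}

# Ratio of each software running time to the theoretical FPGA time.
speedups = {name: t / fpga_seconds for name, t in software_seconds.items()}

print(f"FPGA time: {fpga_seconds:.4f} s")
for name, ratio in speedups.items():
    print(f"{name}: about {ratio:.0f} times slower than the FPGA estimate")
```

The two ratios come out at roughly 1,560 for the SVM and 1,230 for the software BBNN, which is the basis for the order-of-magnitude figure of 1,000.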

V. CONCLUSION

In this paper, we have proposed an approach in which the intelligent learning algorithm BBNN is integrated with a high-frequency FPGA, a combination suited to processing large-scale datasets in real time. We have compared an improved BBNN, in which the GA mutation has been enhanced, against SVMs using four major kernels. The comparison shows promising results. The BBNN matches the best SVM (RBF kernel), outperforms the other three SVM variants, and is also better than the original two-layer perceptron neural network. The improved BBNN achieves a detection rate of 100% with a false alarm rate as low as 2.22%. Analysis of the hardware performance of the FPGA IDS shows that our system can run about 1,000 times faster than a software IDS. Future work on this topic is to improve the training process of the BBNN; we may replace the GA with better optimization algorithms.

ACKNOWLEDGEMENT

This research was supported by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under project number 102.01-2010.09; the Ministry of Education and Training under agreement number B2010-26-16; the Rector-Funded Visiting Fellowship (2012) at UNSW, Canberra, Australia; and the scheme of UNSW Vice-Chancellor's research initiatives (2011-2014).

REFERENCES

[1] S. Y. Daniel, J. Shuyuan, and W. Xizhao, “Covariance-matrix modeling and detecting various flooding attacks,” IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, Vol. 37, No. 2, 2007, pp. 157–169

[2] K. Premkumar and A. Kumar, “Optimal sleep-wake scheduling for quickest intrusion detection using wireless sensor networks,” Proc. INFOCOM, 2008, pp. 1400–1408

[3] E. Taha, I. A. Ghaffar, A. M. Bahaa Eldin, and H. M. K. Mahdi, “Agent based correlation model for intrusion detection alerts,” Proc. IEEE International Conference on Intelligence and Security Informatics (ISI), 2010, pp. 89–94

[4] Z. Junzhong, X. Maozhi, S. Shanli, and Y. Lin, “A model of evolving intrusion detection system based on data mining and immune principle,” Proc. IEEE Region 10 Conference - TENCON, Vol. 2, 2004, pp. 199–202

[5] M. Ektefa, S. Memar, F. Sidi, and L. S. Affendey, “Intrusion detection using data mining techniques,” Proc. International Conference on Information Retrieval and Knowledge Management (CAMP), 2010, pp. 200–203

[6] D.-Y. Yeung and Y. Ding, “Host-based intrusion detection using dynamic and static behavioral models,” Pattern Recognition, Vol. 36, No. 1, 2003, pp. 229–243

[7] J. Kennedy and R. Eberhart, “Particle swarm optimization,” Proc. IEEE International Conference on Neural Networks, Vol. 4, 1995, pp. 1942–1948

[8] J. H. Holland, Adaptation in Natural and Artificial Systems, Ann Arbor: Univ. Michigan Press, 1975

[9] R. Storn and K. Price, “Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces,” Journal of Global Optimization, Vol. 11, 1997, pp. 341–359

[10] M. Dorigo and L. Gambardella, “Ant colony system: a cooperative learning approach to the traveling salesman problem,” IEEE Transactions on Evolutionary Computation, Vol. 1, No. 1, 1997, pp. 53–66

[11] R. d. Lemos, J. Timmis, M. Ayara, and S. Forrest, “Immune-inspired adaptable error detection for automated teller machines,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, Vol. 37, No. 5, pp. 873–886

[12] W. Jiang and G. Seong Kong, “Block-based neural networks for personalized ECG signal classification,” IEEE Transactions on Neural Networks, Vol. 18, No. 6, 2007, pp. 1750–1761

[13] M. Sang-Woo and K. Seong-Gon, “Block-based neural networks,” IEEE Transactions on Neural Networks, Vol. 12, No. 2, 2001, pp. 307–317

[14] S. G. Merchant and G. D. Peterson, “Evolvable block-based neural network design for applications in dynamic environments,” VLSI Design, 2010, pp. 1–25

[15] L. Joon-Yong, K. Min-Soeng, et al., “Study on encoding schemes in compact genetic algorithm for the continuous numerical problems,” Proc. SICE Annual Conference, 2007

[16] S. H. Ling, F. Jiang, K. Y. Chan, and H. T. Nguyen, “Hybrid fuzzy logic-based particle swarm optimization for flow shop scheduling problem,” International Journal of Computational Intelligence and Applications, Vol. 10, No. 3, 2011, pp. 335–356

[17] A. Lunts and V. Brailovskiy, “Evaluation of attributes obtained in statistical decision rules,” Engineering Cybernetics, 3, 1967, pp. 98–109

[18] R. P. Lippman, D. J. Fried, et al., “Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation,” Proc. DARPA Information Survivability Conference and Exposition, Vol. 2, 1999, pp. 12–26

[19] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Transactions on Neural Networks, Vol. 10, No. 5, 1999, pp. 988–999
