Intrusion Detection Model using Self Organizing Maps

Presented by: TUSHAR A. SHINDE

14MCS1048

Under the guidance of: Prof. I. Sumaiya Thaseen,

AP (Sr), SCSE, VIT University, Chennai, India.


CONTENTS

Research Scope
Intrusion Detection System
Machine Learning
Clustering
Existing System
Literature Survey
Proposed Model
Conclusion
Research References

Research Scope

Demand on networks grows daily, and attack techniques grow with it. The challenging task in network security is to detect previously unknown attacks.

Attackers try to gain unauthorised access to the network system and carry out malicious activity on the network.

Intrusion Detection System

Intrusion Detection Systems (IDS): software or hardware systems that

o Automate the process of monitoring the events that occur in a computer network.

o Analyse those events to detect network attacks.

o Allow organizations to protect their systems from threats.

Overview of Intrusion Detection System

In Figure 1, the internet is connected to the IDS. The IDS is in turn connected to the company network, so that it can detect intrusive network traffic.

Figure 1: Block diagram of Intrusion Detection System.

Types of Intrusion Detection System

Network Intrusion Detection Systems (NIDS): placed at a strategic point or points within the network to monitor traffic to and from all devices on the network.

Real-time examples: installed on web servers to which multiple systems are connected.

Host Intrusion Detection Systems (HIDS): run on individual hosts or devices on the network.

Real-time examples: personal computers, anti-spyware, anti-virus software.

Issues in Intrusion Detection System

False positives: A false positive is an event in which a NIDS raises a security threat alarm for harmless traffic.

New and sophisticated attacks: Commercial NIDS that are signature-based cannot detect new attacks whose signatures have not yet been devised.

Human intervention: The system can automatically take pre-programmed actions, but these are limited to well-known attacks.

Machine Learning

Machine learning is programming computers to optimize a performance criterion using example data or past experience.

Machine learning deals with automatically inferring and generalizing dependencies from data.

In contrast to plain memorization, learning methods aim at minimizing the expected error of a learning task.

Machine Learning Types

Supervised Machine Learning: The data presented to the machine learning algorithm is fully labelled. In supervised learning the 'categories' are known. Techniques: classification techniques.

Unsupervised Machine Learning: The data presented to the machine learning algorithm is not labelled. In unsupervised learning, the learning process attempts to find appropriate 'categories'. Techniques: clustering techniques.

Machine Learning in Intrusion Detection Techniques

Classification Technique:
o Falls under supervised machine learning.
o Analyses and categorizes the data into known classes.
o Each data sample is labelled with a known class label.
o Classification techniques include decision tree induction and Bayesian classification.

Clustering Technique:
o Falls under unsupervised machine learning.
o The process of grouping objects into a set of clusters such that similar objects are members of the same cluster and dissimilar objects belong to different clusters.
o Clustering techniques include K-means, SOM, and EM.

Clustering

Clustering is the process of grouping objects into a set of clusters such that similar objects are members of the same cluster and dissimilar objects belong to different clusters.

Clustering Techniques

SOM clustering:
1. The self-organizing map is a competitive learning technique.
2. It clusters data without knowing the class memberships of the input data.

K-means clustering:
1. It is known as centroid-based clustering.
2. It is a method of assigning items to a pre-chosen number of groups.

Expectation-Maximization (EM) clustering:
1. A type of distribution-based clustering.
2. It estimates maximum-likelihood parameters from a given data set when the data is incomplete or has missing values.

Summary of clustering results with 100 clusters.

Figure: The mean squared error (mse), average purity (ave-pur), and run time (seconds) results for the clustering algorithms with 100 clusters. From these results, SOM requires less time than EM, gives slightly higher purity than K-means, and has the lowest mean squared error of the three.

Sr. no.  Algorithm  Mse          Ave-pur         Time (seconds)
1.       K-means    0.28±0.08    0.93±0.0        78.6±14.6
2.       SOM        0.15±0.03    0.935±0.0       98.8±1.3
3.       EM         0.20±0.01    0.937±0.0006    266.7±8.0
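The slides do not say how average purity (ave-pur) was computed. One common reading, assumed here, is: for each cluster take the fraction of members that belong to the cluster's majority class, then average over clusters. The function name and the toy data below are hypothetical and are not taken from the reported experiment.

```python
from collections import Counter

def average_purity(cluster_ids, labels):
    """Mean over clusters of the majority-class fraction (assumed definition)."""
    members = {}
    for cluster_id, label in zip(cluster_ids, labels):
        members.setdefault(cluster_id, []).append(label)
    purities = []
    for cluster_labels in members.values():
        # Size of the most frequent class inside this cluster.
        majority_count = Counter(cluster_labels).most_common(1)[0][1]
        purities.append(majority_count / len(cluster_labels))
    return sum(purities) / len(purities)

# Hypothetical toy data: cluster 0 is 3/4 "normal", cluster 1 is 2/2 "attack".
toy_clusters = [0, 0, 0, 0, 1, 1]
toy_labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
print(average_purity(toy_clusters, toy_labels))  # 0.875
```

A purity close to 1 means almost every cluster contains a single class, which is why all three algorithms in the table look similar on this measure.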

SOM clustering

o The self-organizing map is a competitive learning technique.

o The SOM is used for clustering data without knowing the class memberships of the input data.

o SOMs are a valuable tool for dealing with complex or vast amounts of data.

o The SOM is a data compression technique that provides low-dimensional (1-, 2-, or 3-D) visualization through a topological map.

o It is one of the most popular neural network models.

SOM clustering

Structure of a SOM: The figure shows a 4x4 SOM network (4 nodes down, 4 nodes across). It is easy to overlook this structure as being trivial, but there are a few key things to notice. First, each map node is connected to each input node. For this small 4x4 node network with 3 input nodes, that is 4x4x3=48 connections. Secondly, notice that map nodes are not connected to each other. The nodes are organized in this manner because a 2-D grid makes it easy to visualize the results.
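The structure described above can be sketched as a grid of weight vectors, one per map node, with one weight per input node. This is a minimal illustration: the helper names `init_som` and `count_connections` are hypothetical, and the 3 input nodes are inferred from the 4x4x3 arithmetic.

```python
import random

def init_som(rows, cols, input_dim, seed=0):
    """Build a rows x cols grid; each map node holds one weight per input node."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(input_dim)] for _ in range(cols)]
            for _ in range(rows)]

def count_connections(som):
    """Each weight is one connection between a map node and an input node."""
    return sum(len(node) for row in som for node in row)

som = init_som(rows=4, cols=4, input_dim=3)
print(count_connections(som))  # 48
```

The grid stores no links between map nodes, matching the observation that map nodes are not connected to each other; the grid position is used only to decide which nodes count as neighbours.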

SOM Algorithm Basic steps:

Step 1: Initialize map
Step 2: For t from 0 to 1
Step 3: Randomly select a sample
Step 4: Get best matching unit
Step 5: Scale neighbours
Step 6: Increase 't' a small amount
Step 7: End for
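The basic steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the presenter's implementation: the linear decay of the learning rate and radius, the Gaussian neighbourhood, the starting rate of 0.5, and the names `train_som` and `toy_samples` are all assumptions.

```python
import math
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_som(samples, rows=4, cols=4, steps=200, seed=0):
    rng = random.Random(seed)
    dim = len(samples[0])
    # Step 1: initialize the map with random weights.
    som = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
           for _ in range(rows)]
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    # Steps 2, 6, 7: t grows from 0 towards 1 in small increments.
    for step in range(steps):
        t = step / steps
        # Step 3: randomly select a sample.
        sample = rng.choice(samples)
        # Step 4: the best matching unit is the node closest to the sample.
        bmu = min(cells, key=lambda rc: squared_distance(sample, som[rc[0]][rc[1]]))
        # Step 5: scale neighbours; radius and rate shrink as t grows (assumed schedule).
        radius = (max(rows, cols) / 2) * (1 - t)
        rate = 0.5 * (1 - t)
        for r, c in cells:
            grid_dist = math.hypot(r - bmu[0], c - bmu[1])
            if grid_dist <= radius:
                influence = rate * math.exp(-grid_dist ** 2 / (2 * radius ** 2))
                node = som[r][c]
                for k in range(dim):
                    node[k] += influence * (sample[k] - node[k])
    return som

# Hypothetical two-feature samples, already scaled to [0, 1].
toy_samples = [[0.1, 0.1], [0.9, 0.9], [0.15, 0.05], [0.85, 0.95]]
trained = train_som(toy_samples)
```

Because every update moves a weight part of the way towards a sample, the weights stay inside the range spanned by the initial weights and the data.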

The IDS using SOM Algorithm steps are:

Step 1: Choose the network topology of the output layer.

Initialize the current neighborhood distance, D(0), to a positive value.

Step 2: Initialize the weights from inputs to outputs to small random values.

Step 3: Let a = 1.

The IDS using SOM Algorithm steps are (cont.):

Step 4: Do

Select an input sample t_i.

Use the Euclidean distance to measure the similarity between the input vector and each map node's weight vector. Compute the square of the Euclidean distance of t_i from the weight vector w_q of each output node q:

$$\sum_{k} \left( t_{i,k} - w_{q,k}(a) \right)^2$$

where w_q = weight vector of output node q, t_i = input sample, k = index over the vector components, a = current iteration, and q = output node.
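As a small worked example of the distance computation above, the sketch below evaluates the squared Euclidean distance between an input sample t_i and each weight vector w_q, then picks the closest output node. The function names and the three toy weight vectors are hypothetical.

```python
def squared_distance(t_i, w_q):
    """Sum over components k of (t_i[k] - w_q[k]) ** 2."""
    return sum((t_k - w_k) ** 2 for t_k, w_k in zip(t_i, w_q))

def best_matching_unit(t_i, weights):
    """weights maps each output node id q to its weight vector w_q."""
    return min(weights, key=lambda q: squared_distance(t_i, weights[q]))

# Hypothetical weight vectors for three output nodes.
weights = {"q0": [0.0, 0.0], "q1": [1.0, 1.0], "q2": [0.5, 0.0]}
print(best_matching_unit([0.9, 0.8], weights))  # q1
```

Using the squared distance avoids a square root and does not change which node is closest.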

The IDS using SOM Algorithm steps are (cont.):

Select the output node q* whose weight vector has the minimum squared distance computed above; this node is the best matching unit (BMU).

Update the nodes in the neighborhood of the BMU by pulling them closer to the input vector:

$$w_q(a+1) = w_q(a) + \eta(a) \left( t_i - w_q(a) \right)$$

where η(a) = restraint due to distance from the BMU and time of iteration.

Step 5: Increment a and repeat while a is below the iteration limit.
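The update rule above can be illustrated with a short sketch. The slides do not define the exact form of the restraint η(a), so the `eta` function below (a linearly decaying rate multiplied by a Gaussian of the grid distance from the BMU) is only one plausible choice; its name, parameters, and defaults are assumptions.

```python
import math

def eta(a, grid_dist, limit, base_rate=0.5, base_radius=2.0):
    """Assumed restraint: shrinks with iteration a and with distance from the BMU."""
    decay = 1.0 - a / limit
    radius = base_radius * decay
    if radius <= 0:
        return 0.0
    return base_rate * decay * math.exp(-(grid_dist ** 2) / (2 * radius ** 2))

def update_weight(w_q, t_i, rate):
    """w_q(a+1) = w_q(a) + rate * (t_i - w_q(a)), applied per component."""
    return [w + rate * (t - w) for w, t in zip(w_q, t_i)]

print(update_weight([0.0, 1.0], [1.0, 0.0], 0.25))  # [0.25, 0.75]
```

With a rate between 0 and 1, each update moves the weight vector part of the way towards the input sample, and nodes far from the BMU or late in training move less.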

Architecture

[Figure 2: IDS architecture using the SOM technique. Blocks shown: Data Preprocessing, Data Warehouse, Fault Detection, and Self-Organizing Maps (SOM) for the competitive learning algorithm.]

Existing System

Snort:

o A free and open source network intrusion detection and prevention system.

o Snort detects thousands of worms.

Bro:

o An open-source, Unix-based network intrusion detection system.

o Uses analysers that compare the activity with patterns deemed troublesome.

BASE:

o The Basic Analysis and Security Engine.

o BASE is a PHP-based analysis engine to search and process a database of security events generated by various network monitoring tools.

Problems with Existing Systems

Fidelity Problem:

o The information used by the intrusion detection system is obtained from packets on a network. Data has to traverse a longer path from its origin to the IDS and in the process can potentially be destroyed or modified by an attacker.

Resource Usage Problem:

o The IDS continuously uses additional resources in the system it is monitoring even when there are no intrusions occurring.

Reliability Problem:

o An intruder can potentially disable or modify the programs running on a system, rendering the intrusion detection system useless.

Proposed Model

Pre-processing of server logs: Our web-site server log file analyser performs the following steps when provided with a log file:

1) It scans the entries in the log file to identify unique visitor sessions.

2) For each identified session, the analyser examines its key features to generate the session's multi-dimensional feature-vector representation.

Proposed Model

Session identification: The process of dividing a web-site server access log into sessions. Session identification is performed by:

1) Grouping all HTTP requests that originate from the same IP address and are described by the same user-agent string, so that they match one visitor.

2) Applying a timeout approach to split each visitor's requests into unique sessions, so that visits separated by a long gap are not merged.
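The two session-identification rules above can be sketched as follows. The slides do not give the timeout value or the log format, so the 30-minute default, the tuple layout, the function name `identify_sessions`, and the toy log entries are all assumptions made for illustration.

```python
from collections import defaultdict

def identify_sessions(requests, timeout_seconds=1800):
    """requests: iterable of (timestamp_seconds, ip, user_agent, url) tuples."""
    # Rule 1: group requests by (IP address, user-agent string).
    by_visitor = defaultdict(list)
    for timestamp, ip, agent, url in sorted(requests):
        by_visitor[(ip, agent)].append((timestamp, url))
    # Rule 2: start a new session when the gap between requests exceeds the timeout.
    sessions = []
    for visitor, hits in by_visitor.items():
        current = [hits[0]]
        for previous, hit in zip(hits, hits[1:]):
            if hit[0] - previous[0] > timeout_seconds:
                sessions.append((visitor, current))
                current = []
            current.append(hit)
        sessions.append((visitor, current))
    return sessions

# Hypothetical log entries, not real traffic.
toy_log = [
    (0, "10.0.0.1", "Mozilla/5.0", "/index.html"),
    (60, "10.0.0.1", "Mozilla/5.0", "/about.html"),
    (5000, "10.0.0.1", "Mozilla/5.0", "/index.html"),
    (30, "10.0.0.2", "ExampleBot/1.0", "/robots.txt"),
]
toy_sessions = identify_sessions(toy_log)
print(len(toy_sessions))  # 3
```

Here the first visitor yields two sessions because the third request arrives well after the timeout, and the second visitor yields one.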

Proposed Model

Dataset labelling: Each feature-vector is labelled as belonging to one of the following four categories:

1. Human visitors (normal, known).
2. Well-behaved web-site attackers.
3. Malicious attackers.
4. Unknown visitors (unidentified).

This allows a better understanding of each cluster's nature, so that significant results can be generated.

Implementation of SOM in WEKA tool:

WEKA is a data mining system developed by the University of Waikato in New Zealand.

It implements data mining algorithms. WEKA is a state-of-the-art facility for developing machine learning (ML) techniques and their application to real-world data mining problems.

It is a collection of machine learning algorithms for data mining tasks. The algorithms are applied directly to a dataset.

WEKA implements algorithms for data pre-processing, classification, regression, clustering, and association rules; it also includes visualization tools.

NSL-KDD Data Set for Intrusion detection

NSL-KDD is a data set suggested to solve some of the inherent problems of the KDD'99 data set.

The number of records in the NSL-KDD train and test sets is reasonable. This advantage makes it affordable to run the experiments on the complete set without the need to randomly select a small portion.

NSL-KDD may not be a perfect representative of existing real networks, but because of the lack of public data sets for network-based IDSs it can still be applied as a benchmark data set.

Literature Survey

Sr. no. | Paper Name | Authors | Techniques
1. | Two-level Clustering of Web Sites Using Self-Organizing Maps | D. Petrilis and C. Halatsis | SOM Algorithm
2. | Web robot detection: A probabilistic reasoning approach, Computer Networks | A. Stassopoulou and M. D. Dikaiakos | Probabilistic Reasoning, Bayesian classifiers
3. | Monitoring the Application-Layer DDoS Attacks for Popular Websites | Y. Xie and S.-Z. Yu | HsMM (Hidden semi-Markov model) for tracking web-crawlers

Literature Survey

1. Y. Xie and S.-Z. Yu, “Monitoring the Application-Layer DDoS Attacks for Popular Websites”, IEEE/ACM Transactions on Networking, vol. 17, no. 1, pp. 15-25, Feb. 2009.

This paper aims at monitoring Web traffic in order to reveal dynamic shifts in normal burst traffic, which might signal the onset of App-DDoS attacks during a flash crowd event.

2. D. Petrilis and C. Halatsis, “Two-level Clustering of Web Sites Using Self-Organizing Maps”, Neural Processing Letters, vol. 27, no. 1, pp. 85-95, Feb. 2008.

This paper proposes a method based on Kohonen’s self-organizing map (SOM) that uses both content and context mining clustering techniques to help visitors identify relevant information more quickly. The input of the content mining is the set of web pages of the web site, whereas the source of the context mining is the access logs of the web site.

Literature Survey

3. A. Stassopoulou and M. D. Dikaiakos, “Web robot detection: A probabilistic reasoning approach”, Computer Networks: The International Journal of Computer and Telecommunications Networking, vol. 53, no. 3, pp. 265-278, Feb. 2009.

This paper introduces a probabilistic modelling approach to the problem of Web robot detection from Web-server access logs. The approach automatically classifies access-log sessions as crawler- or human-induced by combining various pieces of evidence proven to characterize crawler and human behaviour.

References

Y. Xie and S.-Z. Yu, “Monitoring the Application-Layer DDoS Attacks for Popular Websites”, IEEE/ACM Transactions on Networking, vol. 17, no. 1, pp. 15-25, Feb. 2009.

T. Kohonen, Self-Organizing Maps, 3rd ed. Berlin, Heidelberg, New York: Springer-Verlag, 2001.

D. Petrilis and C. Halatsis, “Two-level Clustering of Web Sites Using Self-Organizing Maps”, Neural Processing Letters, vol. 27, no. 1, pp. 85-95, Feb. 2008.

References

Prolexic Technologies, “Evolving Botnet Capabilities - and What This Means for DDoS”, White Paper, 2010.

A. Stassopoulou and M. D. Dikaiakos, “Web robot detection: A probabilistic reasoning approach”, Computer Networks: The International Journal of Computer and Telecommunications Networking, vol. 53, no. 3, pp. 265-278, Feb. 2009.

THANKING YOU
