Data Mining Based Intrusion Detection System Krishna C Surendra Babu


Page 1: slides

Data Mining Based Intrusion Detection System

Krishna C Surendra Babu

Page 2: slides

Papers:
• A Data Mining Framework for Building Intrusion Detection Models (Wenke Lee, Salvatore J. Stolfo). Research supported in part by grants from DARPA.
• Creation and Deployment of Data Mining-Based Intrusion Detection Systems in Oracle Database 10g

Page 3: slides

Intrusion Detection System:

Intrusion detection techniques:
• Anomaly detection
• Misuse detection

Attack categories:
• Denial of service (DOS)
• Probing
• Unauthorized access to local superuser privileges (U2R)
• Unauthorized access from a remote machine (R2L)

Page 4: slides

Requirements:
• Reliable
• Extensible
• Easy to manage
• Low maintenance cost

Page 5: slides

Data Mining

Data mining refers to extracting or "mining" knowledge from large amounts of data.

Data Warehouse

A data warehouse is a repository of information collected from multiple sources.

A Data Mining Framework for Building Intrusion Detection Models

Page 6: slides
Page 7: slides

Why Data Mining?
• The dataset is large.
• Constructing an IDS manually is expensive and slow.
• Updates are frequent, since new intrusions appear frequently.


Page 8: slides

Challenges for Data Mining in Building an IDS

• Develop techniques to automate the knowledge-intensive process of feature selection.
• Customize the general algorithms to incorporate domain knowledge, so that only relevant patterns are reported.
• Compute detection models that are accurate and efficient at run time.

Page 9: slides

Mining the data

Dataset types:
• Network-based dataset
• Host-based dataset

Build the IDS by mining the audit records. When an attack is detected, raise an alarm to the administration system.

Page 10: slides

Framework of Building IDS

1. Preprocessing: summarize the raw data.
2. Association rule mining.
3. Find sequence patterns (frequent episodes) based on the association rules.
4. Construct new features based on the sequence patterns.
5. Construct classifiers on different sets of features.

Page 11: slides

Preprocessing

Summarize raw data into high-level events, e.g., network connections described by time, duration, service, host, and destination.

Packet-filtering tools such as Bro and NFR can be used.
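Bro and NFR do this summarization in practice. To make the idea concrete, here is a minimal sketch of turning packets into connection records. It assumes a hypothetical `Packet` layout and a small hypothetical port-to-service table, and it groups packets by (source, source port, destination, destination port). It ignores reply traffic and TCP flags, so it is far simpler than a real preprocessor.

```python
from dataclasses import dataclass

# Hypothetical port-to-service table; a real preprocessor would use a fuller mapping.
SERVICES = {21: "ftp", 23: "telnet", 25: "smtp", 80: "http"}

@dataclass
class Packet:
    ts: float      # capture time in seconds
    src: str
    sport: int
    dst: str
    dport: int
    nbytes: int

@dataclass
class Connection:
    start: float
    duration: float
    service: str
    src_host: str
    dst_host: str
    src_bytes: int
    packets: int

def summarize(packets):
    """Collapse packets into one record per (src, sport, dst, dport) flow.

    Simplification: only packets sent from src to dst are grouped; reply
    traffic, TCP flags, and connection teardown are ignored.
    """
    flows = {}  # dicts keep insertion order, so records follow first-packet time
    for p in sorted(packets, key=lambda p: p.ts):
        flows.setdefault((p.src, p.sport, p.dst, p.dport), []).append(p)
    records = []
    for (src, _sport, dst, dport), pkts in flows.items():
        records.append(Connection(
            start=pkts[0].ts,
            duration=pkts[-1].ts - pkts[0].ts,
            service=SERVICES.get(dport, "other"),
            src_host=src,
            dst_host=dst,
            src_bytes=sum(p.nbytes for p in pkts),
            packets=len(pkts),
        ))
    return records
```

Example: two packets from 10.0.0.1:1025 to port 80 at t=0.0 and t=0.5 become a single "http" connection with duration 0.5.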

Page 12: slides
Page 13: slides

Classification

Classify each audit record into one of a discrete set of possible categories: normal or a particular kind of intrusion.
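A learned classifier is often an ordered list of rules. The first rule that matches a record decides its category, and records that match no rule are labeled "normal". The sketch below shows how such a rule list is applied. The rules, thresholds, and feature names (`count`, `num_failed_logins`, `distinct_dst_ports`) are hypothetical and stand in for whatever a rule learner would produce.

```python
# Hypothetical rules and feature names, for illustration only.
# Rules are checked in order and the first match decides the category.
RULES = [
    ("dos",   lambda r: r["service"] == "icmp_echo" and r["count"] > 100),
    ("r2l",   lambda r: r["num_failed_logins"] >= 3),
    ("probe", lambda r: r["distinct_dst_ports"] > 20),
]

def classify(record, rules=RULES, default="normal"):
    """Return the category of the first rule that fires, else the default."""
    for category, predicate in rules:
        if predicate(record):
            return category
    return default
```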

Page 14: slides

Association rule mining

Searches for interesting relationships among attributes in a given data set, i.e., derives multi-feature (attribute) correlations from a database table.
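Here is a minimal sketch of plain Apriori-style association rule mining over connection records. Each record is treated as a set of (attribute, value) items. For a rule LHS -> RHS:

- support is the fraction of records containing every item in the rule;
- confidence is support(LHS ∪ RHS) / support(LHS).

This is the textbook algorithm. It is not the extended variant in the Lee and Stolfo paper, and the attribute names in the example are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(records, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(r.items()) for r in records]  # items are (attr, value)
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {i for t in transactions for i in t}
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        # Join frequent (k-1)-itemsets into k-item candidates, keep the frequent ones.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
    return frequent

def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence, support) for rules meeting min_confidence."""
    out = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                conf = sup / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, conf, sup))
    return out
```

For example, on four records where three are http and two of those have flag SF, the rule service=http -> flag=SF has support 0.5 and confidence 2/3.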

Page 15: slides

Sequence Pattern Mining

Frequent episodes: X, Y -> Z, [c, s, w]. Given the occurrence of itemsets X and Y, Z will occur within time window w, with confidence c and support s.
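An episode rule X, Y -> Z [c, s, w] can be estimated from a time-ordered event sequence. The sketch below uses simplified definitions, which are an assumption and may differ from the exact ones in the paper:

- An occurrence of "X, Y" is an event matching X followed by an event matching Y, both within w time units of the X event.
- Confidence c is the fraction of those occurrences in which a Z event also follows Y inside the same window.
- Support s is the count of full X, Y, Z occurrences divided by the number of events.

The predicates `x`, `y`, `z` are hypothetical stand-ins for itemset matches on connection records.

```python
def first_match(events, start, deadline, pred):
    """Index of the first event at or after `start` matching pred with time <= deadline."""
    for k in range(start, len(events)):
        t, e = events[k]
        if t > deadline:
            return None
        if pred(e):
            return k
    return None

def episode_rule(events, x, y, z, w):
    """Estimate (confidence, support) of the serial episode rule X, Y -> Z within width w.

    events: list of (timestamp, event) pairs sorted by timestamp.
    """
    lhs_count = rule_count = 0
    for i, (t0, e0) in enumerate(events):
        if not x(e0):
            continue
        j = first_match(events, i + 1, t0 + w, y)   # X followed by Y inside the window
        if j is None:
            continue
        lhs_count += 1
        if first_match(events, j + 1, t0 + w, z) is not None:  # then Z, same window
            rule_count += 1
    confidence = rule_count / lhs_count if lhs_count else 0.0
    support = rule_count / len(events) if events else 0.0
    return confidence, support
```

With events A@0, B@1, C@2, A@10, B@11, A@20, B@25, C@26 and w = 5, the sketch finds three A-then-B windows. Only the first also contains C, so c = 1/3 and s = 1/8.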

Page 16: slides

Feature Construction

Feature extraction is the process of determining which evidence taken from raw audit data is most useful for analysis.

• Construct new features according to the frequent episodes.
• Some features show a close relationship to each other; combine those features.
• Some frequent episodes may indicate interesting new features.
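Suppose a frequent episode shows many connections to the same host within a short time. That pattern suggests time-window features such as:

- the number of recent connections to the same destination host;
- the share of those connections that use the same service.

The sketch below computes two such features over a sliding window. The field names (`start`, `dst_host`, `service`), the 2-second default window, and the choice to count only earlier connections are assumptions for illustration.

```python
from collections import deque

def add_window_features(conns, window=2.0):
    """Add 'count' and 'same_srv_rate' to each connection.

    conns: list of dicts with 'start', 'dst_host', 'service', sorted by 'start'.
    count: earlier connections to the same dst_host within `window` seconds.
    same_srv_rate: fraction of those that use the same service.
    """
    recent = deque()
    out = []
    for c in conns:
        # Drop connections that fell out of the time window.
        while recent and c["start"] - recent[0]["start"] > window:
            recent.popleft()
        same_host = [r for r in recent if r["dst_host"] == c["dst_host"]]
        count = len(same_host)
        same_srv = sum(r["service"] == c["service"] for r in same_host)
        out.append({**c, "count": count,
                    "same_srv_rate": same_srv / count if count else 0.0})
        recent.append(c)
    return out
```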

Page 17: slides

Build Model (classifier)

Build different classifiers for different attacks.
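One way to build a separate classifier per attack is one-vs-rest: train one binary model per attack type, with that attack as the positive class and everything else as negative. In this sketch each binary model is a single-threshold decision stump, a deliberately simple learner swapped in for illustration. It is not the learner used in the paper, and the feature layout is hypothetical.

```python
def train_stump(X, y):
    """Pick the (feature, threshold) for the test x[f] > thr with the fewest errors.

    X: list of numeric feature vectors; y: list of booleans (True = this attack).
    """
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            errors = sum((row[f] > thr) != label for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, thr)
    _, f, thr = best
    return lambda row: row[f] > thr

def train_per_attack(X, labels, attacks):
    """Train one binary stump per attack type (one-vs-rest)."""
    return {a: train_stump(X, [lab == a for lab in labels]) for a in attacks}

def predict(models, row):
    """Return every attack whose model fires, or ['normal'] if none do."""
    hits = [a for a, model in models.items() if model(row)]
    return hits or ["normal"]
```

Example with hypothetical features [connection count, failed logins]: the "dos" stump learns count > 3 and the "r2l" stump learns failed logins > 0.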

Page 18: slides

Experiments

The DARPA data: about 4 GB of compressed tcpdump data from 7 weeks of network traffic, containing 4 main categories of attacks:

DOS: denial of service, e.g., ping-of-death, SYN flood

R2L: unauthorized access from a remote machine, e.g., password guessing

U2R: unauthorized access to local superuser privileges by a local unprivileged user, e.g., buffer overflow

PROBING: e.g., port scan, ping sweep

Page 19: slides

Results

Training on the 7 weeks of labeled data and testing on 2 weeks of unlabeled data. The test data contains 14 attack types that do not exist in the training data. Four systems are compared:
• Columbia: the IDS developed according to the framework introduced above
• Groups 1-3: three systems developed by knowledge-engineering approaches

Page 20: slides
Page 21: slides

Results

Detection rate on new and old attacks:
• Old attacks: attack types that occur in both the training and the testing data
• New attacks: attack types that occur only in the testing data
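The old/new split above is computed by checking each test attack's type against the types seen in training. Each group's detection rate is then detected attacks divided by total attacks in that group. The sketch below shows this arithmetic on made-up attack names. It does not reproduce the paper's numbers.

```python
def detection_rates(test_attacks, training_types):
    """Detection rate for old vs. new attack types.

    test_attacks: list of (attack_type, detected) pairs, one per attack instance.
    training_types: set of attack types present in the training data.
    """
    totals = {"old": [0, 0], "new": [0, 0]}  # group -> [detected, total]
    for attack_type, detected in test_attacks:
        group = "old" if attack_type in training_types else "new"
        totals[group][0] += int(detected)
        totals[group][1] += 1
    return {g: (d / n if n else 0.0) for g, (d, n) in totals.items()}
```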

Page 22: slides

Creation and Deployment of Data Mining-Based Intrusion Detection Systems in Oracle Database 10g

DAID: a database-centric architecture that leverages data mining within the Oracle RDBMS to address these challenges.

• Scheduling capabilities
• Alert infrastructure
• Data analysis tools
• Security
• Scalability
• Reliability

Page 23: slides

Requirements for a production-quality IDS

• Centralized view of the data
• Data transformation capabilities
• Analytic and data mining methods
• Flexible detector deployment, including scheduling that enables periodic model creation and distribution
• Real-time detection and alert infrastructure
• Reporting capabilities
• Distributed processing
• High system availability
• Scalability with system load

Page 24: slides

• Sensors
• Extraction, transformation, and load (ETL)
• Centralized data warehousing
• Automated model generation
• Automated model distribution
• Real-time and offline detection
• Report and analysis
• Automated alerts

Page 25: slides

Sensors

Collect audit information:
• Network traffic data
• System logs on individual hosts
• System calls made by processes

Page 26: slides

ETL

Used for preprocessing audit streams and for feature extraction.

SQL and user-defined functions extract key pieces of information. Example: a windowing analytic function computes the number of HTTP connections to a given host.
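The windowing idea can be tried without Oracle. The sketch below uses SQLite, through Python's built-in `sqlite3` module, as a stand-in database. A `COUNT(*) OVER (PARTITION BY ... ORDER BY ...)` window function gives, for each HTTP connection, the running number of HTTP connections to the same host so far. The table and column names are hypothetical. SQLite window functions need SQLite 3.25 or later, and Oracle's own analytic-function syntax may differ in details.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical audit table: one row per connection record.
db.execute("CREATE TABLE connections (ts REAL, dst_host TEXT, service TEXT)")
db.executemany("INSERT INTO connections VALUES (?, ?, ?)", [
    (0.0, "h1", "http"), (0.4, "h1", "http"), (0.9, "h2", "http"),
    (1.3, "h1", "smtp"), (1.8, "h1", "http"),
])

# Running count of HTTP connections per destination host, in time order.
rows = db.execute("""
    SELECT ts, dst_host,
           COUNT(*) OVER (PARTITION BY dst_host ORDER BY ts) AS http_count
    FROM connections
    WHERE service = 'http'
    ORDER BY ts
""").fetchall()
```

Here `rows` is `[(0.0, 'h1', 1), (0.4, 'h1', 2), (0.9, 'h2', 1), (1.8, 'h1', 3)]`. The SMTP connection is filtered out before the window is applied.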

Page 27: slides

Model Generation

Popular techniques for misuse and anomaly detection:
• Association rules
• Clustering
• Support vector machines

Supervised learning methods for classification:
• Decision trees

Page 28: slides

Model build functionality: the DBMS_DATA_MINING PL/SQL package, used to train linear SVM anomaly and misuse detection models.

Test dataset:
• Probing
• Denial of service
• Unauthorized access to local superuser privileges (U2R)
• Unauthorized access from a remote machine (R2L)

(37 subclasses of attacks under the 4 generic categories)
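The models above are built with DBMS_DATA_MINING inside the database. To show what training a linear SVM involves, here is a small pure-Python sketch of the Pegasos algorithm, which performs stochastic sub-gradient descent on the regularized hinge loss. This is not Oracle's implementation. It covers only the two-class (misuse) case, folds the bias into the weights as a constant feature (a simplification), and uses hypothetical parameter defaults.

```python
import random

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos: minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)).

    X: list of numeric feature vectors; y: labels in {-1, +1} (+1 = attack).
    Returns the weight vector, whose last entry acts as the bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]  # append bias feature
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                 # Pegasos step size
            violated = label * _dot(w, x) < 1     # hinge loss active for current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the regularizer
            if violated:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w

def svm_score(w, x):
    """Signed score; > 0 means the model classifies x as an attack."""
    return _dot(w, list(x) + [1.0])
```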

Page 29: slides

Misuse Detection Problem

Anomaly Detection Problem

Accuracy of the system 92.1%

Page 30: slides

Periodic Model Updates

• Models are updated periodically as new data accumulates.
• A model is rebuilt when its performance falls below a predefined level.
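The rebuild policy above can be sketched as follows: score the current model on recently labeled cases, and retrain only when accuracy drops below a threshold. The function names, the accuracy metric, and the 0.9 default threshold are hypothetical choices. The slide only says "a predefined level".

```python
def accuracy(model, cases):
    """Fraction of (features, label) cases the model labels correctly."""
    return sum(model(x) == label for x, label in cases) / len(cases)

def maybe_rebuild(model, recent_cases, train_fn, training_data, threshold=0.9):
    """Return (model, rebuilt): retrain only if recent accuracy < threshold."""
    if accuracy(model, recent_cases) < threshold:
        return train_fn(training_data), True
    return model, False
```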

Page 31: slides

Model Distribution

Real Application Clusters (RAC)

Page 32: slides

Detection: real-time / offline

Audit data are classified as attack or not by the misuse detection SVM model.

Page 33: slides

A functional index on the probability of a case being an attack (or not).

The query returns all cases in audit_data with a probability greater than 0.5 of being an attack.
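The Oracle query itself is not reproduced in this transcript. The Python sketch below only imitates the idea behind an index on a computed value:

- compute each case's attack probability once;
- keep the probabilities sorted;
- answer "probability > threshold" with a binary search instead of re-scoring every case.

The class name and the `prob` callback are hypothetical.

```python
import bisect

class ProbabilityIndex:
    """Rough analogue of a function-based index on P(attack)."""

    def __init__(self, cases, prob):
        scored = sorted((prob(c), i) for i, c in enumerate(cases))  # score each case once
        self._probs = [p for p, _ in scored]
        self._ids = [i for _, i in scored]
        self._cases = cases

    def above(self, threshold):
        """Cases with probability strictly greater than threshold."""
        start = bisect.bisect_right(self._probs, threshold)
        return [self._cases[i] for i in self._ids[start:]]
```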

Page 34: slides

Combination of multiple models

The query returns all cases where either model1 or model2 indicates an attack with probability higher than 0.4.

In this case, when the anomaly_model classifies a case as an attack with probability greater than 0.5, the misuse_model attempts to identify the type of attack.
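The two combination strategies above reduce to simple selection logic, sketched here in Python rather than the original SQL:

- an OR of two models' attack probabilities;
- a cascade in which the misuse model runs only on cases the anomaly model flags.

The probability and prediction callbacks are hypothetical stand-ins for model1, model2, anomaly_model, and misuse_model.

```python
def either_model(cases, prob1, prob2, threshold=0.4):
    """Cases where either model gives an attack probability above threshold."""
    return [c for c in cases if prob1(c) > threshold or prob2(c) > threshold]

def cascade(cases, anomaly_prob, misuse_predict, threshold=0.5):
    """Label attack type only for cases the anomaly model flags above threshold."""
    return [(c, misuse_predict(c)) for c in cases if anomaly_prob(c) > threshold]
```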

Page 35: slides

Reports and Analysis

Page 36: slides
Page 37: slides
Page 38: slides

Conclusion

• Data mining techniques are very useful in intrusion detection.
• Manual interpretation/advice is still needed in some processing steps.
• More efficient on known attacks than on unknown attacks.
• Effective only if the training data contains all normal behavior.