Hybrid Intelligent Systems for Network Security
Lane Thames
Georgia Institute of Technology
Savannah, GA
lane.thames@gtsav.gatech.edu

Presentation Overview

- Discuss Network Security Issues
- Discuss the goals of this paper's project
- Overview of Self-Organizing Maps
- Overview of Bayesian Learning Networks
- Describe the details of the Hybrid System
- Review the Experimental Results
- Discuss Future Work and Conclusions
- Q&A

Network Security Motivation

- Internet Growth is Steadily Increasing
- Over 1 Billion Internet Users
- Many different types of applications are now using the Internet as a communication channel

Data Source: www.idc.com

Network Security Motivation

- No more “Script Kiddies”
- Hacking is now more than just a hobby
- Hackers have created their own revenue-generating channels
- Common hacking “commodities”:
  - Hacking software that is for sale
  - Corporate Extortion
  - Corporate Espionage
  - Identity Theft

Network Security Motivation

Classical Attack Types:
- Buffer Overflow
- Denial of Service (DoS)
- Distributed Denial of Service (DDoS)
- Reconnaissance
- Virus
- Worms
- Trojan Horse

Network Security Motivation

Hackers are using more sophisticated mechanisms:
- Phishing: less sophisticated; easy to fool a novice user
- Pharming: more sophisticated; easy to fool novice and expert users
- DoS and DDoS: used for extortion
- Remote Root Access: used for espionage and identity theft

Network Security Motivation

The numbers do not lie.

Hackers are constantly looking for ways to cause mischief:
- Steal your data
- Handicap your machines
- Take your money, etc.

Data Source: http://www.cert.org/stats/cert_stats.html

Network Security Motivation

The Bottom Line: Network Security Research and Commerce are here to stay!

Project Goals

- Develop an Intelligent System that works reliably with data that can be collected purely within a network
- Why? If security mechanisms are difficult to use, people will not use them.
- Using data from the network takes the burden off the end user

Hybrid Intelligent Systems

A system was developed that made use of two types of Intelligence Algorithms:
- Self-Organizing Maps
- Bayesian Learning Networks

Training and Testing Data Set

KDD-CUP 99 Data Set: the data set used for the Third International Knowledge Discovery and Data Mining Tools Competition

Training and Testing Data Set

41 Total Features Categorized as:
- Basic TCP/IP Features
- Content Features
- Time-Based Traffic Features
- Host-Based Traffic Features

Training and Testing Data Set

Attack Type Categories:
- Remote to Local Exploits
- User to Root Exploits
- Denial of Service
- Probing (Reconnaissance)

Self-Organizing Maps (SOM)

- Pioneered by Dr. Teuvo Kohonen
- An algorithm that transforms high-dimensional input data domains to elements of a low-dimensional array of nodes
- A fixed-size grid of nodes, sometimes called neurons to reflect the similarity to neural networks
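As a rough illustration of the grid described above, the Python sketch below builds a fixed-size grid in which every node keeps its position on the low-dimensional grid plus a parametric vector with the same dimension as the high-dimensional input. The helper name `init_som_grid`, the dictionary layout, and uniform random initialization are assumptions for illustration, not details taken from this work. The input dimension of 41 matches the KDD-CUP 99 feature count, assuming the features have already been encoded as numbers.

```python
import random


def init_som_grid(rows, cols, dim, seed=0):
    """Build a fixed-size rows x cols SOM grid (hypothetical helper).

    Each node stores its (row, col) position on the low-dimensional grid and
    a parametric vector with the same dimension as the high-dimensional
    input. Uniform random initialization in [0, 1) is an assumption.
    """
    rng = random.Random(seed)
    return [
        {"coord": (r, c), "weights": [rng.random() for _ in range(dim)]}
        for r in range(rows)
        for c in range(cols)
    ]


# A 3 x 4 grid of nodes, each comparable with a 41-dimensional input vector
grid = init_som_grid(3, 4, 41)
print(len(grid), len(grid[0]["weights"]))  # 12 41
```

The grid coordinates are kept separately from the weights because the neighborhood function works with distances on the grid, while the decoder function compares input vectors with the weights.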

Self-Organizing Maps

Input Data Vectors:

$$X = [x_1, \ldots, x_r]$$

Self-Organizing Maps

Let a parametric real vector be associated with each element, i, of the SOM grid:

$$M_i = [m_{i1}, \ldots, m_{ik}]$$

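Self-Organizing Maps

Furthermore, the input vector and every parametric vector lie in the same real vector space (so r = k = n), which lets the two be compared directly:

$$X, M_i \in \mathbb{R}^n$$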

Self-Organizing Maps

A decoder function is defined on the basis of the distance between the input vector and the parametric vector.

The decoder function is used to map the image of the input vector onto the SOM grid. It is usually chosen to be either the Manhattan or the Euclidean distance metric:

$$d(x, M_i)$$
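The two usual choices for the decoder function can be written directly from their definitions. This is a minimal Python sketch; the function names `euclidean` and `manhattan` are illustrative, and vectors are assumed to be plain lists of numbers of equal length.

```python
import math


def euclidean(x, m):
    """Euclidean distance d(x, M_i) = sqrt(sum_j (x_j - m_ij)^2)."""
    if len(x) != len(m):
        raise ValueError("x and M_i must have the same dimension")
    return math.sqrt(sum((xj - mj) ** 2 for xj, mj in zip(x, m)))


def manhattan(x, m):
    """Manhattan distance d(x, M_i) = sum_j |x_j - m_ij|."""
    if len(x) != len(m):
        raise ValueError("x and M_i must have the same dimension")
    return sum(abs(xj - mj) for xj, mj in zip(x, m))


print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(manhattan([0.0, 0.0], [3.0, 4.0]))  # 7.0
```

The Manhattan metric avoids the square root and squaring, which makes it slightly cheaper; the Euclidean metric penalizes a single large coordinate difference more heavily.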

Self-Organizing Maps

A Best Matching Unit (BMU), denoted by the index c, is chosen as the node on the SOM grid that is closest to the input vector:

$$c = \arg\min_i \{ d(x, M_i) \}$$
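Selecting the BMU is a single argmin over the grid's parametric vectors. The Python sketch below assumes the Euclidean metric and a flat list of parametric vectors; the name `best_matching_unit` is hypothetical.

```python
import math


def euclidean(x, m):
    """Euclidean distance between input vector x and parametric vector m."""
    return math.sqrt(sum((xj - mj) ** 2 for xj, mj in zip(x, m)))


def best_matching_unit(x, parametric_vectors, d=euclidean):
    """Return c = argmin_i d(x, M_i), the index of the closest node.

    Ties are broken in favor of the lowest index, since min() keeps the
    first minimum it sees.
    """
    return min(range(len(parametric_vectors)),
               key=lambda i: d(x, parametric_vectors[i]))


M = [[0.0, 0.0], [1.0, 1.0], [0.9, 0.2]]
print(best_matching_unit([1.0, 0.0], M))  # 2
```

Passing the metric as a parameter lets the same routine use the Manhattan distance instead.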

Self-Organizing Maps

The dynamics of the SOM algorithm demand that the $M_i$ be shifted toward the order of X, such that a set of values $\{M_i\}$ is obtained as the limit of convergence of the following update:

$$M_i(t+1) = M_i(t) + \alpha(t)\,[x(t) - M_i(t)]\,H_{ci}$$

where $\alpha(t)$ is the learning rate at step t and $H_{ci}$ is the neighborhood function, which scales the update of node i according to how close it lies to the BMU c on the grid.
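One application of the update rule above can be sketched in Python as follows. The Gaussian form of $H_{ci}$ is a common choice but is an assumption here, as are the helper names `gaussian_neighborhood` and `som_update`; the slides do not specify the neighborhood function used.

```python
import math


def gaussian_neighborhood(coord_c, coord_i, sigma):
    """Assumed form of H_ci: a Gaussian of the grid distance between c and i."""
    dist2 = sum((a - b) ** 2 for a, b in zip(coord_c, coord_i))
    return math.exp(-dist2 / (2.0 * sigma ** 2))


def som_update(weights, coords, x, c, alpha, sigma):
    """One step of M_i(t+1) = M_i(t) + alpha(t) [x(t) - M_i(t)] H_ci.

    weights: parametric vectors M_i, updated in place and returned
    coords:  (row, col) grid position of each node
    x:       current input vector x(t)
    c:       index of the best matching unit for x
    alpha:   learning rate alpha(t); sigma: neighborhood width
    """
    for i, m in enumerate(weights):
        h = gaussian_neighborhood(coords[c], coords[i], sigma)
        for j in range(len(m)):
            m[j] += alpha * (x[j] - m[j]) * h
    return weights


weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
coords = [(0, 0), (0, 1), (0, 2)]
som_update(weights, coords, x=[1.0, 1.0], c=0, alpha=0.5, sigma=1.0)
print(weights)  # the BMU moves most; nodes farther from it on the grid move less
```

Because $H_{cc} = 1$, the BMU always takes the full step $\alpha(t)$ toward x, while its grid neighbors are dragged along by a smaller amount. This is what makes nearby nodes on the grid end up representing similar inputs.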

SOM Demo

- The next few plots will demonstrate how the parametric vector will converge to the input data vector
- Demonstrate the effects of parameters on one another
- Display the error function for this demo
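A demo in the same spirit can be reproduced with a single node and a single fixed input. In that case the node is always the BMU and $H_{ci} = 1$, so the update reduces to $m(t+1) = m(t) + \alpha(t)\,[x - m(t)]$. The Python sketch below records the error $\lVert x - m(t) \rVert$ at each step. The exponentially decaying schedule $\alpha(t) = \alpha_0 e^{-t/\tau}$ and the name `som_demo` are assumptions, not the settings used for the original plots.

```python
import math


def som_demo(x, m0, alpha0=0.5, tau=20.0, steps=30):
    """Track one parametric vector converging to one input vector.

    With a single node, that node is always the BMU and H_ci = 1, so each
    step is m(t+1) = m(t) + alpha(t) [x - m(t)]. The decaying learning rate
    alpha(t) = alpha0 * exp(-t / tau) is an assumed schedule.
    Returns the final vector and the error ||x - m(t)|| at every step.
    """
    m = list(m0)
    errors = [math.dist(x, m)]
    for t in range(steps):
        alpha = alpha0 * math.exp(-t / tau)
        m = [mj + alpha * (xj - mj) for xj, mj in zip(x, m)]
        errors.append(math.dist(x, m))
    return m, errors


m_final, errors = som_demo(x=[1.0, 2.0], m0=[0.0, 0.0])
for t in range(0, len(errors), 10):
    print(f"t={t:2d}  error={errors[t]:.6f}")
```

Each step multiplies the error by $1 - \alpha(t)$, so lowering `alpha0` or `tau` shrinks every step and the error decays more slowly. Rerunning with different values is a quick way to see how the parameters interact.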

Bayesian Learning Networks (BLN)

- A BLN is a probabilistic model built on the concept of the Directed Acyclic Graph (DAG).
- The DAG is a graph of nodes where each node is a random variable of interest.
- The directed edges of the graph represent relationships among the variables.
- If an arc is directed from a node h to a node D, we say that h is the parent of D.

Bayesian Learning Networks

The Fundamental Equation: Bayes' Theorem

$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$
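Bayes' Theorem can be evaluated directly for a binary hypothesis by expanding $P(D)$ with the law of total probability. The Python sketch below uses made-up numbers purely for illustration; the function name `posterior` and the interpretation of h and D are hypothetical.

```python
def posterior(prior_h, p_d_given_h, p_d_given_not_h):
    """Bayes' theorem for a binary hypothesis h: P(h | D) = P(D | h) P(h) / P(D).

    P(D) is expanded with the law of total probability:
    P(D) = P(D | h) P(h) + P(D | not h) (1 - P(h)).
    """
    p_d = p_d_given_h * prior_h + p_d_given_not_h * (1.0 - prior_h)
    return p_d_given_h * prior_h / p_d


# Hypothetical numbers: h = "connection is an attack", D = "detector raised an alert"
print(round(posterior(prior_h=0.01, p_d_given_h=0.9, p_d_given_not_h=0.05), 4))  # 0.1538
```

With these numbers, a 90% detection rate still yields a posterior of only about 15%, because the prior probability of an attack is small. This is why the prior $P(h)$ matters as much as the likelihood.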

Bayesian Learning Networks

- In Bayesian learning, we calculate the probability of a hypothesis and make predictions on that basis.
- Predictions or classifications are reduced to probabilistic inference.

Bayesian Learning Networks

- With a BLN, we have conditional probabilities for each node given its parents.
- The graph shows causal connections, not the flow of information through the graph.
- Prediction versus abduction: prediction reasons from causes to their effects, while abduction reasons from observed effects back to their most likely causes.

(Figure: example DAG over nodes x1, x2, x3, x4, x5)
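The conditional-probability view above means the joint distribution factors as the product, over all nodes, of P(node | its parents). The Python sketch below uses a small hypothetical network (x1 → x2, x1 → x3) with made-up probabilities, not the network in the slide's figure. It answers both kinds of query by brute-force enumeration, which is only practical for tiny networks. The names `parents`, `cpt`, `joint`, and `query` are illustrative.

```python
from itertools import product

# Hypothetical network (not the one in the slide figure): x1 -> x2, x1 -> x3
parents = {"x1": (), "x2": ("x1",), "x3": ("x1",)}
# cpt[node][parent_values] = P(node = True | parents = parent_values); made-up numbers
cpt = {
    "x1": {(): 0.2},
    "x2": {(True,): 0.8, (False,): 0.1},
    "x3": {(True,): 0.7, (False,): 0.3},
}


def joint(assignment):
    """P(x1, ..., xn) = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = True | evidence), by summing the joint over all assignments."""
    nodes = list(parents)
    num = den = 0.0
    for values in product([True, False], repeat=len(nodes)):
        a = dict(zip(nodes, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint(a)
        den += p
        if a[target]:
            num += p
    return num / den


print(round(query("x2", {"x1": True}), 4))  # prediction (cause -> effect): 0.8
print(round(query("x1", {"x2": True}), 4))  # abduction (effect -> cause): 0.6667
```

Prediction reads the answer almost directly from a conditional probability table, while abduction needs Bayes' Theorem (here carried out implicitly by the enumeration) to invert the direction of the arc.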

Naïve Bayesian Learning Network

- The Naïve BLN is a special case of the general BLN.
- It contains one root (parent) node, which is called the class variable, C.
- The leaf nodes are the attribute variables $(X_1, \ldots, X_i)$.
- It is naïve because it assumes the attributes are conditionally independent given the class.

(Figure: class node C with an arc to each attribute node x1, x2, …, xi)

The Naïve BLN Classifier

- Once the network is trained, it can be used to classify new examples where the attributes are given and the class variable is unobserved (abduction).
- The Goal: find the most probable class value given a set of attribute instantiations $(X_1, \ldots, X_i)$.

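Naïve BLN Classifier

$$
\begin{aligned}
C_{NB} &= \arg\max_{c_j \in C} P(c_j \mid X_1, \ldots, X_i) \\
&= \arg\max_{c_j \in C} \frac{P(X_1, \ldots, X_i \mid c_j)\,P(c_j)}{P(X_1, \ldots, X_i)} \\
&= \arg\max_{c_j \in C} P(X_1, \ldots, X_i \mid c_j)\,P(c_j)
\end{aligned}
$$

The denominator does not depend on $c_j$, so it can be dropped from the maximization. Under the naïve conditional-independence assumption,

$$P(X_1, \ldots, X_i \mid c_j) = \prod_i P(X_i \mid c_j)$$

which gives

$$C_{NB} = \arg\max_{c_j \in C} P(c_j) \prod_i P(X_i \mid c_j)$$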
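The final form of $C_{NB}$ translates into a counting-based classifier for categorical attributes. This Python sketch is illustrative and is not the implementation used in this work. Laplace (add-one) smoothing and log-space scoring are assumptions added so that unseen attribute values do not zero out the product and long products do not underflow. The toy records borrow KDD-CUP 99-style values for protocol and connection flag, but the data and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    attribute_values = defaultdict(set)  # attribute index -> values seen in training
    for x, c in zip(examples, labels):
        for k, v in enumerate(x):
            value_counts[(k, c)][v] += 1
            attribute_values[k].add(v)
    return class_counts, value_counts, attribute_values


def classify(x, model):
    """C_NB = argmax_c P(c) * prod_k P(X_k = x_k | c), computed in log space.

    Add-one smoothing of P(X_k = x_k | c) is an assumption of this sketch.
    """
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log P(c)
        for k, v in enumerate(x):
            n_values = len(attribute_values[k])
            score += math.log((value_counts[(k, c)][v] + 1) / (n_c + n_values))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy records: (protocol, connection flag) -> label
examples = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
            ("tcp", "S0"), ("tcp", "S0"), ("icmp", "S0")]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = train_naive_bayes(examples, labels)
print(classify(("tcp", "S0"), model))  # attack
print(classify(("udp", "SF"), model))  # normal
```

Training is a single counting pass over the data, which is one reason the naïve classifier scales well to large data sets such as KDD-CUP 99.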

Hybrid System Architecture

Experimental Results

Four types of analyses were made with the dataset:
- BLN analysis with network and host-based data
- BLN analysis with network data
- Hybrid analysis with network and host-based data
- Hybrid analysis with network-based data

Experimental Results

| | BLN-Host/Network Based | BLN-Network Based | Hybrid-Host/Network Based | Hybrid-Network Based |
|---|---|---|---|---|
| Total Cases | 65,505 | 62,047 | 65,505 | 62,047 |
| Correctly Classified | 65,019 | 59,734 | 65,238 | 61,631 |
| % Correctly Classified | 99.26% | 96.27% | 99.59% | 99.33% |
| Number Incorrectly Classified | 486 | 2,315 | 267 | 416 |
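The percentage row of the results table follows from the counts, and the misclassification count should equal total cases minus correctly classified cases. The short Python check below recomputes both from the reported totals and correct counts; the name `summarize` is illustrative. The percentages reproduce exactly. For the BLN-Network Based column the subtraction gives 2,313, while the table lists 2,315; the other three columns agree.

```python
# (total cases, correctly classified) for each analysis, from the results table
results = {
    "BLN-Host/Network Based": (65_505, 65_019),
    "BLN-Network Based": (62_047, 59_734),
    "Hybrid-Host/Network Based": (65_505, 65_238),
    "Hybrid-Network Based": (62_047, 61_631),
}


def summarize(results):
    """Return {analysis: (percent correct rounded to 2 places, number incorrect)}."""
    return {
        name: (round(100.0 * correct / total, 2), total - correct)
        for name, (total, correct) in results.items()
    }


for name, (pct, incorrect) in summarize(results).items():
    print(f"{name}: {pct:.2f}% correct, {incorrect:,} incorrect")
```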

Future and Current Work

- HoneyNet Project
- Resource Management System with Intelligent System Processing at the Core

Conclusion

- Intelligent Systems algorithms are very useful tools for applications in Network Security.
- Experimental results show that a hybrid system built with SOM and BLN can produce very accurate responses when classifying network-based data flows (99.33% correctly classified using network-based data alone, versus 96.27% for the BLN by itself). This is very promising for those wishing to design classification systems that do not rely on host-based data.