Real-Time Intrusion Detection Systems. Sandeep Kotagiri, Graduate Student, CACS. April 11th, 2006


Papers Presented

ADMIT: Anomaly-based Data Mining for Intrusions. K. Sequeira, M. Zaki. ACM SIGKDD, 2002.

Integrated Access Control and Intrusion Detection for Web Servers. Tatyana Ryutov, Clifford Neuman, Dongho Kim, and Li Zhou. IEEE Transactions on Parallel & Distributed Systems, September 2003.

The Specification and Enforcement of Advanced Security Policies. Tatyana Ryutov and Clifford Neuman. IEEE Proceedings of the Third International Workshop on Policies for Distributed Systems and Networks, 2002.

ADMIT: Anomaly-based Data Mining for Intrusions

According to the 2000 Computer Security Institute/FBI computer crime study, 85% of the 538 companies surveyed reported an intrusion or exploit of their corporate data, with 64% suffering a loss.

Features of a good IDS

ADMIT: real-time IDS with host-based data collection and processing

Problem: differentiate between masqueraders and the true users of a computer terminal

How: augment password authentication with ADMIT

What does ADMIT do? It is terminal-resident: it monitors terminal usage for each user, creates a user profile, and verifies incoming data against it.

Overview of ADMIT

Types of IDS: signature-based and anomaly-based

Network-level data, system call-level data, user command-level data

User profile for intrusion detection through clustering

Observation: the distribution of test points to clusters changes significantly at the time of attacks, which is an indicator of anomalous behavior

ADMIT is a user-profile dependent, temporal sequence clustering based, real-time intrusion detection system with host-based data collection and processing.

Advantages of using clustering:

Model scaling

Reduction of noise through cluster support

Analyzing cluster centers, and thus significant data reduction

Intra-cluster similarity threshold and alarms (Type A and Type B)

ADMIT ARCHITECTURE

Two main stages: training and testing

Capturing user data:

Unix shell command data is captured via the (t)csh history mechanism

The Recognizer parses user history data and emits it as tokens

Session: all data between logging on and logging off (delimited by *SOF* and *EOF*)

Parsing user data into tokens

An example session

*SOF* ; ls -l ; vi t1.txt ; ps -eaf ; vi t2.txt ; ls -a /usr/bin/* ; rm -i /home/* ; vi t3.txt t4.txt ; ps -ef ; *EOF*

Conversion to Tokens

T = {ti : 0 ≤ i < 8}, where t0 = ls -l, t1 = vi <1>, t2 = ps -eaf, t3 = vi <1>, t4 = ls -a <1>, t5 = rm -i <1>, t6 = vi <2>, and t7 = ps -ef.

<n> gives the number of arguments (n) of a command

vi t1.txt is tokenized as vi <1> and vi t3.txt t5.txt t6.txt as vi <3>

Familiarizing with terms used

A sequence s of specified length l is a list of tokens occurring contiguously in the same session of audit data, i.e., s ∈ T^l, where T is the token alphabet.

A cluster c is a collection of sequences of user-initiated command data such that all its sequences are very similar to one another under some similarity measure Sim(), but different from those in other clusters.

If c = {s0, s1, s2, ..., sn-1} is a cluster with n sequences, its cluster center is denoted sc.

A profile p is the set of clusters of sequences of user-initiated command data whose centers characterize the user behavior. For user u, the profile pu is defined in terms of r and r', the intra-cluster and inter-cluster similarity thresholds, and Sim(s1, s2), the similarity between two sequences.

Flow of Control in ADMIT

Similarity Measure Sim(s1, s2)

Two example sequences:

s1 = {vi <1>, ps -eaf, vi <1>, ls -a <1>}

s2 = {vi <1>, ls -a <1>, rm -i <1>, vi <2>}

MCP (Match Count with Polynomial bound): counts the number of slots in the two sequences for which both have identical tokens. MCP for the above example is 1.

MCE (Match Count with Exponential bound): a variant of MCP in which the score doubles for each matching slot.

MCAP/MCAE (Match Count with Adjacency reward and Polynomial/Exponential bound): variants of MCP/MCE in which adjacent matches are rewarded.

LCS (Longest Common Subsequence): the length of the longest subsequence of tokens that the two sequences have in common. It is 2 for the above sequences.
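The two measures with stated example values (MCP and LCS) can be sketched as follows. This is a minimal illustration, not the ADMIT implementation; the function names `mcp` and `lcs` and the representation of a sequence as a list of token strings are assumptions made for the example.

```python
from typing import Sequence


def mcp(a: Sequence[str], b: Sequence[str]) -> int:
    """Match count: number of slots where both sequences hold the same token."""
    return sum(1 for x, y in zip(a, b) if x == y)


def lcs(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbor.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# The two example sequences from the slide.
s1 = ["vi <1>", "ps -eaf", "vi <1>", "ls -a <1>"]
s2 = ["vi <1>", "ls -a <1>", "rm -i <1>", "vi <2>"]
mcp_value = mcp(s1, s2)
lcs_value = lcs(s1, s2)
```

Only the first slot matches position by position, while LCS also credits the shared `ls -a <1>` token even though it sits in a different slot.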

ADMIT Algorithms

Data Training

Data pre-processing

Clustering user sequences

Cluster refinement: merge clusters, split clusters

Online Testing

Real-time data pre-processing

Similarity search within profile

Sequence rating

Sequence classification

Data Training – Data Pre-processing

*SOF* ; ls -l ; vi t1.txt ; ps -eaf ; vi t2.txt ; ls -a /usr/bin/* ; rm -i /home/* ; vi t3.txt t4.txt ; ps -ef ; *EOF*

The FeatureSelector parses, cleans and tokenizes the audit data within each session specified by the ProfileManager.

T = {ti : 0 ≤ i < 8}, where t0 = ls -l, t1 = vi <1>, t2 = ps -eaf, t3 = vi <1>, t4 = ls -a <1>, t5 = rm -i <1>, t6 = vi <2>, and t7 = ps -ef.

The FeatureSelector creates sequences of length l. For example, if l = 4, the set of user sequences is S = {si : 0 ≤ i < 5}, where

s0 = {ls -l, vi <1>, ps -eaf, vi <1>}

s1 = {vi <1>, ps -eaf, vi <1>, ls -a <1>}

s2 = {ps -eaf, vi <1>, ls -a <1>, rm -i <1>}

s3 = {vi <1>, ls -a <1>, rm -i <1>, vi <2>}

s4 = {ls -a <1>, rm -i <1>, vi <2>, ps -ef}
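The pre-processing step can be sketched as a tokenizer followed by a sliding window of length l. The rule used here (keep the command name and any words starting with `-`, count every other word as an argument) is an assumption inferred from the example tokens; the names `tokenize`, `session_tokens` and `windows` are hypothetical.

```python
from typing import List


def tokenize(command: str) -> str:
    """Turn one shell command into a token such as 'vi <2>' or 'ps -eaf'."""
    words = command.split()
    flags = [w for w in words[1:] if w.startswith("-")]
    n_args = len(words) - 1 - len(flags)
    token = " ".join([words[0]] + flags)
    return f"{token} <{n_args}>" if n_args else token


def session_tokens(session: str) -> List[str]:
    """Split a ';'-separated session and drop the *SOF*/*EOF* markers."""
    commands = [c.strip() for c in session.split(";")]
    return [tokenize(c) for c in commands if c and c not in ("*SOF*", "*EOF*")]


def windows(tokens: List[str], length: int) -> List[List[str]]:
    """All contiguous sequences of the given length."""
    return [tokens[i:i + length] for i in range(len(tokens) - length + 1)]


session = ("*SOF* ; ls -l ; vi t1.txt ; ps -eaf ; vi t2.txt ; ls -a /usr/bin/* ; "
           "rm -i /home/* ; vi t3.txt t4.txt ; ps -ef ; *EOF*")
tokens = session_tokens(session)
sequences = windows(tokens, 4)
```

With eight tokens and l = 4 the window yields five overlapping sequences, matching s0 through s4 above.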

Data Training – Clustering User Sequences

Example: with r = 3

Initially Su^a = Su = {s0, s1, s2, s3, s4} and pu = Su^c = ∅.

Say the new center is s0.

For all remaining sequences in Su − Su^c, where Su^c = {s0}, we compute the similarity to the new center s0.

Using LCS as the similarity metric we get Sim(s1, s0) = 3, since vi <1>, ps -eaf, vi <1> is their LCS.

Similarly we get Sim(s2, s0) = 2, Sim(s3, s0) = 1, and Sim(s4, s0) = 0.

Since s1 passes the threshold, we add it to the new cluster to get cnew = {s0, s1}.

Therefore the new Su^a = {s2, s3, s4}. Repeating the while loop, we get the profile

pu = {c0 = {s0, s1}, c1 = {s2}, c2 = {s3, s4}}.
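The loop in this example can be sketched as below, assuming LCS similarity and a "similarity ≥ r" membership test. How the next center is chosen is not stated on the slide, so `pick_center` is a hypothetical hook; the name `dynamic_clustering` and the tuple representation are also assumptions.

```python
from typing import Callable, List, Tuple

Seq = Tuple[str, ...]


def lcs(a: Seq, b: Seq) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def dynamic_clustering(seqs: List[Seq], r: int,
                       pick_center: Callable[[List[Seq]], int] = lambda rest: 0
                       ) -> List[List[Seq]]:
    """Pick a center, pull in every unassigned sequence with Sim >= r, repeat."""
    unassigned = list(seqs)
    profile = []
    while unassigned:
        center = unassigned.pop(pick_center(unassigned))
        cluster = [center] + [s for s in unassigned if lcs(s, center) >= r]
        unassigned = [s for s in unassigned if lcs(s, center) < r]
        profile.append(cluster)
    return profile


s0 = ("ls -l", "vi <1>", "ps -eaf", "vi <1>")
s1 = ("vi <1>", "ps -eaf", "vi <1>", "ls -a <1>")
s2 = ("ps -eaf", "vi <1>", "ls -a <1>", "rm -i <1>")
s3 = ("vi <1>", "ls -a <1>", "rm -i <1>", "vi <2>")
s4 = ("ls -a <1>", "rm -i <1>", "vi <2>", "ps -ef")
S = [s0, s1, s2, s3, s4]

# Always taking the first unassigned sequence as the next center.
first_pick = dynamic_clustering(S, r=3)

# Taking s0, then s4, then s2 as centers gives the grouping shown on the slide.
order = iter([0, 2, 0])
slide_like = dynamic_clustering(S, r=3, pick_center=lambda rest: next(order))
```

The result depends on the order in which centers are chosen: the first-remaining rule groups s2 with s3, whereas choosing s4 as the second center reproduces {s0, s1}, {s2}, {s3, s4}.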

Data Training – Cluster Refinement

Purpose of cluster refinement:

Setting the intra-cluster similarity threshold r may require experimentation

A cluster may have a lot in common with another cluster

There may be larger sub-clusters within clusters

Algorithms

Data Training – Cluster Refinement

Example

From above, pu = {c0, c1, c2} and r' = 2.

Using LCS, Sim(c0, c1) = Sim(s0, s2) = 2.

In this case, the two clusters should be merged to get c0 = {s0, s1, s2}.

Now c1 is deleted from the profile. Also, the center for c0 becomes s1.

For clusters that have high support, SplitClusters calls DynamicClustering to re-cluster them into smaller, higher-density clusters.
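The single merge step of this example can be sketched as follows. The slides do not preserve the cluster-center formula, so this sketch assumes the center is the member with the largest total similarity to the other members (ties go to the earliest member), which agrees with s1 becoming the center of {s0, s1, s2}. The names `cluster_center` and `merge_if_similar` are hypothetical, and the full MergeClusters procedure is not reproduced.

```python
from typing import List, Optional, Tuple

Seq = Tuple[str, ...]


def lcs(a: Seq, b: Seq) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def cluster_center(cluster: List[Seq]) -> Seq:
    """Assumed rule: the member most similar, in total, to all other members."""
    def total(i: int) -> int:
        return sum(lcs(cluster[i], t) for j, t in enumerate(cluster) if j != i)
    return cluster[max(range(len(cluster)), key=total)]


def merge_if_similar(a: List[Seq], b: List[Seq], r_prime: int) -> Optional[List[Seq]]:
    """Merge two clusters when their centers reach the inter-cluster threshold."""
    if lcs(cluster_center(a), cluster_center(b)) >= r_prime:
        return a + b
    return None


s0 = ("ls -l", "vi <1>", "ps -eaf", "vi <1>")
s1 = ("vi <1>", "ps -eaf", "vi <1>", "ls -a <1>")
s2 = ("ps -eaf", "vi <1>", "ls -a <1>", "rm -i <1>")

merged = merge_if_similar([s0, s1], [s2], r_prime=2)
new_center = cluster_center(merged)
```

Here the centers s0 and s2 have similarity 2, which meets r' = 2, so the clusters merge and the recomputed center is s1.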

Online Testing – Real-Time Data Pre-processing

Testing must happen in an online manner, as the user sequences are produced

Example session: *SOF* ; vi t4.txt ; vi t4.txt ; vi t4.txt ; ls -a /home/* ; rm -i /home/turbo/tmp/* ; ls -a /home/* ; vi t2.txt t4.txt ; ps -ef ;

Right padding is done in the absence of complete sequences

Tokenizing:

T' = {t'i : 0 ≤ i < 8}, where t'0 = vi <1>, t'1 = vi <1>, t'2 = vi <1>, t'3 = ls -a <1>, t'4 = rm -i <1>, t'5 = ls -a <1>, t'6 = vi <2>, and t'7 = ps -ef.

For l = 4, S' = {s'i : 0 ≤ i < 5}, where

s'0 = {vi <1>, vi <1>, vi <1>, ls -a <1>}

s'1 = {vi <1>, vi <1>, ls -a <1>, rm -i <1>}

s'2 = {vi <1>, ls -a <1>, rm -i <1>, ls -a <1>}

s'3 = {ls -a <1>, rm -i <1>, ls -a <1>, vi <2>}

s'4 = {rm -i <1>, ls -a <1>, vi <2>, ps -ef}

Online Testing – Profile Search

For each sequence s'i, find the most similar cluster in pu

The similarity between a sequence s'i and a profile pu is

Sim(s'i, pu) = max over clusters cj in pu of Sim(s'i, scj)

Example

pu = {c0 = {s0, s*1, s2}, c1 = {s*3, s4}}

(cluster centers are indicated with '*').

Then Sim(s'0, pu) = max(Sim(s'0, sc0), Sim(s'0, sc1)) = max(Sim(s'0, s1), Sim(s'0, s3)) = max(3, 2) = 3.

Similarly Sim(s'1, pu) = 3, Sim(s'2, pu) = 3, Sim(s'3, pu) = 3, and Sim(s'4, pu) = 2.
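The profile search reduces to taking the maximum similarity over the cluster centers. A minimal sketch, assuming LCS similarity; `sim_to_profile` is a hypothetical name, and the test sequences are the five windows of the example session.

```python
from typing import List, Tuple

Seq = Tuple[str, ...]


def lcs(a: Seq, b: Seq) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sim_to_profile(seq: Seq, centers: List[Seq]) -> int:
    """Similarity of a test sequence to a profile: best match among the centers."""
    return max(lcs(seq, c) for c in centers)


# Cluster centers s1 and s3 of the trained profile.
centers = [
    ("vi <1>", "ps -eaf", "vi <1>", "ls -a <1>"),
    ("vi <1>", "ls -a <1>", "rm -i <1>", "vi <2>"),
]
test_sequences = [
    ("vi <1>", "vi <1>", "vi <1>", "ls -a <1>"),
    ("vi <1>", "vi <1>", "ls -a <1>", "rm -i <1>"),
    ("vi <1>", "ls -a <1>", "rm -i <1>", "ls -a <1>"),
    ("ls -a <1>", "rm -i <1>", "ls -a <1>", "vi <2>"),
    ("rm -i <1>", "ls -a <1>", "vi <2>", "ps -ef"),
]
similarities = [sim_to_profile(s, centers) for s in test_sequences]
```

Comparing against centers only, rather than every sequence in every cluster, is what gives the data reduction mentioned in the overview.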

Online Testing – Sequence Rating

Noisy data leads to high false positive rates

Using past sequences, the present sequence is tested to see whether it is noise or a true change in profile

LAST_n

The arithmetic mean of the similarity of the last n sequences

For the five new sequences, using this rating metric with n = 3, we would get the following ratings: R0 = R1 = R2 = R3 = 3, and R4 = 8/3 = 2.67

Online Testing – Sequence Rating

WEIGHTED

The weighted mean of the last rating and the current sequence's similarity. The rating Rj for the jth sequence is calculated as

Rj = α * Sim(s'j, pu) + (1 − α) * Rj-1, where R0 = Sim(s'0, pu). For example, if α = 0.33, then R0 = R1 = R2 = R3 = 3, and R4 = 2.66.

DECAYED_WEIGHTS

A variant of WEIGHTED in which α is varied according to the sequence number. The rating Rj for the jth sequence depends on two parameters, y and z. E.g. if y = 4100 and z = 7500, then R0 = R1 = R2 = R3 = 3, and R4 = 2.66.
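The LAST_n and WEIGHTED metrics can be sketched directly from their descriptions. The function names are hypothetical, and averaging over fewer than n values for the first few sequences is an assumption. DECAYED_WEIGHTS is omitted because its formula is not given above.

```python
from typing import List


def last_n(sims: List[float], n: int) -> List[float]:
    """Rating j = mean similarity of the last n sequences up to and including j."""
    ratings = []
    for j in range(len(sims)):
        window = sims[max(0, j - n + 1): j + 1]
        ratings.append(sum(window) / len(window))
    return ratings


def weighted(sims: List[float], alpha: float) -> List[float]:
    """Rating j = alpha * Sim_j + (1 - alpha) * Rating_{j-1}, with R0 = Sim_0."""
    ratings = [sims[0]]
    for s in sims[1:]:
        ratings.append(alpha * s + (1 - alpha) * ratings[-1])
    return ratings


# Profile similarities of the five test sequences from the example.
sims = [3, 3, 3, 3, 2]
last_n_ratings = last_n(sims, n=3)
weighted_ratings = weighted(sims, alpha=0.33)
```

With α = 0.33 the last WEIGHTED rating is 0.33 * 2 + 0.67 * 3, about 2.67 (quoted as 2.66 on the slide); either way it falls below an accept threshold of 2.7.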

Online Testing: Prediction (Normal Vs Anomaly)

Normal, i.e. true user; anomaly, i.e. possible masquerader

Based upon the sequence rating Rj for sequence s'j

Normal Sequences

TACCEPT is the lower accept threshold

If the user sequence rating > TACCEPT, then normal user

E.g. with TACCEPT = 2.7 and the WEIGHTED rating metric (α = 0.33), no alarm will be raised for s'0, since R0 = 3 > 2.7.

Similarly, s'1, s'2, s'3 are all normal;

they are assigned to the nearest profile cluster, e.g., c0 = {s0, s*1, s2, s'0, s'1} and c1 = {s*3, s4, s'2, s'3}

Cluster centers are recalculated

Online Testing: Prediction (Normal Vs Anomaly)

Anomalous Sequences

Sequences that fail the TACCEPT test

E.g. for s'4, R4 = 2.66 < 2.7: Type A alarm

Possible reasons: noise (typing errors), concept drift (change of project), anomalous sequence

The larger the number of anomalous sequences in near succession, the more suspicious the identity of the user

Cluster the anomalous sequences to get a better estimate of behavioral change

Type B alarm if the cluster size crosses a certain threshold Tcluster
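The decision logic can be sketched as a threshold test followed by an escalation rule. This is a deliberate simplification: it keeps all anomalous sequences in a single running group instead of clustering them, treats "crosses" as "exceeds", and uses a hypothetical Tcluster value of 2. The name `classify` is also an assumption.

```python
from typing import List, Tuple


def classify(ratings: List[float], t_accept: float, t_cluster: int) -> List[Tuple[int, str]]:
    """Label each sequence as normal, Type A alarm, or Type B alarm."""
    anomalous = 0  # size of the single (simplified) anomalous group
    labels = []
    for j, rating in enumerate(ratings):
        if rating > t_accept:
            labels.append((j, "normal"))
            continue
        anomalous += 1
        # Escalate once the anomalous group grows past the cluster threshold.
        labels.append((j, "type B" if anomalous > t_cluster else "type A"))
    return labels


# WEIGHTED ratings from the example, with TACCEPT = 2.7.
labels = classify([3, 3, 3, 3, 2.66], t_accept=2.7, t_cluster=2)
```

Separating the two alarm types lets an isolated low rating (noise) be recorded without immediately flagging the user as a masquerader.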

Incremental Clustering Algorithm

Initially pu = {c0, c1}, S''a = ∅ and Su^c = {s1, s3}

Since R4 = 2.66 < 2.7, s'i = s'4

Assign s'4 to S''a and

pu = pu ∪ {c2 = {s'4}}

After testing, pu becomes

pu = {c0 = {s0, s*1, s2, s'0, s'1}, c1 = {s*3, s4, s'2, s'3}, c2 = {s'4}}

Results

The system achieves approximately 80% detection rate and 15% false positive rate

The security analyst only needs to go through the anomalous clusters instead of vast amounts of audit data

Integrated Access Control and Intrusion Detection for Web Servers

Problems faced by Web servers:

Stealing and destroying data

Denying user access

Changing website content to embarrass organizations

Subverting Web servers through vulnerable CGI scripts

Denial of Service (DoS) attacks

Traditional access control systems were not designed to detect attacks and adjust their behavior to take corrective action

Separate components such as firewalls, IDSs and code integrity checkers do not fully address a Web server's security needs.

This approach supports access control policies extended with the capability of identifying intrusions and responding to them in real time.

Generic Application Level Intrusion Detection Framework

Generic Authorization and Access Control API

Supports fine-grained access control and application-level intrusion detection and response

Evaluates HTTP requests and determines whether the requests are allowed and whether they represent a threat according to a policy.

Provides a general-purpose execution environment in which EACLs are evaluated

Policy enforcement in 3 phases:

Before the requested operation starts (is the operation authorized?)

During execution of the authorized operation (detect malicious behavior during execution)

After the operation completes (logging and notification of whether the operation succeeded or failed)

Can respond to a suspected intrusion in real time, before it causes damage

Can be easily integrated with different applications: Apache Web server, SOCKS5, sshd, and FreeS/WAN IPsec for Linux.

Policy Representation - EACL

EACL: Extended Access Control List

Simple policy language designed to describe user-level authorization policy

An EACL is associated with an object to be protected

Specifies negative and positive access rights on the object

Also has an optional set of associated conditions

Types of conditions:

Pre-conditions: what must be true in order to grant the request

Request-result conditions: must be activated whether the request is granted or denied

Mid-conditions: what must be true during the execution of the requested operation

Post-conditions: what must happen after the completion of the operation

An EACL entry consists of a positive or negative access right and four condition blocks: a set of pre-conditions ……

EACL Syntax

An EACL is specified according to the following format:

eacl ::= {eacl_entry}
eacl_entry ::= pos_access_right conditions | neg_access_right conditions
pos_access_right ::= "pos_access_right" def_auth value
neg_access_right ::= "neg_access_right" def_auth value
conditions ::= pre_conds mid_conds rr_conds post_conds
pre_conds ::= {condition}
mid_conds ::= {condition}
rr_conds ::= {condition}
post_conds ::= {condition}
condition ::= cond_type def_auth value
cond_type ::= alphanumeric_string
def_auth ::= alphanumeric_string
value ::= alphanumeric_string

cond_type: type of condition

def_auth: authority responsible for defining the value within cond_type

value: value of the condition

EACL Example: Access to host

# EACL entry 1
neg_access_right test host_login
pre_cond_access_id KerberosV.5 [email protected]

# EACL entry 2
pos_access_right test host_login
pre_cond_location IPsec 10.1.1.0-10.1.200.255
pre_cond_access_id X509 "/C=US/O=Trusted/OU=orgb.edu/CN=partnerB"
pre_cond_threshold local <3 failures/day/failed_log/
rr_cond_update_log local on:failure/failed_log/info:userID
mid_cond_duration local <8hrs

# EACL entry 3
pos_access_right test host_login
pre_cond_location IPsec 10.1.1.0-10.1.200.255
pre_cond_access_id KerberosV.5 [email protected]
pre_cond_threshold local <3 failures/day/failed_log/
rr_cond_update_log local on:failure/failed_log/info:userID
mid_cond_duration local <8hrs

# EACL entry 4
pos_access_right test host_check_status
pre_cond_location IPsec 10.1.1.0-10.1.200.255

# EACL entry 5
pos_access_right test host_shut_down
pre_cond_access_id KerberosV.5 [email protected]
…_cond_audit local on:success/info:userID
post_cond_notify local email/to:sysadmin/on:failure
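To make the entry structure concrete, the following sketch reads EACL text of the shape shown above into simple records. It is an illustration only and not part of the GAA-API: the class names, the `parse_eacl` function, and the one-statement-per-line layout are assumptions. Each statement is split into the three fields of the grammar (type, defining authority, value).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Condition:
    cond_type: str   # e.g. pre_cond_location
    def_auth: str    # authority that defines the value, e.g. IPsec
    value: str


@dataclass
class EACLEntry:
    positive: bool   # True for pos_access_right, False for neg_access_right
    def_auth: str
    value: str
    conditions: List[Condition] = field(default_factory=list)


def parse_eacl(text: str) -> List[EACLEntry]:
    """Parse 'keyword def_auth value' lines; lines starting with '#' are comments."""
    entries: List[EACLEntry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        head, def_auth, value = line.split(None, 2)
        if head in ("pos_access_right", "neg_access_right"):
            entries.append(EACLEntry(head == "pos_access_right", def_auth, value))
        elif entries:
            entries[-1].conditions.append(Condition(head, def_auth, value))
        else:
            raise ValueError(f"condition before any access right: {line!r}")
    return entries


sample = """
# EACL entry 2
pos_access_right test host_login
pre_cond_location IPsec 10.1.1.0-10.1.200.255
pre_cond_access_id X509 "/C=US/O=Trusted/OU=orgb.edu/CN=partnerB"
pre_cond_threshold local <3 failures/day/failed_log/
rr_cond_update_log local on:failure/failed_log/info:userID
mid_cond_duration local <8hrs

# EACL entry 4
pos_access_right test host_check_status
pre_cond_location IPsec 10.1.1.0-10.1.200.255
"""
entries = parse_eacl(sample)
```

The prefix of each condition type (`pre_`, `rr_`, `mid_`, `post_`) identifies which of the four condition blocks it belongs to.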

EACL Policy Composition and Modules in GAA

Policy Composition

Process of relating separately specified policies

System-wide policy and local policy are merged

System-wide policy specifies a composition mode that describes how local policies are to be composed with it:

Expand – disjunction of rights

Narrow – conjunction of rights

Stop – local policies are ignored
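The three composition modes can be illustrated with boolean decisions. This sketch reduces each policy to a single allow/deny outcome, which is a simplification of how rights are actually combined; the `Mode` enum and `compose` function are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    EXPAND = "expand"   # disjunction of rights
    NARROW = "narrow"   # conjunction of rights
    STOP = "stop"       # local policies are ignored


def compose(mode: Mode, system_allows: bool, local_allows: bool) -> bool:
    """Combine a system-wide decision with a local one under a composition mode."""
    if mode is Mode.EXPAND:
        return system_allows or local_allows
    if mode is Mode.NARROW:
        return system_allows and local_allows
    return system_allows  # STOP: only the system-wide policy counts


# System-wide policy allows, local policy denies.
expand_result = compose(Mode.EXPAND, True, False)
narrow_result = compose(Mode.NARROW, True, False)
stop_result = compose(Mode.STOP, True, False)
```

Narrow is the restrictive choice: a request passes only if both the system-wide and the local policy grant it.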

GAA Modules: Access Control, Detector, Countermeasure Handler, Security Database

GAA-API and IDS Interaction

“GAA-API to IDS” interaction:

Ill-formed access requests

Access requests with abnormal parameters

Denied access

Exceeding a threshold

Incidents and suspicious application behavior

Legitimate activity (creating and updating user profiles)

“IDS to GAA-API” interaction:

Can be used for updating policies and adjusting policy values such as thresholds, times and locations.

GAA-API and Apache Integration

Apache Access Control

.htaccess file

Order Deny,Allow
Deny from All
Allow from 10.0.0.0/255.0.0.0
AuthType Basic
AuthUserFile /usr/local/apache2/.htpasswd-isi-staff
Require valid-user
Satisfy All

Access request --> check access control policies

Outputs: HTTP_OK, HTTP_DECLINED, HTTP_AUTHREQUIRED

GAA-API to Enhance the Access Control of Apache Server

Apache Server does not support fine-grained policies, such as which users or user groups from which location are allowed access

It does not support other conditions such as time, threat level, or system load

GAA-Apache Access Control

Makes use of system-wide and local policy and configuration files

Three status values are returned to describe the policy enforcement process:

Authorization status Sa indicates whether the request is authorized (GAA_YES), not authorized (GAA_NO) or uncertain (GAA_MAYBE)

Mid-condition enforcement status Sm indicates the status of mid-conditions

Post-condition enforcement status Sp indicates the status of post-conditions

Policy evaluation happens in four phases, as in the figure

Translation of Sa to Apache format:

GAA_YES → HTTP_OK

GAA_NO → HTTP_DECLINED

GAA_MAYBE → HTTP_AUTHREQUIRED
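The translation of the authorization status Sa is a fixed three-way mapping, sketched below as a lookup table. The status names are used as plain strings exactly as they appear on the slide; they are not taken from the Apache or GAA-API headers, and `to_apache_status` is a hypothetical helper.

```python
# Authorization status Sa -> status returned to Apache (names as on the slide).
SA_TO_APACHE = {
    "GAA_YES": "HTTP_OK",
    "GAA_NO": "HTTP_DECLINED",
    "GAA_MAYBE": "HTTP_AUTHREQUIRED",
}


def to_apache_status(sa: str) -> str:
    """Translate an authorization status; unknown values are rejected explicitly."""
    try:
        return SA_TO_APACHE[sa]
    except KeyError:
        raise ValueError(f"unknown authorization status: {sa!r}") from None


uncertain = to_apache_status("GAA_MAYBE")
```

The uncertain case maps to an authentication request, so a request that cannot yet be decided prompts the user for credentials instead of being refused.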

Examples

When the system threat level is higher than low, lock down the system and require user authentication for all accesses within the network

System-wide policy:

eacl_mode 1 # composition mode narrow
# EACL entry 1
neg_access_right * *
pre_cond_system_threat_level local =high

Local policy:

# EACL entry 1
pos_access_right apache *
pre_cond_system_threat_level local >low
pre_cond_accessID_USER apache *

Prevention of penetration and/or surveillance attacks by detecting CGI script abuse

System-wide policy:

eacl_mode 1 # composition mode narrow
# EACL entry 1
neg_access_right * *
pre_cond_accessID_GROUP local BadGuys

Local policy:

# EACL entry 1
neg_access_right apache *
pre_cond_regex gnu "'*phf*' 'test-cgi*'"
rr_cond_notify local on:failure/email/sysadmin/info:CGIexploit
rr_cond_update_log local on:failure/BadGuys/info:IP

# EACL entry 2
pos_access_right apache *

Conclusions

Traditional access control mechanisms have little ability to support or respond to the detection of attacks.

A generic authorization framework has been developed that supports security policies which can detect attempted and actual security breaches and can actively respond by modifying security policies dynamically.

The GAA-API implementation is available at http://gaaapi.sysproject.info.
