Papers Presented
ADMIT: Anomaly-based Data Mining for Intrusions. K. Sequeira, M. Zaki. ACM SIGKDD, 2002.
Integrated Access Control and Intrusion Detection for Web Servers. Tatyana Ryutov, Clifford Neuman, Dongho Kim, and Li Zhou. IEEE Transactions on Parallel & Distributed Systems, September 2003.
The Specification and Enforcement of Advanced Security Policies. Tatyana Ryutov and Clifford Neuman. IEEE Proceedings of the Third International Workshop on Policies for Distributed Systems and Networks, 2002.
ADMIT: Anomaly-based Data Mining for Intrusions
According to the 2000 Computer Security Institute/FBI computer crime study, 85% of the 538 companies surveyed reported an intrusion or exploit of their corporate data, and 64% suffered a loss.
Features of a good IDS
ADMIT: real-time IDS with host-based data collection and processing
Problem: differentiate between masqueraders and the true users of a computer terminal
How: augment password authentication with ADMIT
What does ADMIT do? It is terminal-resident: it monitors each user's terminal usage, builds a profile of that user, and verifies new data against the profile.
Overview of ADMIT
Types of IDS: signature-based and anomaly-based
Data sources: network-level data, system-call-level data, user-command-level data
User profiles for intrusion detection are built through clustering
Observation: the distribution of test points over clusters changes significantly at the time of an attack, which is an indicator of anomalous behavior
ADMIT is a user-profile-dependent, temporal-sequence-clustering-based, real-time intrusion detection system with host-based data collection and processing.
Advantages of using clustering:
  Model scaling
  Reduction of noise through cluster support
  Analyzing only cluster centers, and thus significant data reduction
  Intra-cluster similarity threshold and alarms (Type A and Type B)
ADMIT ARCHITECTURE
Two main stages: training and testing
Capturing user data:
Unix shell command data is captured via the (t)csh history mechanism
A recognizer parses the user history data and emits it as tokens
Session: all data between logging on and logging off (delimited by *SOF* and *EOF*)
Parsing user data into tokens
An example session
*SOF* ; ls -l ; vi t1.txt ; ps -eaf ; vi t2.txt ; ls -a /usr/bin/* ; rm -i /home/* ; vi t3.txt t4.txt ; ps -ef ; *EOF*
Conversion to tokens
T = {ti : 0 ≤ i < 8}, where t0 = ls -l, t1 = vi <1>, t2 = ps -eaf, t3 = vi <1>, t4 = ls -a <1>, t5 = rm -i <1>, t6 = vi <2>, and t7 = ps -ef.
<n> gives the number of arguments (n) of a command
vi t1.txt is tokenized as vi <1>, and vi t3.txt t5.txt t6.txt as vi <3>
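The tokenization rule can be sketched in a few lines of Python. This is only an illustration of the example above, not the ADMIT recognizer itself: the function names are hypothetical, and the rule that words starting with '-' are flags while all other words are arguments is an assumption inferred from the example tokens.

```python
def tokenize_command(command: str) -> str:
    """Turn one shell command into an ADMIT-style token.

    Assumption: words starting with '-' are flags and stay in the token;
    every other word after the command name counts as an argument.
    """
    words = command.split()
    name, rest = words[0], words[1:]
    flags = [w for w in rest if w.startswith("-")]
    n_args = len(rest) - len(flags)
    token = " ".join([name] + flags)
    if n_args:
        token += f" <{n_args}>"  # <n> records only the argument count
    return token


def tokenize_session(session: str) -> list[str]:
    """Split a ';'-separated session and drop the *SOF*/*EOF* markers."""
    commands = [c.strip() for c in session.split(";")]
    return [tokenize_command(c) for c in commands
            if c and c not in ("*SOF*", "*EOF*")]


session = ("*SOF* ; ls -l ; vi t1.txt ; ps -eaf ; vi t2.txt ; "
           "ls -a /usr/bin/* ; rm -i /home/* ; vi t3.txt t4.txt ; ps -ef ; *EOF*")
tokens = tokenize_session(session)
```

Dropping the argument values and keeping only their count is what lets different editing sessions (vi t1.txt, vi t2.txt) map to the same token.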
Familiarizing with terms used
A sequence s of specified length l is a list of tokens occurring contiguously in the same session of audit data, i.e., s ∈ T^l, where T is the token alphabet.
A cluster c is a collection of sequences of user-initiated command data such that all its sequences are very similar to each other under some similarity measure Sim(), but different from those in other clusters.
If c = {s0, s1, s2, ..., sn-1} is a cluster with n sequences, then the cluster center sc is [formula missing from the source]
A profile p is the set of clusters of sequences of user-initiated command data whose centers characterize the user's behavior. Thus, for user u, [formula missing from the source]
where r and r' are the intra-cluster and inter-cluster similarity thresholds and Sim(s1, s2) is the similarity between two sequences.
Similarity Measure Sim(s1, s2)
Two sequences:
s1 = {vi <1>, ps -eaf, vi <1>, ls -a <1>}
s2 = {vi <1>, ls -a <1>, rm -i <1>, vi <2>}
MCP (Match Count with Polynomial bound): counts the number of slots in which the two sequences have identical tokens. For the example above, MCP is 1.
MCE (Match Count with Exponential bound): a variant of MCP in which the score doubles for each matching slot.
MCAP/MCAE (Match Count with Adjacency reward and Polynomial/Exponential bound): variants of MCP/MCE in which adjacent matches are rewarded.
LCS (Longest Common Subsequence): the length of the longest subsequence of tokens that the two sequences have in common. It is 2 for the sequences above.
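As a rough check of the two worked values, MCP and LCS can be written as follows. This is a minimal sketch: the function names are made up for illustration, and LCS uses the textbook dynamic-programming recurrence, which may differ from how ADMIT implements it.

```python
def mcp(a, b):
    """Match count: number of slots where both sequences hold the same token."""
    return sum(1 for x, y in zip(a, b) if x == y)


def lcs(a, b):
    """Length of the longest common subsequence (classic dynamic programming)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


s1 = ["vi <1>", "ps -eaf", "vi <1>", "ls -a <1>"]
s2 = ["vi <1>", "ls -a <1>", "rm -i <1>", "vi <2>"]

mcp_score = mcp(s1, s2)  # only slot 0 matches
lcs_score = lcs(s1, s2)  # common subsequence: vi <1>, ls -a <1>
```

MCP is position-sensitive, so one inserted command shifts every later slot and destroys the matches; LCS tolerates such shifts, which is why the two scores differ here.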
ADMIT Algorithms
Data Training
  Data pre-processing
  Clustering user sequences
  Cluster refinement
    Merge clusters
    Split clusters
Online Testing
  Real-time data pre-processing
  Similarity search within profile
  Sequence rating
  Sequence classification
Data Training – Data Pre-processing
*SOF* ; ls -l ; vi t1.txt ; ps -eaf ; vi t2.txt ; ls -a /usr/bin/* ; rm -i /home/* ; vi t3.txt t4.txt ; ps -ef ; *EOF*
FeatureSelector parses, cleans, and tokenizes the audit data within each session specified by the ProfileManager.
T = {ti : 0 ≤ i < 8}, where t0 = ls -l, t1 = vi <1>, t2 = ps -eaf, t3 = vi <1>, t4 = ls -a <1>, t5 = rm -i <1>, t6 = vi <2>, and t7 = ps -ef.
FeatureSelector creates sequences of length l. For example, if l = 4, the set of user sequences is S = {si : 0 ≤ i ≤ |T| - l}, where
s0 = {ls -l, vi <1>, ps -eaf, vi <1>}
s1 = {vi <1>, ps -eaf, vi <1>, ls -a <1>}
s2 = {ps -eaf, vi <1>, ls -a <1>, rm -i <1>}
s3 = {vi <1>, ls -a <1>, rm -i <1>, vi <2>}
s4 = {ls -a <1>, rm -i <1>, vi <2>, ps -ef}
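The sequences are a sliding window of width l over the token list. A minimal sketch (the function name is hypothetical):

```python
def make_sequences(tokens, length):
    """Return every contiguous window of `length` tokens, in order."""
    return [tokens[i:i + length] for i in range(len(tokens) - length + 1)]


T = ["ls -l", "vi <1>", "ps -eaf", "vi <1>",
     "ls -a <1>", "rm -i <1>", "vi <2>", "ps -ef"]
S = make_sequences(T, 4)  # |T| - l + 1 = 5 windows: s0 .. s4
```

Consecutive windows overlap in l - 1 tokens, so each new command produces exactly one new sequence.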
Data Training – Clustering User Sequences
Example: with r = 3
Initially Su = Su^a = {s0, s1, s2, s3, s4} and pu = Su^c = ∅.
Say the new center is s0.
For all remaining sequences in Su - Su^c, where Su^c = {s0}, we compute the similarity to the new center s0.
Using LCS as the similarity metric, we get Sim(s1, s0) = 3, since vi <1>, ps -eaf, vi <1> is their LCS.
Similarly, we get Sim(s2, s0) = 2, Sim(s3, s0) = 1, and Sim(s4, s0) = 0.
Since s1 passes the threshold, we add it to the new cluster to get cnew = {s0, s1}.
Therefore the new Su^a = {s2, s3, s4}. Repeating the while loop, we get the profile
pu = {c0 = {s0, s1}, c1 = {s2}, c2 = {s3, s4}}.
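The loop in this example can be sketched as a greedy procedure. This is an illustration, not the DynamicClustering algorithm from the paper: the slides do not say how each new center is chosen, so the `center_order` parameter is a hypothetical stand-in, and the function and variable names are made up. With centers tried in the order s0, s4, s2 the sketch yields the same partition as the example; other orders give different partitions.

```python
def lcs(a, b):
    """Length of the longest common subsequence of two token lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            table[i][j] = (table[i - 1][j - 1] + 1 if x == y
                           else max(table[i - 1][j], table[i][j - 1]))
    return table[-1][-1]


def dynamic_clustering(seqs, r, center_order):
    """Greedy clustering sketch.

    seqs: dict mapping a sequence name to its token list.
    center_order: names in the order they are tried as centers
                  (assumption; the slides leave center choice open).
    Returns a list of (center_name, member_names) pairs.
    """
    unassigned = list(seqs)
    profile = []
    for center in center_order:
        if center not in unassigned:
            continue  # already absorbed by an earlier cluster
        # A sequence joins the cluster when Sim(s, center) >= r.
        members = [n for n in unassigned
                   if n == center or lcs(seqs[n], seqs[center]) >= r]
        profile.append((center, members))
        unassigned = [n for n in unassigned if n not in members]
    return profile


T = ["ls -l", "vi <1>", "ps -eaf", "vi <1>",
     "ls -a <1>", "rm -i <1>", "vi <2>", "ps -ef"]
seqs = {f"s{i}": T[i:i + 4] for i in range(5)}

profile = dynamic_clustering(seqs, r=3, center_order=["s0", "s4", "s2"])
```

Because the result depends on center order, the refinement step (merging and splitting) matters in practice: it corrects partitions that an unlucky choice of centers produces.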
Data Training – Cluster Refinement
Purpose of cluster refinement:
  Setting the intra-cluster similarity threshold r may require experimentation
  A cluster may have a lot in common with another cluster
  A cluster may contain larger sub-clusters
Algorithms: merge clusters, split clusters
Example
From above, pu = {c0, c1, c2} and r' = 2.
Using LCS, Sim(c0, c1) = Sim(s0, s2) = 2.
In this case, the two clusters are merged to get c0 = {s0, s1, s2}.
c1 is then deleted from the profile, and the center of c0 becomes s1.
For clusters that have high support, SplitClusters calls DynamicClustering to re-cluster them into smaller, higher-density clusters.
Online Testing – Real-Time Data Pre-processing
Testing must happen in an online manner, as the user sequences are produced
Example sequence: *SOF* ; vi t4.txt ; vi t4.txt ; vi t4.txt ; ls -a /home/* ; rm -i /home/turbo/tmp/* ; ls -a /home/* ; vi t2.txt t4.txt ; ps -ef ;
Right padding is done in the absence of complete sequences
Tokenizing:
T' = {t'i : 0 ≤ i < 8}, where t'0 = vi <1>, t'1 = vi <1>, t'2 = vi <1>, t'3 = ls -a <1>, t'4 = rm -i <1>, t'5 = ls -a <1>, t'6 = vi <2>, and t'7 = ps -ef.
For l = 4, S' = {s'i : 0 ≤ i ≤ |T'| - l}:
s'0 = {vi <1>, vi <1>, vi <1>, ls -a <1>}
s'1 = {vi <1>, vi <1>, ls -a <1>, rm -i <1>}
s'2 = {vi <1>, ls -a <1>, rm -i <1>, ls -a <1>}
s'3 = {ls -a <1>, rm -i <1>, ls -a <1>, vi <2>}
s'4 = {rm -i <1>, ls -a <1>, vi <2>, ps -ef}
Online Testing – Profile Search
For each sequence s'i, find the most similar cluster in pu
The similarity between a sequence s'i and a profile pu is
Sim(s'i, pu) = max over clusters cj in pu of Sim(s'i, scj)
Example
pu = {c0 = {s0, s*1, s2}, c1 = {s*3, s4}}
(cluster centers are indicated with '*').
Then Sim(s'0, pu) = max(Sim(s'0, sc0), Sim(s'0, sc1)) = max(Sim(s'0, s1), Sim(s'0, s3)) = max(3, 2) = 3.
Similarly, Sim(s'1, pu) = 3, Sim(s'2, pu) = 3, Sim(s'3, pu) = 3, and Sim(s'4, pu) = 2.
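The profile search compares each test sequence only against the cluster centers. The sketch below recomputes the five similarities from the example with LCS as Sim(); the helper names are illustrative and the centers s1 and s3 are taken from the starred entries above.

```python
def lcs(a, b):
    """Length of the longest common subsequence of two token lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            table[i][j] = (table[i - 1][j - 1] + 1 if x == y
                           else max(table[i - 1][j], table[i][j - 1]))
    return table[-1][-1]


def profile_similarity(seq, centers):
    """Sim(seq, profile): best similarity to any cluster center."""
    return max(lcs(seq, center) for center in centers)


# Cluster centers s1 and s3 from the training example.
centers = [
    ["vi <1>", "ps -eaf", "vi <1>", "ls -a <1>"],    # s1, center of c0
    ["vi <1>", "ls -a <1>", "rm -i <1>", "vi <2>"],  # s3, center of c1
]

# Test tokens T' and their length-4 windows s'0 .. s'4.
T_test = ["vi <1>", "vi <1>", "vi <1>", "ls -a <1>",
          "rm -i <1>", "ls -a <1>", "vi <2>", "ps -ef"]
test_seqs = [T_test[i:i + 4] for i in range(5)]

similarities = [profile_similarity(s, centers) for s in test_seqs]
```

Comparing against centers only keeps the per-sequence cost proportional to the number of clusters rather than the number of training sequences.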
Online Testing – Sequence Rating
Noisy data leads to high false positive rates
Past sequences are used to test whether the present sequence is noise or a true change in profile
LAST_n
The arithmetic mean of the similarity of the last n sequences
For the five new sequences, using this rating metric with n = 3, we get the following ratings: R0 = R1 = R2 = R3 = 3, and R4 = 8/3 = 2.67
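A small sketch of the LAST_n rating, using the similarities 3, 3, 3, 3, 2 from the profile search example. How the first n - 1 ratings are computed is not stated on the slide; averaging over however many sequences are available so far is an assumption, and the function name is hypothetical.

```python
def last_n_ratings(similarities, n):
    """Rate each sequence by the mean similarity of the last n sequences.

    Assumption: before n sequences have been seen, the mean is taken
    over the sequences available so far.
    """
    ratings = []
    for j in range(len(similarities)):
        window = similarities[max(0, j - n + 1): j + 1]
        ratings.append(sum(window) / len(window))
    return ratings


similarities = [3, 3, 3, 3, 2]
ratings = last_n_ratings(similarities, n=3)
```

A larger n smooths out isolated typing errors but delays the reaction to a genuine change in behavior.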
Online Testing – Sequence Rating
WEIGHTED
The weighted mean of the last rating and the current sequence's similarity. The rating Rj for the jth sequence is calculated as
Rj = α * Sim(s'j, pu) + (1 - α) * Rj-1, where R0 = Sim(s'0, pu). For example, if α = 0.33, then R0 = R1 = R2 = R3 = 3, and R4 = 2.66.
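The WEIGHTED rating is an exponentially weighted running mean. A minimal sketch with a hypothetical function name, where `alpha` is the weight given to the current sequence (0.33 in the example):

```python
def weighted_ratings(similarities, alpha):
    """R0 = Sim(s'0, pu); Rj = alpha * Sim(s'j, pu) + (1 - alpha) * R(j-1)."""
    ratings = [float(similarities[0])]
    for sim in similarities[1:]:
        ratings.append(alpha * sim + (1 - alpha) * ratings[-1])
    return ratings


similarities = [3, 3, 3, 3, 2]
ratings = weighted_ratings(similarities, alpha=0.33)
```

With a weight of exactly 0.33 the last rating evaluates to about 2.67; the 2.66 quoted in the example presumably comes from a weight of 1/3 with the result truncated. Either way it falls below the threshold of 2.7 used later.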
DECAYED_WEIGHTS
A variant of WEIGHTED in which α is varied according to the sequence number. The rating Rj for the jth sequence is calculated as [formula missing from the source]
E.g. if y = 4100 and z = 7500, then R0 = R1 = R2 = R3 = 3, and R4 = 2.66.
Online Testing: Prediction (Normal Vs Anomaly)
Normal means the true user; anomaly means a possible masquerader
The prediction is based on the sequence rating Rj for sequence sj
Normal sequences
TACCEPT is the lower accept threshold
If the sequence rating > TACCEPT, then the user is normal
E.g., with TACCEPT = 2.7 and the WEIGHTED rating metric (α = 0.33), no alarm is raised for s'0, since R0 = 3 > 2.7.
Similarly, s'1, s'2, and s'3 are all normal;
they are assigned to the nearest profile cluster, e.g., c0 = {s0, s*1, s2, s'0, s'1} and c1 = {s*3, s4, s'2, s'3}
Cluster centers are recalculated
Online Testing: Prediction (Normal Vs Anomaly)
Anomalous sequences: sequences that fail the TACCEPT test
E.g., for s'4, R4 = 2.66 < 2.7, so a Type A alarm is raised
Possible reasons: noise (typing errors), concept drift (change of project), or a truly anomalous sequence
The larger the number of anomalous sequences in near succession, the more suspicious the identity of the user
The anomalous sequences are clustered to get a better estimate of the behavioral change
A Type B alarm is raised if the cluster size crosses a certain threshold Tcluster
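The two alarm levels can be illustrated with a simplified classifier. This sketch is not the ADMIT classification algorithm: instead of clustering the anomalous sequences it keeps a single running count as a stand-in for the anomalous cluster size, and it assumes a Type B alarm replaces the Type A alarm once the count exceeds the threshold. The names `classify`, `t_accept`, and `t_cluster` are hypothetical.

```python
def classify(ratings, t_accept, t_cluster):
    """Label each sequence 'normal', 'type_a', or 'type_b'.

    Simplification: one running count of anomalous sequences stands in
    for the size of the anomalous cluster.
    """
    labels = []
    anomalous = 0
    for rating in ratings:
        if rating > t_accept:
            labels.append("normal")
        else:
            anomalous += 1
            # Escalate once enough anomalous sequences have accumulated.
            labels.append("type_b" if anomalous > t_cluster else "type_a")
    return labels


labels = classify([3, 3, 3, 3, 2.66], t_accept=2.7, t_cluster=2)
```

Separating the two levels lets isolated typing errors raise only the weaker Type A alarm, while sustained deviation escalates to Type B.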
Incremental Clustering Algorithm
Initially pu = {c0, c1}, S''a = ∅, and Su^c = {s1, s3}
Since R4 = 2.66 < 2.7, s'i = s'4
Assign s'4 to S''a, and
pu = pu ∪ {c2 = {s'4}}
After testing, pu becomes
pu = {c0 = {s0, s*1, s2, s'0, s'1}, c1 = {s*3, s4, s'2, s'3}, c2 = {s'4}}
Results
The system achieves approximately 80% detection rate and 15% false positive rate
The security analyst needs to go through only the anomalous clusters instead of vast amounts of audit data
Integrated Access Control and Intrusion Detection for Web Servers
Problems faced by Web servers:
  Stealing and destroying data
  Denying user access
  Changing website content to embarrass organizations
  Subverting Web servers through vulnerable CGI scripts
  Denial of Service (DoS) attacks
Traditional access control systems were not designed to detect attacks and adjust their behavior to take corrective action.
Separate components such as firewalls, IDSs, and code integrity checkers do not fully address a Web server's security needs.
This approach supports access control policies extended with the capability of identifying intrusions and responding to them in real time.
Generic Authorization and Access Control API
Supports fine grained access control and application level intrusion detection and response
Evaluates HTTP requests and determines whether the requests are allowed and if they represent a threat according to a policy.
Provides general-purpose execution environment in which EACLs are evaluated
Policy enforcement in three phases:
  Before the requested operation starts (is the operation authorized?)
  During execution of the authorized operation (detect malicious behavior during execution)
  After the operation completes (logging and notification of whether the operation succeeded or failed)
Responds to a suspected intrusion in real time, before it causes damage
Can be easily integrated with different applications: Apache Web server, SOCKS5, sshd, and FreeS/WAN IPsec for Linux.
Policy Representation - EACL
EACL: Extended Access Control List
Simple policy language designed to describe user-level authorization policy
An EACL is associated with an object to be protected
Specifies positive and negative access rights on the object
Also has an optional set of associated conditions
Types of conditions:
  Pre-conditions: what must be true in order to grant the request
  Request-result conditions: what must be activated whether the request is granted or denied
  Mid-conditions: what must be true during execution of the requested operation
  Post-conditions: what must happen after the operation completes
An EACL entry consists of a positive or negative access right and four condition blocks: a set of pre-conditions ……
EACL Syntax
An EACL is specified according to the following format:
eacl ::= {eacl_entry}
eacl_entry ::= pos_access_right conditions | neg_access_right conditions
pos_access_right ::= "pos_access_right" def_auth value
neg_access_right ::= "neg_access_right" def_auth value
conditions ::= pre_conds mid_conds rr_conds post_conds
pre_conds ::= {condition}
mid_conds ::= {condition}
rr_conds ::= {condition}
post_conds ::= {condition}
condition ::= cond_type def_auth value
cond_type ::= alphanumeric_string
def_auth ::= alphanumeric_string
value ::= alphanumeric_string
cond_type: type of the condition
def_auth: authority responsible for defining the value within cond_type
value: value of the condition
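To make the grammar concrete, here is a small reader for EACL text. It is a sketch under stated assumptions and has nothing to do with the GAA-API's own parser: the function name and dictionary layout are invented, lines starting with '#' are treated as comments, and the condition block is inferred from the cond_type prefix (pre, mid, rr, post).

```python
def parse_eacl(text):
    """Parse EACL text into a list of entries.

    Each non-comment line is read as three fields:
    keyword-or-cond_type, def_auth, value.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: '#' starts a comment line
        kind, def_auth, value = line.split(None, 2)
        if kind in ("pos_access_right", "neg_access_right"):
            # An access right opens a new entry with four condition blocks.
            entries.append({"right": kind, "def_auth": def_auth,
                            "value": value,
                            "pre": [], "mid": [], "rr": [], "post": []})
        else:
            block = kind.split("_", 1)[0]  # 'pre', 'mid', 'rr' or 'post'
            entries[-1][block].append((kind, def_auth, value))
    return entries


policy = """
# EACL entry 4
pos_access_right test host_check_status
pre_cond_location IPsec 10.1.1.0-10.1.200.255
"""
entries = parse_eacl(policy)
```

A line with fewer than three fields raises ValueError in this sketch; a real parser would need to report such errors more gracefully.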
EACL Example: Access to host
# EACL entry 1
neg_access_right test host_login
pre_cond_access_id KerberosV.5
# EACL entry 2
pos_access_right test host_login
pre_cond_location IPsec 10.1.1.0-10.1.200.255
pre_cond_access_id X509 "/C=US/O=Trusted/OU=orgb.edu/CN=partnerB"
pre_cond_threshold local <3 failures/day/failed_log/
rr_cond_update_log local on:failure/failed_log/info:userID
mid_cond_duration local ≤ 8hrs
# EACL entry 3
pos_access_right test host_login
pre_cond_location IPsec 10.1.1.0-10.1.200.255
pre_cond_access_id KerberosV.5 [email protected]
pre_cond_threshold local <3 failures/day/failed_log/
rr_cond_update_log local on:failure/failed_log/info:userID
mid_cond_duration local < 8hrs
# EACL entry 4
pos_access_right test host_check_status
pre_cond_location IPsec 10.1.1.0-10.1.200.255
# EACL entry 5
pos_access_right test host_shut_down
pre_cond_access_id KerberosV.5 [email protected] cond_audit local on:success/info:userID
post_cond_notify local email/to:sysadmin/on:failure
EACL Policy Composition and Modules in GAA
Policy composition:
  The process of relating separately specified policies
  System-wide policy and local policy are merged
  The system-wide policy specifies a composition mode that describes how local policies are to be composed with it:
    Expand: disjunction of rights
    Narrow: conjunction of rights
    Stop: local policies are ignored
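The three composition modes reduce to simple Boolean combinations when each policy's outcome is modeled as a single allow/deny decision. That modeling is a simplification for illustration, and the function name and mode strings are hypothetical (the example policies later in the deck select the mode by number via eacl_mode).

```python
def compose(system_allows, local_allows, mode):
    """Combine a system-wide and a local decision under a composition mode."""
    if mode == "expand":
        return system_allows or local_allows   # disjunction of rights
    if mode == "narrow":
        return system_allows and local_allows  # conjunction of rights
    if mode == "stop":
        return system_allows                   # local policy is ignored
    raise ValueError(f"unknown composition mode: {mode}")


# A local policy grants access that the system-wide policy does not.
expand_result = compose(False, True, "expand")
narrow_result = compose(False, True, "narrow")
stop_result = compose(False, True, "stop")
```

Narrow is the conservative choice: a local administrator can tighten the system-wide policy but never loosen it.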
GAA Modules Access Control Detector Countermeasure handler
Security Database
GAA-API and IDS Interaction
"GAA-API to IDS" interaction: the GAA-API reports
  Ill-formed access requests
  Access requests with abnormal parameters
  Denied accesses
  Exceeded thresholds
  Incidents and suspicious application behavior
  Legitimate activity (for creating and updating user profiles)
"IDS to GAA-API" interaction: can be used for updating policies and adjusting policy values such as thresholds, times, and locations.
GAA-API and Apache Integration
Apache Access Control
.htaccess file:
Order Deny,Allow
Deny from All
Allow from 10.0.0.0/255.0.0.0
AuthType Basic
AuthUserFile /usr/local/apache2/.htpasswd-isi-staff
Require valid-user
Satisfy All
Access request --> check access control policies
Outputs: HTTP_OK, HTTP_DECLINED, HTTP_AUTHREQUIRED
GAA-API to Enhance the Access Control of Apache Server
Apache does not support fine-grained policies, such as which users or user groups are allowed access from which locations, and it does not support other conditions such as time, threat level, or system load.
GAA-Apache Access Control
Makes use of system-wide and local policy and configuration files
Three status values are returned to describe the policy enforcement process:
  Authorization status Sa indicates whether the request is authorized (GAA_YES), not authorized (GAA_NO), or uncertain (GAA_MAYBE)
  Mid-condition enforcement status Sm indicates the status of the mid-conditions
  Post-condition enforcement status Sp indicates the status of the post-conditions
Policy evaluation happens in four phases, as in the figure
Translation of Sa to Apache format:
  GAA_YES    -> HTTP_OK
  GAA_NO     -> HTTP_DECLINED
  GAA_MAYBE  -> HTTP_AUTHREQUIRED
Examples
When the system threat level is higher than low, lock down the system and require user authentication for all accesses within the network
System-wide policy:
eacl_mode 1 # composition mode narrow
# EACL entry 1
neg_access_right * *
pre_cond_system_threat_level local =high
Local policy:
# EACL entry 1
pos_access_right apache *
pre_cond_system_threat_level local >low
pre_cond_accessID_USER apache *
Prevention of penetration and/or surveillance attacks by detecting CGI script abuse
System-wide policy:
eacl_mode 1 # composition mode narrow
# EACL entry 1
neg_access_right * *
pre_cond_accessID_GROUP local BadGuys
Local policy:
# EACL entry 1
neg_access_right apache *
pre_cond_regex gnu "'*phf*' 'test-cgi*'"
rr_cond_notify local on:failure/email/sysadmin/info:CGIexploit
rr_cond_update_log local on:failure/BadGuys/info:IP
# EACL entry 2
pos_access_right apache *
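The effect of the CGI-abuse policy can be mimicked in a few lines. This is a behavioral sketch only, not GAA-API code: the function name is invented, the two patterns are treated as shell-style globs matched against the last path component (an assumption about how the policy's regex condition is applied), and the sysadmin notification is left out.

```python
from fnmatch import fnmatchcase

# Patterns from the local policy's pre_cond_regex condition.
CGI_PATTERNS = ["*phf*", "test-cgi*"]


def check_request(path, client_ip, bad_guys):
    """Return 'denied' or 'allowed' and update the BadGuys set on abuse."""
    if client_ip in bad_guys:
        return "denied"  # system-wide policy: members of BadGuys are denied
    script = path.rsplit("/", 1)[-1]  # assumption: match the last component
    if any(fnmatchcase(script, pattern) for pattern in CGI_PATTERNS):
        bad_guys.add(client_ip)  # rr_cond_update_log: record the offender
        return "denied"
    return "allowed"             # local entry 2: everything else is allowed


bad_guys = set()
first = check_request("/cgi-bin/phf?Qalias=x", "10.0.0.7", bad_guys)
second = check_request("/index.html", "10.0.0.7", bad_guys)
third = check_request("/index.html", "10.0.0.8", bad_guys)
```

The second request shows the response part of the policy: once an address has probed for a vulnerable script, every later request from it is denied, even for harmless pages.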
Conclusions
Traditional access control mechanisms have little ability to support or respond to the detection of attacks.
A generic authorization framework has been developed that supports security policies able to detect attempted and actual security breaches and to respond actively by modifying security policies dynamically.
The GAA-API implementation is available at http://gaaapi.sysproject.info.