Information-Theoretic Measures for Anomaly Detection. Wenke Lee and Dong Xiang (North Carolina State University). IEEE Security and Privacy, 2001. Speaker: Chang Huan Wu, 2009/4/14.
Information-Theoretic Measures for Anomaly Detection
Wenke Lee and Dong Xiang (North Carolina State University)
IEEE Security and Privacy, 2001
Speaker: Chang Huan Wu
2009/4/14
2
Outline
Introduction
Information-Theoretic Measures
Case Studies
Conclusions
3
Introduction (1/2)
Misuse detection
– Use the “signatures” of known attacks
Anomaly detection
– Use established normal profiles
The basic premise for anomaly detection: There is regularity in audit data that is consistent with the normal behavior and thus distinct from the abnormal behavior
4
Introduction (2/2)
Most anomaly detection models are built based solely on “expert” knowledge or intuition
Goal of the paper: provide theoretical foundations and useful tools that facilitate the IDS development process and improve the effectiveness of ID technologies
5
Information-Theoretic Measures (1/7)
Entropy
Use entropy as a measure of the regularity of audit data
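The slide's formula did not survive extraction. As a minimal sketch, assuming the standard Shannon definition H(X) = -sum over x of P(x) * log2 P(x) with P estimated from event frequencies, entropy can be computed as below. The event names are hypothetical; lower entropy means more regular audit data.

```python
import math
from collections import Counter

def entropy(events):
    """Shannon entropy, in bits, of the empirical distribution of `events`."""
    counts = Counter(events)
    total = len(events)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical audit events: one regular dataset, one irregular dataset.
regular = ["open", "read", "read", "read", "read", "read", "read", "close"]
mixed = ["open", "read", "write", "close", "stat", "mmap", "exec", "fork"]

print(entropy(regular), entropy(mixed))
```

The regular dataset, dominated by one event, scores lower than the dataset in which every event is different.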
6
Information-Theoretic Measures (2/7)
Conditional Entropy
Let X be a collection of sequences, each of the form (e1, e2, …, en-1, en), where each ei is an audit event; let Y be the collection of subsequences, each of the form (e1, e2, …, ek), with k < n
H(X | Y) tells us how much uncertainty remains about the rest of the audit events in a sequence x after we have seen y
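A minimal sketch of this measure, assuming the standard definition H(X | Y) = -sum of P(x, y) * log2 P(x | y) and assuming sequences are obtained with a sliding window over a trace (the helper names and the toy traces are hypothetical). Because y is a prefix of x, P(x, y) = P(x) and P(x | y) = P(x) / P(y).

```python
import math
from collections import Counter

def sliding_windows(trace, n):
    """All length-n windows of consecutive events in `trace`."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def conditional_entropy(trace, n, k=None):
    """H(X | Y) in bits: X = length-n windows, Y = their length-k prefixes."""
    if k is None:
        k = n - 1
    windows = sliding_windows(trace, n)
    x_counts = Counter(windows)
    y_counts = Counter(w[:k] for w in windows)
    total = len(windows)
    # P(x, y) = count(x) / total and P(x | y) = count(x) / count(y)
    return -sum((c / total) * math.log2(c / y_counts[x[:k]])
                for x, c in x_counts.items())

periodic = list("abc") * 10     # each event fully determines the next one
branching = list("abacabac")    # after "a" comes either "b" or "c"

print(conditional_entropy(periodic, 2), conditional_entropy(branching, 2))
```

The fully predictable trace has zero conditional entropy, while the branching trace leaves some uncertainty about the next event.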
7
Information-Theoretic Measures (3/7)
Relative Entropy
Relative entropy measures the distance of the regularities between two datasets
– For example, the training dataset and the testing dataset
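As a hedged illustration, assuming the standard definition (Kullback-Leibler divergence) sum over x of p(x) * log2(p(x) / q(x)), relative entropy between two event datasets can be sketched as follows. The `floor` parameter for events unseen in q is an assumption added here to avoid division by zero; it is not from the slides. Note that the measure is not symmetric, so "distance" is used loosely.

```python
import math
from collections import Counter

def distribution(events):
    """Empirical probability of each distinct event."""
    counts = Counter(events)
    total = len(events)
    return {e: c / total for e, c in counts.items()}

def relative_entropy(p, q, floor=1e-6):
    """sum_x p(x) * log2(p(x) / q(x)); q(x) is floored for unseen events."""
    return sum(px * math.log2(px / max(q.get(x, 0.0), floor))
               for x, px in p.items())

# Hypothetical training and testing datasets.
train = distribution(["a", "b", "a", "b"])
test = distribution(["a", "b", "b", "b"])

print(relative_entropy(train, train), relative_entropy(train, test))
```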
8
Information-Theoretic Measures (4/7)
When we use conditional entropy to measure the regularity of sequential dependencies, we can use relative conditional entropy to measure the distance between two audit datasets
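A sketch of relative conditional entropy, assuming the definition sum of p(x, y) * log2(p(x | y) / q(x | y)), with x a length-n window and y its length n-1 prefix. The sliding-window construction, the `floor` for transitions unseen in q, and the toy traces are all assumptions for illustration.

```python
import math
from collections import Counter

def window_counts(trace, n):
    """Counts of length-n windows, counts of their prefixes, and window total."""
    windows = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    return Counter(windows), Counter(w[:-1] for w in windows), len(windows)

def relative_conditional_entropy(p_trace, q_trace, n, floor=1e-6):
    """sum_{x,y} p(x,y) * log2(p(x|y) / q(x|y)), y = first n-1 events of x."""
    px, py, p_total = window_counts(p_trace, n)
    qx, qy, _ = window_counts(q_trace, n)
    result = 0.0
    for x, c in px.items():
        y = x[:-1]
        p_cond = c / py[y]
        q_cond = qx[x] / qy[y] if qy[y] else 0.0
        result += (c / p_total) * math.log2(p_cond / max(q_cond, floor))
    return result

p_trace = list("abababab")
q_trace = list("abacabac")

print(relative_conditional_entropy(p_trace, p_trace, 2))
print(relative_conditional_entropy(p_trace, q_trace, 2))
```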
9
Information-Theoretic Measures (5/7)
Intrusion detection can be cast as a classification problem
When constructing a classifier, a classification algorithm needs to search for features with high information gain
– When the dataset is partitioned according to the values of such a feature, the subsets have lower entropy
10
Information-Theoretic Measures (6/7)
Information Gain
11
Information Gain (worked example)
H(X) = -((4/16)*log2(4/16) + (12/16)*log2(12/16)) = 0.8113
E(age) = (6/16)*H(<35) + (10/16)*H(>35) = 0.7946
Gain(age) = H(X) - E(age) = 0.0167
Gain(age) = 0.0167, Gain(gender) = 0.0972, Gain(household income) = 0.0177
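The worked example can be checked with a short sketch. It assumes the usual definition Gain(X, A) = H(X) - sum over values v of (|Xv| / |X|) * H(Xv). The slide gives only the overall 4/12 class split and the 6/10 split on the age attribute (年齡); the per-subset class counts used below (1 of 6 under 35, 3 of 10 over 35) and the "yes"/"no" labels are assumptions, chosen because they reproduce the slide's 0.7946.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """H(labels) minus the size-weighted entropy of each feature-value subset."""
    total = len(labels)
    subsets = {}
    for label, value in zip(labels, feature_values):
        subsets.setdefault(value, []).append(label)
    expected = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - expected

# Assumed class counts per age group (not stated on the slide).
labels = ["yes"] * 1 + ["no"] * 5 + ["yes"] * 3 + ["no"] * 7
age = ["<35"] * 6 + [">35"] * 10

print(round(entropy(labels), 4))                # 0.8113
print(round(information_gain(labels, age), 4))  # 0.0167
```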
12
Information-Theoretic Measures (7/7)
Intuitively, the more information we have, the better the detection performance
– There is always a cost for any gain
We can define information cost as the average time for processing an audit record and checking against the detection model
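One way to estimate information cost as defined above is to time a detection function over a batch of records and divide by the batch size. This is only a sketch: the `check` callable stands in for whatever detection model is used, and the records are placeholders.

```python
import time

def information_cost(records, check):
    """Average wall-clock seconds spent checking one audit record."""
    if not records:
        raise ValueError("need at least one record")
    start = time.perf_counter()
    for record in records:
        check(record)
    return (time.perf_counter() - start) / len(records)

# Placeholder records and a trivial stand-in for the detection model.
cost = information_cost(list(range(1000)), lambda record: record % 7 == 0)
print(f"{cost:.2e} seconds per record")
```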
13
UNM sendmail System Call Data (1/6)
University of New Mexico (UNM) sendmail system call data
Each trace contains the consecutive system calls made by the run-time processes
The first 80% of the traces were used as the training data and the last 20% as part of the testing data
14
UNM sendmail System Call Data (2/6)
H(length-n sequences | subsequences of length n-1)
– Measures the regularity of how the first n-1 system calls determine the n-th system call
=> Conditional entropy drops as sequence length increases
15
UNM sendmail System Call Data (3/6)
For normal data, the trend of misclassification rate coincides with the trend of conditional entropy
16
UNM sendmail System Call Data (4/6)
Misclassification rates for the intrusion traces are much higher
This suggests that we can use the range of the misclassification rate as the indicator of whether a given trace is normal or abnormal (intrusion)
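To make the idea concrete, here is a sketch in which a simple lookup table (most frequent n-th call for each n-1 prefix) stands in for the classifier; the slides do not show the classifier that was actually trained, and the system call names and traces below are hypothetical. The misclassification rate is the fraction of windows whose n-th call is predicted incorrectly.

```python
from collections import Counter, defaultdict

def windows(trace, n):
    """All length-n windows of consecutive system calls."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def train(traces, n):
    """Map each length n-1 prefix to its most frequent following call."""
    table = defaultdict(Counter)
    for trace in traces:
        for w in windows(trace, n):
            table[w[:-1]][w[-1]] += 1
    return {prefix: nxt.most_common(1)[0][0] for prefix, nxt in table.items()}

def misclassification_rate(model, trace, n):
    """Fraction of windows whose last call the model predicts wrongly."""
    ws = windows(trace, n)
    errors = sum(1 for w in ws if model.get(w[:-1]) != w[-1])
    return errors / len(ws)

normal = ["open", "read", "write", "close"] * 20
intrusion = ["open", "exec", "write", "fork"] * 20

model = train([normal], 3)
normal_rate = misclassification_rate(model, normal, 3)
intrusion_rate = misclassification_rate(model, intrusion, 3)
print(normal_rate, intrusion_rate)
```

A trace whose rate falls well outside the range observed on normal data would then be flagged as abnormal; where to place that threshold is left open here.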
17
UNM sendmail System Call Data (5/6)
The more the training and testing normal datasets differ, the higher the misclassification rate on the testing normal data
18
UNM sendmail System Call Data (6/6)
The cost is a linear function of the sequence length
Length ↑, accuracy ↑, but cost also ↑
19
MIT Lincoln Lab sendmail BSM Data (1/6)
BSM data developed and distributed by MIT Lincoln Lab for the 1999 DARPA evaluation
Each audit record corresponds to a system call made by sendmail
– Contains additional information (e.g., user and group IDs, the obj name)
20
MIT Lincoln Lab sendmail BSM Data (2/6)
UNM data: (s1, s2, …, sl)
BSM data
– so: (s1_o1, s2_o2, …, sl_ol)
– s-o: (s1, o1, s2, o2, …, sl, ol)
– s: system call, o: obj name (system, user, or other)
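The two BSM encodings can be sketched as simple transformations of (system call, obj name) pairs. The function names and the sample records are hypothetical.

```python
def to_so(pairs):
    """'so' encoding: fuse each system call with its obj name into one token."""
    return [f"{s}_{o}" for s, o in pairs]

def to_s_o(pairs):
    """'s-o' encoding: interleave system calls and obj names as separate tokens."""
    flat = []
    for s, o in pairs:
        flat.extend([s, o])
    return flat

# Hypothetical audit records: (system call, obj name category).
records = [("open", "system"), ("read", "user"), ("close", "other")]

print(to_so(records))   # ['open_system', 'read_user', 'close_other']
print(to_s_o(records))  # ['open', 'system', 'read', 'user', 'close', 'other']
```

The s-o form doubles the sequence length for the same number of audit records, while the so form keeps the length and enlarges the token vocabulary instead.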
21
MIT Lincoln Lab sendmail BSM Data (3/6)
Conditional entropy drops as sequence length increases
22
MIT Lincoln Lab sendmail BSM Data (4/6)
For in-bound mails, the testing data have clearly higher misclassification rates than the training data
23
MIT Lincoln Lab sendmail BSM Data (5/6)
Out-bound mails have much smaller relative conditional entropy than in-bound mails
24
MIT Lincoln Lab sendmail BSM Data (6/6)
Although performance with the obj name is slightly better, once cost is considered it is actually better to use the system call name only
25
MIT Lincoln Lab Network Data (1/4)
tcpdump data developed and distributed by MIT Lincoln Lab for the 1998 DARPA evaluation
Each record describes a connection using the following features: timestamp, duration, source port, source host, service…
26
MIT Lincoln Lab Network Data (2/4)
Destination host was used for partitioning the data into per-host subsets
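Partitioning connection records by destination host can be sketched as a group-by. The dictionary keys (`dst_host`, `service`, `duration`) and the sample records are assumptions for illustration; the actual record layout is not shown in the slides.

```python
from collections import defaultdict

def partition_by_host(records, key="dst_host"):
    """Group connection records into one subset per destination host."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record[key]].append(record)
    return dict(subsets)

# Hypothetical connection records.
connections = [
    {"dst_host": "10.0.0.1", "service": "smtp", "duration": 2},
    {"dst_host": "10.0.0.2", "service": "http", "duration": 1},
    {"dst_host": "10.0.0.1", "service": "smtp", "duration": 3},
]

per_host = partition_by_host(connections)
print({host: len(subset) for host, subset in per_host.items()})
```

A separate model can then be built from each per-host subset.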
27
MIT Lincoln Lab Network Data (3/4)
We can see from the figure that intrusion datasets have much higher misclassification rates
Models from the (more) partitioned datasets have much better performance
28
MIT Lincoln Lab Network Data (4/4)
Conditional entropy decreases as window size grows
29
Conclusion
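The paper proposes using information-theoretic measures (entropy, conditional entropy, relative entropy, relative conditional entropy, information gain, and information cost) for anomaly detection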
30
Comments
Provides theoretical foundations and backs its claims with quantitative results
Plentiful experimental results