Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Automatically Evading Classifiers A Case Study on PDF Malware Classifiers
Weilin Xu David Evans Yanjun Qi
University of Virginia
Machine Learning is Solving Our Problems
2
Fake
Spam IDS MalwareFake Accounts
…
…
3
4
Machine Learning is Eating the World
Data
Scientist
Security
Expert
5
?
Machine Learning is Eating the World
Data
Scientist
Security
Expert
6
No! Security is different.
Goal: Understand classifiers under attack. Results: Vulnerable to automated evasion.
Security Tasks are Different: Adversary Adapts
7
Building Machine Learning Classifiers
8
Trained ClassifierLabelledTraining
Data
MLAlgorithm
Training (Supervised Learning)
FeatureExtraction
Vectors
Assumption: Training Data is Representative
9
LabelledTraining
Data
MLAlgorithm
FeatureExtraction
Vectors
Deployment
Malicious / Benign
Operational Data
Trained Classifier
Training (Supervised Learning)
Results: Evaded PDF Malware ClassifiersPDFrate*
[ACSAC’12]Hidost
[NDSS’13]
Accuracy 0.9976 0.9996
False Negative Rate 0.0000 0.0056
False Negative Rate with Adversary 1.0000 1.0000
10
* Mimicus [Oakland ’14], an open source reimplementation of PDFrate.
Results: Evaded PDF Malware ClassifiersPDFrate*
[ACSAC’12]Hidost
[NDSS’13]
Accuracy 0.9976 0.9996
False Negative Rate 0.0000 0.0056
False Negative Rate with Adversary 1.0000 1.0000
11
Very robust against “strongest conceivable mimicry attack”.
* Mimicus [Oakland ’14], an open source reimplementation of PDFrate.
Variants
12
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
Variants
13
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
/Catalog /Pages
0
/JavaScript
eval(‘…’);
/RootModifiedParser
Extract Me If You Can: Abusing PDF Parsers in Malware Detectors
Curtis Carmony,et al.
Variants
14
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
/Catalog /Pages
0
/JavaScript
eval(‘…’);
/Root
Mutation
Variants From Benign
Insert / Replace / Delete
Variants
15
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
/Catalog /Pages
0
/JavaScript
eval(‘…’);
/Root
Mutation
Variants From Benign
128
546
0
0
Insert / Replace / Delete
Variants
16
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
/Catalog /Pages
0
/JavaScript
eval(‘…’);
/Root
Mutation
Variants From Benign
128
546
0
0
Insert / Replace / Delete
Variants
17
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
/Catalog /Pages
0
/JavaScript
eval(‘…’);
/Root
Mutation
Variants From Benign
128
546
0
0
128
0
Insert / Replace / Delete
Variants
18
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
/Catalog /Pages
0
/JavaScript
eval(‘…’);
/Root
Mutation
Variants From Benign
128
0
Insert / Replace / Delete
Variants
19
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
/Catalog /Pages
0
/JavaScript
eval(‘…’);
/Root
Mutation
Variants From Benign
128
0
Insert / Replace / Delete
Variants
20
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
Variants
21
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
Fitness Function
Oracle
Target Classifier
f(x)
Malicious?
Score
Fitness ScoreVariants
Variants
22
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
Fitness Function
Oracle
Target Classifier
f(x)
Malicious?
Score
Fitness ScoreVariants
Malicious
Benign
Variants
23
Clone
Benign PDFsMalicious PDF
Mutation
01011001101Variants
Variants
Select Variants
✓✓✗✓
Based on Genetic ProgrammingAutomated Evasion Approach
Results: Evaded PDFrate 100%
24
Original Malware Seeds
Results: Evaded PDFrate 100%
25
Original Malware Seeds
Evasive Variants
Evaded PDFrate with Adjusted Threshold
26
Original Malware Seeds
Evasive Variants
Evasive Variants with lower threshold
Results: Evaded Hidost 100%
27
Original Malware Seeds
Results: Evaded Hidost 100%
28
Original Malware Seeds
Evasive Variants
29
Difficulty varies by seedSimple mutations often work Complex mutations sometimes needed.
Difficulty varied by targets:PDFrate: 6 days to evade all Hidost: 2 days to evade all
Results: Accumulated Evasion Rate
Cross-Evasion Effects
30
PDF MalwareSeeds
Hidost
EvasivePDF Malware
(against Hidost)Automated Evasion
PDFrate 387/500 Evasive (77.4%)
3/500 Evasive (0.6%)
Gmail’s classifier is secure?
Cross-Evasion Effects
31
PDF MalwareSeeds
Hidost
EvasivePDF Malware
(against Hidost)Automated Evasion
PDFrate 387/500 Evasive (77.4%)
3/500 Evasive (0.6%)
Gmail’s classifier is secure? different.
Evading Gmail’s Classifier
32
Evasion rate on : 135/380 (35.5%)
Evading Gmail’s Classifier
33
Evasion rate on : 179/380 (47.1%)
Conclusion
34
Source Code: http://EvadeML.org
Vs.
Who will win this arm race?