Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated...

Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers

Weilin Xu, David Evans, Yanjun Qi

University of Virginia

Machine Learning is Solving Our Problems

Spam, IDS, Malware, Fake Accounts

Machine Learning is Eating the World

Data Scientist, Security Expert?

No! Security is different.

Security Tasks are Different: Adversary Adapts

Goal: Understand classifiers under attack.

Results: Vulnerable to automated evasion.

Building Machine Learning Classifiers

Training (Supervised Learning): Labelled Training Data -> Feature Extraction -> Vectors -> ML Algorithm -> Trained Classifier

Deployment: Operational Data -> Feature Extraction -> Vectors -> Trained Classifier -> Malicious / Benign

Assumption: Training Data is Representative
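The training and deployment pipeline above can be sketched in a few lines. This is a minimal illustration, not the method used by PDFrate or Hidost: the keyword features, the nearest-centroid "ML algorithm", and every function name here are assumptions chosen to keep the example self-contained.

```python
from collections import Counter

# Hypothetical keyword features; real PDF classifiers use far richer ones.
KEYWORDS = ["/JavaScript", "/OpenAction", "/Page"]

def extract_features(document: str) -> list[float]:
    """Feature extraction: turn a document into a numeric vector."""
    return [float(document.count(k)) for k in KEYWORDS]

def train(samples: list[tuple[str, str]]) -> dict[str, list[float]]:
    """Training: compute one mean feature vector (centroid) per label."""
    sums: dict[str, list[float]] = {}
    counts: Counter = Counter()
    for document, label in samples:
        vec = extract_features(document)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(model: dict[str, list[float]], document: str) -> str:
    """Deployment: label operational data by its nearest centroid."""
    vec = extract_features(document)

    def squared_distance(centroid: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))

# Toy labelled training data (hypothetical).
TRAINING = [
    ("/Page /Page /Page", "benign"),
    ("/Page /Page", "benign"),
    ("/JavaScript /OpenAction /Page", "malicious"),
    ("/JavaScript /JavaScript /OpenAction", "malicious"),
]
MODEL = train(TRAINING)
```

In this toy setup, a sample that resembles the training data is labelled as expected, while the same malicious keywords padded with several extra `/Page` entries land nearer the benign centroid. That is the kind of gap the representativeness assumption leaves open when the adversary adapts.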

Results: Evaded PDF Malware Classifiers

                                      PDFrate* [ACSAC’12]   Hidost [NDSS’13]
Accuracy                              0.9976                0.9996
False Negative Rate                   0.0000                0.0056
False Negative Rate with Adversary    1.0000                1.0000

* Mimicus [Oakland ’14], an open source reimplementation of PDFrate.

Very robust against “strongest conceivable mimicry attack”.
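For reference, the two metrics in the table follow the standard confusion-matrix definitions. The sketch below uses made-up counts (500 malicious and 500 benign samples are assumptions for illustration, not the evaluation sets behind the table) to show how a false negative rate of 1.0 arises when every malicious sample is classified as benign.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that are classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples")
    return (tp + tn) / total

def false_negative_rate(tp: int, fn: int) -> float:
    """Fraction of malicious samples that are classified as benign."""
    if tp + fn == 0:
        raise ValueError("no malicious samples")
    return fn / (tp + fn)

# Hypothetical counts: every malicious sample is detected.
fnr_without_adversary = false_negative_rate(tp=500, fn=0)

# Hypothetical counts: every malicious sample evades detection.
fnr_with_adversary = false_negative_rate(tp=0, fn=500)
```

Accuracy on the original test data says nothing about the second case, because the evasive variants are not drawn from the distribution the classifier was evaluated on.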


Automated Evasion Approach Based on Genetic Programming

[Diagram: Malicious PDF -> Clone -> Variants -> Mutation (with Benign PDFs) -> Variants -> Select Variants (✓ ✓ ✗ ✓)]
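The clone, mutate, select cycle in the diagram can be sketched as a generic loop. This is a simplified illustration under stated assumptions, not the EvadeML implementation: variants are plain strings, a positive fitness is taken to mean "evasive", the keep-the-best-half selection rule is a placeholder, and `toy_mutate` and `toy_fitness` are hypothetical stand-ins.

```python
import random
from typing import Callable, Optional

def evolve(seed: str,
           mutate: Callable[[str, random.Random], str],
           fitness: Callable[[str], float],
           population_size: int = 8,
           max_generations: int = 20,
           rng: Optional[random.Random] = None) -> Optional[str]:
    """Return the first variant with positive fitness, or None if none is found."""
    if rng is None:
        rng = random.Random(0)
    population = [seed] * population_size                # Clone the malicious seed
    for _ in range(max_generations):
        variants = [mutate(v, rng) for v in population]  # Mutation
        scored = [(fitness(v), v) for v in variants]
        for score, variant in scored:
            if score > 0:                                # Assumed success criterion
                return variant
        # Select Variants: keep the best-scoring half, then resample from it.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        survivors = [v for _, v in scored[:max(1, population_size // 2)]]
        population = [rng.choice(survivors) for _ in range(population_size)]
    return None

# Toy stand-ins (hypothetical): each mutation inserts one benign-looking "b",
# and a variant counts as evasive once it carries more than two of them.
def toy_mutate(variant: str, rng: random.Random) -> str:
    pos = rng.randrange(len(variant) + 1)
    return variant[:pos] + "b" + variant[pos:]

def toy_fitness(variant: str) -> float:
    return variant.count("b") - 2

result = evolve("MAL", toy_mutate, toy_fitness)
```

Passing `mutate` and `fitness` in as callables keeps the loop independent of the file format and of the classifier under attack.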

[Diagram: PDF object tree with nodes /Root, /Catalog, /Pages, 0, /JavaScript, eval(‘…’);]

Modified Parser

Extract Me If You Can: Abusing PDF Parsers in Malware Detectors. Curtis Carmony, et al.

Mutation: insert, replace, or delete objects in the variants, using objects from benign PDFs.
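The three mutation operators can be illustrated on a simplified document model. The sketch below assumes a PDF is represented as a flat mapping from object path to value; that representation, the path names, the `<payload>` placeholder, and all function names are hypothetical and much simpler than a real PDF object tree.

```python
import random

def insert_object(variant: dict, path: str, value: str) -> dict:
    """Return a copy of the variant with a new object added at `path`."""
    mutated = dict(variant)
    mutated[path] = value
    return mutated

def replace_object(variant: dict, path: str, value: str) -> dict:
    """Return a copy with the existing object at `path` swapped for `value`."""
    if path not in variant:
        raise KeyError(path)
    mutated = dict(variant)
    mutated[path] = value
    return mutated

def delete_object(variant: dict, path: str) -> dict:
    """Return a copy with the object at `path` removed."""
    mutated = dict(variant)
    del mutated[path]
    return mutated

def mutate(variant: dict, benign_objects: list, rng: random.Random) -> dict:
    """Apply one randomly chosen operator; new content comes from benign objects."""
    operation = rng.choice(["insert", "replace", "delete"])
    if operation == "delete" and variant:
        return delete_object(variant, rng.choice(sorted(variant)))
    benign_path, benign_value = rng.choice(benign_objects)
    if operation == "replace" and variant:
        return replace_object(variant, rng.choice(sorted(variant)), benign_value)
    return insert_object(variant, benign_path, benign_value)

# Hypothetical toy data.
SEED = {"/Root/Type": "/Catalog", "/Root/OpenAction/JS": "<payload>"}
BENIGN = [("/Root/Pages/Count", "3"), ("/Root/Metadata", "<xml>")]
mutated = mutate(SEED, BENIGN, random.Random(1))
```

Because a random delete or replace can hit the payload itself, a mutated variant is not guaranteed to remain malicious, so its behavior has to be checked separately.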

Fitness Function

[Diagram: Variants -> Oracle (Malicious?) and Target Classifier f(x) (Score) -> Fitness Score]

Malicious / Benign
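One way to combine the oracle verdict and the classifier score into a single fitness value is sketched below. The formula is an assumption for illustration: the classifier is taken to return a maliciousness score in [0, 1] with a decision threshold of 0.5, and `LOW_SCORE` is a hypothetical penalty for variants the oracle no longer considers malicious.

```python
from typing import Callable

LOW_SCORE = -1.0  # Hypothetical penalty, below any score a malicious variant can get.

def fitness(variant: str,
            oracle: Callable[[str], bool],
            classifier: Callable[[str], float],
            threshold: float = 0.5) -> float:
    """Positive only when the variant is still malicious and is classified as benign."""
    if not oracle(variant):
        # The mutation destroyed the malicious behavior, so the variant is useless.
        return LOW_SCORE
    # The lower the classifier's maliciousness score, the fitter the variant.
    return threshold - classifier(variant)

# Hypothetical stand-ins for the oracle and the target classifier.
detected = fitness("v1", oracle=lambda v: True, classifier=lambda v: 0.75)
evasive = fitness("v2", oracle=lambda v: True, classifier=lambda v: 0.25)
broken = fitness("v3", oracle=lambda v: False, classifier=lambda v: 0.0)
```

With these assumptions, a variant that fools the classifier but has lost its malicious behavior still ranks below every variant that remains malicious.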

Results: Evaded PDFrate 100%

[Figure legend: Original Malware Seeds, Evasive Variants]

Evaded PDFrate with Adjusted Threshold

[Figure legend: Original Malware Seeds, Evasive Variants, Evasive Variants with lower threshold]

Results: Evaded Hidost 100%

[Figure legend: Original Malware Seeds, Evasive Variants]

Results: Accumulated Evasion Rate

Difficulty varies by seed: simple mutations often work; complex mutations are sometimes needed.

Difficulty varies by target: PDFrate took 6 days to evade for all seeds; Hidost took 2 days.
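An accumulated evasion rate of this kind can be computed from per-seed search logs. The sketch below assumes each seed is summarized by the generation at which its first evasive variant appeared, or `None` if none was found; that input format and the sample values are hypothetical.

```python
from typing import Optional

def accumulated_evasion_rate(first_evasion: list[Optional[int]],
                             generations: int) -> list[float]:
    """For each generation 1..generations, the fraction of seeds evaded so far."""
    if not first_evasion:
        raise ValueError("no seeds")
    rates = []
    for generation in range(1, generations + 1):
        evaded = sum(1 for g in first_evasion if g is not None and g <= generation)
        rates.append(evaded / len(first_evasion))
    return rates

# Hypothetical log for four seeds: one evades in generation 1, two in
# generation 3, and one never evades within the budget.
rates = accumulated_evasion_rate([1, 3, 3, None], generations=4)
```

Seeds that evade early correspond to the "simple mutations often work" case, while the long tail of the curve reflects seeds that need longer mutation sequences.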

Cross-Evasion Effects

PDF Malware Seeds -> Automated Evasion (against Hidost) -> Evasive PDF Malware

PDFrate: 387/500 Evasive (77.4%)

3/500 Evasive (0.6%)

Gmail’s classifier is secure?

Gmail’s classifier is different.

Evading Gmail’s Classifier

Evasion rate: 135/380 (35.5%)

Evasion rate: 179/380 (47.1%)

Conclusion

Who will win this arms race?

Source Code: http://EvadeML.org