34
Automatically Evading Classifiers A Case Study on PDF Malware Classifiers Weilin Xu David Evans Yanjun Qi University of Virginia

Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Automatically Evading Classifiers A Case Study on PDF Malware Classifiers

Weilin Xu David Evans Yanjun Qi

University of Virginia

Page 2: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Machine Learning is Solving Our Problems

2

Fake

Spam IDS MalwareFake Accounts

Page 3: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

3

Page 4: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

4

Page 5: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Machine Learning is Eating the World

Data

Scientist

Security

Expert

5

?

Page 6: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Machine Learning is Eating the World

Data

Scientist

Security

Expert

6

No! Security is different.

Page 7: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Goal: Understand classifiers under attack. Results: Vulnerable to automated evasion.

Security Tasks are Different: Adversary Adapts

7

Page 8: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Building Machine Learning Classifiers

8

Trained ClassifierLabelledTraining

Data

MLAlgorithm

Training (Supervised Learning)

FeatureExtraction

Vectors

Page 9: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Assumption: Training Data is Representative

9

LabelledTraining

Data

MLAlgorithm

FeatureExtraction

Vectors

Deployment

Malicious / Benign

Operational Data

Trained Classifier

Training (Supervised Learning)

Page 10: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Results: Evaded PDF Malware ClassifiersPDFrate*

[ACSAC’12]Hidost

[NDSS’13]

Accuracy 0.9976 0.9996

False Negative Rate 0.0000 0.0056

False Negative Rate with Adversary 1.0000 1.0000

10

* Mimicus [Oakland ’14], an open source reimplementation of PDFrate.

Page 11: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Results: Evaded PDF Malware ClassifiersPDFrate*

[ACSAC’12]Hidost

[NDSS’13]

Accuracy 0.9976 0.9996

False Negative Rate 0.0000 0.0056

False Negative Rate with Adversary 1.0000 1.0000

11

Very robust against “strongest conceivable mimicry attack”.

* Mimicus [Oakland ’14], an open source reimplementation of PDFrate.

Page 12: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

12

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

Page 13: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

13

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

/Catalog /Pages

0

/JavaScript

eval(‘…’);

/RootModifiedParser

Extract Me If You Can: Abusing PDF Parsers in Malware Detectors

Curtis Carmony,et al.

Page 14: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

14

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

/Catalog /Pages

0

/JavaScript

eval(‘…’);

/Root

Mutation

Variants From Benign

Insert / Replace / Delete

Page 15: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

15

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

/Catalog /Pages

0

/JavaScript

eval(‘…’);

/Root

Mutation

Variants From Benign

128

546

0

0

Insert / Replace / Delete

Page 16: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

16

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

/Catalog /Pages

0

/JavaScript

eval(‘…’);

/Root

Mutation

Variants From Benign

128

546

0

0

Insert / Replace / Delete

Page 17: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

17

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

/Catalog /Pages

0

/JavaScript

eval(‘…’);

/Root

Mutation

Variants From Benign

128

546

0

0

128

0

Insert / Replace / Delete

Page 18: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

18

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

/Catalog /Pages

0

/JavaScript

eval(‘…’);

/Root

Mutation

Variants From Benign

128

0

Insert / Replace / Delete

Page 19: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

19

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

/Catalog /Pages

0

/JavaScript

eval(‘…’);

/Root

Mutation

Variants From Benign

128

0

Insert / Replace / Delete

Page 20: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

20

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

Page 21: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

21

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

Fitness Function

Oracle

Target Classifier

f(x)

Malicious?

Score

Fitness ScoreVariants

Page 22: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

22

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

Fitness Function

Oracle

Target Classifier

f(x)

Malicious?

Score

Fitness ScoreVariants

Malicious

Benign

Page 23: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Variants

23

Clone

Benign PDFsMalicious PDF

Mutation

01011001101Variants

Variants

Select Variants

✓✓✗✓

Based on Genetic ProgrammingAutomated Evasion Approach

Page 24: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Results: Evaded PDFrate 100%

24

Original Malware Seeds

Page 25: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Results: Evaded PDFrate 100%

25

Original Malware Seeds

Evasive Variants

Page 26: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Evaded PDFrate with Adjusted Threshold

26

Original Malware Seeds

Evasive Variants

Evasive Variants with lower threshold

Page 27: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Results: Evaded Hidost 100%

27

Original Malware Seeds

Page 28: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Results: Evaded Hidost 100%

28

Original Malware Seeds

Evasive Variants

Page 29: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

29

Difficulty varies by seedSimple mutations often work Complex mutations sometimes needed.

Difficulty varied by targets:PDFrate: 6 days to evade all Hidost: 2 days to evade all

Results: Accumulated Evasion Rate

Page 30: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Cross-Evasion Effects

30

PDF MalwareSeeds

Hidost

EvasivePDF Malware

(against Hidost)Automated Evasion

PDFrate 387/500 Evasive (77.4%)

3/500 Evasive (0.6%)

Gmail’s classifier is secure?

Page 31: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Cross-Evasion Effects

31

PDF MalwareSeeds

Hidost

EvasivePDF Malware

(against Hidost)Automated Evasion

PDFrate 387/500 Evasive (77.4%)

3/500 Evasive (0.6%)

Gmail’s classifier is secure? different.

Page 32: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Evading Gmail’s Classifier

32

Evasion rate on : 135/380 (35.5%)

Page 33: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Evading Gmail’s Classifier

33

Evasion rate on : 179/380 (47.1%)

Page 34: Automatically Evading Classifiers - NDSS Symposium · Based on Genetic Programming Automated Evasion Approach /Catalog /Pages 0 /JavaScript eval(‘…’); Modified /Root Parser

Conclusion

34

Source Code: http://EvadeML.org

Vs.

Who will win this arm race?