18
Evolutionary Computation for Improving Malware Analysis Kevin Leach 1 , Ryan Dougherty 2 , Chad Spensky 3 , Stephanie Forrest 2 , Westley Weimer 1 1 University of Michigan 2 Arizona State University 3 University of California, Santa Babara June 1, 2019 1/18

Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Evolutionary Computation forImproving Malware Analysis

Kevin Leach1, Ryan Dougherty2, Chad Spensky3,Stephanie Forrest2, Westley Weimer1

1University of Michigan2Arizona State University

3University of California, Santa Babara

June 1, 2019

1/18

Page 2: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Setting the Stage

Ï Can we improve the efficiency ofautomated malware analysis usingevolutionary computation?

2/18

Page 3: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Introduction

3/18

Page 4: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Malware Analysis

Ï Analysts want to quickly identifymalware behavior

Ï What damage does it do?Ï How does it infect a system?Ï How do we defend against it?

4/18

Page 5: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Stealthy MalwareÏ Growing volume of stealthy malwareÏ Malware sample maintains secrecy by using

artifacts to detect analysis environmentsÏ Timing artifacts — overhead introduced by analysis

Ï Single-stepping instructions with debugger is slowÏ Imperfect VM environment does not match native speed

Ï Functional artifacts — features introduced by analysisÏ isDebuggerPresent() — legitimate feature abused by

adversariesÏ Incomplete emulation of some instructions by VMÏ Device names (hard drive named “VMWare disk”)

Ï Automated analysis is difficult

5/18

Page 6: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Automated Malware Analysis

Ï Cluster of servers analyzes malwareÏ Analysis success depends on cluster’s environment

(e.g., OS, virtualization)

6/18

Page 7: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Transparency

Ï We want to understand stealthy samples

Ï We can mitigate artifactsÏ Hook/intercept API calls

(e.g., isDebuggerPresent())Ï Spoof timing

(e.g., virtualize result of rdtsc instruction)Ï Use alternate virtualization

(e.g., a sample that detects VMWare may not detectVirtualBox)

7/18

Page 8: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Cost of TransparencyÏ Mitigation costs resources

Ï Development effort(e.g., modifying virtualization)

Ï Execution time(e.g., due to runtime overhead)

Ï Mitigation covers some subset ofmalware

Ï Artifact category(e.g., hooking disk-related APIs coversmalware that checks the disk)

8/18

Page 9: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Key Idea: Transparency/Cost

Ï We can control which artifacts areexposed

Ï i.e., control costs and coverage

Ï Use a vector of yes/no answers

VMWare? large disk? Spoof timers? cost

0 1 0 x1 0 1 y

9/18

Page 10: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Key Idea: Transparency/Cost

Ï Derive a cost model empiricallyÏ What are the values of x and y?

VMWare? large disk? Spoof timers? cost

0 1 0 x1 0 1 y

10/18

Page 11: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Automated Malware Analysis

Ï Can we analyze more stealthy malware ifwe offer different analysis environments?

11/18

Page 12: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Proposed Architecture

Covering array GI algorithm

Cost model Artifacts

Server Cluster

StealthyMalware

12/18

Page 13: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Proposed Approach

Given: Cost model, number of serversFind: settings that minimize cost, maximize coverage

Config Server A0 A1 A2 cost

1 1 0 1 0 32 1 0 1 2

2 1 0 1 1 42 1 1 0 3

3 1 0 0 0 12 1 1 1 7

13/18

Page 14: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Proposed Approach

Given: Cost model, number of serversFind: settings that minimize cost, maximize coverage

Config Server A0 A1 A2 cost

1 1 0 1 0 32 1 0 1 2

2 1 0 1 1 42 1 1 0 3

3 1 0 0 0 12 1 1 1 7

14/18

Page 15: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Proposed Approach

Given: Cost model, number of serversFind: settings that minimize cost, maximize coverage

Config Server A0 A1 A2 cost

1 1 0 1 0 32 1 0 1 2

2 1 0 1 1 42 1 1 0 3

3 1 0 0 0 12 1 1 1 7

15/18

Page 16: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Initial ResultsÏ Challenge Ground-truth data is hard to

come byÏ Manually reverse engineered 20 samples

Ï Baseline analyze each withhigh-transparency, high-cost analysis(e.g., all 1’s)

Ï roughly 360x overhead

Ï Cost Function We estimated a costfunction based manual review

16/18

Page 17: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Initial ResultsBaseline Us / 2 servers Us / 4 servers Us / 8 servers

360x 16.14 – 64.51 12.06 – 49.39 1.05 – 30.92

Ï Promising improvements to throughput(potentially, by 1–2 orders of magnitude)

Ï Future workÏ Ongoing analysis involving 20k+ stealthy

samplesÏ Need to empirically derive cost function

(rather than manual assessment)

17/18

Page 18: Evolutionary Computation for Improving Malware Analysis-12pt · Evolutionary Computation for Improving Malware Analysis Kevin Leach1, Ryan Dougherty2, Chad Spensky3, Stephanie Forrest2,

Conclusion

Ï We can control which artifactsare exposed to stealthy malwaresamples

Ï We can evolve a set of analysisserver configurations tomaximize coverage whileminimizing cost

18/18