Real-Time Detection of Malware Downloads via Large-Scale URL→File→Machine Graph Mining

Babak Rahbarinia; Marco Balduzzi; Roberto Perdisci
AsiaCCS 2016, June 02, Xi’an, China

Introduction

Traditional AV is dead?
● Signature-based vs. statistical-based

Traditional AVs are inefficient (they don’t work!)
● Polymorphism, code obfuscation, packers, ...

URL blacklisting
● Static, lags behind
● Time-consuming analysis of individual URLs

Local vs. global
● Local: looks at one potential malware at a time
● Global: leverages global situational awareness

Large-scale analysis of behavioral patterns
● “Who - where - what” relationship
● Global situational awareness
● Graph-based machine learning

Combination of system- and network-level info

Mastino:
● Real-time and concurrent detection of download events
● Real-world deployment on millions of machines (Internet-scale)

Approach


(Quick) Related Work

Static + dynamic detection [Many]

Graph mining detection: Polonium [KDD10]
● Offline approach vs. real-time
● File classification only vs. files + URLs (download events)
● Bipartite vs. tripartite graph
● Proprietary reputation function vs. open

AMICO [Esorics13]
● HTTP-centric vs. protocol-independent
● Only works in LANs vs. “move across networks”

Google’s CAMP [NDSS13]
● Browser-centric vs. system-centric

Download Graph

[Figure: graph with three groups of nodes: URLs, Files, Machines]
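As a rough illustration of the download graph, the sketch below indexes download events as adjacency sets. It assumes each event is a triple (URL, file hash, machine ID), in the spirit of the d = (u, f, m) notation on the Data Collection slide. The function name, the node encoding, and the choice to link URLs directly to machines are assumptions made for this example, not details taken from the slides.

```python
from collections import defaultdict


def build_download_graph(events):
    """Index (url, file_hash, machine_id) download events as adjacency sets.

    Nodes are (kind, name) pairs so that a URL, a file, and a machine
    can never collide even if their names happen to match.
    """
    neighbors = defaultdict(set)
    for url, file_hash, machine_id in events:
        u = ("url", url)
        f = ("file", file_hash)
        m = ("machine", machine_id)
        # Assumption: every pair of nodes in one event is connected.
        for a, b in ((u, f), (f, m), (u, m)):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


# Hypothetical events, for illustration only.
events = [
    ("http://a.example/x.exe", "sha1_1", "machine1"),
    ("http://b.example/y.exe", "sha1_1", "machine2"),
    ("http://b.example/y.exe", "sha1_2", "machine2"),
]
graph = build_download_graph(events)
urls_serving_file = sorted(
    name for kind, name in graph[("file", "sha1_1")] if kind == "url"
)
```

Keeping the node kind in the key makes it easy to ask "which URLs served this file?" or "which machines downloaded it?" with a single filter over the neighbor set.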

Annotations

URLs
● Age of URL, domain, path, IP

Files
● Size
● Lifetime, prevalence
● Packed, signed

Machines
● Download behavior
● Client processes

Labeling

URLs
● B (benign): Alexa (-hosting)
● M (malicious): GSB + WRS

Files
● B (benign): Grid + VT
● M (malicious): VT

Machines
● Machines’ reputations based on their download/activity history

Features and classifier

Files features (for a file f)
● f behavior-based features = {URL stats, machine stats}
● URL stats: for the URLs linked to f (url1, url2, url3, url4), take each URL’s R plus the R of [FQD, e2LD, path, path pattern, query string, query pattern], then compute min, max, med, avg, and std
● Machine stats: for the machines linked to f (machine1, machine2, machine3), take each machine’s R, then compute min, max, med, avg, and std
● f intrinsic features = {file size, prevalence, packed, signed, ...}
● Full feature set for f: behavior-based + intrinsic

URLs features (for a URL u + {all URLs sharing a component with u})
● u behavior-based features = {files stats, machine stats}
● Files stats: for the files linked to u (file1, file2, file3, file4), take each file’s R, then compute min, max, med, avg, and std
● Machine stats: for the machines linked to u (machine1, machine2, machine3), take each machine’s R, then compute min, max, med, avg, and std
● u intrinsic features = {URL, FQD, e2LD recency}
● Full feature set for u: behavior-based + intrinsic
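The "compute min, max, med, avg, and std" step can be sketched as follows. This is a minimal example, assuming R is a numeric score attached to each neighboring node. The function names, the use of the population standard deviation, and the all-zero fallback for an empty neighbor list are assumptions of this sketch, not details from the slides.

```python
import statistics


def summarize(scores):
    """Return (min, max, med, avg, std) for a list of R scores."""
    if not scores:
        # Assumption: zeros when no neighbor has a known score.
        return (0.0, 0.0, 0.0, 0.0, 0.0)
    return (
        min(scores),
        max(scores),
        statistics.median(scores),
        statistics.fmean(scores),
        statistics.pstdev(scores),  # population std; a sample std is also plausible
    )


def file_behavior_features(url_scores, machine_scores):
    """Concatenate URL stats and machine stats into one feature tuple."""
    return summarize(url_scores) + summarize(machine_scores)


# Hypothetical R scores for the URLs and machines linked to one file.
features = file_behavior_features([0.2, 0.4, 0.9], [0.1, 0.1])
```

The same helper applies to the URL side by passing file scores and machine scores instead.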

Example #1

[Figure: download graph with URLs U1 and U2; files F1, F2, and F3; and nodes G1 and G2, under the columns URLs, Files, Machines]

What could be said about F1 and F2?

Example #2

[Figure: download graph with a URL u and a file F1, under the columns URLs, Files, Machines]

What could be said about F1? All neighbors are unknown.

Look at all URLs that share the same components as u (e.g., FQD, path).

All URL components:
● FQD
● e2LD
● Path
● Path pattern
● Query string
● Query string pattern
● IP
● IP/24
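To make the idea of "URLs that share a component with u" concrete, the sketch below splits a URL into some of the components listed above and compares two URLs. It is only an illustration: the e2LD is approximated naively as the last two host labels (a real system would likely consult the Public Suffix List), the server IP is passed in by the caller and assumed to be IPv4, and the path and query patterns are left out. All names and sample URLs are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def url_components(url, server_ip=None):
    """Split a URL into FQD, e2LD, path, and query string (plus IP and IP/24)."""
    parts = urlsplit(url)
    fqd = parts.hostname or ""
    labels = fqd.split(".")
    # Naive e2LD: the last two labels of the host name (assumption).
    e2ld = ".".join(labels[-2:]) if len(labels) >= 2 else fqd
    components = {
        "fqd": fqd,
        "e2ld": e2ld,
        "path": parts.path,
        "query": parts.query,
    }
    if server_ip is not None:
        components["ip"] = server_ip
        # strict=False lets a host address stand in for its /24 network.
        network = ipaddress.ip_network(f"{server_ip}/24", strict=False)
        components["ip24"] = str(network)
    return components


def shared_components(a, b):
    """Names of the non-empty components that two URLs have in common."""
    return {k for k in a if k in b and a[k] and a[k] == b[k]}


u1 = url_components("http://dl.example.com/f/123/setup.exe?id=7", "203.0.113.45")
u2 = url_components("http://dl.example.com/other.exe")
common = shared_components(u1, u2)
```

Here the two URLs share only their FQD and e2LD, so a reputation known for either of those components could still inform a decision about an otherwise unknown URL.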

Deployment

Timeline: Day 1, Day 2, ..., Yesterday, Today (time window of 10 days)

● Trained classifiers: URL classifier and SHA1 classifier
● Real-time classification of URLs & SHA1s
● Detection of malicious download events
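The 10-day time window can be pictured as a simple date filter over past download events. The sketch below is an assumption-laden illustration: events are modeled as (day, url, file_hash, machine_id) tuples, and today's events are excluded from the window on the assumption that they are the ones being classified. Neither detail is stated on the slides.

```python
from datetime import date, timedelta

WINDOW_DAYS = 10  # time window of 10 days, as on the Deployment slide


def events_in_window(events, today, window_days=WINDOW_DAYS):
    """Keep events from the `window_days` days before `today` (today excluded)."""
    start = today - timedelta(days=window_days)
    return [event for event in events if start <= event[0] < today]


# Hypothetical events, for illustration only.
today = date(2014, 6, 11)
events = [
    (date(2014, 5, 31), "http://a.example/old.exe", "sha1_0", "machine1"),
    (date(2014, 6, 1), "http://a.example/x.exe", "sha1_1", "machine1"),
    (date(2014, 6, 10), "http://b.example/y.exe", "sha1_2", "machine2"),
    (date(2014, 6, 11), "http://c.example/z.exe", "sha1_3", "machine3"),
]
window = events_in_window(events, today)
```

Only the two middle events fall inside the window; the oldest one has aged out and the newest one belongs to today.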

Data Collection

7 months of data (Jan to Aug 2014)
● d = (u, f, m)
● Hundreds of thousands of machines, files, URLs
● Millions of nodes

Labeling:
● Files: VirusTotal, GRID [Trend]
● URLs: Alexa, Google Safe Browsing, WRS [Trend]

Annotations:
● File census and GUID census [Trend]
● VirusTotal (signed, ...)

Train & test for new download events

Detection results for new download events over 7 periods of 5 days (35 days total)

[Figure: two result panels, Files and URLs]

Combined detection of download events

(u = m) ∨ (f = m) → d = m

● 1 day experiment (5 months)
● Efficiency: requests are served in ~0.16 sec
● 84% of detections: 0-days (unknown)
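The combination rule above says that a download event d is malicious if either its URL u or its file f is classified as malicious. A minimal sketch, assuming the two classifiers have already produced a label for each; the label strings and the function name are made up for this example:

```python
MALICIOUS = "m"
BENIGN = "b"  # assumption: any label other than "m" counts as not malicious


def label_download(url_label, file_label):
    """Apply (u = m) OR (f = m) -> d = m to one download event."""
    if url_label == MALICIOUS or file_label == MALICIOUS:
        return MALICIOUS
    return BENIGN


verdicts = [
    label_download("m", "b"),
    label_download("b", "m"),
    label_download("m", "m"),
    label_download("b", "b"),
]
```

Because the rule is a plain OR, an unknown file can still be flagged when it comes from a URL judged malicious, and the other way around.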

Case Study #1

Wuachos.A Dropper
● Filename: file_saw.exe
● URLs with _no_ reputation
● Low prevalence
● Invalid signature
● Path pattern with R of 0.72 (malicious) [*]

1,445 URLs serving 182 polymorphic malware

[*] /f/1392240240/1255385580/2 , /f/1392240120/4165299987/2 -> /H1/I10/I10/I1
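The footnote above suggests that a path pattern replaces each path segment with a type letter and the segment length. The sketch below reproduces that one example under inferred rules: "I" for all-decimal segments and "H" for segments made only of hexadecimal digits. These rules are guessed from the footnote alone, and the "S" fallback for any other segment is purely hypothetical, so the actual generalization used by Mastino may differ.

```python
import string


def segment_pattern(segment):
    """Map one path segment to a type letter followed by its length."""
    if all(c in string.digits for c in segment):
        kind = "I"  # integer (inferred from the footnote)
    elif all(c in string.hexdigits for c in segment):
        kind = "H"  # hexadecimal (inferred from the footnote)
    else:
        kind = "S"  # hypothetical fallback for any other string
    return f"{kind}{len(segment)}"


def path_pattern(path):
    """Generalize a URL path such as /f/1392240240/1255385580/2."""
    segments = [s for s in path.split("/") if s]
    return "/" + "/".join(segment_pattern(s) for s in segments)


pattern_a = path_pattern("/f/1392240240/1255385580/2")
pattern_b = path_pattern("/f/1392240120/4165299987/2")
```

Both paths from the footnote collapse to the same pattern, which is what lets many distinct URLs of one campaign share a single reputation.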

Case Study #2

Somoto Adware
● Filename: FreeZipSetup-[\d].exe
● Packed, short lifetime, prevalence = 0
● 1 related machine downloaded 1 known sample during our time window T = 10 days

Detected a campaign of 695 samples
● 616 were unknown to VirusTotal
● 61 still unknown (+6 months)

Case Study #3

TTAWinCDM Spyware
● Machine and URL with _no_ reputation
● Low lifetime, prevalence, and countries
● Mismatch on downloading process: Acrobat process vs. unauthoritative domain

Flash 0-day (+2 months)

Bonus #1

Analysis of Window T

Bonus #2

Features Analysis

[Figure: two panels, Files analysis and URLs analysis]

Conclusions

Mastino: real-time detection of malware downloads by passive monitoring of clients
● Content-agnostic, behavioral analysis
● Real-world deployment at large scale
● Over 95% TP / 0.5% FP
● 0-days

Thank you!

Questions?

Babak Rahbarinia; Marco Balduzzi; Roberto Perdisci
@embyte
http://www.madlab.it