DEEPSEC 2013: Malware Datamining And Attribution

DESCRIPTION

Greg Hoglund explained at BlackHat 2010 that the development environments malware authors use leave traces in the code, which can be used to attribute malware to an individual or a group of individuals. Not with the precision of a name, date of birth and address, but with evidence: an arrested suspect's computer can be analysed and compared with the "tool marks" on the collected malware sample.

Malware Attribution: Theory, Code and Result

Who am I?

• Michael Boman, M.A.R.T. project

• Have been “playing around” with malware analysis “for a while”

• Working for FireEye

• This is a HOBBY project that I use my SPARE TIME to work on

Agenda

• Theory behind Malware Attribution

• Code to conduct Malware Attribution analysis

• Result of analysis

Theory

• Malware Attribution: tracking cyber spies - Greg Hoglund, Blackhat 2010

http://www.youtube.com/watch?v=k4Ry1trQhDk

What am I trying to do?

Binary Human

Move this way

What am I trying to do?

[Figure: the Binary-to-Human spectrum, with the following labels placed along it:]

• Blacklists

• Net Recon

• Command and Control

• Developer Fingerprints

• Tactics, Techniques, Procedures

• Social Cyberspace

• DIGINT

• Physical Surveillance

• HUMINT

[Figure: the same diagram, annotated with the following labels:]

• Actions / Intent

• Installation / Deployment

• CNA (spreader) / CNE (search & exfil tool)

• COMS

• Defensive / Anti-forensic

• Exploit

• Shellcode

• DNS, Command and Control Protocol, Encryption

Steps

• Step 0: Gather malware

• Step 1: Extract metadata from binary

• Step 2: Store metadata and binary in MongoDB

• Step 3: Analyze collected data

Step 1: Extract metadata from binary

Development Steps

Core “backbone” sourcecode

Tweaks & Mods

3rd party sourcecode

3rd party libraries

Compiler

Runtime libraries

Time

Paths

MAC Address

Malware

Packing

Machine Binary

Source

Step 1: Extract metadata from binary

• Hashes (for sample identification)

• md5, sha1, sha256, sha512, ssdeep etc.

• File type / Exif / PEiD

• Compiler / Packer etc.

• PE Headers / Imports / Exports etc.

• Virustotal results

• Tags
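The cryptographic hashes in the list above can be computed with Python's standard library alone. The sketch below is an illustration, not the project's actual code: the function name `sample_hashes` is made up for this example, and ssdeep is left out because it needs a third-party binding.

```python
import hashlib

# Algorithms named on the slide that hashlib supports out of the box.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def sample_hashes(data: bytes) -> dict:
    """Return a {algorithm: hex digest} mapping for one sample's raw bytes."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


if __name__ == "__main__":
    # Hypothetical input; a real run would read the sample from disk.
    for name, digest in sample_hashes(b"not really malware").items():
        print(f"{name:7s} {digest}")
```

The exact hashes identify a sample byte for byte, while the fuzzy hash (ssdeep) is what allows near-duplicates to be found later.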

Identifying compiler / packer

• PEiD

• Python

• peutils.SignatureDatabase().match_all()
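PEiD-style signature databases describe a compiler or packer as a byte pattern, usually anchored at the entry point, with `??` standing for a wildcard byte. In practice `peutils.SignatureDatabase` from the pefile package loads such a database and matches it against a parsed PE file. The standard-library sketch below only illustrates the matching idea on a raw byte string; the signature entry and the helper names are invented for the example.

```python
import re

# A made-up entry in the PEiD userdb.txt layout; not a real packer signature.
EXAMPLE_DB = """
[Hypothetical Packer v1.0]
signature = 60 E8 ?? ?? ?? ?? 5D 81 ED
ep_only = true
"""

ENTRY = re.compile(
    r"\[(?P<name>[^\]]+)\]\s*signature\s*=\s*(?P<sig>[0-9A-Fa-f? ]+)"
)


def parse_signatures(text):
    """Yield (name, pattern) pairs; a pattern is a list of ints, None = wildcard."""
    for m in ENTRY.finditer(text):
        tokens = m.group("sig").split()
        yield m.group("name"), [None if t == "??" else int(t, 16) for t in tokens]


def match_signatures(entry_point_bytes, signatures):
    """Return the names of all signatures that match at the start of the bytes."""
    hits = []
    for name, pattern in signatures:
        if len(entry_point_bytes) < len(pattern):
            continue  # not enough bytes to compare against
        if all(p is None or p == b for p, b in zip(pattern, entry_point_bytes)):
            hits.append(name)
    return hits
```

Returning every match rather than the first one mirrors the idea behind `match_all()`: several signatures can fit the same entry point, and the analyst may want to see all of them.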

PE Header information
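As a rough illustration of what the PE header yields, the sketch below reads the COFF file header with the standard library only. The function name and the returned keys are assumptions made for this example, and the linker timestamp it reports can be forged or zeroed, so it is a hint rather than proof.

```python
import struct
from datetime import datetime, timezone


def read_coff_header(data: bytes) -> dict:
    """Parse the COFF file header that follows the PE signature."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes).
    machine, sections, timestamp, _, _, _, characteristics = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4
    )
    return {
        "machine": machine,
        "sections": sections,
        "timestamp": timestamp,
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "characteristics": characteristics,
    }
```

A library such as pefile exposes the same fields along with imports, exports and resources; the manual parse is shown only to make the header layout visible.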

VirusTotal Results

Tags

• User-supplied tags to identify sample source and behavior

• analyst / analyst-system supplied

Step 2: Store metadata and binary in MongoDB

Components

• Modified VXCage server

• Collects a lot more metadata than the original

• Stores malware & metadata in MongoDB instead of FS / ORDBMS
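To make the storage step more concrete, here is one possible shape for a per-sample metadata document. The field names and the helper `build_sample_document` are hypothetical and are not taken from the modified VXCage code; the sketch builds the document only and does not talk to a database.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_sample_document(data: bytes, tags, source: str) -> dict:
    """Build a JSON-friendly metadata document for one sample (hypothetical schema)."""
    sha256 = hashlib.sha256(data).hexdigest()
    return {
        "_id": sha256,                    # the hash doubles as a natural key
        "md5": hashlib.md5(data).hexdigest(),
        "size": len(data),
        "tags": sorted(set(tags)),        # de-duplicated, stable order
        "source": source,
        "added": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    doc = build_sample_document(b"sample bytes", ["dropper", "honeypot"], "analyst")
    print(json.dumps(doc, indent=2))
    # With pymongo, collection.insert_one(doc) would store the document and
    # gridfs.GridFS(db).put(data, filename=doc["_id"]) the binary itself.
    # Neither call is exercised here because both need a running MongoDB.
```

Keying on the SHA-256 means that adding the same sample twice fails on the duplicate key instead of silently creating a second record.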

VXCage REST API

• /malware/add

• Add sample

• /malware/get/<filehash>

• Download sample. If no local sample, search other repos

• /malware/find

• Search for sample by md5, sha256, ssdeep, tag, date

• /tags/list

• List tags
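The endpoints above could be driven from Python roughly as follows. This sketch only builds the requests and never sends them. The base URL, the helper names, and the choice of a form-encoded POST for `/malware/find` are assumptions for illustration; check the server code for the actual contract.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: wherever the server listens

# Search keys listed on the slide for /malware/find.
FIND_KEYS = {"md5", "sha256", "ssdeep", "tag", "date"}


def get_request(file_hash: str) -> Request:
    """Request that would download one sample by hash."""
    return Request(f"{BASE_URL}/malware/get/{file_hash}")


def find_request(**criteria) -> Request:
    """Request that would search for samples by the given criteria."""
    unknown = set(criteria) - FIND_KEYS
    if unknown:
        raise ValueError(f"unsupported search keys: {sorted(unknown)}")
    body = urlencode(criteria).encode("ascii")
    return Request(f"{BASE_URL}/malware/find", data=body, method="POST")


def tags_request() -> Request:
    """Request that would list all known tags."""
    return Request(f"{BASE_URL}/tags/list")
```

Passing one of these objects to `urllib.request.urlopen()` would perform the call against a running server.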

Step 3: Analyze collected data

Identifying development environments

• Compiler / Linker / Libraries

• Strings

• Paths

• PE Translation header

• Compile times

• Number of times the software has been built
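Strings and paths are among the easiest of these fingerprints to pull out. The sketch below extracts printable ASCII runs from a binary and keeps anything that looks like a Windows path to a .pdb debug file, since such paths can reveal user names and project directories. The regular expressions and function names are illustrative choices, and the path in the usage example is invented.

```python
import re

# Runs of at least six printable ASCII characters, like the `strings` tool.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")
# Drive letter, backslash, then anything legal in a path up to ".pdb".
PDB_PATH = re.compile(r'[A-Za-z]:\\[^:*?"<>|]+?\.pdb', re.IGNORECASE)


def ascii_strings(data: bytes):
    """Return all printable ASCII runs found in the data."""
    return [m.group(0).decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def pdb_paths(data: bytes):
    """Return debug-symbol paths embedded in the data, if any."""
    paths = []
    for text in ascii_strings(data):
        paths.extend(PDB_PATH.findall(text))
    return paths


if __name__ == "__main__":
    # Hypothetical blob standing in for a real sample.
    blob = b"\x00\x01" + b"C:\\Users\\dev\\bot\\Release\\bot.pdb" + b"\x00"
    print(pdb_paths(blob))
```

This only covers ASCII; strings stored as UTF-16, which are common in Windows binaries, would need a second pass.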

Cataloging behaviors

• Packers

• Encryption

• Anti-debugging

• Anti-VM

• Anti-forensics

Result

Have I seen you before?

• Detects similar malware (based on SSDEEP fuzzy hashing)

Different MD5, 100% SSDeep match

SSDEEP Analysis (3007)

SSDEEP Analysis (851)

Challenges

• Party handshake problem:

• 707k samples analyzed and counting (resulting in over 250 billion compares!)

• Need a better target (pre-)selection
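The handshake problem is plain arithmetic: comparing every sample with every other takes n(n-1)/2 comparisons, which for n = 707,000 is about 250 billion. One possible pre-selection, shown here as an assumption rather than the approach the project settled on, uses the fact that ssdeep only scores two hashes when their block sizes are equal or differ by a factor of two. Every other pair can be skipped without computing anything. The hash strings in the usage example are invented.

```python
from collections import defaultdict
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n samples (the handshake problem)."""
    return n * (n - 1) // 2


def block_size(ssdeep_hash: str) -> int:
    """An ssdeep hash looks like 'blocksize:hash1:hash2'."""
    return int(ssdeep_hash.split(":", 1)[0])


def candidate_pairs(hashes: dict) -> set:
    """Return only the sample pairs whose ssdeep block sizes are comparable."""
    by_block_size = defaultdict(list)
    for name, value in hashes.items():
        by_block_size[block_size(value)].append(name)

    pairs = set()
    for size, names in by_block_size.items():
        # Same block size: all pairs inside the bucket.
        for a, b in combinations(names, 2):
            pairs.add(tuple(sorted((a, b))))
        # Neighbouring bucket with double the block size.
        for a in names:
            for b in by_block_size.get(size * 2, []):
                pairs.add(tuple(sorted((a, b))))
    return pairs


if __name__ == "__main__":
    print(pair_count(707_000))
    demo = {"a": "96:abc:def", "b": "96:xyz:uvw", "c": "192:ggg:hhh", "d": "1536:iii:jjj"}
    print(sorted(candidate_pairs(demo)))
```

How much this saves depends on how the block sizes are distributed across the collection; samples of similar size tend to land in the same few buckets.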

What compilers / packers are common?

1. "Borland Delphi 3.0 (???)", 54298

2. "Microsoft Visual C++ v6.0", 33364

3. "Microsoft Visual C++ 8", 28005

4. "Microsoft Visual Basic v5.0 - v6.0", 26573

5. "UPX v0.80 - v0.84", 22353

Are there any unidentified packers?

• How to identify a packer

• A PE section is empty in the binary on disk, yet is marked both writable and executable
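The heuristic above maps directly onto two fields of a PE section header: the size of the raw data and the characteristics flags. The sketch below applies it to a single 40-byte section header. The function name is made up for this example, and reading "empty in binary" as "raw size of zero" is an interpretation of the slide.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics (40 bytes in total).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")


def suspicious_section(header: bytes) -> bool:
    """True if the section has no data on disk but is writable and executable."""
    fields = SECTION_HEADER.unpack(header)
    raw_size, flags = fields[3], fields[9]
    writable = bool(flags & IMAGE_SCN_MEM_WRITE)
    executable = bool(flags & IMAGE_SCN_MEM_EXECUTE)
    return raw_size == 0 and writable and executable
```

Such a section is a plausible landing zone for code that is unpacked at run time, which is why it hints at a packer even when no signature matches.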

How common are anti-debugging techniques?

• 31,622 out of 531,182 PE binaries use IsDebuggerPresent (6%)

• Packed executables are not counted
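A count like this can be approximated very crudely by looking for the API name in the raw bytes, since import names are stored as plain ASCII in an unpacked binary. This is a stand-in for proper import-table parsing, not a description of how the figures above were produced; the function names are invented, and a packed sample hides its real imports, which is consistent with the caveat above.

```python
def mentions_is_debugger_present(data: bytes) -> bool:
    """Crude proxy: does the API name appear anywhere in the file?"""
    return b"IsDebuggerPresent" in data


def share_percent(hits: int, total: int) -> float:
    """Share of samples with a hit, as a percentage."""
    return 100.0 * hits / total


if __name__ == "__main__":
    # Figures from the slide.
    print(f"{share_percent(31622, 531182):.2f} %")
```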

Analysis Coverage

Core “backbone” sourcecode

Tweaks & Mods

3rd party sourcecode

3rd party libraries

Compiler

Runtime libraries

Time

Paths

MAC Address

Malware

Packing

Machine Binary

Source

Future

What am I trying to do in the future

[Figure: the Binary-to-Human spectrum shown earlier (Blacklists, Net Recon, Command and Control, Developer Fingerprints, Tactics / Techniques / Procedures, Social Cyberspace, DIGINT, Physical Surveillance, HUMINT)]

Expand scope of analysis:

• + network

• + memory

• + OS changes

• + behavior

What am I trying to do in the future

• More automation

• More modular design

• Solve the “Big Data” issue I am getting myself into (Hadoop?)

• More pretty graphs

Thank you

• Michael Boman

• michael@michaelboman.org

• @mboman

• http://blog.michaelboman.org

• Code available at https://github.com/mboman/vxcage