Malware Attribution: Theory, Code and Result
Who am I?
• Michael Boman, M.A.R.T. project
• Have been “playing around” with malware analysis “for a while”
• Working for FireEye
• This is a HOBBY project that I use my SPARE TIME to work on
Agenda
Theory behind Malware Attribution
Code to conduct Malware Attribution analysis
Result of analysis
Theory
• Malware Attribution: tracking cyber spies - Greg Hoglund, Blackhat 2010
http://www.youtube.com/watch?v=k4Ry1trQhDk
What am I trying to do?
[Figure: a spectrum running from Binary to Human, with an arrow labeled “Move this way”. Labels along the spectrum: Blacklists, Net Recon, Command and Control, Developer Fingerprints, Tactics / Techniques / Procedures, Social Cyberspace, DIGINT, Physical Surveillance, HUMINT]
[Figure: the same spectrum, annotated with: Actions / Intent; Installation / Deployment; CNA (spreader) / CNE (search & exfil tool); COMS; Defensive / Anti-forensic; Exploit; Shellcode; DNS, Command and Control Protocol, Encryption]
Steps
• Step 0: Gather malware
• Step 1: Extract metadata from binary
• Step 2: Store metadata and binary in MongoDB
• Step 3: Analyze collected data
Step 0: Gather malware
• VirusShare (virusshare.com)
• OpenMalware (www.offensivecomputing.net)
• MalShare (www.malshare.com)
• CleanMX (support.clean-mx.de/clean-mx/viruses)
• Malware Domain List (www.malwaredomainlist.com/mdl.php)
Step 1: Extract metadata from binary
Development Steps
[Figure: Development Steps. Stages: Source, Machine, Binary. Elements: Core “backbone” sourcecode, Tweaks & Mods, 3rd party sourcecode, 3rd party libraries, Compiler, Runtime libraries, Time, Paths, MAC Address, Packing, Malware]
Step 1: Extract metadata from binary
• Hashes (for sample identification)
  • md5, sha1, sha256, sha512, ssdeep etc.
• File type / Exif / PEiD
• Compiler / Packer etc.
• PE Headers / Imports / Exports etc.
• Virustotal results
• Tags
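A minimal sketch of the hashing part of the list above, using only the Python standard library. The function name `sample_hashes` is hypothetical and not taken from the project code. ssdeep is left out because it needs a third-party module.

```python
import hashlib


def sample_hashes(data: bytes) -> dict:
    """Return the hex digests used to identify a sample."""
    algorithms = ("md5", "sha1", "sha256", "sha512")
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


if __name__ == "__main__":
    # Placeholder bytes for illustration, not a real sample.
    for name, digest in sample_hashes(b"placeholder bytes").items():
        print(name, digest)
```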
Identifying compiler / packer
• PEiD
• Python
  • peutils.SignatureDatabase().match_all()
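To illustrate what PEiD-style matching does: a signature is a run of hex bytes, with `??` as a wildcard, compared against the bytes at the entry point. The sketch below is a simplified, standard-library stand-in and not the pefile implementation. The signature name and bytes are invented for the example, and the entry point file offset is assumed to be known already.

```python
def parse_signature(text: str) -> list:
    """Turn 'DE AD ?? BE' into [0xDE, 0xAD, None, 0xBE]; None is a wildcard."""
    return [None if token == "??" else int(token, 16) for token in text.split()]


def matches_at(data: bytes, offset: int, pattern: list) -> bool:
    """Check whether the pattern matches the bytes starting at offset."""
    window = data[offset:offset + len(pattern)]
    if len(window) < len(pattern):
        return False
    return all(p is None or p == b for p, b in zip(pattern, window))


def match_all(data: bytes, entry_point_offset: int, signatures: dict) -> list:
    """Return the names of all signatures that match at the entry point."""
    return [
        name
        for name, sig in signatures.items()
        if matches_at(data, entry_point_offset, parse_signature(sig))
    ]


if __name__ == "__main__":
    # Invented signature, for illustration only.
    signatures = {"Hypothetical Packer v1.0": "DE AD ?? BE EF"}
    data = bytes.fromhex("0000dead11beef00")
    print(match_all(data, 2, signatures))
```

With pefile installed, the equivalent call is `peutils.SignatureDatabase(path_to_db).match_all(pe, ep_only=True)`, where `pe` is a loaded `pefile.PE` object.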
PE Header information
VirusTotal Results
Tags
• User-supplied tags to identify sample source and behavior
  • analyst / analyst-system supplied
Step 2: Store metadata and binary in MongoDB
Components
• Modified VXCage server
  • Collects a lot more metadata than the original
  • Stores malware & metadata in MongoDB instead of FS / ORDBMS
VXCage REST API
• /malware/add
  • Add sample
• /malware/get/<filehash>
  • Download sample. If no local sample, search other repos
• /malware/find
  • Search for sample by md5, sha256, ssdeep, tag, date
• /tags/list
  • List tags
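A rough sketch of how a client might build requests for two of the endpoints above. The base URL, the use of a form-encoded POST for `/malware/find`, and the field name `md5` are assumptions for illustration. Check the server code for the actual parameters. Nothing is sent over the network here.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: wherever the server is running


def get_sample_request(filehash: str) -> Request:
    """Build a GET request for /malware/get/<filehash>."""
    return Request(urljoin(BASE_URL, "/malware/get/" + filehash))


def find_request(**criteria: str) -> Request:
    """Build a POST request for /malware/find (field names are assumed)."""
    body = urlencode(criteria).encode("ascii")
    # Supplying a body makes urllib treat the request as a POST.
    return Request(urljoin(BASE_URL, "/malware/find"), data=body)


if __name__ == "__main__":
    req = find_request(md5="d41d8cd98f00b204e9800998ecf8427e")
    print(req.get_method(), req.full_url, req.data)
    # To actually send it: urllib.request.urlopen(req)
```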
Step 3: Analyze collected data
Identifying development environments
• Compiler / Linker / Libraries
• Strings
• Paths
• PE Translation header
• Compile times
• Number of times the software has been built
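As an example of one item in the list above, the compile time can be read straight from the PE header: the DOS header stores the offset of the PE signature at 0x3C, and the COFF TimeDateStamp sits 8 bytes after that signature. This is a standard-library sketch with a hypothetical function name. Keep in mind that the timestamp can be forged or zeroed by the author.

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(data: bytes) -> datetime:
    """Return the COFF TimeDateStamp of a PE file as a UTC datetime."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # After the 4-byte signature: Machine (2 bytes), NumberOfSections (2 bytes),
    # then TimeDateStamp (4 bytes).
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```

Usage: `pe_compile_time(open(path, "rb").read())`, where `path` points to a sample on disk.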
Cataloging behaviors
• Packers
• Encryption
• Anti-debugging
• Anti-VM
• Anti-forensics
Result
Have I seen you before?
• Detects similar malware (based on SSDEEP fuzzy hashing)
Different MD5, 100% SSDEEP match
SSDEEP Analysis (3007)
SSDEEP Analysis (851)
Challenges
• Party handshake problem: every sample has to be compared with every other sample
  • 707k samples analyzed and counting (n(n-1)/2 pairs, about 250 billion compares!)
• Need a better target (pre-)selection
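The pair count above, plus one possible pre-selection, sketched in Python. This is not what the project currently does. It relies on a property of ssdeep: two hashes can only score above zero when their block sizes (the number before the first colon) are equal or differ by a factor of two, so all other pairs can be skipped. The hash strings in the demo are made up.

```python
from collections import defaultdict
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of pairwise comparisons among n samples."""
    return n * (n - 1) // 2


def block_size(ssdeep_hash: str) -> int:
    """An ssdeep hash looks like 'blocksize:hash1:hash2'."""
    return int(ssdeep_hash.split(":", 1)[0])


def candidate_pairs(hashes):
    """Yield only the pairs whose block sizes are equal or differ by 2x."""
    by_block = defaultdict(list)
    for h in hashes:
        by_block[block_size(h)].append(h)
    for size, group in by_block.items():
        yield from combinations(group, 2)         # same block size
        for other in by_block.get(size * 2, []):  # neighbouring block size
            for h in group:
                yield (h, other)


if __name__ == "__main__":
    print(pair_count(707_000))
    demo = ["3:abc:def", "3:abd:deg", "6:xyz:uvw", "48:qqq:rrr"]  # made-up hashes
    print(list(candidate_pairs(demo)))
```

Each yielded pair would then go to the real ssdeep comparison. How much this saves depends on how the block sizes are distributed in the collection.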
What compilers / packers are common?
1. "Borland Delphi 3.0 (???)", 54298
2. "Microsoft Visual C++ v6.0", 33364
3. "Microsoft Visual C++ 8", 28005
4. "Microsoft Visual Basic v5.0 - v6.0", 26573
5. "UPX v0.80 - v0.84", 22353
Are there any unidentified packers?
• How to identify a packer
  • A PE section that is empty in the binary (no raw data on disk) but is marked both writable and executable
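The heuristic above as a small check on a section header. The two flag values are the standard PE section characteristics. Reading “empty in binary” as `SizeOfRawData == 0` is an interpretation, and the function name is hypothetical.

```python
IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section can be executed
IMAGE_SCN_MEM_WRITE = 0x80000000    # section can be written to


def looks_like_unpack_target(size_of_raw_data: int, characteristics: int) -> bool:
    """True for a section with no data on disk that is writable and executable."""
    writable = bool(characteristics & IMAGE_SCN_MEM_WRITE)
    executable = bool(characteristics & IMAGE_SCN_MEM_EXECUTE)
    return size_of_raw_data == 0 and writable and executable


if __name__ == "__main__":
    # Empty on disk, read + write + execute: suspicious.
    print(looks_like_unpack_target(0, 0xE0000080))
    # Ordinary code section: has data, read + execute only.
    print(looks_like_unpack_target(4096, 0x60000020))
```

With pefile, the two inputs correspond to `section.SizeOfRawData` and `section.Characteristics` for each entry in `pe.sections`.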
How common are anti-debugging techniques?
• 31622 out of 531182 PE binaries use IsDebuggerPresent (6 %)
  • Packed executables are not counted
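A crude way to get a number like the one above is to look for the API name as an ASCII string in the file, since import names are stored in plain text in an unpacked PE. This is a simplification and not necessarily how the figure was produced. A proper check would parse the import table, and packed samples hide their imports either way. Function names are hypothetical.

```python
def mentions_api(data: bytes, api_name: bytes = b"IsDebuggerPresent") -> bool:
    """Crude check: does the API name appear anywhere in the file?"""
    return api_name in data


def prevalence(samples) -> tuple:
    """Return (hits, percentage) over an iterable of file contents."""
    samples = list(samples)
    hits = sum(1 for data in samples if mentions_api(data))
    return hits, 100.0 * hits / len(samples)


if __name__ == "__main__":
    # Synthetic byte strings standing in for file contents.
    demo = [b"\x00KERNEL32.dll\x00IsDebuggerPresent\x00", b"\x00nothing here\x00"]
    print(prevalence(demo))
```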
Analysis Coverage
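[Figure: the Development Steps diagram (Source, Machine, Binary) revisited to show analysis coverage. Elements: Core “backbone” sourcecode, Tweaks & Mods, 3rd party sourcecode, 3rd party libraries, Compiler, Runtime libraries, Time, Paths, MAC Address, Packing, Malware]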
Future
What am I trying to do in the future
[Figure: the Binary-to-Human spectrum from “What am I trying to do?”, shown again]
Expand scope of analysis: + network, + memory, + OS changes, + behavior
What am I trying to do in the future
• More automation
• More modular design
• Solve the “Big Data” issue I am getting myself into (Hadoop?)
• More pretty graphs
Thank you
• Michael Boman
• @mboman
• http://blog.michaelboman.org
• Code available at https://github.com/mboman/vxcage