An Introduction to Malware Classification

John Seymourseymour1@umbc.edu, @jjseymour3

2016-04-23

whoami

• Ph.D. student at the University of Maryland, Baltimore County (UMBC)

• Also a data scientist at ZeroFOX, Inc.

Outline

• Malware meets Machine Learning• Where to find lots of malware• Features commonly used• Top performing models• Where we currently need effort

Malware meets Machine Learning

Problem: more malware variants created than we can possibly ever analyze

Crash Course in Machine Learning

• Machine Learning: finding patterns in data• Potential patterns: “Features”• Making sense of those patterns: “Models”• Libraries exist for creating models from

features (python, R, WEKA)

• Places to get started:• http://www.r2d3.us/visual-intro-to-machine-learning-part-1/• https://www.dataquest.io/mission/74/getting-started-with-kaggle/

Where to find malware

600 samples • Lots of exploit kits• Includes analyses

10,868 samples(about 500GB)

• 9 families of malware• Hexdumps/Assembly files (from IDA)• Neutered: PE headers removed

271,092 samples• Labeled by KAV• Last update: 2007• Most-used academic dataset

24,783,626 samples• Split into chunks of 65,536 samples• Available by Torrent• Unlabeled (so far…)

As many as you want

• VirusTotal: Needs Private API• Research Requests• Licensing issues

No presentation complete without a graph(Note Logarithmic Scale)

malware-traffic-analysis Kaggle Vx Heaven VirusShare256

4095.99999999999

65535.9999999999

262144

1048576

4194303.99999999

16777216

67108863.9999998

Number of Samples

Features commonly used

PE-File metadata

Python pefile library is amazing

Image courtesy of trustwave.com

N-Grams on hexdumps/assembly files

• Sliding window over text

• Features:• DEAD: 1• ADBE: 1• BEEF: 1

Other features commonly used

• Opcodes, imports, etc• Assembly instructions

Top performing models

Again, use libraries!

SVMs xgboost Deep Learning

Where we need more effort

• Stopping the “overfitting” problem• Models don’t generalize to unseen data/new

networks• Happens even in high-tier conferences: See 2012

“Selecting Features to Classify Malware”• More collaboration with malware analysts• “Pentesting” machine learning models

Thank you!Questions?

seymour1@umbc.edu, @jjseymour3

An Introduction to Malware Classification

Technology

MR201402 effectiveness of unknown malware classification by logistic regression analysis

DETECTION AND CLASSIFICATION OF OBFUSCATED MALWARE Moustafa Saleh.pdf · DETECTION AND CLASSIFICATION OF OBFUSCATED MALWARE Moustafa E. Saleh, M.Sc., The University of Texas at San

Metadata-driven Threat Classification of Network Endpoints Appearing in Malware

UpDroid: Updated Android Malware and Its Familial ...aselcuk.etu.edu.tr/SiberGuvenlikGunu-2019.12.23/KursatAktas.pdf · 21 malware families, 2479 malware samples. Family Classification

An effective approach for classification of advanced ... · An effective approach for classification of advanced malware with high accuracy ... malware attacks against mobile devices

What Species of this Fish is? Malware Classification with

Using Mobile Malware tactics for red teaming · Verification Standard (MASVS) ... | 3 Malware tactics for red teaming. Classification: Internal Mobile Malware in a tiny nutshell Malware

Android Malware Classification using Parallelized Machine Learning …lxu/resources/dissertation-lifan.pdf · ANDROID MALWARE CLASSIFICATION USING PARALLELIZED MACHINE LEARNING METHODS

Faster, More Effective Flowgraph -based Malware Classification

A New Mobile Malware Classification for Camera ... · PDF file32 mobile malware classification based on system call and permission to detect camera exploitation for Android smartphone

A Comparative Assessment of Malware Classification using Binary

Android Malware Family Classification using Ensembling of

Towards Classification of Polymorphic Malware

Malware Detection and Classification

Suricata for Malware Classification - 2020 SuriCon in ...€¦ · Kaspersky | Suricata for Malware Classification • Scanning traffic from already detected malicious executables

Fast Automated Unpacking and Classification of Malware

Why an Android App is Classified as Malware? Towards ... · Why an Android App is Classified as Malware? Towards Malware Classification Interpretation BOZHI WU, Nanyang Technological

Malware Classification Based on Hidden Markov Model and

Malware Classification Using Structured Control Flow

Introduction To Malware Analysis