Using Data Science Techniques to Detect Malicious Behavior

Using Data Science Techniques to Help Detect Malicious Behavior

Phil Roth, Data Scientist

Key Takeaways

• An introduction to key data science concepts

• Challenges in applying those concepts to security data

• Why focusing on aiding a human security analyst can lead to better machine learning tools

• How Endgame’s enterprise product benefits from that focus

Data Science Process

Gather Raw Data

Process and Clean

Explore the Data

Apply a Model

Communicate the Result

Gather Raw Data

Data can come from many disparate sources.

Process and Clean Data

Raw data must be cleaned and features extracted.

Explore Data

Finding relationships in the data provides hints about what features and models will be useful.

Apply a Model

Models exploit features and relationships in the data to make a statement.

Communicate the Result

The output of a data product is useless without effective and actionable communication.

Introduction to Machine Learning Models

Supervised learning

In supervised learning, input data is labeled. An algorithm attempts to reproduce those labels on new, unlabeled data.

input data          label
-3  -4   1   0      1
-4  -3   1   1      1
-4  -4   0   0      1
+4  +3   1   0      0
+3  +4   0   1      0
+3  +3   1   0      0

new data            label
-3  -4   1   1      ???

Supervised learning example

A Support Vector Machine [1] finds the best separating boundary between two classes in space.

[1] http://scikit-learn.org/stable/modules/svm.html
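To make the supervised example concrete, here is a minimal sketch of a linear soft-margin SVM trained on the six labeled rows from the slide. It is an illustration only: the training method (subgradient descent on the regularized hinge loss) and the learning rate, regularization strength, and epoch count are assumptions chosen for this toy data, not anything stated in the talk.

```python
# Toy data from the slide: four features per row, label 1 or 0.
X = [
    [-3, -4, 1, 0],
    [-4, -3, 1, 1],
    [-4, -4, 0, 0],
    [4, 3, 1, 0],
    [3, 4, 0, 1],
    [3, 3, 1, 0],
]
labels = [1, 1, 1, 0, 0, 0]


def train_linear_svm(X, labels, lr=0.01, lam=0.01, epochs=50):
    """Fit weights w and bias b by subgradient descent on the hinge loss.

    lr, lam, and epochs are illustrative assumptions, not tuned values.
    """
    y = [1 if label == 1 else -1 for label in labels]  # SVMs use +1 / -1
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Regularization step: shrink the weights slightly.
            w = [wj * (1 - lr * lam) for wj in w]
            if margin < 1:
                # Point is misclassified or inside the margin: move the boundary.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return 1 or 0 depending on which side of the boundary x falls."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score > 0 else 0


w, b = train_linear_svm(X, labels)
new_row = [-3, -4, 1, 1]  # the unlabeled row from the slide
print(predict(w, b, new_row))  # prints 1
```

For real work, a library implementation such as the scikit-learn SVM module cited on the slide would be the more sensible choice; the hand-written loop is only meant to show what "finding a separating boundary" involves.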

Unsupervised learning

In unsupervised learning, input data is unlabeled. An algorithm attempts to find hidden structure in that data.

input data
-3  -4   1   0      group 1
-4  -3   1   1      group 1
-4  -4   0   0      group 1
+4  +3   1   0      group 2
+3  +4   0   1      group 2
+3  +3   1   0      group 2

Unsupervised learning example

k-means clustering iteratively improves the location of cluster centers by moving them closer to cluster means.

[Figure: cluster centers at step 1, step 2, etc.]
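The iteration described here can be sketched in a few lines of plain Python on the six unlabeled rows from the earlier slide. Choosing k = 2 and seeding the centers with the first and last rows are assumptions made for this illustration; real implementations usually pick starting centers more carefully.

```python
points = [
    [-3, -4, 1, 0],
    [-4, -3, 1, 1],
    [-4, -4, 0, 0],
    [4, 3, 1, 0],
    [3, 4, 0, 1],
    [3, 3, 1, 0],
]


def sq_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Alternate between assigning points and moving centers until stable."""
    assignments = []
    for _ in range(max_iter):
        # Step 1: assign each point to its nearest center.
        assignments = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        # Step 2: move each center to the mean of the points assigned to it.
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return assignments, centers


# Assumed seeding: the first and last rows serve as the initial centers.
assignments, centers = kmeans(points, [points[0], points[-1]])
print(assignments)  # prints [0, 0, 0, 1, 1, 1]
```

The two clusters recovered here correspond to the two groups shown on the unsupervised learning slide, even though no labels were supplied.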

Challenges with Security Data

Security lacks open datasets

Recommendation Systems

Character Recognition: MNIST Database of Handwritten Digits

The DARPA Intrusion Detection Evaluation dataset is 15 years old and simulated, and techniques trained on it were never actionable.

Sharing data in the security industry will always be a challenge, one that even President Obama is attempting to address.

Security lacks easy labels

Labeling is an expensive process that requires expertise.

Is this binary malicious?

Is this traffic an intrusion?

Are these products related?

Security lacks tolerance for errors

False positives lead to expensive analyst investigations and alert fatigue.

False negatives get CEOs fired.
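A short back-of-the-envelope calculation shows why both error types hurt. Every number below (event volume, fraction of malicious events, detection rate, false positive rate) is a hypothetical chosen for illustration and does not come from the talk.

```python
def alert_breakdown(total_events, malicious_rate, true_positive_rate, false_positive_rate):
    """Split a detector's output into true alerts, false alerts, and misses."""
    malicious = total_events * malicious_rate
    benign = total_events - malicious
    true_alerts = malicious * true_positive_rate
    false_alerts = benign * false_positive_rate
    return {
        "true_alerts": true_alerts,
        "false_alerts": false_alerts,       # each one costs analyst time
        "missed": malicious - true_alerts,  # false negatives
        "precision": true_alerts / (true_alerts + false_alerts),
    }


# Hypothetical scenario: 1,000,000 events, 0.01% malicious,
# a detector that catches 99% of attacks with a 1% false positive rate.
result = alert_breakdown(
    total_events=1_000_000,
    malicious_rate=0.0001,
    true_positive_rate=0.99,
    false_positive_rate=0.01,
)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

Under these assumed rates, roughly 99 real alerts are buried among about 10,000 false ones, so fewer than 1 in 100 alerts is worth an investigation, and one attack still goes undetected.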

Machine learning in security could benefit from focusing on “human in the loop” products over “the algorithm does it all” products.

Chess Analogy

1997: IBM’s supercomputer Deep Blue vs. Garry Kasparov

2005: Team ZachS vs. multiple Grandmasters in Freestyle Chess [2]

Human/machine teams retained an edge over machines for decades.

[2] Cowen, Tyler. Average Is Over. Chapter 5. 2013.

Using the Human/Machine Model

Endgame Implementation

Cloud-deployed virtual machines are clustered based on their behavior. The results are communicated to analysts and used to improve the detection of malicious behavior.

Package, process, and user information is collected from the machines.

DBSCAN, a clustering algorithm, groups the machines based on that information.
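As a rough illustration of this clustering step, the sketch below runs a small pure-Python DBSCAN over made-up machine profiles. It is not Endgame's implementation: the machine names, the feature strings, the use of Jaccard distance, and the eps and min_pts values are all assumptions chosen for the example.

```python
from collections import deque

# Hypothetical profiles: each machine is a set of package, process, and user facts.
web = {"pkg:nginx", "pkg:openssl", "proc:nginx", "proc:sshd", "user:www-data"}
db = {"pkg:postgresql", "pkg:openssl", "proc:postgres", "proc:sshd", "user:postgres"}
machines = {
    "web-1": web,
    "web-2": web,
    "web-3": web | {"proc:cron"},
    "db-1": db,
    "db-2": db | {"proc:cron"},
    "odd-1": web | {"pkg:netcat", "proc:nc", "proc:miner", "user:tmpuser"},
}


def jaccard_distance(a, b):
    """0.0 for identical sets, 1.0 for sets with nothing in common."""
    return 1.0 - len(a & b) / len(a | b)


def dbscan(items, eps, min_pts, dist):
    """Return a cluster index per item, or -1 for noise."""
    n = len(items)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Includes i itself, since dist(items[i], items[i]) is 0.
        return [j for j in range(n) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # too isolated to start a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, keep expanding
    return labels


names = list(machines)
labels = dbscan([machines[name] for name in names],
                eps=0.3, min_pts=2, dist=jaccard_distance)
for name, label in zip(names, labels):
    print(name, "noise" if label == -1 else f"cluster {label}")
```

With these assumed profiles, the web and database machines form two clusters, and the machine carrying unexpected packages, processes, and users is left as noise. That is the kind of outlier an analyst might then be asked to review. DBSCAN suits this setting because it does not need the number of clusters in advance and it reports outliers explicitly.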


Key Takeaways

• An introduction to key data science concepts

• Existing challenges in applying those concepts to security data

• Why focusing on aiding a human security analyst can lead to better machine learning tools

• How Endgame’s enterprise product benefits from that focus

For more information contact: egs-info@endgame.com
