Cybersecurity with AI - Ashrith Barthur

CONFIDENTIAL

Ashrith Barthur, Security Scientist

July 19, 2016

CyberSecurity and AI - Looking for anomalies

A Few Problems in Cybersecurity

1. Malicious external/internal threats (phishing, malicious domains, etc.)

2. Large-scale attacks (DDoS, spam campaigns, etc.)

3. Data loss (data exfiltration)

4. User behavioural analytics (insider threat, account takeover)

These are the primary problems enterprises want to solve, as they directly affect the business.

How are these cybersecurity problems handled?

1. Rule-based systems

2. Large-scale use of experts who understand the systems well

3. Expert identification of conditions, and combinations of conditions, which are true markers of malicious behaviour

4. Multiple security professionals who understand specific conditions and combinations, and can identify malicious behaviour

Is this justified?

YES.

Why?

1. Cybersecurity's focus is to identify every instance of malicious behaviour and not leave things to probability.

2. The risk associated with each security event is large, which makes identifying each event very important.

What is the problem with this approach?

1. It takes time, as a large volume of logs needs to be analysed and each threat must be classified as real, potential, or a false positive.

2. It requires experts, and a large number of professionals.

3. It is a manual process that requires investigating associated events across multiple logs, which is considerably slow.

4. Even with a thorough investigation, a malicious event could still be missed because it is anomalous.

Outlier? Anomalous?

1. Outliers are, simply put, events that have a low probability of occurrence when statistically modelled.

2. Anomalies are events that have never been seen.

3. Identifying anomalous events is difficult.
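The distinction between an outlier and an anomaly can be illustrated with a minimal sketch. It assumes events are simple categorical values (here, a login hour) and that an empirical frequency count stands in for the statistical model; the function name `classify_event` and the `rare_threshold` parameter are hypothetical, chosen only for illustration.

```python
from collections import Counter

def classify_event(history, event, rare_threshold=0.05):
    """Label an event relative to previously observed events.

    'anomaly': the value has never been seen in the history
    'outlier': seen before, but with empirical probability below rare_threshold
    'normal' : everything else
    """
    counts = Counter(history)
    if event not in counts:
        return "anomaly"
    probability = counts[event] / len(history)
    return "outlier" if probability < rare_threshold else "normal"

# Hypothetical history of login hours (0-23) for one account.
history = [9] * 60 + [10] * 37 + [22] * 3

for hour in (9, 22, 3):
    print(hour, classify_event(history, hour))
```

The anomaly case is the hard one: the model has no probability to assign to a value it has never observed, so it can only report that the value is new.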

How do you solve this problem?

1. Create a malicious behaviour context based on your domain knowledge

2. Use the context to statistically transform the anomalous behaviour into an outlier, or at least into a unique occurrence.

3. See if the model fits your contextual assumptions.

Example

1. Studying successful Windows user login times for the entire enterprise does not yield interesting behaviour.

2. Studying these user logins in context is important.

3. Understanding that the login patterns of general users, administrators and system accounts are different.

4. Also, understanding that different kinds of logins (physical system logins, network-based, remote, unlocks, cached logins) differ in behaviour.

5. Interactions between types of users and types of logins also yield unique behaviour.

Each analytical context is associated with a certain expected behaviour. Any violation of this expected behaviour is flagged and studied.
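As a rough sketch of the login example, the code below builds one baseline per (user type, login type) context and flags a login whose hour deviates strongly from that context's baseline. The z-score test, the `z_limit` value, the field layout, and the sample data are all assumptions made for illustration, not the method used in the talk. Treating the hour as a plain number also ignores that hours wrap around midnight.

```python
from collections import defaultdict
from statistics import mean, pstdev

def build_baselines(logins):
    """Summarize login hours per (user_type, login_type) context as (mean, stdev)."""
    hours_by_context = defaultdict(list)
    for user_type, login_type, hour in logins:
        hours_by_context[(user_type, login_type)].append(hour)
    return {ctx: (mean(hours), pstdev(hours)) for ctx, hours in hours_by_context.items()}

def is_unexpected(baselines, user_type, login_type, hour, z_limit=3.0):
    """Flag a login that violates the expected behaviour of its context."""
    context = (user_type, login_type)
    if context not in baselines:
        return True  # a context that was never observed is itself unexpected
    mu, sigma = baselines[context]
    if sigma == 0:
        return hour != mu
    return abs(hour - mu) / sigma > z_limit

# Hypothetical successful logins: (user type, login type, hour of day).
logins = (
    [("general", "interactive", h) for h in [8, 9, 9, 10, 9, 8, 10, 9]]
    + [("admin", "remote", h) for h in [1, 2, 2, 3, 2, 1, 3, 2]]
)
baselines = build_baselines(logins)

print(is_unexpected(baselines, "general", "interactive", 2))
print(is_unexpected(baselines, "admin", "remote", 2))
```

In this toy data a 2 a.m. login is routine for a remote administrator but a violation for a general user at a physical machine, which is the kind of difference that pooling all logins across the enterprise would hide.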

The Problem? Even Now?

1. The biggest problem even now is that there is no ground truth to confirm that a behaviour flagged as unexpected for its context is truly anomalous.

2. Therefore we end up with the problem of an unsupervised process.

3. Anomalous behaviour detection in cybersecurity is unsupervised.

Only Data tells us the truth. We validate our analysis using feedback.

How do we solve this?

1. We still have experts who can identify if these identified behaviours are indeed malicious

2. The information we provide speeds up the analytics and investigation

3. The building of context and statistically identifying unexpected behaviour reduces the need to go through unnecessary data.

4. We use this feedback at multiple levels:

   a. improve features that go into the context

   b. modify the context itself

   c. look at changes in thresholds

   d. use the feedback as a mechanism to turn the problem into a supervised problem.
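One possible way to picture the feedback loop is sketched below. It covers only two of the levels above: nudging a flagging threshold based on analyst verdicts, and keeping the verdicts as labels for a later supervised model. The function names, the `step` size, and the `target_precision` rule are hypothetical illustrations and not a description of the system discussed in the talk.

```python
def adjust_threshold(threshold, verdicts, step=0.1, target_precision=0.5):
    """Move the flagging threshold based on analyst verdicts.

    verdicts: list of booleans, True when the analyst confirmed the
    flagged behaviour as malicious.
    """
    if not verdicts:
        return threshold  # no feedback yet, keep the threshold unchanged
    precision = sum(verdicts) / len(verdicts)
    if precision < target_precision:
        return threshold + step  # too many false positives: flag less
    return max(step, threshold - step)  # flags are mostly right: flag more

def to_training_rows(flagged_events, verdicts):
    """Pair each flagged event with its analyst label (1 = malicious, 0 = benign)."""
    return [(event, int(verdict)) for event, verdict in zip(flagged_events, verdicts)]

# Hypothetical review round: four flagged events, one confirmed malicious.
flagged = [{"hour": 2}, {"hour": 3}, {"hour": 23}, {"hour": 4}]
verdicts = [True, False, False, False]

new_threshold = adjust_threshold(3.0, verdicts)
training_rows = to_training_rows(flagged, verdicts)
print(new_threshold, training_rows)
```

Once enough labelled rows accumulate, they can serve as the ground truth that the unsupervised process lacked.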

Event Correlation and Behavioural Identification - A perfect segue to log correlation.

1. The idea of context is used where malicious behavioural identification is important.

2. Individual logs (system logs, network logs) are not comprehensive enough to identify anomalous events on their own.

3. Therefore using log correlation to identify events and building a context around the event is important.

4. Individual events can never be considered in a vacuum.

5. The logs are primarily correlated by time, and then by possibly connected events.
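Correlating by time first and then by connection can be sketched as follows. The example assumes each log record has already been parsed into a dictionary with `time`, `host`, `source`, and `type` fields, that the host is what connects events, and that a gap of at most 10 minutes keeps events in the same group. These field names and the window size are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def correlate_by_time(events, window=timedelta(minutes=10)):
    """Group events from several logs into per-host sequences.

    Events are ordered by time; an event joins the host's current group
    when it occurs within `window` of that group's previous event.
    """
    open_groups = {}  # host -> most recent group for that host
    groups = []
    for event in sorted(events, key=lambda e: e["time"]):
        current = open_groups.get(event["host"])
        if current and event["time"] - current[-1]["time"] <= window:
            current.append(event)
        else:
            current = [event]
            open_groups[event["host"]] = current
            groups.append(current)
    return groups

# Hypothetical records drawn from an authentication log and a network log.
t0 = datetime(2016, 7, 19, 9, 0)
events = [
    {"time": t0 + timedelta(minutes=2), "host": "ws-17", "source": "network", "type": "db_connect"},
    {"time": t0, "host": "ws-17", "source": "auth", "type": "login_success"},
    {"time": t0 + timedelta(minutes=1), "host": "ws-42", "source": "auth", "type": "login_success"},
    {"time": t0 + timedelta(hours=3), "host": "ws-17", "source": "auth", "type": "login_success"},
]

groups = correlate_by_time(events)
for group in groups:
    print([(e["host"], e["source"], e["type"]) for e in group])
```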

Example of Event/Log Correlation - An example of an event

A user account has multiple failed logins, followed by a successful login. The machine that was successfully logged into connected to a database server and requested a database dump, and this data was downloaded back to the machine.

Identifying these events, and identifying that they are happening in a series, is event correlation.

Let's break these events down. You have,

1. Multiple login attempts and 1 final successful login (could be interpreted as a user mistyping their password - we all do that)

2. A connection to a database server (totally harmless)

3. A dump of the data on the machine (might be creating a new database and took a dump)

4. Moved the dump of data to the local machine (totally fine if someone wants to work on the data locally)

The Analysis of correlated events

1. Here we have 4 different events which tell us a story only when there is correlation.

2. Correlation is important because behavioural anomalies described earlier are not statistical outliers. They are unseen data points.

3. These anomalies surface after observing the interactions between different events.
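The example above can be sketched as a check over an already correlated, time-ordered sequence of event types. The event type names, the `min_failures` value, and the function `matches_chain` are hypothetical; a real deployment would derive them from its own log schema and context.

```python
# Ordered steps that, taken together, tell the story from the example.
SUSPICIOUS_CHAIN = ["login_success", "db_connect", "db_dump", "file_download"]

def matches_chain(event_types, min_failures=3, chain=SUSPICIOUS_CHAIN):
    """Return True when repeated failed logins precede the full chain, in order."""
    if "login_success" not in event_types:
        return False
    first_success = event_types.index("login_success")
    failures = event_types[:first_success].count("login_failure")
    if failures < min_failures:
        return False
    # Ordered subsequence test: each step must appear after the previous one.
    remaining = iter(event_types[first_success:])
    return all(step in remaining for step in chain)

full_story = ["login_failure"] * 3 + [
    "login_success", "db_connect", "db_dump", "file_download",
]
harmless = ["login_failure", "login_success", "db_connect"]

print(matches_chain(full_story))
print(matches_chain(harmless))
```

Each event type on its own passes as harmless; only the ordered combination is flagged, which is why the check runs on correlated sequences and not on single log lines.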

What have we gathered?

1. Defining the right context to identify anomalous malicious events.

2. Identification of correlated events from logs.

3. Transformation of anomalous behaviour.

4. Verifying with experts.

Thanks to the attendees, support staff, open source members of H2O, colleagues, and our clients for helping us help them by analysing new datasets, and for helping H2O grow.

The Team

Mark Chan - Scientist, Engineer, Hacker, Ninja.

Ivy Wang - UI, Problem, Details, Details, and Details Expert.

Fonda Ingram - Comms, and Reqs Expert, The Wall (GoT).

Data & Analytics
