SESSION ID: CSV-R14 FIM AND SYSTEM CALL AUDITING AT …...Start with a simple File Integrity...

Preview:

Citation preview

SESSION ID:

#RSAC

Ravi Honnavalli

FIM AND SYSTEM CALL AUDITING AT SCALE IN A LARGE CONTAINER DEPLOYMENT

CSV-R14

Staff EngineerWalmartTwitter handle: @ravi_honnavalli

#RSAC

Disclaimer

2

NOTE: All content discussed here are out of self learning and not related to the work I do at Walmart.

#RSAC

Ever increasing amount of logs

3

#RSAC

Overwhelming amount of choices

4

Too many options!!

Static rules

?

ML?

Event source

?

Agent vs Agentles

s

SIEM tools

#RSAC

Flood of OSS tools

5

Elasticsearch

osquery

journald ElastAlert

TensorFlowUnstructured datastores

fluentd

#RSAC

GOAL: Demystifying the choices we have

6

•Classifying event sources

•Understand event source type

•Understand how they work in containers

Understanding types of event sources

•Understand the insights we are looking for

•Build a stack based on the event classification

•If needed customize existing open source tools

•Build adaptors / tools that join the whole chain

Build our own stack based on insight needed

• The stacks discussed in this presentation are by no means the only stack availableMake an informed

decision

#RSAC

• kauditd• Inotify

Kernel

Possibilities of tools evaluated

7

• auditd• go-auditd• go-audit-container• osquery

Data shippers• Logstash• Filebeat• Fluentd

Sink• File• Syslog

Deployment tools• Chef• Puppet• Ansible

Fleet manager• Zentral• Kolide• Doorman• Hand crafted tool

Unstructured data stores• Elasticsearch• MongoDB

Graphing and reporting• Kibana• Grafana• ElastAlert• Custom tools based on

query DSL/Lucene

Data Preprocessing• Custom tool to build a

training set

ML• TensorFlow• Layer depth• Optimization

algorithm• Learning rate• Gradient descent

mechanisms

#RSAC

Classification of event sources

8

Event sources

Event based

syscall inotify

Scheduled query

agent

#RSAC

Security insight based on event source type

9

sysc

all Looking for specific

outliers among mostly normal

dynamic events.

• Like identifying outliers

• Monitoring constantly for a specific malicious system call along with other criteria (uid, etc)

ino

tify Safe-guarding specific

sensitive files / area in the file system

• Watch for CREATE/ACCESS/MODIFY/DELETE events on specific files

agen

t Scheduled activities for static information

• OS patch level queries, vulnerable kernel modules, mis-configuration

#RSAC

SYSTEM CALLS

#RSAC

Why syscall?

11

Fundamental transit points between user land

and kernel

Every process makes system calls disclosing information of

its activity

Several user space tools that send audit information

(auditd, go-audit, go-audit-container)

Can provide deep insight when aggregated and drilled

down

Ideal candidate to build a machine learning training set as the volume of data is huge

#RSAC

Audit component

12

User land

Kernel land

Kauditd

Syscall interfaceNetlinksocket

Reporting daemon

User space application

Namespacingconcern!

#RSAC

Container anatomy

13

PID FS net

UTC UID RPC

Container 1

PID FS net

UTC UID RPC

Container 2

PID FS net

UTC UID RPC

Container 3

Audit, syslog, kernel key ring

Kernel

#RSAC

go-audit-container

14

#RSAC

Audit log to gain insights at scale

User land

Kauditd

Syscall interfaceNetlinksocket

go-audit-container

User space application

Elasticsearch

Grafana

Kibana

Pre-Processsink

Email

Pagerduty

Slack

TensorFlow

Kernel

#RSAC

Demo: Privilege escalation

16

#RSAC

INOTIFY

#RSAC

Inotify component

18

User land

Kernel landInotify component

User space applicationInotify_add_watch

Watch list

Inotify_event { }

Inotify_event { }

Inotify_event { }

Inotify_event { }

Event queue

inotify sysctlsare not

namespaced!

#RSAC

Why inotify?

19

Lesser CPU consumption on

an average

Missing details in the reports

#RSAC

inotify based stack for FIM

User land

Kernel land

Elasticsearch

Grafana

Kibana

Pre-Processsink

Email

Pagerduty

Slack

TensorFlow

osquery

Inotifycomponent

Register watch

Notify event

#RSAC

AGENTS

#RSAC

osquery

22

osquery osqueryosqueryosquery

Fleet Manager

OS OSOS OS

Distributed Query

#RSAC

Osquery stack to get insights at scale

Elasticsearch

Grafana

Kibana

Pre-Processsink

Email

Pagerduty

Slack

TensorFlow

osquery

OS

Extension plugin

Fleet manager

#RSAC

Compliance query

24

#RSAC

East West Threat

25

#RSAC

osquery packs

26

#RSAC

Learning from using fixed queries in Kibana, Grafana and custom tools

27

Robust rules need a lot of

queries

Fixed queries need to be constantly updated for new

patterns

Any small variation of the rules is a false negative

Machine learning helps improve our ability to detect

anomalies and broadly classify security posture

#RSAC

MACHINE LEARNING

#RSAC

High level differences

29

Unsupervised

Elasticsearch ML

Anomaly detection

Time series data

Supervised

Pre-processink-osquery

Explicit labelling and pre-processing

Explicit data classification on disparate info

#RSAC

Picture credit: https://unsplash.com/@ripato30

Elasticsearch ML: Detecting outliers

#RSAC

Demo of ElasticSearch ML

31

#RSAC

Use case: Supervised Learning

32

Classifying data from different event sources

Broad classification into RED/YELLOW/GREEN

Classifying to have a big picture of the security posture of the organization

#RSAC

Supervised learning: Building a training set is key

33

ElasticsearchPreprocessink-

osqueryLabelling TensorFlow

#RSAC

Pre-processink-osquery stages

Stage 0• Query one probe at a time from ES

• Label into RED/YELLOW/GREEN

• Write to stage_0.csv

Stage 1 • Merge into existing stage_1.csv

Manual Labeling

• At this stage human administrator can manually label events that were not labeled or which were incorrectly labeled by the automated rules

Stage 2• Transform into numeric

values, normalizing and mean centering

#RSAC

ML choices if you are building your own solution

35

Activation function

Learning rate

Depth of the network

Batch size vs iterations

Optimzers

#RSAC

Results of our experiment

36

ReLU activation

Batch Size

Adam optimizer

Lower the learning rate the better

#RSAC

Lessons learnt

37

Identify event sources keeping in mind nature of events needed, frequency, memory and CPU footprint of the agents on the application nodes, etc.

Be aware of how and what is namespaced in containers when configuring event sources in containers.

High level classification of ML

Machine learning helps in getting intelligent insights, but needs training and fine tuning.

#RSAC

Apply

38

Start with a simple File Integrity monitoring implementation using inotify log. Observe load of FIM events on the infrastructure

Grow the solution to more detailed monitoring. Keep in mind load on the infrastructure, when thousands of nodes start sending events.

Try applying ML based on unsupervised learning and then get your hands dirty with training

Think of possibilities outside of what is discussed here today

#RSAC

AuditNG suite

39

https://github.com/auditNG/preprocessink-osquery

https://github.com/auditNG/go-audit-container

#RSAC

Questions?

40

You can also reach out later:Twitter handle: @ravi_HonnavalliLinkedIn: https://www.linkedin.com/in/ravi-honnavalli-0535163/