SESSION ID: CSV-R14
#RSAC
FIM AND SYSTEM CALL AUDITING AT SCALE IN A LARGE CONTAINER DEPLOYMENT
Ravi Honnavalli
Staff Engineer, Walmart
Twitter handle: @ravi_honnavalli
Disclaimer
NOTE: All content discussed here comes from self-directed learning and is not related to the work I do at Walmart.
Ever-increasing volume of logs
Overwhelming number of choices
Too many options!
• Static rules?
• ML?
• Event source?
• Agent vs. agentless
• SIEM tools
Flood of OSS tools
• Elasticsearch
• osquery
• journald
• ElastAlert
• TensorFlow
• Unstructured datastores
• fluentd
GOAL: Demystifying the choices we have
Understanding types of event sources
• Classify event sources
• Understand the event source type
• Understand how they work in containers
Build our own stack based on the insight needed
• Understand the insights we are looking for
• Build a stack based on the event classification
• If needed, customize existing open source tools
• Build adaptors / tools that join the whole chain
Make an informed decision
• The stacks discussed in this presentation are by no means the only stacks available
Possibilities of tools evaluated
Kernel: kauditd, inotify
auditd, go-audit, go-audit-container, osquery
Data shippers: Logstash, Filebeat, Fluentd
Sink: File, Syslog
Deployment tools: Chef, Puppet, Ansible
Fleet manager: Zentral, Kolide, Doorman, hand crafted tool
Unstructured data stores: Elasticsearch, MongoDB
Graphing and reporting: Kibana, Grafana, ElastAlert, custom tools based on query DSL/Lucene
Data preprocessing: custom tool to build a training set
ML: TensorFlow, layer depth, optimization algorithm, learning rate, gradient descent mechanisms
Classification of event sources
Event sources
• Event based: syscall, inotify
• Scheduled query: agent
Security insight based on event source type
syscall: Looking for specific outliers among mostly normal dynamic events
• Identifying outliers
• Constantly monitoring for a specific malicious system call along with other criteria (uid, etc.)
inotify: Safeguarding specific sensitive files / areas in the file system
• Watch for CREATE/ACCESS/MODIFY/DELETE events on specific files
agent: Scheduled activities for static information
• OS patch level queries, vulnerable kernel modules, misconfiguration
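The syscall case above (a specific system call combined with other criteria such as uid) can be sketched as a static rule over parsed audit events. This is a minimal illustration only: the event dictionaries, their field names (`syscall`, `uid`, `euid`, `exe`) and the rule itself are assumptions made for the example, not output of any tool named in the talk.

```python
# Hypothetical parsed audit events; the field names are assumptions for illustration.
EVENTS = [
    {"syscall": "execve", "uid": 1000, "euid": 1000, "exe": "/bin/ls"},
    {"syscall": "setuid", "uid": 1000, "euid": 0, "exe": "/tmp/exploit"},
    {"syscall": "setuid", "uid": 0, "euid": 0, "exe": "/usr/sbin/sshd"},
]


def is_suspicious(event):
    """Flag setuid calls by a non-root user that end up with root privileges."""
    return (
        event["syscall"] == "setuid"
        and event["uid"] != 0
        and event["euid"] == 0
    )


def find_outliers(events):
    """Return the events that match the static rule."""
    return [e for e in events if is_suspicious(e)]


if __name__ == "__main__":
    for hit in find_outliers(EVENTS):
        print("ALERT:", hit["exe"], "uid", hit["uid"])
```

Only the second event matches, because it combines the watched system call with the uid criteria.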
SYSTEM CALLS
Why syscall?
Fundamental transit points between user land and kernel
Every process makes system calls, disclosing information about its activity
Several user space tools send audit information (auditd, go-audit, go-audit-container)
Can provide deep insight when aggregated and drilled down
Ideal candidate for building a machine learning training set, as the volume of data is huge
Audit component
[Diagram] User land: user space application, reporting daemon. Kernel land: syscall interface, Kauditd. Kauditd delivers audit records to the reporting daemon over a netlink socket.
Namespacing concern!
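Once the reporting daemon has received records from the kernel, they need to be turned into structured fields before they can be shipped and aggregated. The sketch below parses one record in the key=value layout that auditd writes to its log. The sample values are made up, and `parse_audit_record` is a hypothetical helper, not part of any tool mentioned here.

```python
import re

# Sample record in the key=value layout written by auditd.
# The values are invented for illustration.
SAMPLE = (
    'type=SYSCALL msg=audit(1364481363.243:24287): arch=c000003e syscall=2 '
    'success=no exit=-13 pid=3538 uid=1000 euid=1000 comm="cat" '
    'exe="/bin/cat" key="sshd_config"'
)

HEADER = re.compile(r"msg=audit\((?P<ts>\d+\.\d+):(?P<serial>\d+)\):")
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit_record(line):
    """Split one audit record into a dict of fields plus timestamp and serial."""
    record = {}
    header = HEADER.search(line)
    if header:
        record["timestamp"] = float(header.group("ts"))
        record["serial"] = int(header.group("serial"))
        # Drop the header so it is not re-parsed as a key=value field.
        line = line[:header.start()] + line[header.end():]
    for key, value in FIELD.findall(line):
        record[key] = value.strip('"')
    return record


if __name__ == "__main__":
    print(parse_audit_record(SAMPLE))
```

All field values are kept as strings, except the timestamp and serial number, so that a later stage can decide how to type them.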
Container anatomy
Container 1: PID, FS, net, UTS, UID, IPC
Container 2: PID, FS, net, UTS, UID, IPC
Container 3: PID, FS, net, UTS, UID, IPC
Kernel (shared by all containers): audit, syslog, kernel key ring
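The per-container namespaces in the diagram above can be inspected from `/proc`. The following sketch lists the namespaces a process belongs to and reports which ones two processes share. It assumes a Linux host with `/proc` mounted and returns empty results elsewhere. The helper names are hypothetical.

```python
import os


def namespaces_of(pid="self"):
    """Return {namespace_name: identifier} for a process, read from /proc/<pid>/ns.

    Returns an empty dict when /proc is unavailable (for example on a
    non-Linux host) or the process cannot be inspected.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return result
    for name in sorted(names):
        try:
            # Each entry is a symlink such as "pid:[4026531836]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return result


def shared_namespaces(pid_a, pid_b):
    """Names of the namespaces that two processes have in common."""
    a, b = namespaces_of(pid_a), namespaces_of(pid_b)
    return sorted(n for n in a if n in b and a[n] == b[n])


if __name__ == "__main__":
    for name, ident in namespaces_of().items():
        print(name, ident)
```

Comparing a containerized process with PID 1 on the host shows which namespaces isolate the container. On the kernels I am aware of, the listing has no entry for audit, which is consistent with the diagram placing audit in the shared kernel.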
go-audit-container
Audit log to gain insights at scale
[Diagram] Kernel: syscall interface, Kauditd, netlink socket. User land: user space application, go-audit-container. Downstream: Elasticsearch, Grafana, Kibana, Pre-processink, PagerDuty, Slack, TensorFlow.
Demo: Privilege escalation
INOTIFY
Inotify component
[Diagram] User land: the user space application calls inotify_add_watch. Kernel land: the inotify component keeps a watch list and an event queue of inotify_event { } entries.
inotify sysctls are not namespaced!
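Because the inotify sysctls are host-wide, every container draws on the same limits. The sketch below reads the limits from their standard Linux location and does a rough budget check. The `max_user_watches` limit applies per real user ID, so containers running under the same UID without user namespaces share one budget. The workload numbers and the fallback value are hypothetical.

```python
import os

INOTIFY_DIR = "/proc/sys/fs/inotify"  # standard Linux location of the inotify sysctls


def read_inotify_limits(base=INOTIFY_DIR):
    """Read the inotify sysctls; returns {} if they are not readable on this host."""
    limits = {}
    for name in ("max_user_watches", "max_user_instances", "max_queued_events"):
        try:
            with open(os.path.join(base, name)) as fh:
                limits[name] = int(fh.read().strip())
        except (OSError, ValueError):
            continue
    return limits


def watches_fit(max_user_watches, containers, watches_per_container):
    """True if all containers running under one UID stay within the shared limit."""
    return containers * watches_per_container <= max_user_watches


if __name__ == "__main__":
    limits = read_inotify_limits()
    print(limits)
    # Hypothetical workload: 20 containers with 500 watched paths each.
    # The fallback of 8192 is an assumed value, used only if the sysctl is unreadable.
    budget = limits.get("max_user_watches", 8192)
    print("fits:", watches_fit(budget, 20, 500))
```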
Why inotify?
Lower CPU consumption on average
Some details are missing from the reports
inotify based stack for FIM
[Diagram] Kernel land: inotify component. User land: osquery registers watches with the inotify component and is notified of events. Downstream: Elasticsearch, Grafana, Kibana, Pre-processink, PagerDuty, Slack, TensorFlow.
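In this stack, osquery registers the watches, so the set of monitored paths lives in the osquery configuration. The sketch below builds such a configuration using the `file_paths` section and a scheduled query against the `file_events` table. The category names, paths and interval are assumptions for illustration, and file events must also be enabled through osquery's flags.

```python
import json


def build_fim_config(watched, interval=300):
    """Build an osquery configuration dict for file integrity monitoring.

    `watched` maps a category name to a list of path patterns; in osquery
    patterns, `%%` matches recursively.
    """
    return {
        "schedule": {
            "file_events": {
                "query": "SELECT target_path, action, time FROM file_events;",
                "interval": interval,
                "removed": False,
            }
        },
        "file_paths": watched,
    }


if __name__ == "__main__":
    # Hypothetical categories and paths.
    config = build_fim_config({"etc": ["/etc/%%"], "ssh": ["/root/.ssh/%%"]})
    print(json.dumps(config, indent=2))
```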
AGENTS
osquery
[Diagram] A fleet manager sends distributed queries to the osquery agents, one agent per OS instance.
osquery stack to get insights at scale
[Diagram] OS, osquery, extension plugin, fleet manager. Downstream: Elasticsearch, Grafana, Kibana, Pre-processink, PagerDuty, Slack, TensorFlow.
Compliance query
East West Threat
osquery packs
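An osquery pack is a JSON document that groups scheduled queries. The pack shown on the original slide is not recoverable from the text, so the sketch below is a stand-in: it builds a small pack for the scheduled, static checks mentioned earlier (OS patch level, kernel modules) against the `os_version` and `kernel_modules` tables. The query names, intervals and descriptions are assumptions.

```python
import json

# Illustrative pack; the query names, intervals and descriptions are assumptions.
COMPLIANCE_PACK = {
    "queries": {
        "os_patch_level": {
            "query": "SELECT name, version, major, minor, patch FROM os_version;",
            "interval": 86400,
            "description": "OS release and patch level",
        },
        "loaded_kernel_modules": {
            "query": "SELECT name, status FROM kernel_modules;",
            "interval": 3600,
            "description": "Loaded kernel modules, for comparison with a known-vulnerable list",
        },
    }
}


def render_pack(pack):
    """Serialize a pack to the JSON text that would be saved as a pack file."""
    return json.dumps(pack, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_pack(COMPLIANCE_PACK))
```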
Lessons from using fixed queries in Kibana, Grafana and custom tools
Robust rules need a lot of queries
Fixed queries need to be constantly updated for new patterns
Any small variation that the rules do not anticipate becomes a false negative
Machine learning helps improve our ability to detect anomalies and broadly classify security posture
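The false-negative problem with fixed rules can be shown with a toy example. The rule, the events and their field names below are all hypothetical. The point is only that an exact-match rule catches one spelling of an activity and misses close variations of it.

```python
def fixed_rule(event):
    """A static rule: alert when /etc/shadow is read with `cat`."""
    return event["exe"] == "/bin/cat" and event["path"] == "/etc/shadow"


# Hypothetical events describing the same intent in three ways.
EVENTS = [
    {"exe": "/bin/cat", "path": "/etc/shadow"},         # matches the rule
    {"exe": "/usr/bin/head", "path": "/etc/shadow"},    # same intent, different tool
    {"exe": "/bin/cat", "path": "/etc/../etc/shadow"},  # same file, different spelling
]


def alerts(events, rule):
    """Return the events that the rule flags."""
    return [e for e in events if rule(e)]


if __name__ == "__main__":
    hits = alerts(EVENTS, fixed_rule)
    print(f"{len(hits)} of {len(EVENTS)} events flagged")
```

Only the first event is flagged. Covering the other two requires more rules, which is how the rule set grows and why it needs constant updates.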
MACHINE LEARNING
High level differences
Unsupervised
• Elasticsearch ML
• Anomaly detection
• Time series data
Supervised
• Pre-processink-osquery
• Explicit labelling and pre-processing
• Explicit data classification on disparate info
Elasticsearch ML: Detecting outliers
Picture credit: https://unsplash.com/@ripato30
Demo of Elasticsearch ML
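To make the idea of unsupervised outlier detection on time series data concrete, here is a deliberately simple stand-in: a z-score threshold over per-minute event counts. This is not the algorithm Elasticsearch ML uses; it only illustrates flagging a bucket that deviates strongly from the rest without any labels. The counts are invented.

```python
from statistics import mean, pstdev


def zscore_outliers(counts, threshold=3.0):
    """Return indices of buckets whose count deviates from the mean by more
    than `threshold` standard deviations."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical syscall events per minute; the spike at index 7 is made up.
PER_MINUTE = [102, 98, 101, 97, 100, 103, 99, 450,
              100, 98, 101, 99, 102, 100, 97, 101]

if __name__ == "__main__":
    print("outlier buckets:", zscore_outliers(PER_MINUTE))
```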
Use case: Supervised learning
Classifying data from different event sources
Broad classification into RED/YELLOW/GREEN
Classifying to get a big picture of the organization's security posture
Supervised learning: Building a training set is key
Elasticsearch → Pre-processink-osquery → Labelling → TensorFlow
Pre-processink-osquery stages
Stage 0
• Query one probe at a time from ES
• Label into RED/YELLOW/GREEN
• Write to stage_0.csv
Stage 1
• Merge into existing stage_1.csv
Manual labeling
• At this stage a human administrator can manually label events that were not labeled, or that were incorrectly labeled, by the automated rules
Stage 2
• Transform into numeric values, normalizing and mean centering
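The stages above can be sketched end to end in a few functions. This is not the code from preprocessink-osquery: the labeling rule, the feature names (`uid`, `euid`, `failed`) and the label encoding are assumptions, and Stage 1 (merging and manual relabeling) is omitted. Stage 2 is shown as mean centering followed by scaling to unit variance, which is one common reading of "normalizing and mean centering".

```python
import csv
import io
from statistics import mean, pstdev

LABELS = {"GREEN": 0, "YELLOW": 1, "RED": 2}

# Hypothetical events as they might come back from one Elasticsearch query.
EVENTS = [
    {"uid": 1000, "euid": 0, "failed": 0},
    {"uid": 1000, "euid": 1000, "failed": 5},
    {"uid": 0, "euid": 0, "failed": 0},
]


def label_event(event):
    """Stage 0 (assumed rule): label one event RED, YELLOW or GREEN."""
    if event["uid"] != 0 and event["euid"] == 0:
        return "RED"
    if event["failed"] > 3:
        return "YELLOW"
    return "GREEN"


def write_stage_0(events, out):
    """Stage 0: write labeled events as CSV to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["uid", "euid", "failed", "label"])
    for e in events:
        writer.writerow([e["uid"], e["euid"], e["failed"], label_event(e)])


def standardize(column):
    """Stage 2: mean-center a numeric column and scale it to unit variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def stage_2(csv_text):
    """Stage 2: turn labeled CSV rows into numeric features and numeric labels."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    names = ["uid", "euid", "failed"]
    columns = [standardize([float(r[n]) for r in rows]) for n in names]
    features = [list(values) for values in zip(*columns)]
    labels = [LABELS[r["label"]] for r in rows]
    return features, labels


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for stage_0.csv
    write_stage_0(EVENTS, buf)
    print(stage_2(buf.getvalue()))
```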
ML choices if you are building your own solution
Activation function
Learning rate
Depth of the network
Batch size vs iterations
Optimizers
Results of our experiment
ReLU activation
Batch Size
Adam optimizer
The lower the learning rate, the better
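For readers unfamiliar with the terms on this slide, the sketch below shows what ReLU and an Adam update look like in plain Python on a toy one-dimensional objective. It is not the TensorFlow model from the experiment and says nothing about its results. The objective, the starting point and the hyperparameter values are assumptions chosen so that the example runs quickly.

```python
import math


def relu(x):
    """ReLU activation: pass positive values through, clamp negatives to zero."""
    return x if x > 0 else 0.0


def adam_minimize(grad, x0, learning_rate=0.01, steps=2000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a one-dimensional function with Adam, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return x


def toy_grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2 * (x - 3)


if __name__ == "__main__":
    print("relu(-2.0) =", relu(-2.0))
    print("minimum found near x =", adam_minimize(toy_grad, 0.0))
```

The learning rate scales the size of every update, which is why it is one of the first values to tune. A smaller rate takes more steps to converge but overshoots less.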
Lessons learnt
Identify event sources keeping in mind the nature of the events needed, their frequency, and the memory and CPU footprint of the agents on the application nodes.
Be aware of how and what is namespaced in containers when configuring event sources in containers.
High level classification of ML
Machine learning helps in getting intelligent insights, but needs training and fine tuning.
Apply
Start with a simple file integrity monitoring (FIM) implementation using inotify logs. Observe the load of FIM events on the infrastructure.
Grow the solution to more detailed monitoring. Keep in mind the load on the infrastructure when thousands of nodes start sending events.
Try applying ML based on unsupervised learning, and then get your hands dirty with training.
Think of possibilities outside of what is discussed here today.
AuditNG suite
https://github.com/auditNG/preprocessink-osquery
https://github.com/auditNG/go-audit-container
Questions?
You can also reach out later:
Twitter handle: @ravi_Honnavalli
LinkedIn: https://www.linkedin.com/in/ravi-honnavalli-0535163/