34
Towards Scalable Cluster Auditing through Grammatical Inference over Provenance Graphs Wajih Ul Hassan , Mark Lemay, Nuraini Aguse, Adam Bates, Thomas Moyer NDSS Symposium 2018 Feb 20, 2018

Towards Scalable Cluster Auditing through …wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2018/03/... · ProFTPD MySQL 485 630 130 0.11 0.12 0.17 LOG SIZE IN MB Winnower

Embed Size (px)

Citation preview

Towards Scalable Cluster Auditing through Grammatical Inference

over Provenance Graphs WajihUlHassan,MarkLemay,NurainiAguse,

AdamBates,ThomasMoyer

NDSS Symposium 2018 Feb 20, 2018

Notable Data Breach in 2017

2

Notable Data Breach in 2017

3

Equifax Data Breach Timeline 2017

apr may jun jul aug sep oct

Breached Detected

Hackers in Equifax Servers

Patched

Breached Announced

Notable Data Breach in 2017

4

Equifax Data Breach Timeline 2017

apr may jun jul aug sep oct

Breached Detected

Hackers in Equifax Servers

Patched

Breached Announced

3 Months of crucial attack audit logs

Notable Data Breach in 2017

5

Equifax Data Breach Timeline 2017

apr may jun jul aug sep oct

Breached Detected

Hackers in Equifax Servers

Patched

Breached Announced

3 Months of crucial attack audit logs Are current auditing systems scalable?

Data Provenance aka Audit log � Lineage of system activities � Represented as Directed Acyclic Graph (DAG) � Used for forensic analysis

6

1.   Bash,SpawnsNGINX2.  NGINX,Receivesfromabc.com3.  NGINX,ReadsFileindex.html4.  ….......

index.html

NGINX

abc.com

Auditlog

Bash

ProvenanceGraph

Bash:exec(“./NGINX”);NGINX:recv(…,“abc.com”);fread(“index.html”);

CodeExecution

Data Provenance in a Cluster

7

WorkerNodes

MasterNode

Centralized auditing not practical due to two

limitations

Limitation#1: Graph Complexity � NGINX and MySQL running for 5 mins on a single machine

8

Finding needle in a haystack problem

Limitation#2: Storage overhead � Leads to network overhead as logs are transferred to master

node

9

[VALUE]GB

[VALUE]GB

[VALUE]GB

[VALUE]GB

0

2

4

6

8

10

12

14

Day1 Day2 Day3 Day4 Day5

LogSize(G

B)

AuditLogSizeGrowthforaSingleNGINXserver

Uncompressed Compressed

2.54GB/Dayonasinglemachine

Withcompression>1GB/Day

Winnower

� Cluster applications are replicated in accordance with microservice architecture principle

� Replicated apps produce highly homogeneous provenance graphs �  core execution behaviour is similar

10

Key Idea: Remove redundancy from provenance graphs across cluster before sending to master node

11

Before

/up/*

NGINX

*

Bash

mysql

*

mysqld

/db/*

Master Node View with Winnower

After

Otherlibraryfilevertices

Winnower � Build consensus model across cluster using graph grammars �  Like string grammar, graph grammars provide rule-based

mechanisms �  For generating, manipulating and analyzing graphs �  Induction – produce grammar from a given graph �  Parsing – membership test of a given graph is in a grammar

12

a a

t t t

b b b

a

S ≔ A T

A ≔ a B

T t

B≔

b

S

S ≔ e

a e

t

b

Graph GraphGrammar

Architecture

13

AuditModule

Prov.Graph

Worker Nodes

ModelAggregator

Master Node

Worker Node

Fetchgraphateachepoch

——

——

——

—— —

——

——

———

Fine-grained Graph

Abstracted Graph

Graph

Abstraction

ModelGraph

Graph

Induction

WinnowerAgent

Modelgraphs/grammarsfromcluster

Architecture

14

AuditModule

ModelAggregator

——

——

——

—— —

——

——

———

Fine-grained Graph

Abstracted Graph

Graph

Abstraction

ModelGraph

Graph

Induction

WinnowerAgent

AggregatedModelOnlysendModelupdates

Worker Nodes

Master Node

Worker Node

Prov.Graph

Fetchgraphatnextepoch

Architecture

15

AuditModule

ModelAggregator

WinnowerAgent

QuerypartofProvenancegraphHigh-fidelity

Provenancegraph

Worker Nodes

Master Node

Worker Node

Provenance Graph Abstraction � Graph Induction process builds a model/grammar that concisely

describe the whole graph � However, instance-specific fields frustrate any attempts to build a

generic application behaviour model

16

NoGeneralmodelasinstancespecificinformationsuchPIDisdifferentamonggraphs

ftppid:2788

ftp workerpid:2797

192.168.0.2

ftp listenerpid:2789

192.168.0.1

ftp listenerpid:2791

ftp workerpid:2795

192.168.0.2192.168.0.1 /up/File1Inode:3

/up/File2Inode:5

ftppid:2780

Node 1 Node 2

GraphInduction

Provenance Graph Abstraction � Provenance graph vertices have well defined fields

�  E.g. pid:1234 , FilePath:/etc/ld.so

� Defined rules manually that remove or generalize these fields

ftppid:2788

ftp workerpid:2797

192.168.0.2

ftp listenerpid:2789

192.168.0.1

ftp listenerpid:2791

ftp workerpid:2795

192.168.0.2192.168.0.1 /up/File1Inode:3

/up/File2Inode:5

ftppid:2780

Node 1 Node 2ftp

ftp worker

192.168.0.0/24

ftp listener

192.168.0.0/24

ftp listener

ftp worker

192.168.0.0/24192.168.0.0/24/up/* /up/*

ftp

Node 2Node 1

GraphAbstraction

Provenance Graph Induction � Deterministic Finite Automata (DFA) Learning to generate grammar

�  Encodes the causality in generated models �  In DFA learning the present state of a vertex includes the path taken

to reach the vertex (provenance ancestry) �  Winnower extends it to remember descendants (provenance progeny)

� State of each vertex consist of three items: 1.  Label 2.  Provenance ancestry 3.  Provenance progeny

18

File1.txt

gzip

Bash

File1.txt

Progenyofgzipvertex

Ancestryofgzipvertex

Provenance Graph Induction � Finds repetitive patterns using standard implicit and explicit

state merging algorithm �  Implicit state merging combines two subgraphs if states of each

vertex are same in both subgraphs

19

ftp

ftp worker

192.168.0.0/24

ftp listener

192.168.0.0/24

ftp listener

ftp worker

192.168.0.0/24192.168.0.0/24/up/* /up/*

ftp

Node 2Node 1 ftp192.168.0.0/24

ftp listener

ftp worker

192.168.0.0/24 /up/*

Confidence levelLegend 2

GraphInduction

java

javamapper

data

javareducer

java

java mapper

data

javareducer

Node 1

Explicit State Merging

20Mergetwonodes

� At high-level explicit state merging �  Picks two nodes and make their states same �  Check if subgraph can be merged implicitly

� Consider a chained map reduce job

java

java mapper

data

javareducer

java

java mapper

data

javareducer

Node 1

:=SS:=AS:=T|VT:=A->X|A->YX:=B->WY:=C->WW:=D|D->SA:=dataB:=javamapperC:=javareducerD:=java

GraphGrammar

A

B

D

C

Provenance Graph Induction � Consider a graph with a malicious activity � Malicious behavior is visible in the final model

21

GraphInduction

ftp

ftp worker

ftp listener

ftp listener

ftp worker/up/*

/up/*

bash

Malicious filewget

x.x.x.x

ftp

Node 1

ftp ftp listener

ftp worker/up/*

Node 3

Node 2ftp

ftp listener

ftp worker

/up/*

bash

Malicious file

wget

x.x.x.x

Confidence levelLegend 1 3

Master Node

Evaluation Setup

� Setup �  1 VM as master node, 4 VMs as worker nodes �  SPADE and Docker Swarm �  Epoch size 50 sec

� Metrics �  Storage Overhead �  Computational Cost �  Effectiveness

22

Storage Overhead on Master Node

23

0 100 200 300 400 500 600 700

HTTPD

ProFTPD

MySQL

485

630

130

0.11

0.12

0.17

LOGSIZEINMB

Winnower Raw

98.7%decrease

Storage Reduction on Master Node

24

�  Apache Webserver with moderate workload

�  Note the log scale on y-axis

1

10

100

1000

10000

100000

50 100 150 200 250 300 350 400 450 500 550 600

LOG

SIZ

E (M

B)

TIME (SEC)

Raw(Uncompressed) Raw(Compressed) Winnower

7zcompressionisnotsuitable:•  Noglobalviewofcluster•  Oblivioustopreviousbatch

Evaluation: Computation Cost

25

�  Average time spent in induction and membership test at each epoch

0

5

10

15

20

25

30

35

50 100 150 200 250 300 350 400 450 500 550 600

AverageTime(sec)

ElapsedTime(sec)

Apache MySQL ProFTPD

HeterogeneousWorkload->

Updatesmodel

GenerateModelforfirsttime

Membershipcheckinexistingmodel

Case Study: Ransomware Attack

26

•  Attacker exploits Redis database server vulnerability version < 3.2

•  Vulnerability allows attacker to change SSH key and log in as Root

•  Attacker deletes the database and left a note using vim to send bitcoins get database back

Traditional Graph of Attack

27

� 10 instances of redis running in the cluster � ~80k vertices and ~83K edges with 161 MB size � Part of provenance graph shown below

28

Winnower Generated Provenance graph

� 54 vertices and 68 edges with 0.7 MB size � Part of graph is shown below:

Worker

* /uploads/*

redis-server

x.x.x.x

Attack Provenance

Nginx

*

bash

/root/.ssh/authorized_keys

*

172.17.0.0/24

/var/lib/redis/dump.rdb

/proc/12743/stat

/var/log/redis/redis.log

x.x.x.x sshd bash

/root/ransomware.notevim

/dev/tty

Other library files

Confidence levelLegend 1 10

29

Winnower Generated Provenance graph

� What happens if we attack all the nodes in the cluster

Worker

* /uploads/*

redis-server

x.x.x.x

Attack Provenance

Nginx

*

bash

/root/.ssh/authorized_keys

*

172.17.0.0/24

/var/lib/redis/dump.rdb

/proc/12743/stat

/var/log/redis/redis.log

x.x.x.x sshd bash

/root/ransomware.notevim

/dev/tty

Other library files

Confidence levelLegend 10

� Winnower is the first practical system for provenance-based auditing of clusters at scale with low overhead

� Winnower significantly improves attack identification and investigation in a large cluster

30

Conclusion

Thank you for your time. [email protected]

31

Questions

32

Backup Slides

Threat model

� Assumptions �  Winnower only tracks user-space attacks i.e. trusts the OS �  Log integrity is maintained

� Attack surface �  Distributed application replicated on Worker nodes

� Attacker’ motive �  Gain control over worker node by exploiting a software vulnerability in

the distributed application

33

Online Learning

34

Abstractedgraph

DFALearning

GraphGrammar

AggregatedModel

ModelAggregator