StreamWorks – A System for Real-Time Graph Pattern Matching on Network Traffic GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified


StreamWorks – A System for Real-Time Graph Pattern Matching on Network Traffic

GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL

January 21, 2015 1

Pacific Northwest National Laboratory

Unclassified


Emerging Graph Patterns

Goal: Detect and identify precursor events and patterns as they emerge in complex networks such that events or threats may be mitigated or acted upon before they are fully realized

  • Capture evolution of critical graph patterns
  • Devise optimal search strategy to identify emerging patterns
  • Consider cases where target subgraph patterns may or may not be known
  • Subgraph pattern matching is a well-studied NP-hard problem
    • Some work on scalable algorithms
    • Limited work on subgraph matching in dynamic networks
  • Application areas:
    • Computer network intrusions and threats
    • Social media and network analysis
    • Financial and stock market analysis
    • Distributed sensor networks

Emerging Graph Pattern Algorithm in Action

[Figure, slides 3 to 12: an animated data graph shown beside its subgraph join tree. The data graph starts with the nodes Alder and Maple and, over successive frames, gains the labeled nodes Trout (Web Server), Goliath (DNS Server), Pine, Oak, Cedar, and Birch, in that order. The subgraph join tree is built from DNS Server, Web Server, and Host vertices, and its nodes are labeled 33%, 33%, 33%, 67%, and 100%.]

Detecting Emerging Cyber Attacks

  • Developing an emerging subgraph pattern algorithm in a package we call StreamWorks to detect cyber intrusions and attacks in computer network traffic
  • Constructing a set of cyber attack graph patterns related to network scans, reflector attacks, flood attacks, viruses, worms, etc. in collaboration with PNNL cybersecurity analysts
  • Utilizing anonymized internet trace data curated by CAIDA (The Cooperative Association for Internet Data Analysis) at SDSC/UCSD and simulated intrusion detection datasets from the University of New Brunswick’s Information Security Centre of Excellence


Witty Worm

  • Internet worm that began to spread on March 19, 2004
  • Targeted buffer overflow vulnerability in Internet Security Systems (ISS) products
  • Payload contained phrase “(^.^) insert witty message here (^.^)”
  • Attacked port 4000 with packets of sizes between 796 and 1307

[Figure: propagation tree of Host nodes; each edge carries the constraint 796 <= Packet Len <= 1307, Port 4000.]
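The edge constraint in the Witty worm pattern (port 4000, packet length between 796 and 1307) can be read as a predicate over individual packets. The following is a minimal sketch of that reading, not the StreamWorks implementation. The `Packet` type, its field names, and the `fan_out` helper are hypothetical, and the slide does not say whether port 4000 is the source or destination port, so a single `port` field is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str      # sending host (hypothetical field name)
    dst: str      # receiving host (hypothetical field name)
    port: int     # port named on the slide; source vs. destination is not specified there
    length: int   # packet length; units are not stated on the slide


def is_witty_candidate(p: Packet) -> bool:
    """Edge predicate from the slide: port 4000 and 796 <= length <= 1307."""
    return p.port == 4000 and 796 <= p.length <= 1307


def fan_out(packets):
    """Group candidate edges by sender: sender -> set of distinct receivers."""
    targets = {}
    for p in packets:
        if is_witty_candidate(p):
            targets.setdefault(p.src, set()).add(p.dst)
    return targets


packets = [
    Packet("10.0.0.1", "10.0.0.2", 4000, 800),
    Packet("10.0.0.1", "10.0.0.3", 4000, 1307),
    Packet("10.0.0.1", "10.0.0.4", 80, 900),     # wrong port, ignored
    Packet("10.0.0.5", "10.0.0.6", 4000, 1400),  # too long, ignored
]
print(fan_out(packets))
```

A sender whose fan-out keeps growing would correspond to an inner node of the propagation tree in the figure.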


Distributed Denial-of-Service Smurf Attack

  • Attacker sends packets to a broadcast IP address with the victim’s address as the spoofed source address
  • Packets are delivered to intermediate hosts
  • Intermediate hosts reply to the return address, which is the victim’s

[Figure: attack graph with Hacker, Router, three Host nodes, and Victim. ICMP Echo Request edges lead from the Hacker through the Router to the Hosts; ICMP Echo Reply edges lead from the Hosts to the Victim.]

Distributed Denial-of-Service DNS Amplification Attack

  • Agents or zombies generate a large number of DNS requests with a spoofed source address
  • DNS servers send 3 different types of responses to the victim
  • DNS response packets may be significantly larger than DNS request packets

[Figure: three Zombie nodes send DNS Query edges to three DNS Server nodes; each DNS Server sends an edge labeled "DNS Query Response | ICMP Dest Unreachable | Frag IP Address" to the Victim.]
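The point that responses may be much larger than requests can be made concrete as an amplification factor: total response bytes reaching the victim divided by total query bytes sent. The sketch below is illustrative only; the function name and the packet sizes are hypothetical and are not taken from the slides.

```python
def amplification_factor(query_sizes, response_sizes):
    """Ratio of total response bytes reaching the victim to total query bytes sent."""
    sent = sum(query_sizes)
    if sent == 0:
        raise ValueError("no query bytes observed")
    return sum(response_sizes) / sent


# Hypothetical sizes: three small spoofed queries, three large responses.
queries = [60, 60, 60]
responses = [3000, 3000, 3000]

factor = amplification_factor(queries, responses)
print(f"{factor:.1f}x")  # 50.0x
```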

Subgraph Join Tree for DDoS Smurf Attack

[Figure: the Smurf cyberattack pattern (Hacker, Router, three Hosts, Victim; ICMP Echo Request and ICMP Echo Reply edges) beside its breadth-first subgraph join tree (SJT). The tree nodes are labeled 14%, 43%, 43%, 86%, and 100%. Subpatterns in the tree involve Router, Broadcast Address, Hosts, and Victim, and carry the event labels E1 to E4 with temporal constraints Time < E1, Time < E2, Time < E3, and Time < E4.]

DDoS Smurf Attack Query

[Figure: three snapshots of the breadth-first SJT for the Smurf query, labeled 48:06, 51:39, and 53:11.]
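The percentages on the Smurf join tree (14%, 43%, 43%, 86%, 100%) are consistent with fractions of a seven-unit query: 1/7, 3/7, 3/7, 6/7, and 7/7. The sketch below assumes the units are query edges and that the query has the seven edges listed in `QUERY_EDGES`. Both are assumptions made for illustration; the slides do not state how the percentages are computed.

```python
# Assumed 7-edge Smurf query: (source, target, label). Hypothetical reading of the figure.
QUERY_EDGES = [
    ("Hacker", "Router", "ICMP Echo Request"),
    ("Router", "Host1", "ICMP Echo Request"),
    ("Router", "Host2", "ICMP Echo Request"),
    ("Router", "Host3", "ICMP Echo Request"),
    ("Host1", "Victim", "ICMP Echo Reply"),
    ("Host2", "Victim", "ICMP Echo Reply"),
    ("Host3", "Victim", "ICMP Echo Reply"),
]


def match_percent(matched_edges, query_edges=QUERY_EDGES):
    """Percentage of query edges covered by a (partial) match, rounded to an integer."""
    covered = set(matched_edges) & set(query_edges)
    return round(100 * len(covered) / len(query_edges))


hacker_leaf = QUERY_EDGES[:1]     # Hacker -> Router
request_leaf = QUERY_EDGES[1:4]   # Router -> Hosts
reply_leaf = QUERY_EDGES[4:]      # Hosts -> Victim

for part in (hacker_leaf, request_leaf, reply_leaf,
             request_leaf + reply_leaf, QUERY_EDGES):
    print(match_percent(part))
```

Under these assumptions, a partial match climbing from a leaf toward the root reports how much of the attack pattern has emerged so far.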

StreamWorks Components

  • Scalable Subgraph Pattern Matching Algorithm
  • Semantic Graph Library (SGlib)
  • Network Analysis Visualizations
  • Stream Processing (Storm)
  • Run-Time System
  • Subgraph Join Tree Generation Algorithms
  • Statistics Collection Tools
  • Graph Pattern Library
  • Graph Pattern Definition and Join Tree Modeling

Scalable Subgraph Matching Algorithm

Distributed Implementation of Dynamic Graph Search

  • Update the distributed graph with new edges in parallel
  • Search the updated graph in parallel for unique sub-queries
    • Colors represent unique sub-queries in the SJ-Tree
    • Each node in the SJ-Tree maintains a match collection
    • Each node receives the new set of matches
  • Perform parallel hash join of new and old matches in the SJ-Tree at each level

[Figure: SJ-Tree with nodes numbered 0 to 4, colored by sub-query.]
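The hash join of new and old matches at an SJ-Tree node can be sketched as follows. This is a sequential, single-process illustration, not the distributed StreamWorks code. A match is assumed to be a dict from query vertex to data vertex, `shared` names the query vertices common to the two sibling sub-queries, and the injectivity check is an added assumption. The host names come from the earlier data graph slides; the edges between them are hypothetical.

```python
from collections import defaultdict


def join_key(match, shared):
    """Bindings of the shared query vertices, used as the hash key."""
    return tuple(match[v] for v in shared)


def hash_join(new_matches, sibling_matches, shared):
    """Join new matches with the sibling's stored matches on shared query vertices."""
    # Build phase: hash the sibling's existing match collection.
    table = defaultdict(list)
    for m in sibling_matches:
        table[join_key(m, shared)].append(m)

    # Probe phase: look up each new match and merge compatible pairs.
    joined = []
    for n in new_matches:
        for m in table.get(join_key(n, shared), []):
            merged = {**m, **n}
            # Assumption: distinct query vertices must map to distinct data vertices.
            if len(set(merged.values())) == len(merged):
                joined.append(merged)
    return joined


# Old matches for a hypothetical "host -> DNS server" sub-query.
old = [{"h": "Alder", "dns": "Goliath"}]
# New matches for a hypothetical "host -> web server" sub-query.
new = [{"h": "Alder", "web": "Trout"}, {"h": "Maple", "web": "Trout"}]

print(hash_join(new, old, shared=("h",)))
```

The joined matches would then be passed to the parent node, where the same step repeats one level up.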

Scalable Subgraph Matching Algorithm

Scalability Results for Distributed Implementation of Dynamic Graph Search

  • Dataset: CAIDA Network Traffic (2.49M nodes, 19.55M edges)
  • Platform: PNNL institutional computing (PIC) cluster: 692 nodes, AMD Interlagos processors, dual socket, 16 cores per socket, 64 GB memory per node, QDR InfiniBand

Scalable Subgraph Matching Algorithm

Distributed Dynamic Graph Data Structure

Scalable Subgraph Matching Algorithm

Concurrent Graph Queries via Multiple Subgraph Join Trees

  • Conduct parallel searching across all subgraph patterns of all subgraph join trees
  • Leverage locality in terms of operations: Identify common subgraph patterns across subgraph join trees and search once for multiple queries
  • Leverage locality in terms of data: Identify graph regions that apply to multiple subgraph searches and track and manage once for multiple queries
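One way to "search once for multiple queries" is to index the distinct leaf patterns of all join trees and record which queries use each one. The sketch below illustrates that idea under simplifying assumptions: a leaf pattern is a single typed edge, and the `ping_sweep` query is a hypothetical second query invented to share a leaf with the Smurf query.

```python
from collections import defaultdict

# Hypothetical leaf patterns, written as (source type, target type, edge label).
JOIN_TREES = {
    "smurf": [
        ("Hacker", "Router", "ICMP Echo Request"),
        ("Router", "Host", "ICMP Echo Request"),
        ("Host", "Victim", "ICMP Echo Reply"),
    ],
    "ping_sweep": [  # invented query that shares one leaf with "smurf"
        ("Router", "Host", "ICMP Echo Request"),
    ],
}


def build_subquery_index(join_trees):
    """Map each distinct leaf pattern to the set of queries whose join trees contain it."""
    index = defaultdict(set)
    for query_id, leaves in join_trees.items():
        for leaf in leaves:
            index[leaf].add(query_id)
    return index


def dispatch(edge, index):
    """Match a new typed edge once, then return every query that should receive it."""
    return sorted(index.get(edge, set()))


index = build_subquery_index(JOIN_TREES)
print(len(index))  # 3 distinct leaf patterns across 4 leaves
print(dispatch(("Router", "Host", "ICMP Echo Request"), index))
```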

Stream Processing

  • Developed various Apache Storm Bolts to filter/aggregate Netflow data
  • Filtered and aggregated data is passed to the emerging subgraph algorithm using Apache ActiveMQ
  • Tuning of primitive subgraph matching between Storm and the emerging subgraph algorithm is ongoing

[Figure: Records Per Second (Thousands), axis 0 to 40, versus Number of Threads (1, 2, 4, 8, 16); series: With Filtering/Aggregation, Without Filtering/Aggregation. Caption: 1M Netflow records through Storm, 1 record per message through ActiveMQ (on PIC).]

[Figure: Graph Updates Per Second (Millions), axis 0 to 2, versus Nodes in Cluster (2, 4, 8, 16, 32, 64); series: BatchSize (10M), BatchSize (100M). Caption: 10M and 100M power law graph updates through emerging subgraph algorithm (on PIC).]
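The filter and aggregate steps applied to Netflow records can be sketched in plain Python. This does not use the Apache Storm or ActiveMQ APIs and is not the authors' bolt code. The record fields (`src`, `dst`, `proto`, `bytes`), the protocol filter, and the aggregation key are assumptions made for illustration.

```python
from collections import Counter


def filter_records(records, protocols):
    """Filter step: keep only records whose protocol is of interest."""
    return (r for r in records if r["proto"] in protocols)


def aggregate(records):
    """Aggregate step: collapse records into one summary per (src, dst, proto)."""
    flows = Counter()
    volume = Counter()
    for r in records:
        key = (r["src"], r["dst"], r["proto"])
        flows[key] += 1
        volume[key] += r["bytes"]
    return {k: {"flows": flows[k], "bytes": volume[k]} for k in flows}


# Hypothetical Netflow-like records.
records = [
    {"src": "10.0.0.1", "dst": "10.0.0.9", "proto": "ICMP", "bytes": 64},
    {"src": "10.0.0.1", "dst": "10.0.0.9", "proto": "ICMP", "bytes": 64},
    {"src": "10.0.0.2", "dst": "10.0.0.9", "proto": "TCP", "bytes": 1500},
    {"src": "10.0.0.3", "dst": "10.0.0.9", "proto": "UDP", "bytes": 512},
]

summary = aggregate(filter_records(records, {"ICMP", "UDP"}))
print(summary)
```

Each summary entry would become one graph update, so aggregation reduces the number of edges the subgraph algorithm has to process.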

Cyberattack Graph Patterns

[Figure: attack pattern with edges labeled Port Scan, Syn, Ack, Exploit, and Exploit Spread.]

Are Netflow data and patterns enough to effectively detect emerging cyberattacks?

Questions annotated on the pattern:

  • New User? Failed Logins? Admin Privileges? Escalated Privileges?
  • Known Host? Type of Server?
  • Machine Type Known to Scan?
  • Newly Added Application? Known Application Creator? Known Process? Known Exploit?

Multi-Source Cyberattack Graph Patterns

  • Look to fuse streaming Netflow data with other streaming data sources such as event logs, host scan logs, firewall logs, and anti-malware reports
  • Enrich the semantic graph with more attributes to collect information from the fused data stream
  • Derive additional candidate graph patterns and their associated subgraph join trees with fuller attributes to better elaborate specific cyberattacks

Automatic Subgraph Join Tree Generation

  • With known target graph patterns
    • Breadth-first traversal
    • Depth-first traversal
    • Frequency-based
  • With unknown target graph patterns
    • Frequency-based

[Figure: the DDoS Smurf attack pattern (Hacker, Router, three Hosts, Victim; ICMP Echo Request and ICMP Echo Reply edges).]
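A breadth-first decomposition of a known pattern can be sketched by grouping its edges by breadth-first level from a chosen start vertex, with each level becoming one leaf of the join tree. This is one plausible reading of "breadth-first traversal" here, not the authors' algorithm. The 7-edge Smurf pattern, the start vertex, and the function name are assumptions.

```python
def bfs_edge_levels(edges, start):
    """Group directed pattern edges by breadth-first level from `start`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)

    seen = {start}
    used = set()
    frontier = [start]
    levels = []
    while frontier:
        level, next_frontier = [], []
        for u in frontier:
            for v in adj.get(u, []):
                if (u, v) not in used:
                    used.add((u, v))
                    level.append((u, v))
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        if level:
            levels.append(level)
        frontier = next_frontier
    return levels


# Assumed Smurf pattern edges (labels omitted for brevity).
SMURF = [
    ("Hacker", "Router"),
    ("Router", "Host1"), ("Router", "Host2"), ("Router", "Host3"),
    ("Host1", "Victim"), ("Host2", "Victim"), ("Host3", "Victim"),
]

for i, level in enumerate(bfs_edge_levels(SMURF, "Hacker"), start=1):
    print(i, level)
```

With this input the levels hold 1, 3, and 3 edges, which lines up with leaves covering 1/7, 3/7, and 3/7 of the assumed pattern.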

Automatic Frequency-Based Join Tree Generation with Known Graph Pattern

[Figure, slides 31 to 43: animation of join tree construction for a query over DNS Server A, Web Server B, Host 1, Host 2, and Host 3. Candidate subpatterns of the query are annotated with frequency counts (532, 173, 210, 182, 267, 89, 96, 756, 1139, 1637, 1284, 1375, 1369, 27, and 14) and are moved into the join tree over successive frames. For example, the count 14 appears next to DNS Server A with Host 3 and Host 2, and the count 27 appears next to Web Server B with Host 1 and Host 3.]
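A frequency-based choice of join tree leaves can be sketched as a greedy cover: visit candidate subpatterns from least to most frequent and keep each one that covers a query edge not yet covered. This is an assumed strategy for illustration, since the slides show the animation but not the selection rule. The query edges and the edge sets attached to each count are hypothetical; only the counts 14, 27, 1369, and 1375 are borrowed from the figure.

```python
def frequency_based_leaves(query_edges, candidates):
    """Greedily pick the least frequent candidates until every query edge is covered."""
    uncovered = set(query_edges)
    chosen = []
    for pattern, count in sorted(candidates.items(), key=lambda kv: kv[1]):
        if uncovered & pattern:
            chosen.append((sorted(pattern), count))
            uncovered -= pattern
        if not uncovered:
            break
    if uncovered:
        raise ValueError(f"candidates do not cover: {sorted(uncovered)}")
    return chosen


# Hypothetical query edges over DNS Server A (A), Web Server B (B), and Hosts 1 to 3.
QUERY = [("A", "H1"), ("A", "H2"), ("A", "H3"), ("B", "H1"), ("B", "H3")]

# Hypothetical candidate subpatterns with observed frequencies.
CANDIDATES = {
    frozenset({("A", "H2"), ("A", "H3")}): 14,
    frozenset({("B", "H1"), ("B", "H3")}): 27,
    frozenset({("A", "H3")}): 1369,
    frozenset({("A", "H1")}): 1375,
}

for pattern, count in frequency_based_leaves(QUERY, CANDIDATES):
    print(count, pattern)
```

Preferring rare subpatterns keeps the match collections at the leaves small, which is the usual motivation for selectivity-based ordering.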

Page 44: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

January 21, 2015 44

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

44

Emergent Infrequent Subgraph Patterns

Page 45: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

January 21, 2015 45

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

45

Web Server J

DB ZHost 2

Emergent Infrequent Subgraph Patterns

Page 46: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

January 21, 2015 46

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

46

Web Server J

DB ZHost 2

DNS Server A

FS YDB X

Emergent Infrequent Subgraph Patterns

Page 47: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

January 21, 2015 47

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

47

DNS Server G

Host 4DB O

Web Server J

DB ZHost 2

DNS Server A

FS YDB X

Emergent Infrequent Subgraph Patterns

Page 48: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

January 21, 2015 48

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

48

DNS Server G

Host 4DB O

DNS Server I

Host 3FS N

Web Server J

DB ZHost 2

DNS Server A

FS YDB X

Emergent Infrequent Subgraph Patterns

Page 49: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

Page 50: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: DNS Server A, FS Y, DB X; Web Server B, Host 1, FS Y]

Page 51: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: Web Server B, Host 1, FS Y; DNS Server A, FS Y, DB X]

Page 52: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: Web Server B, Host 1, FS Y; DNS Server A, FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B]

Page 53: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: Web Server B, Host 1, FS Y; DNS Server A, FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; FS Y, DB X]

Page 54: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: Web Server B, Host 1, FS Y; DNS Server A, FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B]

Page 55: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: Web Server B, Host 1, FS Y; DNS Server A, FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; DNS Server C, Host 5, FS P]

Page 56: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: Web Server B, Host 1, FS Y; DNS Server A, FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; DNS Server C, Host 5, FS P; DNS Server D, Web Server E, DB Q]

Page 57: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: Web Server B, Host 1, FS Y; DNS Server A, FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; DNS Server C, Host 5, FS P; DNS Server D, Web Server E, DB Q; DNS Server A, Host 1, FS Y]

Page 58: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: Web Server B, Host 1, FS Y; DNS Server A, FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; DNS Server C, Host 5, FS P; DNS Server D, Web Server E, DB Q; DNS Server A, Host 1, FS Y; DNS Server A, Host 1, FS Y]

Page 59: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: Web Server B, Host 1, FS Y; DNS Server A, FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; DNS Server C, Host 5, FS P; DNS Server D, Web Server E, DB Q; DNS Server A, Host 1, FS Y; DNS Server A, Host 1, FS Y; DNS Server A, Host 1, FS Y]

Page 60: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Automatic Frequency-Based Join Tree Generation with Unknown Graph Pattern

[Figure: network graph with node labels DNS Server G, Host 4, DB O, DNS Server I, Host 3, FS N, Web Server J, DB Z, Host 2, DNS Server A, FS Y, DB X, Web Server B, Host 1, FS Y]

Emergent Infrequent Subgraph Patterns

[Figure: subgraph patterns, labels in extraction order: Web Server B, Host 1, FS Y; DNS Server A, FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; FS Y, DB X; DNS Server A, Host 1, FS Y, DB X, Web Server B; DNS Server C, Host 5, FS P; DNS Server D, Web Server E, DB Q; DNS Server A, Host 1, FS Y; DNS Server A, Host 1, FS Y; DNS Server A, Host 1, FS Y; DNS Server A, Host 1, FS Y, DB X, Web Server B]
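
The slides above show the frequency-based idea only as an animation and do not spell out how frequencies drive join tree construction. One plausible reading, offered purely as a sketch, is that rarely seen typed edges are the most selective and are therefore good candidates for the leaves of a join tree. The edge tuple layout, the type names, and the function `rank_edge_types` below are all hypothetical and are not taken from StreamWorks.

```python
from collections import Counter


def rank_edge_types(edges):
    """Return (src_type, dst_type) pairs ordered from rarest to most common.

    Each edge is a hypothetical tuple: (src_id, src_type, dst_id, dst_type).
    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter((s_type, d_type) for _, s_type, _, d_type in edges)
    return sorted(counts, key=lambda pair: (counts[pair], pair))


# Illustrative stream; the node names echo the figure labels but the
# edges themselves are made up for this example.
stream = [
    ("host1", "Host", "webB", "WebServer"),
    ("host2", "Host", "webJ", "WebServer"),
    ("host3", "Host", "dnsA", "DNSServer"),
    ("host4", "Host", "webJ", "WebServer"),
    ("webB", "WebServer", "dbX", "DB"),
]

order = rank_edge_types(stream)
for pair in order:
    print(pair)
```

Under this assumption, the edge types at the front of `order` would be matched first, so that fewer partial matches need to be stored for the more common edge types.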

Page 61: StreamWorks – A System for Real-Time Graph Pattern ... · GEORGE CHIN, SUTANAY CHOUDHURY AND KHUSHBU AGARWAL January 21, 2015 1 Pacific Northwest National Laboratory Unclassified

Summary

Developing a scalable emerging subgraph pattern algorithm that can detect and identify precursor events and patterns as they emerge in complex networks

Utilizing an efficient and novel subgraph join tree approach that tracks and monitors partial matches of a query graph against a large-scale dynamic network

Applying the emerging subgraph pattern algorithm to the detection of computer network threats and intrusions

Packaging emerging subgraph pattern capabilities into an interactive network analysis framework called StreamWorks

Extending StreamWorks to support emerging subgraph patterns across multiple dynamic data sources

Developing an approach for dynamic subgraph join tree generation to support the detection of zero-day exploits
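
To make the idea of tracking partial matches in a join tree concrete, the following is a minimal sketch under stated assumptions. It handles only a hypothetical two-edge query (Host -> WebServer -> DB) with two leaves and one join at the shared web server vertex. The class `TwoLeafJoinTree`, its method names, and the type strings are illustrative and are not the StreamWorks implementation.

```python
from collections import defaultdict

# Hypothetical query graph: Host -> WebServer -> DB, split into two leaves.
LEFT = ("Host", "WebServer")
RIGHT = ("WebServer", "DB")


class TwoLeafJoinTree:
    """Tracks partial matches of a two-edge path query over an edge stream."""

    def __init__(self):
        # Each leaf stores its partial matches indexed by the join vertex
        # (the web server shared by both query edges).
        self.left = defaultdict(list)   # web server id -> host ids
        self.right = defaultdict(list)  # web server id -> db ids

    def add_edge(self, src, src_type, dst, dst_type):
        """Insert one streamed edge and return the full matches it completes."""
        if (src_type, dst_type) == LEFT:
            self.left[dst].append(src)
            # Join against stored right-leaf matches on the same web server.
            return [(src, dst, db) for db in self.right.get(dst, [])]
        if (src_type, dst_type) == RIGHT:
            self.right[src].append(dst)
            # Join against stored left-leaf matches on the same web server.
            return [(host, src, dst) for host in self.left.get(src, [])]
        return []  # Edge does not match any leaf of the query.


tree = TwoLeafJoinTree()
first = tree.add_edge("host1", "Host", "webB", "WebServer")
second = tree.add_edge("webB", "WebServer", "dbX", "DB")
third = tree.add_edge("host2", "Host", "webB", "WebServer")
print(first, second, third)
```

Each arriving edge is compared only against the partial matches stored at the sibling leaf, so the query is never re-run over the whole graph. A real system would also need time windows and eviction of stale partial matches, which this sketch omits.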
