Unveiling Core Network-Wide Communication Patterns through Application Traffic Activity Graph Decomposition

Yu Jin, Esam Sharafuddin, Zhi-Li Zhang
SIGMETRICS / Performance 2009

2009. 7. 8
Seokshin Son


Methodology for network graph analysis

• Decompose the graph (extracting community structures or dense subgraphs)

• Interpretation and persistence of the subgraphs (identifying non-random substructures)

• Understanding the formation of network graphs and associated applications

[Figure: a graph is decomposed into subgraphs, which are checked for being interpretable and persistent; the numbered goals are (1) characterizing the graph, (2) explaining the graph formation, (3) applications]


Outline

• Application traffic activity graph (TAG)
• Decomposing application traffic activity graphs
• Interpretation of TAG subgraphs
• Temporal properties of TAG subgraphs
• Applications
• Summary and General remarks


Application traffic activity graphs (TAGs)

• Host-to-host communication involves various types of traffic.
• We study snapshots of network traffic to capture the temporal correlations.
• We partition the data by port (application) and create TAGs.
• Inside hosts are (likely) service requesters and outside hosts are service providers.
• TAGs are bipartite, and the associated adjacency matrices are binary.

[Figure: UMN (inside) hosts and Internet (outside) hosts, with a label T; example application ports: HTTP 80/443, Email 25/993, Gnutella 6346/6348]
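The construction described on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the flow-record layout `(inside_host, outside_host, port)`, the function name `build_tags`, and the `PORT_TO_APP` table are hypothetical, and only the port numbers themselves come from the slide.

```python
from collections import defaultdict

# Port-to-application table; the port numbers are the ones listed on the slide.
PORT_TO_APP = {80: "HTTP", 443: "HTTP", 25: "Email", 993: "Email",
               6346: "Gnutella", 6348: "Gnutella"}

def build_tags(flows, port_to_app=PORT_TO_APP):
    """Return {app: (inside_hosts, outside_hosts, binary_matrix)}.

    `flows` is an iterable of (inside_host, outside_host, port) tuples
    (an assumed record format). Flows on unlisted ports are ignored.
    """
    edges = defaultdict(set)
    for inside, outside, port in flows:
        app = port_to_app.get(port)
        if app is not None:
            edges[app].add((inside, outside))  # a set drops repeated flows

    tags = {}
    for app, pairs in edges.items():
        rows = sorted({i for i, _ in pairs})   # inside hosts index the rows
        cols = sorted({o for _, o in pairs})   # outside hosts index the columns
        row_index = {h: n for n, h in enumerate(rows)}
        col_index = {h: n for n, h in enumerate(cols)}
        matrix = [[0] * len(cols) for _ in rows]
        for i, o in pairs:
            matrix[row_index[i]][col_index[o]] = 1  # binary: edge present or not
        tags[app] = (rows, cols, matrix)
    return tags

example_flows = [
    ("in1", "out1", 80), ("in1", "out1", 80), ("in2", "out1", 443),
    ("in1", "out2", 25), ("in3", "out9", 12345),
]
example_tags = build_tags(example_flows)
```

Because edges are stored as a set, repeated flows between the same pair of hosts collapse into a single 1 in the matrix, which matches the binary adjacency matrices mentioned above.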


Application traffic activity graphs (TAGs) and evolution

[Figure panels, each labeled "1K to 3K": HTTP, DNS, AOL IM, Email]


Characteristics of TAGs

• We observe differences in basic statistics, such as graph density, average in/out degree, etc.

• All TAGs contain a giant connected component (GCC), which accounts for more than 85% of all the edges.

[Figure: GCC size (axis from 80% to 100%) per application: HTTP, Email, AOL Msgr, BitTorrent, DNS, eMule, Gnutella, MSN Msgr]
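One way to measure the GCC's share of edges is a union-find pass over the edge list. The sketch below is a minimal illustration; the function name `gcc_edge_fraction` and the input format (a list of distinct `(inside_host, outside_host)` pairs) are assumptions, not part of the original slides.

```python
def gcc_edge_fraction(edges):
    """Fraction of edges that lie in the largest connected component.

    `edges` is a list of distinct (inside_host, outside_host) pairs.
    Inside and outside hosts are tagged so that equal names on the two
    sides of the bipartite graph are never merged by accident.
    """
    if not edges:
        return 0.0

    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the two endpoints of every edge.
    for inside, outside in edges:
        a, b = find(("in", inside)), find(("out", outside))
        if a != b:
            parent[a] = b

    # Count edges per component, keyed by the component's root.
    counts = {}
    for inside, _ in edges:
        root = find(("in", inside))
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values()) / len(edges)

toy_edges = [("a", "x"), ("a", "y"), ("b", "y"), ("c", "z")]
toy_fraction = gcc_edge_fraction(toy_edges)
```

In the toy example, three of the four edges share a component, so the largest component holds 75% of the edges.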


Dense subgraphs in TAGs

• Block structures in the adjacency matrices indicate dense subgraphs in TAGs

• Rotating rows and columns of corresponding adjacency matrices…

1 1 1 0 0 1 0
1 1 1 0 0 0 0
1 1 1 0 0 0 0
0 0 0 1 1 1 1
0 0 0 1 0 1 1

[Figure panels: HTTP, Email, AOL IM, BitTorrent, DNS]


Extracting dense subgraphs

• Extracting dense subgraphs can be formulated as a co-clustering problem: cluster hosts into inside host groups and outside host groups, then extract the pairs of groups that are connected by more edges (higher density).

• This co-clustering problem can be solved with the tri-nonnegative matrix factorization (tNMF) algorithm, which minimizes:

• We use a hard clustering setting, assigning each host to only one inside/outside group.
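The objective itself did not survive on this slide. A common form for a tri-factorization of this kind, assuming squared Frobenius error and using the symbols A, R, H, C from the next slide, would be the following; any additional constraints the authors may impose (for example, orthogonality of R and C) are not recoverable from the slides.

```latex
\min_{R \ge 0,\; H \ge 0,\; C \ge 0} \; \lVert A - R\,H\,C \rVert_F^{2}
```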


Tri-nonnegative matrix factorization

A ≈ R × H × C

• A: adjacency matrix associated with the TAG
• R: row group membership indicator matrix
• H: matrix whose entries are proportional to the subgraph density
• C: column group membership indicator matrix

Example R (each row has a single 1):

1 0 0 … 0
1 0 0 … 0
0 1 0 … 0
0 1 0 … 0
…
0 0 0 … 1

Example C (each column has a single 1):

1 1 1 0 0 … 0
0 0 0 1 1 … 0
. . .
0 0 0 0 0 … 1

We identify dense subgraphs based on the large entries in H.

R is m-by-k and C is r-by-n (so H is k-by-r); hence the product is a low-rank approximation of A, with rank at most min(k, r).
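The factorization A ≈ R × H × C can be sketched with NumPy as follows. This is a rough illustration under assumptions, not the authors' implementation: it uses generic multiplicative updates for a squared-error objective and random initialization. The function names `tnmf`, `hard_groups`, and `relative_square_error` are hypothetical, as is the RSE formula. The example matrix is the 5-by-7 block matrix from the "Dense subgraphs in TAGs" slide.

```python
import numpy as np

def tnmf(A, k, r, iters=300, seed=0, eps=1e-9):
    """Approximate A (m-by-n) as R (m-by-k) @ H (k-by-r) @ C (r-by-n),
    keeping all factors nonnegative via multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    R = rng.random((m, k)) + eps
    H = rng.random((k, r)) + eps
    C = rng.random((r, n)) + eps
    for _ in range(iters):
        # Each update rescales one factor elementwise; eps avoids division by zero.
        R *= (A @ C.T @ H.T) / (R @ H @ C @ C.T @ H.T + eps)
        C *= (H.T @ R.T @ A) / (H.T @ R.T @ R @ H @ C + eps)
        H *= (R.T @ A @ C.T) / (R.T @ R @ H @ C @ C.T + eps)
    return R, H, C

def hard_groups(R, C):
    """Hard clustering: each row/column joins the group with its largest weight."""
    return R.argmax(axis=1), C.argmax(axis=0)

def relative_square_error(A, R, H, C):
    """Squared Frobenius error of the approximation, relative to ||A||_F^2."""
    return np.linalg.norm(A - R @ H @ C) ** 2 / np.linalg.norm(A) ** 2

A = np.array([
    [1, 1, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 1, 1],
], dtype=float)

R, H, C = tnmf(A, k=2, r=2)
row_groups, col_groups = hard_groups(R, C)
rse = relative_square_error(A, R, H, C)
```

After the hard assignment, large entries of H point to the (row group, column group) pairs that form dense subgraphs. Multiplicative updates can stall in local minima, so the result depends on the seed.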


Subgraph prototypes

• Recall inside (UMN) hosts are (likely) service requesters and outside hosts are service providers.

• Based on the number of inside/outside hosts in each subgraph, we propose three prototypes.

• In-star: one inside client accesses multiple outside servers
• Out-star: multiple inside clients access one outside server
• Bi-mesh: multiple inside clients interact with many outside servers
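The three prototypes depend only on how many inside and outside hosts a subgraph contains, so a classifier can be sketched directly from those counts. The function name `classify_subgraph`, the strict "exactly one" thresholds, and the extra `single-edge` label are assumptions made for illustration; the original work may use looser criteria.

```python
def classify_subgraph(num_inside, num_outside):
    """Label a subgraph by its inside/outside host counts."""
    if num_inside < 1 or num_outside < 1:
        raise ValueError("a subgraph needs at least one host on each side")
    if num_inside == 1 and num_outside > 1:
        return "in-star"    # one inside client, multiple outside servers
    if num_inside > 1 and num_outside == 1:
        return "out-star"   # multiple inside clients, one outside server
    if num_inside > 1 and num_outside > 1:
        return "bi-mesh"    # multiple clients, many servers
    return "single-edge"    # 1-by-1 case, not one of the three prototypes

labels = [classify_subgraph(1, 8), classify_subgraph(12, 1), classify_subgraph(5, 9)]
```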


Characterizing TAGs with subgraph prototypes

• Different application TAGs contain different types of subgraphs

• We can distinguish and characterize applications based on the subgraph components

• What do these subgraphs mean?

[Figure panels: HTTP, Email, AOL IM, BitTorrent, DNS]


Interpreting HTTP bi-mesh structures

• Most star structures are due to popular servers or active clients

• We can explain more than 80% of the HTTP bi-meshes identified in one day

• Server correlation driven
  – Server farms
    • Lycos, Yahoo, Google
  – Correlated service providers
    • CDN: LLNW, Akamai, SAVVIS, Level3
    • Advertising providers: DoubleClick, etc.

• User interests driven
  – News: WashingtonPost, New York Times, Cnet
  – Media: ImageShack, casalemedia, tl4s2
  – Online shopping: Ebay, Costco, Walmart
  – Social network: Facebook, MySpace


How are the dense subgraphs connected?

[Figure: four ways in which dense subgraphs are connected; the diagrams carry the labels AS1 and AS2]

(A) Randomly connected stars
(B) Tree: client/server dual role
(C) Pool
(D) Correlated pool


Evolution of HTTP TAGs

• The extracted dense subgraphs are assumed to be non-random (persistent)

• However, each subgraph may evolve over time with hosts leaving/joining the subgraph

• A “best effort” similarity metric:

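[Figure: two subgraphs are compared ("=?") at three levels of granularity: IP, domain name, AS]

Two subgraphs are considered similar if the percentage of similar hosts (at a certain level) is greater than η.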
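The thresholded comparison can be sketched as below. The slide does not say how "percentage of similar hosts" is computed, so this is one possible reading: the share of hosts in the first subgraph that have a counterpart in the second at the chosen granularity. The host record layout (dicts with `ip`, `domain`, and `asn` keys) and the function names are hypothetical, and the addresses and AS number are reserved documentation values.

```python
def similar_fraction(hosts_a, hosts_b, level):
    """Share of hosts in `hosts_a` whose value at `level` also occurs in `hosts_b`.

    `level` is one of "ip", "domain", "asn". Note that this reading of the
    metric is asymmetric in its two arguments.
    """
    if not hosts_a:
        return 0.0
    seen = {h[level] for h in hosts_b}
    return sum(1 for h in hosts_a if h[level] in seen) / len(hosts_a)

def is_similar(hosts_a, hosts_b, level, eta):
    """Two subgraphs match when the similar-host share exceeds eta."""
    return similar_fraction(hosts_a, hosts_b, level) > eta

earlier = [
    {"ip": "192.0.2.1", "domain": "example.com", "asn": 64500},
    {"ip": "192.0.2.2", "domain": "example.com", "asn": 64500},
]
later = [
    {"ip": "192.0.2.2", "domain": "example.com", "asn": 64500},
    {"ip": "192.0.2.3", "domain": "example.com", "asn": 64500},
]
ip_share = similar_fraction(earlier, later, "ip")
domain_share = similar_fraction(earlier, later, "domain")
```

In this toy case only half of the IP addresses recur, while every host maps to the same domain and AS, so a match is far easier to reach at the coarser levels.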


Evolution of HTTP TAGs (2)

• TAGs are temporally stable at the Domain/AS level
• TAGs are transient at the host level


Evolution of HTTP TAGs (3)

• Subgraphs last from a few hours to a whole day
• Subgraphs become more similar during the daytime


Application: Identifying unknown traffic

• Similarity of UDP port 4000 TAG subgraphs with subgraphs from messenger (chat) traffic

[Figure panels: AOL Messenger, Yahoo! Messenger, UDP port 4000; the best match is marked]


Application: Storm worm botnet analysis

[Figure panels: Storm, Typical P2P]

The Storm worm botnet graph contains many bi-mesh structures and so differs significantly from the graphs of typical P2P applications.

[Figure labels: bots query for supernodes; bots acquire commands from supernodes; spam campaign]


Summary and General Remarks

• Many network research questions can be formulated as "graph analysis" problems.

• These graphs are mostly not random and have latent structures.

• We propose a co-clustering (tri-nonnegative matrix factorization) based method for decomposing such graphs and revealing their community structures (dense subgraphs).

• The obtained subgraphs are meaningful and persistent, which helps in understanding the formation of complicated communication graphs.

• We demonstrate various applications based on the decomposition results.


Appendix


Characteristics of app. TAGs

• These statistics show differences among the various application TAGs

• They do not explain the formation of TAGs


TNMF algorithm related

• Iterative optimization algorithm

• Group density matrix derivation


Practical issues of tNMF

• Selection of rank and density
  – Linear search for appropriate ranks
  – Edge coverage converges once the rank and density are properly chosen

• With the above rank selection method, we achieve stable subgraph decomposition results


Practical issues of tNMF (2)

• Low convergence rate and local minima
  – We use SVD as an initialization approach for tNMF.
  – Multiple runs are applied when necessary, and the result with the lowest relative square error (RSE) is selected.
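The slides do not define RSE. One natural definition, assumed here and written in the A, R, H, C notation of the factorization slide, normalizes the squared approximation error by the squared norm of A:

```latex
\mathrm{RSE} \;=\; \frac{\lVert A - R\,H\,C \rVert_F^{2}}{\lVert A \rVert_F^{2}}
```

Under this reading, RSE = 0 means an exact factorization, and selecting the run with the lowest RSE picks the factorization that reconstructs A most closely.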


Number of subgraphs over time
