34
Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

Embed Size (px)

Citation preview

Page 1: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

Mining Anomalies Using Traffic Feature Distributions

Anukool LakhinaMark Crovella

Christophe Diot

in ACM SIGCOMM 2005

Presented by:Sailesh Kumar

Page 2: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

2 - Sailesh Kumar - 04/22/23

Overview

Introduction to PCA Application of PCA to OD flows

» Lakhina et al. SIGMETRICS’04

Volume Anomalies Subspace Analysis of Link Volume Traffic

» Lakhina et al. SIGCOMM’04

Feature based Anomaly Detection Anomaly Classification

» Lakhina et al. SIGCOMM’05

Page 3: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

3 - Sailesh Kumar - 04/22/23

PCA is a useful statistical technique that has found application in fields such as face recognition and image compression.

It is a technique for finding patterns in data of high dimension.

Two important terms:

Covariance Matrix

Eigenvector/Eigenvalue

Principal Component Analysis

Like variance provides the relationship between data items in a single

dimension, covariance provides the relationship between different

dimensions of multi-dimensional data

•A nxn matrix has n eigenvectors.•All eigenvectors are orthogonal to each other.

Page 4: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

4 - Sailesh Kumar - 04/22/23

PCA is a way of identifying patterns in data, and expressing the data in such a way as to highlight their similarities and differences.

Since patterns in data can be hard to find in data of high dimension, where graphical representation is not available, PCA is a powerful tool for analyzing data.

It essentially reduces the dimension if data along multiple dimensions are correlated A simple example of 2-D correlated data

» (No of hours studied, Marks obtained in exam)» (Traffic entering in a network, traffic exiting a network)» Such set of data may exhibit strong correlation (positive correlation in this instance)» Thus, it might be worthwhile describing this data set as a single dimension data.

Principal Component Analysis

Page 5: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

5 - Sailesh Kumar - 04/22/23

2-D data and Principal Components

Page 6: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

6 - Sailesh Kumar - 04/22/23

One Dimensional Projection

Page 7: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

7 - Sailesh Kumar - 04/22/23

Consider a p-dimensional data If data along the p dimensions are correlated (high positive or negative

covariance), then it can be represented with fewer dimensions (k) In general any p dimensional data set can be mapped onto first k principal axes

» First k principal components with the highest eigenvalues Data mapped onto the k dimensions are usually called the normal component Remaining data is called the residual component

Principal Component Analysis

Page 8: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

8 - Sailesh Kumar - 04/22/23

OD flow is the traffic that enters at an origin PoP and exits at a destination PoP of a backbone network. Relationship between link traffic and OD flow traffic is captured by the routing matrix A.

» A has size (#links) x (# OD flows)» Aij = 1 if OD flow j traverses through link i.» Traffic engineering is essentially adjusting the matrix A.

A network with n PoP will have n2 OD flows.» Thus OD flows are high dimensional data.

– 20 PoP will result in 400 dimensions.

However, quite intuitively OD flows are correlated.» Hence they can be represented with far fewer dimensions.» Lakhina et al. (SIGMETRICS’04) shows it.

OD Flows

Page 9: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

9 - Sailesh Kumar - 04/22/23

Lakhina et al. (SIGMETRICS’04) shows that only 5-10 dimensions are sufficient to capture 95+% of the traffic

OD Flows

Page 10: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

10 - Sailesh Kumar - 04/22/23

Volume anomaly typically arises on an OD flow (traffic arriving at one PoP and destined for another PoP)

If we only monitor traffic on network links, volume arising from an OD flow may not be noticeable» Thus, naïve approach won’t work if OD flow info isn’t available

Why care about OD Flows

Page 11: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

11 - Sailesh Kumar - 04/22/23

Even if OD flow information is not available, and only link traffic information is available, PCA can be applied and subspace technique can detect volume anomalies

What is the data» Data consist of time samples of traffic volumes at all m links in the network» Thus, Y is the t x m traffic measurement matrix

– An arbitrary row y of Y denotes one sample Use PCA to separate normal and anomalous traffic Construct the principal axes and map data onto them Consider set of first k axes which captures the highest variance

» Projection of y on these k axes is called normal traffic while remaining traffic is residual traffic

Subspace Analysis of Link Traffic

Page 12: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

12 - Sailesh Kumar - 04/22/23

Subspace Analysis of Link Traffic

An approach to separate normal traffic from anomalous traffic

Normal Subspace, : space spanned by the first k principal components

Anomalous Subspace, : space spanned by the remaining principal components

Then, decompose traffic on all links by projecting onto and to obtain:

Traffic vector of all links at a particular point in time

Normal trafficvector

Residual trafficvector

Page 13: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

13 - Sailesh Kumar - 04/22/23

Note that during anomaly, normal component doesn’t change that much while residual component changes quite a lot» Thus, anomalies can be detected by setting some threshold

Subspace Analysis Results

Page 14: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

14 - Sailesh Kumar - 04/22/23

Lets talk about today’s paper now!

Objective is to build a anomaly diagnosis system» detects a diverse range of anomalies,» distinguishes between different types of anomalies,» and group similar anomalies

Clearly these goals are too ambitious» Anomalies are a moving target (malicious anomalies)» New anomalies will continue to arise» In general this is a difficult problem

This paper takes significant steps towards a system that fulfills these criteria.

Background Over

Page 15: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

15 - Sailesh Kumar - 04/22/23

Most Anomalies induce a change in distributional aspects of packet header fields (called features).» DOS attack – multiple source IP address concentrated on a single destination IP address» Network scan – dispersed distribution of destination addresses» Most worms/viruses also induce some change in distribution of certain features» However these changes can be very subtle and mining them is like searching for needles in a haystack

Unlike many previous approach, this paper aims to detect events which disturb the distribution of traffic features rather than traffic volume

Typical Characteristics of Anomaly

Page 16: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

16 - Sailesh Kumar - 04/22/23

Port scan anomaly (traffic feature changes, however traffic volume remains more or less the same)

Limitations of Volume Based Detection

We can use entropy tocapture the variationsin the traffic feature

•Takes value 0 when distribution is maximally concentrated.•Takes value log2N when distribution is maximally dispersed.

Page 17: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

17 - Sailesh Kumar - 04/22/23

Effectiveness of Feature Entropy

But stands out in feature entropy, which also revealsits structure

Port scan dwarfed in volume metrics…

Page 18: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

18 - Sailesh Kumar - 04/22/23

In volume based scheme, # of packets or bytes per time slot was the variable. In entropy based scheme, in every time slot, the entropy of every traffic feature is the

variable. This gives us a three way data matrix H.

» H(t, p, k) denotes at time t, the entropy of OD flow p, of the traffic feature k.

To apply subspace method,we need to unfold it into a single-way representation.

Entropy based scheme

Page 19: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

19 - Sailesh Kumar - 04/22/23

Decompose into a single-way matrix Now apply the usual subspace decomposition

» Every row of the matrix will be decomposed into

Multi-way to single-way

# od-pairs

# tim

ebins

H(SrcIP) H(SrcPort) H(DstPort)H(DstIP)

H(sr

cIP) H(

dstIP

)

H(sr

cPor

t)

H(ds

tPor

t)

# od-pairs# od-pairs

# tim

ebins

# tim

ebins

H(SrcIP) H(SrcPort) H(DstPort)H(DstIP)

H(sr

cIP) H(

dstIP

)

H(sr

cPor

t)

H(ds

tPor

t)

Traffic vector of all links at a particular point in time

Normal trafficvector

Residual trafficvector

Page 20: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

20 - Sailesh Kumar - 04/22/23

The application of entropy on both traffic features and the ensemble of OD flows has a key benefit that correlated anomalies across both OD flows and features stand-out.

Moreover, as we have shown in the first example, entropy is an effective summarization tool, when traffic volume changes are not significant.

We now evaluate how this scheme improves over volume based schemes

Benefits of using Features + OD flow

Page 21: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

21 - Sailesh Kumar - 04/22/23

Entropy Based versus Volume Based

Page 22: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

22 - Sailesh Kumar - 04/22/23

Detection Rates

Page 23: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

23 - Sailesh Kumar - 04/22/23

Cluster Anomaly which are close enough in space.

Anomalies can be thought of as a point in 4-D space with co-ordinate vectors» h = [H(srcIP), H(dstIP), H(srcPort), H(dstPort)]

Do anomalies of similar type appear next to each other in the entropy space» YES!

Before digging into details, a brief Introduction of Clustering Algorithms

Anomaly Classification

Page 24: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

24 - Sailesh Kumar - 04/22/23

Objective is to cluster close enough data points Two general approach to cluster

» K-means and Hierarchical

K-means is one of the simplest unsupervised learning algorithm.1. Place K points into the space represented by the data that

are being clustered. These points form initial group centroids.2. Assign each object to the group that has the closest

centroid.3. When all objects have been assigned, recalculate the

positions of the K centroids.4. Repeat Steps 2 and 3 until the centroids no longer move.

Hierarchical approach begins with one cluster and breaks it into multiple clusters OR begins with n clusters and merge different clusters

Clustering Algorithms

Page 25: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

25 - Sailesh Kumar - 04/22/23

Nearest Neighbor, Level 2, k = 7 clusters.

From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Page 26: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

26 - Sailesh Kumar - 04/22/23

Nearest Neighbor, Level 3, k = 6 clusters.

From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Page 27: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

27 - Sailesh Kumar - 04/22/23

Nearest Neighbor, Level 4, k = 5 clusters.

From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Page 28: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

28 - Sailesh Kumar - 04/22/23

Nearest Neighbor, Level 5, k = 4 clusters.

From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Page 29: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

29 - Sailesh Kumar - 04/22/23

Nearest Neighbor, Level 6, k = 3 clusters.

From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Page 30: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

30 - Sailesh Kumar - 04/22/23

Nearest Neighbor, Level 7, k = 2 clusters.

From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Page 31: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

31 - Sailesh Kumar - 04/22/23

Nearest Neighbor, Level 8, k = 1 cluster.

From http://www.stat.unc.edu/postscript/papers/marron/Stat321FDA/RimaIzempresentation.ppt

Page 32: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

32 - Sailesh Kumar - 04/22/23

Clustering Anomalies

Summary: Correctly classified 292 of 296 injected anomalies

(DstIP

)

(SrcIP) (SrcIP)

Known Labels Cluster ResultsLegend

Code Red Scanning

Single source DOS attack

Multi source DOS attack

Page 33: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

33 - Sailesh Kumar - 04/22/23

Clustering Anomalies

(SrcIP)(SrcPort)

(DstIP)

– Results of both clustering algorithms are consistent

• Heuristics identify about 10 clusters in dataset

Page 34: Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar

34 - Sailesh Kumar - 04/22/23

Questions?