Internet Traffic Classification KISS


Dario Bonfiglio, Alessandro Finamore, Marco Mellia, Michela Meo, Dario Rossi


Traffic Classification & Measurement

Why?
• Identify normal and anomalous behavior
• Characterize the network and its users
• Quality of service
• Filtering
• …

How?
• By means of passive measurement
• Using Tstat

Tstat

• Traffic classifier
  • Deep packet inspection
  • Statistical methods
• Persistent and scalable monitoring platform
  • Round Robin Database (RRD)
  • Histograms

[Figure: Internal Clients, Edge Router, External Servers]

http://tstat.tlc.polito.it

Tstat at a Glance

Worms and Viruses?

Did someone open a Christmas card? Happy new year to Windows!!

Anomalies (Good!)

Spammers disappear: McColo SpamNet shut off on Tuesday, November 11th, 2008

New Applications – P2P-TV

Fiorentina 4 - Udinese 2
Inter 1 - Juventus 0

Traffic classification

Look at the packets… Tell me what protocol and/or application generated them.

[Figure: flows from Skype, Bittorrent, Gtalk and eMule, annotated with port (e.g. 4662/4672) and payload signature ("bittorrent", E4/E5, RTP protocol)]

Typical approach: Deep Packet Inspection (DPI)

It fails more and more:
• P2P
• Encryption
• Proprietary solutions
• Many different flavours

The Failure of DPI

11.05.2008 12:29 eMule 0.49a released
1.08.2008 20:25 eMule 0.49b released
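To make the weakness concrete, here is a toy sketch of port and payload matching in the DPI style. The two signatures come from the slide examples; pairing ports 4662/4672 with eMule, the function name `dpi_classify`, and the XOR "encryption" are illustrative assumptions, not how Tstat implements DPI.

```python
from typing import Optional

# Signatures from the slide examples; real DPI rule sets are far larger.
# Assumption: ports 4662/4672 are paired with eMule (its default ports).
PORT_SIGNATURES = {4662: "eMule", 4672: "eMule"}
PAYLOAD_SIGNATURES = [(b"bittorrent", "Bittorrent")]


def dpi_classify(port: int, payload: bytes) -> Optional[str]:
    """Return an application name if a port or payload signature matches."""
    if port in PORT_SIGNATURES:
        return PORT_SIGNATURES[port]
    lowered = payload.lower()
    for signature, application in PAYLOAD_SIGNATURES:
        if signature in lowered:
            return application
    return None  # no rule matched: the flow stays unknown


plain = b"\x13BitTorrent protocol" + bytes(8)
# Stand-in for an encrypted payload: every byte XORed with a fixed key.
obfuscated = bytes(byte ^ 0x5A for byte in plain)

print(dpi_classify(6881, plain))       # matched by payload signature
print(dpi_classify(6881, obfuscated))  # same flow, no longer recognised
```

A new release that changes ports or obfuscates the payload falls straight through to the unknown case, which is the failure mode this slide points at.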

Possible Solution: Behavioral Classifier

[Figure: three-phase pipeline. Phase 1: Feature, Phase 2: Decision, Phase 3: Verify; labels: Traffic (Known), (Training), (Operation)]

1. Statistical characterization of traffic (given source)
2. Look for the behaviour of unknown traffic and assign the class that best fits it
3. Check for possible classification mistakes

Our Approach

Statistical characterization of bits in a flow

Do NOT look at the SEMANTIC and TIMING… but rather look at the protocol FORMAT

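χ² Test

Chunking and χ²

• Take the first N payload bytes
• Split them into C chunks, each of b bits
• Compute one χ² per chunk, giving the vector of statistics [χ²_1, … , χ²_C]

The χ² provides an implicit measure of entropy or randomness. It compares the observed distribution of the chunk values with the expected (uniform) distribution:

χ² = Σ_i (O_i − E_i)² / E_i

where O_i is the counter of how many times value i was observed and E_i is the expected count.

Consider a chunk of 2 bits, with values 0, 1, 2, 3.

[Figure: counters O_i for random values and for a deterministic value]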
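The chunking step can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the default N = 12 bytes and b = 4 bits, and the choice of one sample per packet are assumptions made for the example.

```python
from collections import Counter
from typing import List


def chunk_values(payload: bytes, n_bytes: int, b: int) -> List[int]:
    """Split the first n_bytes of a payload into consecutive b-bit chunks."""
    if len(payload) < n_bytes:
        raise ValueError("payload shorter than n_bytes")
    bits = int.from_bytes(payload[:n_bytes], "big")
    total_bits = 8 * n_bytes
    mask = (1 << b) - 1
    return [(bits >> (total_bits - b * (k + 1))) & mask
            for k in range(total_bits // b)]


def chi_square(values: List[int], b: int) -> float:
    """Pearson chi-square of observed b-bit values against a uniform distribution."""
    expected = len(values) / 2 ** b
    counts = Counter(values)
    return sum((counts.get(v, 0) - expected) ** 2 / expected
               for v in range(2 ** b))


def chi_square_vector(payloads: List[bytes], n_bytes: int = 12, b: int = 4) -> List[float]:
    """One chi-square per chunk position, computed over all packets of a flow."""
    if not payloads:
        raise ValueError("need at least one packet")
    per_packet = [chunk_values(p, n_bytes, b) for p in payloads]
    n_chunks = len(per_packet[0])
    return [chi_square([pkt[c] for pkt in per_packet], b) for c in range(n_chunks)]


# Hypothetical flow: first byte constant (0xA5), second byte takes every value once.
flow = [bytes([0xA5, i]) for i in range(256)]
print(chi_square_vector(flow, n_bytes=2, b=4))  # -> [3840.0, 3840.0, 0.0, 0.0]
```

The two constant chunks get a large χ², while the two chunks that cover all 16 values evenly get 0.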

χ² and different behaviour

4 bit long chunks: evolution of χ²

[Figure: evolution of χ² for 4-bit chunks, where x marks a random bit]
• random: x x x x
• deterministic: 0 0 0 1, for which χ² = N(2^b − 1)
• mixed: x 0 0 0, x 0 x 0, 0 x x x
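The value N(2^b − 1) for a deterministic chunk follows from Pearson's statistic with a uniform expected distribution. As a worked derivation, assuming N observed samples, chunks of b bits (so E_i = N/2^b for each of the 2^b values), and a chunk that always takes the same value:

```latex
\begin{align*}
\chi^2 &= \sum_{i=0}^{2^b-1} \frac{(O_i - E_i)^2}{E_i},
  \qquad E_i = \frac{N}{2^b} \\
\chi^2_{\mathrm{det}} &= \frac{\left(N - N/2^b\right)^2}{N/2^b}
  + \left(2^b - 1\right)\frac{\left(0 - N/2^b\right)^2}{N/2^b} \\
&= \frac{N\left(2^b - 1\right)^2}{2^b} + \frac{N\left(2^b - 1\right)}{2^b} \\
&= N\left(2^b - 1\right)
\end{align*}
```

For b = 4 this gives 15N, so the statistic of a constant field grows linearly with the number of samples. For a truly random field the expected value stays near the number of degrees of freedom, 2^b − 1 = 15.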

Chi Square Classifier

• Split the payload into groups
• Apply the χ² test on the groups at the flow end: each message is a sample
• Some groups will contain
  • Random bits
  • Mixed bits
  • Deterministic bits

0 8 16 24
---------------------
| ID | FUNC |
---------------------

[Figure: CSC versus n [pkt], both on logarithmic axes, with curves for the deterministic group, the random group and the mixed group]

And the counter example?

2 byte long counter: MSG | L2 | L1 | LSG

MSG = Most Significant Group, LSG = Least Significant Group
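A small simulation shows how a counter looks through the χ². It is a sketch under stated assumptions: the counter is 16 bits, it increments by one per packet, and it is split into four 4-bit groups named as on the slide; `counter_group_chi_squares` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List

GROUP_BITS = 4  # assumption: a 2-byte counter split into four 4-bit groups
GROUP_NAMES = ["MSG", "L2", "L1", "LSG"]


def chi_square(values: List[int], b: int) -> float:
    """Pearson chi-square of observed b-bit values against a uniform distribution."""
    expected = len(values) / 2 ** b
    counts = Counter(values)
    return sum((counts.get(v, 0) - expected) ** 2 / expected
               for v in range(2 ** b))


def counter_group_chi_squares(n_packets: int, start: int = 0) -> Dict[str, float]:
    """Chi-square of each group of a 16-bit counter incremented once per packet."""
    counters = [(start + k) & 0xFFFF for k in range(n_packets)]
    result: Dict[str, float] = {}
    for index, name in enumerate(GROUP_NAMES):
        shift = GROUP_BITS * (len(GROUP_NAMES) - 1 - index)
        values = [(c >> shift) & 0xF for c in counters]
        result[name] = chi_square(values, GROUP_BITS)
    return result


scores = counter_group_chi_squares(1600)
for name in GROUP_NAMES:
    print(name, scores[name])
```

With 1600 packets the most significant group never changes and behaves like a deterministic field (χ² = 15 × 1600 = 24000), while the least significant group cycles evenly through all 16 values and gets χ² = 0. The two middle groups fall in between.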

Protocol format as seen from the χ²

Our Approach

Traffic (Known)
• Phase 1, Feature: statistical characterization of bits in a flow (χ² test)
• Phase 2, Decision: decision process (minimum distance / maximum likelihood)
• Phase 3, Verify

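C-dimension space

The vector [χ²_1, … , χ²_C] is a point in a C-dimensional hyperspace, which is split into classification regions by:
• Euclidean Distance
• Support Vector Machine

Example considering the χ²:

[Figure: plane with axes χ²_i and χ²_j, showing two classes and "My Point"]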

Euclidean Distance Classifier

[Figure: plane with axes χ²_i and χ²_j, a centroid and a hyper-sphere around it]

• Centroid: center of mass
• True positives are "nearby", true negatives are "far"
• Hyper-sphere of a given radius around the centroid
  • False positives: points of other traffic that fall inside the sphere
  • False negatives: points of the class that fall outside the sphere
• Radius: min { False Pos. }, min { False Neg. }
• Confidence: the distance is a measure of the confidence of the decision

How to define the sphere radius?

[Figure: True Positive – False Positive versus Radius]
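A centroid and hyper-sphere classifier can be sketched in a few lines. The training vectors, class names and the radius of 30 are made-up values for illustration. The tie-break (pick the closest centroid among those whose sphere contains the point) is an assumption, not taken from the slides.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Center of mass of the training points of one class."""
    dims = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dims)]


def classify(point: Sequence[float],
             centroids: Dict[str, Sequence[float]],
             radii: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Return (class, distance) for the closest centroid whose sphere contains the point.

    Returns (None, inf) when the point lies outside every sphere.
    """
    best_class: Optional[str] = None
    best_distance = math.inf
    for name, center in centroids.items():
        distance = math.dist(point, center)
        if distance <= radii[name] and distance < best_distance:
            best_class, best_distance = name, distance
    return best_class, best_distance


# Hypothetical two-dimensional chi-square vectors for two classes.
training = {
    "classA": [[10.0, 200.0], [14.0, 220.0], [12.0, 210.0]],
    "classB": [[300.0, 15.0], [320.0, 25.0], [310.0, 20.0]],
}
centroids = {name: centroid(points) for name, points in training.items()}
radii = {name: 30.0 for name in training}

print(classify([15.0, 214.0], centroids, radii))   # inside the classA sphere
print(classify([150.0, 150.0], centroids, radii))  # outside both spheres
```

The returned distance doubles as the confidence measure: the smaller it is, the more central the point is to its class. Growing the radius trades false negatives for false positives.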

Support Vector Machine

• Kernel functions: move points from the space of samples (dim. C) to the space of features (dim. ∞), so that borders are simple
• Borders are planes: simple surface, nice math
• Support vectors
• LibSVM
• Decision: distance from the border
• Confidence is a probability, p ( class )
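Once trained, an SVM scores a point with a weighted sum of kernel evaluations against its support vectors, f(x) = Σ_i c_i K(s_i, x) + bias. The sign of f(x) gives the side of the border. The sketch below assumes an RBF kernel, which the slides do not name, and uses made-up support vectors and coefficients; in practice they would come from a library such as LibSVM.

```python
import math
from typing import List, Sequence, Tuple


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x: Sequence[float],
                   support_vectors: List[Tuple[Sequence[float], float]],
                   bias: float,
                   gamma: float) -> float:
    """Signed score: positive on one side of the border, negative on the other."""
    return sum(coefficient * rbf_kernel(vector, x, gamma)
               for vector, coefficient in support_vectors) + bias


# Hypothetical trained model: one support vector per class, equal weights.
model = [((0.0, 0.0), 1.0), ((4.0, 4.0), -1.0)]

print(decision_value((0.5, 0.5), model, bias=0.0, gamma=0.5) > 0)  # True
print(decision_value((3.5, 3.5), model, bias=0.0, gamma=0.5) < 0)  # True
```

Points far from the border get a score of large magnitude, which is the raw material for the confidence value mentioned above.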

Performance evaluation

How accurate is all this?


Per flow and per endpoint

What are we going to classify?
• It can be applied both to single flows and to endpoints
• It is robust to sampling: it does not require monitoring all packets, nor the first packets

Real traffic traces

[Figure: Internet, Fastweb, Trace, and an Oracle (DPI + manual) that splits the trace into RTP, eMule, DNS and other (unknown traffic)]

• 1 day long trace
• 20 GByte of UDP traffic
• Known + other: training
• Known traffic: false negatives
• Unknown traffic: false positives

Definition of false positive/negative

[Figure: the oracle (DPI) splits the traffic into eMule, RTP, DNS and other; each part is then classified by KISS]

• Classifying "known": true positives, false negatives
• Classifying "other": true negatives, false positives
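These definitions translate into a short computation over (oracle label, KISS label) pairs. The exact percentage formulas are an assumption of this sketch. A false negative is taken to be a known-class sample that KISS does not assign to that class, and a false positive is an "other" sample that KISS assigns to any known class.

```python
from typing import Dict, List, Tuple

KNOWN = ("RTP", "eMule", "DNS")


def error_rates(samples: List[Tuple[str, str]]) -> Dict[str, float]:
    """Percentages of false negatives per known class and false positives on 'other'.

    Each sample is an (oracle_label, kiss_label) pair.
    """
    rates: Dict[str, float] = {}
    for cls in KNOWN:
        decisions = [kiss for oracle, kiss in samples if oracle == cls]
        if decisions:
            missed = sum(1 for kiss in decisions if kiss != cls)
            rates[f"FN {cls}"] = 100.0 * missed / len(decisions)
    others = [kiss for oracle, kiss in samples if oracle == "other"]
    if others:
        accepted = sum(1 for kiss in others if kiss in KNOWN)
        rates["FP other"] = 100.0 * accepted / len(others)
    return rates


# Made-up labels, only to exercise the function.
samples = ([("RTP", "RTP")] * 9 + [("RTP", "other")]
           + [("other", "other")] * 3 + [("other", "DNS")])
print(error_rates(samples))  # -> {'FN RTP': 10.0, 'FP other': 25.0}
```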

Results

                                  Euclidean Distance        SVM
                                  Case A    Case B     Case A    Case B
Known traffic (False Neg.) [%]
  Rtp                               0.08      0.23       0.00      0.05
  Edk                              13.03      7.97       0.98      0.54
  Dns                               6.57     19.19       0.12      2.14
Other (False Pos.) [%]
  other                            13.6      17.01       0.00      0.18

Real traffic trace

• RTP errors are oracle mistakes (it does not identify RTP v1)
• DNS errors are due to an impure training set (for the oracle, all port 53 traffic is DNS)
• EDK errors are (maybe) Xbox Live (proper training for "other")
• FN are always below 3%!!!

Tuning trainset size

[Figure: true positives and false positives [%] versus samples per class (confidence 5%)]

Small training set
• For "known": 70-80 MByte
• For "other": 300 MByte

Tuning num of packets for χ²

[Figure: true positives and false positives [%] versus number of packets (confidence 5%)]

Protocols with volumes of at least 70-80 pkts per flow

P2P-TV applications

• P2P-TV applications are becoming popular
• They heavily rely on UDP as the transport protocol
• They are based on proprietary protocols
• They are evolving over time very quickly
• How to identify them? ... After 6 hours, KISS gives you results

The Failure of DPI

And for TCP?


Results

Pros and Cons

KISS is good because…
• Blind approach
• Completely automated
• Works with many protocols
• Works even with small training
• Statistics can start at any point
• Robust w.r.t. packet drops
• Bypasses some DPI problems

but…
• Learn (other) properly
• Needs volumes of traffic
• May require memory (for now)
• Only UDP (for now)
• Only offline (for now)

Papers

• D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, P. Tofanelli, "Revealing Skype traffic: when randomness plays with you", ACM SIGCOMM, Kyoto, JP, August 2007
• D. Rossi, M. Mellia, M. Meo, "A Detailed Measurement of Skype Network Traffic", 7th International Workshop on Peer-to-Peer Systems (IPTPS '08), Tampa Bay, Florida, February 2008
• D. Bonfiglio, M. Mellia, M. Meo, N. Ritacca, D. Rossi, "Tracking Down Skype Traffic", IEEE Infocom, Phoenix, AZ, 15,17 April 2008
• D. Bonfiglio, M. Mellia, M. Meo, D. Rossi, "Detailed Analysis of Skype Traffic", IEEE Transactions on Multimedia, Vol. 11, No. 1, pp. 117-127, ISSN: 1520-9210, January 2009
• A. Finamore, M. Mellia, M. Meo, D. Rossi, "KISS: Stochastic Packet Inspection", 1st Traffic Monitoring and Analysis (TMA) Workshop, Aachen, 11 May 2009
