Google-based Traffic Classification
Aleksandar Kuzmanovic, Northwestern University
IEEE Computer Communications Workshop (CCW ’08), October 23, 2008
http://networks.cs.northwestern.edu




I. Trestian, Unconstrained Endpoint Profiling (Googling the Internet)

Traffic Classification

Problem: traffic classification
Current approaches: port-based, payload signatures, numerical and statistical methods, etc.

Our approach: use information about destination IP addresses that is available on the Internet


Getting External Information

Use Google!

Can we systematically exploit search engines to harvest endpoint information available on the Internet?

A huge amount of endpoint information is available on the web


Where Does the Information Come From?

Websites run logging software and display statistics

Some popular proxy services also display logs

IP addresses of popular servers (e.g., gaming servers) are listed

Blacklists, banlists, and spamlists also have web interfaces

Even P2P information is available on the Internet, since the first point of contact with a P2P swarm is a publicly available IP address

Figure: endpoint categories (Servers, Clients, P2P, Malicious)


Methodology – Web Classifier and IP Tagging

Figure: web classifier and IP tagging. Components: IP address (xxx.xxx.xxx.xxx), search hits (URL, hit text), website cache, rapid match on domain names and keywords, IP tagging.
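The slides name the building blocks of the web classifier but not its rules. The following is a minimal Python sketch, assuming the search hits (URL plus hit text) for an IP address have already been retrieved: a keyword-based "rapid match" over each hit's domain name and hit text, followed by tagging the IP by majority vote. The keyword table, the SearchHit type, and the majority-vote rule are hypothetical illustrations, not the classifier used in this work.

```python
# Minimal sketch of keyword-based "rapid match" IP tagging.
# ASSUMPTIONS (hypothetical, not from the slides): the KEYWORDS table,
# the SearchHit type, and the majority-vote tagging rule.
from collections import Counter
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class SearchHit:
    url: str       # URL returned by the search engine for the queried IP
    hit_text: str  # text snippet shown with the hit


# Hypothetical tag -> keywords table; a real system needs a curated list.
KEYWORDS = {
    "mail server": ("smtp", "mail server"),
    "router": ("traceroute", "router"),
    "game server": ("halo", "game server"),
    "spammer": ("spam", "blacklist"),
}


def rapid_match(hit: SearchHit) -> Optional[str]:
    """Return the first tag whose keyword appears in the hit's domain name or text."""
    domain = urlparse(hit.url).hostname or ""
    text = f"{domain} {hit.hit_text}".lower()
    for tag, words in KEYWORDS.items():
        if any(w in text for w in words):
            return tag
    return None


def tag_ip(hits: list[SearchHit]) -> Optional[str]:
    """Tag an IP with the most frequent rapid-match tag among its hits (hypothetical rule)."""
    votes = Counter(t for t in (rapid_match(h) for h in hits) if t is not None)
    return votes.most_common(1)[0][0] if votes else None


# Usage with made-up hits for one IP address.
hits = [
    SearchHit("http://mail.example.edu/", "SMTP relay at 165.124.182.169"),
    SearchHit("http://logs.example.org/", "mail server 165.124.182.169 accepted connection"),
    SearchHit("http://x.example.net/", "traceroute output"),
]
print(tag_ip(hits))
```

Scanning only the domain name and snippet keeps the per-hit cost low; a fuller classifier could fall back to the cached page when no keyword matches.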


Traffic Classification

Tagged IP cache (example entries):

IP address        Tag
165.124.182.169   Mail server
193.226.5.150     Website
68.87.195.25      Router
186.25.13.24      Halo server

The cache holds only a small % of the IP addresses seen

Look at source and destination IP addresses and classify traffic
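A cache lookup of this kind can be sketched in a few lines of Python. The sketch reuses the example cache entries from this slide; the Flow type, checking the destination IP before the source IP, and the "unknown" fallback label are assumptions made for illustration.

```python
# Minimal sketch of classifying flows with a tagged IP cache.
# ASSUMPTIONS: the Flow type, the destination-before-source lookup order,
# and the "unknown" label are illustrative choices, not from the slides.
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str


# Example entries from the slide's tagged IP cache.
TAGGED_IP_CACHE = {
    "165.124.182.169": "Mail server",
    "193.226.5.150": "Website",
    "68.87.195.25": "Router",
    "186.25.13.24": "Halo server",
}


def classify_flow(flow: Flow, cache: dict[str, str] = TAGGED_IP_CACHE) -> str:
    """Label a flow by its destination IP's tag, falling back to the source IP's tag."""
    for ip in (flow.dst_ip, flow.src_ip):
        tag = cache.get(ip)  # the cache is only read, never mutated
        if tag is not None:
            return tag
    return "unknown"


flows = [
    Flow("10.0.0.7", "186.25.13.24"),     # client talking to a tagged game server
    Flow("165.124.182.169", "10.0.0.9"),  # tagged mail server as the source
    Flow("10.0.0.7", "10.0.0.8"),         # neither endpoint is in the cache
]
labels = [classify_flow(f) for f in flows]
print(labels)
```

Because each lookup is a single dictionary access per endpoint, the cost per flow stays constant regardless of how many IP addresses the cache holds.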


Working with Sampled Traffic

When no sampling is done, UEP (Unconstrained Endpoint Profiling) outperforms BLINC

UEP maintains a large classification ratio even at higher sampling rates

BLINC stays in the dark (2% at sampling rate 100)

UEP retains high classification capabilities with sampled traffic
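A plausible intuition (an assumption, not stated on the slides): if a packet is classified by looking up only its source and destination IPs, dropping other packets does not change whether a kept packet can be classified. The Python sketch below illustrates this under stated assumptions: a synthetic trace, random 1-in-N packet sampling (reading "sampling rate 100" as 1-in-100), and a classification ratio defined as the fraction of packets with a tagged endpoint. The slides do not say how the reported ratios were computed, so this is not a reproduction of those results.

```python
# Minimal sketch: classification ratio of an IP-cache classifier under
# random 1-in-N packet sampling.
# ASSUMPTIONS (not from the slides): the synthetic trace, Bernoulli 1/N
# sampling, and "classification ratio" = fraction of packets with a tagged endpoint.
import random

TAGGED_IP_CACHE = {"186.25.13.24": "Halo server", "193.226.5.150": "Website"}

Packet = tuple[str, str]  # (source IP, destination IP)


def sample_packets(packets: list[Packet], n: int, seed: int = 0) -> list[Packet]:
    """Keep each packet independently with probability 1/n."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() < 1.0 / n]


def classification_ratio(packets: list[Packet], cache: dict[str, str]) -> float:
    """Fraction of packets whose source or destination IP has a tag in the cache."""
    if not packets:
        return 0.0
    tagged = sum(1 for src, dst in packets if src in cache or dst in cache)
    return tagged / len(packets)


# Synthetic trace: 3 of every 4 packets involve a tagged endpoint.
trace = [("10.0.0.1", "186.25.13.24"), ("10.0.0.2", "193.226.5.150"),
         ("186.25.13.24", "10.0.0.3"), ("10.0.0.4", "10.0.0.5")] * 25_000

sampled = sample_packets(trace, n=100)
full_ratio = classification_ratio(trace, TAGGED_IP_CACHE)
sampled_ratio = classification_ratio(sampled, TAGGED_IP_CACHE)
print(f"full: {full_ratio:.2f}, 1-in-100 sample: {sampled_ratio:.2f} ({len(sampled)} packets)")
```

With independent sampling, the expected classification ratio of a non-empty sample equals that of the full trace; only its variance grows as fewer packets are kept.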


Summary

Shift research focus from mining operational network traces to harnessing information that is already available on the web

Deep packet inspection and legal issues:
– Federal Wiretap Act: “thou shalt not intercept the contents of communications. Violations can result in civil and criminal penalties. The worst offenses may be investigated by the FBI, Secret Service, DEA, and IRS as felony prosecutions.”
– Only 2 exceptions:
• The provider protection exception
• Consent
