1 Uncovering Functional Networks in Internet Traffic Mark Meiss September 25, 2006


Uncovering Functional Networks in Internet Traffic

Mark Meiss

September 25, 2006


Who am I?

Mark Meiss

• Ph.D. candidate in Computer Science
– Committee: Filippo Menczer, Alessandro Vespignani, Katy Börner, Minaxi Gupta, Kay Connelly

• Researcher at the Advanced Network Management Laboratory (ANML)
– http://anml.iu.edu/


What’s the agenda?

The subject of today’s story:

• Finding a way to improve security without compromising user privacy

• A case study in applied network science

This work is done with Filippo Menczer and Alessandro Vespignani.


What do people do online?

There’s what we imagine…

surfing, sending email, playing games


What do people do online?

And there’s what is actually happening…

file sharing, worms & viruses, porn


Not just a value judgment

These applications all affect the health of a data network.

There are legal problems, yes; but also…

• Crowding out other applications.
– (Napster was once over 70% of all IUB traffic)

• Compromised computers are used to launch further attacks.

• “Common nuisances” are on the ’Net as well.


The bottom line

Network administrators need to be able to identify what applications are being used on the network.

…but this can be very difficult.


A crash course in data networks

We’ll use a running example:

• Buddy Bradley wants to read a web page about his favorite band at Vulgar Entertainment, Inc.


Quick summary

• Each network conversation is identified by four pieces of information
– Client address and port number
– Server address and port number

• The server uses a well-known port number

• The client uses an ephemeral port number


So why is it hard to identify applications?

• Well-known ports are a convention, not a rule
– Web, e-mail, etc. do have ports assigned by the IANA
– BitTorrent, Gnutella, Napster, etc. do not

• Client and server ports share the same namespace

• In practice…
– Any application can use any pair of port numbers

• Our focus: discovering what application is running on a port with no assigned use.


The conventional solution

Let’s look inside all of those packets!


Another problem

• Packet inspection doesn’t scale
– Modern high-speed networks run at 10 gigabits per second or faster (that’s one full DVD every few seconds)
– General-purpose computers can’t even copy that data in real time
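As a rough check on the DVD comparison, the arithmetic can be sketched as follows. The 4.7 GB figure is an assumption (a single-layer DVD); the slide does not say which kind of disc is meant.

```python
# Rough arithmetic behind "one full DVD every few seconds".
LINK_BITS_PER_SECOND = 10e9   # 10 gigabits per second, from the slide
DVD_BYTES = 4.7e9             # assumption: single-layer DVD capacity

link_bytes_per_second = LINK_BITS_PER_SECOND / 8
seconds_per_dvd = DVD_BYTES / link_bytes_per_second

print(f"{link_bytes_per_second / 1e9:.2f} GB/s, "
      f"one DVD every {seconds_per_dvd:.2f} s")
```

Under that assumption the link moves a little over a gigabyte each second, which is why simply copying the traffic is already a problem for a general-purpose machine.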


Introducing the “flow”

• We can summarize Buddy’s Web surfing as two flows:
– 192.168.65.33:13029 to 10.99.205.122:80 (456 bytes)
– 10.99.205.122:80 to 192.168.65.33:13029 (63,211 bytes)
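One way to picture a flow is as a small record holding the two endpoints and a byte count. The sketch below uses a hypothetical `Flow` layout and a hypothetical `conversation_key` helper (neither is a real export format) to show that the two flows above belong to the same conversation.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    """One direction of a conversation (hypothetical record layout)."""
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    n_bytes: int

def conversation_key(flow: Flow) -> tuple:
    """Return a key that is the same for both directions of a conversation."""
    a = (flow.src_addr, flow.src_port)
    b = (flow.dst_addr, flow.dst_port)
    return (a, b) if a <= b else (b, a)

# The two flows from Buddy's web request, as given on the slide.
request = Flow("192.168.65.33", 13029, "10.99.205.122", 80, 456)
response = Flow("10.99.205.122", 80, "192.168.65.33", 13029, 63211)

same_conversation = conversation_key(request) == conversation_key(response)
total_bytes = request.n_bytes + response.n_bytes
print(same_conversation, total_bytes)
```

The key is just the four pieces of information that identify a conversation, ordered so that direction does not matter.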


Where do flows come from?

• Architectural features of Internet routers allow them to export flow data

• Routers can’t summarize all the data
– Packets are sampled to construct the flows
– Typical sampling rate is around 1:100
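Sampling has two practical consequences: observed counts have to be scaled up, and short flows can be missed entirely. The sketch below illustrates both. It assumes each packet is sampled independently with probability 1/100, which is a simplification; the slides do not describe how the routers actually choose packets.

```python
SAMPLING_RATE = 100  # 1 packet in 100 is sampled (typical rate from the slide)

def estimate_total(sampled_count: int, rate: int = SAMPLING_RATE) -> int:
    """Scale a sampled packet or byte count up to an estimate of the true total."""
    return sampled_count * rate

def chance_flow_is_seen(n_packets: int, rate: int = SAMPLING_RATE) -> float:
    """Probability that at least one packet of a flow is sampled.

    Assumes each packet is sampled independently with probability 1/rate.
    """
    return 1.0 - (1.0 - 1.0 / rate) ** n_packets

print(estimate_total(632))  # sampled bytes scaled to an estimated total
for n in (1, 10, 100, 1000):
    print(n, round(chance_flow_is_seen(n), 3))
```

Under this model a single-packet flow is almost always invisible, while a flow of a thousand packets is almost certain to leave a record.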


What can you do with a flow?

• Usual answer:
– Treat a flow as a record in a relational database
– Who talked to port 1337?
– What proportion of our traffic is on port 80?
– Who is scanning for vulnerable systems?
– Which hosts are infected with this worm?

• These are useful and valid questions.
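The relational view can be sketched with an in-memory SQLite table. The schema, column names, and records below are made up for illustration and are not the layout used in the study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flows (src_addr TEXT, src_port INTEGER,"
    " dst_addr TEXT, dst_port INTEGER, bytes INTEGER)"
)
# Made-up records for illustration only.
conn.executemany(
    "INSERT INTO flows VALUES (?, ?, ?, ?, ?)",
    [
        ("192.168.65.33", 13029, "10.99.205.122", 80, 456),
        ("10.99.205.122", 80, "192.168.65.33", 13029, 63211),
        ("192.168.65.40", 40112, "10.99.7.9", 1337, 1200),
        ("192.168.65.41", 51000, "10.99.7.9", 25, 5133),
    ],
)

# Who talked to port 1337?
talkers = [row[0] for row in conn.execute(
    "SELECT DISTINCT src_addr FROM flows WHERE dst_port = 1337")]

# What proportion of our traffic (by bytes) is on port 80?
(port80_share,) = conn.execute(
    "SELECT 1.0 * SUM(CASE WHEN src_port = 80 OR dst_port = 80"
    " THEN bytes ELSE 0 END) / SUM(bytes) FROM flows").fetchone()

print(talkers, round(port80_share, 3))
```

Each question is a filter or an aggregate over independent rows; nothing in this view captures how hosts relate to one another.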


What can you do with a flow?

• Our approach:
– Treat a flow as a directed, weighted edge
– The resulting network describes user behavior

• Hold that thought for now…
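A minimal sketch of the edge view, assuming the byte count is used as the edge weight and that repeated flows between the same pair of hosts are summed. The host names and numbers are made up.

```python
from collections import defaultdict

# (source, destination, bytes): made-up flow records for illustration.
flows = [
    ("A", "S1", 456),
    ("S1", "A", 63211),
    ("A", "S1", 300),
    ("B", "S1", 120),
]

# Each flow becomes a directed edge; repeated flows between the same
# ordered pair of hosts add to that edge's weight.
edge_weight = defaultdict(int)
for src, dst, n_bytes in flows:
    edge_weight[(src, dst)] += n_bytes

for (src, dst), weight in sorted(edge_weight.items()):
    print(f"{src} -> {dst}: {weight} bytes")
```

Because the edges are directed, traffic from A to S1 and traffic from S1 to A are kept as separate weights.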


The Internet2/Abilene network

• TCP/IP network connecting research and educational institutions in the U.S.
– Over 200 universities and corporate research labs

• Also provides transit service between Pacific Rim and European networks


Why study Abilene?

• Wide-area network that includes both domestic and international traffic

• Heterogeneous user base including hundreds of thousands of undergraduates

• High capacity network (10-Gbps fiber-optic links) that has never been congested

• Research partnership gives access to (anonymized) traffic data unavailable from commercial networks


Flow collection

Flows are exported in Cisco’s NetFlow v5 format and anonymized before being written to disk.


Data dimensions

• Observed Abilene on April 14, 2005
– About 200 terabytes of data exchanged
– This is roughly 25,000 DVDs of information

• 600 million flow records
– Almost 28 gigabytes on disk
– 15 million unique hosts involved


Weighted bipartite digraph


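[Equations defining the in-strength $s_{in}$ and out-strength $s_{out}$ as sums of edge weights $w$, with upper summation limits $M$ and $N$; the indices are not recoverable from the extracted text.]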


Multiple digraphs

Port 80 (Web) Port 6346 (Gnutella)

Port 25 (Mail) Port 19101 (???)


Application correlation

• Consider the out-strength of a client in the networks for ports p and q:

$$s_i^p = \sum_j w_{ij}^p \qquad s_i^q = \sum_j w_{ij}^q$$
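The per-port out-strength of a client can be computed in a few lines. The sketch below assumes made-up records of the form (client, server, server port, bytes) and uses byte counts as edge weights; neither the record layout nor the numbers come from the Abilene data.

```python
from collections import defaultdict

# (client, server, server_port, bytes): made-up records for illustration.
flows = [
    ("c1", "s1", 80, 500),
    ("c1", "s2", 80, 250),
    ("c1", "s3", 6346, 900),
    ("c2", "s1", 80, 100),
]

# strength[p][i] is the out-strength of client i in the port-p network:
# the sum over servers j of the edge weight from i to j on that port.
strength = defaultdict(lambda: defaultdict(int))
for client, server, port, n_bytes in flows:
    strength[port][client] += n_bytes

print(dict(strength[80]), dict(strength[6346]))
```

A client that never uses a port simply has no entry for it, which is treated as a strength of zero.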


Application correlation

• Build a pair of vectors from the distribution of strength values:

$$\mathbf{p} = (s_1^p, \ldots, s_{|C|}^p) \qquad \mathbf{q} = (s_1^q, \ldots, s_{|C|}^q)$$


Application correlation

• Examine the cosine similarity of the vectors:

$$\sigma(p, q) = \frac{\mathbf{p} \cdot \mathbf{q}}{\|\mathbf{p}\| \, \|\mathbf{q}\|}$$

• When σ = 0, applications p and q are never used together.

• When σ = 1, applications p and q are always used together, and to the same extent.
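The cosine similarity of two strength vectors can be sketched as below. Storing each vector as a dict keyed by client is a choice made for this example, and the numbers are made up; returning 0 for an empty network is likewise a convention chosen here.

```python
import math

def cosine_similarity(p: dict, q: dict) -> float:
    """Cosine similarity of two per-client strength vectors stored as dicts."""
    dot = sum(value * q.get(client, 0) for client, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # convention chosen here for an empty network
    return dot / (norm_p * norm_q)

web = {"c1": 750, "c2": 100}
mail = {"c1": 1500, "c2": 200}   # same clients, same proportions
p2p = {"c3": 900}                # disjoint set of clients

print(cosine_similarity(web, mail), cosine_similarity(web, p2p))
```

The two extreme cases match the slide: applications used by the same clients in the same proportions score (numerically) 1, and applications with no clients in common score 0.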


Clustering applications

• We now have σ(p, q) for every pair of ports

• Convert these similarities into distances:

$$d(p, q) = \frac{1}{\sigma(p, q)} - 1$$

• If σ = 0, then d is large; if σ = 1, then d = 0

• Now apply Ward’s hierarchical clustering algorithm
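A sketch of the conversion and clustering steps, reading the distance as d = 1/σ − 1 (which gives d = 0 at σ = 1 and grows without bound as σ approaches 0). The clustering uses the Lance-Williams recurrence, one common way to run Ward's method from a distance table; whether the original analysis was implemented this way is not stated. The similarity values are made up.

```python
def distance(sigma: float) -> float:
    """Convert a similarity in [0, 1] to a distance: d = 1/sigma - 1."""
    return float("inf") if sigma == 0 else 1.0 / sigma - 1.0

def ward_merges(labels, dist):
    """Merge clusters using the Lance-Williams update for Ward's method.

    `dist` maps frozenset({a, b}) to the distance between labels a and b.
    Returns the merges, in order, as pairs of label tuples.
    """
    clusters = {(label,): 1 for label in labels}  # cluster -> size
    d = {frozenset({(a,), (b,)}): dist[frozenset({a, b})]
         for i, a in enumerate(labels) for b in labels[i + 1:]}
    merges = []
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # closest pair of clusters
        a, b = sorted(pair)
        d_ab = d.pop(pair)
        n_a, n_b = clusters.pop(a), clusters.pop(b)
        merged = a + b
        for k, n_k in clusters.items():
            d_ak = d.pop(frozenset({a, k}))
            d_bk = d.pop(frozenset({b, k}))
            squared = ((n_a + n_k) * d_ak ** 2 + (n_b + n_k) * d_bk ** 2
                       - n_k * d_ab ** 2) / (n_a + n_b + n_k)
            d[frozenset({merged, k})] = max(squared, 0.0) ** 0.5
        clusters[merged] = n_a + n_b
        merges.append((a, b))
    return merges

# Made-up similarities between four ports, for illustration only.
sigma = {frozenset({80, 25}): 0.8, frozenset({6346, 19101}): 0.5,
         frozenset({80, 6346}): 0.1, frozenset({80, 19101}): 0.1,
         frozenset({25, 6346}): 0.1, frozenset({25, 19101}): 0.1}
dist = {pair: distance(s) for pair, s in sigma.items()}
merges = ward_merges([80, 25, 6346, 19101], dist)
print(merges)
```

With these numbers the two most similar ports are joined first, then the other pair, and the two groups are merged last.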


Classifying unknown applications

• To classify an unknown application, see what known applications it clusters with

• Our classification experiment
– Take 16 unknown ports
– Guess function based on similarity data
– Validate or invalidate guesses based on external evidence
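As a simplified stand-in for reading the cluster tree, one can list the known applications most similar to an unknown port. The `nearest_known` helper is hypothetical and the similarity values are invented for illustration; they are not results from the study.

```python
def nearest_known(unknown, similarity, known, top=2):
    """Return the `top` known applications most similar to `unknown`."""
    scored = [(similarity.get(frozenset({unknown, name}), 0.0), name)
              for name in known]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top]]

# Invented similarity scores, for illustration only.
similarity = {
    frozenset({"port 388", "FTP"}): 0.62,
    frozenset({"port 388", "Hotline"}): 0.55,
    frozenset({"port 388", "Web"}): 0.08,
    frozenset({"port 388", "Mail"}): 0.05,
}
known = ["Web", "Mail", "FTP", "Hotline"]
guess_basis = nearest_known("port 388", similarity, known)
print(guess_basis)
```

The guess about the unknown port's function is then made by a person looking at what those neighbors have in common.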


Example #1

• Port 388 is coupled with FTP and Hotline
– FTP is a file transfer application
– Hotline is an early file-sharing application
– Our guess: traditional file transfer application

• Actual identity: Unidata/LDM
– Used for moving large meteorological data sets


Example #2

• Port 19101 is coupled with instant messaging and P2P applications
– Our guess: a P2P application that relies on individual contact for file transfers

• Actual identity: Clubbox
– Korean file-sharing program
– Users trade large files on virtual hard drives


Overall results

• For our 16 guesses:
– 8 were unambiguously correct
– 6 were partially correct
  • These turned out to be trojans and malware
  • We learned that IRC + P2P = evil afoot
– 2 could not be confirmed or disproven
  • Ports were in transient use during data collection


Implications

• We can identify the type of an application without examining a single packet!
– Scalable
– Preserves user privacy
– Difficult to do with relational view of flow data


Broader application

• Generic view of the situation:
– Weighted network of entities derived from activity with labeled classes of interaction
– Find the sub-network for each labeled class
– Use the network distributions to calculate similarity scores for the classes
– Use the similarity scores to cluster the classes
– Classify unknown classes using these clusters


Thank you!

• Questions and comments…