Automation of Traceroute-Based Topographic Internet Mapping BY DENNIS DONOGHUE GEORGETOWN COMPUTER SCIENCE SECURITY LAB

Network Topology Symposium


Page 1: Network Topology Symposium

Automation of Traceroute-Based Topographic Internet Mapping

BY DENNIS DONOGHUE, GEORGETOWN COMPUTER SCIENCE SECURITY LAB

Page 2: Network Topology Symposium

Motivation For Mapping the Internet

Research on routing and network topography often requires a network model. Nearly all publicly available maps of the Internet are poorly suited to the type of research and analysis our lab does.

Page 3: Network Topology Symposium

[Diagram labels: Src, Dest, Georgetown, Company A, Company B, Company C, with several unknown (?) elements]

Page 4: Network Topology Symposium

Traceroute

The vast majority of Internet maps are derived from traceroute data.

The traceroute tool provides the user with a list of the routers between the user and an Internet destination.

In a normal Internet connection, your computer sends a data packet to your router, which then sends it off to the next router. The next router does the same thing, again and again, until the data reaches the destination.

Source → Router → Router → Router → Destination

Page 5: Network Topology Symposium

Building a Network Model

The source, destination, and intermediate routers become nodes in the network model. The connections the traceroute followed between the routers become the links in our network model.
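The conversion from traceroute paths to nodes and links can be sketched in a few lines of Python. This is a minimal illustration, not our actual program: the function name `build_model` is hypothetical, `TRACE_1` reuses the hop addresses from the example trace at the end of these slides, and the source address and `TRACE_2` are made-up documentation addresses. Links are treated as undirected, which is an assumption.

```python
def build_model(paths):
    """Merge traceroute paths into a set of nodes and a set of undirected links."""
    nodes = set()
    links = set()
    for path in paths:
        nodes.update(path)
        # Each pair of consecutive hops in a path becomes one link.
        for a, b in zip(path, path[1:]):
            if a != b:
                # Sort the pair so (a, b) and (b, a) count as the same link.
                links.add(tuple(sorted((a, b))))
    return nodes, links


# Hypothetical source address followed by the hops from the example trace.
TRACE_1 = ["192.0.2.1", "150.183.95.1", "134.75.23.1",
           "203.234.255.165", "112.174.83.37"]
# A second, made-up path that shares some routers with the first.
TRACE_2 = ["192.0.2.1", "150.183.95.1", "198.51.100.7", "112.174.83.37"]

if __name__ == "__main__":
    nodes, links = build_model([TRACE_1, TRACE_2])
    print(len(nodes), "nodes,", len(links), "links")
```

Because nodes and links are stored in sets, routers and connections seen in several traces are counted once, so adding more traces grows the model only where something new was observed.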

Page 6: Network Topology Symposium

The more sets of traceroute data you add, the less the model looks like a single path and the more it starts to look like a real network.

Page 7: Network Topology Symposium

Trimming and Consolidating

Trimming the network is essential for making the network model computationally feasible. In situations where no important data is lost, we remove some links and nodes from our graphs. Nodes with only one link are removed from the graph. We also remove disconnected parts of the graph.
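A rough sketch of the two trimming rules follows. It rests on two assumptions the slides do not settle: degree-1 nodes are removed in a single pass, and "disconnected parts" means everything outside the largest connected component. All function names are hypothetical, and the graph is a plain adjacency dictionary rather than the graph object our program uses.

```python
def drop_degree_one(adj):
    """Remove every node that has exactly one link (single pass)."""
    leaves = {node for node, nbrs in adj.items() if len(nbrs) == 1}
    return {node: nbrs - leaves for node, nbrs in adj.items() if node not in leaves}


def largest_component(adj):
    """Keep only the largest connected component of the graph."""
    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Depth-first search collects everything reachable from start.
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in component:
                    component.add(nbr)
                    stack.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    return {node: adj[node] & best for node in best}


def trim(adj):
    """Apply both rules: drop degree-1 nodes, then drop disconnected parts."""
    return largest_component(drop_degree_one(adj))


if __name__ == "__main__":
    graph = {
        "A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"},
        "D": {"A"},                # degree-1 vertex
        "E": {"F"}, "F": {"E"},    # disconnected pair
        "G": set(),                # isolated node
    }
    print(sorted(trim(graph)))
```

A single pass can leave new degree-1 nodes behind when a chain of routers hangs off the graph; repeating `drop_degree_one` until nothing changes would remove those as well.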

Page 8: Network Topology Symposium

[Diagram: graph with degree-1 vertices and unconnected nodes labeled]

Page 9: Network Topology Symposium

Summary

• We scan the network with the traceroute tool
• We convert our traceroute data into nodes and links
• We combine all the nodes and links to form our network model
• We trim our network model to make it computationally feasible

Page 10: Network Topology Symposium

The Running of Our Graphing Program

Our programs are written in Python and Bash. All of the data we use is publicly available. All of the software packages, libraries, and tools we use are open source (publicly available and free to use, edit, and distribute).

Page 11: Network Topology Symposium

Collecting Data

Our traceroute data comes from the Center for Applied Internet Data Analysis (CAIDA), a government-funded organization based at the San Diego Supercomputer Center. We use several other types of data:

◦ IP<->AS relationship data
◦ Alexa top lists
◦ GeoIP data

Page 12: Network Topology Symposium

Creating Points of Presence

We need to convert our data into points of presence (PoPs). A PoP is an access point to the Internet, a physical location that houses servers, routers, and other routing equipment. PoPs are often controlled by large businesses and organizations. By converting from IP addresses to PoPs, we simplify our graphs greatly.
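The simplification can be illustrated with a small sketch. It assumes a lookup table from IP address to PoP already exists; how that table is built is not shown here. The function name `collapse_to_pops`, the PoP labels, the AS numbers (from the range reserved for documentation), and the addresses are all hypothetical.

```python
def collapse_to_pops(links, ip_to_pop):
    """Rewrite IP-level links as PoP-level links.

    Links inside a single PoP disappear, and links that touch an
    address with no known PoP are skipped (an assumption of this sketch).
    """
    pop_links = set()
    for a, b in links:
        pop_a = ip_to_pop.get(a)
        pop_b = ip_to_pop.get(b)
        if pop_a is None or pop_b is None or pop_a == pop_b:
            continue
        pop_links.add(tuple(sorted((pop_a, pop_b))))
    return pop_links


if __name__ == "__main__":
    # Made-up IP-level links.
    ip_links = {
        ("192.0.2.1", "192.0.2.2"),
        ("192.0.2.2", "198.51.100.1"),
        ("198.51.100.1", "203.0.113.9"),
    }
    # Made-up lookup table; the last address above is deliberately unmapped.
    ip_to_pop = {
        "192.0.2.1": "AS64500/CityA",
        "192.0.2.2": "AS64500/CityA",
        "198.51.100.1": "AS64501/CityB",
    }
    print(collapse_to_pops(ip_links, ip_to_pop))
```

Many routers usually map to one PoP, so the PoP-level graph has far fewer nodes and links than the IP-level graph it came from.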

Page 13: Network Topology Symposium

Creating a PoP

Page 14: Network Topology Symposium

Creating a PoP

Page 15: Network Topology Symposium

Building the Graph

We are now able to construct a graph using NetworkX, a Python package for manipulating and studying networks. It takes our PoPs and the links between them and converts them into a network model. The graph we create is encoded in the gpickle format. Gpickle is well suited to encoding our graph but has a major drawback: no programs natively support visualizing gpickle files.
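One way around the visualization drawback is to export the graph to a plain-text format that other tools can read. The sketch below uses only the Python standard library, on the understanding that a gpickle file is a Python pickle of the graph object. The adjacency dictionary is a stand-in for the NetworkX graph, and the function names and file names are hypothetical.

```python
import pickle


def save_pickle(adj, path):
    """Serialize the graph with pickle, the mechanism behind gpickle files."""
    with open(path, "wb") as handle:
        pickle.dump(adj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path):
    """Load a pickled graph. Only unpickle files from a trusted source."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def write_edge_list(adj, path):
    """Write one 'node node' line per undirected link and return the links."""
    edges = sorted({tuple(sorted((a, b))) for a, nbrs in adj.items() for b in nbrs})
    with open(path, "w", encoding="utf-8") as handle:
        for a, b in edges:
            handle.write(f"{a} {b}\n")
    return edges


if __name__ == "__main__":
    graph = {"pop1": {"pop2"}, "pop2": {"pop1", "pop3"}, "pop3": {"pop2"}}
    save_pickle(graph, "model.gpickle")
    restored = load_pickle("model.gpickle")
    print(write_edge_list(restored, "model.edges"))
```

The pickle keeps the full Python object for later analysis, while the edge list trades that detail for a format that is easy to inspect and import elsewhere.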

Page 16: Network Topology Symposium

Automation and Optimization

I focused this summer on updating, automating, and optimizing the network topography program. The automation is done through a Bash script that ties together all the scripts and commands that make up the mapping program. This script produces the inputs needed for the program and backs up the data every time it is run. Optimizing the mapping program focuses on limiting repeated downloads and parallelizing algorithms.
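The two optimizations can be sketched together in Python. This is an illustration of the general idea, not the code in our program: `fetch_once` and `fetch_all` are hypothetical names, the `download` argument stands in for whatever actually retrieves a file, and the use of a thread pool is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fetch_once(name, cache_dir, download):
    """Download a file only if it is not already in the local cache."""
    target = Path(cache_dir) / name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download(name))
    return target


def fetch_all(names, cache_dir, download, workers=4):
    """Fetch several distinct files in parallel, reusing cached copies."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input names.
        return list(pool.map(lambda n: fetch_once(n, cache_dir, download), names))


if __name__ == "__main__":
    import tempfile

    def fake_download(name):
        # Stand-in for a real network request.
        print("downloading", name)
        return name.encode("utf-8")

    with tempfile.TemporaryDirectory() as cache:
        fetch_all(["traces.txt", "geoip.txt"], cache, fake_download)
        # The second run finds both files in the cache and downloads nothing.
        fetch_all(["traces.txt", "geoip.txt"], cache, fake_download)
```

Threads suit download-bound work; CPU-heavy graph processing in Python would more likely call for separate processes.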

Page 17: Network Topology Symposium

Motivation and End Goal

The end goal of this project is to turn our program into a publicly available research tool. We intend to have this program run on its own a few times a week, generating network models using the newest possible data. By providing a resource key to our field of work, we hope to make research easier for everyone, not just researchers at Georgetown.

Page 18: Network Topology Symposium

Summary

• Internet maps are vital for computer network research
• The decentralized architecture of the Internet makes maps difficult to construct
• We construct maps by aggregating and linking together millions of independent Internet measurements
• Our Internet maps will be frequently updated and released to CS researchers on a regular basis

Questions?

Page 19: Network Topology Symposium

Acknowledgements

• Henry Tan
• Professor Sherr
• Drs. Maria Donoghue and Thomas Coate
• Georgetown University
• The Howard Hughes Medical Institute
• The HHMI Scholars

Page 20: Network Topology Symposium

Traceroutes Explained

Source → Router

Traceroute from source to destination:
150.183.95.1 0.5ms

• Each packet of data has a time-to-live (TTL) value. TTL is the number of hops a packet can travel; each hop decrements TTL. The packet dies at zero, and the router where it died sends some data back to the source.

• That data includes timing information and the IP address of the router where the TTL reached zero, among other error information.
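The TTL mechanism described above can be simulated without touching a real network. In this sketch the route is a fixed list using the hop addresses from the example trace in these slides; `send_probe` and `traceroute` are hypothetical names, and timing information is left out.

```python
# Routers on the way to the destination, with the destination last.
ROUTE = ["150.183.95.1", "134.75.23.1", "203.234.255.165", "112.174.83.37"]


def send_probe(route, ttl):
    """Send one packet with the given TTL along a fixed route.

    Returns (responder, reached_destination).
    """
    if ttl < 1:
        raise ValueError("ttl must be at least 1")
    if not route:
        raise ValueError("route must not be empty")
    for index, hop in enumerate(route):
        if index == len(route) - 1:
            # The packet arrived, so the destination itself answers.
            return hop, True
        ttl -= 1  # each router decrements the TTL
        if ttl == 0:
            # The packet dies here and this router reports back.
            return hop, False


def traceroute(route, max_ttl=30):
    """Probe with TTL 1, 2, 3, ... and record who answers each time."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, done = send_probe(route, ttl)
        hops.append(responder)
        if done:
            break
    return hops


if __name__ == "__main__":
    for number, address in enumerate(traceroute(ROUTE), start=1):
        print(number, address)
```

Each probe reveals one more router, which matches the way the trace on these slides grows by one line per step.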

Page 21: Network Topology Symposium

Source → Router → Router

Traceroute from source to destination:
150.183.95.1 0.5ms
134.75.23.1 1ms


Page 22: Network Topology Symposium

Source → Router → Router → Router

Traceroute from source to destination:
150.183.95.1 0.5ms
134.75.23.1 1ms
203.234.255.165 1.5ms


Page 23: Network Topology Symposium

Source → Router → Router → Router → Destination

Traceroute from source to destination:
150.183.95.1 0.5ms
134.75.23.1 1ms
203.234.255.165 1.5ms
112.174.83.37 2ms
