Automation of Traceroute-Based
Topographic Internet Mapping
BY DENNIS DONOGHUE
GEORGETOWN COMPUTER SCIENCE SECURITY LAB
Motivation For Mapping the Internet
Research on routing and network topography often requires a network model. Nearly all of the publicly available maps of the Internet are not well suited to the type of research and analysis our lab does.
[Figure: Src (Georgetown) and Dest connected through Company A, Company B, and Company C, with the unknown parts of the path marked "?"]
Traceroute
The vast majority of Internet maps are derived from traceroute data.
The traceroute tool gives the user a list of routers between the user and an Internet destination.
In a normal Internet connection, your computer sends a data packet to your router, which then sends it off to the next router. The next router does the same thing, again and again, until the data reaches the destination.
Source -> Router -> Router -> Router -> Destination
Building a Network Model
The source, destination, and intermediate routers become nodes in the network model. The connections the traceroute followed between the routers become the links in our network model.
The more sets of traceroute data you add, the less the model looks like a single path and the more it looks like a real network.
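As a minimal sketch of the idea above, the following merges several traceroute paths into one undirected graph. It uses only plain Python dictionaries and sets; the function name `build_graph` and the sample hop names are hypothetical and are not taken from the lab's program.

```python
from collections import defaultdict


def build_graph(traces):
    """Merge traceroute paths into an undirected adjacency map.

    Each trace is an ordered list of hops (source, routers, destination).
    Consecutive hops become a link; every hop on a link becomes a node.
    """
    graph = defaultdict(set)
    for path in traces:
        for a, b in zip(path, path[1:]):
            if a != b:  # ignore a hop that repeats itself
                graph[a].add(b)
                graph[b].add(a)
    return dict(graph)


# Three hypothetical traces that share some routers.
traces = [
    ["src", "r1", "r2", "dst1"],
    ["src", "r1", "r3", "dst2"],
    ["src", "r4", "r3", "dst2"],
]
graph = build_graph(traces)
nodes = len(graph)
links = sum(len(neighbors) for neighbors in graph.values()) // 2
print(nodes, "nodes,", links, "links")
```

Because links are stored in sets, a link seen in several traces is counted once, which is why overlapping paths merge into a network instead of piling up as separate lines.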
Trimming and Consolidating
Trimming the network is essential for making the network model computationally feasible. In situations where no important data is lost, we remove some links and nodes from our graphs. Nodes with only one link are removed from the graph. We also remove disconnected parts of the graph.
[Figure: example graph with its degree-1 vertices and unconnected nodes labeled]
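The two trimming rules can be sketched on a small adjacency map. This is an illustration under stated assumptions, not the lab's code: degree-1 nodes are dropped in a single pass, and "disconnected parts" is interpreted as everything outside the largest connected component. Both function names are hypothetical.

```python
def drop_degree_one(graph):
    """Return a copy without nodes that have exactly one link (single pass)."""
    leaves = {node for node, nbrs in graph.items() if len(nbrs) == 1}
    return {node: nbrs - leaves for node, nbrs in graph.items() if node not in leaves}


def largest_component(graph):
    """Return the subgraph induced by the largest connected component."""
    seen, best = set(), set()
    for start in graph:
        if start in seen:
            continue
        # Depth-first search to collect everything reachable from start.
        component, stack = {start}, [start]
        while stack:
            for nbr in graph[stack.pop()]:
                if nbr not in component:
                    component.add(nbr)
                    stack.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    return {node: graph[node] & best for node in best}


# Hypothetical graph: a triangle with one leaf, a detached pair, and a lone node.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "leaf"},
    "leaf": {"c"},
    "x": {"y"}, "y": {"x"},
    "alone": set(),
}
trimmed = largest_component(drop_degree_one(graph))
print(sorted(trimmed))
```

Removing leaves once can expose new degree-1 nodes; whether to repeat the pass until none remain is a design choice the slides do not specify.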
Summary
• We scan the network with the traceroute tool
• We convert our traceroute data into nodes and links
• We combine all the nodes and links to form our network model
• We trim our network model to make it computationally feasible
Running Our Graphing Program
Our programs are written in Python and Bash. All of the data we use is publicly available. All of the software packages, libraries, and tools we use are open source (publicly available and free to use, edit, and distribute).
Collecting Data
Our traceroute data comes from the Center for Applied Internet Data Analysis (CAIDA), a government-funded organization based at the San Diego Supercomputer Center. We use several other types of data:
◦ IP<->AS relationship data
◦ Alexa top lists
◦ GeoIP data
Creating Points of Presence
We need to convert our data into points of presence (PoPs). A PoP is an access point to the Internet: a physical location that houses servers, routers, and other routing equipment. PoPs are often controlled by large businesses and organizations. By converting from IP addresses to PoPs, we simplify our graphs greatly.
Creating a PoP
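One way to picture the conversion is to group IP addresses that share an AS number and a city, then rewrite IP-level links as PoP-level links. The grouping key is an assumption made for illustration; the slides do not state the exact rule the program uses. The lookup tables below are hypothetical stand-ins for the IP<->AS and GeoIP data and use documentation-only addresses and AS numbers.

```python
# Hypothetical lookup tables standing in for IP<->AS and GeoIP data.
IP_TO_AS = {
    "192.0.2.1": 64500, "192.0.2.2": 64500,
    "198.51.100.7": 64501, "203.0.113.9": 64500,
}
IP_TO_CITY = {
    "192.0.2.1": "Washington", "192.0.2.2": "Washington",
    "198.51.100.7": "Washington", "203.0.113.9": "Chicago",
}


def pop_of(ip):
    """Assumed PoP key: the (AS number, city) pair for an IP address."""
    return (IP_TO_AS.get(ip), IP_TO_CITY.get(ip))


def collapse_to_pops(ip_links):
    """Rewrite IP-to-IP links as PoP-to-PoP links, dropping links inside one PoP."""
    pop_links = set()
    for a, b in ip_links:
        pop_a, pop_b = pop_of(a), pop_of(b)
        if pop_a != pop_b:
            pop_links.add(frozenset((pop_a, pop_b)))
    return pop_links


ip_links = [
    ("192.0.2.1", "192.0.2.2"),      # both ends in the same PoP
    ("192.0.2.2", "198.51.100.7"),
    ("192.0.2.1", "198.51.100.7"),   # same PoP pair as the link above
    ("198.51.100.7", "203.0.113.9"),
]
pop_links = collapse_to_pops(ip_links)
print(len(ip_links), "IP links ->", len(pop_links), "PoP links")
```

The example shows where the simplification comes from: links inside a PoP vanish, and parallel links between the same two PoPs merge into one.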
Building the Graph
We are now able to construct a graph using NetworkX, a Python package for manipulating and studying networks. This takes our PoPs and the links between them and converts them into a network model. The graph we create is encoded in the gpickle format. Gpickle is well suited to encoding our graph, but it has a major drawback: no programs natively support visualizing gpickle files.
Automation and Optimization
I focused this summer on updating, automating, and optimizing the network topography program. The automation is done through a Bash script that ties together all the scripts and commands that make up the mapping program. This script produces the inputs needed for the program and backs up the data every time it is run. Optimizing the mapping program focuses on limiting repeated downloads and parallelizing algorithms.
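The two optimizations named above can be sketched together in Python: skip a download when the file is already cached, and fetch several files in parallel. This is only an illustration of the general technique. The function names, file names, and the injected `download` callable are hypothetical, and the real program may do this differently (for example, inside the Bash script).

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def fetch_once(name, cache_dir, download):
    """Download a file only if it is not already in the cache directory."""
    target = Path(cache_dir) / name
    if not target.exists():
        target.write_bytes(download(name))
    return target


def fetch_all(names, cache_dir, download, workers=4):
    """Fetch distinct files in parallel; results keep the order of names."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: fetch_once(n, cache_dir, download), names))


# Stand-in for a real network download, so the sketch runs offline.
calls = []


def fake_download(name):
    calls.append(name)
    return name.encode()


with tempfile.TemporaryDirectory() as cache:
    names = ["traces-a.txt", "traces-b.txt", "traces-c.txt"]
    first = fetch_all(names, cache, fake_download)
    second = fetch_all(names, cache, fake_download)  # served from the cache
    downloads = len(calls)
    cached_names = [path.name for path in second]

print(downloads, "downloads for two runs")
```

Threads fit here because downloading is I/O-bound; CPU-heavy graph algorithms would more likely call for separate processes.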
Motivation and End Goal
The end goal of this project is to turn our program into a publicly available research tool. We intend to have this program run on its own a few times a week, generating network models using the newest possible data. By providing a resource key to our field of work, we hope to make research easier for everyone, not just researchers at Georgetown.
Summary
• Internet maps are vital for computer network research
• The decentralized architecture of the Internet makes maps difficult to construct
• We construct maps by aggregating and linking together millions of independent Internet measurements
• Our Internet maps will be frequently updated and released to CS researchers on a regular basis
Questions?
Acknowledgements
Henry Tan
Professor Sherr
Drs. Maria Donoghue and Thomas Coate
Georgetown University
The Howard Hughes Medical Institute
The HHMI Scholars
Traceroutes Explained
Source -> Router -> Router -> Router -> Destination
Traceroute from source to destination:
150.183.95.1 0.5ms
134.75.23.1 1ms
203.234.255.165 1.5ms
112.174.83.37 2ms
• Each packet of data has a time-to-live (TTL) value. TTL is the number of hops a packet can travel; each hop decrements the TTL. The packet dies when the TTL reaches zero, and the router where that happens sends some data back to the source.
• That data includes the IP address of the router where the TTL reached zero, among other error information, and lets the source time the round trip to that hop.
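Traceroute exploits this mechanism by sending probes with a TTL of 1, 2, 3, and so on, so that each router along the path reveals itself in turn. The sketch below simulates that loop in memory, with no real packets, using the hop addresses from the slide. The function names are hypothetical, and the model is simplified: every router answers, and the destination always replies when a probe reaches it.

```python
def send_probe(path, ttl):
    """Simulate one probe along a non-empty path.

    Returns (ip, reached_destination): the address that answered the probe
    and whether that address was the final destination.
    """
    if not path:
        raise ValueError("path must contain at least the destination")
    for hop_number, ip in enumerate(path, start=1):
        ttl -= 1  # each hop decrements the TTL
        if hop_number == len(path):
            return ip, True   # the destination itself replies
        if ttl == 0:
            return ip, False  # packet dies here; this router reports back
    raise AssertionError("unreachable: the loop always returns")


def traceroute(path, max_ttl=30):
    """Discover the path hop by hop by raising the TTL one step at a time."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        ip, done = send_probe(path, ttl)
        hops.append(ip)
        if done:
            break
    return hops


# Hop addresses taken from the slide's example output.
PATH = ["150.183.95.1", "134.75.23.1", "203.234.255.165", "112.174.83.37"]
hops = traceroute(PATH)
for number, ip in enumerate(hops, start=1):
    print(number, ip)
```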