University of Nevada, Reno
Resolving Anonymous Routers
Hakan KARDES
CS 790g: Complex Networks
Outline
• Introduction
• Anonymous router resolution
  – Problem
  – Previous approaches
• Anonymity types
• Anonymity resolution via graph-based induction (GBI)
• Conclusions
Internet Topology Measurement
• Internet topology measurement studies
  – Involve topology collection / construction / analysis
• Current state of the research activities
  – Distributed topology data collection studies/platforms
    – Skitter, AMP, iPlane, Dimes, DipZoom, …
  – 20M path traces with over 20M nodes
• Issues in topology construction
  1. Verifying accuracy of path traces
  2. IP alias resolution
  3. Subnet inference
  4. Anonymous router resolution
Topology Collection (traceroute)
• Probe packets are carefully constructed to elicit the intended response from a probe destination
• traceroute probes all nodes on a path towards a given destination
  – TTL-scoped probes obtain ICMP error messages from routers on the path
  – Each ICMP message carries the IP address of the intermediate router as its source
• Merging end-to-end path traces yields the network map
Internet Topology Discovery 4
[Figure: traceroute from vantage point S toward destination D through routers A, B and C; probes sent with TTL=1, 2, 3 and 4 return IP_A, IP_B, IP_C and IP_D]
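The probing loop and the merge step can be sketched as a toy Python model. This is not a real traceroute. `probe`, `traceroute` and `merge_traces` are hypothetical names. A path is given as the list of hops after the vantage point, and every hop is assumed to answer, so there are no anonymous routers yet. The second path S, A, E, D is made up for illustration.

```python
from typing import List, Set, Tuple

Link = Tuple[str, str]

def probe(path: List[str], ttl: int) -> str:
    """Return the source address of the reply to a probe sent with this TTL.

    The hop where the TTL reaches zero answers with an ICMP error whose source
    is its own address; a probe that reaches the destination is answered by it.
    """
    return path[min(ttl, len(path)) - 1]

def traceroute(vantage: str, path: List[str], max_ttl: int = 30) -> List[str]:
    """Send TTL-scoped probes with TTL = 1, 2, ... until the destination replies."""
    trace = [vantage]
    for ttl in range(1, max_ttl + 1):
        reply = probe(path, ttl)
        trace.append(reply)
        if reply == path[-1]:
            break
    return trace

def merge_traces(traces: List[List[str]]) -> Tuple[Set[str], Set[Link]]:
    """Merge end-to-end path traces into one map of nodes and undirected links."""
    nodes: Set[str] = set()
    links: Set[Link] = set()
    for trace in traces:
        nodes.update(trace)
        for u, v in zip(trace, trace[1:]):
            links.add((min(u, v), max(u, v)))  # store each undirected link once
    return nodes, links

t1 = traceroute("S", ["A", "B", "C", "D"])
t2 = traceroute("S", ["A", "E", "D"])
nodes, links = merge_traces([t1, t2])
print(t1)                       # ['S', 'A', 'B', 'C', 'D']
print(len(nodes), len(links))   # 6 6
```

The shared hop A appears only once in the merged map. This deduplication is what breaks down when a router answers with "*" instead of an address.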
Problem
• Anonymous routers do not respond to traceroute probes and appear as * in traceroute output
  – The same router may appear as * in multiple traces
Example (router L is anonymous):
  y: S – L – H – x
  x: H – L – S – y
  y: S – * – H – x
  x: H – * – S – y
[Figure: the true path y – S – L – H – x, and the inferred topology in which L appears as two distinct anonymous nodes *1 and *2 between S and H]
• Current daily raw topology data sets include
  – ~20 million path traces with
  – ~20 million occurrences of * along with
  – ~500K public IP addresses
• The raw topology data is far from representing the underlying sampled network topology
Problem
Internet2 backbone
Traces
• x - H - L - S - y
• x - H - A - W - N - z
• y - S - L - H - x
• y - S - U - K - C - N - z
• z - N - C - K - H - x
• z - N - C - K - U - S - y
[Figure: Internet2 backbone with routers S, L, U, K, C, H, A, W, N and end hosts x, y, z]
Problem
Internet2 backbone
[Figure: the same Internet2 backbone; routers that do not respond appear as * in the traces]
Traces
• x - * - L - S - y
• x - * - A - W - * - z
• y - S - L - * - x
• y - S - U - * - C - * - z
• z - * - C - * - * - x
• z - * - C - * - U - S - y
Problem
[Figure: the sampled network (routers S, L, U, K, C, H, A, W, N; end hosts d, e, f) and the resulting network built from the traces below, in which only S, U, L, C, A and W are known]
Traces
• d - * - L - S - e
• d - * - A - W - * - f
• e - S - L - * - d
• e - S - U - * - C - * - f
• f - * - C - * - * - d
• f - * - C - * - U - S - e
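The gap between the sampled and the resulting network can be reproduced by merging the traces naively. In this approach every "*" occurrence becomes its own node, because nothing in the raw data says two "*" hops are the same router. This is a minimal Python sketch, assuming the d/e/f traces of this slide with "*" for a non-responding hop. `label_anonymous` and `merge_traces` are illustrative names, not part of any published tool.

```python
from typing import FrozenSet, List, Set, Tuple

TRACES = [t.split() for t in [
    "d * L S e",
    "d * A W * f",
    "e S L * d",
    "e S U * C * f",
    "f * C * * d",
    "f * C * U S e",
]]

def label_anonymous(traces: List[List[str]]) -> List[List[str]]:
    """Replace each '*' with a fresh label (*1, *2, ...), as a naive merge would."""
    labeled, count = [], 0
    for trace in traces:
        out = []
        for hop in trace:
            if hop == "*":
                count += 1
                hop = f"*{count}"
            out.append(hop)
        labeled.append(out)
    return labeled

def merge_traces(traces: List[List[str]]) -> Tuple[Set[str], Set[FrozenSet[str]]]:
    """Union of all hops (nodes) and all consecutive hop pairs (undirected links)."""
    nodes: Set[str] = set()
    links: Set[FrozenSet[str]] = set()
    for trace in traces:
        nodes.update(trace)
        for u, v in zip(trace, trace[1:]):
            links.add(frozenset((u, v)))
    return nodes, links

labeled = label_anonymous(TRACES)
nodes, links = merge_traces(labeled)
anonymous = {n for n in nodes if n.startswith("*")}
print(len(nodes), len(anonymous), len(links))  # 20 11 25
```

The naive map has 11 anonymous nodes standing in for 3 anonymous routers (H, K and N). By contrast, the same traces with H, K and N filled in merge into 12 nodes and 13 links.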
Previous Approaches
• Basic heuristics
  – IP: Combine anonymous nodes between the same known nodes [Bilir 05]
    • Limited resolution
  – NM: Combine all anonymous neighbors of a known node [Jin 06]
    • High false positives
• More theoretical approaches
  – Graph minimization approach [Yao 03]
    • Combine *s as long as they do not violate two accuracy conditions: (1) the trace preservation condition and (2) the distance preservation condition
    • High complexity: O(n^5), where n is the number of *s
  – ISOMAP-based dimensionality reduction approach [Jin 06]
    • Build an n×n distance matrix, then use ISOMAP to reduce it to an n×5 matrix
    • Distance: (1) hop count or (2) link delay
    • High complexity: O(n^3), where n is the number of nodes
[Figure: for the Internet2 example (end hosts x, y, z), the sampled network, the resulting network, and the networks obtained after resolution]
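The two basic heuristics can be contrasted on the Internet2 traces from the Problem slides. This is a minimal Python sketch, not the code of [Bilir 05] or [Jin 06]. It assumes traces are lists of hop names with "*" for a non-responding hop. It reads "between the same known nodes" as the same unordered pair of known neighbors. `UnionFind`, `ip_heuristic` and `nm_heuristic` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Internet2 traces in which routers H, K and N do not respond.
TRACES = [t.split() for t in [
    "x * L S y", "x * A W * z", "y S L * x",
    "y S U * C * z", "z * C * * x", "z * C * U S y",
]]

def label_anonymous(traces: List[List[str]]) -> List[List[str]]:
    """Give each '*' occurrence its own label (*1, *2, ...)."""
    labeled, count = [], 0
    for trace in traces:
        out = []
        for hop in trace:
            if hop == "*":
                count += 1
                hop = f"*{count}"
            out.append(hop)
        labeled.append(out)
    return labeled

def is_anon(hop: str) -> bool:
    return hop.startswith("*")

class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def add(self, x: str) -> None:
        self.parent.setdefault(x, x)

    def find(self, x: str) -> str:
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)

    def groups(self) -> List[Set[str]]:
        out: Dict[str, Set[str]] = defaultdict(set)
        for x in self.parent:
            out[self.find(x)].add(x)
        return list(out.values())

def ip_heuristic(labeled: List[List[str]]) -> List[Set[str]]:
    """IP: merge anonymous nodes that sit between the same pair of known nodes."""
    uf, first_seen = UnionFind(), {}
    for trace in labeled:
        for hop in filter(is_anon, trace):
            uf.add(hop)
        for a, mid, c in zip(trace, trace[1:], trace[2:]):
            if is_anon(mid) and not is_anon(a) and not is_anon(c):
                key = frozenset((a, c))  # same pair regardless of direction
                if key in first_seen:
                    uf.union(mid, first_seen[key])
                else:
                    first_seen[key] = mid
    return uf.groups()

def nm_heuristic(labeled: List[List[str]]) -> List[Set[str]]:
    """NM: merge all anonymous neighbors of each known node, with no checks."""
    uf = UnionFind()
    anon_nbrs: Dict[str, List[str]] = defaultdict(list)
    for trace in labeled:
        for hop in filter(is_anon, trace):
            uf.add(hop)
        for u, v in zip(trace, trace[1:]):
            for known, other in ((u, v), (v, u)):
                if not is_anon(known) and is_anon(other):
                    anon_nbrs[known].append(other)
    for nbrs in anon_nbrs.values():
        for other in nbrs[1:]:
            uf.union(nbrs[0], other)
    return uf.groups()

labeled = label_anonymous(TRACES)
print(len(ip_heuristic(labeled)), len(nm_heuristic(labeled)))  # 7 2
```

On these traces, IP leaves 7 anonymous nodes for the 3 actual anonymous routers, which illustrates its limited resolution. NM returns only 2 because router C has both K and N as anonymous neighbors and NM merges them. That is a false positive: it even places the merged node twice in the trace y - S - U - * - C - * - z.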
Anonymity Types
• Type 1: Do not send any ICMP responses
• Type 2: Rate-limit ICMP responses
• Type 3: Do not send ICMP responses when congested
• Type 4: ICMP responses filtered at border routers
• Type 5: ICMP responses with a private source IP address
Graph-Based Induction (GBI): Our Approach
• Graph-based induction
  – A graph data mining technique
    • Finds frequent substructures in graph data
    • Commonly used in mining biological and chemical graph data
• Use of GBI for anonymous router resolution
  – Observe common graph structures due to anonymous routers
  – Develop localized algorithms with manageable computational and storage overhead
  – Trace preservation condition
    • Merge anonymous nodes as long as they cause no loops in path traces
Common Structures
[Figure: four common structures formed by anonymous nodes, each shown before and after merging: parallel *-substring, star, complete bipartite, and clique]
Parallel *-substring
• Algorithm
  – For each *-substring (a, *i, c), represent it as a tuple (a||c, *i)
  – a||c is the tuple identifier, with a < c
  – Read the path traces and build the sorted list L of these two-tuples
  – Compare each subsequently read tuple with those in L by tuple identifier; duplicates are excluded from L
• Handling anonymity due to ICMP rate limiting or congestion
  – A second scan of the path traces looks for substrings of the form (a, b, c) that correspond to an (a, *i, c) in L
[Figure: an anonymous node between a and c resolved to the known node b]
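The two scans can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. A dict keyed by the ordered pair (a, c) with a < c stands in for the sorted list L. A duplicate tuple's *i is mapped to the *i already kept in L. In the second scan, a known hop b seen as (a, b, c) is assumed to be the router that stayed silent in the matching *-substrings. The trace preservation check is omitted. `resolve_parallel` and its helpers are hypothetical names.

```python
from typing import Dict, List, Tuple

def label_anonymous(traces: List[List[str]]) -> List[List[str]]:
    """Give each '*' occurrence its own label (*1, *2, ...)."""
    labeled, count = [], 0
    for trace in traces:
        out = []
        for hop in trace:
            if hop == "*":
                count += 1
                hop = f"*{count}"
            out.append(hop)
        labeled.append(out)
    return labeled

def is_anon(hop: str) -> bool:
    return hop.startswith("*")

def triples(trace: List[str]):
    """All consecutive (a, middle, c) windows of a trace."""
    return zip(trace, trace[1:], trace[2:])

def resolve_parallel(traces: List[List[str]]) -> Dict[str, str]:
    labeled = label_anonymous(traces)
    L: Dict[Tuple[str, str], str] = {}  # identifier (a, c) with a < c -> kept *i
    resolved: Dict[str, str] = {}       # each *i -> the label it is merged into
    # First scan: a *-substring whose identifier is already in L is a duplicate;
    # its *i maps to the *i kept in L instead of being added again.
    for trace in labeled:
        for a, mid, c in triples(trace):
            if is_anon(mid) and not is_anon(a) and not is_anon(c):
                key = (a, c) if a < c else (c, a)
                resolved[mid] = L.setdefault(key, mid)
    # Second scan (rate limiting / congestion): a fully known (a, b, c) whose
    # identifier is in L resolves that group of anonymous nodes to b.
    for trace in labeled:
        for a, b, c in triples(trace):
            if not (is_anon(a) or is_anon(b) or is_anon(c)):
                key = (a, c) if a < c else (c, a)
                if key in L:
                    rep = L[key]
                    for anon in [k for k, v in resolved.items() if v == rep]:
                        resolved[anon] = b
    return resolved

print(resolve_parallel([["a", "*", "c", "d"], ["d", "c", "*", "a"]]))
print(resolve_parallel([["a", "*", "c", "d"], ["d", "c", "*", "a"], ["a", "b", "c"]]))
```

In the first call the two anonymous hops are merged because they share the identifier a||c. In the second call, a trace in which b answered between a and c resolves both of them to b.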
Clique
• Generate a new graph G* = (V*, E*)
  – For each *-substring of type (a, *, b):
    • V* ← V* ∪ {a, b}
    • E* ← E* ∪ {e(a,b)}
• First identify 4-cliques and grow them by adding nodes that are connected to at least 4 nodes of the structure
  – Helps in tolerating a few missing links in large cliques
• Then, process all 3-cliques
[Figure: clique example on nodes a, c, d and e]
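Building G* and the clique search can be sketched in Python. This is a brute-force sketch for small graphs, not the authors' localized algorithm. It enumerates node combinations with `itertools.combinations`. Each node is assigned to at most one clique. The growth threshold of 4 follows the slide. Reading each clique as a single anonymous router shared by its members is an assumption here. `build_g_star` and `find_cliques` are hypothetical names, and the edge list is made up.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Set[str]]

def build_g_star(traces: List[List[str]]) -> Graph:
    """G*: link known nodes a and b whenever a trace contains (a, *, b)."""
    adj: Graph = {}
    for trace in traces:
        for a, mid, b in zip(trace, trace[1:], trace[2:]):
            if mid == "*" and "*" not in (a, b) and a != b:
                adj.setdefault(a, set()).add(b)
                adj.setdefault(b, set()).add(a)
    return adj

def is_clique(adj: Graph, nodes: Iterable[str]) -> bool:
    return all(v in adj[u] for u, v in combinations(nodes, 2))

def find_cliques(adj: Graph, min_links: int = 4) -> List[Set[str]]:
    """Find 4-cliques and grow them, then take the remaining 3-cliques."""
    used: Set[str] = set()
    found: List[Set[str]] = []
    for size in (4, 3):
        for combo in combinations(sorted(adj), size):
            if used.intersection(combo) or not is_clique(adj, combo):
                continue
            group = set(combo)
            if size == 4:
                # Grow: add any node linked to at least `min_links` members,
                # so a few missing links do not split a large clique.
                grown = True
                while grown:
                    grown = False
                    for v in sorted(adj):
                        if v not in group and v not in used and len(adj[v] & group) >= min_links:
                            group.add(v)
                            grown = True
            used |= group
            found.append(group)
    return found

edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("e", "a"), ("e", "b"), ("e", "c"), ("e", "d"),
         ("f", "a"), ("f", "b"), ("f", "c"), ("f", "e"),
         ("p", "q"), ("q", "r"), ("p", "r")]
traces = [[u, "*", v] for u, v in edges]  # one *-substring per G* edge
cliques = find_cliques(build_g_star(traces))
print(cliques)
```

Node f is not linked to d but still joins the first clique because it has 4 links into the structure. The triangle p, q, r is only picked up in the 3-clique pass.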
Complete Bipartite
• First search G* for a small complete bipartite structure, i.e., K2,3, then grow it to a larger one
  – Take each pair of nodes and check whether they are part of a K2,3
  – After identifying a K2,3, look for larger complete bipartite graphs K2,m and then Kn,m that contain it
• Then, process all K2,2's
[Figure: a complete bipartite structure on nodes A, C, D, E and F, shown in G and in G*]
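The pair-based search can be sketched in Python. This sketch assumes G* is already available as an adjacency dict, built here from a made-up edge list. It grows a seed only on the pair's side (K2,m to Kn,m) and does not grow K2,2's. Each node is used at most once. `graph_from_edges` and `find_bipartite` are hypothetical names, not the authors' code.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]

def graph_from_edges(edges) -> Graph:
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def find_bipartite(adj: Graph) -> List[Tuple[Set[str], Set[str]]]:
    """Seed from K2,3's and grow them to Kn,m, then take the remaining K2,2's."""
    used: Set[str] = set()
    found: List[Tuple[Set[str], Set[str]]] = []
    for min_right in (3, 2):  # K2,3 seeds first, then K2,2
        for u, v in combinations(sorted(adj), 2):
            if u in used or v in used:
                continue
            # Common neighbors of the pair give the largest K2,m containing it.
            right = (adj[u] & adj[v]) - used
            if len(right) < min_right:
                continue
            left = {u, v}
            if min_right == 3:
                # Grow K2,m to Kn,m: add every node adjacent to all of `right`.
                for w in sorted(adj):
                    if w not in left and w not in right and w not in used and right <= adj[w]:
                        left.add(w)
            used |= left | right
            found.append((left, right))
    return found

edges = [(l, r) for l in "ABC" for r in "DEF"] + [(l, r) for l in "PQ" for r in "RS"]
structures = find_bipartite(graph_from_edges(edges))
print(structures)
```

The pair (A, B) seeds a K2,3 with D, E and F, and C is then added to give a K3,3. The pair (P, Q) has only two common neighbors, so it is reported in the K2,2 pass.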
Star
• Combine the anonymous neighbors of a known node under the trace preservation condition
  – Start from the known nodes with the smallest number of anonymous neighbors
[Figure: star example around known node A with anonymous neighbors leading to C, D and E, before and after merging]
Note: Operate on G and not on G*
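The star step works on G and can be sketched in Python with a union-find over the labeled "*" occurrences. This is a sketch under stated assumptions. Hubs are ordered by their anonymous-neighbor counts in the original G, not recomputed after merges. Neighbors are merged greedily, first fit. The trace preservation condition is checked only as "the merged node must not occur twice in any trace". `star_merge` and its helpers are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set

def label_anonymous(traces: List[List[str]]) -> List[List[str]]:
    """Give each '*' occurrence its own label (*1, *2, ...)."""
    labeled, count = [], 0
    for trace in traces:
        out = []
        for hop in trace:
            if hop == "*":
                count += 1
                hop = f"*{count}"
            out.append(hop)
        labeled.append(out)
    return labeled

def is_anon(hop: str) -> bool:
    return hop.startswith("*")

def star_merge(traces: List[List[str]]) -> Dict[str, str]:
    labeled = label_anonymous(traces)
    parent: Dict[str, str] = {}
    in_traces: Dict[str, Set[int]] = {}  # root -> indices of traces holding a member
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for idx, trace in enumerate(labeled):
        for hop in trace:
            if is_anon(hop):
                parent[hop] = hop
                in_traces[hop] = {idx}
        for u, v in zip(trace, trace[1:]):
            neighbors[u].add(v)
            neighbors[v].add(u)

    def find(x: str) -> str:
        while parent[x] != x:
            x = parent[x]
        return x

    def anon_degree(n: str) -> int:
        return sum(1 for m in neighbors[n] if is_anon(m))

    known = [n for n in neighbors if not is_anon(n)]
    # Hubs with the fewest anonymous neighbors are processed first.
    for hub in sorted(known, key=lambda n: (anon_degree(n), n)):
        groups: List[str] = []  # roots of merged anonymous nodes around this hub
        for m in sorted(x for x in neighbors[hub] if is_anon(x)):
            r = find(m)
            if r in groups:
                continue
            for g in groups:
                # Trace preservation: never put one node twice into a trace.
                if not in_traces[g] & in_traces[r]:
                    parent[r] = g
                    in_traces[g] |= in_traces.pop(r)
                    break
            else:
                groups.append(r)
    return {a: find(a) for a in parent}

print(star_merge([["y", "S", "*", "H", "x"], ["x", "H", "*", "S", "y"]]))
print(star_merge([["a", "*", "b", "*", "c"]]))
```

In the first call, the router L from the Problem example, seen as *1 and *2, is merged back into one node. In the second call, both anonymous hops neighbor b. They stay separate because merging them would create a loop in the single trace.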
Summary
[Figure: the same region (nodes A, C, D, E) shown as the underlying topology, the collected topology, and the topologies after resolution by neighbor matching and by GBI]
• Router responsiveness to traceroute probes has decreased over the last decade
• Anonymous router resolution is an NP-hard problem
• Graph-based induction technique
  – Practical approach for anonymous router resolution
  – Identifies common structures
  – Handles all anonymity types
  – Helpful in resolving multiple anonymous routers in a locality
  – Uses subnet information to reduce false positives
References
• M. H. Gunes and K. Sarac. Resolving anonymous routers in Internet topology measurement studies. In IEEE INFOCOM, April 2008.
• S. Bilir, K. Sarac, and T. Korkmaz. Intersection characteristics of end-to-end Internet paths and trees. In IEEE International Conference on Network Protocols (ICNP), Boston, MA, USA, November 2005.
• A. Broido and K. Claffy. Internet topology: Connectivity of IP graphs. In Proceedings of SPIE ITCom Conference, Denver, CO, USA, August 2001.
• B. Cheswick, H. Burch, and S. Branigan. Mapping and visualizing the Internet. In ACM USENIX, San Diego, CA, USA, June 2000.
• B. Yao, R. Viswanathan, F. Chang, and D. Waddington. Topology inference in the presence of anonymous routers. In IEEE INFOCOM, San Francisco, CA, USA, March 2003.
• P. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Addison-Wesley, Reading, MA, USA, 2005.
• X. Jin, W.-P. K. Yiu, S.-H. G. Chan, and Y. Wang. Network topology inference based on end-to-end measurements. IEEE Journal on Selected Areas in Communications, special issue on Sampling the Internet, 24(12):2182–2195, December 2006.
• D. Cook and L. Holder. Mining Graph Data. John Wiley & Sons, 2006.
• T. Matsuda, H. Motoda, and T. Washio. Graph-based induction and its applications. Advanced Engineering Informatics, 16(2):135–143, April 2002.
• M. Kuramochi and G. Karypis. Frequent subgraph discovery. In First IEEE International Conference on Data Mining (ICDM'01), p. 313, 2001.
• M. Kuramochi and G. Karypis. An efficient algorithm for discovering frequent subgraphs. IEEE Transactions on Knowledge and Data Engineering, 16(9):1038–1051, September 2004.
• A. Inokuchi, T. Washio, and H. Motoda. Complete mining of frequent patterns from graphs: Mining graph data. Machine Learning, 50(3):321–354, March 2003. DOI: http://dx.doi.org/10.1023/A:1021726221443
• A. Inokuchi, T. Washio, and H. Motoda. A general framework for mining frequent subgraphs from labeled graphs. Fundamenta Informaticae, 66(1-2):53–82, November 2004.
QUESTIONS