Upload
oswald-shelton
View
218
Download
0
Embed Size (px)
DESCRIPTION
Definitions Virtual path: network level abstraction of “direct link” between two hosts. At the network layer, it is realized by a single route. Autonomous system (AS): collection of routers and hosts controlled by a single administrative entity.
Citation preview
End-to-End Routing Behavior in the Internet
Vern Paxson
Presented by Sankalp Kohli and Patrick Wong
Idea Previous studies of routing protocols explain
routing behavior only qualitatively Use end-to-end measurement to determine:
Route pathologies Route stability Route symmetry
Definitions Virtual path: network level abstraction of
“direct link” between two hosts. At the network layer, it is realized by a single route.
Autonomous system (AS): collection of routers and hosts controlled by a single administrative entity.
Routing Protocols Interior Gateway Protocol (IGP): routing
protocol for entities within the same AS. Border Gateway Protocol (BGP): for
inter-AS routing. Each AS keeps a routing table with reachable hosts and corresponding costs. Upon detected changes, only affected part of routing table is shared.
Methodology Run Network Probes Daemon (NPD) on a
number of Internet sites (37)
Methodology Key property (N2 scale)
Use N sites to measure N2 Internet paths Each NPD site periodically measure the route to
another NPD site, by using traceroute
Methodology Two sets of experiments
D1 – measure each virtual path between two NPD’s with a mean interval of 1-2 days, Nov-Dec 1994 – maintains the average load of one measurement per two hours
D2 – measure each virtual path using a bimodal distribution inter-measurement interval, Nov-Dec 1995
60% with mean of 2 hours (burst) 40% with mean of 2.75 days (to measure longer term
behavior) Measurements in D2 were paired Measure A=>B and then B<= A
Methodology Links traversed during D1 and D2
Methodology Exponential sampling
Unbiased sampling – measures instantaneous signal with equal probability
PASTA principle – Poisson Arrivals See Time Averages Only 37 sites?!
Argue that sampled AS’s are on half of the Internet routes If we weight each AS by its likelihood of occurring in an AS
path, then the AS’s sampled by routes we measured comprised about half of the Internet AS’s by weight
Confidence intervals for probability that an event occurs
Limitations Just a small subset of Internet paths Just two points at a time Difficult to say why has something happened,
only with end-to-end measurements Possible fixes: something more robust than traceroute
or multiple measurement requests 5%-8% of time couldn’t connect to NPD’s
Introduces bias toward underestimation D2 Pairing helps correct the underestimation
Routing Pathologies Persistent routing loops Temporary routing loops Erroneous routing Connectivity altered mid-stream Temporary outages (> 30 sec)
Routing Loops & Erroneous Routing Routing Loops:
Forwarding Loop Information Loop Traceroute Loop
Persistent routing loops (10 in D1 and 50 in D2) Several hours long (e.g., > 10 hours) It is not confined to single router
Erroneous routing (one in D1) A route UK=>USA goes through Israel
Route Changes Connectivity change in mid-stream (10 in D1
and 155 in D2) Route changes during measurements Recovering bimodal: (1) 100’s msec to seconds; (2)
order of minutes Route fluttering
Rapid route oscillation Very little fluttering was seen and only happened
within the AS.
Example of Route Fluttering
wustl (St. Loutis) to umann(Mannheim, Germany)Solid: 17 hops, dotted: 29 hops
Problems with Fluttering Asymmetry Path properties difficult to predict
This confuses RTT estimation in TCP, may trigger false retransmission timeouts
Packet reordering TCP receiver generates DUPACK’s, may trigger
spurious fast retransmits
Infrastructure Failures “host unreachable” from router well inside
the network. 0.21% in D1, estimate availability rate
99.8%. This dropped to 99.5% in D2.
NPD’s unreachable due to many hops (6 in D2) Unreachable more than 30 hops Path length not necessary correlated with
distance 1500 km end-to-end route of 3 hops 3 km (MIT – Harvard) end-to-end route of 11
hops
Temporary Outages
Sequence of traceroute packets lost due to temporary loss of connectivity or heavy congestion.
In D1(D2), 55% (43%) had 0 losses, 44% (55%) had 1 to 5 losses, and 0.96% (2.2%) had 6 or more.
Distribution of Long Outages (>30 sec )
Time-of-Day patterns
Temporary outages: min (0.4%) occurred during the 1:00-2:00 h, max (8.0%) during the 15:00-16:00 h.
Infrastructure failures: min (1.2%) at 9:00-10:00 h, peak during 15:00-16:00 h.
Pathology Summary
Routing Stability Two definitions of stability:
Prevalence: likelihood to observe a particular route
Steady state probability that a virtual path at an arbitrary point in time uses a particular route
Persistence: how long a route remains unchanged
Affects utility of storing state in routers
Routing Stability Routing Prevalence
Let r be the steady-state probability that a VP uses route r at an arbitrary time, and k the number of times we observe the route.
Due to PASTA, an unbiased estimator of r can be computed as
The prevalence of the dominant route is analyzed.
nk
rr
Routing Prevalence
In general, Internet paths are strongly dominated by a single route, especially if observed at higher granularity.
Routing Persistence The notion of persistence depends on what is
deemed persistent. A series of measurements are undertaken to
classify routes according to their alternation frequency.
Conclusion: routing changes occur over a wide range of time scales, i.e., from minutes to days
Routing Symmetry
Sources of Routing Asymmetry Link cost metrics “hot potato” routing problem due to the
competing providers. “cold potato”
Routing Symmetry
Analysis of Routing Symmetry Measurements were paired to ensure that an
asymmetry is actually being captured. Asymmetry is quite common (49% on a city
granularity, 30% AS granularity). Size of Asymmetries
Majority confined to one hop (one city or AS)
Summary Pathologies doubled during 1995 Asymmetry is quite common Paths heavily dominated by a single route Over 2/3 of Internet paths are reasonable
stable (> days). The other 1/3 varies over many time scales