On the Power of Off-line Data in Approximating Internet Distances
Danny Raz
Technion - Israel Institute of Technology
and
Prasun Sinha
Bell Labs, Lucent Technologies
Outline
• Internet Distance
• Off-line metrics
– Geographic distance, #hops, #AS, depth
• Linear regression for Internet distance estimation
• Multi-variable linear regression
• Accuracy of picking closest mirror site
• The next step
Internet Distance
• Internet Distance: one-way delay between hosts
• Components of Internet Distance
– Dynamic
  • Server load
  • Network congestion / router load
– Static
  • Propagation delay over the links
  • Router processing delay
  • Edge-router processing delay
Goal: To study the power of estimating the Static Internet Distance using off-line metrics
Importance of Internet Distance Estimation
• Picking closest mirror-site/cache
• For use in Content Distribution Networks
Approaches
• Dynamic
– Dynamic probing [Dykes et al., Infocom '00]
– Passive monitoring [Andrews et al., Infocom '02]
• Static
– Semi-active probing (IDMAPs) [Jamin et al., Infocom '00]
• Other relevant work:
– Geographic distance and RTT [Padmanabhan, Sigcomm '02]
Static Internet Distance
• Propagation delay: geographical distance
• Router processing delay: # hops
• Edge-router processing delay: # AS
Static Internet Distance = geo-distance + hop-count + AS-count ?
[Figure: AS #1, AS #2, AS #3 (AS: Autonomous System); legend: Core Router, Edge Router]
Data Collection
• Clients: 2500 public libraries in the US
• Servers (mirrors/caches): 8 traceroute locations in the US
• The location (latitude, longitude) is known for every host
• For every client-server pair
– Run multiple (10) traceroutes
– Pick the traceroute result with the smallest RTT
– Compute
  • Geo-distance: based on latitude and longitude
  • Hop-count: from traceroute
  • AS-count: from traceroute, based on router names and IP address prefixes
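The slides do not say how geo-distance is derived from latitude and longitude. A minimal sketch, assuming the great-circle (haversine) distance on a spherical Earth; the function names, the kilometre unit, and the radius constant are illustrative assumptions, not taken from the slides:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def geo_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def min_rtt(rtts_ms):
    """Smallest RTT among repeated traceroutes (the slides use 10 runs)."""
    return min(rtts_ms)
```

Taking the minimum over repeated runs is a common way to suppress the dynamic (load and congestion) part of the delay, which matches the stated goal of estimating the static component.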
Linear Regression (Geo-distance and Hop-count)
minRTT vs. Geo-distance: SE (Std. Error) = 26.93
minRTT vs. Hop-count: SE (Std. Error) = 25.71
Multiple Linear Regression (Multiple metrics)
minRTT vs. Geo-distance, Hop-count
SE = 21.52
minRTT vs. Geo-distance, AS-count
SE = 23.80
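The regressions above can be sketched as ordinary least squares with an intercept. This is an illustration only: the slides do not state which tool or which exact definition of SE was used, so the residual standard error sqrt(SSR / (n - p)) below is an assumption, and `fit_linear` is a hypothetical helper name.

```python
import math

def fit_linear(rows, y):
    """Least-squares fit of y on the predictors in rows, with an intercept.

    rows: list of predictor tuples, e.g. (geo_distance, hop_count)
    y:    list of measured minRTT values
    Returns (beta, se); beta[0] is the intercept.
    Needs more observations than coefficients and non-collinear predictors.
    """
    X = [[1.0, *r] for r in rows]
    n, p = len(X), len(X[0])
    # Augmented normal equations: (X^T X | X^T y)
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    residuals = [y[k] - sum(b * x for b, x in zip(beta, X[k])) for k in range(n)]
    # Residual standard error (assumed definition of "SE")
    se = math.sqrt(sum(e * e for e in residuals) / (n - p))
    return beta, se
```

Passing one-element tuples gives the single-metric regression; passing (geo-distance, hop-count) or (geo-distance, AS-count) tuples gives the two-metric fits listed above.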
minRTT = geo-distance + hop-count + AS-count ?
Term           Coefficient   p-value
Geo-distance   12.53         <0.0001
Hop-count      2.45          <0.0001
AS-count       -0.64         0.0387
• High correlation between hop-count and AS-count (the highest of any pair of metrics)
• Hop-count and AS-count should not be used together
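One way to check for this kind of overlap between metrics is to compute the pairwise Pearson correlation and look for the strongest pair. The sketch below is an assumption about how such a check could be done; the slides do not say which correlation measure was used, and the helper names are hypothetical.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def most_correlated_pair(metrics):
    """metrics: dict mapping metric name -> list of values per client-server pair.
    Returns the pair of names with the largest absolute correlation."""
    return max(combinations(metrics, 2),
               key=lambda pair: abs(pearson(metrics[pair[0]], metrics[pair[1]])))
```

When two predictors are strongly correlated, their individual regression coefficients become unstable, which is consistent with the small negative AS-count coefficient in the table above.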
A new Off-line metric: Depth
• Hop-count: requires dynamic probing
• Introduce an alternative metric: Depth
– Average hop-count to the nearest backbone network (from a hand-made list of 30 big core networks)
– Constant per host (client/server)
– Alternatively, measure in units of time rather than hops
– (Client depth + Server depth) as a metric
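The Depth definition above can be sketched as follows. The data layout is an assumption made for illustration: each traceroute is represented as a list of network names, one per hop, and the backbone list is a set of such names. Counting hops from 1 and skipping traces that never reach a backbone are also assumptions, not details given in the slides.

```python
def hops_to_backbone(path, backbone):
    """Hops until the first router in a backbone network (1-based), or None."""
    for i, network in enumerate(path, start=1):
        if network in backbone:
            return i
    return None

def depth(paths, backbone):
    """Average hop-count to the nearest backbone over a host's traceroutes."""
    counts = [h for h in (hops_to_backbone(p, backbone) for p in paths)
              if h is not None]
    return sum(counts) / len(counts) if counts else None

def pair_depth(client_depth, server_depth):
    """Combined metric for a client-server pair: client depth + server depth."""
    return client_depth + server_depth
```

Because each host's depth is computed once and reused for every pair, the combined metric needs no probing between a specific client and server.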
Linear Regression (Depth)
minRTT vs. Depth: SE = 41.02
minRTT vs. Depth and Geo-distance: SE = 24.52
Standard Errors in Estimating minRTT
Metric                    SE (Standard Error)
Geo-distance, Hop-count   21.52
Geo-distance, AS-count    23.80
Geo-distance, Depth       24.52
Hop-count                 25.71
Geo-distance              26.93
Depth                     41.02
Accuracy of picking the nearest mirror site
Allowed Delta   Random   Geo-distance   Hop-count   Geo-distance, Hop-count   Geo-distance, Depth
0               12.50%   37.84%         44.32%      38.41%                    33.98%
10ms            21.15%   53.07%         58.98%      55.91%                    50.45%
20ms            33.75%   73.18%         76.70%      74.89%                    70.91%
30ms            46.25%   90.91%         88.75%      91.36%                    89.43%
880 clients and 8 servers
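The accuracy figures can be read as follows, under an assumed interpretation of "Allowed Delta": a selection counts as correct when the measured minRTT of the chosen server is within the allowed delta of the best measured minRTT for that client. The slides do not spell out this definition, so the sketch below is illustrative and the function name is hypothetical.

```python
def selection_accuracy(predicted, measured, allowed_delta_ms=0.0):
    """Fraction of clients for which the predicted-closest server is acceptable.

    predicted: per client, a list of estimated distances to each server
    measured:  per client, a list of measured minRTTs (ms) to each server
    """
    hits = 0
    for pred, rtt in zip(predicted, measured):
        # Server the off-line metric would pick
        chosen = min(range(len(pred)), key=pred.__getitem__)
        # Accept it if it is within the allowed delta of the true best
        if rtt[chosen] <= min(rtt) + allowed_delta_ms:
            hits += 1
    return hits / len(predicted)
```

With 8 servers and a delta of 0, a uniformly random pick is correct 1/8 of the time, which agrees with the 12.50% in the Random column.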
Summary
• Combination of hop-count and geographic distance improves over individual metrics
• Using Depth along with Geo-distance improves performance and is completely off-line
• For closest mirror selection with 30 ms allowed deviation, almost any metric gives about 90% accuracy
Is there much space to improve?
The Next Step
• Global data
– Collection and analysis of data based on clients and servers spread across the globe
• Using both off-line and on-line
– Techniques to combine the power of off-line estimation with on-line estimation