On the Power of Off-line Data in Approximating Internet Distances Danny Raz ([email protected] ) Technion - Israel Institute of Technology and Prasun Sinha ([email protected] ) Bell Labs., Lucent Technologies


Page 1: On the Power of Off-line Data in Approximating Internet Distances

On the Power of Off-line Data in Approximating Internet Distances

Danny Raz ([email protected])

Technion - Israel Institute of Technology

and

Prasun Sinha ([email protected])

Bell Labs., Lucent Technologies

Page 2: On the Power of Off-line Data in Approximating Internet Distances

Outline

• Internet Distance
• Off-line metrics
  – Geographic distance, # hops, # AS, depth
• Linear regression for Internet distance estimation
• Multi-variable linear regression
• Accuracy of picking the closest mirror site
• The next step

Page 3: On the Power of Off-line Data in Approximating Internet Distances

Internet Distance

• Internet Distance: one-way delay between hosts
• Components of Internet Distance
  – Dynamic
    • Server load
    • Network congestion / router load
  – Static
    • Propagation delay over the links
    • Router processing delay
    • Edge-router processing delay

Goal: To study the power of estimating the Static Internet Distance using off-line metrics

Page 4: On the Power of Off-line Data in Approximating Internet Distances

Importance of Internet Distance Estimation

• Picking the closest mirror site/cache
• For use in Content Distribution Networks

Page 5: On the Power of Off-line Data in Approximating Internet Distances

Approaches

• Dynamic
  – Dynamic probing [Dykes et al., Infocom '00]
  – Passive monitoring [Andrews et al., Infocom '02]
• Static
  – Semi-active probing (IDMAPs) [Jamin et al., Infocom '00]
• Other relevant work:
  – Geographic distance and RTT [Padmanabhan, Sigcomm '02]

Page 6: On the Power of Off-line Data in Approximating Internet Distances

Static Internet Distance

• Propagation delay: geographical distance

• Router processing delay: # hops

• Edge-router processing delay: # AS

Static Internet Distance = geo-distance + hop-count + AS-count ?

[Figure: three autonomous systems (AS #1, AS #2, AS #3) with core routers and edge routers labeled. AS: Autonomous System]

Page 7: On the Power of Off-line Data in Approximating Internet Distances

Data Collection

• Clients: 2500 public libraries in the US
• Servers (mirrors/caches): 8 traceroute locations in the US
• The location (latitude, longitude) is known for every host
• For every client-server pair
  – Run multiple (10) traceroutes
  – Pick the traceroute result with the smallest RTT
  – Compute
    • Geo-distance: based on latitude and longitude
    • Hop-count: from traceroute
    • AS-count: from traceroute, based on names of routers and IP address prefixes
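The per-pair steps above can be sketched as follows. The slides do not say which distance formula was used; the haversine great-circle formula is assumed here as one common choice, and the function names and the `(rtt_ms, hop_count)` tuple layout are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant, not from the slides


def geo_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def best_traceroute(runs):
    """Return the run with the smallest RTT.

    runs: list of (rtt_ms, hop_count) tuples, one per traceroute
    (a hypothetical layout chosen for this sketch).
    """
    return min(runs, key=lambda run: run[0])
```

For a client-server pair, `best_traceroute` would be applied to the 10 runs, and its RTT used as minRTT alongside the hop-count of that same run.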

Page 8: On the Power of Off-line Data in Approximating Internet Distances

Linear Regression (Geo-distance and Hop-count)

minRTT vs. Geo-distance: SE (Std. Error) = 26.93

minRTT vs. Hop-count: SE (Std. Error) = 25.71

Page 9: On the Power of Off-line Data in Approximating Internet Distances

Multiple Linear Regression (Multiple metrics)

minRTT vs. Geo-distance, Hop-count

SE = 21.52

minRTT vs. Geo-distance, AS-count

SE = 23.80
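A minimal sketch of a multiple linear regression fit of this kind, in pure Python. The slides do not define how SE was computed; the residual standard error sqrt(SSE / (n - p)) is assumed here, with p counting the intercept. The function name and data layout are hypothetical, and the solver assumes the predictors are not perfectly collinear.

```python
import math


def fit_linear(rows, y):
    """Ordinary least squares with an intercept, via the normal equations.

    rows: list of predictor tuples, e.g. (geo_distance, hop_count)
    y:    list of observed minRTT values
    Returns (coefficients, standard_error); coefficients[0] is the intercept.
    """
    X = [[1.0, *r] for r in rows]
    n, p = len(X), len(X[0])
    # Build the augmented system [X^T X | X^T y].
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    # Residual standard error (assumed definition of SE).
    residuals = [y[k] - sum(b * x for b, x in zip(beta, X[k])) for k in range(n)]
    sse = sum(e * e for e in residuals)
    se = math.sqrt(sse / (n - p)) if n > p else float("nan")
    return beta, se
```

Passing one-element tuples gives the single-metric fits; passing `(geo_distance, hop_count)` or `(geo_distance, as_count)` tuples gives the two-metric fits compared on this slide.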

Page 10: On the Power of Off-line Data in Approximating Internet Distances

minRTT = geo-distance + hop-count + AS-count ?

Term          Coefficient  p-value
Geo-distance  12.53        <0.0001
Hop-count     2.45         <0.0001
AS-count      -0.64        0.0387

• High correlation between hop-count and AS-count (higher than for any other pair of metrics)

• Hop-count and AS-count should not be used together
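One way to check for this kind of overlap between metrics is a pairwise Pearson correlation. The sketch below assumes that measure (the slides do not name the correlation statistic used), and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def most_correlated_pair(metrics):
    """Return (name_a, name_b, r) for the pair with the largest |r|.

    metrics: dict mapping a metric name to its list of per-pair values.
    """
    scored = [(a, b, pearson(metrics[a], metrics[b]))
              for a, b in combinations(metrics, 2)]
    return max(scored, key=lambda t: abs(t[2]))
```

When two predictors are this strongly correlated, their regression coefficients become hard to interpret, which is consistent with the negative AS-count coefficient in the table above.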

Page 11: On the Power of Off-line Data in Approximating Internet Distances

A new Off-line metric: Depth

• Hop-count: requires dynamic probing
• Introduce an alternative metric: Depth
  – Average hop-count to the nearest backbone network (a hand-made list of 30 big core networks)
  – Constant per host (client/server)
  – Alternatively, measure in units of time rather than hops
  – (Client depth + Server depth) as a metric
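A rough sketch of how Depth might be computed. It assumes that "average" means averaging over several recorded paths from the same host and that each hop has already been mapped to a network name; both are assumptions, and the function names and data layout are hypothetical.

```python
def hops_to_backbone(path_networks, backbones):
    """1-based index of the first hop that lies in a backbone network.

    path_networks: network name of each hop along one path, in order.
    backbones:     set of backbone network names (the hand-made list).
    Returns None if the path never reaches a listed backbone.
    """
    for i, network in enumerate(path_networks, start=1):
        if network in backbones:
            return i
    return None


def depth(paths, backbones):
    """Average hop-count to the nearest backbone over a host's paths."""
    counts = [h for h in (hops_to_backbone(p, backbones) for p in paths)
              if h is not None]
    return sum(counts) / len(counts) if counts else None


def pair_depth(client_depth, server_depth):
    """The (Client depth + Server depth) metric for one client-server pair."""
    return client_depth + server_depth
```

Because each host's depth is computed once and reused for every pair, no per-pair probing is needed at estimation time.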

Page 12: On the Power of Off-line Data in Approximating Internet Distances

Linear Regression (Depth)

minRTT vs. Depth: SE = 41.02

minRTT vs. Depth and Geo-distance: SE = 24.52

Page 13: On the Power of Off-line Data in Approximating Internet Distances

Squared Errors in Estimating minRTT

Metric                   SE (Standard Error)
Geo-distance, Hop-count  21.52
Geo-distance, AS-count   23.80
Geo-distance, Depth      24.52
Hop-count                25.71
Geo-distance             26.93
Depth                    41.02

Page 14: On the Power of Off-line Data in Approximating Internet Distances

Accuracy of picking the nearest mirror site

Allowed Delta  Random  Geo-distance  Hop-count  Geo-distance, Hop-count  Geo-distance, Depth
0              12.50%  37.84%        44.32%     38.41%                   33.98%
10 ms          21.15%  53.07%        58.98%     55.91%                   50.45%
20 ms          33.75%  73.18%        76.70%     74.89%                   70.91%
30 ms          46.25%  90.91%        88.75%     91.36%                   89.43%

880 clients and 8 servers
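The accuracy figures can be read with the following sketch in mind. It assumes a pick counts as correct when the chosen server's actual minRTT is within the allowed delta of the best server's minRTT; the slides do not spell out this definition, and the function name and nested-dict layout are hypothetical.

```python
def selection_accuracy(estimated, actual, allowed_delta_ms=0.0):
    """Fraction of clients for which the estimated-closest server is
    within allowed_delta_ms of the truly closest one.

    estimated: dict client -> dict server -> estimated distance
    actual:    dict client -> dict server -> measured minRTT in ms
    """
    hits = 0
    for client, est in estimated.items():
        chosen = min(est, key=est.get)          # server the metric would pick
        best_rtt = min(actual[client].values())  # RTT of the true closest server
        if actual[client][chosen] - best_rtt <= allowed_delta_ms:
            hits += 1
    return hits / len(estimated)
```

Under this reading, a uniformly random pick among 8 servers is exactly right 1/8 of the time, which matches the 12.50% in the Random column at delta 0.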

Page 15: On the Power of Off-line Data in Approximating Internet Distances

Summary

• Combining hop-count and geographic distance improves on either metric alone (SE 21.52 vs. 25.71 and 26.93)

• Using Depth along with Geo-distance improves on either metric alone (SE 24.52 vs. 41.02 and 26.93) and is completely off-line

• For closest mirror selection with 30 ms allowed deviation, every metric tested gives roughly 90% accuracy (88.75% to 91.36%)

Is there much room for improvement?

Page 16: On the Power of Off-line Data in Approximating Internet Distances

The Next Step

• Global data
  – Collection and analysis of data based on clients and servers spread across the globe
• Using both off-line and on-line
  – Techniques to combine the power of off-line estimation with on-line estimation