Toward Better Geolocation: Improving Internet Distance Estimates Using Route Traces


Toward Better Geolocation: Improving Internet Distance Estimates Using Route Traces

Chandrika Jayant
Ethan Katz-Bassett

Outline

• Motivations for geolocation
• Constraint-Based Geolocation
• Problems with CBG
• Our Approach
• PlanetLab Experiments
• Conclusion / Future Work

Geolocation?

• Infer the geographic location of an Internet host
• Many applications would benefit from this information
• Advertising, EBS, location-sensitive info
• Different levels of granularity

Constraint-Based Geolocation

• Landmarks: set of hosts with known locations
• Each landmark estimates its distance to the target
• The set performs multilateration using these distances

- Gueye, Ziviani, Crovella, Fdida (2004)
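The multilateration step can be illustrated with a rough sketch. It assumes each landmark's distance estimate acts as an upper bound, so the target lies in the intersection of the landmarks' circles, and it reports the centroid of that region as the location estimate. The flat x/y coordinates in km, the grid sampling, and the names `cbg_region_centroid` and `step_km` are simplifying assumptions for illustration, not details taken from the slides.

```python
from math import hypot


def cbg_region_centroid(landmarks, step_km=10.0):
    """Approximate the region where all landmark circles overlap.

    landmarks: list of (x_km, y_km, radius_km) on a flat plane (an assumption;
    a real system would work on the sphere).
    Returns ((cx, cy), area_km2), or None if the circles do not overlap.
    """
    # The overlap region lies inside the intersection of the bounding boxes.
    xmin = max(x - r for x, _, r in landmarks)
    xmax = min(x + r for x, _, r in landmarks)
    ymin = max(y - r for _, y, r in landmarks)
    ymax = min(y + r for _, y, r in landmarks)

    inside = []
    x = xmin
    while x <= xmax:
        y = ymin
        while y <= ymax:
            # Keep grid points that satisfy every landmark's distance bound.
            if all(hypot(x - lx, y - ly) <= r for lx, ly, r in landmarks):
                inside.append((x, y))
            y += step_km
        x += step_km

    if not inside:
        return None
    cx = sum(p[0] for p in inside) / len(inside)
    cy = sum(p[1] for p in inside) / len(inside)
    # Each grid point stands for one step_km x step_km cell.
    return (cx, cy), len(inside) * step_km ** 2
```

The returned area gives a sense of the confidence region: tighter distance estimates shrink the overlap and therefore the area.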

CBG Multilateration

CBG Bestline Distance Estimates
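As a hedged illustration of a bestline fit: in the CBG paper (as recalled here, not stated on the slide), each landmark fits a line to its (distance, delay) training points that lies below every point, is as close to them as possible, and has a slope no smaller than a physical baseline. The sketch below finds such a line by checking the candidate lines where two constraints meet. The baseline of 0.01 ms/km and the names `fit_bestline` and `delay_to_distance` are assumptions for illustration.

```python
from itertools import combinations

# Assumed baseline: about 1 ms of round-trip delay per 100 km.
BASELINE_MS_PER_KM = 0.01


def fit_bestline(samples, min_slope=BASELINE_MS_PER_KM):
    """samples: list of (distance_km, delay_ms). Returns (slope, intercept) or None."""
    eps = 1e-9
    # The optimum of this small linear program lies where two constraints meet.
    candidates = [(min_slope, 0.0)]
    for (x1, y1), (x2, y2) in combinations(samples, 2):
        if x1 != x2:
            m = (y2 - y1) / (x2 - x1)
            candidates.append((m, y1 - m * x1))  # line through two points
    for x, y in samples:
        candidates.append((min_slope, y - min_slope * x))  # one point, minimum slope
        if x > 0:
            candidates.append((y / x, 0.0))  # one point, zero intercept

    def feasible(m, b):
        # Slope at least the baseline, non-negative intercept, line below all points.
        return (m >= min_slope - eps and b >= -eps
                and all(y - (m * x + b) >= -eps for x, y in samples))

    def total_gap(m, b):
        # Sum of vertical distances from the points down to the line.
        return sum(y - (m * x + b) for x, y in samples)

    ok = [c for c in candidates if feasible(*c)]
    return min(ok, key=lambda c: total_gap(*c)) if ok else None


def delay_to_distance(delay_ms, line):
    """Convert a measured delay into a distance bound using the fitted line."""
    m, b = line
    return max(0.0, (delay_ms - b) / m)
```

Because the line sits below the training points, converting a delay through it tends to overestimate distance, which is what lets CBG treat the result as a circle radius around the landmark.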

CBG Breakdowns (CBGB’s)

• Estimates are not tight and vary widely: large confidence regions, and many probes are needed to get a few tight ones
• No better at estimating distances to the training set than to other hosts (in general)
• Training on more data gives worse accuracy (in general)
• Still underestimates some distances

Our Approach

• Intuition: targets that have similar routes have similar delay-to-distance conversions
• Use route info to achieve more accurate estimates
• Want to fit into the CBG framework
• 2 main techniques, still using bestline fit:
  • Path-Based
  • Router-Based

Path-Based Estimation

• Landmark learns routes to its training set
• Traceroute target up to TTL = x
• Find longest partial path shared with a subset of training hosts
• Calculate bestline using only this subset
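The subset selection in the steps above might look like the following sketch. Routes are modeled as lists of router identifiers seen from one landmark, and the function returns the training hosts that share the longest route prefix with the target's partial traceroute. The names `path_based_subset` and `shared_prefix_len`, and the `min_hosts` threshold (at least two hosts are needed to fit a line), are hypothetical and not from the slides.

```python
def shared_prefix_len(route_a, route_b):
    """Number of leading hops the two routes have in common."""
    n = 0
    for hop_a, hop_b in zip(route_a, route_b):
        if hop_a != hop_b:
            break
        n += 1
    return n


def path_based_subset(target_route, training_routes, max_ttl=12, min_hosts=2):
    """Pick training hosts sharing the longest partial path with the target.

    target_route: routers on the path to the target, in hop order.
    training_routes: dict mapping training host -> route from the same landmark.
    Returns (shared_hops, hosts); falls back to all hosts if nothing is shared.
    """
    partial = target_route[:max_ttl]  # traceroute only up to TTL = max_ttl
    lengths = {host: shared_prefix_len(partial, route)
               for host, route in training_routes.items()}
    # Try the longest shared prefix first, shortening until enough hosts qualify.
    for k in range(len(partial), 0, -1):
        subset = sorted(h for h, n in lengths.items() if n >= k)
        if len(subset) >= min_hosts:
            return k, subset
    return 0, sorted(training_routes)
```

The bestline would then be fitted using only the (distance, delay) samples of the returned hosts.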

Router-Based Estimation

• Landmark learns routes to its training set
• Send packet to target with TTL = x
• Find subset of training hosts with paths through this router
• Calculate bestline using only this subset
• In practice, use x_1, x_2, …, x_n
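A minimal sketch of the router-based selection follows. It assumes the sequence x_1, …, x_n is tried in order, falling back to the next TTL when too few training hosts pass through the router found at the current one; the slides do not spell out how the TTLs are combined, so this ordering, the `min_hosts` threshold, and the name `router_based_subset` are assumptions.

```python
def router_based_subset(target_route, training_routes, ttls=(12, 9, 6), min_hosts=2):
    """Pick training hosts whose routes pass through a router on the target's path.

    target_route: routers on the path to the target, in hop order.
    training_routes: dict mapping training host -> route from the same landmark.
    Returns (ttl, router, hosts); (None, None, all hosts) if no TTL qualifies.
    """
    for ttl in ttls:
        if ttl > len(target_route):
            continue  # the path to the target is shorter than this TTL
        router = target_route[ttl - 1]  # the hop that answers a probe with this TTL
        subset = sorted(h for h, route in training_routes.items() if router in route)
        if len(subset) >= min_hosts:
            return ttl, router, subset
    return None, None, sorted(training_routes)
```

Unlike the path-based variant, only a single probe per TTL is needed here, since the full partial path to the target is never compared.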

PlanetLab Experiments

• 110 PlanetLab hosts in North America
• Lat/long available for each
• Used Scriptroute to gather delay and routes between hosts
• 26 landmarks (after munging)
• Path-Based used TTL up to 12
• Router-Based used TTLs (12, 9, 6)
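With latitude and longitude known for each host, a ground-truth distance between two hosts can be computed for training and evaluation. The slides do not say which distance formula was used; the sketch below assumes great-circle distance via the haversine formula with a mean Earth radius of 6371 km, and the name `great_circle_km` is hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))
```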

Map of Landmark Hosts (26)

Path Length vs. Accuracy

Router TTL vs. Accuracy

Overall Accuracy of Estimations

Estimations to Collocated Targets

Effects of Training Set Size

Conclusions

• Geolocation has powerful potential
• CBG is interesting, but needs improvement
• Route information improves accuracy of distance estimates
• With more accurate estimations, likely need fewer landmarks to locate targets

Future Work

• Modify CBG to better handle underestimates / use other line fits
• Test on larger data sets
• Numerical analysis
• Extend to hosts w/ slower connections
• Use delay distribution to “normalize” measurements

Questions?