
Comparing Broadband ISP Performance using Big Data from M-Lab

Xiaohong Deng, Yun Feng, Thanchanok Sutjarittham, Hassan Habibi Gharakheili, Blanca Gallego, and Vijay Sivaraman

Abstract—Comparing ISPs on broadband speed is challenging, since measurements can vary due to subscriber attributes such as operating system and test conditions such as access capacity, server distance, TCP window size, time-of-day, and network segment size. In this paper, we draw inspiration from observational studies in medicine, which face a similar challenge in comparing the effect of treatments on patients with diverse characteristics, and have successfully tackled this using "causal inference" techniques for post facto analysis of medical records. Our first contribution is to develop a tool to pre-process and visualize the millions of data points in M-Lab at various time- and space-granularities to get preliminary insights on factors affecting broadband performance. Next, we analyze 24 months of data pertaining to twelve ISPs across three countries, and demonstrate that there is observational bias in the data due to disparities amongst ISPs in their attribute distributions. For our third contribution, we apply a multi-variate matching method to identify suitable cohorts that can be compared without bias, which reveals that ISPs are closer in performance than previously thought. Our final contribution is to refine our model by developing a method for estimating speed-tier and re-applying matching for comparison of ISP performance. Our results challenge conventional rankings of ISPs, and pave the way towards data-driven approaches for unbiased comparisons of ISPs world-wide.

Index Terms—Broadband performance, Big data, Data analytics, Measurement lab

I. INTRODUCTION

This paper asks the question: how should we compare Internet Service Providers (ISPs) in terms of the broadband speeds they provide to consumers? (Aspects such as pricing plans, quotas, and reliability are not considered in this paper.) On the face of it, determining the answer may seem simple: a subscriber's speed can be measured directly (say via a speed-test tool or an adaptive bit-rate video stream), allowing ISPs to be compared based on the average (or median) measured speed across their subscriber base. However, this approach has deep conceptual problems.

X. Deng was with the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, NSW 2052, Australia (e-mail: [email protected]).

T. Sutjarittham, H. Habibi Gharakheili, and V. Sivaraman are with the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, NSW 2052, Australia (e-mails: [email protected], [email protected], [email protected]).

Y. Feng is with Shanghai Huawei Technologies, Pudong, China, ZIP 201206 (e-mail: [email protected]).

B. Gallego is with the Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW 2052, Australia (e-mail: [email protected]).

This submission is an extended and improved version of our paper presented at the ITNAC 2015 conference [1].

arXiv:2101.09795v1 [cs.PF] 24 Jan 2021

For example, an ISP-A who has many subscribers in remote areas served by low-capacity (wired or wireless) infrastructure will compare poorly to an ISP-B whose subscribers are predominantly city dwellers connected by fiber; yet, it could well be that ISP-A can provide higher speeds than ISP-B to every subscriber covered by ISP-B! The comparison bias illustrated above, arising from disparity in access capacity, is but one example of many potential confounding factors, such as latency to content servers, host and server TCP window size settings, maximum segment size in the network, and time-of-day, that directly bias measurement test results. Observational studies therefore need to understand and correct for such biases to ensure that the comparisons are fair.

In this study we draw inspiration from the field of medicine, which has grappled for decades with appropriate methods to compare new drugs/treatments. "Patients" (in our case broadband subscribers) with "attributes" such as gender, age, medical conditions, and prior medications (in our case access capacity, server latency, host settings, etc.) are given "treatments" (in our case ISPs), and their efficacy needs to be compared. The dilemma is that any given patient can only be measured taking treatment-A or treatment-B, but not both at the same time; similarly, a subscriber in our case can only be observed when connected to one ISP, so the "ground truth" of that customer's broadband performance served by other ISPs is never observed. To overcome this issue, the gold standard for medical treatment comparisons is a "randomized control trial" (RCT), wherein each patient in the cohort is randomly assigned to one of the multiple treatments (one of which could be a placebo). The randomization is crucial here, in the expectation that known as well as unknown attributes that could confound the experiment outcome get evenly distributed across the groups being compared, so that statistically meaningful inferences can be drawn.

Alas, "randomized" assignment of ISPs to subscribers is not a viable option in the real world, so we have to instead rely on "observational" studies that analyze performance data given a priori assignment of ISPs to subscribers. Fortunately for us, techniques for observational studies are maturing rapidly, particularly in medicine, where analyzing big data from electronic health records is much cheaper than running controlled clinical trials and can yield valuable insights on the causal relationship between patient attributes and treatment outcomes. In this work we have collaborated closely with a medical informatics specialist to apply "causal inference" techniques to analyzing ISP performance data – unlike a classic supervised learning problem, causal inference works by estimating how things might look under different conditions, thereby differentiating the influence of A versus B, instead of trying to predict the outcome.


We apply this method to the wealth of broadband data available from the open M-Lab platform, which holds over 40 million measurement results world-wide for the year 2016. Though no data-driven approach can guarantee that causal relationships are deduced correctly, as there could be unknown attributes that affect the outcome (the "unknown unknowns", to use the Rumsfeld phrase), we believe that the M-Lab data set captures most, if not all, of the important attributes that are likely to affect the speed measurements.

Our objective in this paper is to apply emerging data-analysis techniques to the big data from M-Lab to get new insights into ISP broadband performance comparison. Our first contribution is somewhat incidental – we develop a tool that allows researchers to easily and quickly process and depict M-Lab data to visualize performance metrics (speed, latency, loss, congestion) at various spatial (per-house, per-ISP, per-country) and temporal (hourly, monthly, yearly) granularities. Our second contribution applies our tool to over 17 million data samples taken in 2015 and 2016 spanning 12 ISPs in 3 countries, to identify the relative impact of various attributes (access speed-tier, host settings, server distance, etc.) on broadband performance. We reveal, both visually and analytically, that dominant attributes can vary significantly across ISPs, corroborating our earlier assertion that subscriber cohorts have disparate characteristics and ISP comparisons are therefore riddled with bias. Our third contribution is to apply a causal inference technique, called multi-variate matching, to filter the data sets by identifying cohorts with similar attributes across ISPs. Our final contribution is to refine our method to make M-Lab data more useful by mapping measurements to households and estimating their speed-tier, so that meaningful performance comparisons across households can be conducted. Our results indicate that the ISPs are closer in speed performance than previously thought, and their relative ranking can be quite different to what the raw aggregates indicate.

The rest of this paper is organized as follows: §II recaps prior work in broadband performance, and gives relevant background on causal inference techniques. In §III we describe our measurement data set, the attributes it contains, and preliminary insights gleaned from our visualization tool. The attribute distributions and underlying biases are discussed in §IV, while in §V we apply multi-variate matching to reduce bias and compare ISPs in a fair manner. §VI presents our systematic approach to estimate household access capacity from M-Lab data. The paper is concluded in §VII with pointers to future work.

II. BACKGROUND AND PRIOR WORK

A. Broadband Measurement and Reporting

Measuring and ranking broadband ISPs has been ongoing (and contentious) for several years – Netflix publishes a monthly ISP speed index [2] that ranks ISPs based on their measured prime-time Netflix performance, and Youtube graphs for each ISP the break-down of video streaming quality (low vs standard vs high definition) by time-of-day, averaged over a 30-day period [3]. While these large content providers undoubtedly have a wealth of measurement data, these are specific to their services, and neither their data nor their precise comparison methods are available in the public domain (to be fair, Google does outline a methodology on its video quality report page, but it fails to mention important elements such as whether it only considers video streams of a certain minimum duration, whether a house that watches more video streams contributes more to the aggregate rating, and how it accounts for various factors such as browser type, server latency, etc. that vary across subscribers and can affect the measurement outcome). Governments are also under increasing pressure to compile consumer reports on broadband performance – for example, the FCC in the US [4] directs consumers to various speed test tools to make their own assessment, and the ACCC in Australia [5] is running a pilot program to instrument volunteers' homes with hardware probes to measure their connection speeds. Additionally, various national regulators in Europe employ their own methods of measuring broadband speed and publish white papers, as surveyed in [6] – for example, Ofcom in the UK uses a hardware measurement unit (developed by SamKnows), several other national regulators, such as those in Italy, Austria, Germany, Portugal, and Slovenia, use specialized software solutions (developed in-house), and the regulator in Greece adopted M-Lab's NDT tool.

While a commendable amount of effort is being expended on collecting data, via either passive measurement of video traffic or active probing using hardware devices (we refer the reader to a recent survey [7] that gives an overview of measurement platforms and standardization efforts), less effort has been expended on a systematic analysis of the collected data. This matters, because early works such as [8] have demonstrated that broadband speed measurements can exhibit high variability, and these differences arise from a complex set of factors including test methodology and test conditions, including home networks and end users' computers, that make it very challenging to attribute performance bottlenecks to the constituent parts, specifically the ISP network. While their work acknowledges that broadband benchmarking needs to look beyond statistical averages to identify factors exogenous to the ISP, they do not offer any specific approaches for doing so. NANO [9] developed a system that infers the effect of an ISP's policy on service performance – it also compares service performance across multiple ISPs. NANO establishes a causal relationship between an ISP and its observed performance by adjusting for confounding factors such as client-based, network-based, and time-based confounders; however, it does not consider TCP throughput comparison across ISPs. NetDiff [10] designed a system that offers a fair performance comparison of ISP networks by taking into account the size and geographic spread of each ISP. It helps customers determine which ISP offers the best performance for their specific workload, but considers only one confounding factor. A separate body of work [11]–[13] explores model-driven and data-driven methods to estimate or predict end-to-end available bandwidth; however, these operate at short time-scales, their data sets are small, and their focus is not specific to broadband networks.

Page 3: Comparing Broadband ISP Performance using Big Data from M-Lab

3

We believe our work is among the first to combine causal inference techniques for observational studies with the big data openly available from the M-Lab measurement platform to attempt a fair comparison of ISP broadband performance.

B. Causal Inference Analysis

As mentioned earlier, the gold standard for comparisons is a randomized control trial, which is not feasible in our case. We therefore have to use observational data with a priori assignments of ISPs to subscribers, and use causal inference methods [14]–[16] that can control for differences in the covariate distributions between the groups being compared so as to minimize confounding. One of the most popular methods is "matching" [17], which selects subsets of observations in one group (the treatment group) for comparison with observations having similar covariate distributions in the comparator group (the control group) – balancing the distribution of covariates in the two groups gives the effect of a randomized experiment. Matching has been used extensively in epidemiological, social, and economic research studies, and has been proven to reduce confounding bias very effectively. The most common approaches to perform matching are propensity score matching, multivariate matching based on Mahalanobis distance, and, more recently, genetic matching algorithms. In this paper we chose multivariate matching, which is easier to tune by interpreting results when the number of attributes is not too large, and is well supported in R [18].

Once the covariates of the observations have been matched between the groups, the difference in outcome is averaged to estimate the average treatment effect (ATE) – in medicine, this could quantify the average effect of a drug compared to a placebo, while in our case it estimates the average difference in download speed between the two ISPs being compared. Certain pre-conditions are needed for our approach: we assume that the group assignment (i.e., choice of ISP for a subscriber) has been made independent of the outcome, conditional on the observed covariates; that the baseline covariates, although measured post facto, are not affected by the treatment (ISP); and that there are sufficient observations for which the probability of assignment is bounded away from zero and one. In simple words, this states that households (patients) did not make their ISP (treatment) choice based on known outcomes or attributes, and that there are a reasonable number of samples from the two groups being compared that have similar covariate distributions (our results in §V will capture this via p-values).
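To make this concrete, the sketch below (our illustration, not code released with the paper) estimates the ATE as the mean outcome difference over already-matched pairs, and uses two-sample t-test p-values as a rough covariate balance check; all variable and function names are placeholders.

```python
# Sketch: estimate the average treatment effect (ATE) from matched samples and
# check covariate balance via two-sample t-test p-values (illustrative only).
import numpy as np
from scipy import stats

def ate_and_balance(y_t, y_c, X_t, X_c, matched):
    """y_t, y_c: outcomes (download speed) for ISP-1 (treated) and ISP-2 (control);
    X_t, X_c: covariate matrices (rows = tests); matched: list of (treated_idx, control_idx)."""
    t_idx = np.array([i for i, _ in matched])
    c_idx = np.array([j for _, j in matched])
    ate = np.mean(y_t[t_idx] - y_c[c_idx])          # mean speed difference (Mbps)
    # Balance check: per-covariate p-values on the matched subsets; large
    # p-values indicate the matched distributions are no longer distinguishable.
    pvals = [stats.ttest_ind(X_t[t_idx, k], X_c[c_idx, k]).pvalue
             for k in range(X_t.shape[1])]
    return ate, pvals
```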

III. DATA-SET SELECTION, ATTRIBUTES, VISUALIZATION

In this section we briefly introduce M-Lab, its measurement tools, and its data repositories. We then describe the data we have selected and pre-processed, the attributes we have extracted, and the visualization tool we have built.

M-Lab [19] was founded in 2008 as a consortium of research, industry, and public-interest partners, backed by Google, to create a global open measurement platform and data repository that researchers can use for deeper studies of Internet performance. M-Lab has built a platform on which test servers are well distributed across continents and ISPs, and any interested party can design, implement, and deploy new Internet measurement tools under an open license.

Fig. 1. CCDF of household test counts in AU, UK, and US.

This gives a significant advantage to M-Lab over other platforms such as PerfSONAR [20] – M-Lab provides a much larger collection of data, generated by tens of millions of tests from clients connected to hundreds of ISPs across the globe every year. All data collected on the M-Lab platform are open access (as opposed to commercial platforms such as Ookla [21]), and available either in raw format as file archives on Google cloud storage, or in SQL-friendly parsed format accessible using BigQuery. In terms of diversity, M-Lab covers a more diverse range of users compared to hardware-based platforms such as SamKnows [22] and BISMark [23] – deploying hardware-based measurement at users' premises is constrained by the distribution of devices, and is thus limited to selected populations.

A. Data Set Selection and Pre-Processing

In this paper we use the data collected by the Network Diagnostic Test (NDT) tool, since it has by far the largest number of speed test samples (over 40 million for the year 2016), and captures a rich set of attributes for each test (discussed later in this section). In order to evaluate the generality of our methods, we apply them to data from three countries: Australia (AU), the United Kingdom (UK), and the United States (US). We select four of the largest ISPs from each country for comparison: Telstra, Optus, iiNet, and TPG from AU; BT, Virgin, Sky, and TalkTalk from the UK; and Comcast, Verizon, AT&T, and Cox from the US. For these ISPs, we analyze the NDT speed test measurements taken over two years (2015 and 2016), comprising 1.3m samples for AU, 1.4m for the UK, and 14.5m for the US – the latter is an order of magnitude larger since Google searches in the US got linked with NDT as of July 2016.

Determining household speed: Our first objective is to set a baseline for ISP speed comparison by computing mean/median values per ISP. However, we found in the NDT data set that some IP addresses were conducting many more tests than others (for convenience of exposition we will henceforth refer to each unique IP address as a "household").


Fig. 2. Monthly median download speeds: (a) Australia, (b) UK, (c) US.

Our data set was found to have around 565k, 464k, and 2.81m households for AU, UK, and US respectively, indicating that the average household contributes only 2-4 samples each month. There is however a significant skew in monthly test frequency amongst households, as shown in the complementary CDF (CCDF) of Fig. 1 – in AU and UK, for example, the bottom 50% of households contribute 5 or fewer samples, while the top 10% contribute 50 or more samples each. We eliminate this bias by aggregating (averaging) the results to obtain a single value per household per month, and plot the resulting month-by-month median download speed across households in Fig. 2. The rankings shown in the figure are broadly consistent with the ones published by Netflix – like us, the Netflix ISP speed index ranks Optus, Virgin, and Verizon highest in AU, UK, and US respectively for most months of 2016. Netflix does not publish how it computes its ISP index; but since our aggregation generates similar rankings to the Netflix index, we believe it is a fair baseline and will use it as a naive baseline for our discussions and comparisons against the other methods developed later in this paper. Therefore, when we later refer to the naive arithmetic average, we mean ISP performance aggregated by household – each house has only one vote in the ISP average.
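As an illustration of this one-vote-per-household aggregation, the following pandas sketch computes the monthly per-ISP median; the column names (client_ip, isp, timestamp, download_mbps) are our own placeholders rather than M-Lab field names.

```python
# Sketch: naive baseline -- average the tests of each household (IP) per month,
# then take the median across households for each ISP and month.
import pandas as pd

def monthly_isp_median(df: pd.DataFrame) -> pd.DataFrame:
    # df columns (placeholders): client_ip, isp, timestamp, download_mbps
    df = df.assign(month=df["timestamp"].dt.to_period("M"))
    per_house = (df.groupby(["isp", "month", "client_ip"])["download_mbps"]
                   .mean()                       # one value per household per month
                   .reset_index())
    return (per_house.groupby(["isp", "month"])["download_mbps"]
                     .median()                   # median across households
                     .reset_index(name="median_speed_mbps"))
```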

Determining access speed-tier: The download speed for a household will be limited by the capacity of its access link, which in turn is dictated by physical attributes such as medium (fiber, copper, wireless) and distance from the local exchange. It may further be constrained if the subscriber has chosen a plan with a lower advertised speed. We term this maximum possible speed available to the household its "access speed-tier". As we will see in the next section, this attribute is important when comparing ISPs, but is not explicitly present in the data since M-Lab is not privy to advertised speeds and subscriber plans. We therefore have to infer a household's access speed-tier from the measured data. We take an approximate approach of using the largest value of measured speed as the access speed-tier for that household, provided: (a) the household has conducted a minimum threshold number of tests, and (b) at least one test was conducted during off-peak hours (i.e., outside of 7pm-11pm local time). Filtering by a higher test-count threshold estimates the access speed-tier more accurately, but reduces the data set by eliminating households that conduct fewer tests (see CCDF of test counts in Fig. 1).

TABLE I
NUMBER OF SAMPLED NDT MEASUREMENTS FROM EACH ISP

year  ISP        raw test count   annual test count > 20   used
2015  Telstra       117,019            11,180               9.6%
2015  Optus          46,138             8,232              17.8%
2015  iiNet          42,917             5,144              12.0%
2015  TPG            52,186            14,928              28.6%
2015  BT            238,134            13,844               5.8%
2015  Virgin        205,149            54,371              26.5%
2015  Sky           235,271            27,623              11.7%
2015  TalkTalk        7,450               652               8.8%
2015  AT&T          460,482           148,512              32.3%
2015  Cox           215,499            56,282              26.1%
2015  Verizon       291,421            88,010              30.2%
2015  Comcast       769,728           216,252              28.1%
2016  Telstra       478,469            65,839              13.8%
2016  Optus         161,303            26,421              16.4%
2016  iiNet         199,764            20,525              10.3%
2016  TPG           219,547            53,879              24.5%
2016  BT            126,013            10,890               8.6%
2016  Virgin         71,947            20,405              28.4%
2016  Sky            95,964            10,292              10.7%
2016  TalkTalk       17,543             3,666              20.9%
2016  AT&T        2,841,976         1,280,814              45.1%
2016  Cox         1,391,983           580,976              41.7%
2016  Verizon     1,333,403           537,319              40.3%
2016  Comcast     5,442,720         2,164,122              39.8%

We chose a threshold of 20 tests for AU and UK, and 50 for the US, so as to get reasonable confidence in our estimates of access speed-tier. As we can see in Table I, the test-count thresholds we choose retain 10-30%, 6-27%, and 26-33% of samples for AU, UK, and US respectively in the 2015 data, and 10-24%, 9-30%, and 40-45% of samples for AU, UK, and the US respectively in the 2016 data. Since the number of samples in the US has increased by an order of magnitude since July 2016, we believe that having a test-count threshold to select measurement samples will not be a constraint when applying our method to future data.
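A minimal sketch of the speed-tier estimate described above is given below, using the same placeholder column names plus a local_hour field; the 7pm-11pm peak window and the per-country thresholds follow the text.

```python
# Sketch: estimate a household's access speed-tier as its maximum measured speed,
# provided it has at least `min_tests` samples and at least one off-peak test.
import pandas as pd

def estimate_speed_tier(df: pd.DataFrame, min_tests: int = 20) -> pd.Series:
    # df columns (placeholders): client_ip, download_mbps, local_hour
    def per_house(g: pd.DataFrame):
        off_peak = ~g["local_hour"].between(19, 22)   # outside 7pm-11pm local time
        if len(g) >= min_tests and off_peak.any():
            return g["download_mbps"].max()
        return None                                    # too few tests; household dropped
    return df.groupby("client_ip").apply(per_house).dropna()
```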

B. Attribute Selection

The NDT speed-test client connects to its nearest NDT server for the speed test.


Fig. 3. Comcast speed-tests faceted by day-of-month for December 2016.

The server records the client information (IP address, geographic location, OS, client version), network attributes (RTT, MSS), server-side stats (TCP buffers, maximum congestion window sizes, and other web100 variables [24] for TCP tuning), and detailed run-time measurements (speed, loss, congestion signal counts). While nearly 50 attributes are included, we found that many were sparsely recorded, due to different NDT client versions, embedded in different applications on different operating systems, choosing to record different subsets of attributes.

The TCP window size of the client is expected to affect measured speeds. There are a number of Web100 performance statistics related to the TCP window [24]: the TCP receiver's announced maximum window size (Rwnd) is determined by the available head room in the ReceiveBuf, and the Average Receiver Window is generally close to Rwnd. We use the maximum window size (Rwnd) as it is solely dependent on client conditions, whereas ReceiveBuf can also reflect network conditions. The distance between client and server is recorded in the form of the minimum round-trip-time (min-RTT) over the duration of each test, as is the maximum segment size (MSS). The client region is recorded, which allows conversion to local time-of-day to determine whether the test was done during peak or off-peak times. The Operating System and version attribute can tell us about the default host settings (e.g., TCP auto-tuning, Nagle option) that affect speed performance. The broadband ISP for the client is deduced from a Whois lookup of the client IP address, and the speed-tier for the household is computed as explained earlier. The Client Limited Time is the percentage of time a test was constrained by the client itself. This attribute is inherently correlated with speed-tier – we found that higher speed-tier clients are more likely to be constrained by client-side limitations. Since this attribute is not directly associated with the speed, we do not process it when we quantify the correlation between attributes and speeds, but only use it when applying the causal inference model later on. The impact of these attributes, reflective of test conditions, will be discussed in the next section.
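For example, the peak/off-peak and weekday/weekend "time of week" covariate that appears later in the matching (Fig. 9 and Fig. 10) can be derived from the test timestamp roughly as sketched below; the timezone argument and column handling are our assumptions, not the exact NDT processing.

```python
# Sketch: derive the "time of week" covariate (peak/non-peak x weekday/weekend)
# from a test timestamp and the client's timezone (assumed known from its region).
import pandas as pd

def time_of_week(ts: pd.Timestamp, tz: str = "Australia/Sydney") -> str:
    local = ts.tz_convert(tz) if ts.tzinfo else ts.tz_localize("UTC").tz_convert(tz)
    peak = 19 <= local.hour <= 22                 # 7pm-11pm local time
    weekend = local.dayofweek >= 5                # Saturday or Sunday
    return f"{'Peak' if peak else 'Non-peak'} {'Weekends' if weekend else 'Weekdays'}"
```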

C. Visualization Tool

Since visualization is key to human comprehension and interpretation of results, we built a tool to ease the generation of plots of various performance measures (speed, RTT, congestion signals) filtered by country, ISP, or specific household, at time-scales of hours, days, and months. A data-extraction script in Python queries the M-Lab NDT store to extract data, and an R script filters fields of interest and annotates them with extra attributes (such as speed-tier and local time) into country-specific local files. A set of analytics scripts in R then perform the various algorithmic operations, ranging from simple aggregation and normalization to the more complex causal inference models discussed later. A JavaScript front-end provides user interaction to input plotting options and display the resulting graphs.
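For readers who wish to reproduce the extraction step, a rough sketch of such a BigQuery pull is shown below; it is illustrative only, and the table and column names are placeholders that must be adapted to the actual M-Lab NDT schema, which has changed over the years.

```python
# Sketch: pull a month of NDT download tests for one country via BigQuery.
# Table and column names are illustrative placeholders, not the exact M-Lab schema.
from google.cloud import bigquery

def fetch_ndt(project: str, country: str = "AU"):
    client = bigquery.Client(project=project)
    sql = """
        SELECT client_ip, isp, test_time, download_mbps, min_rtt, rwnd, mss
        FROM `measurement-lab.ndt.downloads`          -- placeholder table name
        WHERE country_code = @country
          AND test_time BETWEEN '2016-12-01' AND '2016-12-31'
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("country", "STRING", country)])
    return client.query(sql, job_config=job_config).to_dataframe()
```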

Our UI is openly accessible at https://mlab-vis.sdn.unsw.edu.au/ and we encourage the reader to try the various plot options, such as: (a) aggregated plots that show monthly/hourly ISP (raw or normalized) speeds, (b) scatter plots that show download speeds by time-of-day or day-of-month, (c) distribution plots of speed-tier for a specific ISP, (d) correlation plots showing how the download speed relates to the various attributes, and (e) household plots that show speed, RTT, etc., specific to a client IP address. Though we will see several plots generated by our tool throughout this paper, here we would like to illustrate some visual insights from a facet plot, in this case speed measurements taken at different times of day. Fig. 3 shows a panel of 31 plots, each of which depicts all speed-tests done during that day of the month, over the month of December 2016 for Comcast. The top-10 contributing households are each given their own color, with their IP addresses (and, in parentheses, the number of test points contributed for that month) shown in the legend. One can immediately see the temporal skew in testing patterns: the dark green household (IP address 67.180.193.135 with 1281 tests) does its speed testing exclusively on three days (22-Dec, 29-Dec, and 30-Dec), the light green household (IP address 24.147.127.89 with 672 tests) is concentrated on 18 December, while other households such as red (IP address 50.248.236.185 with 1809 tests) and purple (IP address 76.114.35.144 with 1055 tests) are spread across every day of the month. The plot also gives a visual representation of the variability in the number of tests conducted across houses, as well as the variability in speed experienced by the same household. It is often useful to corroborate the numerical results presented in this paper with their visual depictions made possible by our tool.

IV. TEST DATA ATTRIBUTE DISTRIBUTIONS AND BIASES

In this section we study how test condition attributes (affecting speed) can vary in measurements across different ISPs, and how this can bias the comparison results. We begin by feeding the measurement test results, along with the associated test condition attributes, to the Random Forest method in R to compute the "importance rank" (other machine learning methods for variable selection, such as Bayesian Additive Regression Trees [25], provided similar results).


Fig. 4. Attribute importance computed from Random Forest for (a) AU, (b) UK, and (c) US.

Fig. 5. Comparing access speed-tier for (a) Optus, and (b) Telstra, in AU.

Fig. 6. Comparing download speeds by OS in AU.

Fig. 4 shows that the access speed-tier, host buffers, and distance attributes have the highest impact on measured download speed, across all the countries studied. This by itself is not very surprising, since these factors directly determine TCP dynamics and hence measured speeds. What is surprising is that the ISP per se has a relatively lower weight, typically ranking fourth or fifth in importance. Of course, in countries such as the US, each ISP often runs its own physical broadband infrastructure, and therefore wields a much larger influence via the access speed-tier attribute, whereas in countries such as AU and UK the ISPs typically share (nationalized) broadband infrastructure, and hence do not dictate the access speed-tier attribute. In either case, the ISP attribute is found to be no more significant than the operating system of the client running the test.
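As an aside, the importance ranking of Fig. 4 can be approximated with a random forest in Python as sketched below (our analysis used R); the feature matrix is assumed to already contain the attributes discussed above, with categorical ones such as OS and ISP encoded numerically.

```python
# Sketch: rank attributes by random-forest variable importance for predicting
# measured download speed. Categorical attributes (OS, ISP, time of day) are
# assumed to be label/one-hot encoded beforehand.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def attribute_importance(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=0)
    rf.fit(X, y)
    return pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
```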

Even with a shared/nationalized broadband infrastructure, some ISPs may be serving customers with lower access speed-tiers, which can drag their averages down. In fact, in AU, Telstra claims that it serves more rural/regional customers than other ISPs such as Optus, which is used as a reason why it ranks lower on the Netflix ISP speed index (and in our monthly median plot shown in Fig. 2(a)). To check the plausibility of this claim, we use our tool to plot in Fig. 5 the distribution of speed-tiers of households served by Telstra and Optus in 2016. The disparity is evident – Optus has only 10% of subscribers at speed-tiers below 8 Mbps and 63% above 20 Mbps, while Telstra has 21% of subscribers below 8 Mbps and only 46% above 20 Mbps. Since much of the access infrastructure in Australia is open and can be shared by all ISPs, the disparity in access speed-tier is attributable to the different proportions of metropolitan versus regional customers served by the two ISPs. Another illustration of the impact of an attribute, in this case the client OS, is shown visually in Fig. 6. This plot is from M-Lab test data for AU in 2016, and shows the density distribution of measured download speed (x-axis on log scale) separated by OS type. The bias is again evident here – clients using flavors of Linux version 3.4 or higher (solid red curve) are clustered in the range of 10-100 Mbps, while Windows clients running XP or older (dashed blue curve) are concentrated in 5-20 Mbps – this could be attributed to the lack of TCP auto-tuning in older versions of the Windows OS [26].


Fig. 7. Distribution of various attributes for: (a) AU, and (b) US.

Fig. 8. Impact of matching with/without replacement and caliper for: (a) AU and (b) US.

For completeness, we show the distributions of all the major attributes (host TCP window size, speed-tier, distance, MSS, and OS) in Fig. 7 for AU and US (UK has similar characteristics to AU, and is omitted here for space reasons). It can be seen that the ISPs differ in almost every attribute: for example, in AU, TPG subscribers are more skewed towards lower Rwnd (TCP window), larger client-server distance, and more Windows XP (or older) OS, compared to subscribers of other ISPs, while in the US, AT&T subscribers have larger distance and more widely spread Rwnd than others. As discussed earlier, such differences in attributes (that are reflective of test conditions) can bias the test outcomes in multiple ways; the next section develops a method to eliminate this bias and undertake a fair comparison of ISPs.

V. DEBIASING USING MULTI-VARIATE MATCHING

We use the causal inference technique called multi-variate matching, as briefly introduced in §II-B, to balance the covariate distributions and thereby reduce bias. For attributes that take continuous values (speed-tier, host buffers, client-server distance, MSS), we use the Mahalanobis distance [18] to compute the "closeness" between measurement samples pertaining to ISP-1 (the "treatment group") and ISP-2 (the "control group"). Samples that are within a "caliper" (specified in units of the standard deviation of each attribute) distance from each other are deemed to be matched, and constitute the "common support" between the two groups, while all other samples are dropped. A larger caliper allows more samples to be matched for greater common support, reducing error (variance) in the comparison of the average treatment effect (ATE), whereas a smaller caliper makes the matching more exact for improved unit homogeneity, yielding lower bias [15]. The caliper therefore has to be tuned to achieve the desired balance between error and bias.
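The sketch below illustrates this matching step under simplifying assumptions: numeric covariate matrices for the two ISPs, greedy nearest-neighbour matching on Mahalanobis distance, and a per-attribute caliper in standard-deviation units. It is a simplified stand-in for the R Matching package [18] used in our analysis, not a faithful re-implementation.

```python
# Sketch: nearest-neighbour multivariate matching on Mahalanobis distance with a
# per-attribute caliper (in standard-deviation units), with or without replacement.
import numpy as np
from scipy.spatial.distance import cdist

def mahalanobis_match(X_t, X_c, caliper=0.2, replace=True):
    """X_t, X_c: covariate matrices for ISP-1 (treated) and ISP-2 (control) tests.
    Returns (treated_index, control_index) pairs; treated samples with no control
    inside the caliper fall outside the common support and are dropped."""
    pooled = np.vstack([X_t, X_c])
    VI = np.linalg.pinv(np.cov(pooled, rowvar=False))   # inverse covariance matrix
    D = cdist(X_t, X_c, metric="mahalanobis", VI=VI)
    sd = pooled.std(axis=0)
    pairs, used = [], set()
    for i in range(len(X_t)):                 # greedy; order matters if replace=False
        for j in np.argsort(D[i]):            # candidate controls, nearest first
            if not replace and j in used:
                continue
            if np.all(np.abs(X_t[i] - X_c[j]) <= caliper * sd):   # caliper check
                pairs.append((i, int(j)))
                used.add(int(j))
                break
    return pairs
```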

Another key factor is to check whether matching should be done "with replacement" (i.e., r) or "without replacement" (i.e., nr).


Fig. 9. Covariate distributions before and after matching with replacement: Optus vs. Telstra (in AU) measurements with caliper 0.2.

Matching with replacement can often decrease bias because controls that look similar to many treated individuals can be used multiple times [27], but this makes the inference more complex since matched controls become dependent. We employ both of these methods (i.e., r and nr). For the caliper, we use two values, 0.1 and 0.2, to evaluate the impact of tight and relaxed matching respectively. We show our results in Fig. 8 for six pairs of ISPs within a specific speed-tier in AU and US.

It is seen that when matching without replacement is employed, for both AU and US, a fairly large fraction of samples (typically between 50% and 75%) falls outside the common support and is thus discarded – except for the "Verizon vs. Cox" pair in the US with a caliper value of 0.1.

Fig. 10. Covariate distributions before and after matching with replacement: Comcast vs. AT&T (in US) measurements with caliper 0.2.

Matching with replacement, in contrast, discards a smaller fraction of samples (i.e., keeps a larger portion of samples in the common support). Focusing on AU in Fig. 8(a), the discard rate is less than 25% for four pairs of ISPs when matching with replacement is employed and the caliper value is set to 0.2 (purple bars) – two pairs experience a relatively higher discard rate of about 45%. For the US, on the other hand, the same settings give a fairly consistent discard rate of less than 25% across four pairs of ISPs, as shown by the purple bars in Fig. 8(b).


Fig. 11. Comparing ISP speeds for 83 pairs in AU and US.

As a result, we use matching with replacement and caliper 0.2, which retains a sufficient number of samples in the common support to keep the error low.

A. Matching on Selected ISP Pairs

Let us begin by looking at Optus vs. Telstra in AU for the speed-tier [0, 8] Mbps. In Fig. 9, we show how covariate measures and download speeds change when matching with replacement and caliper 0.2 is applied. It is seen that attributes differ substantially before and after matching – matching discards 20.8% of samples and makes the covariate distributions more similar (thereby reducing bias). Note that, before matching, Telstra is disadvantaged in terms of longer distance and a larger number of users with old OS (not supporting TCP auto-tuning), and hence sees a smaller download speed on average, as suggested by naive aggregation (e.g., Fig. 2(a)). When confounding factors (attributes) are balanced using the matching method, interestingly Telstra gives better performance than Optus on average – a result contrary to the one seen in Fig. 2(a).

We now compare Comcast and AT&T in the US for the speed-tier bin [30, 50] Mbps. We can observe in Fig. 10 that attributes are quite disparate between Comcast and AT&T when all samples are considered (before matching). Comcast is disadvantaged against AT&T by several factors including: (a) a higher distance between clients and server; (b) a larger number of measurements from old OSes (i.e., window scale disabled); and (c) a larger number of measurements with a high value (i.e., 0.1 to 1) of Receiver-Limited-Time. Therefore, a naive average difference suggests that Comcast is about 8 Mbps slower than AT&T on average. But matching balances the attributes (confounding factors), as shown by the plots on the right in Fig. 10, and thus reduces the speed difference to 3-4 Mbps.

B. Matching on Larger Set of ISP Pairs and Various Speed-Tiers

We now extend our evaluation to a larger set of ISP pairs with a diversity of speed-tiers, using data from the two years 2015 and 2016. Table II shows 83 ISP pairs (in AU and US) along with their corresponding speed-tier (in Mbps) and the year of data. ISP pairs are sorted in ascending order of their speed difference (computed from naive aggregation) – each pair is assigned a unique ID (order) for ease of reference in Table II.

We show the results of matching with caliper 0.2 in Fig. 11. The speed difference of each ISP pair resulting from naive arithmetic means, matching without replacement (nr), and matching with replacement (r) are shown by black dots, red segments, and green segments respectively. Matching-estimated differences are represented by error bars with a 95% confidence interval.

We make the following observations: (a) when the raw difference is close to zero, for pair IDs between 40 and 70, the estimated differences after matching are also close to zero – the matching method does not change the inference; (b) when the raw difference has a large negative value, for pair IDs between 1 and 20 (or a large positive value, for pair IDs between 75 and 83), matching estimates a much smaller difference (i.e., 0 to -3 Mbps) than what naive arithmetic means indicate (i.e., -3 to -8 Mbps); and (c) in some cases the raw difference is positive but the estimated value becomes negative with statistical significance – this indicates that a simple use of raw averages for ranking ISPs could be misleading, for example Optus vs. Telstra in AU (as discussed earlier).

VI. REFINING SPEED-TIER ESTIMATION

We developed a causal inference model for a fair comparison of ISP performance in the previous section. In this section we refine our model by undertaking a more detailed analysis to estimate household speed-tier (also referred to as "broadband access capacity") from the M-Lab data, allowing us to further improve the fidelity of broadband speed comparisons.

A. Isolating Households

M-Lab data points are indexed by the IP address of home gateways. ISPs allocate IP addresses based on their resource pool, subscriber base, or business policy. In some cases, an ISP (often a large one) may have a fairly large pool of public IP addresses and can assign every subscriber a unique public IP, though the one-to-one address lease may change dynamically over time. In other cases, the ISP (often a small one) will instead assign a public IP address to a group of subscribers, and then employ NAT to multiplex their traffic. Consequently, it becomes challenging to extract the broadband capacity from M-Lab data, as an IP address does not necessarily represent a single household. Thus, we need a method to isolate data points corresponding to single households.

The congestion signal of each NDT data point indicates how the TCP congestion window (cwnd) is being affected by congestion, and is incremented by any form of congestion notification including Fast Retransmit, ECN, and timeouts. Theoretically, a large value of the congestion signal (congestion count) should correspond to a low TCP throughput (download speed), and vice versa.


TABLE II
ISP PAIRS WITH COMMON SUPPORT IN AU AND US.

ID   ISP1 vs. ISP2          Speed-tier (Mbps)   Year
1    Comcast vs. AT&T       [30, 50]            2016
2    Comcast vs. AT&T       [30, 50]            2015
3    Virgin vs. BT          [50, 75]            2015
4    Comcast vs. Verizon    [75, 100]           2015
5    Comcast vs. Verizon    [20, 25]            2015
6    Comcast vs. Cox        [50, 75]            2015
7    Comcast vs. AT&T       [75, 100]           2016
8    Virgin vs. BT          [25, 30]            2016
9    Comcast vs. AT&T       [20, 25]            2016
10   Virgin vs. BT          [30, 50]            2015
11   Virgin vs. BT          [30, 50]            2016
12   Comcast vs. Cox        [30, 50]            2016
13   Comcast vs. Cox        [20, 25]            2016
14   Comcast vs. Cox        [12, 20]            2016
15   Virgin vs. Sky         [25, 30]            2016
16   Comcast vs. AT&T       [50, 75]            2016
17   Virgin vs. Sky         [30, 50]            2016
18   Virgin vs. BT          [12, 20]            2015
19   Virgin vs. Sky         [20, 25]            2015
20   Virgin vs. Sky         [25, 30]            2015
21   Virgin vs. BT          [20, 25]            2015
22   Virgin vs. TalkTalk    [12, 20]            2016
23   Telstra vs. Optus      [25, 30]            2016
24   Comcast vs. AT&T       [12, 20]            2016
25   Comcast vs. AT&T       [25, 30]            2015
26   Comcast vs. Cox        [30, 50]            2015
27   Virgin vs. Sky         [12, 20]            2016
28   Comcast vs. AT&T       [0, 8]              2016
29   Comcast vs. Verizon    [50, 57]            2016
30   Virgin vs. Sky         [30, 50]            2015
31   Comcast vs. Verizon    [30, 50]            2016
32   Virgin vs. Sky         [12, 20]            2015
33   Comcast vs. Verizon    [20, 25]            2016
34   Comcast vs. AT&T       [8, 12]             2016
35   Comcast vs. AT&T       [20, 25]            2015
36   Comcast vs. Verizon    [25, 30]            2015
37   TPG vs. Optus          [12, 20]            2015
38   Comcast vs. AT&T       [0, 8]              2015
39   TPG vs. Telstra        [12, 20]            2015
40   Comcast vs. Cox        [25, 30]            2015
41   Comcast vs. Verizon    [50, 57]            2015
42   Comcast vs. Cox        [20, 25]            2015
43   Comcast vs. AT&T       [8, 12]             2015
44   Comcast vs. Cox        [8, 12]             2015
45   Comcast vs. AT&T       [12, 20]            2015
46   Comcast vs. Verizon    [30, 50]            2015
47   Comcast vs. Cox        [12, 20]            2015
48   Comcast vs. Verizon    [12, 20]            2015
49   Comcast vs. Verizon    [0, 8]              2016
50   Comcast vs. Cox        [0, 8]              2015
51   Comcast vs. AT&T       [25, 30]            2016
52   Comcast vs. Verizon    [0, 8]              2015
53   Comcast vs. Verizon    [8, 12]             2015
54   Telstra vs. iiNet      [0, 8]              2016
55   TPG vs. Telstra        [8, 12]             2015
56   Telstra vs. iiNet      [8, 12]             2016
57   Telstra vs. Optus      [0, 8]              2016
58   Comcast vs. Cox        [8, 12]             2016
59   TPG vs. Optus          [0, 8]              2015
60   Telstra vs. TPG        [0, 8]              2016
61   Comcast vs. Cox        [75, 100]           2015
62   Telstra vs. iiNet      [12, 20]            2016
63   Telstra vs. TPG        [8, 12]             2016
64   Comcast vs. Verizon    [8, 12]             2016
65   Comcast vs. Cox        [0, 8]              2016
66   Telstra vs. iiNet      [20, 25]            2016
67   Virgin vs. Sky         [0, 8]              2016
68   Telstra vs. Optus      [8, 12]             2016
69   Telstra vs. Optus      [12, 20]            2016
70   Telstra vs. Optus      [20, 25]            2016
71   Virgin vs. Sky         [0, 8]              2015
72   Telstra vs. TPG        [12, 20]            2016
73   Virgin vs. BT          [0, 8]              2015
74   TPG vs. Telstra        [0, 8]              2015
75   TPG vs. iiNet          [0, 8]              2015
76   Comcast vs. Verizon    [12, 20]            2016
77   Comcast vs. Cox        [100, 1000]         2015
78   Telstra vs. iiNet      [30, 50]            2016
79   Telstra vs. Optus      [30, 50]            2016
80   Telstra vs. Optus      [50, 75]            2016
81   Comcast vs. Cox        [75, 100]           2016
82   Comcast vs. Verizon    [25, 30]            2016
83   Comcast vs. Cox        [25, 30]            2016

Fig. 12. Two samples of correlation between download-speed and congestion-count: (a) negative correlation (Cox, 458 tests from 98.174.39.22) – high speed during un-congested periods, and low speed during fairly congested periods; (b) positive correlation (City of Thomasville Utilities, 896 tests from 64.39.155.194) – high speed even during highly congested periods, and low speed even during uncongested periods.

Fig. 13. Negative/positive correlation across large/small ISPs in (a) AU, and (b) US.


Fig. 14. Consistency of correlation between download-speed and congestion-count across four months: (a) negative correlation (Cox, 458 tests from 98.174.39.22), and (b) positive correlation (City, 896 tests from 64.39.155.194).

We denote by ρ the Pearson correlation coefficient between the measured download speed and the recorded congestion count. This parameter is computed across all tests corresponding to a given client IP address.
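A sketch of this per-IP computation with pandas is shown below; the column names and the minimum-test threshold are illustrative placeholders. IP addresses with a clearly positive ρ can then be flagged as likely shared (e.g., behind NAT) and excluded from household-level analysis.

```python
# Sketch: Pearson correlation between download speed and congestion count per
# client IP; strongly positive values suggest the IP aggregates several
# households (e.g., behind NAT) rather than representing a single household.
import pandas as pd

def correlation_per_ip(df: pd.DataFrame, min_tests: int = 50) -> pd.Series:
    # df columns (placeholders): client_ip, download_mbps, congestion_count
    counts = df.groupby("client_ip").size()
    eligible = counts[counts >= min_tests].index      # enough tests for a stable rho
    sub = df[df["client_ip"].isin(eligible)]
    return sub.groupby("client_ip").apply(
        lambda g: g["download_mbps"].corr(g["congestion_count"]))  # Pearson by default

# Usage idea: rho = correlation_per_ip(df); shared_ips = rho[rho > 0].index
```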

We expect ρ to be negative for any given household, as higher broadband speed should correlate with lower congestion, and this is indeed the case for a majority of client IP addresses contained in the M-Lab data. However, for some IP addresses we observe strong positive correlations (i.e., ρ > 0). Our hypothesis for this unexpected phenomenon is that multiple houses of an ISP network are sharing such an IP address: the speed measurements can vary over a wide range depending on the broadband capacity of individual households, whereas the congestion counts would have smaller variations reflecting the condition of the network. Thus, mixing measurements (speed and congestion-count) from multiple households will likely result in imbalanced data pairs, causing an unexpected positive correlation between speed and congestion-count.

To better visualize our discussion and hypothesis, we present in Fig. 12 samples of the correlation between download-speed and congestion-count observed over a 12-month period for two IP addresses. In each plot, the normalized density distribution of download-speed measurements is depicted by solid black lines. We overlay it with a scatter plot of download-speed (x-axis) and its corresponding congestion-counts (y-axis), shown by square/circle markers. Note that for a given IP address, we unit-scale (normalize) the measured download-speed and congestion-count separately by dividing each data point by the corresponding maximum value (i.e., Xi/Xmax and Yi/Ymax, where [Xi, Yi] is the pair of download-speed and congestion-count for a client IP). In our plots, the scaled value of the congestion count for each test point is proportional to the size of the corresponding marker, tiered in two colors – low/medium (i.e., < 0.5) congestion counts are in green, and high/very-high (i.e., ≥ 0.5) congestion counts are in red.

Fig. 12(a) shows a negative correlation (i.e., ρ = −0.83) for 458 test-points obtained from an IP address served by Cox ISP in the US – smaller green squares are mainly skewed towards the bottom right of the plot (low congestion and high speed values), and larger red circles are grouped in the top-left region of the plot (i.e., high congestion and low speed values). On the other hand, Fig. 12(b) shows a positive correlation (ρ = 0.69) for 896 test-points from City ISP in the US – smaller green squares are mainly spread from the left to the middle-bottom of the plot (low congestion and low/medium speed values), and larger red circles are clustered at the top-middle of the plot (high congestion and medium/high speed values).
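As a rough illustration of the unit-scaling and marker encoding described above, the sketch below plots one client IP's tests in the style of Fig. 12. It reuses the hypothetical column names from the previous snippet, and plotting details such as marker sizes are arbitrary choices, not the authors' exact settings.

import pandas as pd
import matplotlib.pyplot as plt

def plot_speed_vs_congestion(grp: pd.DataFrame) -> None:
    """Scatter of unit-scaled download-speed (x) vs. congestion-count (y) for one client IP."""
    x = grp["download_speed"] / grp["download_speed"].max()      # X_i / X_max
    y = grp["congestion_count"] / grp["congestion_count"].max()  # Y_i / Y_max
    colors = ["green" if v < 0.5 else "red" for v in y]          # low/medium vs. high/very-high congestion
    plt.scatter(x, y, s=20 + 180 * y, c=colors, alpha=0.5)       # marker size grows with congestion
    plt.xlabel("Normalized download speed")
    plt.ylabel("Normalized congestion count")
    plt.show()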

B. Large-Scale Consistency Validation

We now go back to our M-Lab data to analyze the ρ parameter across ISPs of various sizes, as well as across months, checking whether a consistent pattern of correlation is observed.

1) Across ISPs: Large ISPs such as AT&T and Comcast in the US own a wealth of public IP addresses (i.e., 91 million and 51 million respectively). Smaller ISPs, who own smaller pools of IPv4 addresses (e.g., class C blocks), are more likely forced to employ NAT (or dynamic leases in the best case) to better manage their limited address resources. On the other hand, larger ISPs that were assigned class A address blocks have the discretion to statically allocate one public IP address to each of their clients.

We, therefore, start by examining the aggregate ρ parameter for each ISP in AU and the US. We select two large and two small ISPs from each country for comparison: in Australia, Telstra and Optus as large providers, and Harbour and CEnet as small providers; in the US, Comcast and AT&T as large providers, and Hurricane and Lightower as small providers. We present in Fig. 13 the normalized density distribution of the ρ value across unique IP addresses of each ISP. We find ∼12K, ∼4K, 79, and 24 unique addresses in the networks of the Australian ISPs Telstra, Optus, Harbour and CEnet respectively, conducting a total of ∼453K, ∼182K, 5273, and 4638 NDT tests over a 12-month period (Aug'16 - Jul'17). Fig. 13(a) shows the ρ distribution for our selected operators in Australia. It is seen that the ρ parameter is predominantly negative for the large ISPs (shown by solid red lines for Telstra and dashed green lines for Optus in Fig. 13(a)), suggesting that the majority of IP addresses present in the M-Lab data (from these two large ISPs) are consistently assigned to single households.


TABLE III
ISP PAIRS WITH COMMON SUPPORT IN AU AND US (REFINED DATASET).

ISP pair ID  ISP1 vs. ISP2        Speed-tier   Year
 1           AT&T vs. Cox         [50, 75]     2016
 2           Virgin vs. BT        [50, 75]     2015
 3           Comcast vs. AT&T     [50, 75]     2015
 4           Comcast vs. Verizon  [75, 100]    2015
 5           Virgin vs. BT        [30, 50]     2015
 6           Comcast vs. Verizon  [20, 25]     2015
 7           Comcast vs. Cox      [50, 75]     2015
 8           Virgin vs. BT        [30, 50]     2016
 9           Comcast vs. Verizon  [25, 30]     2015
10           Virgin vs. Sky       [30, 50]     2016
11           Telstra vs. Optus    [25, 30]     2016
12           Virgin vs. Sky       [20, 25]     2015
13           Virgin vs. Sky       [25, 30]     2015
14           Virgin vs. Sky       [25, 30]     2016
15           Telstra vs. Optus    [50, 75]     2016
16           Virgin vs. BT        [12, 20]     2015
17           Virgin vs. Sky       [30, 50]     2015
18           Virgin vs. Sky       [12, 20]     2016
19           AT&T vs. Verizon     [75, 100]    2016
20           Comcast vs. AT&T     [30, 50]     2015
21           Comcast vs. AT&T     [20, 25]     2015
22           Virgin vs. TalkTalk  [30, 50]     2016
23           Comcast vs. Cox      [30, 50]     2015
24           AT&T vs. Cox         [8, 12]      2016
25           Virgin vs. Sky       [12, 20]     2015
26           Comcast vs. AT&T     [25, 30]     2015
27           Comcast vs. AT&T     [8, 12]      2015
28           TPG vs. Optus        [12, 20]     2015
29           Comcast vs. AT&T     [0, 8]       2015
30           TPG vs. Telstra      [12, 20]     2015
31           Comcast vs. Verizon  [50, 75]     2015
32           Comcast vs. Cox      [75, 100]    2015
33           Comcast vs. Cox      [12, 20]     2015
34           AT&T vs. Cox         [12, 20]     2016
35           Comcast vs. Cox      [8, 12]      2015
36           Comcast vs. Verizon  [0, 8]       2015
37           Comcast vs. Cox      [0, 8]       2015
38           Comcast vs. Cox      [25, 30]     2015
39           Telstra vs. iiNet    [12, 20]     2016
40           Comcast vs. AT&T     [12, 20]     2015
41           TPG vs. Telstra      [8, 12]      2015
42           Telstra vs. iiNet    [0, 8]       2016
43           Telstra vs. iiNet    [8, 12]      2016
44           Comcast vs. Verizon  [8, 12]      2015
45           Comcast vs. Verizon  [30, 50]     2015
46           AT&T vs. Verizon     [50, 75]     2016
47           Comcast vs. Verizon  [12, 20]     2015
48           Telstra vs. TPG      [8, 12]      2016
49           Telstra vs. Optus    [0, 8]       2016
50           Telstra vs. TPG      [0, 8]       2016
51           Comcast vs. Cox      [20, 25]     2015
52           Telstra vs. Optus    [8, 12]      2016
53           Telstra vs. Optus    [12, 20]     2016
54           Telstra vs. TPG      [12, 20]     2016
55           Telstra vs. iiNet    [20, 25]     2016
56           Virgin vs. Sky       [0, 8]       2015
57           Virgin vs. Sky       [0, 8]       2016
58           Virgin vs. BT        [0, 8]       2015
59           TPG vs. Telstra      [0, 8]       2015
60           TPG vs. iiNet        [0, 8]       2015
61           AT&T vs. Comcast     [25, 30]     2016
62           Telstra vs. iiNet    [30, 50]     2016
63           Telstra vs. Optus    [30, 50]     2016
64           AT&T vs. Cox         [75, 100]    2016
65           Telstra vs. Optus    [20, 25]     2016
66           AT&T vs. Verizon     [0, 8]       2016
67           AT&T vs. Cox         [0, 8]       2016
68           AT&T vs. Comcast     [8, 12]      2016
69           AT&T vs. Comcast     [0, 8]       2016
70           AT&T vs. Comcast     [75, 100]    2016
71           AT&T vs. Verizon     [8, 12]      2016
72           AT&T vs. Verizon     [25, 30]     2016
73           AT&T vs. Comcast     [12, 20]     2016
74           AT&T vs. Verizon     [20, 25]     2016
75           Comcast vs. Cox      [100, 1000]  2015
76           AT&T vs. Verizon     [12, 20]     2016
77           AT&T vs. Cox         [25, 30]     2016
78           AT&T vs. Cox         [30, 50]     2016
79           AT&T vs. Comcast     [20, 25]     2016
80           AT&T vs. Comcast     [50, 75]     2016
81           AT&T vs. Comcast     [30, 50]     2016

Fig. 15. Outliers in speed measurements.

Moreover, the ρ distribution is fairly biased towards positive values for the smaller ISPs – the average ρ is 0.31 for Harbour (its distribution is shown by dotted blue lines) and 0.58 for CEnet (its distribution is shown by dash-dotted purple lines) in Fig. 13(a), meaning that these IP addresses are mainly shared by multiple households of varied broadband capacity.

Similarly, we observe aggregate negative correlation values for large ISPs in the US, along with neutral/positive correlation for smaller ISPs, as shown in Fig. 13(b). For our selected US ISPs Comcast, AT&T, Hurricane, and Lightower, we have ∼4.3m, ∼2.8m, ∼46K, and ∼14K NDT test-points respectively, indexed by ∼98K, ∼53K, 424, and 176 unique addresses. The average ρ for the large operators Comcast and AT&T is −0.39 and −0.43 respectively, whereas the smaller operators Hurricane and Lightower exhibit positive average correlations of 0.21 and 0.10 respectively.

Fig. 16. Comparing ISP speeds for 81 pairs in AU and US (refined dataset): difference of download speed (Mbps) versus ISP pair ID.

We see on average a negative correlation between the measured download-speed and congestion-count across large network operators (with large pools of IP addresses), and positive correlation values across small network operators (with small pools of IP addresses) in the two countries, Australia and the US.

2) Across Months: We now track the correlation value within a network operator across various months to check whether changing network conditions affect the ρ value. This verifies the validity of our hypothesis over time. We, therefore, compute the ρ value for a given IP address on a monthly basis, using the data points observed within each month, e.g., April 2017.
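A minimal sketch of this monthly computation, reusing the hypothetical column names from the earlier snippets and assuming a test timestamp column named log_time, is:

import pandas as pd
from scipy.stats import pearsonr

def monthly_correlation(tests: pd.DataFrame, ip: str, min_tests: int = 10) -> pd.Series:
    """Monthly Pearson correlation between download speed and congestion count for one IP."""
    grp = tests[tests["client_ip"] == ip].copy()
    grp["month"] = pd.to_datetime(grp["log_time"]).dt.to_period("M")
    return grp.groupby("month").apply(
        lambda m: pearsonr(m["download_speed"], m["congestion_count"])[0]
        if len(m) >= min_tests else float("nan"))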


In Fig. 12, we saw the speed, congestion, and ρ value computed on aggregate data over a 12-month period for one sample IP address in each network (large and small separately). We visualize in Fig. 14 the monthly data along with the corresponding ρ values for the same IP addresses and their respective networks. We observe a strong negative correlation for the data of address 98.174.39.22 from Cox (one of the top ten large ISPs in the US), consistent across four months in 2017, as shown in Fig. 14(a). The individual monthly speed density curves (a narrow single hump) and congestion clusters are fairly similar to the plot in Fig. 12(a), and the ρ value is −0.90, −0.88, −0.82, and −0.76 for the successive months April, May, June, and July respectively.

Considering the IP address from a smaller operator in Fig. 14(b), a strong positive correlation is observed consistently across four successive months in 2016-2017. In each plot, the download-speed density curve depicts two humps and the congestion markers are aligned (green squares on the left and red circles on the right) in the opposite direction to what is expected, similar to the aggregate measurements in Fig. 12(b). We again see strong positive ρ values of 0.61, 0.62, 0.69, and 0.62 respectively for November and December 2016, and January and February 2017.

Our analysis of M-Lab data across various network operators and across various months validates that our hypothesis holds true.

C. Estimating Household Speed-Tier

We now filter out measurements corresponding to those IP addresses that exhibit a positive correlation between their download-speed and congestion-count (i.e., ρ > 0). We note that a large fraction of IP addresses from small ISPs are filtered due to their positive ρ value. For example, no data-point from CEnet (in AU) is considered to come from a single household [28].

After removing the data of multiple households, we estimate the speed-tier as a proxy for the broadband capacity of each house; we term the maximum possible speed available to a household its “speed-tier”. As far as the maximum download speed is concerned, in some cases we observe very large measurement values which are more likely to be outliers. Fig. 15 exemplifies the measured download speed from a sample household. We use solid green lines to show the density distribution of speed, overlaid by black circles stacked along the x-axis representing the actual data points. We can see several outliers around 60 Mbps in Fig. 15, while the rest of the measurements fall under 30 Mbps – the true maximum speed appears to be about half of the outlier values. The dashed vertical red line depicts the cut-off point used to filter outliers.

In order to detect and exclude outlier data in our study, we employ the standard modified Thompson Tau technique [29] to statistically determine the rejection zone. This method eliminates outliers more than two standard deviations from the mean value. After filtering outliers from our dataset, we pick the maximum value of the remaining data points as the estimated speed-tier of the corresponding house (i.e., IP address) [28].
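A sketch of this speed-tier estimation is given below. It iteratively applies the modified Thompson Tau rejection (using the usual tau formulation based on Student's t, which for typical sample sizes rejects points roughly two standard deviations from the mean) and then takes the maximum of the surviving measurements. This is an illustrative reconstruction under those assumptions, not the exact implementation used in the paper; only addresses with ρ ≤ 0 would be passed to this step, per the filtering described above.

import numpy as np
from scipy.stats import t as student_t

def estimate_speed_tier(speeds, alpha: float = 0.05) -> float:
    """Estimate a household's speed-tier: iteratively reject outliers with the
    modified Thompson Tau test, then return the maximum remaining download speed."""
    x = np.asarray(speeds, dtype=float)
    while x.size > 2:
        n = x.size
        t_crit = student_t.ppf(1 - alpha / 2, n - 2)
        tau = t_crit * (n - 1) / (np.sqrt(n) * np.sqrt(n - 2 + t_crit ** 2))
        deltas = np.abs(x - x.mean())
        worst = deltas.argmax()
        if deltas[worst] > tau * x.std(ddof=1):   # most extreme point is an outlier
            x = np.delete(x, worst)               # reject it and test again
        else:
            break
    return float(x.max())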

D. Multi-Variate Matching on Refined Dataset

Lastly, we apply the multi-variate matching technique to the refined dataset (i.e., after removing data-points of households with positive ρ and then eliminating outliers) to re-evaluate our comparison of ISP performance. As in the previous section, we begin by sorting ISP pairs in ascending order of their average difference in download speed. Note that we now have 81 ISP pairs (i.e., 2 pairs fewer than with the original dataset in the previous section) with sufficient common support – the new pairs are listed in Table III. Our matching results are shown in Fig. 16. We observe that for the majority of ISP pairs, the matching-estimated differences are closer to zero than the naive averages – though we see a few exceptions (e.g., pair IDs 22, 64, 72). This reiterates our view that when ISPs are compared fairly by adjusting for test conditions, they are not so different.
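The multi-variate matching itself follows the procedure of the previous section (matching software such as the Matching package for R [18] is designed for this purpose). Purely as a simplified illustration of the idea, and not the authors' actual pipeline, the Python sketch below matches each test of one ISP to the nearest test of the other ISP on standardized covariates and reports the mean matched difference in download speed; the covariate column names are hypothetical placeholders for attributes such as speed-tier, server distance, and TCP window size.

import pandas as pd
from sklearn.neighbors import NearestNeighbors

def matched_speed_difference(isp1: pd.DataFrame, isp2: pd.DataFrame,
                             covariates=("speed_tier", "server_rtt", "window_size")) -> float:
    """Nearest-neighbour match ISP1 tests to ISP2 tests on the given covariates and
    return the average download-speed difference over matched pairs."""
    cov = list(covariates)
    mu, sd = isp2[cov].mean(), isp2[cov].std()          # standardize so no covariate dominates
    nn = NearestNeighbors(n_neighbors=1).fit(((isp2[cov] - mu) / sd).values)
    _, idx = nn.kneighbors(((isp1[cov] - mu) / sd).values)
    matched = isp2["download_speed"].values[idx[:, 0]]
    return float((isp1["download_speed"].values - matched).mean())

Comparing this matched estimate against the naive difference of means for each ISP pair yields plots analogous to Fig. 16.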

VII. CONCLUSION

This paper is a first step towards a fair comparison of speed performance across broadband ISPs, by applying emerging causal inference techniques widely used in medicine to the large volume of measurement data from M-Lab. We first built a tool to pre-process and visualize M-Lab data, giving preliminary insights into the factors affecting speed performance. We then demonstrated that test attributes such as access speed-tier, host TCP window size, and server distance vary in distribution across ISPs, and further that these attributes affect measurement outcomes. We then applied multi-variate matching to reduce the confounding bias, and our fair comparison between pairs revealed that the difference between ISPs is much lower than what naive aggregates may suggest. Our future work will expand this study by estimating the comparative performance of ISPs for individual customers rather than just aggregates. This will be achieved using more sophisticated methods, such as machine-learning-based Targeted Maximum Likelihood (TML) algorithms, which can deal with both confounding as well as differential causal effects.

REFERENCES

[1] X. Deng, J. Hamilton, J. Thorne, and V. Sivaraman, “Measuring Broadband Performance using M-Lab: Why Averages Tell a Poor Tale,” in Proc. International Telecommunication Networks and Applications Conference (ITNAC), Sydney, Australia, Nov 2015.

[2] Netflix. Netflix ISP Speed Index. http://ispspeedindex.netflix.com/.

[3] Youtube. Youtube Video Quality Report. https://www.google.com/get/videoqualityreport/.

[4] FCC. Broadband Speed. https://www.fcc.gov/general/broadband-speed.

[5] Australian Competition and Consumer Commission. Australia’s broadband speeds.

[6] European Commission DG Communications Networks. EU analysis of broadband speed.

[7] V. Bajpai and J. Schonwalder, “A Survey on Internet Performance Measurement Platforms and Related Standardization Efforts,” IEEE Communication Surveys and Tutorials, vol. 17, no. 3, pp. 1313–1341, 2015.

[8] S. Bauer, D. Clark, and W. Lehr, “Understanding Broadband Speed Measurements,” in Proc. 38th Research Conference on Communication, Information and Internet Policy, Sep 2010.

[9] M. B. Tariq, M. Motiwala, N. Feamster, and M. Ammar, “Detecting Network Neutrality Violations with Causal Inference,” in Proc. ACM CoNEXT, Rome, Italy, Dec 2009.

[10] R. Mahajan, M. Zhang, L. Poole, and V. Pai, “Uncovering Performance Differences Among Backbone ISPs with Netdiff,” in Proc. USENIX NSDI, San Francisco, California, USA, Apr 2008.


[11] M. Jain and C. Dovrolis, “End-to-End Available Bandwidth: Measurement Methodology, Dynamics, and Relation with TCP Throughput,” in Proc. ACM SIGCOMM, Pittsburgh, PA, USA, Aug 2002.

[12] ——, “Ten Fallacies and Pitfalls on End-to-End Available Bandwidth Estimation,” in Proc. ACM Internet Measurement Conference, Taormina, Sicily, Italy, Oct 2004.

[13] M. Mirza, J. Sommers, P. Barford, and X. Zhu, “A Machine Learning Approach to TCP Throughput Prediction,” in Proc. ACM SIGMETRICS, San Diego, CA, USA, Jun 2007.

[14] J. Pearl, “Causal inference in statistics: An overview,” Statistics Surveys, vol. 3, pp. 96–146, 2009.

[15] D. B. Rubin, “Causal Inference Using Potential Outcomes: Design, Modeling, Decisions,” Journal of the American Statistical Association, vol. 100, no. 468, pp. 322–331, 2005.

[16] J. Sekhon, “The Neyman-Rubin Model of Causal Inference and Estimation via Matching Methods,” in The Oxford Handbook of Political Methodology, 2006.

[17] E. A. Stuart, “Matching Methods for Causal Inference: A Review and a Look Forward,” in Institute of Mathematical Statistics, 2010.

[18] J. Sekhon, “Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R,” Journal of Statistical Software, vol. 42, no. 7, 2011.

[19] M-Lab. Measurement Lab. http://www.measurementlab.net/.

[20] Various Authors. perfSONAR: PERFormance Service Oriented Network monitoring ARchitecture. https://www.perfsonar.net/.

[21] Ookla. Speed-Test. http://www.speedtest.net/.

[22] Samknows. Samknows internet performance platform.

[23] M-Lab. BISmark. http://www.measurementlab.net/tools/bismark/.

[24] M. Mathis, J. Heffner, and R. Raghunarayan. TCP Extended Statistics MIB. https://www.ietf.org/rfc/rfc4898.txt.

[25] J. Bleich, A. Kapelner, E. George, and S. Jensen, “Variable Selection for BART: An Application to Gene Regulation,” The Annals of Applied Statistics, vol. 8, no. 3, pp. 1750–1781, 2014.

[26] J. Semke, J. Mahdavi, and M. Mathis, “Automatic TCP Buffer Tuning,” in Proc. ACM SIGCOMM, Vancouver, British Columbia, Canada, Aug 1998.

[27] E. A. Stuart, “Matching methods for causal inference: A review and a look forward,” Statistical Science, vol. 25, pp. 1–21, 2010.

[28] X. Deng, Y. Feng, H. Habibi Gharakheili, and V. Sivaraman, “Estimating Residential Broadband Capacity using Big Data from M-Lab,” arXiv preprint arXiv:1901.07059, Jan 2019.

[29] Modified Thompson Tau Outlier Detection. http://www.statisticshowto.com/modified-thompson-tau-test/.

Xiaohong Deng received her B.Sc. and M.Sc. degrees in Computer Science from Chongqing University of Posts and Telecommunications in 2004 and Beijing University of Posts and Telecommunications in 2007 respectively, and her Ph.D. in Electrical Engineering and Telecommunications from the University of New South Wales in Sydney, Australia in 2019. She served France Telecom from 2008 to 2013 as a Network Architect and Project Lead. Her research interests include broadband networks and big data analytics of network performance data.

Yun Feng received his B.Sc. and M.Sc. degrees in Telecommunications from Xidian University in China and the University of New South Wales in Sydney in 2014 and 2018 respectively. He was a research assistant in the School of Electrical Engineering and Telecommunications at the University of New South Wales. He is currently working at Shanghai Huawei Technologies. His research interests include big data, machine learning, and applications of embedded systems.

Thanchanok Sutjarittham is currently pursuing her Ph.D. in Electrical Engineering and Telecommunications at the University of New South Wales (UNSW Sydney), where she also received her B.Eng. in Electrical Engineering and Telecommunications in 2016. Her primary research interests include the Internet of Things, sensor data analytics, and applied machine learning.

Hassan Habibi Gharakheili received his B.Sc. and M.Sc. degrees in Electrical Engineering from the Sharif University of Technology in Tehran, Iran in 2001 and 2004 respectively, and his Ph.D. in Electrical Engineering and Telecommunications from the University of New South Wales in Sydney, Australia in 2015. He is currently a Senior Lecturer at the University of New South Wales in Sydney, Australia. His current research interests include programmable networks, learning-based networked systems, and data analytics in computer systems.

Blanca Gallego Luxan received the B.S. degree from the Universidad Autonoma Metropolitana, and the Ph.D. degree from the University of California, Los Angeles. She is currently an Associate Professor at the Centre for Big Data Research in Health, UNSW. She has extensive international research experience in data analysis and computational modeling, and has made significant and innovative contributions to the design, analysis, and development of models derived from complex empirical data for a wide range of applications, such as patient safety, biosurveillance, corporate sustainability reporting, ecological footprint analysis, and climate variability.

Vijay Sivaraman received his B.Tech. from the Indian Institute of Technology in Delhi, India, in 1994, his M.S. from North Carolina State University in 1996, and his Ph.D. from the University of California at Los Angeles in 2000. He has worked at Bell Labs as a student Fellow, in a Silicon Valley start-up manufacturing optical switch-routers, and as a Senior Research Engineer at the CSIRO in Australia. He is now a Professor at the University of New South Wales in Sydney, Australia. His research interests include Software Defined Networking, network architectures, and cyber-security, particularly for IoT networks.