Drongo: Speeding Up CDNs with Subnet Assimilation from the Client

Marc Anthony Warrior, Northwestern University, [email protected]
Uri Klarman, Northwestern University, [email protected]
Marcel Flores, Northwestern University, [email protected]
Aleksandar Kuzmanovic, Northwestern University, [email protected]

ABSTRACT
Currently, the attempt to choose the "best" content replica server for a client is carried out solely by CDNs. While CDNs have a decent view of load distribution and content placement, they receive little input from the clients themselves. We propose a hybrid solution, subnet assimilation, where the client participates in the server selection process while still leaving the final say to the CDN. Subnet assimilation allows clients to declare their own "network location," different from the actual one, which in turn drives a CDN towards making better decisions. To demonstrate, we introduce Drongo, a client-side system, readily deployable on existing clients without any changes to the CDNs, that employs subnet assimilation to dramatically improve replica server selection. We implemented and extensively evaluated Drongo on a set of 429 clients spread across 177 countries and 6 major CDNs. We show that Drongo affects 69.93% of all clients, prompting better CDN replica choices which reduce the latency of affected requests by up to an order of magnitude and by 24.89% on average across six major providers, with Google's performance improving by 50% in the median case. Our results indicate that client participation holds great opportunities for the advancement of CDN performance.

CCS CONCEPTS
• Networks → Application layer protocols; Naming and addressing; Location based services;

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CoNEXT '17, December 12–15, 2017, Incheon, Republic of Korea
© 2017 Association for Computing Machinery.
ACM ISBN 978-1-4503-5422-6/17/12. . . $15.00
https://doi.org/10.1145/3143361.3143365

KEYWORDS
DNS, CDN, replica server, subnet

ACM Reference Format:
Marc Anthony Warrior, Uri Klarman, Marcel Flores, and Aleksandar Kuzmanovic. 2017. Drongo: Speeding Up CDNs with Subnet Assimilation from the Client. In CoNEXT '17: The 13th International Conference on emerging Networking EXperiments and Technologies, December 12–15, 2017, Incheon, Republic of Korea. ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3143361.3143365

1 INTRODUCTION
Latency between clients and servers on the Internet is a key measure that fundamentally affects users' perception of Internet speed. It directly impacts end-to-end page load time, which in turn affects user experience and business revenues [50]. Amazon has reported that every 100 ms increase in page load time costs them 1% in sales [18]. Client-server latency is essential for other killer apps such as web search [9] and video streaming [37]. Reduced latency directly improves throughput, which has been shown both theoretically [43] and empirically [47]. It is thus no exaggeration to say that literally every single millisecond of reduced client-server latency on the Internet counts.

Significant efforts have been invested in an attempt to move servers closer to the clients via Content Distribution Networks (CDNs), which distribute content via hundreds of thousands of servers worldwide [1, 11]. Additionally, to minimize the client-server latency, such systems perform extensive network and server measurements and use them to redirect clients to different servers. While this significantly improves performance, it is no secret that CDNs do not always direct clients to the CDN replica that is closest in the network sense [47].


This happens for several reasons: (i) CDNs' mapping of the Internet isn't perfect, particularly for regions that are more distant from the core CDN infrastructure [19, 38, 47]; (ii) CDNs aim to balance the traffic load or deploy other policies, which may conflict with minimizing the client-server latency; and (iii) comprehensively measuring the Internet to capture latency variations between CDN replicas and the rest of the IP space over short timescales is challenging.

In this paper, we demonstrate that client-server latency is far from optimal. We embrace a hybrid approach that enables clients to join CDNs in addressing the above challenges, as follows: (i) Clients help with CDN replica mapping; indeed, it is far easier for a single client to find a subnet that is consistently suggested well-performing replicas with respect to the client than for a CDN to evaluate the entire IP space. (ii) Our approach respects load balancing and other CDN policies: CDNs still make all the replica selection decisions, and clients respect those decisions. (iii) CDNs' measurement burden is shared with clients, who measure CDNs and help them come up with better, more fine-grained decisions. Most importantly, all of this is readily available today, with only minor client-side upgrades, without any changes to CDNs or to existing protocols, and without the need for broad adoption of our system.

We devise a simple heuristic and a lightweight method that helps clients realize when they are not served by a nearby CDN replica. After obtaining a replica selection from a CDN, the client traceroutes the path towards that replica and explores if the upstream hops on the path are potentially directed to different replicas. This is done by utilizing the EDNS0 client subnet extension (ECS) [32] to issue DNS requests on behalf of hops. If hops are indeed directed to different replicas, then it is possible that the latency between the client and a replica recommended to a hop is smaller than the latency between the client and the replica recommended to the client.¹ We refer to such a scenario as a latency valley. Our goal is not to find lower-latency replicas – we aim to find subnets to which lower-latency replicas, relative to the client, are consistently suggested. By ultimately leaving the decision and access systems entirely up to the CDN, we ensure that any additional policies a CDN may have (such as network access restrictions and load balancing) are unaffected.

By conducting experiments on six major content providers, we show that latency valleys are common phenomena and demonstrate that they systematically speed up object downloads. In particular, latency valleys can be found across all the CDNs we have tested: 26–76% of routes towards CDN replicas discover at least one latency valley. Our key practical contribution lies in showing that valley-prone subnets that lead to these lower-latency replicas are easily found from the client, are simple to identify, incur negligible measurement overhead, and are persistently valley-prone over timescales of days.

¹ We emphasize that all of our measurements are performed between the client and a CDN replica; no measurements are performed directly from upstream nodes.

Most importantly, we show that we can effectively leverage latency valleys via valley-prone subnets with subnet assimilation, an approach in which clients use ECS to improve CDN replica mapping.

We implement and evaluate Drongo, a client-side system that leverages upstream subnets to consistently receive lower-latency replicas than what the client would ordinarily have been recommended. Our measurements show that Drongo can improve requests' latency by up to an order of magnitude. Moreover, we evaluate the optimal parameters to capture these latency gains and find that 5 measurements on the timescale of days are the only requirement to maximize Drongo's gains. Using the optimal parameters, Drongo affects the replica selection of 69.93% of clients, and affected requests experience a 24.89% reduction in latency in the median case. Moreover, Drongo's significant impact on these requests translates into an overall improvement in client-perceived aggregate CDN performance.

We make the following contributions:

• We introduce the first approach to enable clients to actively measure CDNs and effectively improve their selection decisions, while requiring no changes to the CDNs and while respecting CDNs' policies.

• We extensively analyze client-side CDN measurements and determine critical time-scales and key parameters that empower clients to leverage their advanced views of CDNs.

• We introduce Drongo, a client-side CDN measurement crowd-sourcing system, and demonstrate its ability to substantially improve CDNs' performance in the wild.

2 PREMISE

2.1 Background
CDNs attempt to improve web and streaming performance by delivering content to end users from multiple geographically distributed servers, typically located at the edge of the network [1, 4, 11, 21]. Since most major CDNs have replicas in ISP points-of-presence, clients' requests can be dynamically forwarded to close-by CDN servers. Historically, one of the key reasons for systematic CDN imperfections was the distance between clients and their local DNS (LDNS) servers [23, 35, 39, 45]. This issue was further dramatically amplified in recent years (see [28]) with the proliferation of public DNS resolvers, e.g., [5, 10, 12, 15, 20, 22].

In an attempt to remedy poor server selection resulting from LDNS servers, there has been a recent push, spearheaded by public DNS providers, to adopt the EDNS0 client subnet option (ECS) [42]. With ECS, the client's IP address (truncated to a /24 or /20 subnet for privacy) is passed through the recursive steps of DNS resolution, as opposed to passing the LDNS server's address.


However, even when the actual client's network location is accurate, numerous factors prevent CDNs from providing optimal CDN replica selection [38]. First, creating accurate network mapping for the Internet is a non-trivial task, given that routing can inflate both end-to-end latency and latency variance, e.g., [44]. As a result, CDN selections could be heavily under-performing (see [38] for a thorough analysis). The underlying causes are diverse, including lack of peering, routing misconfigurations, side-effects of traffic engineering, and systematic diversions from destination-based forwarding [33]. Second, CDN measurements are necessarily coarse-grained – measuring each and every client (each individual IP address) from a CDN is impossible for scalability reasons, and because the actual source IP address isn't available due to ECS truncation. On top of this, CDNs often have other optimization goals, e.g., load balancing, which can divert clients further away from close-by replicas.

Our key idea is to enable clients to join CDNs in addressing the above challenges. In particular, it is far easier for a single client to find a subnet that is consistently recommended well-performing replicas than it is for a CDN to evaluate the entire IP space. Not only does the client-supported approach scale, it enables far better CDN replica selection decisions. In summary, in our approach, (i) clients conduct measurements to find other subnets that lead to potentially lower-latency CDN replicas, (ii) they evaluate the performance of such subnets and associated replicas via light, infrequent measurements over long time-scales, and (iii) finally, they utilize these subnets to drive CDNs' decisions towards lower-latency replicas.

2.2 Respecting CDN Policies and Incentives for Adoption

As stated above, CDNs can sometimes deploy policies that can prevent them from serving clients nearby replicas. In principle, there are two such scenarios. First, on longer timescales, a CDN may have a business logic where it wants a certain IP subnet, and no one else, to be able to use a certain CDN cluster or server. For example, an ISP may allow a CDN to deploy a CDN server inside its network, but on the condition that only IP addresses owned by the ISP may benefit from that server. Our approach is completely compatible with such arrangements, because such a policy can be easily enforced via access network-level firewalls, which forces our system to avoid such CDN replicas. Second, on shorter timescales, CDNs may deploy load-balancing policies that distribute the traffic load such that clients are not always directed to the closest replica server. Our system respects such load-balancing policies because it always selects the first CDN replica from a recommended set, i.e., it does not opportunistically select the "best" replica from the set.

Clients have clear incentives to adopt our approach, since it directly improves their performance. While it is certainly the case that CDNs share the same incentives for our system's adoption, this is even more true with the proliferation of CDN brokers, e.g., [8]. CDN brokers actively measure performance to multiple CDNs, and they can redirect a client to a different CDN on the fly in case the QoE isn't satisfactory [40]. It has been demonstrated that this particularly hurts large CDNs, which often have better replicas in the vicinity of the clients, which the broker is unfortunately unaware of; this leads to the loss of clients and revenues for CDNs [40]. Thus, utilizing clients' help in selecting the best CDN replicas in their vicinity directly benefits CDNs.

We thus argue that an unrestricted adoption of the ECS option, which is the key mechanism that enables the subnet assimilation used by our system (Section 2.3), is in the best interest of every CDN. In particular, the unrestricted ECS option enables a client to use the ECS field to specify a subnet different from the client's, to change the way a CDN maps the client to its replicas. While most CDNs do enable ECS in its unrestricted form, e.g., Google and EdgeCast among others, Akamai does so only via the OpenDNS [16] and GoogleDNS [13] public DNS services, using the actual IP address of the requester. As such, Akamai's CDN is currently not directly usable by our system, as our system requires sending ECS requests on behalf of hop IPs. One possible reason for such a restriction imposed by Akamai might be the ability of third parties to accurately reverse-engineer a CDN's scale and coverage without significant infrastructural resources, e.g., [26, 46]. We note, however, that even without unrestricted ECS, Akamai's network has been quite comprehensively analyzed in the past, e.g., [36, 47, 48]. One of our main contributions lies in showing that the benefits achievable by letting clients help improve CDNs' decisions far outweigh the potential drawbacks of unrestricted ECS adoption.

2.3 Subnet Assimilation
Subnet assimilation is the deliberate use of the ECS field to specify a subnet different from the client's, to change the way a CDN maps the client to its replicas. In this paper, we show that the assimilation of subnets found along the path between the client and its "default" replica may allow the client to "reposition" itself in the CDN's mapping scheme such that the client will consistently receive lower-latency replica recommendations from the CDN. Subnet assimilation is a key mechanism, used both to detect and to utilize lower-latency CDN replicas.
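To make the mechanism concrete, the following sketch shows how a client could place an arbitrary subnet in the ECS field of a DNS query, using the Python dnspython library. This is an illustrative sketch, not the paper's implementation; the resolver address (Google's public DNS), the example hostname, and the example subnet are assumptions.

# Sketch: a DNS query whose EDNS0 Client Subnet (ECS) option carries a subnet
# other than the client's own ("subnet assimilation"). Requires the dnspython package.
import dns.edns
import dns.message
import dns.query
import dns.rdatatype

def resolve_with_assimilated_subnet(hostname, subnet_ip, prefix_len=24,
                                    resolver="8.8.8.8"):
    """Resolve `hostname` while claiming the request originates from
    `subnet_ip`/`prefix_len` rather than from our real address."""
    ecs = dns.edns.ECSOption(subnet_ip, prefix_len)   # the assimilated subnet
    query = dns.message.make_query(hostname, dns.rdatatype.A,
                                   use_edns=0, options=[ecs])
    response = dns.query.udp(query, resolver, timeout=5)
    # The A records returned are the replica set suggested for that subnet.
    replicas = []
    for rrset in response.answer:
        if rrset.rdtype == dns.rdatatype.A:
            replicas.extend(r.address for r in rrset)
    return replicas

# Example (hypothetical hostname and subnet):
# print(resolve_with_assimilated_subnet("cdn.example.com", "203.0.113.0"))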

Latency valley. Figure 1 illustrates the key heuristic that helps clients realize when they are not optimally served by a CDN. Consider a client C redirected to a CDN server S1, as illustrated in the figure.


Figure 1: Illustration of a latency valley. (The figure shows a client C served by replica S1, hops H1–H4 along the path, an alternative replica S2, and latency on a 0–100 ms scale.)

If a hop on the path is not redirected to the same server S1, then it is possible that a lower-latency replica, e.g., S2, has been observed, such that the latency between C and S2 is smaller than the latency between C and S1. We refer to such a scenario as a latency valley.

For the remainder of this paper, we refer to the set of replicas suggested to the client's actual subnet as the client replica set (CR-set), and a replica from the CR-set as a client replica (CR). We refer to the routers along the presumed path from the client to any given CR as hops. Next, we refer to the set of replicas suggested to a hop, discoverable via subnet assimilation, as a hop replica set (HR-set). We denote a measurement from the client to a CR as a client replica measurement (CRM). Likewise, we denote a replica from the HR-set as a hop replica (HR), and a measurement to that replica from the client as a hop replica measurement (HRM). Finally, we define a latency valley to be any occurrence of the following inequality: HRM/CRM < vt ≤ 1. We take vt = 1 until we optimize our parameter selection in Section 5.1.

It is imperative to note that our goal is not to find lower-latency replicas, perhaps included in an HR-set. Instead, we aim to find subnets to which lower-latency replicas are consistently suggested; in other words, subnets that are prone to valley occurrences in replica sets obtained at any time.

2.4 Identifying Latency Valleys
In order to identify latency valleys, we first identify the hops along the path, which we achieve using traceroute. We are using the upstream path for simplicity; the downstream path, which may be asymmetric, could yield different results, but would require more overhead or control of the CDN for execution. We then identify the corresponding HR-set for each hop using ECS queries, which assimilate the hops' subnets. Lastly, we must compare the HRMs to the CRM. Unless otherwise noted, we calculate HRMs and CRMs using ping RTTs (obtained by averaging the results of three back-to-back pings), and we refer to these values as latencies. Throughout this paper, latency units are milliseconds. For simplicity, we refer to the ratio HRM/CRM, used in our latency valley definition, as the latency ratio. A latency ratio below 1 indicates that the HR has outperformed the CR; a latency ratio above one indicates the CR has outperformed the HR; and finally, a latency ratio equal to 1 indicates that the CR and HR performed equally (which, in general, only happens when CR and HR refer to the same replica).
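The measurement and the resulting latency ratio can be sketched as follows. This is not the paper's code; it assumes a Unix-like ping whose output contains "time=" fields, and it simply averages three back-to-back probes, as described above.

# Sketch: HRM/CRM latency ratio from averaged ping RTTs.
import re
import subprocess

def avg_ping_ms(ip, count=3):
    """Average RTT in milliseconds over `count` back-to-back pings; None on failure."""
    out = subprocess.run(["ping", "-c", str(count), ip],
                         capture_output=True, text=True).stdout
    rtts = [float(m) for m in re.findall(r"time=([\d.]+)", out)]
    return sum(rtts) / len(rtts) if rtts else None

def latency_ratio(hop_replica_ip, client_replica_ip):
    """HRM/CRM: a value below 1 means the hop's replica outperformed the client's."""
    hrm = avg_ping_ms(hop_replica_ip)
    crm = avg_ping_ms(client_replica_ip)
    if hrm is None or crm is None or crm == 0:
        return None
    return hrm / crm

def is_latency_valley(ratio, vt=1.0):
    """Valley definition from Section 2.3: HRM/CRM < vt <= 1 (vt = 1 until Section 5.1)."""
    return ratio is not None and ratio < vt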

The above operation cannot be executed on-the-fly, as the time consumed by this process will outweigh any latency gains achieved. Moreover, multiple measurements might be required. Fortunately, our experiments in Section 3 show that a small number of measurements per hop – less than 10 – are required, and that the measurement results remain applicable on timescales of days. Because we can rely on measurements obtained in idle time, real-time measurements are not necessary for our system. Thus, our proposed technique will use past measurements to predictively choose a good subnet for future assimilation.

3 EXPLORING VALLEYS
In this section, we establish that latency valleys are common phenomena in the wild and that they offer substantial performance gains. In addition, we lay the groundwork for finding valleys, a prerequisite to harnessing their performance gains via subnet assimilation.

3.1 Testing for Valleys
We perform our preliminary tests using PlanetLab nodes, a platform which offers a large number of vantage points from around the globe, primarily deployed in academic institutions [31]. We further investigate latency valleys as experienced by a large variety of clients using the RIPE Atlas platform [17], as detailed in Section 5. While PlanetLab lacks the scale and variety offered by RIPE Atlas, the flexibility and freedom it provides made it a prime choice for our preliminary analysis. Our PlanetLab experiments use 95 nodes spread across 51 test locations, and we obtain the IP addresses of hops between nodes and CDN replicas via traceroute.

While latency valleys are a common occurrence, as we demonstrate below, many hops between the clients and the provider's servers can be discarded when searching for valleys.


One category of such hops is of those which reside within the same local network as the client, which will be suggested the same replicas as the client. Another disposable category – hops with private IP addresses – will not be recognized by the EDNS server and will yield generic replica choices regardless of the hops' locations. To ensure that a hop is indeed usable, a hop must (i) belong to a different /16 subnet than the client, (ii) have a different domain than the client, and (iii) belong to a different ASN than the client. We only filter hops that fail these conditions at the beginning of the route; once a hop is observed that meets the above three constraints, we stop filtering for the remainder of the route.
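A sketch of this hop filtering is shown below. It is illustrative only: the domain and ASN lookups are left to caller-supplied functions (e.g., reverse DNS and an IP-to-ASN database), and, as a minor simplification, private addresses are dropped throughout the route rather than only at its beginning.

# Sketch: selecting usable hops from a traceroute before issuing ECS queries for them.
import ipaddress

def same_slash16(ip_a, ip_b):
    """True if both addresses fall within the same /16 subnet."""
    return ipaddress.ip_address(ip_b) in ipaddress.ip_network(f"{ip_a}/16", strict=False)

def usable_hops(client_ip, hop_ips, asn_of, domain_of):
    """Return hops that differ from the client in /16 subnet, domain, and ASN.
    Filtering stops once the first usable hop is observed, as described above."""
    usable, past_filter = [], False
    for hop in hop_ips:
        if ipaddress.ip_address(hop).is_private:
            continue                       # private hops yield generic ECS answers
        if not past_filter:
            if (same_slash16(client_ip, hop)
                    or domain_of(hop) == domain_of(client_ip)
                    or asn_of(hop) == asn_of(client_ip)):
                continue                   # still within the client's own network
            past_filter = True             # first usable hop found; stop filtering
        usable.append(hop)
    return usable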

3.1.1 Provider Selection. While ECS is gaining momentum among CDNs, as it allows them to better estimate their clients' location, it is important to note that ECS was only recently deployed, and its implementation varies among the different providers. "A Faster Internet," an initiative headed by Google and OpenDNS, is drawing together a growing number of large CDNs and content providers to adopt and promote the adoption of ECS [42]. In order to attain a set of CDNs for our tests, we have selected those providers which implement an unrestricted form of ECS, as described in Section 2.2.

In order to best select the CDNs for our tests, we scraped URLs from over 3,000 sites arbitrarily selected from the Alexa Top 1M list [2]. Then we reduced our set of URLs to those that ended in a known file type, in order to ensure our ability to perform download tests on the selected URLs. When applicable, we resolved CNAME domains to their respective CDN domains. We then performed multiple DNS queries for each URL to determine if unrestricted ECS is implemented. From the remaining URLs, we have extracted the 6 CDN domains which have appeared most frequently, along with their respective URLs.

Before we can begin seeking subnets with better mapping-groups, it is first necessary to prove that subnets with different mapping-groups exist and are easy to find. We define usable route length as the number of hops along the path that fulfill the above filtering. Next, we define divergence as the fraction of usable hops along the path which were recommended at least one replica that was not recommended to the client.
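Both route-level metrics are straightforward to compute once the HR-sets are known; a minimal sketch, assuming `hop_replica_sets` maps each usable hop to its HR-set and `client_replicas` is the CR-set:

# Sketch: usable route length and divergence, as defined above.
def usable_route_length(hop_replica_sets):
    """Number of usable hops on the path (those that passed the filtering above)."""
    return len(hop_replica_sets)

def divergence(hop_replica_sets, client_replicas):
    """Fraction of usable hops recommended at least one replica not recommended
    to the client itself."""
    cr = set(client_replicas)
    diverging = sum(1 for hr_set in hop_replica_sets.values() if set(hr_set) - cr)
    return diverging / len(hop_replica_sets) if hop_replica_sets else 0.0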

Figure 2 depicts usable route length and divergence for different CDNs. The figure shows that, for example, when requesting replicas from Google, each client had an average usable route length of 7.8, and the divergence is approximately 0.92. It is evident from Figure 2 that hops are indeed suggested different replicas than their client. As we encounter a wider variety of recommendations, we increase our number of opportunities to find latency valleys.

Figure 2: Average divergence and average usable route length per CDN (Google, CloudFront, Alibaba, CDNetworks, ChinaNetCtr, CubeCDN).

The high divergence shown in Figure 2 indicates that other, and possibly lower-latency, replicas can be mapped to clients if they were to assimilate their hops' IP addresses.

For the remainder of this paper, we use the following set of providers: Google, Amazon CloudFront, Alibaba, ChinaNetCenter, CDNetworks, and CubeCDN, a set which is both diverse and comprehensive. Google's CDN infrastructure is massive and dispersed: as of 2013, around 30,000 CDN IP addresses spread across over 700 ASes were observable in one day [25]. As of this writing, Amazon CloudFront offered over 50 points-of-presence spread throughout the world [6]. CDNetworks offers over 200 points-of-presence, also globally dispersed, and heavily employs anycast for server selection. Alibaba and ChinaNetCenter offer over 500 CDN node locations each within China, where they are centered, in addition to a growing number of service locations outside of China [3, 30]. Finally, CubeCDN is a smaller CDN, with locations spread primarily across Turkey [14].

3.1.2 Test Execution. In order to assert the existence of latency valleys in the wild, we have executed a series of trials using the aforementioned URLs and the PlanetLab nodes as clients. Each trial consists of the following steps (a sketch of one trial, composed from the helpers above, follows the list):

(1) We randomly select a URL.
(2) The client retrieves the CR-set for the selected URL's domain.
(3) For each CR, the client uses traceroute to identify hops.
(4) Using subnet assimilation, the client retrieves the HR-set for each hop.
(5) The client measures CRMs for every CR in its CR-set, and HRMs for every HR it has seen (across all HR-sets obtained in that trial).
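The sketch below strings the earlier helper sketches together into one trial. It is an approximation of the procedure, not the measurement code used in the paper: the traceroute parsing is simplified, and the ASN/domain lookup functions are assumed to be supplied by the caller.

# Sketch of a single trial (steps 1-5 above).
import random
import re
import subprocess

def traceroute_hops(ip):
    """Best-effort parse of the hop IPs on the path toward `ip` (numeric traceroute)."""
    out = subprocess.run(["traceroute", "-n", ip],
                         capture_output=True, text=True).stdout
    return re.findall(r"^\s*\d+\s+([\d.]+)", out, flags=re.MULTILINE)

def run_trial(domains, client_ip, client_subnet, asn_of, domain_of):
    domain = random.choice(domains)                                     # step (1)
    cr_set = resolve_with_assimilated_subnet(domain, client_subnet)     # step (2)
    hops = set()
    for cr in cr_set:                                                   # step (3)
        hops.update(usable_hops(client_ip, traceroute_hops(cr), asn_of, domain_of))
    hr_sets = {hop: resolve_with_assimilated_subnet(domain, hop)        # step (4)
               for hop in hops}
    crms = {cr: avg_ping_ms(cr) for cr in cr_set}                       # step (5)
    hrms = {hr: avg_ping_ms(hr)
            for hr_set in hr_sets.values() for hr in set(hr_set)}
    return cr_set, hr_sets, crms, hrms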


Figure 3: Scatter plot of HRMs and minimum CRMs. The area below the diagonal is the valley region.

We have performed a series of 45 trials per PlanetLab node, executed between one and two hours apart. For the remainder of this section, we refer to this dataset.

3.2 Valley Prevalence
In order to establish the prevalence and significance of latency valleys, we now compare HRMs to their respective CRMs. It is important to note that it is possible that a client has multiple CRs, and hence multiple CRMs, and in practice a client can only choose one replica. Therefore, we compare all HRMs to the minimum CRM obtained in their trial, thus establishing a lower bound for our expected performance gain and allowing us to easily assert any gains or losses provided by the alternative replica choices (the HRs).

With our PlanetLab dataset, we compared each HRM to the minimum CRM, i.e., the best client replica, obtained in the same respective trial. Figure 3 shows the results of this assessment. We plot HRM on the y axis and minimum CRM on the x axis, thus creating a visual representation of the latency ratio. We distinguish between the CDNs using different colors. To better explain the results, we draw the equality line, where HRM = minimum CRM. Every data point along this line represents a hop's subnet which is being suggested a replica that performs on a par with the replicas suggested to the client's subnet. More importantly, every data point below the equality line represents a valley occurrence, as the alternative replica's HRM is smaller than the CRM. The percentage of valleys by provider ranges from 14.02% (CloudFront) to 38.58% (CubeCDN), with an average of 22% across all providers. Table 1 shows the results; Column 2 (% Valleys Overall) lists these percentages for each CDN.

Now that we have established that valleys exist, we wish to determine where we can expect to find valleys. We continue using the minimum CRM from each trial, for reasons described above. However, in order to better reflect practical scenarios where only one of a given set of replicas is used, we now also begin choosing an individual replica from an HR-set. Instead of choosing the minimum HRM for a given hop, we conservatively choose the median: each hop's chosen HRM is the median of its HR-set for that trial. We continue this pattern (choosing the client's best CRM from the trial and a hop's median HRM) for the remainder of our PlanetLab data analysis. Our PlanetLab data thus represents a lower bound on valleys and their performance implications: we compare the median performance of our proposed procedure to the absolute best the existing methods have to offer. In Section 5, using our RIPE Atlas testbed, we remove this constraint to demonstrate the real-world performance of our system.

We further wish to consider how common valleys are within a given route from the client to the provider. Column 3 (Average % of Valleys per Route) of Table 1 shows, given some route in one trial, the average percent of usable hops that incur valleys. We see for Alibaba that, on average, over a third of the usable hops in a given route are likely to incur valleys, and for CDNetworks, nearly a fourth of the route. Meanwhile, for CloudFront, we see that on average valleys occur for only a small portion of hops in a given route. Column 4 (% Routes with a Valley) details the percentage of all observed routes that contained at least one valley in the trial in which they were observed. For Alibaba and CDNetworks, around 75% of the observed routes contain valleys, while more than 50% do for Google.

3.2.1 Do Valleys Persist? Next, we assess whether subnets are persistently valley-prone: for some number of trials, how often can the client expect a valley to occur from some particular hop? To answer this, we need to be able to describe how frequently valleys occurred for some hop subnet in a given set of trials.

Valley frequency. We define a new, simple metric, the valley frequency, as follows: if we look at a hop-client pair across a set of N trials, and v trials are valleys, the valley frequency is

vf = v / N.
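In code, the metric is a one-liner over a hop-client pair's past latency ratios; a minimal sketch (the list of per-trial ratios and the threshold are inputs the caller is assumed to provide):

# Sketch: valley frequency vf = v/N over a window of trials for one hop-client pair.
def valley_frequency(trial_ratios, vt=1.0):
    """Fraction of trials whose latency ratio HRM/CRM fell below the valley threshold."""
    valleys = sum(1 for r in trial_ratios if r is not None and r < vt)
    return valleys / len(trial_ratios) if trial_ratios else 0.0

# Example: three of five trials are valleys, so vf = 0.6.
# valley_frequency([0.8, 1.1, 0.7, 0.95, 1.3])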


Table 1: Detailed findings for each provider based on PlanetLab data

Provider           | % Valleys Overall | Avg. % Valleys per Route | % Routes with Valley | % Hop-Client Pairs w/ Valley Freq > 0.5
Google             | 20.24             | 16.41                    | 53.30                | 10.98
Amazon CloudFront  | 14.02             | 8.72                     | 25.82                | 10.00
Alibaba            | 33.68             | 35.94                    | 75.83                | 30.97
CDNetworks         | 15.61             | 24.41                    | 73.08                | 14.09
ChinaNetCenter     | 27.42             | 14.26                    | 38.10                | 16.74
CubeCDN            | 38.58             | 17.95                    | 25.49                | 26.32

Figure 4: CDFs of clients by valley frequency, per provider: (a) ping; (b) total download time; (c) total post-caching download time. In 4a, the subnet-response measurements are ping times averaged from bursts of 3 back-to-back pings. In 4b, the subnet-response measurements are total download times on first attempts, while in 4c, the subnet-response measurements are total download times on consecutive attempts (repeated downloads that take place immediately after the first attempt, to account for the potential impact of caching). Downloads were performed by using curl, where we set an IP as a destination and set the domain as the HOST attribute. Measurements for 4b and 4c were performed back-to-back, so that 4c reflects download times with presumably primed caches.

A frequency of 0.5 would imply that in half the trials performed, a valley was found in said hop. For each provider, we plot the valley frequency for each hop-client pair across our PlanetLab dataset.

Figure 4 shows the results. Note that Figures 4b and 4c use the total download time and the post-caching total download time, respectively, for the subnet measurements, as opposed to pings.² We further note that their results closely follow those obtained using pings, as in Figure 4a. For simplicity, and due to download limitations of our RIPE testbed, we revert to only using pings for the remainder of this paper.

Figure 4 shows that approximately 5-20% of hop-client pairs resulted in valleys 100% of the time (see y-axis in the figures for x=1.0) for every trial across the entire three-day test period. Such hops will be easy to find given a large enough dataset. However, of more interest to us are hop-client pairs that inconsistently incur valleys. For example, perhaps if valleys can be found 50% of the time for some hop-client pair – i.e., a valley frequency of 0.5 – that hop may be a sufficiently persistent producer of valleys for us to reliably expect good replica choices. This would provide our system with more opportunities to employ subnet assimilation and, ideally, improve performance. Column 5 (% Hop-Client Pairs with Valley Frequency > 0.5) of Table 1 shows the percentage of hop-client pairs that have valley frequencies (calculated across all 45 trials) greater than 0.5, i.e., hop-client pairs that incurred valleys in the majority of their trials.

² We fetched .png and .js files, 1 kB – 1 MB large, hosted at the CDNs.

3.2.2 Can We Predict Valleys? While valley frequency can tell us how often valleys occurred on a past dataset, the metric makes no strong implications about what we can expect to see for a future dataset. To answer this, let us consider how the latency ratio changes as we increase the time passed between the trials we compare. In addition, per our reasoning in the vf metric, we may want to compare a set of consecutive trials – a window – to another consecutive set of the same size, rather than only comparing individual trials (which is essentially comparing windows of size 1). Given a set of trials, we take a window of size N and compare it to all other windows of size N, including those that overlap, by taking the difference in their latency ratios (HRM/CRM).


Figure 5: In both figures, we compare the change in latency ratio between two trial windows to the distance in time between the two windows, for window sizes 1, 5, 10, and 15. In figure (a), we use all hop-client pairs. In figure (b), we restrict our set to pairs that experience at least one valley in at least one of their 45 trials.

If we plot this against the time distance between the windows, we can observe the trend in latency ratios as the distance in time between windows increases. An upward slope would imply that the latency ratio is drifting farther apart as the time gap increases, while a horizontal line would imply that the change in latency ratio is not varying with time (i.e., windows an hour apart are as similar to each other as windows days apart). A jagged or wildly varying line would indicate that future behavior is unpredictable.
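A sketch of this window comparison is given below, under the assumption that each trial for a hop-client pair is stored as a (timestamp in hours, latency ratio) tuple. It buckets the absolute difference between two windows' median latency ratios by the time distance between the windows, which is the quantity plotted in Figure 5.

# Sketch: drift of the median latency ratio as a function of window distance.
from collections import defaultdict
from statistics import median

def ratio_drift_by_distance(trials, window=5):
    """Map rounded time distance (hours) -> list of |median latency ratio differences|."""
    windows = [trials[i:i + window] for i in range(len(trials) - window + 1)]
    drift = defaultdict(list)
    for i, w1 in enumerate(windows):
        for w2 in windows[i + 1:]:
            dist = round(w2[0][0] - w1[0][0])     # hours between the windows' starts
            m1 = median(r for _, r in w1)
            m2 = median(r for _, r in w2)
            drift[dist].append(abs(m1 - m2))
    return drift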

Figure 5 plots the latency ratio versus distance in time for several window sizes, ranging from 1 to 15. In these plots, we use the windows' median latency ratio values for comparison. In particular, in Figure 5a we use our entire dataset, i.e., we consider all hop-client pairs, independently of if they ever incur latency valleys or not, and plot latency ratio differences over time. For example, assume that for a client-hop pair, HRM = 80 ms at one measurement point, while it rises to 120 ms at another measurement point. Meanwhile, assume that CRM remains 100 ms at both measurement points. Thus, the latency ratio HRM/CRM changes from 80/100 to 120/100, i.e., from 0.8 to 1.2. The latency ratio difference is 0.4. Figure 5a shows that the latency ratio difference values both increase and vary wildly as windows become more distant. This shows that hop-subnet performance, overall, is likely extremely unpredictable. We hypothesize this results from unmapped subnets receiving generic answers, as observed in [47]. However, subnet assimilation requires that we have some idea of how a subnet will perform in advance.

Since valley-prone subnets are of particular interest to us, in Figure 5b we reduce our dataset to only include client-hop pairs that have at least one valley occurrence across all 45 of their trials combined. The plot's behavior becomes dramatically more stable after applying this simple, minor constraint. The effects of the window size also become apparent in Figure 5b. Even with a window of size 1, the slope is very small and the line is nearly flat: after over 50 hours, two windows are only 10% to 20% more dissimilar than two windows a single hour apart. For window sizes greater than 5, latency ratios are often within 5% of each other, regardless of their distance in time. Going from a window size of 1 to 5 shows drastic improvement, while each following increase in window size shows a smaller impact. The results clearly show that a client can effectively identify valley-prone subnets and predict valleys with a sufficiently long window size.

3.2.3 Valley Utility Analysis. Figure 6 shows the distribution of the lower-bound latency ratios seen by the set of all valley occurrences for each provider. The latency ratio represents a lower bound because we use the minimum latency measure for replicas recommended to the client. For example, if the minimum CRM is 100 ms and the HRM is 80 ms, then the latency ratio is 0.8. Thus, the closer to 0 the latency ratios are, the larger the gain from subnet assimilation. The red line shows the median, the box bounds the 25th and 75th percentiles, and the whiskers extend up to data points 1.5 times the interquartile range (75th percentile - 25th percentile) above the 75th percentile and below the 25th percentile. Data points beyond the whiskers, if they exist, are considered outliers and not shown.

Figure 6 shows that most of the providers show opportunities for significant latency reduction. From this plot, it appears that Amazon CloudFront and ChinaNetCenter offer the greatest potential for gains. With the exception of CDNetworks, we see all of our providers have 25th percentiles near or below a latency ratio of 0.8, a 20% performance gain. We also observe the diversity of valley "depth." For example, we see in Figure 6 that ChinaNetCenter's interquartile range spans over 40% of the possible value space (between 0 and 1). With such a wide variety of gains for 50% of its valleys, it opens the door to being more selective.


Figure 6: Lower bound of latency ratio of all valley occurrences (box plots per provider: Google, CloudFront, Alibaba, CDNetworks, ChinaNetCtr, CubeCDN).

Rather than simply chasing all valleys, we could set strict requirements, tightening our definition of what's "good enough" for subnet assimilation, as we evaluate in Section 5.1.

CDNetworks' performance is more tightly bounded, as its interquartile range covers less than 10% of the value space and sits very near to 1. This implies CDNetworks probably offers very little opportunity for our technique to improve its replica selection. We hypothesize that this is a byproduct of CDNetworks' anycast implementation for replica selection: anycast depends more on routing and network properties than DNS and IP selection [27]. In addition, Google's median and 75th percentile latency ratios sit very near 1.0. However, by being more selective, as described above, we may be able to pursue better valleys in the lower quartiles, below the median. We could potentially improve our system's valley-prone hop selection by filtering out "shallow" valleys from our consideration. We demonstrate and discuss the effects of different levels of selectiveness in Section 5.1.

4 DRONGO SYSTEM OVERVIEW
We introduce Drongo, a client-side system that employs subnet assimilation to speed up CDNs. Drongo sits on top of a client's DNS system. In a full deployment, Drongo will be installed as an LDNS proxy on the client machine, where it would have easy access to the ECS option and the DNS responses it needs to perform trials. Drongo is set by the client as its default local DNS resolver, and acts as a middle party, reshaping outgoing DNS messages via subnet assimilation and storing data from incoming DNS responses. In our current implementation, Drongo uses Google's public DNS service at 8.8.8.8 to complete DNS queries.

Upon reception of an outgoing query from the client, Drongo must decide whether to use the client's own subnet or to perform subnet assimilation with some known alternative subnet. If Drongo has sufficient data for that subnet in combination with that domain, it makes a decision whether or not to use that subnet for the name resolution; if Drongo lacks sufficient data, it issues an ordinary query (using the client's own subnet for ECS).

4.1 Window Size
We now face the question: what is a sufficient amount of data needed by Drongo to ensure quality subnet assimilation decisions? We choose to measure the data "quantity" by the number of trials for some subnet, where the subnet is obtained via traceroutes performed during times when the client's network was idle. A sufficient quantity must be enough trials to fill some predetermined window size. As we observed in Section 3.2.2, the marginal benefit of increasing the window size decreases with each additional trial. To keep storage needs low while also obtaining most of the benefit of a larger window, we set Drongo's window size at 5.

4.2 Data Collection
Drongo must execute the trials defined in Section 3.1.2 in order to fill its window and collect sufficient data. As demonstrated in Figure 5b, when these trials occur is of little to no significance. In our experiments, we perform our trials at randomly sampled intervals; our trial spacing varies from minutes to days, with a tendency toward being near an hour apart. This sporadic spacing parallels the variety of timings we expect to happen on a real client: the client may be online at random, unpredictable times, for unpredictable lengths of time.

4.3 Decision Making
Here we detail Drongo's logic, assuming it has sufficient data (a full window) with which to make a decision about a particular subnet. For some domain, Drongo must decide whether a subnet is sufficiently valley-prone for subnet assimilation. If so, Drongo will use that subnet for the DNS query; if not, Drongo will use the client's own subnet. From Figure 5b, we know we can anticipate that future behavior will resemble what Drongo sees in its current window if Drongo has seen at least one valley occurrence for the domain of interest from the subnet under consideration. However, in Figure 6, we see that many valleys offer negligible performance gains, which might not outweigh the performance risk imposed by subnet assimilation. To avoid these potentially high-risk hops, Drongo may benefit from a more selective system that requires a high valley frequency in the training window to allow subnet assimilation. We explore the effects of changing the vf parameter in Section 5.

It is possible that, for a single domain, multiple hop subnets may qualify as sufficiently valley-prone for subnet assimilation. When this occurs, Drongo must attempt to choose the best performing from the set of the qualified hops. To do this, Drongo selects the hop subnet with the highest valley frequency in its training window; in the event of a tie, Drongo chooses randomly.
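A compact sketch of this decision logic follows. It reuses the valley_frequency sketch from Section 3.2.1 and assumes the caller keeps, per candidate hop subnet, the latency ratios observed in past trials for the domain in question; the parameter defaults mirror the values selected in Section 5.1 but are otherwise placeholders.

# Sketch of Drongo's per-query decision for one domain.
import random

def choose_assimilation_subnet(candidates, window_size=5, vf_min=1.0, vt=0.95):
    """`candidates` maps hop subnet -> list of past latency ratios. Returns the subnet
    to place in the ECS field, or None to fall back to the client's own subnet."""
    qualified = []
    for subnet, ratios in candidates.items():
        if len(ratios) < window_size:
            continue                                   # window not yet full
        vf = valley_frequency(ratios[-window_size:], vt)
        if vf >= vf_min:
            qualified.append((vf, subnet))
    if not qualified:
        return None                                    # issue an ordinary query
    best_vf = max(vf for vf, _ in qualified)
    return random.choice([s for vf, s in qualified if vf == best_vf])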


Figure 7: Average latency ratio of the overall system as we vary vf and vt (curves for vf ≥ 0.2, 0.4, 0.6, 0.8, and 1.0).

5 DRONGO EVALUATION
Here we evaluate Drongo's ability to intelligently choose well-mapped hop subnets for subnet assimilation, and the performance gains imposed on clients where subnet assimilation is applied.

Using the RIPE Atlas platform [17], we collect a trace of 429 probes spread across 177 countries for 1 month. In our experiments, we use the first portion of a client's trials as a training window. Following the completion of the training window, we use the remaining trials to test Drongo's ability to select good hops for real-time subnet assimilation. Each client performed 10 trials per provider: trials 0 through 4 to form its training window, and trials 5 through 9 to test the quality of Drongo's decisions. In our evaluation, Drongo always selects the first CR from a CR-set and the first HR from an HR-set, mirroring real-world client behavior – no real-time, on-the-fly measurements are conducted, and all decisions are based on the existing window.

5.1 Parameter Selection
Figure 7 shows the effects of Drongo on our entire RIPE trace's average latency ratio, i.e., we consider all requests generated by clients, including those that aren't affected by Drongo. The figure plots average latency ratio (shown on the y axis) as a function of the valley threshold vt, shown on the x axis. The valley threshold, introduced in Section 2.3, determines the maximum latency ratio a hop-client pair must have to classify as a valley occurrence. For example, when the threshold equals 1.0, all latency valleys are considered; on the other hand, when vt is 0.6, Drongo triggers client assimilation only for latency valleys that have promise to improve performance by more than 40%, i.e., a latency ratio smaller than 0.6. The figure shows 5 curves, each representing a different valley frequency parameter, varied between 0.2 and 1.0.

Figure 7 shows the average results across all tested clients. We make several insights.

Figure 8: Average latency ratio of cases where subnet assimilation was performed (curves for vf ≥ 0.2, 0.4, 0.6, 0.8, and 1.0).

If the valley frequency is small, e.g., 0.2, Drongo will unselectively trigger subnet assimilation for all valleys that have frequency equal to or larger than 0.2, which requires only one valley occurrence for our window size of 5. Conversely, as the minimum vf parameter increases, overall system performance improves – the latency ratio drops below 1.0. Thus, the valley frequency is a strong indicator of valley-proneness, further supporting our prior findings from Figure 5b.

Meanwhile, two more characteristics stand out as we vary vt. First, the valley frequency parameter can completely alter the slope of the latency ratio plotted against vt. This behavior echoes the observation we made in the previous paragraph: with extremely loose requirements, Drongo does not sufficiently filter out poor-performing subnets from its set of candidates. Interestingly, with a strict vf (closer to 1.0), the slope changes, and the average latency ratio decreases as we raise the valley threshold. This is because, if Drongo is too strict, it filters out too many potential candidates for subnet assimilation. Second, we see that the curve eventually bends upward with high vt values, indicating that even the vt can be too lenient. The results show that the minimum average latency ratio of 0.9482 (y-axis) is achieved for the valley threshold of 0.95 (x-axis). Thus, with vf = 1.0 and vt = 0.95, Drongo produces its maximum performance gains, averaging 5.18% (1.0 - 0.9482) across all of our tested clients. By balancing these parameters, Drongo filters out subnets that offer the least benefit: subnets where valleys seldom occur and subnets where valleys tend to be shallow.
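The parameter sweep behind Figures 7-9 can be sketched as below. This is a simplified reconstruction under stated assumptions, not the evaluation code: each client contributes, per candidate subnet, a training window and a test window of latency ratios, and an unaffected request is counted with a ratio of 1.0.

# Sketch: sweeping vf and vt, reporting (average latency ratio, fraction of clients affected).
from statistics import mean

def sweep(clients, vf_values, vt_values, window_size=5):
    """`clients` maps client -> {subnet: (train_ratios, test_ratios)}."""
    results = {}
    for vf_min in vf_values:
        for vt in vt_values:
            ratios, affected = [], 0
            for subnets in clients.values():
                train = {s: tr for s, (tr, _) in subnets.items()}
                choice = choose_assimilation_subnet(train, window_size, vf_min, vt)
                if choice is None:
                    ratios.append(1.0)                        # client keeps its default replica
                else:
                    affected += 1
                    ratios.append(mean(subnets[choice][1]))   # test-window ratio
            results[(vf_min, vt)] = (mean(ratios), affected / len(clients))
    return results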

Figure 8 plots the average latency ratio in the same manner as Figure 7, yet only for queries where Drongo applied subnet assimilation. First, the figure again confirms that lower valley frequency parameters degrade Drongo's performance. Second, paired with our knowledge of Figure 7, it becomes clear that as the valley threshold decreases, the number of latency valleys shrinks, while the quality of the remaining valleys becomes more potent. This is why the latency ratio decreases as vt decreases.


Figure 9: Percentage of clients where subnet assimilation was performed (curves for vf ≥ 0.2, 0.4, 0.6, 0.8, and 1.0).

Figure 10: Per-provider system performance for all queries. Optimal vf is set for each provider and noted in parentheses: Google (0.8), CloudFront (0.8), Alibaba (0.4), CDNetworks (1.0), ChinaNetCtr (0.6), CubeCDN (1.0).

However, if the threshold is too small, the number of available valleys becomes so small that the performance becomes unpredictable, i.e., outlier behavior can dominate, causing the spike in average latency ratios for valley thresholds under 0.2.

Finally, Figure 9 plots the percent of clients for which Drongo performed subnet assimilation for at least one provider. Necessarily, the less strict the frequency constraint is (vf ≥ 0.2 is the least strict constraint), the more frequently Drongo acts. As shown in Figure 9, 69.93% of our clients were affected by Drongo with vf and vt set at the peak aggregate performance values (1.0 and 0.95, respectively) found above.

Summary. In this section, we selected parameters vf = 1.0 and vt = 0.95, where Drongo reaches its peak aggregate performance gains of 5.18%.

5.2 Per-Provider Performance
In the previous subsection, we used a constant parameter set across all six providers in our analysis. In this section, we analyze Drongo on a per-provider basis. By choosing an optimal parameter for each provider, as we do below, Drongo's aggregate performance gains increase to 5.85%.

Figure 11: Per-provider system performance for queries where subnet assimilation was applied. Parentheses contain optimal values for each respective provider, formatted (vf, vt): Google (0.8, 0.7), CloudFront (0.8, 0.8), Alibaba (0.4, 0.9), CDNetworks (1.0, 0.9), ChinaNetCtr (0.6, 0.85), CubeCDN (1.0, 0.95).

In Figure 10, we show how Drongo impacts the average performance of each provider while setting the valley frequency to each individual provider's respective optimal value (noted in parentheses under each provider name). Note that for most providers, the individual optimal valley frequency lies near the value obtained in Section 5.1 (1.0). While the intricacies of each provider's behavior clearly hinge on opaque characteristics of their respective networks, we offer several explanations for the observed performance. CDNetworks, which sees small gains across all valley threshold values, further validates the hypothesis proposed in Section 3.2.3 regarding anycast networks. Meanwhile, Google, the largest provider of our set on a global scale, also experienced the second-highest peak average gains from Drongo: if a client is not well served by Google, it is likely that there exist nearby subnets being directed to different replicas. In other words, Drongo has great opportunity for improving CDNs that use fine-grained subnet mapping.

Finally, Figure 11 shows Drongo's performance per provider, exclusively in scenarios when it applies subnet assimilation. Comparing to the results shown in Figure 6 for the PlanetLab experiments, we observe significant differences. Most notably, the latency valley ratios are much smaller in Figure 11 than in Figure 6. For example, Google's median latency ratio is close to 1.0 in Figure 6, indicating insignificant gains. On the contrary, in Figure 11, Google's median latency ratio is around 0.5, implying gains of 50% (1.0 - 0.5) and up to an order of magnitude in edge cases. Considering all providers, Drongo-influenced replica selections are, on average, 24.89% better performing than Drongo-free selections.

There are three reasons for this behavior. First, Figure 6 shows the lower-bound system performance for PlanetLab, as the best CR is always used for comparison. Such a restriction is not imposed in the RIPE Atlas scenario. Second, the RIPE set is more diverse and includes widely distributed endpoints, thus allowing CDNs greater opportunity for subnet mapping error.


Third, contrary to the PlanetLab scenario, where we used all the valleys under the valley threshold of 1.0, here we use optimal vt values, such that Drongo successfully filters out most "shallow" valleys that would dilute performance gains.

6 RELATEDWORKGiven a large number of CDNs with vastly different deploy-ment and performance in different territories CDN brokeringhas emerged as a way to improve user-perceived CDNsrsquo per-formance eg [7 34] By monitoring the performance ofmultiple CDNs brokers are capable of determining whichparticular CDN works better for a particular client at a pointin time Necessarily this approach requires content providersto contract with multiple CDNs to host and distribute theircontent Contrary to CDN brokering Drongo manages toimprove performance of each individual CDN Still it workscompletely independently from CDNs and it is readily de-ployable on the clients Moreover Drongo is completelycompatible with CDN brokering since it in principle hasthe capacity to improve the performance of whichever CDNis selected by a broker

There has been tremendous effort expended in the ongoingbattle to make web pages load faster improve streamingperformance etc As a result many advanced client-basedprotocols and systems have emerged from both industry(eg QUIC SPDY and HTTP2) and academia [24 41 50] inthe Web domain and likewise in streaming eg [37] WhileDrongo is also a client-based system it is generic and helpsspeed-up all CDN-hosted content By reducing the CDNlatency experienced by end users it systematically improvesall the above systems and protocols which in turn helps allassociated applicationsRecent work has demonstrated that injecting fake infor-

Recent work has demonstrated that injecting fake information can help protocols achieve better performance. One example is routing [49], where fake nodes and links are introduced into an underlying link-state routing protocol so that routers compute their own forwarding tables based on the augmented topology. Similar ideas are shown useful in the context of traffic engineering [29], where robust and efficient network utilization is accomplished via carefully disseminated bogus information ("lies"). While similar in spirit to these approaches, Drongo operates in a completely different domain, aiming to address the CDN pitfalls. In addition to using "fake" (source subnet) information in order to improve CDN decisions, Drongo also invests effort in discovering valley-prone subnets and determining the conditions under which using them is beneficial.

7 DISCUSSION
A mass deployment of Drongo is non-trivial and carries with it some complexities that deserve careful consideration.

There is some concern that maintenance of CDN allocation policies may still be compromised despite our efforts to respect their mechanisms in Drongo's design. We propose that a broadly deployed form of Drongo could be carefully tuned to affect only the most extreme cases. Figure 11 shows that subnet assimilation carries some risk of a loss in performance, so it is in clients' best interests to apply it conservatively. First, we know from Figure 9 that the number of clients using assimilated subnets can be significantly throttled with strict parameters. Further, in today's Internet, many clients are already free to choose arbitrary LDNS servers [25, 36]. Many of these servers do not make use of the client subnet option at all, thus rendering the nature of their side effects (potentially disrupting policies enforced only in DNS) equivalent to that of Drongo. It is difficult to say whether or not Drongo's ultimate impact on CDN policies would be any more significant than the presence of such clients.

Mass adoption also raises scalability concerns, particularly regarding the amount of measurement traffic being sent into the network. To keep the number of measurements small while ensuring their freshness, a distributed peer-to-peer component, where clients in the same subnet share trial data, could be incorporated into Drongo's design. We leave this modification for future work.

8 CONCLUSIONS
In this paper we proposed the first approach that enables clients to actively measure CDNs and effectively improve their replica selection decisions, without requiring any changes to the CDNs and while respecting CDNs' policies. We have introduced and explored latency valleys, scenarios where replicas suggested to upstream subnets outperform those provided to a client's own subnet. We have asserted that latency valleys are common phenomena, and we have found them across all the CDNs we have tested, in 26–76% of routes. We showed that valley-prone upstream subnets are easily found from the client, are simple to identify with few and infrequent measurements, and, once found, are persistent over timescales of days.

We have introduced Drongo, a client-side system that leverages the performance gains of latency valleys by identifying valley-prone subnets. Our measurements show that Drongo can improve requests' latency by up to an order of magnitude. Moreover, Drongo achieves this with exceptionally low overhead: a mere 5 measurements suffice for timescales of days. Using experimentally derived parameters, Drongo affects the replica selection of requests made by 69.93% of clients, improving their latency by 24.89% in the median case. Drongo's significant impact on these requests translates into an overall improvement in client-perceived aggregate CDN performance.


REFERENCES
[1] 2016. Akamai. (2016). http://www.akamai.com
[2] 2016. Alexa Top Sites. (2016). https://aws.amazon.com/alexa-top-sites
[3] 2016. Alibaba Cloud CDN. (2016). https://intl.aliyun.com/product/cdn
[4] 2016. Amazon CloudFront – Content Delivery Network (CDN). (2016). https://aws.amazon.com/cloudfront
[5] 2016. Amazon Route 53. (2016). https://aws.amazon.com/route53
[6] 2016. AWS Global Infrastructure. (2016). https://aws.amazon.com/about-aws/global-infrastructure
[7] 2016. Cedexis. (2016). http://www.cedexis.com
[8] 2016. Conviva. (2016). http://www.conviva.com
[9] 2016. The Cost of Latency. (2016). http://perspectives.mvdirona.com/2009/10/the-cost-of-latency
[10] 2016. GoDaddy Premium DNS. (2016). https://www.godaddy.com/domains/dns-hosting.aspx
[11] 2016. Google CDN Platform. (2016). https://cloud.google.com/cdn/docs
[12] 2016. Google Cloud Platform: Cloud DNS. (2016). https://cloud.google.com/dns
[13] 2016. Google Public DNS. (2016). https://developers.google.com/speed/public-dns/docs/using?hl=en
[14] 2016. Netdirekt Content Hosting - CDN. (2016). http://www.netdirekt.com.tr/cdn-large.html
[15] 2016. Neustar DNS Services. (2016). https://www.neustar.biz/services/dns-services
[16] 2016. OpenDNS. (2016). https://www.opendns.com
[17] 2016. RIPE Atlas - RIPE Network Coordination Centre. (2016). https://atlas.ripe.net
[18] 2016. Shopzilla: faster page load time = 12 percent revenue increase. (2016). http://www.strangeloopnetworks.com/resources/infographics/web-performance-and-ecommerce/shopzilla-faster-pages-12-revenue-increase
[19] 2016. Sweden's Greta wants to disrupt the multi-billion CDN market. (2016). https://techcrunch.com/2016/08/30/greta
[20] 2016. Verisign Managed DNS. (2016). http://www.verisign.com/en_US/security-services/dns-management/index.xhtml
[21] 2016. Verizon Digital Media Services. (2016). https://www.verizondigitalmedia.com
[22] 2016. Verizon ROUTE: Fast, Reliable, Enterprise-Class Services for Domain Name System (DNS). (2016). https://www.verizondigitalmedia.com/platform/route
[23] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. 2010. Comparing DNS Resolvers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 15–21. https://doi.org/10.1145/1879141.1879144
[24] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. Klotski: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 439–453. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/butkiewicz
[25] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. 2013. Mapping the Expansion of Google's Serving Infrastructure. In Proceedings of the 2013 Conference on Internet Measurement Conference (IMC '13). ACM, 313–326. https://doi.org/10.1145/2504730.2504754
[26] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531–537. https://doi.org/10.1145/2815675.2815717
[27] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531–537. https://doi.org/10.1145/2815675.2815717
[28] F. Chen, R. Sitaraman, and M. Torres. 2015. End-User Mapping: Next Generation Request Routing for Content Delivery. In Proceedings of ACM SIGCOMM '15, London, UK.
[29] Marco Chiesa, Gábor Rétvári, and Michael Schapira. 2016. Lying Your Way to Better Traffic Engineering. In Proceedings of the 2016 ACM CoNEXT (CoNEXT '16). ACM.
[30] ChinaNetCenter. 2016. ChinaNetCenter - Network. (2016). http://en.chinanetcenter.com/pages/technology/g2-network-map.php
[31] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. 2003. PlanetLab: An Overlay Testbed for Broad-coverage Services. SIGCOMM Comput. Commun. Rev. 33, 3 (July 2003), 3–12. https://doi.org/10.1145/956993.956995
[32] C. Contavalli, W. van der Gaast, D. Lawrence, and W. Kumari. 2016. Client Subnet in DNS Queries. RFC 7871. RFC Editor.
[33] Tobias Flach, Ethan Katz-Bassett, and Ramesh Govindan. 2012. Quantifying Violations of Destination-based Forwarding on the Internet. In Proceedings of the 2012 ACM Conference on Internet Measurement Conference (IMC '12). ACM, 265–272. https://doi.org/10.1145/2398776.2398804
[34] Aditya Ganjam, Faisal Siddiqui, Jibin Zhan, Xi Liu, Ion Stoica, Junchen Jiang, Vyas Sekar, and Hui Zhang. 2015. C3: Internet-Scale Control Plane for Video Quality Optimization. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 131–144. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/ganjam
[35] Cheng Huang, Ivan Batanov, and Jin Li. 2012. A Practical Solution to the client-LDNS Mismatch Problem. SIGCOMM Comput. Commun. Rev. 42, 2 (March 2012), 35–41. https://doi.org/10.1145/2185376.2185381
[36] Cheng Huang, Angela Wang, Jin Li, and Keith W. Ross. 2008. Measuring and Evaluating Large-scale CDNs (Paper Withdrawn at Microsoft's Request). In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 15–29. http://dl.acm.org/citation.cfm?id=1452520.1455517
[37] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. 2014. A Buffer-based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM '14). ACM, 187–198. https://doi.org/10.1145/2619239.2626296
[38] Rupa Krishnan, Harsha V. Madhyastha, Sushant Jain, Sridhar Srinivasan, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. 2009. Moving Beyond End-to-End Path Information to Optimize CDN Performance. In Internet Measurement Conference (IMC). Chicago, IL, 190–201. http://research.google.com/archive/imc191/imc191.pdf
[39] Zhuoqing Morley Mao, Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang. 2002. A Precise and Efficient Evaluation of the Proximity Between Web Clients and Their Local DNS Servers. In Proceedings of the General Track of the Annual Conference on USENIX Annual Technical Conference (ATEC '02). USENIX Association, Berkeley, CA, USA, 229–242. http://dl.acm.org/citation.cfm?id=647057.759950
[40] Matthew K. Mukerjee, Ilker Nadi Bozkurt, Bruce Maggs, Srinivasan Seshan, and Hui Zhang. 2016. The Impact of Brokers on the Future of Content Delivery. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (HotNets '16). ACM, 127–133. https://doi.org/10.1145/3005745.3005749
[41] Ravi Netravali, Ameesh Goyal, James Mickens, and Hari Balakrishnan. 2016. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/netravali
[42] OpenDNS. 2016. A Faster Internet: The Global Internet Speedup. (2016). http://afasterinternet.com
[43] Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose. 1998. Modeling TCP Throughput: A Simple Model and Its Empirical Validation. In Proceedings of the ACM SIGCOMM '98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '98). ACM, 303–314. https://doi.org/10.1145/285237.285291
[44] Vern Paxson. 1996. End-to-end Routing Behavior in the Internet. SIGCOMM Comput. Commun. Rev. 26, 4 (Aug. 1996), 25–38. https://doi.org/10.1145/248157.248160
[45] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, and Anja Feldmann. 2010. Improving Content Delivery Using Provider-aided Distance Information. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 22–34. https://doi.org/10.1145/1879141.1879145
[46] Florian Streibelt, Jan Böttger, Nikolaos Chatzis, Georgios Smaragdakis, and Anja Feldmann. 2013. Exploring EDNS-client-subnet Adopters in Your Free Time. In Proceedings of IMC '13 (IMC '13).
[47] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. 2006. Drafting Behind Akamai (Travelocity-based Detouring). SIGCOMM Comput. Commun. Rev. 36, 4 (Aug. 2006), 435–446. https://doi.org/10.1145/1151659.1159962
[48] Ao-Jan Su and Aleksandar Kuzmanovic. 2008. Thinning Akamai. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 29–42. https://doi.org/10.1145/1452520.1452525
[49] Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. 2015. Central Control Over Distributed Routing. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM '15). ACM, 43–56. https://doi.org/10.1145/2785956.2787497
[50] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. 2016. Speeding up Web Page Loads with Shandian. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA, 109–122. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/wang



the client-server latency, and (iii) comprehensively measuring the Internet to capture latency variations between CDN replicas and the rest of the IP space over short timescales is challenging.

In this paper we demonstrate that client-server latency is far from optimal. We embrace a hybrid approach that enables clients to join CDNs in addressing the above challenges, as follows: (i) Clients help with CDN replica mapping; indeed, it is far easier for a single client to find a subnet that is consistently suggested well-performing replicas with respect to the client than for a CDN to evaluate the entire IP space. (ii) Our approach respects load balancing and other CDN policies: CDNs still make all the replica selection decisions, and clients respect those decisions. (iii) CDNs' measurement burden is shared with clients, who measure CDNs and help them come up with better, more fine-grained decisions. Most importantly, all of this is readily available today, with only minor client-side upgrades, without any changes to CDNs or to existing protocols, and without the need for broad adoption of our system.

We devise a simple heuristic and a lightweight method that helps clients realize when they are not served by a nearby CDN replica. After obtaining a replica selection from a CDN, the client traceroutes the path towards that replica and explores if the upstream hops on the path are potentially directed to different replicas. This is done by utilizing the EDNS0 client subnet extension (ECS) [32] to issue DNS requests on behalf of hops. If hops are indeed directed to different replicas, then it is possible that the latency between the client and a replica recommended to a hop is smaller than the latency between the client and the replica recommended to the client.¹ We refer to such a scenario as a latency valley. Our goal is not to find lower-latency replicas; we aim to find subnets to which lower-latency replicas, relative to the client, are consistently suggested. By ultimately leaving the decision and access systems entirely up to the CDN, we ensure that any additional policies a CDN may have (such as network access restrictions and load balancing) are unaffected.
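To make the ECS step concrete, the following is a minimal sketch using the dnspython library; the domain, hop address, and resolver below are illustrative placeholders rather than values from our experiments.

```python
# Sketch: ask a public resolver for a CDN domain's A records "on behalf of" a
# hop's /24 subnet via the EDNS0 client-subnet (ECS) option, using dnspython.
import dns.edns
import dns.message
import dns.query
import dns.rdatatype

def resolve_with_ecs(domain, subnet_ip, resolver="8.8.8.8", prefix_len=24):
    """Return the A records the CDN maps to `subnet_ip`'s /24 (e.g., an HR-set)."""
    ecs = dns.edns.ECSOption(subnet_ip, prefix_len)   # assimilate this subnet
    query = dns.message.make_query(domain, "A", use_edns=0, options=[ecs])
    response = dns.query.udp(query, resolver, timeout=3.0)
    return [item.address
            for rrset in response.answer if rrset.rdtype == dns.rdatatype.A
            for item in rrset]

# Replica set suggested to a traceroute hop (hypothetical hop IP and domain):
hop_replicas = resolve_with_ecs("cdn.example.com", "203.0.113.7")
```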

By conducting experiments on six major content providers, we show that latency valleys are common phenomena and demonstrate that they systematically speed up object downloads. In particular, latency valleys can be found across all the CDNs we have tested: 26–76% of routes towards CDN replicas discover at least one latency valley. Our key practical contribution lies in showing that valley-prone subnets that lead to these lower-latency replicas are easily found from the client, are simple to identify, incur negligible measurement overhead, and are persistently valley-prone over timescales of days.

¹ We emphasize that all of our measurements are performed between the client and a CDN replica; no measurements are performed directly from upstream nodes.

Most importantly, we show that we can effectively leverage latency valleys via valley-prone subnets with subnet assimilation, an approach in which clients use ECS to improve CDN replica mapping.

We implement and evaluate Drongo, a client-side system that leverages upstream subnets to consistently receive lower-latency replicas than what the client would ordinarily have been recommended. Our measurements show that Drongo can improve requests' latency by up to an order of magnitude. Moreover, we evaluate the optimal parameters to capture these latency gains and find that 5 measurements on the timescale of days are the only requirement to maximize Drongo's gains. Using the optimal parameters, Drongo affects the replica selection of 69.93% of clients, and affected requests experience a 24.89% reduction in latency in the median case. Moreover, Drongo's significant impact on these requests translates into an overall improvement in client-perceived aggregate CDN performance.

We make the following contributions:
• We introduce the first approach to enable clients to actively measure CDNs and effectively improve their selection decisions, while requiring no changes to the CDNs and while respecting CDNs' policies.

• We extensively analyze client-side CDN measurements and determine critical time-scales and key parameters that empower clients to leverage their advanced views of CDNs.

• We introduce Drongo, a client-side CDN measurement crowd-sourcing system, and demonstrate its ability to substantially improve CDNs' performance in the wild.

2 PREMISE
2.1 Background
CDNs attempt to improve web and streaming performance by delivering content to end users from multiple geographically distributed servers, typically located at the edge of the network [1, 4, 11, 21]. Since most major CDNs have replicas in ISP points-of-presence, clients' requests can be dynamically forwarded to close-by CDN servers. Historically, one of the key reasons for systematic CDN imperfections was the distance between clients and their local DNS (LDNS) servers [23, 35, 39, 45]. This issue was further dramatically amplified in recent years (see [28]) with the proliferation of public DNS resolvers, e.g., [5, 10, 12, 15, 20, 22].

In an attempt to remedy poor server selection resulting from LDNS servers, there has been a recent push, spearheaded by public DNS providers, to adopt the EDNS0 client subnet option (ECS) [42]. With ECS, the client's IP address (truncated to a /24 or /20 subnet for privacy) is passed through the recursive steps of DNS resolution, as opposed to passing the LDNS server's address.


However, even when the actual client's network location is accurate, numerous factors prevent CDNs from providing optimal CDN replica selection [38]. First, creating an accurate network mapping for the Internet is a non-trivial task, given that routing can inflate both end-to-end latency and latency variance, e.g., [44]. As a result, CDN selections could be heavily under-performing (see [38] for a thorough analysis). The underlying causes are diverse, including lack of peering, routing misconfigurations, side-effects of traffic engineering, and systematic diversions from destination-based forwarding [33]. Second, CDN measurements are necessarily coarse-grained: measuring each and every client (each individual IP address) from a CDN is impossible for scalability reasons, and because the actual source IP address isn't available due to ECS truncation. On top of this, CDNs often have other optimization goals, e.g., load balancing, which can divert clients further away from close-by replicas.

Our key idea is to enable clients to join CDNs in addressing the above challenges. In particular, it is far easier for a single client to find a subnet that is consistently recommended well-performing replicas than it is for a CDN to evaluate the entire IP space. Not only does the client-supported approach scale, it enables far better CDN replica selection decisions. In summary, in our approach, (i) clients conduct measurements to find other subnets that lead to potentially lower-latency CDN replicas, (ii) they evaluate the performance of such subnets and associated replicas via light, infrequent measurements over long time-scales, and (iii) finally, they utilize these subnets to drive CDNs' decisions towards lower-latency replicas.

2.2 Respecting CDN Policies and Incentives for Adoption
As stated above, CDNs can sometimes deploy policies that can prevent them from serving clients nearby replicas. In principle, there are two such scenarios. First, on longer timescales, a CDN may have a business logic where it wants a certain IP subnet, and no one else, to be able to use a certain CDN cluster or server. For example, an ISP may allow a CDN to deploy a CDN server inside its network, but on the condition that only IP addresses owned by the ISP may benefit from that server. Our approach is completely compatible with such arrangements, because such a policy can be easily enforced via access network-level firewalls, which forces our system to avoid such CDN replicas. Second, on shorter timescales, CDNs may deploy load-balancing policies that distribute the traffic load such that clients are not always directed to the closest replica server. Our system respects such load-balancing policies because it always selects the first CDN replica from a recommended set, i.e., it does not opportunistically select the "best" replica from the set.

Clients have clear incentives to adopt our approach, since it directly improves their performance. While it is certainly the case that CDNs share the same incentives for our system's adoption, this is even more true with the proliferation of CDN brokers, e.g., [8]. CDN brokers actively measure performance to multiple CDNs, and they can redirect a client to a different CDN on the fly in case the QoE isn't satisfactory [40]. It has been demonstrated that this particularly hurts large CDNs, which often have better replicas in the vicinity of the clients, which the broker is unfortunately unaware of; this leads to the loss of clients and revenues for CDNs [40]. Thus, utilizing clients' help in selecting the best CDN replicas in their vicinity directly benefits CDNs.

We thus argue that an unrestricted adoption of the ECS option, which is the key mechanism that enables the subnet assimilation used by our system (Section 2.3), is in the best interest of every CDN. In particular, the unrestricted ECS option enables a client to use the ECS field to specify a subnet different from the client's to change the way a CDN maps the client to its replicas. While most CDNs do enable ECS in its unrestricted form, e.g., Google and EdgeCast among others, Akamai does so only via the OpenDNS [16] and Google DNS [13] public DNS services, using the actual IP address of the requester. As such, Akamai's CDN is currently not directly usable by our system, as our system requires sending ECS requests on behalf of hop IPs. One possible reason for such a restriction imposed by Akamai might be the ability of third parties to accurately reverse-engineer a CDN's scale and coverage without significant infrastructural resources, e.g., [26, 46]. We note, however, that even without unrestricted ECS, Akamai's network has been quite comprehensively analyzed in the past, e.g., [36, 47, 48]. One of our main contributions lies in showing that the benefits achievable by letting clients help improve CDNs' decisions far outweigh the potential drawbacks of unrestricted ECS adoption.

2.3 Subnet Assimilation
Subnet assimilation is the deliberate use of the ECS field to specify a subnet different from the client's to change the way a CDN maps the client to its replicas. In this paper we show that the assimilation of subnets found along the path between the client and its "default" replica may allow the client to "reposition" itself in the CDN's mapping scheme, such that the client will consistently receive lower-latency replica recommendations from the CDN. Subnet assimilation is a key mechanism used both to detect and to utilize lower-latency CDN replicas.

Latency valley. Figure 1 illustrates the key heuristic that helps clients realize when they are not optimally served by a CDN. Consider a client C redirected to a CDN server S1, as illustrated in the figure.


Figure 1: Illustration of a latency valley. The client C is directed to replica S1, while a hop on the path (H1–H4) is directed to a different, lower-latency replica S2.

If a hop on the path is not redirected to the same server S1, then it is possible that a lower-latency replica, e.g., S2, has been observed, such that the latency between C and S2 is smaller than the latency between C and S1. We refer to such a scenario as a latency valley.

For the remainder of this paper, we refer to the set of replicas suggested to the client's actual subnet as the client replica set (CR-set), and a replica from the CR-set as a client replica (CR). We refer to the routers along the presumed path from the client to any given CR as hops. Next, we refer to the set of replicas suggested to a hop, discoverable via subnet assimilation, as a hop replica set (HR-set). We denote a measurement from the client to a CR as a client replica measurement (CRM). Likewise, we denote a replica from the HR-set as a hop replica (HR), and a measurement to that replica from the client as a hop replica measurement (HRM). Finally, we define a latency valley to be any occurrence of the following inequality: HRM/CRM < vt ≤ 1. We take vt = 1 until we optimize our parameter selection in Section 5.1.
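The valley test itself reduces to a ratio comparison; a minimal illustration in Python (the function names are ours, not part of any released Drongo code):

```python
def latency_ratio(hrm_ms: float, crm_ms: float) -> float:
    """HRM / CRM: below 1.0, the hop's replica beat the client's replica."""
    return hrm_ms / crm_ms

def is_valley(hrm_ms: float, crm_ms: float, vt: float = 1.0) -> bool:
    """Latency valley per the definition HRM/CRM < vt <= 1."""
    return latency_ratio(hrm_ms, crm_ms) < vt

# Example: a hop replica at 80 ms against a client replica at 100 ms.
assert is_valley(80.0, 100.0)        # ratio 0.8 -> valley
assert not is_valley(110.0, 100.0)   # ratio 1.1 -> no valley
```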

It is imperative to note that our goal is not to find lower-latency replicas perhaps included in an HR-set. Instead, we aim to find subnets to which lower-latency replicas are consistently suggested; in other words, subnets that are prone to valley occurrences in replica sets obtained at any time.

2.4 Identifying Latency Valleys
In order to identify latency valleys, we first identify the hops along the path, which we achieve using traceroute.

We are using the upstream path for simplicity; the downstream path, which may be asymmetric, could yield different results but would require more overhead or control of the CDN for execution. We then identify the corresponding HR-set for each hop, using ECS queries which assimilate the hops' subnets. Lastly, we must compare the HRMs to the CRM. Unless otherwise noted, we calculate HRMs and CRMs using ping RTTs (obtained by averaging the results of three back-to-back pings), and we refer to these values as latencies. Throughout this paper, latency units are milliseconds. For simplicity, we refer to the ratio HRM/CRM, used in our latency valley definition, as the latency ratio. A latency ratio below 1 indicates that the HR has outperformed the CR; a latency ratio above one indicates the CR has outperformed the HR; and finally, a latency ratio equal to 1 indicates that the CR and HR performed equally (which in general only happens when CR and HR refer to the same replica).
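A minimal sketch of such a latency measurement, assuming a Linux-style ping whose output reports "time=... ms"; the averaging over three back-to-back probes mirrors the procedure described above.

```python
# Sketch: obtain a CRM or HRM as the average RTT of three back-to-back pings.
import re
import subprocess

def measure_rtt(replica_ip: str, count: int = 3) -> float:
    """Average RTT in milliseconds to `replica_ip`, or +inf on failure."""
    out = subprocess.run(["ping", "-n", "-c", str(count), replica_ip],
                         capture_output=True, text=True).stdout
    rtts = [float(m) for m in re.findall(r"time=([\d.]+) ms", out)]
    return sum(rtts) / len(rtts) if rtts else float("inf")

# Latency ratio for one hop replica against one client replica:
# ratio = measure_rtt(hop_replica_ip) / measure_rtt(client_replica_ip)
```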

The above operation cannot be executed on-the-fly, as the time consumed by this process will outweigh any latency gains achieved. Moreover, multiple measurements might be required. Fortunately, our experiments in Section 3 show that a small number of measurements per hop (less than 10) are required, and that the measurement results remain applicable on timescales of days. Because we can rely on measurements obtained in idle time, real-time measurements are not necessary for our system. Thus, our proposed technique will use past measurements to predictively choose a good subnet for future assimilation.

3 EXPLORING VALLEYS
In this section we establish that latency valleys are common phenomena in the wild and that they offer substantial performance gains. In addition, we lay the groundwork for finding valleys, a prerequisite to harnessing their performance gains via subnet assimilation.

3.1 Testing for Valleys
We perform our preliminary tests using PlanetLab nodes, a platform which offers a large number of vantage points from around the globe, primarily deployed in academic institutions [31]. We further investigate latency valleys as experienced by a large variety of clients using the RIPE Atlas platform [17], as detailed in Section 5. While PlanetLab lacks the scale and variety offered by RIPE Atlas, the flexibility and freedom it provides made it a prime choice for our preliminary analysis. Our PlanetLab experiments use 95 nodes spread across 51 test locations, and we obtain the IP addresses of hops between nodes and CDN replicas via traceroute.

While latency valleys are a common occurrence, as we demonstrate below, many hops between the clients and the provider's servers can be discarded when searching for valleys.


One category of such hops consists of those which reside within the same local network as the client, which will be suggested the same replicas as the client. Another disposable category, hops with private IP addresses, will not be recognized by the EDNS server and will yield generic replica choices regardless of the hops' locations. To ensure that a hop is indeed usable, a hop must (i) belong to a different /16 subnet than the client, (ii) have a different domain than the client, and (iii) belong to a different ASN than the client. We only filter hops that fail these conditions at the beginning of the route; once a hop is observed that meets the above three constraints, we stop filtering for the remainder of the route.
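A sketch of this filtering rule; the hop's domain and ASN are assumed to come from external sources such as reverse DNS and an IP-to-AS mapping, which we do not reproduce here.

```python
# Sketch of the hop-usability rule: a hop is usable only if it lies in a
# different /16, a different domain, and a different ASN than the client.
import ipaddress

def is_usable_hop(hop_ip, hop_domain, hop_asn,
                  client_ip, client_domain, client_asn) -> bool:
    same_16 = (ipaddress.ip_network(f"{hop_ip}/16", strict=False) ==
               ipaddress.ip_network(f"{client_ip}/16", strict=False))
    return (not same_16 and hop_domain != client_domain
            and hop_asn != client_asn)

def usable_hops(hops, client):
    """Filter only the leading unusable hops; once one hop qualifies,
    keep the remainder of the route, as described above.
    `hops` is a list of (ip, domain, asn) tuples, `client` a single tuple."""
    for i, hop in enumerate(hops):
        if is_usable_hop(*hop, *client):
            return hops[i:]
    return []
```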

3.1.1 Provider Selection. While ECS is gaining momentum among CDNs, as it allows them to better estimate their clients' location, it is important to note that ECS was only recently deployed, and its implementation varies among the different providers. "A Faster Internet," an initiative headed by Google and OpenDNS, is drawing together a growing number of large CDNs and content providers to adopt and promote the adoption of ECS [42]. In order to attain a set of CDNs for our tests, we have selected those providers which implement an unrestricted form of ECS, as described in Section 2.2.

In order to best select the CDNs for our tests, we scraped URLs from over 3000 sites arbitrarily selected from the Alexa Top 1M list [2]. Then we reduced our set of URLs to those that ended in a known file type, in order to ensure our ability to perform download tests on the selected URLs. When applicable, we resolved CNAME domains to their respective CDN domains. We then performed multiple DNS queries for each URL to determine if unrestricted ECS is implemented. From the remaining URLs, we have extracted the 6 CDN domains which have appeared most frequently, along with their respective URLs.

Before we can begin seeking subnets with better mapping-groups, it is first necessary to prove that subnets with different mapping-groups exist and are easy to find. We define usable route length as the number of hops along the path that fulfill the above filtering. Next, we define divergence as the fraction of usable hops along the path which were recommended at least one replica that was not recommended to the client.
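The two metrics can be computed directly from the CR-set and the usable hops' HR-sets; a small illustration (the data layout and names are ours):

```python
# Sketch: usable route length and divergence for one route, given the client's
# CR-set and each usable hop's HR-set.
def route_stats(client_replicas: set, hop_replica_sets: list):
    usable_route_length = len(hop_replica_sets)
    diverging = sum(1 for hr_set in hop_replica_sets
                    if hr_set - client_replicas)   # offers a replica the client was not
    divergence = diverging / usable_route_length if usable_route_length else 0.0
    return usable_route_length, divergence

# Example: 4 usable hops, 3 of which were suggested a replica not in the CR-set.
length, div = route_stats({"r1", "r2"}, [{"r1"}, {"r3"}, {"r2", "r4"}, {"r5"}])
# length == 4, div == 0.75
```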

Figure 2 depicts usable route length and divergence for different CDNs. The figure shows that, for example, when requesting replicas from Google, each client had an average usable route length of 7.8, and the divergence is approximately 0.92. It is evident from Figure 2 that hops are indeed suggested different replicas than their client. As we encounter a wider variety of recommendations, we increase our number of opportunities to find latency valleys.

Figure 2: Average divergence and average usable route length per CDN.

The high divergence shown in Figure 2 indicates that other, and possibly lower-latency, replicas can be mapped to clients if they were to assimilate their hops' IP addresses.

For the remainder of this paper, we use the following set of providers: Google, Amazon CloudFront, Alibaba, ChinaNetCenter, CDNetworks, and CubeCDN, a set which is both diverse and comprehensive. Google's CDN infrastructure is massive and dispersed: as of 2013, around 30,000 CDN IP addresses spread across over 700 ASes were observable in one day [25]. As of this writing, Amazon CloudFront offered over 50 points-of-presence spread throughout the world [6]. CDNetworks offers over 200 points-of-presence, also globally dispersed, and heavily employs anycast for server selection. Alibaba and ChinaNetCenter offer over 500 CDN node locations each within China, where they are centered, in addition to a growing number of service locations outside of China [3, 30]. Finally, CubeCDN is a smaller CDN with locations spread primarily across Turkey [14].

3.1.2 Test Execution. In order to assert the existence of latency valleys in the wild, we have executed a series of trials using the aforementioned URLs and the PlanetLab nodes as clients. Each trial consists of the following steps (a code sketch of one trial follows the list):

(1) We randomly select a URL.
(2) The client retrieves the CR-set for the selected URL's domain.
(3) For each CR, the client uses traceroute to identify hops.
(4) Using subnet assimilation, the client retrieves the HR-set for each hop.
(5) The client measures CRMs for every CR in its CR-set, and HRMs for every HR it has seen (across all HR-sets obtained in that trial).
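A compact sketch of one such trial, assuming the resolve_with_ecs and measure_rtt helpers sketched earlier in this section are in scope and the system traceroute is available; hop filtering (Section 3.1) is omitted for brevity, and the URL list and client IP are placeholders.

```python
# Sketch of one trial (steps 1-5 above).
import random
import re
import subprocess
from urllib.parse import urlparse

def traceroute_hops(dest_ip):
    """Hop IPs on the forward path, via the system traceroute in numeric mode."""
    out = subprocess.run(["traceroute", "-n", dest_ip],
                         capture_output=True, text=True).stdout
    return re.findall(r"^\s*\d+\s+(\d+\.\d+\.\d+\.\d+)", out, re.MULTILINE)

def run_trial(urls, client_ip):
    domain = urlparse(random.choice(urls)).hostname          # step (1)
    cr_set = resolve_with_ecs(domain, client_ip)             # step (2)
    hr_sets = {}
    for cr in cr_set:
        for hop in traceroute_hops(cr):                      # step (3)
            if hop not in hr_sets:
                hr_sets[hop] = resolve_with_ecs(domain, hop)  # step (4)
    crms = {cr: measure_rtt(cr) for cr in cr_set}             # step (5)
    hrms = {hr: measure_rtt(hr)
            for hr_set in hr_sets.values() for hr in hr_set}
    return crms, hrms
```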


Figure 3: Scatter plot of HRMs and minimum CRMs. The area below the diagonal is the valley region.

We have performed a series of 45 trials per PlanetLab node, executed between one and two hours apart. For the remainder of this section we refer to this dataset.

3.2 Valley Prevalence
In order to establish the prevalence and significance of latency valleys, we now compare HRMs to their respective CRMs. It is important to note that it is possible that a client has multiple CRs, and hence multiple CRMs, and in practice a client can only choose one replica. Therefore, we compare all HRMs to the minimum CRM obtained in their trial, thus establishing a lower bound for our expected performance gain and allowing us to easily assert any gains or losses provided by the alternative replica choices (the HRs).

With our PlanetLab dataset, we compared each HRM to the minimum CRM, i.e., the best client replica, obtained in the same respective trial. Figure 3 shows the results of this assessment. We plot HRM on the y axis and minimum CRM on the x axis, thus creating a visual representation of the latency ratio. We distinguish between the CDNs using different colors. To better explain the results, we draw the equality line, where HRM = minimum CRM.

Every data point along this line represents a hop's subnet which is being suggested a replica that performs on a par with the replicas suggested to the client's subnet. More importantly, every data point below the equality line represents a valley occurrence, as the alternative replica's HRM is smaller than the CRM. The percentage of valleys by provider ranges from 14.02% (CloudFront) to 38.58% (CubeCDN), with an average of 22% across all providers. Table 1 shows the results, i.e., Column 2 (% Valleys Overall) lists these percentages for each CDN.

Now that we have established that valleys exist, we wish to determine where we can expect to find valleys. We continue using the minimum CRM from each trial, for the reasons described above. However, in order to better reflect practical scenarios, where only one of a given set of replicas is used, we now also begin choosing an individual replica from an HR-set. Instead of choosing the minimum HRM for a given hop, we conservatively choose the median: each hop's chosen HRM is the median of its HR-set for that trial. We continue this pattern (choosing the client's best CRM from the trial and a hop's median HRM) for the remainder of our PlanetLab data analysis. Our PlanetLab data thus represents a lower bound on valleys and their performance implications; we compare the median performance of our proposed procedure to the absolute best the existing methods have to offer. In Section 5, using our RIPE Atlas testbed, we remove this constraint to demonstrate the real-world performance of our system.
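A small illustration of this conservative per-trial comparison (median HRM against the best CRM):

```python
# Sketch: the lower-bound latency ratio used for the PlanetLab analysis,
# comparing a hop's median HRM to the client's best (minimum) CRM.
from statistics import median

def lower_bound_ratio(hop_hrms_ms, crms_ms) -> float:
    return median(hop_hrms_ms) / min(crms_ms)

# Example: HRMs of 70, 90, 120 ms vs. CRMs of 100 and 110 ms -> 90/100 = 0.9
assert lower_bound_ratio([70.0, 90.0, 120.0], [100.0, 110.0]) == 0.9
```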

We further wish to consider how common valleys are within a given route from the client to the provider. Column 3 (Average % of Valleys per Route) of Table 1 shows, given some route in one trial, the average percentage of usable hops that incur valleys. We see for Alibaba that, on average, over a third of the usable hops in a given route are likely to incur valleys, and for CDNetworks nearly a fourth of the route. Meanwhile, for CloudFront, we see that on average valleys occur for only a small portion of hops in a given route. Column 4 (% Routes with a Valley) details the percentage of all observed routes that contained at least one valley in the trial in which they were observed. For Alibaba and CDNetworks, around 75% of the observed routes contain valleys, while the figure is more than 50% for Google.

3.2.1 Do Valleys Persist? Next we assess whether subnets are persistently valley-prone. For some number of trials, how often can the client expect a valley to occur from some particular hop? To answer this, we need to be able to describe how frequently valleys occurred for some hop subnet in a given set of trials.

Valley frequency. We define a new, simple metric, the valley frequency, as follows: if we look at a hop-client pair across a set of N trials, and v of those trials are valleys, then the valley frequency is vf = v / N.


Table 1: Detailed findings for each provider, based on PlanetLab data.

Provider             % Valleys Overall   Avg. % Valleys per Route   % Routes with Valley   % Hop-Client Pairs w/ Valley Freq > 0.5
Google               20.24               16.41                      53.30                  10.98
Amazon CloudFront    14.02                8.72                      25.82                  10.00
Alibaba              33.68               35.94                      75.83                  30.97
CDNetworks           15.61               24.41                      73.08                  14.09
ChinaNetCenter       27.42               14.26                      38.10                  16.74
CubeCDN              38.58               17.95                      25.49                  26.32

Figure 4: CDFs of clients by valley frequency, for (a) ping, (b) total download time, and (c) total post-caching download time. In 4a the subnet-response measurements are ping times averaged from bursts of 3 back-to-back pings. In 4b the subnet-response measurements are total download times on first attempts, while in 4c the subnet-response measurements are total download times on consecutive attempts (repeated downloads that take place immediately after the first attempt, to account for the potential impact of caching). Downloads were performed using curl, where we set an IP as the destination and set the domain as the HOST attribute. Measurements for 4b and 4c were performed back-to-back, so that 4c reflects download times with presumably primed caches.

A frequency of 0.5 would imply that in half the trials performed, a valley was found in said hop. For each provider, we plot the valley frequency for each hop-client pair across our PlanetLab dataset.

Figure 4 shows the results. Note that Figures 4b and 4c use the total download time and the post-caching total download time, respectively, for the subnet measurements, as opposed to pings.² We further note that their results closely follow those obtained using pings, as in Figure 4a. For simplicity, and due to download limitations of our RIPE testbed, we revert to only using pings for the remainder of this paper.

Figure 4 shows that approximately 5–20% of hop-client pairs resulted in valleys 100% of the time (see the y-axis in the figures at x = 1.0) for every trial across the entire three-day test period. Such hops will be easy to find given a large enough dataset. However, of more interest to us are hop-client pairs that inconsistently incur valleys. For example, perhaps if valleys can be found 50% of the time for some hop-client pair, i.e., a valley frequency of 0.5, that hop may be a sufficiently persistent producer of valleys for us to reliably expect good replica choices. This would provide our system with more opportunities to employ subnet assimilation and, ideally, improve performance.

² We fetched png and js files 1 kB–1 MB large hosted at the CDNs.

Column 5 (% Hop-Client Pairs with Valley Frequency > 0.5) of Table 1 shows the percentage of hop-client pairs that have valley frequencies (calculated across all 45 trials) greater than 0.5, i.e., hop-client pairs that incurred valleys in the majority of their trials.

3.2.2 Can We Predict Valleys? While valley frequency can tell us how often valleys occurred on a past dataset, the metric makes no strong implications about what we can expect to see for a future dataset. To answer this, let us consider how the latency ratio changes as we increase the time passed between the trials we compare. In addition, per our reasoning in the vf metric, we may want to compare a set of consecutive trials (a window) to another consecutive set of the same size, rather than only comparing individual trials (which is essentially comparing windows of size 1). Given a set of trials, we take a window of size N and compare it to all other windows of size N, including those that overlap, by taking the difference in their latency ratios (HRM/CRM). If we plot this against the time distance between the windows, we can observe the trend in latency ratios as the distance in time between windows increases.
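A sketch of this pairwise window comparison for a single hop-client pair; the trial representation (timestamp in hours, latency ratio) is our own simplification.

```python
# Sketch: compare every pair of size-n windows for one hop-client pair by the
# difference of their median latency ratios, against the windows' time distance.
from statistics import median

def window_differences(trials, n):
    """trials: time-ordered list of (timestamp_hours, latency_ratio) tuples."""
    windows = [trials[i:i + n] for i in range(len(trials) - n + 1)]
    points = []
    for i, w1 in enumerate(windows):
        for w2 in windows[i + 1:]:
            dist = abs(w2[0][0] - w1[0][0])                    # hours apart
            diff = abs(median(r for _, r in w2) - median(r for _, r in w1))
            points.append((dist, diff))
    return points   # scatter these points to reproduce the trend in Figure 5
```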


Figure 5: In both figures we compare the change in latency ratio between two trial windows to the distance in time between the two windows, for window sizes 1, 5, 10, and 15. In (a) we use all hop-client pairs; in (b) we restrict our set to pairs that experience at least one valley in at least one of their 45 trials.

An upward slope would imply that the latency ratio is drifting farther apart as the time gap increases, while a horizontal line would imply that the change in latency ratio is not varying with time (i.e., windows an hour apart are as similar to each other as windows days apart). A jagged or wildly varying line would indicate that future behavior is unpredictable.

Figure 5 plots the latency ratio difference versus distance in time for several window sizes, ranging from 1 to 15. In these plots we use the windows' median latency ratio values for comparison. In particular, in Figure 5a we use our entire dataset, i.e., we consider all hop-client pairs, independently of whether they ever incur latency valleys or not, and plot latency ratio differences over time. For example, assume that for a client-hop pair HRM = 80 ms at one measurement point, while it rises to 120 ms at another measurement point. Meanwhile, assume that CRM remains 100 ms at both measurement points. Thus, the latency ratio HRM/CRM changes from 80/100 to 120/100, i.e., from 0.8 to 1.2, and the latency ratio difference is 0.4. Figure 5a shows that the latency ratio difference values both increase and vary wildly as windows become more distant. This shows that hop-subnet performance overall is likely extremely unpredictable. We hypothesize this results from unmapped subnets receiving generic answers, as observed in [47]. However, subnet assimilation requires that we have some idea of how a subnet will perform in advance.

Since valley-prone subnets are of particular interest to us, in Figure 5b we reduce our dataset to only include client-hop pairs that have at least one valley occurrence across all 45 of their trials combined. The plot's behavior becomes dramatically more stable after applying this simple, minor constraint. The effects of the window size also become apparent in Figure 5b. Even with a window of size 1, the slope is very small and the line is nearly flat.

After over 50 hours, two windows are only 10 to 20% more dissimilar than two windows a single hour apart. For window sizes greater than 5, latency ratios are often within 5% of each other, regardless of their distance in time. Going from a window size of 1 to 5 shows drastic improvement, while each following increase in window size shows a smaller impact. The results clearly show that a client can effectively identify valley-prone subnets and predict valleys with a sufficiently long window size.

3.2.3 Valley Utility Analysis. Figure 6 shows the distribution of the lower-bound latency ratios seen by the set of all valley occurrences for each provider. The latency ratio represents a lower bound because we use the minimum latency measure for replicas recommended to the client. For example, if the minimum CRM is 100 ms and the HRM is 80 ms, then the latency ratio is 0.8. Thus, the closer to 0 the latency ratios are, the larger the gain from subnet assimilation. The red line shows the median, the box bounds the 25th and 75th percentiles, and the whiskers extend up to data points 1.5 times the interquartile range (75th percentile minus 25th percentile) above the 75th percentile and below the 25th percentile. Data points beyond the whiskers, if they exist, are considered outliers and not shown.

Figure 6 shows that most of the providers show opportunities for significant latency reduction. From this plot, it appears that Amazon CloudFront and ChinaNetCenter offer the greatest potential for gains. With the exception of CDNetworks, we see that all of our providers have 25th percentiles near or below a latency ratio of 0.8, a 20% performance gain. We also observe the diversity of valley "depth." For example, we see in Figure 6 that ChinaNetCenter's interquartile range spans over 40% of the possible value space (between 0 and 1). With such a wide variety of gains for 50% of its valleys, it opens the door to being more selective.


Figure 6: Lower bound of latency ratio of all valley occurrences.

Rather than simply chasing all valleys, we could set strict requirements, tightening our definition of what is "good enough" for subnet assimilation, as we evaluate in Section 5.1.

CDNetworks' performance is more tightly bounded, as its interquartile range covers less than 10% of the value space and sits very near to 1. This implies CDNetworks probably offers very little opportunity for our technique to improve its replica selection. We hypothesize that this is a byproduct of CDNetworks' anycast implementation for replica selection; anycast depends more on routing and network properties than on DNS and IP selection [27]. In addition, Google's median and 75th percentile latency ratios sit very near 1.0. However, by being more selective, as described above, we may be able to pursue better valleys in the lower quartiles, below the median. We could potentially improve our system's valley-prone hop selection by filtering out "shallow" valleys from our consideration. We demonstrate and discuss the effects of different levels of selectiveness in Section 5.1.

4 DRONGO SYSTEM OVERVIEW
We introduce Drongo, a client-side system that employs subnet assimilation to speed up CDNs. Drongo sits on top of a client's DNS system. In a full deployment, Drongo will be installed as an LDNS proxy on the client machine, where it would have easy access to the ECS option and the DNS responses it needs to perform trials. Drongo is set by the client as its default local DNS resolver and acts as a middle party, reshaping outgoing DNS messages via subnet assimilation and storing data from incoming DNS responses. In our current implementation, Drongo uses Google's public DNS service at 8.8.8.8 to complete DNS queries.

Upon reception of an outgoing query from the client, Drongo must decide whether to use the client's own subnet or to perform subnet assimilation with some known alternative subnet. If Drongo has sufficient data for that subnet in combination with that domain, it makes a decision whether or not to use that subnet for the name resolution. If Drongo lacks sufficient data, it issues an ordinary query (using the client's own subnet for ECS).
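A minimal sketch of this gating step; the history layout is hypothetical, and the window size follows Section 4.1.

```python
# Sketch: Drongo only considers subnet assimilation for a (domain, subnet)
# pair once it holds a full training window of trials for that pair.
WINDOW_SIZE = 5   # see Section 4.1

def has_sufficient_data(history, domain, subnet) -> bool:
    """history maps (domain, subnet) -> list of per-trial latency ratios."""
    return len(history.get((domain, subnet), [])) >= WINDOW_SIZE
```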

4.1 Window Size
We now face the question: what is a sufficient amount of data needed by Drongo to ensure quality subnet assimilation decisions? We choose to measure the data "quantity" by the number of trials for some subnet, where the subnet is obtained via traceroutes performed during times when the client's network was idle. A sufficient quantity must be enough trials to fill some predetermined window size. As we observed in Section 3.2.1, the marginal benefit of increasing the window size decreases with each additional trial. To keep storage needs low while also obtaining most of the benefit of a larger window, we set Drongo's window size at 5.

4.2 Data Collection
Drongo must execute trials, defined in Section 3.1.2, in order to fill its window and collect sufficient data. As demonstrated in Figure 5b, when these trials occur is of little to no significance. In our experiments, we perform our trials at randomly sampled intervals; our trial spacing varies from minutes to days, with a tendency toward being near an hour apart. This sporadic spacing parallels the variety of timings we expect to happen on a real client: the client may be online at random, unpredictable times, for unpredictable lengths of time.

4.3 Decision Making
Here we detail Drongo's logic, assuming it has sufficient data (a full window) with which to make a decision about a particular subnet. For some domain, Drongo must decide whether a subnet is sufficiently valley-prone for subnet assimilation. If so, Drongo will use that subnet for the DNS query; if not, Drongo will use the client's own subnet. From Figure 5b, we know we can anticipate that future behavior will resemble what Drongo sees in its current window if Drongo has seen at least one valley occurrence for the domain of interest from the subnet under consideration. However, in Figure 6 we see that many valleys offer negligible performance gains, which might not outweigh the performance risk imposed by subnet assimilation. To avoid these potentially high-risk hops, Drongo may benefit from a more selective system that requires a high valley frequency in the training window to allow subnet assimilation. We explore the effects of changing the vf parameter in Section 5.

It is possible that, for a single domain, multiple hop subnets may qualify as sufficiently valley-prone for subnet assimilation. When this occurs, Drongo must attempt to choose the best performing from the set of the qualified hops. To do this, Drongo selects the hop subnet with the highest valley frequency in its training window; in the event of a tie, Drongo chooses randomly.
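A sketch of this selection rule; the candidate layout and helper names are ours, and the thresholds follow the parameters selected in Section 5.1.

```python
# Sketch: among hop subnets whose training-window valley frequency clears the
# vf threshold, pick the highest-frequency subnet, breaking ties randomly.
import random

VF_MIN, VT = 1.0, 0.95   # parameters selected in Section 5.1

def valley_frequency(ratios, vt=VT):
    return sum(r < vt for r in ratios) / len(ratios)

def choose_subnet(candidates):
    """candidates maps subnet -> list of latency ratios from its training window."""
    scored = {sub: valley_frequency(ratios) for sub, ratios in candidates.items()}
    qualified = {sub: vf for sub, vf in scored.items() if vf >= VF_MIN}
    if not qualified:
        return None                 # fall back to the client's own subnet
    best = max(qualified.values())
    return random.choice([sub for sub, vf in qualified.items() if vf == best])
```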


Figure 7: Average latency ratio of the overall system as we vary vf and vt.

5 DRONGO EVALUATION
Here we evaluate Drongo's ability to intelligently choose well-mapped hop subnets for subnet assimilation, and the performance gains imposed on clients where subnet assimilation is applied.

Using the RIPE Atlas platform [17], we collect a trace of 429 probes spread across 177 countries for 1 month. In our experiments, we use the first portion of a client's trials as a training window. Following the completion of the training window, we use the remaining trials to test Drongo's ability to select good hops for real-time subnet assimilation. Each client performed 10 trials per provider: trials 0 through 4 to form its training window, and trials 5 through 9 to test the quality of Drongo's decisions. In our evaluation, Drongo always selects the first CR from a CR-set and the first HR from an HR-set, mirroring real-world client behavior: no real-time, on-the-fly measurements are conducted, and all decisions are based on the existing window.

5.1 Parameter Selection
Figure 7 shows the effects of Drongo on our entire RIPE trace's average latency ratio, i.e., we consider all requests generated by clients, including those that aren't affected by Drongo. The figure plots the average latency ratio (shown on the y axis) as a function of the valley threshold vt, shown on the x axis. The valley threshold, introduced in Section 2.3, determines the maximum latency ratio a hop-client pair must have to classify as a valley occurrence. For example, when the threshold equals 1.0, all latency valleys are considered; on the other hand, when vt is 0.6, Drongo triggers client assimilation only for latency valleys that have promise to improve performance by more than 40%, i.e., a latency ratio smaller than 0.6. The figure shows 5 curves, each representing a different valley frequency parameter, varied between 0.2 and 1.0.

Figure 7 shows the average results across all tested clients. We make several insights.

Figure 8: Average latency ratio of cases where subnet assimilation was performed.

If the valley frequency parameter is small, e.g., 0.2, Drongo will unselectively trigger subnet assimilation for all valleys that have frequency equal to or larger than 0.2, which requires only one valley occurrence for our window size of 5. Conversely, as the minimum vf parameter increases, overall system performance improves: the latency ratio drops below 1.0. Thus, the valley frequency is a strong indicator of valley-proneness, further supporting our prior findings from Figure 5b.

Meanwhile, two more characteristics stand out as we vary vt. First, the valley frequency parameter can completely alter the slope of the latency ratio plotted against vt. This behavior echoes the observation we made in the previous paragraph: with extremely loose requirements, Drongo does not sufficiently filter out poor-performing subnets from its set of candidates. Interestingly, with a strict vf (closer to 1.0), the slope changes and the average latency ratio decreases as we raise the valley threshold. This is because, if Drongo is too strict, it filters out too many potential candidates for subnet assimilation. Second, we see that the curve eventually bends upward with high vt values, indicating that even the vt can be too lenient. The results show that the minimum average latency ratio of 0.9482 (y-axis) is achieved for the valley threshold of 0.95 (x-axis). Thus, with vf = 1.0 and vt = 0.95, Drongo produces its maximum performance gains, averaging 5.18% (1.0 minus 0.9482) across all of our tested clients. By balancing these parameters, Drongo filters out subnets that offer the least benefit: subnets where valleys seldom occur and subnets where valleys tend to be shallow.

Figure 8 plots the average latency ratio in the same manner as Figure 7, yet only for queries where Drongo applied subnet assimilation. First, the figure again confirms that lower valley frequency parameters degrade Drongo's performance. Second, paired with our knowledge of Figure 7, it becomes clear that as the valley threshold decreases, the number of latency valleys shrinks, while the quality of the remaining valleys becomes more potent. This is why the latency ratio decreases as vt decreases.


Figure 9: Percentage of clients where subnet assimilation was performed.

Figure 10: Per-provider system performance for all queries. Optimal vf is set for each provider and noted in parentheses: Google (0.8), CloudFront (0.8), Alibaba (0.4), CDNetworks (1.0), ChinaNetCtr (0.6), CubeCDN (1.0).

However, if the threshold is too small, the number of available valleys becomes so small that the performance becomes unpredictable, i.e., outlier behavior can dominate, causing the spike in average latency ratios for valley thresholds under 0.2.

Finally, Figure 9 plots the percentage of clients for which Drongo performed subnet assimilation for at least one provider. Necessarily, the less strict the frequency constraint is (vf ≥ 0.2 being the least strict), the more frequently Drongo acts.

As shown in Figure 9, 69.93% of our clients were affected by Drongo with vf and vt set at the peak aggregate performance values (1.0 and 0.95, respectively) found above.

Summary: In this section we selected the parameters vf = 1.0 and vt = 0.95, where Drongo reaches its peak aggregate performance gains of 5.18%.

5.2 Per-Provider Performance

In the previous subsection, we used a constant parameter set across all six providers in our analysis. In this section, we analyze Drongo on a per-provider basis. By choosing an optimal parameter for each provider, as we do below, Drongo's aggregate performance gains increase to 5.85%. In Figure 10, we show how Drongo impacts the average performance of each provider while setting the valley frequency to each individual provider's respective optimal value (noted in parentheses under each provider name). Note that for most providers, the individual optimal valley frequency lies near the value obtained in Section 5.1 (1.0). While the intricacies of each provider's behavior clearly hinge on opaque characteristics of their respective networks, we offer several explanations for the observed performance. CDNetworks, which sees small gains across all valley threshold values, further validates the hypothesis proposed in Section 3.2.3 regarding anycast networks. Meanwhile, Google, the largest provider of our set on a global scale, also experienced the second highest peak average gains from Drongo; when a client is not well served by Google, it is likely that there exist nearby subnets being directed to different replicas. In other words, Drongo has great opportunity for improving CDNs that use fine-grained subnet mapping.
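The per-provider tuning can be expressed as a thin configuration layer on top of the decision rule sketched in Section 5.1. The (vf, vt) pairs below are the optimal values reported in Figure 11; the dictionary and lookup function are hypothetical illustrations of how a deployment might select parameters per CDN, not part of Drongo's published design.

```python
# Hypothetical per-provider configuration using the optimal (vf, vt) pairs
# reported in Figure 11. Providers not listed fall back to the aggregate
# parameters selected in Section 5.1 (vf = 1.0, vt = 0.95).

DEFAULT_PARAMS = (1.0, 0.95)

PER_PROVIDER_PARAMS = {
    "Google":         (0.8, 0.70),
    "CloudFront":     (0.8, 0.80),
    "Alibaba":        (0.4, 0.90),
    "CDNetworks":     (1.0, 0.90),
    "ChinaNetCenter": (0.6, 0.85),
    "CubeCDN":        (1.0, 0.95),
}

def params_for(provider: str) -> tuple[float, float]:
    """Return the (min_vf, vt) pair to use when deciding assimilation for this provider."""
    return PER_PROVIDER_PARAMS.get(provider, DEFAULT_PARAMS)

# Example: reuse should_assimilate() from the earlier sketch with Google's tuning.
# min_vf, vt = params_for("Google")
# should_assimilate(window, vt=vt, min_vf=min_vf)
```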

Figure 11: Per-provider system performance for queries where subnet assimilation was applied. Parentheses contain optimal values for each respective provider, formatted (vf, vt): Google (0.8, 0.7), CloudFront (0.8, 0.8), Alibaba (0.4, 0.9), CDNetworks (1.0, 0.9), ChinaNetCtr (0.6, 0.85), CubeCDN (1.0, 0.95). (y-axis: latency ratio.)

Finally, Figure 11 shows Drongo's performance per provider, exclusively in scenarios when it applies subnet assimilation. Comparing to the results shown in Figure 6 for the PlanetLab experiments, we observe significant differences. Most notably, the latency valley ratios are much smaller in Figure 11 than in Figure 6. For example, Google's median latency ratio is close to 1.0 in Figure 6, indicating insignificant gains. On the contrary, in Figure 11, Google's median latency ratio is around 0.5, implying gains of 50% (1.0 - 0.5) and up to an order of magnitude in edge cases. Considering all providers, Drongo-influenced replica selections are on average 24.89% better performing than Drongo-free selections.

There are three reasons for this behavior. First, Figure 6 shows the lower-bound system performance for PlanetLab, as the best CR is always used for comparison. Such a restriction is not imposed in the RIPE Atlas scenario. Second, the RIPE set is more diverse and includes widely distributed endpoints, thus allowing CDNs greater opportunity for subnet mapping error. Third, contrary to the PlanetLab scenario, where we used all the valleys under the valley threshold of 1.0, here we use optimal vt values such that Drongo successfully filters out most "shallow" valleys that would dilute performance gains.

6 RELATED WORK

Given a large number of CDNs with vastly different deployment and performance in different territories, CDN brokering has emerged as a way to improve user-perceived CDNs' performance, e.g., [7, 34]. By monitoring the performance of multiple CDNs, brokers are capable of determining which particular CDN works better for a particular client at a point in time. Necessarily, this approach requires content providers to contract with multiple CDNs to host and distribute their content. Contrary to CDN brokering, Drongo manages to improve the performance of each individual CDN. Still, it works completely independently from CDNs, and it is readily deployable on the clients. Moreover, Drongo is completely compatible with CDN brokering, since it in principle has the capacity to improve the performance of whichever CDN is selected by a broker.

There has been tremendous effort expended in the ongoing battle to make web pages load faster, improve streaming performance, etc. As a result, many advanced client-based protocols and systems have emerged from both industry (e.g., QUIC, SPDY, and HTTP/2) and academia [24, 41, 50] in the Web domain, and likewise in streaming, e.g., [37]. While Drongo is also a client-based system, it is generic and helps speed up all CDN-hosted content. By reducing the CDN latency experienced by end users, it systematically improves all the above systems and protocols, which in turn helps all associated applications.

Recent work has demonstrated that injecting fake information can help protocols achieve better performance. One example is routing [49], where fake nodes and links are introduced into an underlying link-state routing protocol so that routers compute their own forwarding tables based on the augmented topology. Similar ideas are shown useful in the context of traffic engineering [29], where robust and efficient network utilization is accomplished via carefully disseminated bogus information ("lies"). While similar in spirit to these approaches, Drongo operates in a completely different domain, aiming to address the CDN pitfalls. In addition to using "fake" (source subnet) information in order to improve CDN decisions, Drongo also invests effort in discovering valley-prone subnets and determining the conditions when using them is beneficial.

7 DISCUSSION

A mass deployment of Drongo is non-trivial and carries with it some complexities that deserve careful consideration.

There is some concern that maintenance of CDN allocation policies may still be compromised, despite our efforts to respect their mechanisms in Drongo's design. We propose that a broadly deployed form of Drongo could be carefully tuned to affect only the most extreme cases. Figure 11 shows that subnet assimilation carries some risk of a loss in performance, so it is in clients' best interests to apply it conservatively. First, we know from Figure 9 that the number of clients using assimilated subnets can be significantly throttled with strict parameters. Further, in today's Internet, many clients are already free to choose arbitrary LDNS servers ([25, 36]). Many of these servers do not make use of the client subnet option at all, thus rendering the nature of their side effects (potentially disrupting policies enforced only in DNS) equivalent to that of Drongo. It is difficult to say whether or not Drongo's ultimate impact on CDN policies would be any more significant than the presence of such clients.

Mass adoption also raises scalability concerns, particularly regarding the amount of measurement traffic being sent into the network. To keep the number of measurements small while ensuring their freshness, a distributed peer-to-peer component, where clients in the same subnet share trial data, could be incorporated into Drongo's design. We leave this modification for future work.

8 CONCLUSIONS

In this paper, we proposed the first approach that enables clients to actively measure CDNs and effectively improve their replica selection decisions, without requiring any changes to the CDNs and while respecting CDNs' policies. We have introduced and explored latency valleys, scenarios where replicas suggested to upstream subnets outperform those provided to a client's own subnet. We have asserted that latency valleys are common phenomena, and we have found them across all the CDNs we have tested, in 26%-76% of routes. We showed that valley-prone upstream subnets are easily found from the client, are simple to identify with few and infrequent measurements, and, once found, are persistent over timescales of days.

We have introduced Drongo, a client-side system that leverages the performance gains of latency valleys by identifying valley-prone subnets. Our measurements show that Drongo can improve requests' latency by up to an order of magnitude. Moreover, Drongo achieves this with exceptionally low overhead: a mere 5 measurements suffice for timescales of days. Using experimentally derived parameters, Drongo affects the replica selection of requests made by 69.93% of clients, improving their latency by 24.89% in the median case. Drongo's significant impact on these requests translates into an overall improvement in client-perceived aggregate CDN performance.


REFERENCES

[1] 2016. Akamai. (2016). http://www.akamai.com
[2] 2016. Alexa Top Sites. (2016). https://aws.amazon.com/alexa-top-sites
[3] 2016. Alibaba Cloud CDN. (2016). https://intl.aliyun.com/product/cdn
[4] 2016. Amazon CloudFront – Content Delivery Network (CDN). (2016). https://aws.amazon.com/cloudfront
[5] 2016. Amazon Route 53. (2016). https://aws.amazon.com/route53
[6] 2016. AWS Global Infrastructure. (2016). https://aws.amazon.com/about-aws/global-infrastructure
[7] 2016. Cedexis. (2016). http://www.cedexis.com
[8] 2016. Conviva. (2016). http://www.conviva.com
[9] 2016. The Cost of Latency. (2016). http://perspectives.mvdirona.com/2009/10/the-cost-of-latency
[10] 2016. GoDaddy Premium DNS. (2016). https://www.godaddy.com/domains/dns-hosting.aspx
[11] 2016. Google CDN Platform. (2016). https://cloud.google.com/cdn/docs
[12] 2016. Google Cloud Platform: Cloud DNS. (2016). https://cloud.google.com/dns
[13] 2016. Google Public DNS. (2016). https://developers.google.com/speed/public-dns/docs/using?hl=en
[14] 2016. Netdirekt: Content Hosting - CDN. (2016). http://www.netdirekt.com.tr/cdn-large.html
[15] 2016. Neustar DNS Services. (2016). https://www.neustar.biz/services/dns-services
[16] 2016. OpenDNS. (2016). https://www.opendns.com
[17] 2016. RIPE Atlas - RIPE Network Coordination Centre. (2016). https://atlas.ripe.net
[18] 2016. Shopzilla: faster page load time = 12 percent revenue increase. (2016). http://www.strangeloopnetworks.com/resources/infographics/web-performance-and-ecommerce/shopzilla-faster-pages-12-revenue-increase
[19] 2016. Sweden's Greta wants to disrupt the multi-billion CDN market. (2016). https://techcrunch.com/2016/08/30/greta
[20] 2016. Verisign Managed DNS. (2016). http://www.verisign.com/en_US/security-services/dns-management/index.xhtml
[21] 2016. Verizon Digital Media Services. (2016). https://www.verizondigitalmedia.com
[22] 2016. Verizon ROUTE: Fast, Reliable Enterprise-Class Services for Domain Name System (DNS). (2016). https://www.verizondigitalmedia.com/platform/route
[23] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. 2010. Comparing DNS Resolvers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 15–21. DOI: https://doi.org/10.1145/1879141.1879144
[24] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. Klotski: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 439–453. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/butkiewicz
[25] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. 2013. Mapping the Expansion of Google's Serving Infrastructure. In Proceedings of the 2013 Conference on Internet Measurement Conference (IMC '13). ACM, 313–326. DOI: https://doi.org/10.1145/2504730.2504754
[26] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531–537. DOI: https://doi.org/10.1145/2815675.2815717
[27] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531–537. DOI: https://doi.org/10.1145/2815675.2815717
[28] F. Chen, R. Sitaraman, and M. Torres. 2015. End-User Mapping: Next Generation Request Routing for Content Delivery. In Proceedings of ACM SIGCOMM '15. London, UK.
[29] Marco Chiesa, Gábor Rétvári, and Michael Schapira. 2016. Lying Your Way to Better Traffic Engineering. In Proceedings of the 2016 ACM CoNEXT (CoNEXT '16). ACM.
[30] ChinaNetCenter. 2016. ChinaNetCenter - Network. (2016). http://en.chinanetcenter.com/pages/technology/g2-network-map.php
[31] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. 2003. PlanetLab: An Overlay Testbed for Broad-coverage Services. SIGCOMM Comput. Commun. Rev. 33, 3 (July 2003), 3–12. DOI: https://doi.org/10.1145/956993.956995
[32] C. Contavalli, W. van der Gaast, D. Lawrence, and W. Kumari. 2016. Client Subnet in DNS Queries. RFC 7871. RFC Editor.
[33] Tobias Flach, Ethan Katz-Bassett, and Ramesh Govindan. 2012. Quantifying Violations of Destination-based Forwarding on the Internet. In Proceedings of the 2012 ACM Conference on Internet Measurement Conference (IMC '12). ACM, 265–272. DOI: https://doi.org/10.1145/2398776.2398804
[34] Aditya Ganjam, Faisal Siddiqui, Jibin Zhan, Xi Liu, Ion Stoica, Junchen Jiang, Vyas Sekar, and Hui Zhang. 2015. C3: Internet-Scale Control Plane for Video Quality Optimization. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 131–144. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/ganjam
[35] Cheng Huang, Ivan Batanov, and Jin Li. 2012. A Practical Solution to the client-LDNS Mismatch Problem. SIGCOMM Comput. Commun. Rev. 42, 2 (March 2012), 35–41. DOI: https://doi.org/10.1145/2185376.2185381
[36] Cheng Huang, Angela Wang, Jin Li, and Keith W. Ross. 2008. Measuring and Evaluating Large-scale CDNs (Paper Withdrawn at Microsoft's Request). In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 15–29. http://dl.acm.org/citation.cfm?id=1452520.1455517
[37] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. 2014. A Buffer-based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM '14). ACM, 187–198. DOI: https://doi.org/10.1145/2619239.2626296
[38] Rupa Krishnan, Harsha V. Madhyastha, Sushant Jain, Sridhar Srinivasan, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. 2009. Moving Beyond End-to-End Path Information to Optimize CDN Performance. In Internet Measurement Conference (IMC). Chicago, IL, 190–201. http://research.google.com/archive/imc191/imc191.pdf
[39] Zhuoqing Morley Mao, Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang. 2002. A Precise and Efficient Evaluation of the Proximity Between Web Clients and Their Local DNS Servers. In Proceedings of the General Track of the Annual Conference on USENIX Annual Technical Conference (ATEC '02). USENIX Association, Berkeley, CA, USA, 229–242. http://dl.acm.org/citation.cfm?id=647057.759950
[40] Matthew K. Mukerjee, Ilker Nadi Bozkurt, Bruce Maggs, Srinivasan Seshan, and Hui Zhang. 2016. The Impact of Brokers on the Future of Content Delivery. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (HotNets '16). ACM, 127–133. DOI: https://doi.org/10.1145/3005745.3005749
[41] Ravi Netravali, Ameesh Goyal, James Mickens, and Hari Balakrishnan. 2016. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/netravali
[42] OpenDNS. 2016. A Faster Internet: The Global Internet Speedup. (2016). http://afasterinternet.com
[43] Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose. 1998. Modeling TCP Throughput: A Simple Model and Its Empirical Validation. In Proceedings of the ACM SIGCOMM '98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '98). ACM, 303–314. DOI: https://doi.org/10.1145/285237.285291
[44] Vern Paxson. 1996. End-to-end Routing Behavior in the Internet. SIGCOMM Comput. Commun. Rev. 26, 4 (Aug. 1996), 25–38. DOI: https://doi.org/10.1145/248157.248160
[45] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, and Anja Feldmann. 2010. Improving Content Delivery Using Provider-aided Distance Information. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 22–34. DOI: https://doi.org/10.1145/1879141.1879145
[46] Florian Streibelt, Jan Böttger, Nikolaos Chatzis, Georgios Smaragdakis, and Anja Feldmann. 2013. Exploring EDNS-client-subnet Adopters in Your Free Time. In Proceedings of IMC '13 (IMC '13).
[47] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. 2006. Drafting Behind Akamai (Travelocity-based Detouring). SIGCOMM Comput. Commun. Rev. 36, 4 (Aug. 2006), 435–446. DOI: https://doi.org/10.1145/1151659.1159962
[48] Ao-Jan Su and Aleksandar Kuzmanovic. 2008. Thinning Akamai. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 29–42. DOI: https://doi.org/10.1145/1452520.1452525
[49] Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. 2015. Central Control Over Distributed Routing. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM '15). ACM, 43–56. DOI: https://doi.org/10.1145/2785956.2787497
[50] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. 2016. Speeding up Web Page Loads with Shandian. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA, 109–122. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/wang

Page 3: Drongo: Speeding Up CDNs with Subnet Assimilation from the Clientnetworks.cs.northwestern.edu/website/publications/drongo/... · 2018-12-22 · CDNs attempt to improve web and streaming

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

However even when the actual clientrsquos network locationis accurate numerous factors prevent CDNs from providingoptimal CDN replica selection [38] First creating accuratenetwork mapping for the Internet is a non-trivial task giventhat routing can inflate both end-to-end latency and latencyvariance eg [44] As a result CDN selections could be heav-ily under-performing (see [38] for a thorough analysis) Theunderlying causes are diverse including lack of peeringrouting misconfigurations side-effects of traffic engineeringand systematic diversions from destination-based forward-ing [33] Second CDN measurements are necessarily coarsegrained ndash measuring each and every client (each individualIP address) from a CDN is impossible for scalability reasonsand because the actual source IP address isnrsquot available dueto ECS truncation On top of this CDNs often have otheroptimization goals eg load balancing which can divertclients further away from close-by replicas

Our key idea is to enable clients to join CDNs in addressingthe above challenges In particular it is far easier for a singleclient to find a subnet that is consistently recommended well-performing replicas than it is for a CDN to evaluate the entireIP space Not only does the client-supported approach scaleit enables far better CDN replica selection decisions In sum-mary in our approach (i) clients conduct measurements tofind other subnets that lead to potentially lower-latency CDNreplicas (ii) they evaluate the performance of such subnetsand associated replicas via light infrequent measurementsover long time-scales and (iii) finally they utilize these sub-nets to drive CDNsrsquo decisions towards lower-latency replicas

22 Respecting CDN Policies andIncentives for Adoption

As stated above CDNs can sometimes deploy policies that canprevent them from serving clients nearby replicas In princi-pal there are two such scenarios First on longer timescalesa CDN may have a business logic where it wants a certainIP subnet and no one else to be able to use a certain CDNcluster or server For example an ISP may allow a CDN todeploy a CDN server inside its network but on the condi-tion that only IP addresses owned by the ISP may benefitfrom that server Our approach is completely compatiblewith such arrangements because such a policy can be easilyenforced via access network-level firewalls which forcesour system to avoid such CDN replicas Second on shortertimescales CDNs may deploy load-balancing policies thatdistribute the traffic load such that clients are not alwaysdirected to the closest replica server Our system respectssuch load-balancing policies because it always selects thefirst CDN replica from a recommended set ie does notopportunistically select the ldquobest replica from the set

Clients have clear incentives to adopt our approach since itdirectly improves their performance While it is certainly thecase that CDNs share the same incentives for our systemrsquosadoption this is evenmore true with the proliferation ofCDNbrokers eg [8] CDN brokers actively measure performanceto multiple CDNs and they can redirect a client to a differentCDN on the fly in case the QoE isnrsquot satisfactory [40] It hasbeen demonstrated that this particularly hurts large CDNswhich often have better replicas in the vicinity of the clientswhich the broker is unfortunately unaware of this leads tothe loss of clients and revenues for CDNs [40] Thus utiliz-ing clientsrsquo help in selecting the best CDN replicas in theirvicinity directly benefits CDNs

We thus argue that an unrestricted adoption of the ECSoption which is the key mechanism that enables the subnetassimilation used by our system (Section 23) is in the bestinterest of every CDN In particular the unrestricted ECSoption enables a client to use the ECS field to specify a subnetdifferent from the clientrsquos to change the way a CDNmaps theclient to its replicas While most CDNs do enable ECS in itsunrestricted form eg Google and EdgeCast among othersAkamai does so only via OpenDNS [16] and GoogleDNS [13]public DNS services using the actual IP address of the re-quester As such Akamairsquos CDN is currently not directlyusable by our system as our system requires sending ECSrequests on behalf of hop IPs One possible reason for such arestriction imposed by Akamai might be the ability of third-parties to accurately reverse-engineer a CDNrsquos scale andcoverage without significant infrastructural resources eg[26 46] However that even without unrestricted ECS Aka-mairsquos network has been quite comprehensively analyzed inthe past eg [36 47 48] One of our main contributions liesin showing that the benefits achievable by letting clientshelp improve CDNsrsquo decisions far outweigh the potentialdrawbacks of unrestricted ECS adoption

23 Subnet AssimilationSubnet assimilation is the deliberate use of the ECS field tospecify a subnet different from the clientrsquos to change theway a CDN maps the client to its replicas In this paper weshow that the assimilation of subnets found along the pathbetween the client and its ldquodefault replica may allow theclient to ldquoreposition itself in the CDNrsquos mapping schemesuch that the client will consistently receive lower-latencyreplica recommendations from the CDN Subnet assimilationis a key mechanism used both to detect and utilize lower-latency CDN replicas

Latency valley Figure 1 illustrates the key heuristic thathelps clients realize when they are not optimally served by aCDN Consider a client C redirected to a CDN server S1 asillustrated in the figure If a hop on the path is not redirected

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

CS1

S2H1 H2 H3 H4

100ms

0ms

latency

Figure 1 Illustration of a latency valley

to the same server S1 then it is possible that a lower-latencyreplica eg S2 has been observed such that the latencybetween C and S2 is smaller than the latency between C andS1 We refer to such a scenario as a latency valley

For the remainder of this paper we refer to the set ofreplicas suggested to the clientrsquos actual subnet as the clientreplica set (CR-set) and a replica from the CR-set as a clientreplica (CR) We refer to the routers along the presumedpath from the client to any given CR as hops Next we referto the set of replicas suggested to a hop discoverable viasubnet assimilation as a hop replica set (HR-set) We denotea measurement from the client to a CR as a client replicameasurement (CRM) Likewise we denote a replica fromthe HR-set as a hop replica (HR) and a measurement to thatreplica from the client as a hop replica measurement (HRM)Finally we define a latency valley to be any occurrence of thefollowing inequality HRMCRM lt vt le 1 We take vt = 1until we optimize our parameter selection in Section 51It is imperative to note that our goal is not to find lower-

latency replicas perhaps included in an HR-set Instead weaim to find subnets to which lower-latency replicas are con-sistently suggested in other words subnets that are proneto valley-occurrences in replica sets obtained at any time

24 Identifying Latency ValleysIn order to identify latency valleys we first identify thehops along the path which we achieve using traceroute Weare using the upstream path for simplicity the downstream

path which may be assymetric could yield different resultsbut would require more overhead or control of the CDNfor execution We then identify the corresponding HR-setfor each hop using ECS queries which assimilate the hopsrsquosubnets Lastly we must compare the HRMs to the CRM Unless otherwise noted we calculate HRMs and CRMs us-ing ping RTTs (obtained by averaging the results of threeback-to-back pings) and we refer to these values as laten-cies Throughout this paper latency units are millisecondsFor simplicity we refer to the ratio HRMCRM used in ourlatency valley definition as the latency ratio A latency ratiobelow 1 indicates that the HR has outperformed the CR alatency ratio above one indicates the CR has outperformedthe HR and finally a latency ratio equal to 1 indicates thatthe CR and HR performed equally (which in general onlyhappens when CR and HR refer to the same replica)

The above operation cannot be executed on-the-fly as thetime consumed by this process will outweigh any latencygains achieved Moreover multiple measurements might berequired Fortunately our experiments in Section 3 showthat a small number of measurements per hop mdash less than10 mdash are required and that the measurement results remainapplicable on timescales of days Because we can rely onmea-surements obtained in idle time real-time measurements arenot necessary for our system Thus our proposed techniquewill use past measurements to predictively choose a goodsubnet for future assimilation

3 EXPLORING VALLEYSIn this section we establish that latency valleys are commonphenomena in the wild and that they offer substantial perfor-mance gains In addition we lay the groundwork for findingvalleys a prerequisite to harnessing their performance gainsvia subnet assimilation

31 Testing for ValleysWe perform our preliminary tests using PlanetLab nodesa platform which offers a large number of vantage pointsfrom around the globe primarily deployed in academic in-stitutions [31] We further investigate latency valleys as ex-perienced by a large variety of clients using the RIPE Atlasplatform [17] as detailed in Section 5 While PlanetLab lacksthe scale and variety offered by RIPE Atlas the flexibilityand freedom it provides made it a prime choice for our pre-liminary analysis Our PlanetLab experiments use 95 nodesspread across 51 test locations andwe obtain the IP addressesof hops between nodes and CDN replicas via tracerouteWhile latency valleys are a common occurrence as we

demonstrate below many hops between the clients and theproviderrsquos servers can be discarded when searching for val-leys One category of such hops is of those which reside

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

within the same local network as the client which will besuggested the same replicas as the client Another disposablecategory mdash hops with private IP addresses mdash will not berecognized by the EDNS server and will yield generic replicachoices regardless of the hopsrsquo locations To ensure that ahop is indeed usable a hop must (i) belong to a different 16subnet than the client (ii) have a different domain than theclient and (iii) belong to a different ASN than the client Weonly filter hops that fail these conditions at the beginning ofthe route once a hop is observed that meets the above threeconstraints we stop filtering for the remainder of the route

311 Provider Selection While ECS is gaining momen-tum among CDNs as it allows them to better estimate theirclientsrsquo location it is important to note that ECS was onlyrecently deployed and its implementation varies among thedifferent providers ldquoA Faster Internet an initiative headedby Google and OpenDNS is drawing together a growingnumber of large CDNs and content providers to adopt andpromote the adoption of ECS [42] In order to attain a set ofCDNs for our tests we have selected those providers whichimplement an unrestricted form of ECS as described in Sec-tion 22In order to best select the CDNs for our tests we scraped

URLs from over 3000 sites arbitrarily selected from the AlexaTop 1M list [2] Then we reduced our set of URLs to thosethat ended in a known file type in order to ensure our abil-ity to perform download tests on the selected URLs Whenapplicable we resolved CNAME domains to their respectiveCDN domains We then performed multiple DNS queries foreach URL to determine if unrestricted ECS is implementedFrom the remaining URLs we have extracted the 6 CDNdomains which have appeared most frequently along withtheir respective URLs

Before we can begin seeking subnets with better mapping-groups it is first necessary to prove that subnets with dif-ferent mapping-groups exist and are easy to find We defineusable route length as the number of hops along the path thatfulfill the above filtering Next we define divergence as thefraction of usable hops along the path which were recom-mended at least one replica that was not recommended tothe clientFigure 2 depicts usable route length and divergence for

different CDNs The figure shows that for example when re-questing replicas from Google each client had an average us-able route length of 78 and the divergence is approximately92 It is evident from Figure 2 that hops are indeed sug-gested different replicas than their client As we encounter awider variety of recommendations we increase our numberof opportunities to find latency valleys The high divergence

Goo

gle

Clo

udFr

ont

Alib

aba

CD

Net

wor

ks

Chi

naN

etCtr

Cub

eCD

N

0 0

0 2

0 4

0 6

0 8

div

erg

ence

mean divergence mean usable route length

0

2

4

6

8

10

12

14

usa

ble

route

length

Figure 2 Average divergence and average usable routelength per CDN

shown in Figure 2 indicates that other and possibly lower-latency replicas can be mapped to clients if they were toassimilate their hopsrsquo IP addresses

For the remainder of this paper we use the following set ofproviders Google Amazon CloudFront Alibaba ChinaNet-Center CDNetworks and CubeCDN a set which is bothdiverse and comprehensive Googlersquos CDN infrastructureis massive and dispersed as of 2013 around 30000 CDN IPaddresses spread across over 700 ASes were observable inone day [25] As of this writing Amazon CloudFront offeredover 50 points-of-presence spread throughout the world[6] CDNetworks offers over 200 points-of-presence alsoglobally dispersed and heavily employing anycast for serverselection Alibaba and ChinaNetCenter offer over 500 CDNnode locations each within China where they are centeredin addition to a growing number of service locations outsideof China [3 30] Finally CubeCDN is a smaller CDN withlocations spread primarily across Turkey [14]

312 Test Execution In order to assert the existence oflatency valleys in the wild we have executed a series of trialsusing the aforementioned URLs and the PlanetLab nodes asclients Each trial consists of the following steps

(1) We randomly select a URL(2) Client retrieves CR-set for the selected URLrsquos domain(3) For each CR client uses traceroute to identify hops(4) Using subnet assimilation client retrieves the HR-set

for each hop(5) Client measures CRMs for every CR in its CR-set and

HRMs for every HR it has seen (across all HR-setsobtained in that trial)

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

Figure 3 Scatter plot of HRMs and minimum CRMsThe area below the diagonal is the valley region

We have performed a series of 45 trials per PlanetLabnode executed between one and two hours apart For theremainder of this Section we refer to this dataset

32 Valley PrevalenceIn order to establish the prevalence and significance of la-tency valleys we now compare HRMs to their respectiveCRMs It is important to note that it is possible that a clienthas multipleCRs and hence multipleCRMs and in practicea client can only choose one replica Therefore we compareall HRMs to the minimum CRM obtained in their trial thusestablishing a lower bound for our expected performancegain and allowing us to easily assert any gains or lossesprovided by the alternative replica choices (the HRs)With our PlanetLab dataset we compared each HRM to

the minimum CRM ie the best client replica obtained inthe same respective trial Figure 3 shows the results of thisassessment We plot HRM on the y axis and minimum CRMon the x axis thus creating a visual representation of thelatency ratio We distinguish between the CDNs using differ-ent colors To better explain the results we draw the equalityline where HRM = minimum CRM Every data point alongthis line represents a hoprsquos subnet which is being suggesteda replica that performs on a par with the replicas suggested

to the clientrsquos subnet More importantly every data pointbelow the equality line represents a valley occurrence as thealternative replicarsquos HRM is smaller than the CRM The per-centage of valleys by provider ranges from 1402 (Cloud-Front) to 3858 (CubeCDN) with an average of 22 acrossall providers Table 1 shows the results ie Column 2 (Valleys Overall) lists these percentages for each CDN

Now that we have established that valleys exist we wishto determine where we can expect to find valleys We con-tinue using the minimum CRM from each trial for reasonsdescribed above However in order to better reflect prac-tical scenarios where only one of a given set of replicas isused we now also begin choosing an individual replica froman HR-set Instead of choosing the minimum HRM for agiven hop we conservatively choose the median Each hoprsquoschosen HRM is the median of its HR-set for that trial Wecontinue this pattern (choosing the clientrsquos best CRM fromthe trial and a hoprsquos median HRM) for the remainder of ourPlanetLab data analysis Our PlanetLab data thus representsa lower bound on valleys and their performance implicationswe compare the median performance of our proposed proce-dure to the absolute best the existing methods have to offerIn Section 5 using our RIPE Atlas testbed we remove thisconstraint to demonstrate the real-world performance of oursystemWe further wish to consider how common valleys are

within a given route from the client to the provider Column 3(Average of Valleys per Route) of Table 1 shows given someroute in one trial the average percent of usable hops thatincur valleysWe see for Alibaba that on average over a thirdof the usable hops in a given route are likely to incur valleysand for CDNetworks nearly a fourth of the route Meanwhilefor CloudFront we see that on average valleys occur for onlya small portion of hops in a given route Column 4 ( Routeswith a Valley) details the percentage of all observed routesthat contained at least one valley in the trial in which theywere observed For Alibaba and CDNetworks around 75of the observed routes contain valleys while more than 50for Google

321 Do Valleys Persist Next we assess whether subnetsare persistently valley-prone For some number of trialshow often can the client expect a valley to occur from someparticular hop To answer this we need to be able to describehow frequently valleys occurred for some hop subnet in agiven set of trials

Valley frequency We define a new simple metric thevalley frequency as follows if we look at a hop-client pairacross a set of N trials and v trials are valleys the valleyfrequency vf is

vf =v

N

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

Table 1 Detailed findings for each provider based on PlanetLab data

Provider ValleyOverall

Avg Valleysper Route

Routeswith Valley

Hop-Client Pairsw Valley Freq gt05

Google 2024 1641 5330 1098Amazon CloudFront 1402 872 2582 1000Alibaba 3368 3594 7583 3097CDNetworks 1561 2441 7308 1409ChinaNetCenter 2742 1426 3810 1674CubeCDN 3858 1795 2549 2632

0 0 0 2 0 4 0 6 0 8 1 0valley frequency

0 4

0 5

0 6

0 7

0 8

0 9

1 0

CD

F of

hop-c

lient

pair

s

Google

CloudFront

Alibaba

CDNetworks

ChinaNetCtr

CubeCDN

(a) ping

0 0 0 2 0 4 0 6 0 8 1 0valley frequency

0 4

0 5

0 6

0 7

0 8

0 9

1 0

CD

F of

hop-c

lient

pair

sGoogle

CloudFront

Alibaba

CDNetworks

ChinaNetCtr

CubeCDN

(b) total download time

0 0 0 2 0 4 0 6 0 8 1 0valley frequency

0 4

0 5

0 6

0 7

0 8

0 9

1 0

CD

F of

hop-c

lient

pair

s

Google

CloudFront

Alibaba

CDNetworks

ChinaNetCtr

CubeCDN

(c) total post-caching download time

Figure 4 CDFs of clients by valley frequency In 4a the subnet-response measurements are ping times averagedfrom bursts of 3 back to back pings In 4b the subnet-response measurements are total download times on firstattempts while in 4c the subnet-response measurements are total download times on consecutive attempts (re-peated downloads that take place immediately after first attempt to account for the potential impact of caching)Downloads were performed by using curl where we set an IP as a destination and set the domain as the HOSTattribute Measurements for 4b and 4c were performed back-to-back so that 4c reflects download times with pre-sumably primed caches

A frequency of 05 would imply that in half the trials per-formed a valley was found in said hop For each providerwe plot the valley frequency for each hop-client pair acrossour PlanetLab dataset

Figure 4 shows the results Note that Figures 4b and 4c usethe total download time and the post-caching total downloadtime respectively for the subnet measurements as opposedto pings2 We further note that their results closely follow thoseobtained using pings as in Figure 4a For simplicity and dueto download limitations of our RIPE testbed we revert toonly using pings for the remainder of this paperFigure 4 shows that approximately 5-20 of hop-client

pairs resulted in valleys 100 of the time (see y-axis in thefigures for x=10) for every trial across the entire three daytest period Such hops will be easy to find given a largeenough dataset However of more interest to us are hop-client pairs that inconsistently incur valleys For exampleperhaps if valleys can be found 50 of the time for some hop-client pair mdash ie a valley frequency of 05 mdash that hop may bea sufficiently persistent producer of valleys for us to reliablyexpect good replica choices This would provide our system2We fetched png and js files 1kB ndash 1MB large hosted at the CDNs

with more opportunities to employ subnet assimilation andideally improve performance Column 5 ( Hop-Client Pairswith Valley Frequency gt 05) of Table 1 shows the percentageof hop-client pairs that have valley frequencies (calculatedacross all 45 trials) greater than 05 ie hop-client pairs thatincurred valleys in the majority of their trials

322 Can We Predict Valleys While valley frequencycan tell us how often valleys occurred on a past dataset themetric makes no strong implications about what we canexpect to see for a future dataset To answer this let us con-sider how the latency ratio changes as we increase the timepassed between the trials we compare In addition per ourreasoning in the vf metric we may want to compare a set ofconsecutive trials mdash a window mdash to another consecutive setof the same size rather than only comparing individual trials(which is essentially comparing windows of size 1) Given aset of trials we take a window of size N and compare it toall other windows of size N including those that overlap bytaking the difference in their latency ratios (HRMCRM) Ifwe plot this against the time distance between the windowswe can observe the trend in latency ratios as the distance in

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

10 20 30 40 50distance (hours)

0 0

0 5

1 0

1 5

2 0

2 5

3 0

late

ncy

rati

o d

iffe

rence

Wind 1 Wind 5 Wind 10 Wind 15

(a) All client-hop pairs

10 20 30 40 50distance (hours)

0 0

0 5

1 0

1 5

2 0

2 5

3 0

late

ncy

rati

o d

iffe

rence

Wind 1 Wind 5 Wind 10 Wind 15

(b) Valley at least once

Figure 5 In both figures we compare the change in latency ratio between two trial windows to the distance intime between the two windows In figure a we use all hop client pairs In figure b we restrict our set to pairs thatexperience at least one valley in at least one of their 45 trials

time between windows increases An upward slope wouldimply that the latency ratio is drifting farther apart as thetime gap increases while a horizontal line would imply thatthe change in latency ratio is not varying with time (ie win-dows an hour apart are as similar to each other as windowsdays apart) A jagged or wildly varying line would indicatethat future behavior is unpredictable

Figure 5 plots the latency ratio versus distance in time forseveral window sizes ranging from 1 to 15 In these plots weuse the windowsrsquo median latency ratio values for compari-son In particular in Figure 5a we use our entire dataset ieconsider all hop-client pairs independently of if they ever in-cur latency valleys or not and plot latency ratio differencesover time For example assume that for a client-hop pairHRM = 80 ms at one measurement point while it rises to120 ms at another measurement point Meanwhile assumethatCRM remains 100 ms at both measurement points Thusthe latency ratioHRMCRM changes from 80100 to 120100ie from 08 to 12 The latency ratio difference is 04 Fig-ure 5a shows that the latency ratio difference values bothincrease and vary wildly as windows become more distantThis shows that hop-subnet performance overall is likelyextremely unpredictable We hypothesize this results fromunmapped subnets receiving generic answers as observedin [47] However subnet assimilation requires that we havesome idea of how a subnet will perform in advance

Since valley-prone subnets are of particular interest to usin Figure 5b we reduce our dataset to only include client-hoppairs that have at least one valley occurrence across all 45 ofits trials combined The plotrsquos behavior becomes dramaticallymore stable after applying this simple minor constraint Theeffects of the window size also become apparent in Figure5b Even with a window of size 1 the slope is very small andthe line is nearly flat after over 50 hours two windows are

only 10 to 20 more dissimilar than two windows a singlehour apart For window sizes greater than 5 latency ratiosare often within 5 of each other regardless of their distancein time Going from a window size of 1 to 5 shows drasticimprovement while each following increase in window sizeshows a smaller impact The results clearly show that a clientcan effectively identify valley-prone subnets and predictvalleys with a sufficiently long window size

323 Valley Utility Analysis Figure 6 shows the distribu-tion of the lower bound latency ratios seen by the set of allvalley occurrences for each provider The latency ratio rep-resents a lower bound because we use the minimum latencymeasure for replicas recommended to the client For exampleif the minimum CRM is 100 ms and HRM is 80 ms then thelatency ratio is 08 Thus the closer to 0 the latency ratiosare the larger the gain from subnet assimilation The red lineshows median the box bounds the 25th and 75th percentilesand the whiskers extend up to data points 15 times theinterquartile-range (75th percentile - 25th percentile) abovethe 75th percentile and below the 25th percentile Data pointsbeyond the whiskers if they exist are considered outliersand not shownFigure 6 shows that most of the providers show oppor-

tunities for significant latency reduction From this plot itappears that Amazon CloudFront and ChinaNetCenter offerthe greatest potential for gains With the exception of CD-Networks we see all of our providers have 25th percentilesnear or below a latency ratio of 08 a 20 performance gainWe also observe the diversity of valley ldquodepth For exam-ple we see in Figure 6 that ChinaNetCenterrsquos interquartilerange spans over 40 of the possible value space (between 0and 1) With such a wide variety of gains for 50 of its val-leys it opens the door to being more selective Rather thansimply chasing all valleys we could set strict requirements

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

Google

CloudFront

Alibaba

CDNetworks

ChinaNetCtr

CubeCDN

0 2

0 4

0 6

0 8

1 0

late

ncy

rati

o

Figure 6 Lower bound of latency ratio of all valley oc-currences

tightening our definition of whatrsquos ldquogood enough for subnetassimilation as we evaluate in Section 51

CDNetworksrsquo performance is more tightly bounded as itsinterquartile range covers less than 10 of the value spaceand sits very near to 1 This implies CDNetworks probablyoffers very little opportunity for our technique to improve itsreplica selection We hypothesize that this is a byproduct ofCDNetworks anycast implementation for replica selectionanycast depends more on routing and network propertiesthan DNS and IP selection [27] In addition Googlersquos medianand 75th percentile latency ratios sit very near 10 Howeverby being more selective as described above we may be ableto pursue better valleys in the lower quartiles below themedian We could potentially improve our systemrsquos valley-prone hop selection by filtering out ldquoshallow valleys fromour consideration We demonstrate and discuss the effectsof different levels of selectiveness in Section 51

4 DRONGO SYSTEM OVERVIEWWe introduce Drongo a client-side system that employs sub-net assimilation to speed up CDNs Drongo sits on top of aclientrsquos DNS system In a full deployment Drongo will beinstalled as a LDNS proxy on the client machine where itwould have easy access to the ECS option and DNS responsesit needs to perform trials Drongo is set by the client as itsdefault local DNS resolver and acts as a middle party re-shaping outgoing DNS messages via subnet assimilation andstoring data from incoming DNS responses In our currentimplementation Drongo uses Googlersquos public DNS serviceat 8888 to complete DNS queriesUpon reception of an outgoing query from the client

Drongo must decide whether to use the clientrsquos own subnetor to perform subnet assimilation with some known alterna-tive subnet If Drongo has sufficient data for that subnet incombination with that domain it makes a decision whetheror not to use that subnet for the name resolution If Drongolacks sufficient data it issues an ordinary query (using theclientrsquos own subnet for ECS)

41 Window SizeWe now face the question What is a sufficient amount ofdata needed by Drongo to ensure quality subnet assimila-tion decisions We choose to measure the data ldquoquantityby the number of trials for some subnet where the subnetis obtained via traceroutes performed during times whenthe clientrsquos network was idle A sufficient quantity must beenough trials to fill some predetermined window size As weobserved in Section 321 the marginal benefit of increasingthe window size decreases with each additional trial To keepstorage needs low while also obtaining most of the benefitof a larger window we set Drongorsquos window size at 5

42 Data CollectionDrongo must execute trials defined in Section 312 in orderto fill its window and collect sufficient data As demonstratedin Figure 5b when these trials occur is of little to no signifi-cance In our experiments we perform our trials at randomlysampled intervals our trial spacing varies from minutes todays with a tendency toward being near an hour apart Thissporadic spacing parallels the variety of timings we expect tohappen on a real client the client may be online at randomunpredictable times for unpredictable lengths of time

43 Decision MakingHere we detail Drongorsquos logic assuming it has sufficient data(a full window) with which to make a decision about a partic-ular subnet For some domain Drongo must decide whethera subnet is sufficiently valley-prone for subnet assimilationIf so Drongo will use that subnet for the DNS query if notDrongo will use the clientrsquos own subnet From Figure 5b weknow we can anticipate that future behavior will resemblewhat Drongo sees in its current window if Drongo has seenat least one valley occurrence for the domain of interest fromthe subnet under consideration However in Figure 6 wesee that many valleys offer negligible performance gainswhich might not outweigh the performance risk imposedby subnet assimilation To avoid these potentially high riskhops Drongo may benefit from a more selective system thatrequires a high valley frequency in the training window toallow subnet assimilation We explore the effects of changingthe vf parameter in Section 5

It is possible that for a single domain multiple hop subnetsmay qualify as sufficiently valley-prone for subnet assimi-lation When this occurs Drongo must attempt to choosethe best performing from the set of the qualified hops Todo this Drongo selects the hop subnet with the highest val-ley frequency in its training window in the event of a tieDrongo chooses randomly

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

02 04 06 08 10valley threshold

095

100

105

110

115

120

125

late

ncy

ratio

vfge02 vfge04 vfge06 vfge08 vfge10

Figure 7 Average latency ratio of overall system as wevary vf and vt

5 DRONGO EVALUATIONHere we evaluate Drongorsquos ability to intelligently choosewell-mapped hop subnets for subnet assimilation and theperformance gains imposed on clients where subnet assimi-lation is appliedUsing the RIPE Atlas platform [17] we collect a trace of

429 probes spread across 177 countries for 1 month In ourexperiments we use the first portion of a clientrsquos trials as atraining window Following the completion of the trainingwindow we use the remaining trials to test Drongorsquos abilityto select good hops for real time subnet assimilation Eachclient performed 10 trials per provider trials 0 through 4to form its training window and trials 5 through 9 to testthe quality of Drongorsquos decisions In our evaluation Drongoalways selects the first CR from a CR-set and the first HRfrom a HR-set mirroring real world client behavior mdash noreal-time on-the-fly measurements are conducted and alldecisions are based on the existing window

51 Parameter SelectionFigure 7 shows the effects of Drongo on our entire RIPEtracersquos average latency ratio ie we consider all requestsgenerated by clients including those that arenrsquot affected byDrongo The figure plots average latency ratio (shown onthe y axis) as a function of the valley threshold vt shownon the x axis The valley threshold introduced in Section 23determines maximum latency ratio a hop-client pair musthave to classify as a valley occurrence For example whenthe threshold equals 10 all latency valleys are consideredon the other hand when vt is 06 Drongo triggers clientassimilation only for latency valleys that have promise toimprove performance by more than 40 ie latency ratiosmaller than 06 The figure shows 5 curves each represent-ing a different valley frequency parameter varied between02 and 10

Figure 7 shows the average results across all tested clientsWe make several insights If the valley frequency is small

02 04 06 08 10valley threshold

05

10

15

20

25

30

35

late

ncy

ratio

vfge02 vfge04 vfge06 vfge08 vfge10

Figure 8 Average latency ratio of cases where subnetassimilation was performed

eg 02 Drongo will unselectively trigger subnet assimila-tion for all valleys that have frequency equal to or largerthan 02 which requires only one valley occurrence for ourwindow size of 5 Conversely as the minimum vf parame-ter increases overall system performance improves mdash thelatency ratio drops below 10 Thus the valley frequency isa strong indicator of valley-proneness further supportingour prior findings from Figure 5b

Meanwhile two more characteristics stand out as we varyvt First the valley frequency parameter can completely al-ter the slope of the latency ratio plotted against vt Thisbehavior echoes the observation we made in the previousparagraph with extremely loose requirements Drongo doesnot sufficiently filter out poor-performing subnets from itsset of candidates Intersetingly with a strictvf (closer to 10)the slope changes and the average latency ratio decreasesas we raise the valley threshold This is because if Drongois too strict it filters out too many potential candidates forsubnet assimilation Second we see that the curve eventuallybends upward with high vt values indicating that even thevt can be too lenient The results show that the minimumaverage latency ratio of 09482 (y-axis) is achieved for thevalley threshold of 095 (x-axis) Thus with vf == 10 andvt == 095 Drongo produces its maximum performancegains averaging 518 (10 - 09482) across all of our testedclients By balancing these parameters Drongo filters outsubnets that offer the least benefit subnets where valleysseldom occur and subnets where valleys tend to be shallow

Figure 8 plots the average latency ratio in the samemanneras Figure 7 yet only for queries where Drongo applied sub-net assimilation First the figure again confirms that lowervalley frequency parameters degrade Drongorsquos performanceSecond paired with our knowledge of Figure 7 it becomesclear that as valley threshold decreases the number of la-tency valleys shrinks while the quality of the remainingvalleys becomes more potent This is why the latency ratiodecreases as vt decreases However if the threshold is too

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

02 04 06 08 10valley threshold

00

02

04

06

08

fract

ion

clie

nts

affe

cted

vfge02 vfge04 vfge06 vfge08 vfge10

Figure 9 Percentage of clients where subnet assimila-tion was performed

02 04 06 08 10valley threshold

085

090

095

100

late

ncy

ratio

Google(08)CloudFront(08)

Alibaba(04)CDNetworks(10)

ChinaNetCtr(06)CubeCDN(10)

Figure 10 Per-provider system performance for allqueries Optimal vf is set for each provider and notedin parentheses

small the number of available valleys becomes so small thatthe performance becomes unpredictable ie outlier behaviorcan dominate causing the spike in average latency ratios forvalley thresholds under 02

Finally Figure 9 plots the percent of clients for whichDrongo performed subnet assimilation for at least one providerNecessarily the less strict the frequency constraint is (vf ge

02 is the least strict constraint) the more frequently Drongoacts As shown in Figure 9 6993 of our clients were affectedby Drongo with vf and vt set at the the peak aggregate per-formance values (10 and 095 respectively) found above

Summary In this section we selected parametersvf = 1and vt = 095 where Drongo reaches its peak aggregateperformance gains of 518

52 Per-Provider PerformanceIn the previous subsection we used a constant parameter setacross all six providers in our analysis In this section weanalyze Drongo on a per-provider basis By choosing an op-timal parameter for each provider as we do below Drongorsquosaggregate performance gains increases to 585 In Figure 10

Google

(0807)CloudFront

(0808) Alibaba

(0409)

CDNetworks

(1009)ChinaNetCtr

(06085)CubeCDN

(10095)

02040608101214

late

ncy

ratio

Figure 11 Per-provider system performance forqueries where subnet assimilation was applied Paren-theses contain optimal values for each respectiveprovider formatted (vf vt )

In Figure 10 we show how Drongo impacts the average performance of each provider, while setting the valley frequency to each individual provider's respective optimal value (noted in parentheses under each provider name). Note that for most providers the individual optimal valley frequency lies near the value obtained in Section 5.1 (1.0). While the intricacies of each provider's behavior clearly hinge on opaque characteristics of their respective networks, we offer several explanations for the observed performance. CDNetworks, which sees small gains across all valley threshold values, further validates the hypothesis proposed in Section 3.2.3 regarding anycast networks. Meanwhile, Google, the largest provider of our set on a global scale, also experienced the second highest peak average gains from Drongo. If a client is not well served by Google, it is likely that there exist nearby subnets being directed to different replicas. In other words, Drongo has great opportunity for improving CDNs that use fine-grained subnet mapping.

Finally, Figure 11 shows Drongo's per-provider performance exclusively in scenarios where it applies subnet assimilation. Comparing to the results shown in Figure 6 for the PlanetLab experiments, we observe significant differences. Most notably, the latency ratios are much smaller in Figure 11 than in Figure 6. For example, Google's median latency ratio is close to 1.0 in Figure 6, indicating insignificant gains; in Figure 11, by contrast, Google's median latency ratio is around 0.5, implying gains of 50% (1.0 − 0.5) and up to an order of magnitude in edge cases. Considering all providers, Drongo-influenced replica selections perform on average 24.89% better than Drongo-free selections.
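For concreteness, the gain implied by a latency ratio is simply its complement; restating the Google figure quoted above:

\[
\text{gain} \;=\; 1 - \frac{\mathit{HRM}}{\mathit{CRM}},
\qquad \text{e.g., } 1 - 0.5 = 0.5 \;\; (50\% \text{ for Google's median in Figure 11}).
\]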

There are three reasons for this behavior. First, Figure 6 shows the lower-bound system performance for PlanetLab, as the best CR is always used for comparison; such a restriction is not imposed in the RIPE Atlas scenario. Second, the RIPE set is more diverse and includes widely distributed endpoints, thus allowing CDNs greater opportunity for subnet mapping error.


Third, contrary to the PlanetLab scenario, where we used all the valleys under the valley threshold of 1.0, here we use optimal vt values, such that Drongo successfully filters out most "shallow" valleys that would dilute performance gains.

6 RELATED WORK
Given a large number of CDNs with vastly different deployment and performance in different territories, CDN brokering has emerged as a way to improve user-perceived CDN performance, e.g., [7, 34]. By monitoring the performance of multiple CDNs, brokers are capable of determining which particular CDN works better for a particular client at a point in time. Necessarily, this approach requires content providers to contract with multiple CDNs to host and distribute their content. Contrary to CDN brokering, Drongo manages to improve the performance of each individual CDN. Still, it works completely independently from CDNs, and it is readily deployable on the clients. Moreover, Drongo is completely compatible with CDN brokering, since it in principle has the capacity to improve the performance of whichever CDN is selected by a broker.

There has been tremendous effort expended in the ongoing battle to make web pages load faster, improve streaming performance, etc. As a result, many advanced client-based protocols and systems have emerged from both industry (e.g., QUIC, SPDY, and HTTP/2) and academia [24, 41, 50] in the Web domain, and likewise in streaming, e.g., [37]. While Drongo is also a client-based system, it is generic and helps speed up all CDN-hosted content. By reducing the CDN latency experienced by end users, it systematically improves all the above systems and protocols, which in turn helps all associated applications.

Recent work has demonstrated that injecting fake information can help protocols achieve better performance. One example is routing [49], where fake nodes and links are introduced into an underlying link-state routing protocol so that routers compute their own forwarding tables based on the augmented topology. Similar ideas are shown useful in the context of traffic engineering [29], where robust and efficient network utilization is accomplished via carefully disseminated bogus information ("lies"). While similar in spirit to these approaches, Drongo operates in a completely different domain, aiming to address CDN pitfalls. In addition to using "fake" (source subnet) information in order to improve CDN decisions, Drongo also invests effort in discovering valley-prone subnets and determining the conditions under which using them is beneficial.

7 DISCUSSION
A mass deployment of Drongo is non-trivial and carries with it some complexities that deserve careful consideration.

There is some concern that the maintenance of CDN allocation policies may still be compromised, despite our efforts to respect their mechanisms in Drongo's design. We propose that a broadly deployed form of Drongo could be carefully tuned to act only in the most extreme cases. Figure 11 shows that subnet assimilation carries some risk of a loss in performance, so it is in clients' best interests to apply it conservatively. First, we know from Figure 9 that the number of clients using assimilated subnets can be significantly throttled with strict parameters. Further, in today's Internet many clients are already free to choose arbitrary LDNS servers [25, 36]. Many of these servers do not make use of the client subnet option at all, thus rendering the nature of their side effects (potentially disrupting policies enforced only in DNS) equivalent to that of Drongo. It is difficult to say whether Drongo's ultimate impact on CDN policies would be any more significant than the presence of such clients.

Mass adoption also raises scalability concerns, particularly regarding the amount of measurement traffic being sent into the network. To keep the number of measurements small while ensuring their freshness, a distributed peer-to-peer component, where clients in the same subnet share trial data, could be incorporated into Drongo's design. We leave this modification for future work.
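Purely as an illustration of this future-work idea (the record layout and sharing mechanism below are our assumptions, not part of the evaluated system), clients within the same /24 could share a single trial window per provider domain:

    import ipaddress
    import time
    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class SharedWindow:
        hop_subnet: str                    # /24 proposed for subnet assimilation
        trials: List[Tuple[float, float]]  # (HRM_ms, CRM_ms) pairs from past trials
        updated: float = field(default_factory=time.time)

    def share_key(client_ip: str, provider_domain: str) -> Tuple[str, str]:
        # Clients in the same /24 map to the same key, so one fresh window
        # of measurements can serve the whole subnet.
        net = ipaddress.ip_network(f"{client_ip}/24", strict=False)
        return (str(net), provider_domain)

    # A peer-to-peer store would replicate {share_key: SharedWindow} entries
    # among nearby clients, amortizing measurement traffic across the subnet.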

8 CONCLUSIONS
In this paper we proposed the first approach that enables clients to actively measure CDNs and effectively improve their replica selection decisions, without requiring any changes to the CDNs and while respecting CDNs' policies. We have introduced and explored latency valleys, scenarios where replicas suggested to upstream subnets outperform those provided to a client's own subnet. We have asserted that latency valleys are common phenomena, and we have found them across all the CDNs we have tested, in 26%-76% of routes. We showed that valley-prone upstream subnets are easily found from the client, are simple to identify with few and infrequent measurements, and, once found, are persistent over timescales of days.

We have introduced Drongo, a client-side system that leverages the performance gains of latency valleys by identifying valley-prone subnets. Our measurements show that Drongo can improve requests' latency by up to an order of magnitude. Moreover, Drongo achieves this with exceptionally low overhead: a mere 5 measurements suffice for timescales of days. Using experimentally derived parameters, Drongo affects the replica selection of requests made by 69.93% of clients, improving their latency by 24.89% in the median case. Drongo's significant impact on these requests translates into an overall improvement in client-perceived aggregate CDN performance.


REFERENCES
[1] 2016. Akamai. http://www.akamai.com
[2] 2016. Alexa Top Sites. https://aws.amazon.com/alexa-top-sites
[3] 2016. Alibaba Cloud CDN. https://intl.aliyun.com/product/cdn
[4] 2016. Amazon CloudFront - Content Delivery Network (CDN). https://aws.amazon.com/cloudfront
[5] 2016. Amazon Route 53. https://aws.amazon.com/route53
[6] 2016. AWS Global Infrastructure. https://aws.amazon.com/about-aws/global-infrastructure
[7] 2016. Cedexis. http://www.cedexis.com
[8] 2016. Conviva. http://www.conviva.com
[9] 2016. The Cost of Latency. http://perspectives.mvdirona.com/2009/10/the-cost-of-latency
[10] 2016. GoDaddy Premium DNS. https://www.godaddy.com/domains/dns-hosting.aspx
[11] 2016. Google CDN Platform. https://cloud.google.com/cdn/docs
[12] 2016. Google Cloud Platform: Cloud DNS. https://cloud.google.com/dns
[13] 2016. Google Public DNS. https://developers.google.com/speed/public-dns/docs/using?hl=en
[14] 2016. Netdirekt Content Hosting - CDN. http://www.netdirekt.com.tr/cdn-large.html
[15] 2016. Neustar DNS Services. https://www.neustar.biz/services/dns-services
[16] 2016. OpenDNS. https://www.opendns.com
[17] 2016. RIPE Atlas - RIPE Network Coordination Centre. https://atlas.ripe.net
[18] 2016. Shopzilla: faster page load time = 12 percent revenue increase. http://www.strangeloopnetworks.com/resources/infographics/web-performance-and-ecommerce/shopzilla-faster-pages-12-revenue-increase
[19] 2016. Sweden's Greta wants to disrupt the multi-billion CDN market. https://techcrunch.com/2016/08/30/greta
[20] 2016. Verisign Managed DNS. http://www.verisign.com/en_US/security-services/dns-management/index.xhtml
[21] 2016. Verizon Digital Media Services. https://www.verizondigitalmedia.com
[22] 2016. Verizon ROUTE: Fast, Reliable Enterprise-Class Services for Domain Name System (DNS). https://www.verizondigitalmedia.com/platform/route
[23] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. 2010. Comparing DNS Resolvers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 15-21. https://doi.org/10.1145/1879141.1879144
[24] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. Klotski: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 439-453. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/butkiewicz
[25] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. 2013. Mapping the Expansion of Google's Serving Infrastructure. In Proceedings of the 2013 Conference on Internet Measurement Conference (IMC '13). ACM, 313-326. https://doi.org/10.1145/2504730.2504754
[26] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531-537. https://doi.org/10.1145/2815675.2815717
[27] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531-537. https://doi.org/10.1145/2815675.2815717
[28] F. Chen, R. Sitaraman, and M. Torres. 2015. End-User Mapping: Next Generation Request Routing for Content Delivery. In Proceedings of ACM SIGCOMM '15. London, UK.
[29] Marco Chiesa, Gábor Rétvári, and Michael Schapira. 2016. Lying Your Way to Better Traffic Engineering. In Proceedings of the 2016 ACM CoNEXT (CoNEXT '16). ACM.
[30] ChinaNetCenter. 2016. ChinaNetCenter - Network. http://en.chinanetcenter.com/pages/technology/g2-network-map.php
[31] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. 2003. PlanetLab: An Overlay Testbed for Broad-coverage Services. SIGCOMM Comput. Commun. Rev. 33, 3 (July 2003), 3-12. https://doi.org/10.1145/956993.956995
[32] C. Contavalli, W. van der Gaast, D. Lawrence, and W. Kumari. 2016. Client Subnet in DNS Queries. RFC 7871. RFC Editor.
[33] Tobias Flach, Ethan Katz-Bassett, and Ramesh Govindan. 2012. Quantifying Violations of Destination-based Forwarding on the Internet. In Proceedings of the 2012 ACM Conference on Internet Measurement Conference (IMC '12). ACM, 265-272. https://doi.org/10.1145/2398776.2398804
[34] Aditya Ganjam, Faisal Siddiqui, Jibin Zhan, Xi Liu, Ion Stoica, Junchen Jiang, Vyas Sekar, and Hui Zhang. 2015. C3: Internet-Scale Control Plane for Video Quality Optimization. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 131-144. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/ganjam
[35] Cheng Huang, Ivan Batanov, and Jin Li. 2012. A Practical Solution to the Client-LDNS Mismatch Problem. SIGCOMM Comput. Commun. Rev. 42, 2 (March 2012), 35-41. https://doi.org/10.1145/2185376.2185381
[36] Cheng Huang, Angela Wang, Jin Li, and Keith W. Ross. 2008. Measuring and Evaluating Large-scale CDNs (Paper Withdrawn at Microsoft's Request). In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 15-29. http://dl.acm.org/citation.cfm?id=1452520.1455517
[37] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. 2014. A Buffer-based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM '14). ACM, 187-198. https://doi.org/10.1145/2619239.2626296
[38] Rupa Krishnan, Harsha V. Madhyastha, Sushant Jain, Sridhar Srinivasan, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. 2009. Moving Beyond End-to-End Path Information to Optimize CDN Performance. In Internet Measurement Conference (IMC). Chicago, IL, 190-201. http://research.google.com/archive/imc191/imc191.pdf
[39] Zhuoqing Morley Mao, Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang. 2002. A Precise and Efficient Evaluation of the Proximity Between Web Clients and Their Local DNS Servers. In Proceedings of the General Track of the Annual Conference on USENIX Annual Technical Conference (ATEC '02). USENIX Association, Berkeley, CA, USA, 229-242. http://dl.acm.org/citation.cfm?id=647057.759950
[40] Matthew K. Mukerjee, Ilker Nadi Bozkurt, Bruce Maggs, Srinivasan Seshan, and Hui Zhang. 2016. The Impact of Brokers on the Future of Content Delivery. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (HotNets '16). ACM, 127-133. https://doi.org/10.1145/3005745.3005749
[41] Ravi Netravali, Ameesh Goyal, James Mickens, and Hari Balakrishnan. 2016. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/netravali
[42] OpenDNS. 2016. A Faster Internet: The Global Internet Speedup. http://afasterinternet.com
[43] Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose. 1998. Modeling TCP Throughput: A Simple Model and Its Empirical Validation. In Proceedings of the ACM SIGCOMM '98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '98). ACM, 303-314. https://doi.org/10.1145/285237.285291
[44] Vern Paxson. 1996. End-to-end Routing Behavior in the Internet. SIGCOMM Comput. Commun. Rev. 26, 4 (Aug. 1996), 25-38. https://doi.org/10.1145/248157.248160
[45] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, and Anja Feldmann. 2010. Improving Content Delivery Using Provider-aided Distance Information. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 22-34. https://doi.org/10.1145/1879141.1879145
[46] Florian Streibelt, Jan Böttger, Nikolaos Chatzis, Georgios Smaragdakis, and Anja Feldmann. 2013. Exploring EDNS-client-subnet Adopters in Your Free Time. In Proceedings of IMC '13.
[47] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. 2006. Drafting Behind Akamai (Travelocity-based Detouring). SIGCOMM Comput. Commun. Rev. 36, 4 (Aug. 2006), 435-446. https://doi.org/10.1145/1151659.1159962
[48] Ao-Jan Su and Aleksandar Kuzmanovic. 2008. Thinning Akamai. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 29-42. https://doi.org/10.1145/1452520.1452525
[49] Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. 2015. Central Control Over Distributed Routing. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM '15). ACM, 43-56. https://doi.org/10.1145/2785956.2787497
[50] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. 2016. Speeding up Web Page Loads with Shandian. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA, 109-122. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/wang


[27] Matt Calder Ashley Flavel Ethan Katz-Bassett Ratul Mahajan andJitendra Padhye 2015 Analyzing the Performance of an Anycast CDNIn Proceedings of the 2015 ACM Conference on Internet MeasurementConference (IMC rsquo15) ACM 531ndash537 DOIhttpsdoiorg10114528156752815717

[28] F Chen R Sitaraman and M Torres 2015 End-User Mapping NextGeneration Request Routing for Content Delivery In Proceedings ofACM SIGCOMM rsquo15 London UK

[29] Marco Chiesa Gaboe Retvari and Michael Schapira 2016 Lying YourWay to Better Traffic Engineering In Proceedings of the 2016 ACMCoNEXT (CoNEXT rsquo16) ACM

[30] ChinaNetCenter 2016 ChinaNetCenter - Network (2016) httpenchinanetcentercompagestechnologyg2-network-mapphp

[31] Brent Chun David Culler Timothy Roscoe Andy Bavier Larry Peter-son MikeWawrzoniak andMic Bowman 2003 PlanetLab An OverlayTestbed for Broad-coverage Services SIGCOMM Comput CommunRev 33 3 (July 2003) 3ndash12 DOIhttpsdoiorg101145956993956995

[32] C Contavalli W van der Gaast D Lawrence and W Kumari 2016Client Subnet in DNS Queries RFC 7871 RFC Editor

[33] Tobias Flach Ethan Katz-Bassett and Ramesh Govindan 2012 Quan-tifying Violations of Destination-based Forwarding on the Internet InProceedings of the 2012 ACM Conference on Internet Measurement Con-ference (IMC rsquo12) ACM 265ndash272 DOIhttpsdoiorg10114523987762398804

[34] Aditya Ganjam Faisal Siddiqui Jibin Zhan Xi Liu Ion Stoica JunchenJiang Vyas Sekar and Hui Zhang 2015 C3 Internet-Scale ControlPlane for Video Quality Optimization In 12th USENIX Symposium onNetworked Systems Design and Implementation (NSDI 15) USENIX As-sociation Oakland CA 131ndash144 httpswwwusenixorgconferencensdi15technical-sessionspresentationganjam

[35] ChengHuang Ivan Batanov and Jin Li 2012 A Practical Solution to theclient-LDNS Mismatch Problem SIGCOMM Comput Commun Rev 422 (March 2012) 35ndash41 DOIhttpsdoiorg10114521853762185381

[36] Cheng Huang Angela Wang Jin Li and Keith W Ross 2008 Measur-ing and Evaluating Large-scale CDNs Paper Withdrawn at MirosoftrsquosRequest In Proceedings of the 8th ACM SIGCOMM Conference on In-ternet Measurement (IMC rsquo08) ACM 15ndash29 httpdlacmorgcitationcfmid=14525201455517

[37] Te-Yuan Huang Ramesh Johari Nick McKeown Matthew Trunnelland Mark Watson 2014 A Buffer-based Approach to Rate AdaptationEvidence from a Large Video Streaming Service In Proceedings of the2014 ACM Conference on SIGCOMM (SIGCOMM rsquo14) ACM 187ndash198DOIhttpsdoiorg10114526192392626296

[38] Rupa Krishnan Harsha V Madhyastha Sushant Jain Sridhar Srini-vasan Arvind Krishnamurthy Thomas Anderson and Jie Gao 2009Moving Beyond End-to-End Path Information to Optimize CDN Perfor-mance In Internet Measurement Conference (IMC) Chicago IL 190ndash201httpresearchgooglecomarchiveimc191imc191pdf

[39] Zhuoqing Morley Mao Charles D Cranor Fred Douglis Michael Rabi-novich Oliver Spatscheck and Jia Wang 2002 A Precise and EfficientEvaluation of the Proximity BetweenWeb Clients and Their Local DNSServers In Proceedings of the General Track of the Annual Conference onUSENIX Annual Technical Conference (ATEC rsquo02) USENIX AssociationBerkeley CA USA 229ndash242 httpdlacmorgcitationcfmid=647057759950

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

[40] Matthew K Mukerjee Ilker Nadi Bozkurt Bruce Maggs SrinivasanSeshan and Hui Zhang 2016 The Impact of Brokers on the Futureof Content Delivery In Proceedings of the 15th ACM Workshop on HotTopics in Networks (HotNets rsquo16) ACM 127ndash133 DOIhttpsdoiorg10114530057453005749

[41] Ravi Netravali Ameesh Goyal James Mickens and Hari Balakrish-nan 2016 Polaris Faster Page Loads Using Fine-grained Depen-dency Tracking In 13th USENIX Symposium on Networked SystemsDesign and Implementation (NSDI 16) USENIXAssociation Santa ClaraCA httpswwwusenixorgconferencensdi16technical-sessionspresentationnetravali

[42] OpenDNS 2016 A Faster Internet The Global Internet Speedup (2016)httpafasterinternetcom

[43] Jitendra Padhye Victor Firoiu Don Towsley and Jim Kurose 1998Modeling TCP Throughput A Simple Model and Its Empirical Valida-tion In Proceedings of the ACM SIGCOMM rsquo98 Conference on Applica-tions Technologies Architectures and Protocols for Computer Commu-nication (SIGCOMM rsquo98) ACM 303ndash314 DOIhttpsdoiorg101145285237285291

[44] Vern Paxson 1996 End-to-end Routing Behavior in the InternetSIGCOMM Comput Commun Rev 26 4 (Aug 1996) 25ndash38 DOIhttpsdoiorg101145248157248160

[45] Ingmar Poese Benjamin Frank Bernhard Ager Georgios Smaragdakisand Anja Feldmann 2010 Improving Content Delivery Using Provider-aided Distance Information In Proceedings of the 10th ACM SIGCOMMConference on Internet Measurement (IMC rsquo10) ACM 22ndash34 DOIhttpsdoiorg10114518791411879145

[46] Florian Streibelt Jan Boumlttger Nikolaos Chatzis Georgios Smaragdakisand Anja Feldmann 2013 Exploring EDNS-client-subnet Adopters inYour Free Time In Proceedings of IMC rsquo13 (IMC rsquo13)

[47] Ao-Jan Su David R Choffnes Aleksandar Kuzmanovic and Fabiaacuten EBustamante 2006 Drafting Behind Akamai (Travelocity-based De-touring) SIGCOMM Comput Commun Rev 36 4 (Aug 2006) 435ndash446DOIhttpsdoiorg10114511516591159962

[48] Ao-Jan Su andAleksandar Kuzmanovic 2008 ThinningAkamai In Pro-ceedings of the 8th ACM SIGCOMM Conference on Internet Measurement(IMC rsquo08) ACM 29ndash42 DOIhttpsdoiorg10114514525201452525

[49] Stefano Vissicchio Olivier Tilmans Laurent Vanbever and JenniferRexford 2015 Central Control Over Distributed Routing In Proceed-ings of the 2015 ACM Conference on Special Interest Group on DataCommunication (SIGCOMM rsquo15) ACM 43ndash56 DOIhttpsdoiorg10114527859562787497

[50] Xiao Sophia Wang Arvind Krishnamurthy and David Wetherall 2016Speeding up Web Page Loads with Shandian In 13th USENIX Sym-posium on Networked Systems Design and Implementation (NSDI 16)USENIX Association Santa Clara CA 109ndash122 httpswwwusenixorgconferencensdi16technical-sessionspresentationwang

  • Abstract
  • 1 Introduction
  • 2 Premise
    • 21 Background
    • 22 Respecting CDN Policies and Incentives for Adoption
    • 23 Subnet Assimilation
    • 24 Identifying Latency Valleys
      • 3 Exploring Valleys
        • 31 Testing for Valleys
        • 32 Valley Prevalence
          • 4 Drongo System Overview
            • 41 Window Size
            • 42 Data Collection
            • 43 Decision Making
              • 5 Drongo Evaluation
                • 51 Parameter Selection
                • 52 Per-Provider Performance
                  • 6 Related Work
                  • 7 Discussion
                  • 8 Conclusions
                  • References
Page 5: Drongo: Speeding Up CDNs with Subnet Assimilation from the Clientnetworks.cs.northwestern.edu/website/publications/drongo/... · 2018-12-22 · CDNs attempt to improve web and streaming

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

Hops within the same local network as the client will be suggested the same replicas as the client, and are therefore not usable. Another disposable category, hops with private IP addresses, will not be recognized by the EDNS server and will yield generic replica choices regardless of the hops' locations. To ensure that a hop is indeed usable, a hop must (i) belong to a different /16 subnet than the client, (ii) have a different domain than the client, and (iii) belong to a different ASN than the client. We only filter hops that fail these conditions at the beginning of the route; once a hop is observed that meets the above three constraints, we stop filtering for the remainder of the route.
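The filtering rule above reduces to a short predicate. The sketch below is a minimal illustration only; the hop and client objects, and the lookup helpers for a hop's domain and ASN, are our own assumptions and do not come from the paper.

    import ipaddress

    def same_slash16(ip_a: str, ip_b: str) -> bool:
        # Two IPv4 addresses share a /16 if they fall in the same /16 network.
        return ipaddress.ip_address(ip_b) in ipaddress.ip_network(f"{ip_a}/16", strict=False)

    def is_usable_hop(hop, client, lookup_domain, lookup_asn) -> bool:
        """A hop is usable only if it differs from the client in /16 subnet,
        DNS domain, and ASN (Section 3.1)."""
        ip = ipaddress.ip_address(hop.ip)
        if ip.is_private:                      # private IPs yield generic ECS answers
            return False
        if same_slash16(hop.ip, client.ip):    # same /16 as the client
            return False
        if lookup_domain(hop.ip) == lookup_domain(client.ip):
            return False
        if lookup_asn(hop.ip) == lookup_asn(client.ip):
            return False
        return True

    def usable_hops(route, client, lookup_domain, lookup_asn):
        """Filter only the leading non-usable hops; once one usable hop is seen,
        keep every remaining hop on the route."""
        hops = list(route)
        for i, hop in enumerate(hops):
            if is_usable_hop(hop, client, lookup_domain, lookup_asn):
                return hops[i:]
        return []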

3.1.1 Provider Selection. While ECS is gaining momentum among CDNs, as it allows them to better estimate their clients' location, it is important to note that ECS was only recently deployed and its implementation varies among the different providers. "A Faster Internet," an initiative headed by Google and OpenDNS, is drawing together a growing number of large CDNs and content providers to adopt and promote the adoption of ECS [42]. In order to attain a set of CDNs for our tests, we have selected those providers which implement an unrestricted form of ECS, as described in Section 2.2.

In order to best select the CDNs for our tests, we scraped URLs from over 3,000 sites arbitrarily selected from the Alexa Top 1M list [2]. We then reduced our set of URLs to those that ended in a known file type, in order to ensure our ability to perform download tests on the selected URLs. When applicable, we resolved CNAME domains to their respective CDN domains. We then performed multiple DNS queries for each URL to determine whether unrestricted ECS is implemented. From the remaining URLs, we extracted the 6 CDN domains which appeared most frequently, along with their respective URLs.
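One way such an ECS check could be performed is sketched below using dnspython's EDNS Client Subnet option; the resolver address, the test prefixes, and the detection heuristic are illustrative assumptions on our part, not the paper's procedure.

    import dns.edns
    import dns.message
    import dns.query
    import dns.rdatatype

    def resolve_with_subnet(qname: str, subnet: str, resolver: str = "8.8.8.8"):
        """Issue an A query carrying an ECS option for the given /24 subnet."""
        ecs = dns.edns.ECSOption(subnet, srclen=24)
        query = dns.message.make_query(qname, "A", use_edns=0, options=[ecs])
        response = dns.query.udp(query, resolver, timeout=5)
        answers = sorted(r.address for rrset in response.answer
                         for r in rrset if r.rdtype == dns.rdatatype.A)
        scopes = [opt.scopelen for opt in response.options
                  if isinstance(opt, dns.edns.ECSOption)]
        return answers, scopes

    def appears_to_support_ecs(qname: str) -> bool:
        # Query with two different (documentation-range) subnets; a non-zero
        # echoed scope or subnet-dependent answers suggest ECS is being used.
        a1, s1 = resolve_with_subnet(qname, "198.51.100.0")
        a2, s2 = resolve_with_subnet(qname, "203.0.113.0")
        return any(s > 0 for s in s1 + s2) or a1 != a2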

Before we can begin seeking subnets with better mapping-groups, it is first necessary to prove that subnets with different mapping-groups exist and are easy to find. We define usable route length as the number of hops along the path that fulfill the above filtering. Next, we define divergence as the fraction of usable hops along the path which were recommended at least one replica that was not recommended to the client.

Figure 2 depicts usable route length and divergence for different CDNs. The figure shows that, for example, when requesting replicas from Google, each client had an average usable route length of 7.8, and the divergence is approximately 92%. It is evident from Figure 2 that hops are indeed suggested different replicas than their client. As we encounter a wider variety of recommendations, we increase our number of opportunities to find latency valleys.

Figure 2. Average divergence and average usable route length per CDN.

The high divergence shown in Figure 2 indicates that other, and possibly lower-latency, replicas can be mapped to clients if they were to assimilate their hops' IP addresses.
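The two route metrics defined above amount to only a few lines of code. The sketch below is illustrative: it assumes the route has already been reduced to usable hops, and that each hop record carries the replica set returned for its subnet (hr_set) alongside the client's own replica set (cr_set); these field names are ours, not the paper's.

    def usable_route_length(route) -> int:
        # Number of hops that survive the filtering of Section 3.1.
        return len(route)

    def divergence(route, cr_set) -> float:
        """Fraction of usable hops recommended at least one replica
        that was not recommended to the client itself."""
        if not route:
            return 0.0
        client_replicas = set(cr_set)
        diverging = sum(1 for hop in route if set(hop.hr_set) - client_replicas)
        return diverging / len(route)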

For the remainder of this paper, we use the following set of providers: Google, Amazon CloudFront, Alibaba, ChinaNetCenter, CDNetworks, and CubeCDN, a set which is both diverse and comprehensive. Google's CDN infrastructure is massive and dispersed; as of 2013, around 30,000 CDN IP addresses spread across over 700 ASes were observable in one day [25]. As of this writing, Amazon CloudFront offered over 50 points-of-presence spread throughout the world [6]. CDNetworks offers over 200 points-of-presence, also globally dispersed, and heavily employs anycast for server selection. Alibaba and ChinaNetCenter offer over 500 CDN node locations each within China, where they are centered, in addition to a growing number of service locations outside of China [3, 30]. Finally, CubeCDN is a smaller CDN with locations spread primarily across Turkey [14].

3.1.2 Test Execution. In order to assert the existence of latency valleys in the wild, we have executed a series of trials using the aforementioned URLs and the PlanetLab nodes as clients. Each trial consists of the following steps (a code sketch follows the list):

(1) We randomly select a URL.
(2) The client retrieves the CR-set for the selected URL's domain.
(3) For each CR, the client uses traceroute to identify hops.
(4) Using subnet assimilation, the client retrieves the HR-set for each hop.
(5) The client measures CRMs for every CR in its CR-set, and HRMs for every HR it has seen (across all HR-sets obtained in that trial).
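A minimal sketch of one trial is given below; the helper names (resolve_replicas, traceroute, ping, filter_hops) and the data shapes are assumptions for illustration and do not come from the paper.

    import random

    def run_trial(urls, client, resolve_replicas, traceroute, ping, filter_hops):
        """One trial of Section 3.1.2: pick a URL, learn the client's replica set
        (CR-set), learn each usable hop's replica set (HR-set) via subnet
        assimilation, and measure latencies to all of them."""
        url = random.choice(urls)                                   # step 1
        domain = url.split("/")[2]

        cr_set = resolve_replicas(domain, subnet=client.subnet)     # step 2
        hops = set()
        for cr in cr_set:                                           # step 3
            hops.update(filter_hops(traceroute(cr), client))

        hr_sets = {hop: resolve_replicas(domain, subnet=hop.subnet) # step 4
                   for hop in hops}

        crms = {cr: ping(cr) for cr in cr_set}                      # step 5
        hrms = {hr: ping(hr)
                for hr_set in hr_sets.values() for hr in hr_set}
        return {"url": url, "cr_set": cr_set, "hr_sets": hr_sets,
                "crms": crms, "hrms": hrms}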


Figure 3. Scatter plot of HRMs and minimum CRMs. The area below the diagonal is the valley region.

We have performed a series of 45 trials per PlanetLab node, executed between one and two hours apart. For the remainder of this section, we refer to this dataset.

3.2 Valley Prevalence

In order to establish the prevalence and significance of latency valleys, we now compare HRMs to their respective CRMs. It is important to note that a client may have multiple CRs, and hence multiple CRMs, while in practice a client can only choose one replica. Therefore, we compare all HRMs to the minimum CRM obtained in their trial, thus establishing a lower bound for our expected performance gain and allowing us to easily assert any gains or losses provided by the alternative replica choices (the HRs).

With our PlanetLab dataset, we compared each HRM to the minimum CRM, i.e., the best client replica obtained in the same respective trial. Figure 3 shows the results of this assessment. We plot HRM on the y axis and minimum CRM on the x axis, thus creating a visual representation of the latency ratio. We distinguish between the CDNs using different colors. To better explain the results, we draw the equality line, where HRM = minimum CRM. Every data point along this line represents a hop's subnet which is being suggested a replica that performs on a par with the replicas suggested to the client's subnet. More importantly, every data point below the equality line represents a valley occurrence, as the alternative replica's HRM is smaller than the CRM. The percentage of valleys by provider ranges from 14.02% (CloudFront) to 38.58% (CubeCDN), with an average of 22% across all providers. Table 1 shows the results; Column 2 (% Valleys Overall) lists these percentages for each CDN.

Now that we have established that valleys exist, we wish to determine where we can expect to find valleys. We continue using the minimum CRM from each trial, for the reasons described above. However, in order to better reflect practical scenarios, where only one of a given set of replicas is used, we now also begin choosing an individual replica from an HR-set: instead of choosing the minimum HRM for a given hop, we conservatively choose the median, so each hop's chosen HRM is the median of its HR-set for that trial. We continue this pattern (choosing the client's best CRM from the trial and a hop's median HRM) for the remainder of our PlanetLab data analysis. Our PlanetLab data thus represents a lower bound on valleys and their performance implications; we compare the median performance of our proposed procedure to the absolute best the existing methods have to offer. In Section 5, using our RIPE Atlas testbed, we remove this constraint to demonstrate the real-world performance of our system.

We further wish to consider how common valleys are within a given route from the client to the provider. Column 3 (Average % of Valleys per Route) of Table 1 shows, given some route in one trial, the average percent of usable hops that incur valleys. We see for Alibaba that, on average, over a third of the usable hops in a given route are likely to incur valleys, and for CDNetworks nearly a fourth of the route. Meanwhile, for CloudFront, we see that on average valleys occur for only a small portion of hops in a given route. Column 4 (% Routes with a Valley) details the percentage of all observed routes that contained at least one valley in the trial in which they were observed. For Alibaba and CDNetworks, around 75% of the observed routes contain valleys, while more than 50% do for Google.

3.2.1 Do Valleys Persist? Next, we assess whether subnets are persistently valley-prone: for some number of trials, how often can the client expect a valley to occur from some particular hop? To answer this, we need to be able to describe how frequently valleys occurred for some hop subnet in a given set of trials.

Valley frequency. We define a new, simple metric, the valley frequency, as follows: if we look at a hop-client pair across a set of N trials, and v of those trials are valleys, the valley frequency is

vf = v / N
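In code, the valley check and the valley frequency amount to the sketch below. The trial layout mirrors the hypothetical run_trial output above and is an assumption for illustration; the valley check follows the PlanetLab convention of comparing a hop's median HRM to the client's minimum CRM, with the valley threshold vt (Sections 2.3 and 5) defaulting to 1.0.

    import statistics

    def is_valley(trial, hop, vt: float = 1.0) -> bool:
        """A trial is a valley for a hop if the hop's median HRM beats the
        client's best CRM; vt < 1.0 additionally requires the latency ratio
        HRM / min(CRM) to fall below the valley threshold."""
        min_crm = min(trial["crms"].values())
        median_hrm = statistics.median(trial["hrms"][hr]
                                       for hr in trial["hr_sets"][hop])
        return median_hrm / min_crm < vt

    def valley_frequency(trials, hop, vt: float = 1.0) -> float:
        # vf = v / N over the N trials in which this hop was observed.
        observed = [t for t in trials if hop in t["hr_sets"]]
        if not observed:
            return 0.0
        v = sum(1 for t in observed if is_valley(t, hop, vt))
        return v / len(observed)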


Table 1. Detailed findings for each provider, based on PlanetLab data.

Provider           | % Valleys Overall | Avg % Valleys per Route | % Routes with Valley | % Hop-Client Pairs w/ Valley Freq > 0.5
Google             | 20.24             | 16.41                   | 53.30                | 10.98
Amazon CloudFront  | 14.02             | 8.72                    | 25.82                | 10.00
Alibaba            | 33.68             | 35.94                   | 75.83                | 30.97
CDNetworks         | 15.61             | 24.41                   | 73.08                | 14.09
ChinaNetCenter     | 27.42             | 14.26                   | 38.10                | 16.74
CubeCDN            | 38.58             | 17.95                   | 25.49                | 26.32

Figure 4. CDFs of hop-client pairs by valley frequency: (a) ping, (b) total download time, (c) total post-caching download time. In (a), the subnet-response measurements are ping times averaged from bursts of 3 back-to-back pings. In (b), the subnet-response measurements are total download times on first attempts, while in (c) they are total download times on consecutive attempts (repeated downloads that take place immediately after the first attempt, to account for the potential impact of caching). Downloads were performed using curl, where we set an IP as the destination and set the domain as the HOST attribute. Measurements for (b) and (c) were performed back-to-back, so that (c) reflects download times with presumably primed caches.
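The download measurement described in the caption can be reproduced with a plain curl invocation wrapped in Python; the replica IP, hostname, and object path below are placeholders, not values from the paper.

    import subprocess

    def download_time(replica_ip: str, domain: str, path: str) -> float:
        """Fetch an object from a specific replica IP while presenting the
        CDN-hosted domain in the Host header; return curl's total time (s)."""
        result = subprocess.run(
            ["curl", "-s", "-o", "/dev/null",
             "-H", f"Host: {domain}",
             "-w", "%{time_total}",
             f"http://{replica_ip}{path}"],
            capture_output=True, text=True, check=True, timeout=30)
        return float(result.stdout)

    # Example with placeholder values; the second call reflects a primed cache.
    # t_first = download_time("203.0.113.10", "cdn.example.com", "/img/logo.png")
    # t_cached = download_time("203.0.113.10", "cdn.example.com", "/img/logo.png")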

A frequency of 0.5 would imply that in half the trials performed, a valley was found in said hop. For each provider, we plot the valley frequency for each hop-client pair across our PlanetLab dataset.

Figure 4 shows the results. Note that Figures 4b and 4c use the total download time and the post-caching total download time, respectively, for the subnet measurements, as opposed to pings (we fetched png and js files, 1 kB to 1 MB large, hosted at the CDNs). We further note that their results closely follow those obtained using pings, as in Figure 4a. For simplicity, and due to download limitations of our RIPE testbed, we revert to only using pings for the remainder of this paper.

Figure 4 shows that approximately 5-20% of hop-client pairs resulted in valleys 100% of the time (see the y-axis in the figures for x = 1.0), i.e., for every trial across the entire three-day test period. Such hops will be easy to find given a large enough dataset. However, of more interest to us are hop-client pairs that inconsistently incur valleys. For example, if valleys can be found 50% of the time for some hop-client pair, i.e., a valley frequency of 0.5, that hop may be a sufficiently persistent producer of valleys for us to reliably expect good replica choices. This would provide our system with more opportunities to employ subnet assimilation and, ideally, improve performance. Column 5 (% Hop-Client Pairs with Valley Frequency > 0.5) of Table 1 shows the percentage of hop-client pairs that have valley frequencies (calculated across all 45 trials) greater than 0.5, i.e., hop-client pairs that incurred valleys in the majority of their trials.

3.2.2 Can We Predict Valleys? While valley frequency can tell us how often valleys occurred on a past dataset, the metric makes no strong implications about what we can expect to see for a future dataset. To answer this, let us consider how the latency ratio changes as we increase the time passed between the trials we compare. In addition, per our reasoning in the vf metric, we may want to compare a set of consecutive trials, a window, to another consecutive set of the same size, rather than only comparing individual trials (which is essentially comparing windows of size 1). Given a set of trials, we take a window of size N and compare it to all other windows of size N, including those that overlap, by taking the difference in their latency ratios (HRM/CRM). If we plot this against the time distance between the windows, we can observe the trend in latency ratios as the distance in time between windows increases.


Figure 5. In both figures we compare the change in latency ratio between two trial windows to the distance in time between the two windows, for window sizes 1, 5, 10, and 15. In (a) we use all client-hop pairs; in (b) we restrict our set to pairs that experience at least one valley in at least one of their 45 trials.

An upward slope would imply that the latency ratio is drifting farther apart as the time gap increases, while a horizontal line would imply that the change in latency ratio is not varying with time (i.e., windows an hour apart are as similar to each other as windows days apart). A jagged or wildly varying line would indicate that future behavior is unpredictable.

Figure 5 plots the latency ratio versus distance in time for several window sizes, ranging from 1 to 15. In these plots we use the windows' median latency ratio values for comparison. In particular, in Figure 5a we use our entire dataset, i.e., we consider all hop-client pairs independently of whether they ever incur latency valleys or not, and plot latency ratio differences over time. For example, assume that for a client-hop pair HRM = 80 ms at one measurement point, while it rises to 120 ms at another measurement point. Meanwhile, assume that CRM remains 100 ms at both measurement points. Thus, the latency ratio HRM/CRM changes from 80/100 to 120/100, i.e., from 0.8 to 1.2, and the latency ratio difference is 0.4. Figure 5a shows that the latency ratio difference values both increase and vary wildly as windows become more distant. This shows that hop-subnet performance overall is likely extremely unpredictable. We hypothesize this results from unmapped subnets receiving generic answers, as observed in [47]. However, subnet assimilation requires that we have some idea of how a subnet will perform in advance.
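The window comparison described above can be sketched as follows; the time-ordered per-trial latency ratios and timestamps are assumed inputs for illustration.

    import statistics

    def window_comparisons(ratios, timestamps, n: int):
        """For every pair of length-n windows over a time-ordered list of
        latency ratios (HRM / min CRM), yield the time distance between the
        window midpoints and the difference of their median ratios."""
        medians = [statistics.median(ratios[i:i + n])
                   for i in range(len(ratios) - n + 1)]
        mids = [timestamps[i + n // 2] for i in range(len(ratios) - n + 1)]
        for i in range(len(medians)):
            for j in range(i + 1, len(medians)):
                yield mids[j] - mids[i], abs(medians[j] - medians[i])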

Since valley-prone subnets are of particular interest to us, in Figure 5b we reduce our dataset to only include client-hop pairs that have at least one valley occurrence across all 45 of their trials combined. The plot's behavior becomes dramatically more stable after applying this simple, minor constraint. The effects of the window size also become apparent in Figure 5b. Even with a window of size 1, the slope is very small and the line is nearly flat; after over 50 hours, two windows are only 10% to 20% more dissimilar than two windows a single hour apart. For window sizes greater than 5, latency ratios are often within 5% of each other regardless of their distance in time. Going from a window size of 1 to 5 shows drastic improvement, while each following increase in window size shows a smaller impact. The results clearly show that a client can effectively identify valley-prone subnets and predict valleys with a sufficiently long window size.

3.2.3 Valley Utility Analysis. Figure 6 shows the distribution of the lower-bound latency ratios seen by the set of all valley occurrences for each provider. The latency ratio represents a lower bound because we use the minimum latency measure for the replicas recommended to the client. For example, if the minimum CRM is 100 ms and the HRM is 80 ms, then the latency ratio is 0.8. Thus, the closer to 0 the latency ratios are, the larger the gain from subnet assimilation. The red line shows the median, the box bounds the 25th and 75th percentiles, and the whiskers extend up to data points 1.5 times the interquartile range (75th percentile minus 25th percentile) above the 75th percentile and below the 25th percentile. Data points beyond the whiskers, if they exist, are considered outliers and not shown.

Figure 6 shows that most of the providers show opportunities for significant latency reduction. From this plot, it appears that Amazon CloudFront and ChinaNetCenter offer the greatest potential for gains. With the exception of CDNetworks, we see that all of our providers have 25th percentiles near or below a latency ratio of 0.8, a 20% performance gain. We also observe the diversity of valley "depth." For example, we see in Figure 6 that ChinaNetCenter's interquartile range spans over 40% of the possible value space (between 0 and 1). With such a wide variety of gains for 50% of its valleys, it opens the door to being more selective: rather than simply chasing all valleys, we could set strict requirements, tightening our definition of what is "good enough" for subnet assimilation, as we evaluate in Section 5.1.


Figure 6. Lower bound of the latency ratio of all valley occurrences, per provider.

CDNetworks' performance is more tightly bounded, as its interquartile range covers less than 10% of the value space and sits very near to 1. This implies CDNetworks probably offers very little opportunity for our technique to improve its replica selection. We hypothesize that this is a byproduct of CDNetworks' anycast implementation for replica selection: anycast depends more on routing and network properties than on DNS and IP selection [27]. In addition, Google's median and 75th percentile latency ratios sit very near 1.0. However, by being more selective, as described above, we may be able to pursue better valleys in the lower quartiles, below the median. We could potentially improve our system's valley-prone hop selection by filtering out "shallow" valleys from our consideration. We demonstrate and discuss the effects of different levels of selectiveness in Section 5.1.

4 DRONGO SYSTEM OVERVIEW

We introduce Drongo, a client-side system that employs subnet assimilation to speed up CDNs. Drongo sits on top of a client's DNS system. In a full deployment, Drongo will be installed as an LDNS proxy on the client machine, where it would have easy access to the ECS option and the DNS responses it needs to perform trials. Drongo is set by the client as its default local DNS resolver and acts as a middle party, reshaping outgoing DNS messages via subnet assimilation and storing data from incoming DNS responses. In our current implementation, Drongo uses Google's public DNS service at 8.8.8.8 to complete DNS queries.

Upon reception of an outgoing query from the client, Drongo must decide whether to use the client's own subnet or to perform subnet assimilation with some known alternative subnet. If Drongo has sufficient data for that subnet in combination with that domain, it makes a decision whether or not to use that subnet for the name resolution. If Drongo lacks sufficient data, it issues an ordinary query (using the client's own subnet for ECS).
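The per-query flow just described could look roughly like the sketch below; the store layout and helper names are our own illustrative assumptions, and the upstream resolver mirrors the 8.8.8.8 choice mentioned above.

    def handle_query(qname: str, client_subnet: str, store, resolve):
        """Resolve one outgoing query, optionally substituting a known
        valley-prone subnet for the client's own subnet in the ECS option."""
        candidate = store.best_assimilation_subnet(qname)   # None if no full window
        if candidate is not None:
            ecs_subnet = candidate                           # subnet assimilation
        else:
            ecs_subnet = client_subnet                       # ordinary query
        response = resolve(qname, ecs_subnet, server="8.8.8.8")
        store.record_response(qname, ecs_subnet, response)   # keep data for later
        return response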

4.1 Window Size

We now face the question: what is a sufficient amount of data needed by Drongo to ensure quality subnet assimilation decisions? We choose to measure the data "quantity" by the number of trials for some subnet, where the subnet is obtained via traceroutes performed during times when the client's network was idle. A sufficient quantity must be enough trials to fill some predetermined window size. As we observed in Section 3.2.1, the marginal benefit of increasing the window size decreases with each additional trial. To keep storage needs low, while also obtaining most of the benefit of a larger window, we set Drongo's window size at 5.

4.2 Data Collection

Drongo must execute trials, as defined in Section 3.1.2, in order to fill its window and collect sufficient data. As demonstrated in Figure 5b, when these trials occur is of little to no significance. In our experiments, we perform our trials at randomly sampled intervals; our trial spacing varies from minutes to days, with a tendency toward being near an hour apart. This sporadic spacing parallels the variety of timings we expect on a real client: the client may be online at random, unpredictable times, for unpredictable lengths of time.

4.3 Decision Making

Here we detail Drongo's logic, assuming it has sufficient data (a full window) with which to make a decision about a particular subnet. For some domain, Drongo must decide whether a subnet is sufficiently valley-prone for subnet assimilation. If so, Drongo will use that subnet for the DNS query; if not, Drongo will use the client's own subnet. From Figure 5b, we know we can anticipate that future behavior will resemble what Drongo sees in its current window if Drongo has seen at least one valley occurrence for the domain of interest from the subnet under consideration. However, in Figure 6 we see that many valleys offer negligible performance gains, which might not outweigh the performance risk imposed by subnet assimilation. To avoid these potentially high-risk hops, Drongo may benefit from a more selective system that requires a high valley frequency in the training window to allow subnet assimilation. We explore the effects of changing the vf parameter in Section 5.

It is possible that, for a single domain, multiple hop subnets qualify as sufficiently valley-prone for subnet assimilation. When this occurs, Drongo must attempt to choose the best performing from the set of qualified hops. To do this, Drongo selects the hop subnet with the highest valley frequency in its training window; in the event of a tie, Drongo chooses randomly.
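Concretely, the qualification and tie-breaking rule can be sketched as follows, reusing the hypothetical valley_frequency helper from Section 3.2.1; the default vf and vt values correspond to the parameters selected in Section 5.1, and the data layout remains our own assumption.

    import random

    WINDOW_SIZE = 5  # Section 4.1

    def best_assimilation_subnet(window_trials, candidate_hops,
                                 vf_min: float = 1.0, vt: float = 0.95):
        """Return the hop whose subnet Drongo should assimilate for this
        domain, or None. A hop qualifies if its valley frequency over the
        training window meets vf_min; ties on the highest frequency are
        broken randomly. The returned hop's subnet goes into the ECS option."""
        if len(window_trials) < WINDOW_SIZE:
            return None                      # not enough data yet
        scored = [(valley_frequency(window_trials, hop, vt), hop)
                  for hop in candidate_hops]
        qualified = [(vf, hop) for vf, hop in scored if vf >= vf_min]
        if not qualified:
            return None
        top = max(vf for vf, _ in qualified)
        return random.choice([hop for vf, hop in qualified if vf == top])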


Figure 7. Average latency ratio of the overall system as we vary vf and vt.

5 DRONGO EVALUATION

Here we evaluate Drongo's ability to intelligently choose well-mapped hop subnets for subnet assimilation, and the performance gains delivered to clients where subnet assimilation is applied.

Using the RIPE Atlas platform [17], we collect a trace of 429 probes spread across 177 countries for 1 month. In our experiments, we use the first portion of a client's trials as a training window. Following the completion of the training window, we use the remaining trials to test Drongo's ability to select good hops for real-time subnet assimilation. Each client performed 10 trials per provider: trials 0 through 4 form its training window, and trials 5 through 9 test the quality of Drongo's decisions. In our evaluation, Drongo always selects the first CR from a CR-set and the first HR from an HR-set, mirroring real-world client behavior; no real-time, on-the-fly measurements are conducted, and all decisions are based on the existing window.

5.1 Parameter Selection

Figure 7 shows the effects of Drongo on our entire RIPE trace's average latency ratio, i.e., we consider all requests generated by clients, including those that aren't affected by Drongo. The figure plots the average latency ratio (shown on the y axis) as a function of the valley threshold vt (shown on the x axis). The valley threshold, introduced in Section 2.3, determines the maximum latency ratio a hop-client pair must have to classify as a valley occurrence. For example, when the threshold equals 1.0, all latency valleys are considered; on the other hand, when vt is 0.6, Drongo triggers client assimilation only for latency valleys that promise to improve performance by more than 40%, i.e., have a latency ratio smaller than 0.6. The figure shows 5 curves, each representing a different valley frequency parameter, varied between 0.2 and 1.0.
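The aggregate metric behind Figure 7 can be reproduced with a short loop. This is a sketch under our assumed data layout (per-client training and test trials, and the hypothetical helpers introduced earlier), not the authors' implementation; unaffected requests contribute a ratio of 1.0, and the first CR and first HR are used, as stated above.

    def aggregate_latency_ratio(clients, vf_min: float, vt: float) -> float:
        """Average latency ratio over all test-window requests, counting
        requests Drongo leaves untouched as a ratio of 1.0."""
        ratios = []
        for client in clients:
            train, test = client.trials[:5], client.trials[5:]
            choice = best_assimilation_subnet(train, client.hops, vf_min, vt)
            for trial in test:
                hr_set = trial["hr_sets"].get(choice) if choice is not None else None
                if not hr_set:
                    ratios.append(1.0)                       # request left untouched
                else:
                    crm = trial["crms"][trial["cr_set"][0]]  # first CR
                    hrm = trial["hrms"][hr_set[0]]           # first HR
                    ratios.append(hrm / crm)
        return sum(ratios) / len(ratios)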

Figure 8. Average latency ratio of cases where subnet assimilation was performed.

Figure 7 shows the average results across all tested clients. We make several insights. If the valley frequency is small, e.g., 0.2, Drongo will unselectively trigger subnet assimilation for all valleys that have a frequency equal to or larger than 0.2, which requires only one valley occurrence for our window size of 5. Conversely, as the minimum vf parameter increases, overall system performance improves, i.e., the latency ratio drops below 1.0. Thus, the valley frequency is a strong indicator of valley-proneness, further supporting our prior findings from Figure 5b.

Meanwhile, two more characteristics stand out as we vary vt. First, the valley frequency parameter can completely alter the slope of the latency ratio plotted against vt. This behavior echoes the observation we made in the previous paragraph: with extremely loose requirements, Drongo does not sufficiently filter out poor-performing subnets from its set of candidates. Interestingly, with a strict vf (closer to 1.0), the slope changes and the average latency ratio decreases as we raise the valley threshold; this is because, if Drongo is too strict, it filters out too many potential candidates for subnet assimilation. Second, we see that the curve eventually bends upward at high vt values, indicating that even the vt can be too lenient. The results show that the minimum average latency ratio of 0.9482 (y-axis) is achieved for a valley threshold of 0.95 (x-axis). Thus, with vf = 1.0 and vt = 0.95, Drongo produces its maximum performance gains, averaging 5.18% (1.0 - 0.9482) across all of our tested clients. By balancing these parameters, Drongo filters out the subnets that offer the least benefit: subnets where valleys seldom occur and subnets where valleys tend to be shallow.

Figure 8 plots the average latency ratio in the same manner as Figure 7, yet only for queries where Drongo applied subnet assimilation. First, the figure again confirms that lower valley frequency parameters degrade Drongo's performance. Second, paired with our knowledge of Figure 7, it becomes clear that as the valley threshold decreases, the number of latency valleys shrinks while the quality of the remaining valleys becomes more potent. This is why the latency ratio decreases as vt decreases. However, if the threshold is too small, the number of available valleys becomes so small that performance becomes unpredictable, i.e., outlier behavior can dominate, causing the spike in average latency ratios for valley thresholds under 0.2.


Figure 9. Percentage of clients where subnet assimilation was performed.

Figure 10. Per-provider system performance for all queries. The optimal vf is set for each provider and noted in parentheses: Google (0.8), CloudFront (0.8), Alibaba (0.4), CDNetworks (1.0), ChinaNetCtr (0.6), CubeCDN (1.0).

Finally, Figure 9 plots the percentage of clients for which Drongo performed subnet assimilation for at least one provider. Necessarily, the less strict the frequency constraint is (vf >= 0.2 is the least strict constraint), the more frequently Drongo acts. As shown in Figure 9, 69.93% of our clients were affected by Drongo with vf and vt set at the peak aggregate performance values (1.0 and 0.95, respectively) found above.

Summary: In this section, we selected the parameters vf = 1.0 and vt = 0.95, where Drongo reaches its peak aggregate performance gains of 5.18%.

5.2 Per-Provider Performance

In the previous subsection, we used a constant parameter set across all six providers in our analysis. In this section, we analyze Drongo on a per-provider basis. By choosing an optimal parameter for each provider, as we do below, Drongo's aggregate performance gains increase to 5.85%.

Figure 11. Per-provider system performance for queries where subnet assimilation was applied. Parentheses contain the optimal (vf, vt) values for each respective provider: Google (0.8, 0.7), CloudFront (0.8, 0.8), Alibaba (0.4, 0.9), CDNetworks (1.0, 0.9), ChinaNetCtr (0.6, 0.85), CubeCDN (1.0, 0.95).

In Figure 10, we show how Drongo impacts the average performance of each provider while setting the valley frequency to each individual provider's respective optimal value (noted in parentheses under each provider name). Note that for most providers the individual optimal valley frequency lies near the value obtained in Section 5.1 (1.0). While the intricacies of each provider's behavior clearly hinge on opaque characteristics of their respective networks, we offer several explanations for the observed performance. CDNetworks, which sees small gains across all valley threshold values, further validates the hypothesis proposed in Section 3.2.3 regarding anycast networks. Meanwhile, Google, the largest provider of our set on a global scale, also experienced the second highest peak average gains from Drongo: when a client is not well served by Google, it is likely that there exist nearby subnets being directed to different replicas. In other words, Drongo has great opportunity for improving CDNs that use fine-grained subnet mapping.

Finally, Figure 11 shows Drongo's performance per provider, exclusively in scenarios when it applies subnet assimilation. Comparing to the results shown in Figure 6 for the PlanetLab experiments, we observe significant differences. Most notably, the latency valley ratios are much smaller in Figure 11 than in Figure 6. For example, Google's median latency ratio is close to 1.0 in Figure 6, indicating insignificant gains. On the contrary, in Figure 11 Google's median latency ratio is around 0.5, implying gains of 50% (1.0 minus 0.5) and up to an order of magnitude in edge cases. Considering all providers, Drongo-influenced replica selections are on average 24.89% better performing than Drongo-free selections.

There are three reasons for this behavior. First, Figure 6 shows the lower-bound system performance for PlanetLab, as the best CR is always used for comparison; such a restriction is not imposed in the RIPE Atlas scenario. Second, the RIPE set is more diverse and includes widely distributed endpoints, thus allowing CDNs greater opportunity for subnet mapping error. Third, contrary to the PlanetLab scenario, where we used all the valleys under the valley threshold of 1.0, here we use optimal vt values, such that Drongo successfully filters out most "shallow" valleys that would dilute performance gains.

6 RELATED WORK

Given a large number of CDNs with vastly different deployment and performance in different territories, CDN brokering has emerged as a way to improve user-perceived CDN performance, e.g., [7, 34]. By monitoring the performance of multiple CDNs, brokers are capable of determining which particular CDN works better for a particular client at a point in time. Necessarily, this approach requires content providers to contract with multiple CDNs to host and distribute their content. Contrary to CDN brokering, Drongo manages to improve the performance of each individual CDN. Still, it works completely independently from CDNs, and it is readily deployable on clients. Moreover, Drongo is completely compatible with CDN brokering, since it in principle has the capacity to improve the performance of whichever CDN is selected by a broker.

There has been tremendous effort expended in the ongoing battle to make web pages load faster, improve streaming performance, and so on. As a result, many advanced client-based protocols and systems have emerged from both industry (e.g., QUIC, SPDY, and HTTP/2) and academia [24, 41, 50] in the Web domain, and likewise in streaming, e.g., [37]. While Drongo is also a client-based system, it is generic and helps speed up all CDN-hosted content. By reducing the CDN latency experienced by end users, it systematically improves all the above systems and protocols, which in turn helps all associated applications.

Recent work has demonstrated that injecting fake information can help protocols achieve better performance. One example is routing [49], where fake nodes and links are introduced into an underlying link-state routing protocol, so that routers compute their own forwarding tables based on the augmented topology. Similar ideas have been shown useful in the context of traffic engineering [29], where robust and efficient network utilization is accomplished via carefully disseminated bogus information ("lies"). While similar in spirit to these approaches, Drongo operates in a completely different domain, aiming to address CDN pitfalls. In addition to using "fake" (source subnet) information in order to improve CDN decisions, Drongo also invests effort in discovering valley-prone subnets and determining the conditions under which using them is beneficial.

7 DISCUSSION

A mass deployment of Drongo is non-trivial and carries with it some complexities that deserve careful consideration.

There is some concern that the maintenance of CDN allocation policies may still be compromised, despite our efforts to respect their mechanisms in Drongo's design. We propose that a broadly deployed form of Drongo could be carefully tuned to affect only the most extreme cases. Figure 11 shows that subnet assimilation carries some risk of a loss in performance, so it is in clients' best interests to apply it conservatively. First, we know from Figure 9 that the number of clients using assimilated subnets can be significantly throttled with strict parameters. Further, in today's Internet, many clients are already free to choose arbitrary LDNS servers [25, 36]. Many of these servers do not make use of the client subnet option at all, thus rendering the nature of their side effects (potentially disrupting policies enforced only in DNS) equivalent to that of Drongo. It is difficult to say whether or not Drongo's ultimate impact on CDN policies would be any more significant than the presence of such clients.

Mass adoption also raises scalability concerns, particularly regarding the amount of measurement traffic being sent into the network. To keep the number of measurements small while ensuring their freshness, a distributed peer-to-peer component, where clients in the same subnet share trial data, could be incorporated into Drongo's design. We leave this modification for future work.

8 CONCLUSIONS

In this paper, we proposed the first approach that enables clients to actively measure CDNs and effectively improve their replica selection decisions, without requiring any changes to the CDNs and while respecting CDNs' policies. We have introduced and explored latency valleys, scenarios where replicas suggested to upstream subnets outperform those provided to a client's own subnet. We have asserted that latency valleys are common phenomena, and we have found them across all the CDNs we have tested, in 26% to 76% of routes. We showed that valley-prone upstream subnets are easily found from the client, are simple to identify with few and infrequent measurements, and, once found, are persistent over timescales of days.

We have introduced Drongo, a client-side system that leverages the performance gains of latency valleys by identifying valley-prone subnets. Our measurements show that Drongo can improve requests' latency by up to an order of magnitude. Moreover, Drongo achieves this with exceptionally low overhead: a mere 5 measurements suffice for timescales of days. Using experimentally derived parameters, Drongo affects the replica selection of requests made by 69.93% of clients, improving their latency by 24.89% in the median case. Drongo's significant impact on these requests translates into an overall improvement in client-perceived aggregate CDN performance.


REFERENCES

[1] 2016. Akamai. (2016). http://www.akamai.com
[2] 2016. Alexa Top Sites. (2016). https://aws.amazon.com/alexa-top-sites
[3] 2016. Alibaba Cloud CDN. (2016). https://intl.aliyun.com/product/cdn
[4] 2016. Amazon CloudFront - Content Delivery Network (CDN). (2016). https://aws.amazon.com/cloudfront
[5] 2016. Amazon Route 53. (2016). https://aws.amazon.com/route53
[6] 2016. AWS Global Infrastructure. (2016). https://aws.amazon.com/about-aws/global-infrastructure
[7] 2016. Cedexis. (2016). http://www.cedexis.com
[8] 2016. Conviva. (2016). http://www.conviva.com
[9] 2016. The Cost of Latency. (2016). http://perspectives.mvdirona.com/2009/10/the-cost-of-latency
[10] 2016. GoDaddy Premium DNS. (2016). https://www.godaddy.com/domains/dns-hosting.aspx
[11] 2016. Google CDN Platform. (2016). https://cloud.google.com/cdn/docs
[12] 2016. Google Cloud Platform: Cloud DNS. (2016). https://cloud.google.com/dns
[13] 2016. Google Public DNS. (2016). https://developers.google.com/speed/public-dns/docs/using?hl=en
[14] 2016. Netdirekt Content Hosting - CDN. (2016). http://www.netdirekt.com.tr/cdn-large.html
[15] 2016. Neustar DNS Services. (2016). https://www.neustar.biz/services/dns-services
[16] 2016. OpenDNS. (2016). https://www.opendns.com
[17] 2016. RIPE Atlas - RIPE Network Coordination Centre. (2016). https://atlas.ripe.net
[18] 2016. Shopzilla: faster page load time = 12 percent revenue increase. (2016). http://www.strangeloopnetworks.com/resources/infographics/web-performance-and-ecommerce/shopzilla-faster-pages-12-revenue-increase
[19] 2016. Sweden's Greta wants to disrupt the multi-billion CDN market. (2016). https://techcrunch.com/2016/08/30/greta
[20] 2016. Verisign Managed DNS. (2016). http://www.verisign.com/en_US/security-services/dns-management/index.xhtml
[21] 2016. Verizon Digital Media Services. (2016). https://www.verizondigitalmedia.com
[22] 2016. Verizon ROUTE: Fast, Reliable Enterprise-Class Services for Domain Name System (DNS). (2016). https://www.verizondigitalmedia.com/platform/route
[23] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. 2010. Comparing DNS Resolvers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 15-21. DOI: https://doi.org/10.1145/1879141.1879144
[24] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. Klotski: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 439-453. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/butkiewicz
[25] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. 2013. Mapping the Expansion of Google's Serving Infrastructure. In Proceedings of the 2013 Conference on Internet Measurement Conference (IMC '13). ACM, 313-326. DOI: https://doi.org/10.1145/2504730.2504754
[26] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531-537. DOI: https://doi.org/10.1145/2815675.2815717
[27] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531-537. DOI: https://doi.org/10.1145/2815675.2815717
[28] F. Chen, R. Sitaraman, and M. Torres. 2015. End-User Mapping: Next Generation Request Routing for Content Delivery. In Proceedings of ACM SIGCOMM '15, London, UK.
[29] Marco Chiesa, Gábor Rétvári, and Michael Schapira. 2016. Lying Your Way to Better Traffic Engineering. In Proceedings of the 2016 ACM CoNEXT (CoNEXT '16). ACM.
[30] ChinaNetCenter. 2016. ChinaNetCenter - Network. (2016). http://en.chinanetcenter.com/pages/technology/g2-network-map.php
[31] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. 2003. PlanetLab: An Overlay Testbed for Broad-coverage Services. SIGCOMM Comput. Commun. Rev. 33, 3 (July 2003), 3-12. DOI: https://doi.org/10.1145/956993.956995
[32] C. Contavalli, W. van der Gaast, D. Lawrence, and W. Kumari. 2016. Client Subnet in DNS Queries. RFC 7871. RFC Editor.
[33] Tobias Flach, Ethan Katz-Bassett, and Ramesh Govindan. 2012. Quantifying Violations of Destination-based Forwarding on the Internet. In Proceedings of the 2012 ACM Conference on Internet Measurement Conference (IMC '12). ACM, 265-272. DOI: https://doi.org/10.1145/2398776.2398804
[34] Aditya Ganjam, Faisal Siddiqui, Jibin Zhan, Xi Liu, Ion Stoica, Junchen Jiang, Vyas Sekar, and Hui Zhang. 2015. C3: Internet-Scale Control Plane for Video Quality Optimization. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 131-144. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/ganjam
[35] Cheng Huang, Ivan Batanov, and Jin Li. 2012. A Practical Solution to the client-LDNS Mismatch Problem. SIGCOMM Comput. Commun. Rev. 42, 2 (March 2012), 35-41. DOI: https://doi.org/10.1145/2185376.2185381
[36] Cheng Huang, Angela Wang, Jin Li, and Keith W. Ross. 2008. Measuring and Evaluating Large-scale CDNs (Paper Withdrawn at Microsoft's Request). In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 15-29. http://dl.acm.org/citation.cfm?id=1452520.1455517
[37] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. 2014. A Buffer-based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM '14). ACM, 187-198. DOI: https://doi.org/10.1145/2619239.2626296
[38] Rupa Krishnan, Harsha V. Madhyastha, Sushant Jain, Sridhar Srinivasan, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. 2009. Moving Beyond End-to-End Path Information to Optimize CDN Performance. In Internet Measurement Conference (IMC), Chicago, IL, 190-201. http://research.google.com/archive/imc191/imc191.pdf
[39] Zhuoqing Morley Mao, Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang. 2002. A Precise and Efficient Evaluation of the Proximity Between Web Clients and Their Local DNS Servers. In Proceedings of the General Track of the Annual Conference on USENIX Annual Technical Conference (ATEC '02). USENIX Association, Berkeley, CA, USA, 229-242. http://dl.acm.org/citation.cfm?id=647057.759950
[40] Matthew K. Mukerjee, Ilker Nadi Bozkurt, Bruce Maggs, Srinivasan Seshan, and Hui Zhang. 2016. The Impact of Brokers on the Future of Content Delivery. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (HotNets '16). ACM, 127-133. DOI: https://doi.org/10.1145/3005745.3005749
[41] Ravi Netravali, Ameesh Goyal, James Mickens, and Hari Balakrishnan. 2016. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/netravali
[42] OpenDNS. 2016. A Faster Internet: The Global Internet Speedup. (2016). http://afasterinternet.com
[43] Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose. 1998. Modeling TCP Throughput: A Simple Model and Its Empirical Validation. In Proceedings of the ACM SIGCOMM '98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '98). ACM, 303-314. DOI: https://doi.org/10.1145/285237.285291
[44] Vern Paxson. 1996. End-to-end Routing Behavior in the Internet. SIGCOMM Comput. Commun. Rev. 26, 4 (Aug. 1996), 25-38. DOI: https://doi.org/10.1145/248157.248160
[45] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, and Anja Feldmann. 2010. Improving Content Delivery Using Provider-aided Distance Information. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 22-34. DOI: https://doi.org/10.1145/1879141.1879145
[46] Florian Streibelt, Jan Böttger, Nikolaos Chatzis, Georgios Smaragdakis, and Anja Feldmann. 2013. Exploring EDNS-client-subnet Adopters in Your Free Time. In Proceedings of IMC '13 (IMC '13).
[47] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. 2006. Drafting Behind Akamai (Travelocity-based Detouring). SIGCOMM Comput. Commun. Rev. 36, 4 (Aug. 2006), 435-446. DOI: https://doi.org/10.1145/1151659.1159962
[48] Ao-Jan Su and Aleksandar Kuzmanovic. 2008. Thinning Akamai. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 29-42. DOI: https://doi.org/10.1145/1452520.1452525
[49] Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. 2015. Central Control Over Distributed Routing. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM '15). ACM, 43-56. DOI: https://doi.org/10.1145/2785956.2787497
[50] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. 2016. Speeding up Web Page Loads with Shandian. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA, 109-122. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/wang


CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

Figure 3 Scatter plot of HRMs and minimum CRMsThe area below the diagonal is the valley region

We have performed a series of 45 trials per PlanetLabnode executed between one and two hours apart For theremainder of this Section we refer to this dataset

3.2 Valley Prevalence

In order to establish the prevalence and significance of latency valleys, we now compare HRMs to their respective CRMs. It is important to note that a client may have multiple CRs, and hence multiple CRMs, while in practice a client can only choose one replica. Therefore, we compare all HRMs to the minimum CRM obtained in their trial, thus establishing a lower bound for our expected performance gain and allowing us to easily assert any gains or losses provided by the alternative replica choices (the HRs).

With our PlanetLab dataset, we compared each HRM to the minimum CRM, i.e., the best client replica obtained in the same respective trial. Figure 3 shows the results of this assessment. We plot HRM on the y axis and minimum CRM on the x axis, thus creating a visual representation of the latency ratio. We distinguish between the CDNs using different colors. To better explain the results, we draw the equality line, where HRM = minimum CRM. Every data point along this line represents a hop's subnet which is being suggested a replica that performs on a par with the replicas suggested to the client's subnet. More importantly, every data point below the equality line represents a valley occurrence, as the alternative replica's HRM is smaller than the CRM. The percentage of valleys by provider ranges from 14.02% (CloudFront) to 38.58% (CubeCDN), with an average of 22% across all providers. Table 1 shows the results; Column 2 (% Valleys Overall) lists these percentages for each CDN.

Now that we have established that valleys exist, we wish to determine where we can expect to find them. We continue using the minimum CRM from each trial, for the reasons described above. However, in order to better reflect practical scenarios where only one of a given set of replicas is used, we now also begin choosing an individual replica from an HR-set. Instead of choosing the minimum HRM for a given hop, we conservatively choose the median: each hop's chosen HRM is the median of its HR-set for that trial. We continue this pattern (choosing the client's best CRM from the trial and a hop's median HRM) for the remainder of our PlanetLab data analysis. Our PlanetLab data thus represents a lower bound on valleys and their performance implications: we compare the median performance of our proposed procedure to the absolute best the existing methods have to offer. In Section 5, using our RIPE Atlas testbed, we remove this constraint to demonstrate the real-world performance of our system.

We further wish to consider how common valleys are within a given route from the client to the provider. Column 3 (Average % of Valleys per Route) of Table 1 shows, given some route in one trial, the average percent of usable hops that incur valleys. We see for Alibaba that, on average, over a third of the usable hops in a given route are likely to incur valleys, and for CDNetworks nearly a fourth of the route. Meanwhile, for CloudFront, we see that on average valleys occur for only a small portion of hops in a given route. Column 4 (% Routes with a Valley) details the percentage of all observed routes that contained at least one valley in the trial in which they were observed. For Alibaba and CDNetworks, around 75% of the observed routes contain valleys, while more than 50% do so for Google.

3.2.1 Do Valleys Persist? Next, we assess whether subnets are persistently valley-prone: for some number of trials, how often can the client expect a valley to occur from some particular hop? To answer this, we need to be able to describe how frequently valleys occurred for some hop subnet in a given set of trials.

Valley frequency. We define a new, simple metric, the valley frequency, as follows: if we look at a hop-client pair across a set of N trials, and v of those trials are valleys, the valley frequency is

    vf = v / N
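Continuing the sketch above (again, illustrative code rather than the authors' implementation), the valley frequency for a hop-client pair is simply the fraction of its trials that are valleys:

```python
# Sketch: valley frequency vf = v / N across a hop-client pair's trials,
# reusing valley_in_trial() from the previous sketch.

def valley_frequency(trials):
    """trials: iterable of (cr_rtts_ms, hr_rtts_ms) tuples, one per trial."""
    flags = [valley_in_trial(cr, hr)[0] for cr, hr in trials]
    return sum(flags) / len(flags) if flags else 0.0

# A pair that is a valley in 3 of its 5 trials has vf = 0.6.
```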


Table 1: Detailed findings for each provider, based on PlanetLab data

Provider            % Valleys   Avg % Valleys   % Routes      % Hop-Client Pairs
                    Overall     per Route       with Valley   w/ Valley Freq > 0.5
Google              20.24       16.41           53.30         10.98
Amazon CloudFront   14.02        8.72           25.82         10.00
Alibaba             33.68       35.94           75.83         30.97
CDNetworks          15.61       24.41           73.08         14.09
ChinaNetCenter      27.42       14.26           38.10         16.74
CubeCDN             38.58       17.95           25.49         26.32

Figure 4: CDFs of clients by valley frequency (x axis: valley frequency; y axis: CDF of hop-client pairs; one curve per provider: Google, CloudFront, Alibaba, CDNetworks, ChinaNetCtr, CubeCDN). Panels: (a) ping; (b) total download time; (c) total post-caching download time. In 4a, the subnet-response measurements are ping times averaged from bursts of 3 back-to-back pings. In 4b, the subnet-response measurements are total download times on first attempts, while in 4c they are total download times on consecutive attempts (repeated downloads that take place immediately after the first attempt, to account for the potential impact of caching). Downloads were performed using curl, where we set an IP as the destination and set the domain as the HOST attribute. Measurements for 4b and 4c were performed back-to-back, so that 4c reflects download times with presumably primed caches.

A frequency of 0.5 would imply that in half the trials performed, a valley was found in said hop. For each provider, we plot the valley frequency for each hop-client pair across our PlanetLab dataset.

Figure 4 shows the results. Note that Figures 4b and 4c use the total download time and the post-caching total download time, respectively, for the subnet measurements, as opposed to pings.² We further note that their results closely follow those obtained using pings, as in Figure 4a. For simplicity, and due to download limitations of our RIPE testbed, we revert to only using pings for the remainder of this paper.

Figure 4 shows that approximately 5-20% of hop-client pairs resulted in valleys 100% of the time (see the y-axis in the figures for x = 1.0) for every trial across the entire three-day test period. Such hops will be easy to find given a large enough dataset. However, of more interest to us are hop-client pairs that inconsistently incur valleys. For example, if valleys can be found 50% of the time for some hop-client pair, i.e., a valley frequency of 0.5, that hop may be a sufficiently persistent producer of valleys for us to reliably expect good replica choices. This would provide our system with more opportunities to employ subnet assimilation and, ideally, improve performance. Column 5 (% Hop-Client Pairs with Valley Frequency > 0.5) of Table 1 shows the percentage of hop-client pairs that have valley frequencies (calculated across all 45 trials) greater than 0.5, i.e., hop-client pairs that incurred valleys in the majority of their trials.

²We fetched .png and .js files, 1 kB - 1 MB large, hosted at the CDNs.

3.2.2 Can We Predict Valleys? While valley frequency can tell us how often valleys occurred on a past dataset, the metric makes no strong implications about what we can expect to see in a future dataset. To answer this, let us consider how the latency ratio changes as we increase the time passed between the trials we compare. In addition, per our reasoning in the vf metric, we may want to compare a set of consecutive trials (a window) to another consecutive set of the same size, rather than only comparing individual trials (which is essentially comparing windows of size 1). Given a set of trials, we take a window of size N and compare it to all other windows of size N, including those that overlap, by taking the difference in their latency ratios (HRM/CRM). If we plot this against the time distance between the windows, we can observe the trend in latency ratios as the distance in time between the windows increases.


Figure 5: In both figures, we compare the change in latency ratio between two trial windows to the distance in time between the two windows, for window sizes of 1, 5, 10, and 15 (x axis: distance in hours; y axis: latency ratio difference). In (a) we use all client-hop pairs. In (b) we restrict our set to pairs that experience at least one valley in at least one of their 45 trials.

An upward slope would imply that the latency ratio is drifting farther apart as the time gap increases, while a horizontal line would imply that the change in latency ratio does not vary with time (i.e., windows an hour apart are as similar to each other as windows days apart). A jagged or wildly varying line would indicate that future behavior is unpredictable.

Figure 5 plots the latency ratio difference versus distance in time for several window sizes, ranging from 1 to 15. In these plots we use the windows' median latency ratio values for comparison. In particular, in Figure 5a we use our entire dataset, i.e., we consider all hop-client pairs, independently of whether they ever incur latency valleys or not, and plot latency ratio differences over time. For example, assume that for a client-hop pair HRM = 80 ms at one measurement point, while it rises to 120 ms at another measurement point. Meanwhile, assume that CRM remains 100 ms at both measurement points. Thus, the latency ratio HRM/CRM changes from 80/100 to 120/100, i.e., from 0.8 to 1.2; the latency ratio difference is 0.4. Figure 5a shows that the latency ratio difference values both increase and vary wildly as windows become more distant. This shows that hop-subnet performance overall is likely extremely unpredictable. We hypothesize this results from unmapped subnets receiving generic answers, as observed in [47]. However, subnet assimilation requires that we have some idea of how a subnet will perform in advance.

Since valley-prone subnets are of particular interest to us, in Figure 5b we reduce our dataset to only include client-hop pairs that have at least one valley occurrence across all 45 of their trials combined. The plot's behavior becomes dramatically more stable after applying this simple, minor constraint. The effects of the window size also become apparent in Figure 5b. Even with a window of size 1, the slope is very small and the line is nearly flat: after over 50 hours, two windows are only 10% to 20% more dissimilar than two windows a single hour apart. For window sizes greater than 5, latency ratios are often within 5% of each other, regardless of their distance in time. Going from a window size of 1 to 5 shows drastic improvement, while each following increase in window size shows a smaller impact. The results clearly show that a client can effectively identify valley-prone subnets and predict valleys with a sufficiently long window size.
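The windowed comparison described above can be sketched as follows (illustrative Python, with assumed data layout, not the measurement code used in the paper): for one hop-client pair, take the median latency ratio of every window of N consecutive trials and record, for every pair of windows, the distance in time together with the absolute difference of their medians.

```python
# Sketch of the window-vs-window comparison behind Figure 5.
from statistics import median

def window_differences(trials, window=5):
    """trials: list of (timestamp_hours, latency_ratio), in chronological order.
    Returns a list of (time_distance_hours, |median ratio difference|) points."""
    medians = []
    for i in range(len(trials) - window + 1):
        chunk = trials[i:i + window]
        mid_time = sum(t for t, _ in chunk) / window
        medians.append((mid_time, median(r for _, r in chunk)))
    points = []
    for i, (t1, m1) in enumerate(medians):
        for t2, m2 in medians[i + 1:]:
            points.append((abs(t2 - t1), abs(m2 - m1)))
    return points  # plotting difference vs. distance reproduces a Figure 5-style curve
```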

3.2.3 Valley Utility Analysis. Figure 6 shows the distribution of the lower-bound latency ratios seen by the set of all valley occurrences for each provider. The latency ratio represents a lower bound because we use the minimum latency measure for replicas recommended to the client. For example, if the minimum CRM is 100 ms and HRM is 80 ms, then the latency ratio is 0.8. Thus, the closer to 0 the latency ratios are, the larger the gain from subnet assimilation. The red line shows the median, the box bounds the 25th and 75th percentiles, and the whiskers extend up to data points 1.5 times the interquartile range (75th percentile - 25th percentile) above the 75th percentile and below the 25th percentile. Data points beyond the whiskers, if they exist, are considered outliers and not shown.

Figure 6 shows that most of the providers show opportunities for significant latency reduction. From this plot, it appears that Amazon CloudFront and ChinaNetCenter offer the greatest potential for gains. With the exception of CDNetworks, we see that all of our providers have 25th percentiles near or below a latency ratio of 0.8, a 20% performance gain. We also observe the diversity of valley "depth." For example, we see in Figure 6 that ChinaNetCenter's interquartile range spans over 40% of the possible value space (between 0 and 1). With such a wide variety of gains for 50% of its valleys, it opens the door to being more selective. Rather than simply chasing all valleys, we could set strict requirements,


Figure 6: Lower bound of latency ratio of all valley occurrences (one box plot per provider: Google, CloudFront, Alibaba, CDNetworks, ChinaNetCtr, CubeCDN; y axis: latency ratio).

tightening our definition of what is "good enough" for subnet assimilation, as we evaluate in Section 5.1.

CDNetworks' performance is more tightly bounded, as its interquartile range covers less than 10% of the value space and sits very near to 1. This implies that CDNetworks probably offers very little opportunity for our technique to improve its replica selection. We hypothesize that this is a byproduct of CDNetworks' anycast implementation for replica selection: anycast depends more on routing and network properties than on DNS and IP selection [27]. In addition, Google's median and 75th percentile latency ratios sit very near 1.0. However, by being more selective, as described above, we may be able to pursue better valleys in the lower quartiles, below the median. We could potentially improve our system's valley-prone hop selection by filtering out "shallow" valleys from our consideration. We demonstrate and discuss the effects of different levels of selectiveness in Section 5.1.

4 DRONGO SYSTEM OVERVIEW

We introduce Drongo, a client-side system that employs subnet assimilation to speed up CDNs. Drongo sits on top of a client's DNS system. In a full deployment, Drongo will be installed as an LDNS proxy on the client machine, where it would have easy access to the ECS option and the DNS responses it needs to perform trials. Drongo is set by the client as its default local DNS resolver and acts as a middle party, reshaping outgoing DNS messages via subnet assimilation and storing data from incoming DNS responses. In our current implementation, Drongo uses Google's public DNS service at 8.8.8.8 to complete DNS queries.

Upon reception of an outgoing query from the client, Drongo must decide whether to use the client's own subnet or to perform subnet assimilation with some known alternative subnet. If Drongo has sufficient data for that subnet in combination with that domain, it makes a decision whether or not to use that subnet for the name resolution. If Drongo lacks sufficient data, it issues an ordinary query (using the client's own subnet for ECS).
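The ECS-reshaping step can be illustrated with the dnspython library; this is an assumption of the sketch below rather than Drongo's actual implementation, and the function and parameter names are ours. The query is sent to Google Public DNS (8.8.8.8), carrying a declared "network location" in the ECS option.

```python
# Illustrative sketch: resolve a name via 8.8.8.8 while declaring an
# alternative client subnet through the EDNS Client Subnet (ECS) option.
import dns.edns
import dns.message
import dns.query
import dns.rdatatype

def resolve_with_subnet(domain, subnet, prefix_len=24, resolver="8.8.8.8"):
    ecs = dns.edns.ECSOption(subnet, srclen=prefix_len)   # the "assimilated" subnet
    query = dns.message.make_query(domain, "A")
    query.use_edns(edns=0, options=[ecs])                 # attach ECS to the query
    response = dns.query.udp(query, resolver, timeout=5)
    return [rr.address for rrset in response.answer
            for rr in rrset if rr.rdtype == dns.rdatatype.A]

# e.g. resolve_with_subnet("cdn-hosted.example.com", "203.0.113.0")  # hypothetical names
```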

4.1 Window Size

We now face the question: what is a sufficient amount of data needed by Drongo to ensure quality subnet assimilation decisions? We choose to measure the data "quantity" by the number of trials for some subnet, where the subnet is obtained via traceroutes performed during times when the client's network was idle. A sufficient quantity must be enough trials to fill some predetermined window size. As we observed in Section 3.2.1, the marginal benefit of increasing the window size decreases with each additional trial. To keep storage needs low while also obtaining most of the benefit of a larger window, we set Drongo's window size at 5.

4.2 Data Collection

Drongo must execute trials, defined in Section 3.1.2, in order to fill its window and collect sufficient data. As demonstrated in Figure 5b, when these trials occur is of little to no significance. In our experiments, we perform our trials at randomly sampled intervals; our trial spacing varies from minutes to days, with a tendency toward being near an hour apart. This sporadic spacing parallels the variety of timings we expect to happen on a real client: the client may be online at random, unpredictable times, for unpredictable lengths of time.

4.3 Decision Making

Here we detail Drongo's logic, assuming it has sufficient data (a full window) with which to make a decision about a particular subnet. For some domain, Drongo must decide whether a subnet is sufficiently valley-prone for subnet assimilation. If so, Drongo will use that subnet for the DNS query; if not, Drongo will use the client's own subnet. From Figure 5b, we know we can anticipate that future behavior will resemble what Drongo sees in its current window if Drongo has seen at least one valley occurrence for the domain of interest from the subnet under consideration. However, in Figure 6 we see that many valleys offer negligible performance gains, which might not outweigh the performance risk imposed by subnet assimilation. To avoid these potentially high-risk hops, Drongo may benefit from a more selective system that requires a high valley frequency in the training window to allow subnet assimilation. We explore the effects of changing the vf parameter in Section 5.

It is possible that, for a single domain, multiple hop subnets may qualify as sufficiently valley-prone for subnet assimilation. When this occurs, Drongo must attempt to choose the best performing from the set of the qualified hops. To do this, Drongo selects the hop subnet with the highest valley frequency in its training window; in the event of a tie, Drongo chooses randomly.
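The decision rule just described can be summarized in a short sketch (the data layout and names are assumed, not taken from Drongo's code): candidate subnets with a full window and a valley frequency above the configured minimum qualify, the highest-vf candidate wins, and ties are broken randomly.

```python
import random

def choose_subnet(client_subnet, window_by_subnet, vf_min=1.0, window_size=5):
    """window_by_subnet: {subnet: [bool, ...]} valley flags from recent trials
    for a given domain. Returns the subnet to place in the ECS option."""
    qualified = []
    for subnet, flags in window_by_subnet.items():
        if len(flags) < window_size:
            continue                        # not enough data: never assimilate blindly
        vf = sum(flags[-window_size:]) / window_size
        if vf >= vf_min:
            qualified.append((vf, subnet))
    if not qualified:
        return client_subnet                # ordinary query with the client's own subnet
    best_vf = max(vf for vf, _ in qualified)
    candidates = [s for vf, s in qualified if vf == best_vf]
    return random.choice(candidates)        # tie broken randomly, as described above
```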


Figure 7: Average latency ratio of the overall system as we vary vf and vt (x axis: valley threshold; y axis: latency ratio; one curve per valley frequency parameter: vf >= 0.2, 0.4, 0.6, 0.8, 1.0).

5 DRONGO EVALUATION

Here we evaluate Drongo's ability to intelligently choose well-mapped hop subnets for subnet assimilation, and the performance gains delivered to clients where subnet assimilation is applied.

Using the RIPE Atlas platform [17], we collect a trace of 429 probes spread across 177 countries for 1 month. In our experiments, we use the first portion of a client's trials as a training window. Following the completion of the training window, we use the remaining trials to test Drongo's ability to select good hops for real-time subnet assimilation. Each client performed 10 trials per provider: trials 0 through 4 form its training window, and trials 5 through 9 test the quality of Drongo's decisions. In our evaluation, Drongo always selects the first CR from a CR-set and the first HR from an HR-set, mirroring real-world client behavior: no real-time, on-the-fly measurements are conducted, and all decisions are based on the existing window.

5.1 Parameter Selection

Figure 7 shows the effects of Drongo on our entire RIPE trace's average latency ratio, i.e., we consider all requests generated by clients, including those that aren't affected by Drongo. The figure plots the average latency ratio (shown on the y axis) as a function of the valley threshold vt, shown on the x axis. The valley threshold, introduced in Section 2.3, determines the maximum latency ratio a hop-client pair must have to classify as a valley occurrence. For example, when the threshold equals 1.0, all latency valleys are considered; on the other hand, when vt is 0.6, Drongo triggers client assimilation only for latency valleys that promise to improve performance by more than 40%, i.e., whose latency ratio is smaller than 0.6. The figure shows 5 curves, each representing a different valley frequency parameter, varied between 0.2 and 1.0.

Figure 7 shows the average results across all tested clients. We make several insights.

Figure 8: Average latency ratio of cases where subnet assimilation was performed (x axis: valley threshold; y axis: latency ratio; one curve per valley frequency parameter: vf >= 0.2, 0.4, 0.6, 0.8, 1.0).

If the valley frequency parameter is small, e.g., 0.2, Drongo will unselectively trigger subnet assimilation for all valleys that have a frequency equal to or larger than 0.2, which requires only one valley occurrence for our window size of 5. Conversely, as the minimum vf parameter increases, overall system performance improves: the latency ratio drops below 1.0. Thus, the valley frequency is a strong indicator of valley-proneness, further supporting our prior findings from Figure 5b.

Meanwhile, two more characteristics stand out as we vary vt. First, the valley frequency parameter can completely alter the slope of the latency ratio plotted against vt. This behavior echoes the observation we made in the previous paragraph: with extremely loose requirements, Drongo does not sufficiently filter out poor-performing subnets from its set of candidates. Interestingly, with a strict vf (closer to 1.0), the slope changes and the average latency ratio decreases as we raise the valley threshold; this is because, if Drongo is too strict, it filters out too many potential candidates for subnet assimilation. Second, we see that the curve eventually bends upward at high vt values, indicating that even vt can be too lenient. The results show that the minimum average latency ratio of 0.9482 (y-axis) is achieved for a valley threshold of 0.95 (x-axis). Thus, with vf = 1.0 and vt = 0.95, Drongo produces its maximum performance gains, averaging 5.18% (1.0 - 0.9482) across all of our tested clients. By balancing these parameters, Drongo filters out the subnets that offer the least benefit: subnets where valleys seldom occur and subnets where valleys tend to be shallow.

Figure 8 plots the average latency ratio in the same manner as Figure 7, yet only for queries where Drongo applied subnet assimilation. First, the figure again confirms that lower valley frequency parameters degrade Drongo's performance. Second, paired with our knowledge of Figure 7, it becomes clear that, as the valley threshold decreases, the number of latency valleys shrinks while the quality of the remaining valleys becomes more potent. This is why the latency ratio decreases as vt decreases. However, if the threshold is too


Figure 9: Percentage of clients where subnet assimilation was performed (x axis: valley threshold; y axis: fraction of clients affected; one curve per valley frequency parameter: vf >= 0.2, 0.4, 0.6, 0.8, 1.0).

Figure 10: Per-provider system performance for all queries (x axis: valley threshold; y axis: latency ratio). The optimal vf is set for each provider and noted in parentheses: Google (0.8), CloudFront (0.8), Alibaba (0.4), CDNetworks (1.0), ChinaNetCtr (0.6), CubeCDN (1.0).

small, the number of available valleys becomes so small that the performance becomes unpredictable, i.e., outlier behavior can dominate, causing the spike in average latency ratios for valley thresholds under 0.2.

Finally, Figure 9 plots the percentage of clients for which Drongo performed subnet assimilation for at least one provider. Necessarily, the less strict the frequency constraint is (vf >= 0.2 is the least strict constraint), the more frequently Drongo acts. As shown in Figure 9, 69.93% of our clients were affected by Drongo with vf and vt set at the peak aggregate performance values (1.0 and 0.95, respectively) found above.

Summary. In this section, we selected the parameters vf = 1.0 and vt = 0.95, where Drongo reaches its peak aggregate performance gains of 5.18%.

5.2 Per-Provider Performance

In the previous subsection, we used a constant parameter set across all six providers in our analysis. In this section, we analyze Drongo on a per-provider basis. By choosing an optimal parameter for each provider, as we do below, Drongo's aggregate performance gains increase to 5.85%.

Figure 11: Per-provider system performance for queries where subnet assimilation was applied (y axis: latency ratio). Parentheses contain the optimal values for each respective provider, formatted (vf, vt): Google (0.8, 0.7), CloudFront (0.8, 0.8), Alibaba (0.4, 0.9), CDNetworks (1.0, 0.9), ChinaNetCtr (0.6, 0.85), CubeCDN (1.0, 0.95).

In Figure 10, we show how Drongo impacts the average performance of each provider while setting the valley frequency to each individual provider's respective optimal value (noted in parentheses under each provider name). Note that for most providers the individual optimal valley frequency lies near the value obtained in Section 5.1 (1.0). While the intricacies of each provider's behavior clearly hinge on opaque characteristics of their respective networks, we offer several explanations for the observed performance. CDNetworks, which sees small gains across all valley threshold values, further validates the hypothesis proposed in Section 3.2.3 regarding anycast networks. Meanwhile, Google, the largest provider of our set on a global scale, also experienced the second highest peak average gains from Drongo: if a client is not well served by Google, it is likely that there exist nearby subnets being directed to different replicas. In other words, Drongo has great opportunity for improving CDNs that use fine-grained subnet mapping.

Finally, Figure 11 shows Drongo's per-provider performance exclusively in scenarios where it applies subnet assimilation. Comparing to the results shown in Figure 6 for the PlanetLab experiments, we observe significant differences. Most notably, the latency valley ratios are much smaller in Figure 11 than in Figure 6. For example, Google's median latency ratio is close to 1.0 in Figure 6, indicating insignificant gains. On the contrary, in Figure 11 Google's median latency ratio is around 0.5, implying gains of 50% (1.0 - 0.5) and up to an order of magnitude in edge cases. Considering all providers, Drongo-influenced replica selections are on average 24.89% better performing than Drongo-free selections.

There are three reasons for this behavior. First, Figure 6 shows the lower-bound system performance for PlanetLab, as the best CR is always used for comparison. Such a restriction is not imposed in the RIPE Atlas scenario. Second, the RIPE set is more diverse and includes widely distributed endpoints, thus allowing CDNs greater opportunity for subnet mapping error. Third, contrary to the PlanetLab scenario,


where we used all the valleys under the valley threshold of 1.0, here we use optimal vt values, such that Drongo successfully filters out most "shallow" valleys that would dilute performance gains.

6 RELATED WORK

Given a large number of CDNs with vastly different deployment and performance in different territories, CDN brokering has emerged as a way to improve user-perceived CDNs' performance, e.g., [7, 34]. By monitoring the performance of multiple CDNs, brokers are capable of determining which particular CDN works better for a particular client at a point in time. Necessarily, this approach requires content providers to contract with multiple CDNs to host and distribute their content. Contrary to CDN brokering, Drongo manages to improve the performance of each individual CDN. Still, it works completely independently from CDNs, and it is readily deployable on the clients. Moreover, Drongo is completely compatible with CDN brokering, since it in principle has the capacity to improve the performance of whichever CDN is selected by a broker.

There has been tremendous effort expended in the ongoing battle to make web pages load faster, improve streaming performance, etc. As a result, many advanced client-based protocols and systems have emerged from both industry (e.g., QUIC, SPDY, and HTTP/2) and academia [24, 41, 50] in the Web domain, and likewise in streaming, e.g., [37]. While Drongo is also a client-based system, it is generic and helps speed up all CDN-hosted content. By reducing the CDN latency experienced by end users, it systematically improves all the above systems and protocols, which in turn helps all associated applications.

Recent work has demonstrated that injecting fake information can help protocols achieve better performance. One example is routing [49], where fake nodes and links are introduced into an underlying link-state routing protocol so that routers compute their own forwarding tables based on the augmented topology. Similar ideas are shown useful in the context of traffic engineering [29], where robust and efficient network utilization is accomplished via carefully disseminated bogus information ("lies"). While similar in spirit to these approaches, Drongo operates in a completely different domain, aiming to address the CDN pitfalls. In addition to using "fake" (source subnet) information in order to improve CDN decisions, Drongo also invests efforts in discovering valley-prone subnets and determining the conditions under which using them is beneficial.

7 DISCUSSION

A mass deployment of Drongo is non-trivial and carries with it some complexities that deserve careful consideration. There is some concern that the maintenance of CDN allocation policies may still be compromised, despite our efforts to respect their mechanisms in Drongo's design. We propose that a broadly deployed form of Drongo could be carefully tuned to affect only the most extreme cases. Figure 11 shows that subnet assimilation carries some risk of a loss in performance, so it is in clients' best interests to apply it conservatively. First, we know from Figure 9 that the number of clients using assimilated subnets can be significantly throttled with strict parameters. Further, in today's Internet, many clients are already free to choose arbitrary LDNS servers ([25, 36]). Many of these servers do not make use of the client subnet option at all, thus rendering the nature of their side effects (potentially disrupting policies enforced only in DNS) equivalent to that of Drongo. It is difficult to say whether or not Drongo's ultimate impact on CDN policies would be any more significant than the presence of such clients.

Mass adoption also raises scalability concerns, particularly regarding the amount of measurement traffic being sent into the network. To keep the number of measurements small while ensuring their freshness, a distributed peer-to-peer component, where clients in the same subnet share trial data, could be incorporated into Drongo's design. We leave this modification for future work.

8 CONCLUSIONS

In this paper, we proposed the first approach that enables clients to actively measure CDNs and effectively improve their replica selection decisions, without requiring any changes to the CDNs and while respecting CDNs' policies. We have introduced and explored latency valleys, scenarios where replicas suggested to upstream subnets outperform those provided to a client's own subnet. We have asserted that latency valleys are common phenomena, and we have found them across all the CDNs we have tested, in 26%-76% of routes. We showed that valley-prone upstream subnets are easily found from the client, are simple to identify with few and infrequent measurements, and, once found, are persistent over timescales of days.

We have introduced Drongo, a client-side system that leverages the performance gains of latency valleys by identifying valley-prone subnets. Our measurements show that Drongo can improve requests' latency by up to an order of magnitude. Moreover, Drongo achieves this with exceptionally low overhead: a mere 5 measurements suffice for timescales of days. Using experimentally derived parameters, Drongo affects the replica selection of requests made by 69.93% of clients, improving their latency by 24.89% in the median case. Drongo's significant impact on these requests translates into an overall improvement in client-perceived aggregate CDN performance.


REFERENCES

[1] Akamai. 2016. http://www.akamai.com
[2] Alexa Top Sites. 2016. https://aws.amazon.com/alexa-top-sites
[3] Alibaba Cloud CDN. 2016. https://intl.aliyun.com/product/cdn
[4] Amazon CloudFront - Content Delivery Network (CDN). 2016. https://aws.amazon.com/cloudfront
[5] Amazon Route 53. 2016. https://aws.amazon.com/route53
[6] AWS Global Infrastructure. 2016. https://aws.amazon.com/about-aws/global-infrastructure
[7] Cedexis. 2016. http://www.cedexis.com
[8] Conviva. 2016. http://www.conviva.com
[9] The Cost of Latency. 2016. http://perspectives.mvdirona.com/2009/10/the-cost-of-latency
[10] GoDaddy Premium DNS. 2016. https://www.godaddy.com/domains/dns-hosting.aspx
[11] Google CDN Platform. 2016. https://cloud.google.com/cdn/docs
[12] Google Cloud Platform Cloud DNS. 2016. https://cloud.google.com/dns
[13] Google Public DNS. 2016. https://developers.google.com/speed/public-dns/docs/using?hl=en
[14] Netdirekt Content Hosting - CDN. 2016. http://www.netdirekt.com.tr/cdn-large.html
[15] Neustar DNS Services. 2016. https://www.neustar.biz/services/dns-services
[16] OpenDNS. 2016. https://www.opendns.com
[17] RIPE Atlas - RIPE Network Coordination Centre. 2016. https://atlas.ripe.net
[18] Shopzilla: faster page load time = 12 percent revenue increase. 2016. http://www.strangeloopnetworks.com/resources/infographics/web-performance-and-ecommerce/shopzilla-faster-pages-12-revenue-increase
[19] Sweden's Greta wants to disrupt the multi-billion CDN market. 2016. https://techcrunch.com/2016/08/30/greta
[20] Verisign Managed DNS. 2016. http://www.verisign.com/en_US/security-services/dns-management/index.xhtml
[21] Verizon Digital Media Services. 2016. https://www.verizondigitalmedia.com
[22] Verizon ROUTE: Fast, Reliable Enterprise-Class Services for Domain Name System (DNS). 2016. https://www.verizondigitalmedia.com/platform/route
[23] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. 2010. Comparing DNS Resolvers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 15-21. DOI: https://doi.org/10.1145/1879141.1879144
[24] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. Klotski: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 439-453. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/butkiewicz
[25] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. 2013. Mapping the Expansion of Google's Serving Infrastructure. In Proceedings of the 2013 Conference on Internet Measurement Conference (IMC '13). ACM, 313-326. DOI: https://doi.org/10.1145/2504730.2504754
[26] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531-537. DOI: https://doi.org/10.1145/2815675.2815717
[27] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531-537. DOI: https://doi.org/10.1145/2815675.2815717
[28] F. Chen, R. Sitaraman, and M. Torres. 2015. End-User Mapping: Next Generation Request Routing for Content Delivery. In Proceedings of ACM SIGCOMM '15, London, UK.
[29] Marco Chiesa, Gábor Rétvári, and Michael Schapira. 2016. Lying Your Way to Better Traffic Engineering. In Proceedings of the 2016 ACM CoNEXT (CoNEXT '16). ACM.
[30] ChinaNetCenter. 2016. ChinaNetCenter - Network. http://en.chinanetcenter.com/pages/technology/g2-network-map.php
[31] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. 2003. PlanetLab: An Overlay Testbed for Broad-coverage Services. SIGCOMM Comput. Commun. Rev. 33, 3 (July 2003), 3-12. DOI: https://doi.org/10.1145/956993.956995
[32] C. Contavalli, W. van der Gaast, D. Lawrence, and W. Kumari. 2016. Client Subnet in DNS Queries. RFC 7871. RFC Editor.
[33] Tobias Flach, Ethan Katz-Bassett, and Ramesh Govindan. 2012. Quantifying Violations of Destination-based Forwarding on the Internet. In Proceedings of the 2012 ACM Conference on Internet Measurement Conference (IMC '12). ACM, 265-272. DOI: https://doi.org/10.1145/2398776.2398804
[34] Aditya Ganjam, Faisal Siddiqui, Jibin Zhan, Xi Liu, Ion Stoica, Junchen Jiang, Vyas Sekar, and Hui Zhang. 2015. C3: Internet-Scale Control Plane for Video Quality Optimization. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 131-144. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/ganjam
[35] Cheng Huang, Ivan Batanov, and Jin Li. 2012. A Practical Solution to the client-LDNS Mismatch Problem. SIGCOMM Comput. Commun. Rev. 42, 2 (March 2012), 35-41. DOI: https://doi.org/10.1145/2185376.2185381
[36] Cheng Huang, Angela Wang, Jin Li, and Keith W. Ross. 2008. Measuring and Evaluating Large-scale CDNs (Paper Withdrawn at Microsoft's Request). In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 15-29. http://dl.acm.org/citation.cfm?id=1452520.1455517
[37] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. 2014. A Buffer-based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM '14). ACM, 187-198. DOI: https://doi.org/10.1145/2619239.2626296
[38] Rupa Krishnan, Harsha V. Madhyastha, Sushant Jain, Sridhar Srinivasan, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. 2009. Moving Beyond End-to-End Path Information to Optimize CDN Performance. In Internet Measurement Conference (IMC), Chicago, IL, 190-201. http://research.google.com/archive/imc191/imc191.pdf
[39] Zhuoqing Morley Mao, Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang. 2002. A Precise and Efficient Evaluation of the Proximity Between Web Clients and Their Local DNS Servers. In Proceedings of the General Track of the Annual Conference on USENIX Annual Technical Conference (ATEC '02). USENIX Association, Berkeley, CA, USA, 229-242. http://dl.acm.org/citation.cfm?id=647057.759950


[40] Matthew K. Mukerjee, Ilker Nadi Bozkurt, Bruce Maggs, Srinivasan Seshan, and Hui Zhang. 2016. The Impact of Brokers on the Future of Content Delivery. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (HotNets '16). ACM, 127-133. DOI: https://doi.org/10.1145/3005745.3005749
[41] Ravi Netravali, Ameesh Goyal, James Mickens, and Hari Balakrishnan. 2016. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/netravali
[42] OpenDNS. 2016. A Faster Internet: The Global Internet Speedup. http://afasterinternet.com
[43] Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose. 1998. Modeling TCP Throughput: A Simple Model and Its Empirical Validation. In Proceedings of the ACM SIGCOMM '98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '98). ACM, 303-314. DOI: https://doi.org/10.1145/285237.285291
[44] Vern Paxson. 1996. End-to-end Routing Behavior in the Internet. SIGCOMM Comput. Commun. Rev. 26, 4 (Aug. 1996), 25-38. DOI: https://doi.org/10.1145/248157.248160
[45] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, and Anja Feldmann. 2010. Improving Content Delivery Using Provider-aided Distance Information. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 22-34. DOI: https://doi.org/10.1145/1879141.1879145
[46] Florian Streibelt, Jan Böttger, Nikolaos Chatzis, Georgios Smaragdakis, and Anja Feldmann. 2013. Exploring EDNS-client-subnet Adopters in Your Free Time. In Proceedings of IMC '13 (IMC '13).
[47] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. 2006. Drafting Behind Akamai (Travelocity-based Detouring). SIGCOMM Comput. Commun. Rev. 36, 4 (Aug. 2006), 435-446. DOI: https://doi.org/10.1145/1151659.1159962
[48] Ao-Jan Su and Aleksandar Kuzmanovic. 2008. Thinning Akamai. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 29-42. DOI: https://doi.org/10.1145/1452520.1452525
[49] Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. 2015. Central Control Over Distributed Routing. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM '15). ACM, 43-56. DOI: https://doi.org/10.1145/2785956.2787497
[50] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. 2016. Speeding up Web Page Loads with Shandian. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA, 109-122. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/wang

  • Abstract
  • 1 Introduction
  • 2 Premise
    • 21 Background
    • 22 Respecting CDN Policies and Incentives for Adoption
    • 23 Subnet Assimilation
    • 24 Identifying Latency Valleys
      • 3 Exploring Valleys
        • 31 Testing for Valleys
        • 32 Valley Prevalence
          • 4 Drongo System Overview
            • 41 Window Size
            • 42 Data Collection
            • 43 Decision Making
              • 5 Drongo Evaluation
                • 51 Parameter Selection
                • 52 Per-Provider Performance
                  • 6 Related Work
                  • 7 Discussion
                  • 8 Conclusions
                  • References
Page 7: Drongo: Speeding Up CDNs with Subnet Assimilation from the Clientnetworks.cs.northwestern.edu/website/publications/drongo/... · 2018-12-22 · CDNs attempt to improve web and streaming

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

Table 1 Detailed findings for each provider based on PlanetLab data

Provider ValleyOverall

Avg Valleysper Route

Routeswith Valley

Hop-Client Pairsw Valley Freq gt05

Google 2024 1641 5330 1098Amazon CloudFront 1402 872 2582 1000Alibaba 3368 3594 7583 3097CDNetworks 1561 2441 7308 1409ChinaNetCenter 2742 1426 3810 1674CubeCDN 3858 1795 2549 2632

0 0 0 2 0 4 0 6 0 8 1 0valley frequency

0 4

0 5

0 6

0 7

0 8

0 9

1 0

CD

F of

hop-c

lient

pair

s

Google

CloudFront

Alibaba

CDNetworks

ChinaNetCtr

CubeCDN

(a) ping

0 0 0 2 0 4 0 6 0 8 1 0valley frequency

0 4

0 5

0 6

0 7

0 8

0 9

1 0

CD

F of

hop-c

lient

pair

sGoogle

CloudFront

Alibaba

CDNetworks

ChinaNetCtr

CubeCDN

(b) total download time

0 0 0 2 0 4 0 6 0 8 1 0valley frequency

0 4

0 5

0 6

0 7

0 8

0 9

1 0

CD

F of

hop-c

lient

pair

s

Google

CloudFront

Alibaba

CDNetworks

ChinaNetCtr

CubeCDN

(c) total post-caching download time

Figure 4 CDFs of clients by valley frequency In 4a the subnet-response measurements are ping times averagedfrom bursts of 3 back to back pings In 4b the subnet-response measurements are total download times on firstattempts while in 4c the subnet-response measurements are total download times on consecutive attempts (re-peated downloads that take place immediately after first attempt to account for the potential impact of caching)Downloads were performed by using curl where we set an IP as a destination and set the domain as the HOSTattribute Measurements for 4b and 4c were performed back-to-back so that 4c reflects download times with pre-sumably primed caches

A frequency of 05 would imply that in half the trials per-formed a valley was found in said hop For each providerwe plot the valley frequency for each hop-client pair acrossour PlanetLab dataset

Figure 4 shows the results Note that Figures 4b and 4c usethe total download time and the post-caching total downloadtime respectively for the subnet measurements as opposedto pings2 We further note that their results closely follow thoseobtained using pings as in Figure 4a For simplicity and dueto download limitations of our RIPE testbed we revert toonly using pings for the remainder of this paperFigure 4 shows that approximately 5-20 of hop-client

pairs resulted in valleys 100 of the time (see y-axis in thefigures for x=10) for every trial across the entire three daytest period Such hops will be easy to find given a largeenough dataset However of more interest to us are hop-client pairs that inconsistently incur valleys For exampleperhaps if valleys can be found 50 of the time for some hop-client pair mdash ie a valley frequency of 05 mdash that hop may bea sufficiently persistent producer of valleys for us to reliablyexpect good replica choices This would provide our system2We fetched png and js files 1kB ndash 1MB large hosted at the CDNs

with more opportunities to employ subnet assimilation andideally improve performance Column 5 ( Hop-Client Pairswith Valley Frequency gt 05) of Table 1 shows the percentageof hop-client pairs that have valley frequencies (calculatedacross all 45 trials) greater than 05 ie hop-client pairs thatincurred valleys in the majority of their trials

322 Can We Predict Valleys While valley frequencycan tell us how often valleys occurred on a past dataset themetric makes no strong implications about what we canexpect to see for a future dataset To answer this let us con-sider how the latency ratio changes as we increase the timepassed between the trials we compare In addition per ourreasoning in the vf metric we may want to compare a set ofconsecutive trials mdash a window mdash to another consecutive setof the same size rather than only comparing individual trials(which is essentially comparing windows of size 1) Given aset of trials we take a window of size N and compare it toall other windows of size N including those that overlap bytaking the difference in their latency ratios (HRMCRM) Ifwe plot this against the time distance between the windowswe can observe the trend in latency ratios as the distance in

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

10 20 30 40 50distance (hours)

0 0

0 5

1 0

1 5

2 0

2 5

3 0

late

ncy

rati

o d

iffe

rence

Wind 1 Wind 5 Wind 10 Wind 15

(a) All client-hop pairs

10 20 30 40 50distance (hours)

0 0

0 5

1 0

1 5

2 0

2 5

3 0

late

ncy

rati

o d

iffe

rence

Wind 1 Wind 5 Wind 10 Wind 15

(b) Valley at least once

Figure 5 In both figures we compare the change in latency ratio between two trial windows to the distance intime between the two windows In figure a we use all hop client pairs In figure b we restrict our set to pairs thatexperience at least one valley in at least one of their 45 trials

time between windows increases An upward slope wouldimply that the latency ratio is drifting farther apart as thetime gap increases while a horizontal line would imply thatthe change in latency ratio is not varying with time (ie win-dows an hour apart are as similar to each other as windowsdays apart) A jagged or wildly varying line would indicatethat future behavior is unpredictable

Figure 5 plots the latency ratio versus distance in time forseveral window sizes ranging from 1 to 15 In these plots weuse the windowsrsquo median latency ratio values for compari-son In particular in Figure 5a we use our entire dataset ieconsider all hop-client pairs independently of if they ever in-cur latency valleys or not and plot latency ratio differencesover time For example assume that for a client-hop pairHRM = 80 ms at one measurement point while it rises to120 ms at another measurement point Meanwhile assumethatCRM remains 100 ms at both measurement points Thusthe latency ratioHRMCRM changes from 80100 to 120100ie from 08 to 12 The latency ratio difference is 04 Fig-ure 5a shows that the latency ratio difference values bothincrease and vary wildly as windows become more distantThis shows that hop-subnet performance overall is likelyextremely unpredictable We hypothesize this results fromunmapped subnets receiving generic answers as observedin [47] However subnet assimilation requires that we havesome idea of how a subnet will perform in advance

Since valley-prone subnets are of particular interest to usin Figure 5b we reduce our dataset to only include client-hoppairs that have at least one valley occurrence across all 45 ofits trials combined The plotrsquos behavior becomes dramaticallymore stable after applying this simple minor constraint Theeffects of the window size also become apparent in Figure5b Even with a window of size 1 the slope is very small andthe line is nearly flat after over 50 hours two windows are

only 10 to 20 more dissimilar than two windows a singlehour apart For window sizes greater than 5 latency ratiosare often within 5 of each other regardless of their distancein time Going from a window size of 1 to 5 shows drasticimprovement while each following increase in window sizeshows a smaller impact The results clearly show that a clientcan effectively identify valley-prone subnets and predictvalleys with a sufficiently long window size

323 Valley Utility Analysis Figure 6 shows the distribu-tion of the lower bound latency ratios seen by the set of allvalley occurrences for each provider The latency ratio rep-resents a lower bound because we use the minimum latencymeasure for replicas recommended to the client For exampleif the minimum CRM is 100 ms and HRM is 80 ms then thelatency ratio is 08 Thus the closer to 0 the latency ratiosare the larger the gain from subnet assimilation The red lineshows median the box bounds the 25th and 75th percentilesand the whiskers extend up to data points 15 times theinterquartile-range (75th percentile - 25th percentile) abovethe 75th percentile and below the 25th percentile Data pointsbeyond the whiskers if they exist are considered outliersand not shownFigure 6 shows that most of the providers show oppor-

tunities for significant latency reduction From this plot itappears that Amazon CloudFront and ChinaNetCenter offerthe greatest potential for gains With the exception of CD-Networks we see all of our providers have 25th percentilesnear or below a latency ratio of 08 a 20 performance gainWe also observe the diversity of valley ldquodepth For exam-ple we see in Figure 6 that ChinaNetCenterrsquos interquartilerange spans over 40 of the possible value space (between 0and 1) With such a wide variety of gains for 50 of its val-leys it opens the door to being more selective Rather thansimply chasing all valleys we could set strict requirements

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

Google

CloudFront

Alibaba

CDNetworks

ChinaNetCtr

CubeCDN

0 2

0 4

0 6

0 8

1 0

late

ncy

rati

o

Figure 6 Lower bound of latency ratio of all valley oc-currences

tightening our definition of whatrsquos ldquogood enough for subnetassimilation as we evaluate in Section 51

CDNetworksrsquo performance is more tightly bounded as itsinterquartile range covers less than 10 of the value spaceand sits very near to 1 This implies CDNetworks probablyoffers very little opportunity for our technique to improve itsreplica selection We hypothesize that this is a byproduct ofCDNetworks anycast implementation for replica selectionanycast depends more on routing and network propertiesthan DNS and IP selection [27] In addition Googlersquos medianand 75th percentile latency ratios sit very near 10 Howeverby being more selective as described above we may be ableto pursue better valleys in the lower quartiles below themedian We could potentially improve our systemrsquos valley-prone hop selection by filtering out ldquoshallow valleys fromour consideration We demonstrate and discuss the effectsof different levels of selectiveness in Section 51

4 DRONGO SYSTEM OVERVIEWWe introduce Drongo a client-side system that employs sub-net assimilation to speed up CDNs Drongo sits on top of aclientrsquos DNS system In a full deployment Drongo will beinstalled as a LDNS proxy on the client machine where itwould have easy access to the ECS option and DNS responsesit needs to perform trials Drongo is set by the client as itsdefault local DNS resolver and acts as a middle party re-shaping outgoing DNS messages via subnet assimilation andstoring data from incoming DNS responses In our currentimplementation Drongo uses Googlersquos public DNS serviceat 8888 to complete DNS queriesUpon reception of an outgoing query from the client

Drongo must decide whether to use the clientrsquos own subnetor to perform subnet assimilation with some known alterna-tive subnet If Drongo has sufficient data for that subnet incombination with that domain it makes a decision whetheror not to use that subnet for the name resolution If Drongolacks sufficient data it issues an ordinary query (using theclientrsquos own subnet for ECS)

41 Window SizeWe now face the question What is a sufficient amount ofdata needed by Drongo to ensure quality subnet assimila-tion decisions We choose to measure the data ldquoquantityby the number of trials for some subnet where the subnetis obtained via traceroutes performed during times whenthe clientrsquos network was idle A sufficient quantity must beenough trials to fill some predetermined window size As weobserved in Section 321 the marginal benefit of increasingthe window size decreases with each additional trial To keepstorage needs low while also obtaining most of the benefitof a larger window we set Drongorsquos window size at 5

42 Data CollectionDrongo must execute trials defined in Section 312 in orderto fill its window and collect sufficient data As demonstratedin Figure 5b when these trials occur is of little to no signifi-cance In our experiments we perform our trials at randomlysampled intervals our trial spacing varies from minutes todays with a tendency toward being near an hour apart Thissporadic spacing parallels the variety of timings we expect tohappen on a real client the client may be online at randomunpredictable times for unpredictable lengths of time

43 Decision MakingHere we detail Drongorsquos logic assuming it has sufficient data(a full window) with which to make a decision about a partic-ular subnet For some domain Drongo must decide whethera subnet is sufficiently valley-prone for subnet assimilationIf so Drongo will use that subnet for the DNS query if notDrongo will use the clientrsquos own subnet From Figure 5b weknow we can anticipate that future behavior will resemblewhat Drongo sees in its current window if Drongo has seenat least one valley occurrence for the domain of interest fromthe subnet under consideration However in Figure 6 wesee that many valleys offer negligible performance gainswhich might not outweigh the performance risk imposedby subnet assimilation To avoid these potentially high riskhops Drongo may benefit from a more selective system thatrequires a high valley frequency in the training window toallow subnet assimilation We explore the effects of changingthe vf parameter in Section 5

It is possible that for a single domain multiple hop subnetsmay qualify as sufficiently valley-prone for subnet assimi-lation When this occurs Drongo must attempt to choosethe best performing from the set of the qualified hops Todo this Drongo selects the hop subnet with the highest val-ley frequency in its training window in the event of a tieDrongo chooses randomly

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

02 04 06 08 10valley threshold

095

100

105

110

115

120

125

late

ncy

ratio

vfge02 vfge04 vfge06 vfge08 vfge10

Figure 7 Average latency ratio of overall system as wevary vf and vt

5 DRONGO EVALUATIONHere we evaluate Drongorsquos ability to intelligently choosewell-mapped hop subnets for subnet assimilation and theperformance gains imposed on clients where subnet assimi-lation is appliedUsing the RIPE Atlas platform [17] we collect a trace of

429 probes spread across 177 countries for 1 month In ourexperiments we use the first portion of a clientrsquos trials as atraining window Following the completion of the trainingwindow we use the remaining trials to test Drongorsquos abilityto select good hops for real time subnet assimilation Eachclient performed 10 trials per provider trials 0 through 4to form its training window and trials 5 through 9 to testthe quality of Drongorsquos decisions In our evaluation Drongoalways selects the first CR from a CR-set and the first HRfrom a HR-set mirroring real world client behavior mdash noreal-time on-the-fly measurements are conducted and alldecisions are based on the existing window

51 Parameter SelectionFigure 7 shows the effects of Drongo on our entire RIPEtracersquos average latency ratio ie we consider all requestsgenerated by clients including those that arenrsquot affected byDrongo The figure plots average latency ratio (shown onthe y axis) as a function of the valley threshold vt shownon the x axis The valley threshold introduced in Section 23determines maximum latency ratio a hop-client pair musthave to classify as a valley occurrence For example whenthe threshold equals 10 all latency valleys are consideredon the other hand when vt is 06 Drongo triggers clientassimilation only for latency valleys that have promise toimprove performance by more than 40 ie latency ratiosmaller than 06 The figure shows 5 curves each represent-ing a different valley frequency parameter varied between02 and 10

Figure 7 shows the average results across all tested clientsWe make several insights If the valley frequency is small

Figure 8: Average latency ratio of cases where subnet assimilation was performed (x-axis: valley threshold; y-axis: latency ratio; one curve per constraint vf ≥ 0.2, 0.4, 0.6, 0.8, 1.0).

e.g., 0.2, Drongo will unselectively trigger subnet assimilation for all valleys that have frequency equal to or larger than 0.2, which requires only one valley occurrence for our window size of 5. Conversely, as the minimum vf parameter increases, overall system performance improves: the latency ratio drops below 1.0. Thus, the valley frequency is a strong indicator of valley-proneness, further supporting our prior findings from Figure 5b.

Meanwhile, two more characteristics stand out as we vary vt. First, the valley frequency parameter can completely alter the slope of the latency ratio plotted against vt. This behavior echoes the observation we made in the previous paragraph: with extremely loose requirements, Drongo does not sufficiently filter out poor-performing subnets from its set of candidates. Interestingly, with a strict vf (closer to 1.0), the slope changes and the average latency ratio decreases as we raise the valley threshold. This is because, if Drongo is too strict, it filters out too many potential candidates for subnet assimilation. Second, we see that the curve eventually bends upward at high vt values, indicating that even the vt can be too lenient. The results show that the minimum average latency ratio of 0.9482 (y-axis) is achieved for a valley threshold of 0.95 (x-axis). Thus, with vf = 1.0 and vt = 0.95, Drongo produces its maximum performance gains, averaging 5.18% (1.0 - 0.9482) across all of our tested clients. By balancing these parameters, Drongo filters out subnets that offer the least benefit: subnets where valleys seldom occur and subnets where valleys tend to be shallow.
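Such a parameter search can be expressed as a simple grid sweep. The code below is an illustrative reconstruction under an assumed trace format, not the authors' tooling; on the paper's trace the minimizing pair is vf = 1.0, vt = 0.95, while the toy data in the example is arbitrary.

```python
from itertools import product
from statistics import mean

def sweep(traces, vf_grid=(0.2, 0.4, 0.6, 0.8, 1.0),
          vt_grid=tuple(x / 100 for x in range(5, 101, 5))):
    """Grid-search vf and vt over a trace.

    `traces` is a list of (train_ratios, test_ratios) pairs, one per candidate
    hop subnet of a client/provider pair (an assumed, simplified format).
    Returns (best_average_latency_ratio, best_vf, best_vt).
    """
    best = (float("inf"), None, None)
    for vf, vt in product(vf_grid, vt_grid):
        ratios = []
        for train, test in traces:
            freq = sum(1 for r in train if r < vt) / len(train)
            # requests untouched by Drongo keep their default replica,
            # so they contribute a latency ratio of 1.0
            ratios.extend(test if freq >= vf else [1.0] * len(test))
        avg = mean(ratios)
        if avg < best[0]:
            best = (avg, vf, vt)
    return best


if __name__ == "__main__":
    traces = [([0.6, 0.7, 0.8, 0.9, 0.5], [0.7, 0.6, 0.8, 0.9, 0.7]),
              ([1.1, 1.0, 1.2, 0.9, 1.3], [1.4, 1.2, 1.1, 1.0, 1.3])]
    print(sweep(traces))
```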

Figure 8 plots the average latency ratio in the same manner as Figure 7, yet only for queries where Drongo applied subnet assimilation. First, the figure again confirms that lower valley frequency parameters degrade Drongo's performance. Second, paired with our knowledge of Figure 7, it becomes clear that as the valley threshold decreases, the number of latency valleys shrinks while the quality of the remaining valleys becomes more potent. This is why the latency ratio decreases as vt decreases. However, if the threshold is too


Figure 9: Percentage of clients where subnet assimilation was performed (x-axis: valley threshold; y-axis: fraction of clients affected; one curve per constraint vf ≥ 0.2, 0.4, 0.6, 0.8, 1.0).

Figure 10: Per-provider system performance for all queries (x-axis: valley threshold; y-axis: latency ratio). Optimal vf is set for each provider and noted in parentheses: Google (0.8), CloudFront (0.8), Alibaba (0.4), CDNetworks (1.0), ChinaNetCtr (0.6), CubeCDN (1.0).

small, the number of available valleys becomes so small that the performance becomes unpredictable, i.e., outlier behavior can dominate, causing the spike in average latency ratios for valley thresholds under 0.2.

Finally, Figure 9 plots the percentage of clients for which Drongo performed subnet assimilation for at least one provider. Necessarily, the less strict the frequency constraint is (vf ≥ 0.2 is the least strict constraint), the more frequently Drongo acts. As shown in Figure 9, 69.93% of our clients were affected by Drongo with vf and vt set at the peak aggregate performance values (1.0 and 0.95, respectively) found above.

Summary: In this section we selected the parameters vf = 1.0 and vt = 0.95, where Drongo reaches its peak aggregate performance gains of 5.18%.

5.2 Per-Provider Performance
In the previous subsection we used a constant parameter set across all six providers in our analysis. In this section we analyze Drongo on a per-provider basis. By choosing an optimal parameter for each provider, as we do below, Drongo's aggregate performance gains increase to 5.85%. In Figure 10

Figure 11: Per-provider system performance (latency ratio) for queries where subnet assimilation was applied. Parentheses contain the optimal (vf, vt) values for each respective provider: Google (0.8, 0.7), CloudFront (0.8, 0.8), Alibaba (0.4, 0.9), CDNetworks (1.0, 0.9), ChinaNetCtr (0.6, 0.85), CubeCDN (1.0, 0.95).

we show how Drongo impacts the average performance of each provider while setting the valley frequency to each individual provider's respective optimal value (noted in parentheses under each provider name). Note that for most providers the individual optimal valley frequency lies near the value obtained in Section 5.1 (1.0). While the intricacies of each provider's behavior clearly hinge on opaque characteristics of their respective networks, we offer several explanations for the observed performance. CDNetworks, which sees small gains across all valley threshold values, further validates the hypothesis proposed in Section 3.2.3 regarding anycast networks. Meanwhile, Google, the largest provider of our set on a global scale, also experienced the second highest peak average gains from Drongo: when a client is not well served by Google, it is likely that there exist nearby subnets being directed to different replicas. In other words, Drongo has great opportunity for improving CDNs that use fine-grained subnet mapping.

Finally, Figure 11 shows Drongo's performance per provider, exclusively in scenarios when it applies subnet assimilation. Comparing to the results shown in Figure 6 for the PlanetLab experiments, we observe significant differences. Most notably, the latency valley ratios are much smaller in Figure 11 than in Figure 6. For example, Google's median latency ratio is close to 1.0 in Figure 6, indicating insignificant gains. On the contrary, in Figure 11 Google's median latency ratio is around 0.5, implying gains of 50% (1.0 - 0.5) and up to an order of magnitude in edge cases. Considering all providers, Drongo-influenced replica selections are, on average, 24.89% better performing than Drongo-free selections. There are three reasons for this behavior. First, Figure 6

shows the lower-bound system performance for PlanetLab, as the best CR is always used for comparison. Such a restriction is not imposed in the RIPE Atlas scenario. Second, the RIPE set is more diverse and includes widely distributed endpoints, thus allowing CDNs greater opportunity for subnet mapping error. Third, contrary to the PlanetLab scenario


where we used all the valleys under the valley threshold of 1.0, here we use optimal vt values such that Drongo successfully filters out most "shallow" valleys that would dilute performance gains.

6 RELATED WORK
Given a large number of CDNs with vastly different deployment and performance in different territories, CDN brokering has emerged as a way to improve user-perceived CDNs' performance, e.g., [7, 34]. By monitoring the performance of multiple CDNs, brokers are capable of determining which particular CDN works better for a particular client at a point in time. Necessarily, this approach requires content providers to contract with multiple CDNs to host and distribute their content. Contrary to CDN brokering, Drongo manages to improve the performance of each individual CDN. Still, it works completely independently from CDNs, and it is readily deployable on the clients. Moreover, Drongo is completely compatible with CDN brokering, since it in principle has the capacity to improve the performance of whichever CDN is selected by a broker.

There has been tremendous effort expended in the ongoing battle to make web pages load faster, improve streaming performance, etc. As a result, many advanced client-based protocols and systems have emerged from both industry (e.g., QUIC, SPDY, and HTTP/2) and academia [24, 41, 50] in the Web domain, and likewise in streaming, e.g., [37]. While Drongo is also a client-based system, it is generic and helps speed up all CDN-hosted content. By reducing the CDN latency experienced by end users, it systematically improves all the above systems and protocols, which in turn helps all associated applications.

Recent work has demonstrated that injecting fake information can help protocols achieve better performance. One example is routing [49], where fake nodes and links are introduced into an underlying link-state routing protocol so that routers compute their own forwarding tables based on the augmented topology. Similar ideas are shown useful in the context of traffic engineering [29], where robust and efficient network utilization is accomplished via carefully disseminated bogus information ("lies"). While similar in spirit to these approaches, Drongo operates in a completely different domain, aiming to address the CDN pitfalls. In addition to using "fake" (source subnet) information in order to improve CDN decisions, Drongo also invests efforts in discovering valley-prone subnets and determining the conditions when using them is beneficial.

7 DISCUSSION
A mass deployment of Drongo is non-trivial and carries with it some complexities that deserve careful consideration.

There is some concern that maintenance of CDN allocation policies may still be compromised despite our efforts to respect their mechanisms in Drongo's design. We propose that a broadly deployed form of Drongo could be carefully tuned to affect only the most extreme cases. Figure 11 shows that subnet assimilation carries some risk of a loss in performance, so it is in clients' best interests to apply it conservatively. First, we know from Figure 9 that the number of clients using assimilated subnets can be significantly throttled with strict parameters. Further, in today's Internet many clients are already free to choose arbitrary LDNS servers [25, 36]. Many of these servers do not make use of the client subnet option at all, thus rendering the nature of their side effects (potentially disrupting policies enforced only in DNS) equivalent to that of Drongo. It is difficult to say whether or not Drongo's ultimate impact on CDN policies would be any more significant than the presence of such clients.

Mass adoption also raises scalability concerns, particularly regarding the amount of measurement traffic being sent into the network. To keep the number of measurements small while ensuring their freshness, a distributed peer-to-peer component, where clients in the same subnet share trial data, could be incorporated into Drongo's design. We leave this modification for future work.

8 CONCLUSIONS
In this paper we proposed the first approach that enables clients to actively measure CDNs and effectively improve their replica selection decisions, without requiring any changes to the CDNs and while respecting CDNs' policies. We have introduced and explored latency valleys, scenarios where replicas suggested to upstream subnets outperform those provided to a client's own subnet. We have asserted that latency valleys are common phenomena, and we have found them across all the CDNs we have tested, in 26%-76% of routes. We showed that valley-prone upstream subnets are easily found from the client, are simple to identify with few and infrequent measurements, and, once found, are persistent over timescales of days.

We have introduced Drongo, a client-side system that leverages the performance gains of latency valleys by identifying valley-prone subnets. Our measurements show that Drongo can improve requests' latency by up to an order of magnitude. Moreover, Drongo achieves this with exceptionally low overhead: a mere 5 measurements suffice for timescales of days. Using experimentally derived parameters, Drongo affects the replica selection of requests made by 69.93% of clients, improving their latency by 24.89% in the median case. Drongo's significant impact on these requests translates into an overall improvement in client-perceived aggregate CDN performance.


REFERENCES
[1] 2016. Akamai. (2016). http://www.akamai.com
[2] 2016. Alexa Top Sites. (2016). https://aws.amazon.com/alexa-top-sites/
[3] 2016. Alibaba Cloud CDN. (2016). https://intl.aliyun.com/product/cdn
[4] 2016. Amazon CloudFront - Content Delivery Network (CDN). (2016). https://aws.amazon.com/cloudfront/
[5] 2016. Amazon Route 53. (2016). https://aws.amazon.com/route53/
[6] 2016. AWS Global Infrastructure. (2016). https://aws.amazon.com/about-aws/global-infrastructure/
[7] 2016. Cedexis. (2016). http://www.cedexis.com
[8] 2016. Conviva. (2016). http://www.conviva.com
[9] 2016. The Cost of Latency. (2016). http://perspectives.mvdirona.com/2009/10/the-cost-of-latency
[10] 2016. GoDaddy Premium DNS. (2016). https://www.godaddy.com/domains/dns-hosting.aspx
[11] 2016. Google CDN Platform. (2016). https://cloud.google.com/cdn/docs
[12] 2016. Google Cloud Platform: Cloud DNS. (2016). https://cloud.google.com/dns
[13] 2016. Google Public DNS. (2016). https://developers.google.com/speed/public-dns/docs/using?hl=en
[14] 2016. Netdirekt Content Hosting - CDN. (2016). http://www.netdirekt.com.tr/cdn-large.html
[15] 2016. Neustar DNS Services. (2016). https://www.neustar.biz/services/dns-services
[16] 2016. OpenDNS. (2016). https://www.opendns.com
[17] 2016. RIPE Atlas - RIPE Network Coordination Centre. (2016). https://atlas.ripe.net
[18] 2016. Shopzilla: faster page load time = 12 percent revenue increase. (2016). http://www.strangeloopnetworks.com/resources/infographics/web-performance-and-ecommerce/shopzilla-faster-pages-12-revenue-increase
[19] 2016. Sweden's Greta wants to disrupt the multi-billion CDN market. (2016). https://techcrunch.com/2016/08/30/greta
[20] 2016. Verisign Managed DNS. (2016). http://www.verisign.com/en_US/security-services/dns-management/index.xhtml
[21] 2016. Verizon Digital Media Services. (2016). https://www.verizondigitalmedia.com
[22] 2016. Verizon ROUTE: Fast, Reliable Enterprise-Class Services for Domain Name System (DNS). (2016). https://www.verizondigitalmedia.com/platform/route
[23] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. 2010. Comparing DNS Resolvers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 15–21. DOI: https://doi.org/10.1145/1879141.1879144
[24] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. Klotski: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 439–453. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/butkiewicz
[25] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. 2013. Mapping the Expansion of Google's Serving Infrastructure. In Proceedings of the 2013 Conference on Internet Measurement Conference (IMC '13). ACM, 313–326. DOI: https://doi.org/10.1145/2504730.2504754
[26] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531–537. DOI: https://doi.org/10.1145/2815675.2815717
[27] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531–537. DOI: https://doi.org/10.1145/2815675.2815717
[28] F. Chen, R. Sitaraman, and M. Torres. 2015. End-User Mapping: Next Generation Request Routing for Content Delivery. In Proceedings of ACM SIGCOMM '15. London, UK.
[29] Marco Chiesa, Gábor Rétvári, and Michael Schapira. 2016. Lying Your Way to Better Traffic Engineering. In Proceedings of the 2016 ACM CoNEXT (CoNEXT '16). ACM.
[30] ChinaNetCenter. 2016. ChinaNetCenter - Network. (2016). http://en.chinanetcenter.com/pages/technology/g2-network-map.php
[31] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. 2003. PlanetLab: An Overlay Testbed for Broad-coverage Services. SIGCOMM Comput. Commun. Rev. 33, 3 (July 2003), 3–12. DOI: https://doi.org/10.1145/956993.956995
[32] C. Contavalli, W. van der Gaast, D. Lawrence, and W. Kumari. 2016. Client Subnet in DNS Queries. RFC 7871. RFC Editor.
[33] Tobias Flach, Ethan Katz-Bassett, and Ramesh Govindan. 2012. Quantifying Violations of Destination-based Forwarding on the Internet. In Proceedings of the 2012 ACM Conference on Internet Measurement Conference (IMC '12). ACM, 265–272. DOI: https://doi.org/10.1145/2398776.2398804
[34] Aditya Ganjam, Faisal Siddiqui, Jibin Zhan, Xi Liu, Ion Stoica, Junchen Jiang, Vyas Sekar, and Hui Zhang. 2015. C3: Internet-Scale Control Plane for Video Quality Optimization. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 131–144. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/ganjam
[35] Cheng Huang, Ivan Batanov, and Jin Li. 2012. A Practical Solution to the client-LDNS Mismatch Problem. SIGCOMM Comput. Commun. Rev. 42, 2 (March 2012), 35–41. DOI: https://doi.org/10.1145/2185376.2185381
[36] Cheng Huang, Angela Wang, Jin Li, and Keith W. Ross. 2008. Measuring and Evaluating Large-scale CDNs (Paper Withdrawn at Microsoft's Request). In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 15–29. http://dl.acm.org/citation.cfm?id=1452520.1455517
[37] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. 2014. A Buffer-based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM '14). ACM, 187–198. DOI: https://doi.org/10.1145/2619239.2626296
[38] Rupa Krishnan, Harsha V. Madhyastha, Sushant Jain, Sridhar Srinivasan, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. 2009. Moving Beyond End-to-End Path Information to Optimize CDN Performance. In Internet Measurement Conference (IMC). Chicago, IL, 190–201. http://research.google.com/archive/imc191/imc191.pdf
[39] Zhuoqing Morley Mao, Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang. 2002. A Precise and Efficient Evaluation of the Proximity Between Web Clients and Their Local DNS Servers. In Proceedings of the General Track of the Annual Conference on USENIX Annual Technical Conference (ATEC '02). USENIX Association, Berkeley, CA, USA, 229–242. http://dl.acm.org/citation.cfm?id=647057.759950


[40] Matthew K. Mukerjee, Ilker Nadi Bozkurt, Bruce Maggs, Srinivasan Seshan, and Hui Zhang. 2016. The Impact of Brokers on the Future of Content Delivery. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (HotNets '16). ACM, 127–133. DOI: https://doi.org/10.1145/3005745.3005749
[41] Ravi Netravali, Ameesh Goyal, James Mickens, and Hari Balakrishnan. 2016. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/netravali
[42] OpenDNS. 2016. A Faster Internet: The Global Internet Speedup. (2016). http://afasterinternet.com
[43] Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose. 1998. Modeling TCP Throughput: A Simple Model and Its Empirical Validation. In Proceedings of the ACM SIGCOMM '98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '98). ACM, 303–314. DOI: https://doi.org/10.1145/285237.285291
[44] Vern Paxson. 1996. End-to-end Routing Behavior in the Internet. SIGCOMM Comput. Commun. Rev. 26, 4 (Aug. 1996), 25–38. DOI: https://doi.org/10.1145/248157.248160
[45] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, and Anja Feldmann. 2010. Improving Content Delivery Using Provider-aided Distance Information. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 22–34. DOI: https://doi.org/10.1145/1879141.1879145
[46] Florian Streibelt, Jan Böttger, Nikolaos Chatzis, Georgios Smaragdakis, and Anja Feldmann. 2013. Exploring EDNS-client-subnet Adopters in Your Free Time. In Proceedings of IMC '13 (IMC '13).
[47] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. 2006. Drafting Behind Akamai (Travelocity-based Detouring). SIGCOMM Comput. Commun. Rev. 36, 4 (Aug. 2006), 435–446. DOI: https://doi.org/10.1145/1151659.1159962
[48] Ao-Jan Su and Aleksandar Kuzmanovic. 2008. Thinning Akamai. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 29–42. DOI: https://doi.org/10.1145/1452520.1452525
[49] Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. 2015. Central Control Over Distributed Routing. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM '15). ACM, 43–56. DOI: https://doi.org/10.1145/2785956.2787497
[50] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. 2016. Speeding up Web Page Loads with Shandian. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA, 109–122. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/wang

Page 9: Drongo: Speeding Up CDNs with Subnet Assimilation from the Clientnetworks.cs.northwestern.edu/website/publications/drongo/... · 2018-12-22 · CDNs attempt to improve web and streaming

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

Google

CloudFront

Alibaba

CDNetworks

ChinaNetCtr

CubeCDN

0 2

0 4

0 6

0 8

1 0

late

ncy

rati

o

Figure 6 Lower bound of latency ratio of all valley oc-currences

tightening our definition of whatrsquos ldquogood enough for subnetassimilation as we evaluate in Section 51

CDNetworksrsquo performance is more tightly bounded as itsinterquartile range covers less than 10 of the value spaceand sits very near to 1 This implies CDNetworks probablyoffers very little opportunity for our technique to improve itsreplica selection We hypothesize that this is a byproduct ofCDNetworks anycast implementation for replica selectionanycast depends more on routing and network propertiesthan DNS and IP selection [27] In addition Googlersquos medianand 75th percentile latency ratios sit very near 10 Howeverby being more selective as described above we may be ableto pursue better valleys in the lower quartiles below themedian We could potentially improve our systemrsquos valley-prone hop selection by filtering out ldquoshallow valleys fromour consideration We demonstrate and discuss the effectsof different levels of selectiveness in Section 51

4 DRONGO SYSTEM OVERVIEWWe introduce Drongo a client-side system that employs sub-net assimilation to speed up CDNs Drongo sits on top of aclientrsquos DNS system In a full deployment Drongo will beinstalled as a LDNS proxy on the client machine where itwould have easy access to the ECS option and DNS responsesit needs to perform trials Drongo is set by the client as itsdefault local DNS resolver and acts as a middle party re-shaping outgoing DNS messages via subnet assimilation andstoring data from incoming DNS responses In our currentimplementation Drongo uses Googlersquos public DNS serviceat 8888 to complete DNS queriesUpon reception of an outgoing query from the client

Drongo must decide whether to use the clientrsquos own subnetor to perform subnet assimilation with some known alterna-tive subnet If Drongo has sufficient data for that subnet incombination with that domain it makes a decision whetheror not to use that subnet for the name resolution If Drongolacks sufficient data it issues an ordinary query (using theclientrsquos own subnet for ECS)

4.1 Window Size
We now face the question: what is a sufficient amount of data for Drongo to make quality subnet assimilation decisions? We measure the data "quantity" by the number of trials for some subnet, where the subnet is obtained via traceroutes performed during times when the client's network was idle. A sufficient quantity must be enough trials to fill some predetermined window size. As we observed in Section 3.2.1, the marginal benefit of increasing the window size decreases with each additional trial. To keep storage needs low while also obtaining most of the benefit of a larger window, we set Drongo's window size at 5.
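As a rough illustration of the bookkeeping this implies, the sketch below keeps a bounded window of the five most recent trials per (domain, subnet) pair; the class name and the reduction of each trial to a single latency ratio are hypothetical simplifications, not details from the paper.

```python
# Minimal sketch of per-(domain, subnet) trial windows with the paper's window size of 5.
from collections import defaultdict, deque
from typing import List

WINDOW_SIZE = 5

class TrialStore:
    def __init__(self) -> None:
        # Keep only the most recent WINDOW_SIZE latency ratios per key.
        self._windows = defaultdict(lambda: deque(maxlen=WINDOW_SIZE))

    def add_trial(self, domain: str, subnet: str, latency_ratio: float) -> None:
        self._windows[(domain, subnet)].append(latency_ratio)

    def window(self, domain: str, subnet: str) -> List[float]:
        return list(self._windows[(domain, subnet)])

    def is_full(self, domain: str, subnet: str) -> bool:
        return len(self._windows[(domain, subnet)]) == WINDOW_SIZE
```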

4.2 Data Collection
Drongo must execute trials, defined in Section 3.1.2, in order to fill its window and collect sufficient data. As demonstrated in Figure 5b, when these trials occur is of little to no significance. In our experiments, we perform trials at randomly sampled intervals; our trial spacing varies from minutes to days, with a tendency toward being near an hour apart. This sporadic spacing parallels the variety of timings we expect on a real client: the client may be online at random, unpredictable times and for unpredictable lengths of time.
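Purely as an illustration of such sporadic spacing, one could draw each inter-trial gap from a distribution with a roughly one-hour mean, as in the loop below. This scheduling choice is our own assumption, not part of Drongo's described design; run_trial() is a placeholder for the trial logic.

```python
# Illustrative trial scheduler: most gaps land near an hour, occasional gaps stretch longer.
import random
import time

MEAN_GAP_SECONDS = 3600  # tendency toward roughly an hour between trials

def trial_loop(run_trial, mean_gap: float = MEAN_GAP_SECONDS) -> None:
    while True:
        time.sleep(random.expovariate(1.0 / mean_gap))  # randomly sampled interval
        run_trial()
```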

4.3 Decision Making
Here we detail Drongo's logic, assuming it has sufficient data (a full window) with which to make a decision about a particular subnet. For some domain, Drongo must decide whether a subnet is sufficiently valley-prone for subnet assimilation. If so, Drongo will use that subnet for the DNS query; if not, Drongo will use the client's own subnet. From Figure 5b, we can anticipate that future behavior will resemble what Drongo sees in its current window if Drongo has seen at least one valley occurrence for the domain of interest from the subnet under consideration. However, in Figure 6 we see that many valleys offer negligible performance gains, which might not outweigh the performance risk imposed by subnet assimilation. To avoid these potentially high-risk hops, Drongo may benefit from a more selective system that requires a high valley frequency in the training window before allowing subnet assimilation. We explore the effects of changing the vf parameter in Section 5.

It is possible that, for a single domain, multiple hop subnets qualify as sufficiently valley-prone for subnet assimilation. When this occurs, Drongo must attempt to choose the best performing from the set of qualified hops. To do this, Drongo selects the hop subnet with the highest valley frequency in its training window; in the event of a tie, Drongo chooses randomly.
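The decision rule above reduces to a small amount of logic. The sketch below is our reading of it: the window format (a list of latency ratios per hop subnet) is assumed, and the default vf and vt values are the ones selected in Section 5.1.

```python
# Sketch of Drongo's per-domain decision: a hop subnet qualifies if its valley
# frequency in the training window meets vf, where a valley occurrence is a
# trial whose latency ratio falls at or below vt.
import random
from typing import Dict, List, Optional

def valley_frequency(window: List[float], vt: float) -> float:
    """Fraction of trials in the window that are valley occurrences."""
    if not window:
        return 0.0
    return sum(1 for ratio in window if ratio <= vt) / len(window)

def choose_assimilation_subnet(windows: Dict[str, List[float]],
                               vf: float = 1.0, vt: float = 0.95) -> Optional[str]:
    """Return the hop subnet to assimilate, or None to keep the client's own subnet."""
    freqs = {subnet: valley_frequency(w, vt) for subnet, w in windows.items()}
    qualified = {s: f for s, f in freqs.items() if f >= vf}
    if not qualified:
        return None
    best = max(qualified.values())
    # Tie-break randomly among subnets sharing the highest valley frequency.
    return random.choice([s for s, f in qualified.items() if f == best])
```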


[Figure 7: Average latency ratio of the overall system as we vary vf and vt. x-axis: valley threshold (0.2–1.0); y-axis: latency ratio (0.95–1.25); one curve per constraint vf ≥ 0.2, 0.4, 0.6, 0.8, 1.0.]

5 DRONGO EVALUATION
Here we evaluate Drongo's ability to intelligently choose well-mapped hop subnets for subnet assimilation, and the performance gains delivered to clients where subnet assimilation is applied.

Using the RIPE Atlas platform [17], we collect a trace of 429 probes spread across 177 countries for 1 month. In our experiments, we use the first portion of a client's trials as a training window. Following the completion of the training window, we use the remaining trials to test Drongo's ability to select good hops for real-time subnet assimilation. Each client performed 10 trials per provider: trials 0 through 4 form its training window, and trials 5 through 9 test the quality of Drongo's decisions. In our evaluation, Drongo always selects the first CR from a CR-set and the first HR from an HR-set, mirroring real-world client behavior: no real-time, on-the-fly measurements are conducted, and all decisions are based on the existing window.
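The sketch below illustrates this train/test split under simplifying assumptions of our own: each hop subnet contributes ten latency ratios in time order, a ratio of 1.0 stands in for the unmodified (client-subnet) choice, and only the highest-frequency qualifying hop is assimilated.

```python
# Illustrative sketch of the evaluation split: trials 0-4 train, trials 5-9 test.
from typing import Dict, List

def valley_frequency(ratios: List[float], vt: float) -> float:
    # Fraction of trials that are valley occurrences (latency ratio <= vt).
    return sum(r <= vt for r in ratios) / len(ratios) if ratios else 0.0

def evaluate_client(trials: Dict[str, List[float]], vf: float, vt: float) -> List[float]:
    """trials maps hop subnet -> 10 latency ratios for one client/provider pair."""
    train_freqs = {s: valley_frequency(r[:5], vt) for s, r in trials.items()}
    qualified = {s: f for s, f in train_freqs.items() if f >= vf}
    if not qualified:
        return [1.0] * 5                      # no assimilation: client's own subnet is the baseline
    best = max(qualified, key=qualified.get)  # highest valley frequency in the training window
    return trials[best][5:]                   # ratios observed on the held-out test trials
```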

5.1 Parameter Selection
Figure 7 shows the effects of Drongo on our entire RIPE trace's average latency ratio, i.e., we consider all requests generated by clients, including those that aren't affected by Drongo. The figure plots the average latency ratio (shown on the y-axis) as a function of the valley threshold vt, shown on the x-axis. The valley threshold, introduced in Section 2.3, determines the maximum latency ratio a hop-client pair may have to classify as a valley occurrence. For example, when the threshold equals 1.0, all latency valleys are considered; on the other hand, when vt is 0.6, Drongo triggers client assimilation only for latency valleys that promise to improve performance by more than 40%, i.e., a latency ratio smaller than 0.6. The figure shows 5 curves, each representing a different valley frequency parameter, varied between 0.2 and 1.0.

Figure 7 shows the average results across all tested clients. We make several insights.

[Figure 8: Average latency ratio of cases where subnet assimilation was performed. x-axis: valley threshold (0.2–1.0); y-axis: latency ratio (0.5–3.5); one curve per constraint vf ≥ 0.2, 0.4, 0.6, 0.8, 1.0.]

If the valley frequency is small, e.g., 0.2, Drongo will unselectively trigger subnet assimilation for all valleys that have frequency equal to or larger than 0.2, which requires only one valley occurrence for our window size of 5. Conversely, as the minimum vf parameter increases, overall system performance improves: the latency ratio drops below 1.0. Thus, the valley frequency is a strong indicator of valley-proneness, further supporting our prior findings from Figure 5b.

Meanwhile, two more characteristics stand out as we vary vt. First, the valley frequency parameter can completely alter the slope of the latency ratio plotted against vt. This behavior echoes the observation made in the previous paragraph: with extremely loose requirements, Drongo does not sufficiently filter out poor-performing subnets from its set of candidates. Interestingly, with a strict vf (closer to 1.0), the slope changes and the average latency ratio decreases as we raise the valley threshold. This is because, if Drongo is too strict, it filters out too many potential candidates for subnet assimilation. Second, we see that the curve eventually bends upward at high vt values, indicating that even the vt can be too lenient. The results show that the minimum average latency ratio of 0.9482 (y-axis) is achieved for a valley threshold of 0.95 (x-axis). Thus, with vf = 1.0 and vt = 0.95, Drongo produces its maximum performance gains, averaging 5.18% (1.0 − 0.9482) across all of our tested clients. By balancing these parameters, Drongo filters out the subnets that offer the least benefit: subnets where valleys seldom occur, and subnets where valleys tend to be shallow.

Figure 8 plots the average latency ratio in the same manner as Figure 7, yet only for queries where Drongo applied subnet assimilation. First, the figure again confirms that lower valley frequency parameters degrade Drongo's performance. Second, paired with our knowledge of Figure 7, it becomes clear that as the valley threshold decreases, the number of latency valleys shrinks while the quality of the remaining valleys becomes more potent. This is why the latency ratio decreases as vt decreases.


[Figure 9: Percentage of clients where subnet assimilation was performed. x-axis: valley threshold (0.2–1.0); y-axis: fraction of clients affected (0.0–0.8); one curve per constraint vf ≥ 0.2, 0.4, 0.6, 0.8, 1.0.]

[Figure 10: Per-provider system performance for all queries, with each provider's optimal vf in parentheses: Google (0.8), CloudFront (0.8), Alibaba (0.4), CDNetworks (1.0), ChinaNetCtr (0.6), CubeCDN (1.0). x-axis: valley threshold (0.2–1.0); y-axis: latency ratio (0.85–1.00).]

However, if the threshold is too small, the number of available valleys becomes so small that performance becomes unpredictable, i.e., outlier behavior can dominate, causing the spike in average latency ratios for valley thresholds under 0.2.

Finally, Figure 9 plots the percentage of clients for which Drongo performed subnet assimilation for at least one provider. Necessarily, the less strict the frequency constraint is (vf ≥ 0.2 is the least strict constraint), the more frequently Drongo acts. As shown in Figure 9, 69.93% of our clients were affected by Drongo with vf and vt set at the peak aggregate performance values (1.0 and 0.95, respectively) found above.

Summary. In this section we selected the parameters vf = 1.0 and vt = 0.95, where Drongo reaches its peak aggregate performance gains of 5.18%.

5.2 Per-Provider Performance
In the previous subsection, we used a constant parameter set across all six providers in our analysis. In this section, we analyze Drongo on a per-provider basis. By choosing an optimal parameter for each provider, as we do below, Drongo's aggregate performance gains increase to 5.85%.

[Figure 11: Per-provider system performance for queries where subnet assimilation was applied, with each provider's optimal (vf, vt) in parentheses: Google (0.8, 0.7), CloudFront (0.8, 0.8), Alibaba (0.4, 0.9), CDNetworks (1.0, 0.9), ChinaNetCtr (0.6, 0.85), CubeCDN (1.0, 0.95). y-axis: latency ratio (0.2–1.4).]

In Figure 10, we show how Drongo impacts the average performance of each provider while setting the valley frequency to each individual provider's respective optimal value (noted in parentheses under each provider name). Note that for most providers the individual optimal valley frequency lies near the value obtained in Section 5.1 (1.0). While the intricacies of each provider's behavior clearly hinge on opaque characteristics of their respective networks, we offer several explanations for the observed performance. CDNetworks, which sees small gains across all valley threshold values, further validates the hypothesis proposed in Section 3.2.3 regarding anycast networks. Meanwhile, Google, the largest provider of our set on a global scale, also experienced the second highest peak average gains from Drongo: when a client is not well served by Google, it is likely that there exist nearby subnets being directed to different replicas. In other words, Drongo has great opportunity for improving CDNs that use fine-grained subnet mapping.

Finally, Figure 11 shows Drongo's performance per provider, exclusively in scenarios where it applies subnet assimilation. Comparing to the results shown in Figure 6 for the PlanetLab experiments, we observe significant differences. Most notably, the latency valley ratios are much smaller in Figure 11 than in Figure 6. For example, Google's median latency ratio is close to 1.0 in Figure 6, indicating insignificant gains. On the contrary, in Figure 11 Google's median latency ratio is around 0.5, implying gains of 50% (1.0 − 0.5) and up to an order of magnitude in edge cases. Considering all providers, Drongo-influenced replica selections are, on average, 24.89% better performing than Drongo-free selections.

There are three reasons for this behavior. First, Figure 6 shows the lower-bound system performance for PlanetLab, as the best CR is always used for comparison. Such a restriction is not imposed in the RIPE Atlas scenario. Second, the RIPE set is more diverse and includes widely distributed endpoints, thus allowing CDNs greater opportunity for subnet mapping error. Third, contrary to the PlanetLab scenario, where we used all the valleys under the valley threshold of 1.0, here we use optimal vt values such that Drongo successfully filters out most "shallow" valleys that would dilute performance gains.

6 RELATED WORK
Given a large number of CDNs with vastly different deployment and performance in different territories, CDN brokering has emerged as a way to improve user-perceived CDN performance, e.g., [7, 34]. By monitoring the performance of multiple CDNs, brokers are capable of determining which particular CDN works better for a particular client at a point in time. Necessarily, this approach requires content providers to contract with multiple CDNs to host and distribute their content. Contrary to CDN brokering, Drongo manages to improve the performance of each individual CDN. Still, it works completely independently from CDNs, and it is readily deployable on clients. Moreover, Drongo is completely compatible with CDN brokering, since it in principle has the capacity to improve the performance of whichever CDN is selected by a broker.

There has been tremendous effort expended in the ongoing battle to make web pages load faster, improve streaming performance, etc. As a result, many advanced client-based protocols and systems have emerged from both industry (e.g., QUIC, SPDY, and HTTP/2) and academia [24, 41, 50] in the Web domain, and likewise in streaming, e.g., [37]. While Drongo is also a client-based system, it is generic and helps speed up all CDN-hosted content. By reducing the CDN latency experienced by end users, it systematically improves all the above systems and protocols, which in turn helps all associated applications.

Recent work has demonstrated that injecting fake information can help protocols achieve better performance. One example is routing [49], where fake nodes and links are introduced into an underlying link-state routing protocol, so that routers compute their own forwarding tables based on the augmented topology. Similar ideas have proven useful in the context of traffic engineering [29], where robust and efficient network utilization is accomplished via carefully disseminated bogus information ("lies"). While similar in spirit to these approaches, Drongo operates in a completely different domain, aiming to address CDN pitfalls. In addition to using "fake" (source subnet) information in order to improve CDN decisions, Drongo also invests effort in discovering valley-prone subnets and determining the conditions under which using them is beneficial.

7 DISCUSSION
A mass deployment of Drongo is non-trivial and carries with it some complexities that deserve careful consideration. There is some concern that the maintenance of CDN allocation policies may still be compromised, despite our efforts to respect their mechanisms in Drongo's design. We propose that a broadly deployed form of Drongo could be carefully tuned to affect only the most extreme cases. Figure 11 shows that subnet assimilation carries some risk of a loss in performance, so it is in clients' best interests to apply it conservatively. First, we know from Figure 9 that the number of clients using assimilated subnets can be significantly throttled with strict parameters. Further, in today's Internet, many clients are already free to choose arbitrary LDNS servers [25, 36]. Many of these servers do not make use of the client subnet option at all, rendering the nature of their side effects (potentially disrupting policies enforced only in DNS) equivalent to that of Drongo. It is difficult to say whether Drongo's ultimate impact on CDN policies would be any more significant than the presence of such clients.

Mass adoption also raises scalability concerns, particularly regarding the amount of measurement traffic being sent into the network. To keep the number of measurements small while ensuring their freshness, a distributed, peer-to-peer component, where clients in the same subnet share trial data, could be incorporated into Drongo's design. We leave this modification for future work.

8 CONCLUSIONS
In this paper, we proposed the first approach that enables clients to actively measure CDNs and effectively improve their replica selection decisions, without requiring any changes to the CDNs and while respecting CDNs' policies. We have introduced and explored latency valleys, scenarios where replicas suggested to upstream subnets outperform those provided to a client's own subnet. We have asserted that latency valleys are common phenomena, and we have found them across all the CDNs we have tested, in 26%–76% of routes. We showed that valley-prone upstream subnets are easily found from the client, are simple to identify with few and infrequent measurements, and, once found, are persistent over timescales of days.

We have introduced Drongo, a client-side system that leverages the performance gains of latency valleys by identifying valley-prone subnets. Our measurements show that Drongo can improve requests' latency by up to an order of magnitude. Moreover, Drongo achieves this with exceptionally low overhead: a mere 5 measurements suffice for timescales of days. Using experimentally derived parameters, Drongo affects the replica selection of requests made by 69.93% of clients, improving their latency by 24.89% in the median case. Drongo's significant impact on these requests translates into an overall improvement in client-perceived aggregate CDN performance.


REFERENCES
[1] 2016. Akamai. (2016). http://www.akamai.com
[2] 2016. Alexa Top Sites. (2016). https://aws.amazon.com/alexa-top-sites
[3] 2016. Alibaba Cloud CDN. (2016). https://intl.aliyun.com/product/cdn
[4] 2016. Amazon CloudFront - Content Delivery Network (CDN). (2016). https://aws.amazon.com/cloudfront
[5] 2016. Amazon Route 53. (2016). https://aws.amazon.com/route53
[6] 2016. AWS Global Infrastructure. (2016). https://aws.amazon.com/about-aws/global-infrastructure
[7] 2016. Cedexis. (2016). http://www.cedexis.com
[8] 2016. Conviva. (2016). http://www.conviva.com
[9] 2016. The Cost of Latency. (2016). http://perspectives.mvdirona.com/2009/10/the-cost-of-latency
[10] 2016. GoDaddy Premium DNS. (2016). https://www.godaddy.com/domains/dns-hosting.aspx
[11] 2016. Google CDN Platform. (2016). https://cloud.google.com/cdn/docs
[12] 2016. Google Cloud Platform: Cloud DNS. (2016). https://cloud.google.com/dns
[13] 2016. Google Public DNS. (2016). https://developers.google.com/speed/public-dns/docs/using?hl=en
[14] 2016. Netdirekt Content Hosting - CDN. (2016). http://www.netdirekt.com.tr/cdn-large.html
[15] 2016. Neustar DNS Services. (2016). https://www.neustar.biz/services/dns-services
[16] 2016. OpenDNS. (2016). https://www.opendns.com
[17] 2016. RIPE Atlas - RIPE Network Coordination Centre. (2016). https://atlas.ripe.net
[18] 2016. Shopzilla: faster page load time = 12 percent revenue increase. (2016). http://www.strangeloopnetworks.com/resources/infographics/web-performance-and-ecommerce/shopzilla-faster-pages-12-revenue-increase
[19] 2016. Sweden's Greta wants to disrupt the multi-billion CDN market. (2016). https://techcrunch.com/2016/08/30/greta
[20] 2016. Verisign Managed DNS. (2016). http://www.verisign.com/en_US/security-services/dns-management/index.xhtml
[21] 2016. Verizon Digital Media Services. (2016). https://www.verizondigitalmedia.com
[22] 2016. Verizon ROUTE: Fast, Reliable Enterprise-Class Services for Domain Name System (DNS). (2016). https://www.verizondigitalmedia.com/platform/route
[23] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. 2010. Comparing DNS Resolvers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 15–21. DOI: https://doi.org/10.1145/1879141.1879144
[24] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. Klotski: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 439–453. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/butkiewicz
[25] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. 2013. Mapping the Expansion of Google's Serving Infrastructure. In Proceedings of the 2013 Conference on Internet Measurement Conference (IMC '13). ACM, 313–326. DOI: https://doi.org/10.1145/2504730.2504754
[26] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531–537. DOI: https://doi.org/10.1145/2815675.2815717
[27] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531–537. DOI: https://doi.org/10.1145/2815675.2815717
[28] F. Chen, R. Sitaraman, and M. Torres. 2015. End-User Mapping: Next Generation Request Routing for Content Delivery. In Proceedings of ACM SIGCOMM '15, London, UK.
[29] Marco Chiesa, Gábor Rétvári, and Michael Schapira. 2016. Lying Your Way to Better Traffic Engineering. In Proceedings of the 2016 ACM CoNEXT (CoNEXT '16). ACM.
[30] ChinaNetCenter. 2016. ChinaNetCenter - Network. (2016). http://en.chinanetcenter.com/pages/technology/g2-network-map.php
[31] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. 2003. PlanetLab: An Overlay Testbed for Broad-coverage Services. SIGCOMM Comput. Commun. Rev. 33, 3 (July 2003), 3–12. DOI: https://doi.org/10.1145/956993.956995
[32] C. Contavalli, W. van der Gaast, D. Lawrence, and W. Kumari. 2016. Client Subnet in DNS Queries. RFC 7871. RFC Editor.
[33] Tobias Flach, Ethan Katz-Bassett, and Ramesh Govindan. 2012. Quantifying Violations of Destination-based Forwarding on the Internet. In Proceedings of the 2012 ACM Conference on Internet Measurement Conference (IMC '12). ACM, 265–272. DOI: https://doi.org/10.1145/2398776.2398804
[34] Aditya Ganjam, Faisal Siddiqui, Jibin Zhan, Xi Liu, Ion Stoica, Junchen Jiang, Vyas Sekar, and Hui Zhang. 2015. C3: Internet-Scale Control Plane for Video Quality Optimization. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 131–144. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/ganjam
[35] Cheng Huang, Ivan Batanov, and Jin Li. 2012. A Practical Solution to the client-LDNS Mismatch Problem. SIGCOMM Comput. Commun. Rev. 42, 2 (March 2012), 35–41. DOI: https://doi.org/10.1145/2185376.2185381
[36] Cheng Huang, Angela Wang, Jin Li, and Keith W. Ross. 2008. Measuring and Evaluating Large-scale CDNs (paper withdrawn at Microsoft's request). In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 15–29. http://dl.acm.org/citation.cfm?id=1452520.1455517
[37] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. 2014. A Buffer-based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM '14). ACM, 187–198. DOI: https://doi.org/10.1145/2619239.2626296
[38] Rupa Krishnan, Harsha V. Madhyastha, Sushant Jain, Sridhar Srinivasan, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. 2009. Moving Beyond End-to-End Path Information to Optimize CDN Performance. In Internet Measurement Conference (IMC), Chicago, IL, 190–201. http://research.google.com/archive/imc191/imc191.pdf
[39] Zhuoqing Morley Mao, Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang. 2002. A Precise and Efficient Evaluation of the Proximity Between Web Clients and Their Local DNS Servers. In Proceedings of the General Track of the Annual Conference on USENIX Annual Technical Conference (ATEC '02). USENIX Association, Berkeley, CA, USA, 229–242. http://dl.acm.org/citation.cfm?id=647057.759950
[40] Matthew K. Mukerjee, Ilker Nadi Bozkurt, Bruce Maggs, Srinivasan Seshan, and Hui Zhang. 2016. The Impact of Brokers on the Future of Content Delivery. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (HotNets '16). ACM, 127–133. DOI: https://doi.org/10.1145/3005745.3005749
[41] Ravi Netravali, Ameesh Goyal, James Mickens, and Hari Balakrishnan. 2016. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/netravali
[42] OpenDNS. 2016. A Faster Internet: The Global Internet Speedup. (2016). http://afasterinternet.com
[43] Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose. 1998. Modeling TCP Throughput: A Simple Model and Its Empirical Validation. In Proceedings of the ACM SIGCOMM '98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '98). ACM, 303–314. DOI: https://doi.org/10.1145/285237.285291
[44] Vern Paxson. 1996. End-to-end Routing Behavior in the Internet. SIGCOMM Comput. Commun. Rev. 26, 4 (Aug. 1996), 25–38. DOI: https://doi.org/10.1145/248157.248160
[45] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, and Anja Feldmann. 2010. Improving Content Delivery Using Provider-aided Distance Information. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 22–34. DOI: https://doi.org/10.1145/1879141.1879145
[46] Florian Streibelt, Jan Böttger, Nikolaos Chatzis, Georgios Smaragdakis, and Anja Feldmann. 2013. Exploring EDNS-client-subnet Adopters in Your Free Time. In Proceedings of IMC '13 (IMC '13).
[47] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. 2006. Drafting Behind Akamai (Travelocity-based Detouring). SIGCOMM Comput. Commun. Rev. 36, 4 (Aug. 2006), 435–446. DOI: https://doi.org/10.1145/1151659.1159962
[48] Ao-Jan Su and Aleksandar Kuzmanovic. 2008. Thinning Akamai. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 29–42. DOI: https://doi.org/10.1145/1452520.1452525
[49] Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. 2015. Central Control Over Distributed Routing. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM '15). ACM, 43–56. DOI: https://doi.org/10.1145/2785956.2787497
[50] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. 2016. Speeding up Web Page Loads with Shandian. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA, 109–122. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/wang

  • Abstract
  • 1 Introduction
  • 2 Premise
    • 21 Background
    • 22 Respecting CDN Policies and Incentives for Adoption
    • 23 Subnet Assimilation
    • 24 Identifying Latency Valleys
      • 3 Exploring Valleys
        • 31 Testing for Valleys
        • 32 Valley Prevalence
          • 4 Drongo System Overview
            • 41 Window Size
            • 42 Data Collection
            • 43 Decision Making
              • 5 Drongo Evaluation
                • 51 Parameter Selection
                • 52 Per-Provider Performance
                  • 6 Related Work
                  • 7 Discussion
                  • 8 Conclusions
                  • References
Page 10: Drongo: Speeding Up CDNs with Subnet Assimilation from the Clientnetworks.cs.northwestern.edu/website/publications/drongo/... · 2018-12-22 · CDNs attempt to improve web and streaming

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

02 04 06 08 10valley threshold

095

100

105

110

115

120

125

late

ncy

ratio

vfge02 vfge04 vfge06 vfge08 vfge10

Figure 7 Average latency ratio of overall system as wevary vf and vt

5 DRONGO EVALUATIONHere we evaluate Drongorsquos ability to intelligently choosewell-mapped hop subnets for subnet assimilation and theperformance gains imposed on clients where subnet assimi-lation is appliedUsing the RIPE Atlas platform [17] we collect a trace of

429 probes spread across 177 countries for 1 month In ourexperiments we use the first portion of a clientrsquos trials as atraining window Following the completion of the trainingwindow we use the remaining trials to test Drongorsquos abilityto select good hops for real time subnet assimilation Eachclient performed 10 trials per provider trials 0 through 4to form its training window and trials 5 through 9 to testthe quality of Drongorsquos decisions In our evaluation Drongoalways selects the first CR from a CR-set and the first HRfrom a HR-set mirroring real world client behavior mdash noreal-time on-the-fly measurements are conducted and alldecisions are based on the existing window

51 Parameter SelectionFigure 7 shows the effects of Drongo on our entire RIPEtracersquos average latency ratio ie we consider all requestsgenerated by clients including those that arenrsquot affected byDrongo The figure plots average latency ratio (shown onthe y axis) as a function of the valley threshold vt shownon the x axis The valley threshold introduced in Section 23determines maximum latency ratio a hop-client pair musthave to classify as a valley occurrence For example whenthe threshold equals 10 all latency valleys are consideredon the other hand when vt is 06 Drongo triggers clientassimilation only for latency valleys that have promise toimprove performance by more than 40 ie latency ratiosmaller than 06 The figure shows 5 curves each represent-ing a different valley frequency parameter varied between02 and 10

Figure 7 shows the average results across all tested clientsWe make several insights If the valley frequency is small

02 04 06 08 10valley threshold

05

10

15

20

25

30

35

late

ncy

ratio

vfge02 vfge04 vfge06 vfge08 vfge10

Figure 8 Average latency ratio of cases where subnetassimilation was performed

eg 02 Drongo will unselectively trigger subnet assimila-tion for all valleys that have frequency equal to or largerthan 02 which requires only one valley occurrence for ourwindow size of 5 Conversely as the minimum vf parame-ter increases overall system performance improves mdash thelatency ratio drops below 10 Thus the valley frequency isa strong indicator of valley-proneness further supportingour prior findings from Figure 5b

Meanwhile two more characteristics stand out as we varyvt First the valley frequency parameter can completely al-ter the slope of the latency ratio plotted against vt Thisbehavior echoes the observation we made in the previousparagraph with extremely loose requirements Drongo doesnot sufficiently filter out poor-performing subnets from itsset of candidates Intersetingly with a strictvf (closer to 10)the slope changes and the average latency ratio decreasesas we raise the valley threshold This is because if Drongois too strict it filters out too many potential candidates forsubnet assimilation Second we see that the curve eventuallybends upward with high vt values indicating that even thevt can be too lenient The results show that the minimumaverage latency ratio of 09482 (y-axis) is achieved for thevalley threshold of 095 (x-axis) Thus with vf == 10 andvt == 095 Drongo produces its maximum performancegains averaging 518 (10 - 09482) across all of our testedclients By balancing these parameters Drongo filters outsubnets that offer the least benefit subnets where valleysseldom occur and subnets where valleys tend to be shallow

Figure 8 plots the average latency ratio in the samemanneras Figure 7 yet only for queries where Drongo applied sub-net assimilation First the figure again confirms that lowervalley frequency parameters degrade Drongorsquos performanceSecond paired with our knowledge of Figure 7 it becomesclear that as valley threshold decreases the number of la-tency valleys shrinks while the quality of the remainingvalleys becomes more potent This is why the latency ratiodecreases as vt decreases However if the threshold is too

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

02 04 06 08 10valley threshold

00

02

04

06

08

fract

ion

clie

nts

affe

cted

vfge02 vfge04 vfge06 vfge08 vfge10

Figure 9 Percentage of clients where subnet assimila-tion was performed

02 04 06 08 10valley threshold

085

090

095

100

late

ncy

ratio

Google(08)CloudFront(08)

Alibaba(04)CDNetworks(10)

ChinaNetCtr(06)CubeCDN(10)

Figure 10 Per-provider system performance for allqueries Optimal vf is set for each provider and notedin parentheses

small the number of available valleys becomes so small thatthe performance becomes unpredictable ie outlier behaviorcan dominate causing the spike in average latency ratios forvalley thresholds under 02

Finally Figure 9 plots the percent of clients for whichDrongo performed subnet assimilation for at least one providerNecessarily the less strict the frequency constraint is (vf ge

02 is the least strict constraint) the more frequently Drongoacts As shown in Figure 9 6993 of our clients were affectedby Drongo with vf and vt set at the the peak aggregate per-formance values (10 and 095 respectively) found above

Summary In this section we selected parametersvf = 1and vt = 095 where Drongo reaches its peak aggregateperformance gains of 518

52 Per-Provider PerformanceIn the previous subsection we used a constant parameter setacross all six providers in our analysis In this section weanalyze Drongo on a per-provider basis By choosing an op-timal parameter for each provider as we do below Drongorsquosaggregate performance gains increases to 585 In Figure 10

Google

(0807)CloudFront

(0808) Alibaba

(0409)

CDNetworks

(1009)ChinaNetCtr

(06085)CubeCDN

(10095)

02040608101214

late

ncy

ratio

Figure 11 Per-provider system performance forqueries where subnet assimilation was applied Paren-theses contain optimal values for each respectiveprovider formatted (vf vt )

we show how Drongo impacts the average performance ofeach provider while setting valley frequency to each individ-ual providerrsquos respective optimal value (noted in parenthesesunder each provider name) Note that for most providersthe individual optimal valley frequency lies near the valueobtained in Section 51 (10) While the intricacies of eachproviderrsquos behavior clearly hinge on opaque characteristicsof their respective networks we offer several explanations forthe observed performance CDNetworks which sees smallgains across all valley threshold values further validates thehypothesis proposed in Section 323 regarding anycast net-works Meanwhile Google the largest provider of our seton a global scale also experienced the second highest peakaverage gains from Drongo is not well served by Google itis likely that there exist nearby subnets being directed to dif-ferent replicas In other words Drongo has great opportunityfor improving CDNs that use fine grained subnet mapping

Finally Figure 11 showsDrongorsquos performance per-providerexclusively in scenarios when it applies subnet assimilationComparing to the results shown in Figure 6 for the PlanetLab experiments we observe significant differences Most no-tably the latency valley ratios are much smaller in Figure 11than in Figure 6 For example the Googlersquos median latencyratio is close to 10 in Figure 6 indicating insigificant gainsOn the contrary Figure 11 Googlersquos median latency ratio isaround 05 implying gains of 50 (10 minus 05) and up to anorder of magnitude in edge cases Considering all providersDrongo-influenced replica selections are on average 2489better performing than Drongo-free selectionsThere are three reasons for this behavior First Figure 6

shows the lower-bound system performance for PlanetLab asthe best CR is always used for comparison Such a restrictionis not imposed in the the RIPE Atlas scenario Second theRIPE set is more diverse and includes widely distributed end-points thus allowing CDNs greater opportunity for subnetmapping error Third contrary to the Planet Lab scenario

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

where we used all the valleys under the valley threshold of10 here we use optimal vt values such that Drongo suc-cessfully filters out most ldquoshallow valleys that would diluteperformance gains

6 RELATEDWORKGiven a large number of CDNs with vastly different deploy-ment and performance in different territories CDN brokeringhas emerged as a way to improve user-perceived CDNsrsquo per-formance eg [7 34] By monitoring the performance ofmultiple CDNs brokers are capable of determining whichparticular CDN works better for a particular client at a pointin time Necessarily this approach requires content providersto contract with multiple CDNs to host and distribute theircontent Contrary to CDN brokering Drongo manages toimprove performance of each individual CDN Still it workscompletely independently from CDNs and it is readily de-ployable on the clients Moreover Drongo is completelycompatible with CDN brokering since it in principle hasthe capacity to improve the performance of whichever CDNis selected by a broker

There has been tremendous effort expended in the ongoingbattle to make web pages load faster improve streamingperformance etc As a result many advanced client-basedprotocols and systems have emerged from both industry(eg QUIC SPDY and HTTP2) and academia [24 41 50] inthe Web domain and likewise in streaming eg [37] WhileDrongo is also a client-based system it is generic and helpsspeed-up all CDN-hosted content By reducing the CDNlatency experienced by end users it systematically improvesall the above systems and protocols which in turn helps allassociated applicationsRecent work has demonstrated that injecting fake infor-

mation can help protocols achieve better performance Oneexample is routing [49] where fake nodes and links are intro-duced into an underlying linkstate routing protocol so thatrouters compute their own forwarding tables based on theaugmented topology Similar ideas are shown useful in thecontext of traffic engineering [29] where robust and efficientnetwork utilization is accomplished via carefully dissemi-nated bogus information (ldquolies) While similar in spirit tothese approaches Drongo operates in a completely differentdomain aiming to address the CDN pitfalls In addition tousing ldquofake (source subnet) information in order to improveCDN decisions Drongo also invests efforts in discoveringvalley-prone subnets and determining conditions when usingthem is beneficial

7 DISCUSSIONA mass deployment of Drongo is non-trivial and carrieswith it some complexities that deserve careful consideration

There is some concern that maintainence of CDN allocationpolicies may still be compromised despite our efforts to re-spect their mechanisms in Drongorsquos design We propose thata broadly deployed form of Drongo could be carefully tunedto effect only the most extreme cases Figure 11 shows thatsubnet assimilation carries some risk of a loss in performanceso it is in clientsrsquo best interests to apply it conservativelyFirst we know from Figure 9 that the number of clients us-ing assimilated subnets can be significantly throttled withstrict parameteres Further in todayrsquos Internet many clientsare already free to choose arbitrary LDNS servers ([25 36])Many of these servers do not make use of the client subnetoption at all thus rendering the nature of their side-effects mdashpotentially disrupting policies enforced only in DNS mdash equiv-alent to that of Drongo It is difficult to say whether or notDrongorsquos ultimate impact on CDN policies would be anymore sigificant than the presence of such clients

Mass adoption also raises scalability concerns particularlyregarding the amount of measurement traffic being sent intothe network To keep the number of measurements smallwhile ensuring their freshness a distributed peer-to-peercomponent where clients in the same subnet share trial datacould be incorporated into Drongorsquos design We leave thismodification for future work

8 CONCLUSIONSIn this paper we proposed the first approach that enablesclients to actively measure CDNs and effectively improvetheir replica selection decisions without requiring any changesto the CDNs and while respecting CDNsrsquo policies We haveintroduced and explored latency valleys scenarios wherereplicas suggested to upstream subnets outperform thoseprovided to a clientrsquos own subnet We have asserted thatlatency valleys are common phenomena and we have foundthem across all the CDNs we have tested in 26-76 of routesWe showed that valley-prone upstream subnets are easilyfound from the client are simple to identify with few andinfrequent measurements and once found are persistentover timescales of daysWe have introduced Drongo a client-side system that

leverages the performance gains of latency valleys by iden-tifying valley-prone subnets Our measurements show thatDrongo can improve requestsrsquo latency by up to an orderof magnitude Moreover Drongo achieves this with excep-tionally low overhead a mere 5 measurements suffice fortimescales of days Using experimentally derived parame-ters Drongo affects the replica selection of requests made by6993 of clients improving their latency by 2489 in themedian case Drongorsquos significant impact on these requeststranslates into an overall improvement in client-perceivedaggregate CDN performance

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

REFERENCES[1] 2016 Akamai (2016) httpwwwakamaicom[2] 2016 Alexa Top Sites (2016) httpsawsamazoncom

alexa-top-sites[3] 2016 Alibaba Cloud CDN (2016) httpsintlaliyuncomproductcdn[4] 2016 Amazon CloudFront Ű Content Delivery Network (CDN) (2016)

httpsawsamazoncomcloudfront[5] 2016 Amazon Route 53 (2016) httpsawsamazoncomroute53[6] 2016 AWS Global Infrastructure (2016) httpsawsamazoncom

about-awsglobal-infrastructure[7] 2016 Cedexis (2016) httpwwwcedexiscom[8] 2016 Conviva (2016) httpwwwconvivacom[9] 2016 The Cost of Latency (2016) httpperspectivesmvdironacom

200910the-cost-of-latency[10] 2016 GoDaddy Premium DNS (2016) httpswwwgodaddycom

domainsdns-hostingaspx[11] 2016 Google CDN Platform (2016) httpscloudgooglecomcdn

docs[12] 2016 Google Cloud Platform Cloud DNS (2016) httpscloudgoogle

comdns[13] 2016 Google Public DNS (2016) httpsdevelopersgooglecom

speedpublic-dnsdocsusinghl=en[14] 2016 Netdirekt Content Hosting - CDN (2016) httpwwwnetdirekt

comtrcdn-largehtml[15] 2016 Neustar DNS Services (2016) httpswwwneustarbizservices

dns-services[16] 2016 OpenDNS (2016) httpswwwopendnscom[17] 2016 RIPE Atlas - RIPE Network Coordination Centre (2016) https

atlasripenet[18] 2016 Shopzilla faster page load time = 12 percent rev-

enue increase (2016) httpwwwstrangeloopnetworkscomresourcesinfographicsweb-performance-andecommerceshopzilla-faster-pages-12-revenue-increase

[19] 2016 SwedenŠs Greta wants to disrupt the multi-billion CDN market(2016) httpstechcrunchcom20160830greta

[20] 2016 Verisign Managed DNS (2016) httpwwwverisigncomen_USsecurity-servicesdns-managementindexxhtml

[21] 2016 Verizon Digital Media Services (2016) httpswwwverizondigitalmediacom

[22] 2016 Verizon ROUTE Fast Reliable Enterprise-Class Services forDomain Name System (DNS) (2016) httpswwwverizondigitalmediacomplatformroute

[23] Bernhard AgerWolfgangMuumlhlbauer Georgios Smaragdakis and SteveUhlig 2010 Comparing DNS Resolvers in the Wild In Proceedingsof the 10th ACM SIGCOMM Conference on Internet Measurement (IMCrsquo10) ACM 15ndash21 DOIhttpsdoiorg10114518791411879144

[24] Michael Butkiewicz Daimeng Wang Zhe Wu Harsha V Madhyasthaand Vyas Sekar 2015 Klotski Reprioritizing Web Content to ImproveUser Experience on Mobile Devices In 12th USENIX Symposium onNetworked Systems Design and Implementation (NSDI 15) USENIX As-sociation Oakland CA 439ndash453 httpswwwusenixorgconferencensdi15technical-sessionspresentationbutkiewicz

[25] Matt Calder Xun Fan Zi Hu Ethan Katz-Bassett John Heidemannand Ramesh Govindan 2013 Mapping the Expansion of GooglersquosServing Infrastructure In Proceedings of the 2013 Conference on InternetMeasurement Conference (IMC rsquo13) ACM 313ndash326 DOIhttpsdoiorg10114525047302504754

[26] Matt Calder Ashley Flavel Ethan Katz-Bassett Ratul Mahajan andJitendra Padhye 2015 Analyzing the Performance of an Anycast CDNIn Proceedings of the 2015 ACM Conference on Internet MeasurementConference (IMC rsquo15) ACM 531ndash537 DOIhttpsdoiorg10114528156752815717

[27] Matt Calder Ashley Flavel Ethan Katz-Bassett Ratul Mahajan andJitendra Padhye 2015 Analyzing the Performance of an Anycast CDNIn Proceedings of the 2015 ACM Conference on Internet MeasurementConference (IMC rsquo15) ACM 531ndash537 DOIhttpsdoiorg10114528156752815717

[28] F Chen R Sitaraman and M Torres 2015 End-User Mapping NextGeneration Request Routing for Content Delivery In Proceedings ofACM SIGCOMM rsquo15 London UK

[29] Marco Chiesa Gaboe Retvari and Michael Schapira 2016 Lying YourWay to Better Traffic Engineering In Proceedings of the 2016 ACMCoNEXT (CoNEXT rsquo16) ACM

[30] ChinaNetCenter 2016 ChinaNetCenter - Network (2016) httpenchinanetcentercompagestechnologyg2-network-mapphp

[31] Brent Chun David Culler Timothy Roscoe Andy Bavier Larry Peter-son MikeWawrzoniak andMic Bowman 2003 PlanetLab An OverlayTestbed for Broad-coverage Services SIGCOMM Comput CommunRev 33 3 (July 2003) 3ndash12 DOIhttpsdoiorg101145956993956995

[32] C Contavalli W van der Gaast D Lawrence and W Kumari 2016Client Subnet in DNS Queries RFC 7871 RFC Editor

[33] Tobias Flach Ethan Katz-Bassett and Ramesh Govindan 2012 Quan-tifying Violations of Destination-based Forwarding on the Internet InProceedings of the 2012 ACM Conference on Internet Measurement Con-ference (IMC rsquo12) ACM 265ndash272 DOIhttpsdoiorg10114523987762398804

[34] Aditya Ganjam Faisal Siddiqui Jibin Zhan Xi Liu Ion Stoica JunchenJiang Vyas Sekar and Hui Zhang 2015 C3 Internet-Scale ControlPlane for Video Quality Optimization In 12th USENIX Symposium onNetworked Systems Design and Implementation (NSDI 15) USENIX As-sociation Oakland CA 131ndash144 httpswwwusenixorgconferencensdi15technical-sessionspresentationganjam

[35] ChengHuang Ivan Batanov and Jin Li 2012 A Practical Solution to theclient-LDNS Mismatch Problem SIGCOMM Comput Commun Rev 422 (March 2012) 35ndash41 DOIhttpsdoiorg10114521853762185381

[36] Cheng Huang Angela Wang Jin Li and Keith W Ross 2008 Measur-ing and Evaluating Large-scale CDNs Paper Withdrawn at MirosoftrsquosRequest In Proceedings of the 8th ACM SIGCOMM Conference on In-ternet Measurement (IMC rsquo08) ACM 15ndash29 httpdlacmorgcitationcfmid=14525201455517

[37] Te-Yuan Huang Ramesh Johari Nick McKeown Matthew Trunnelland Mark Watson 2014 A Buffer-based Approach to Rate AdaptationEvidence from a Large Video Streaming Service In Proceedings of the2014 ACM Conference on SIGCOMM (SIGCOMM rsquo14) ACM 187ndash198DOIhttpsdoiorg10114526192392626296

[38] Rupa Krishnan Harsha V Madhyastha Sushant Jain Sridhar Srini-vasan Arvind Krishnamurthy Thomas Anderson and Jie Gao 2009Moving Beyond End-to-End Path Information to Optimize CDN Perfor-mance In Internet Measurement Conference (IMC) Chicago IL 190ndash201httpresearchgooglecomarchiveimc191imc191pdf

[39] Zhuoqing Morley Mao Charles D Cranor Fred Douglis Michael Rabi-novich Oliver Spatscheck and Jia Wang 2002 A Precise and EfficientEvaluation of the Proximity BetweenWeb Clients and Their Local DNSServers In Proceedings of the General Track of the Annual Conference onUSENIX Annual Technical Conference (ATEC rsquo02) USENIX AssociationBerkeley CA USA 229ndash242 httpdlacmorgcitationcfmid=647057759950

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

[40] Matthew K Mukerjee Ilker Nadi Bozkurt Bruce Maggs SrinivasanSeshan and Hui Zhang 2016 The Impact of Brokers on the Futureof Content Delivery In Proceedings of the 15th ACM Workshop on HotTopics in Networks (HotNets rsquo16) ACM 127ndash133 DOIhttpsdoiorg10114530057453005749

[41] Ravi Netravali Ameesh Goyal James Mickens and Hari Balakrish-nan 2016 Polaris Faster Page Loads Using Fine-grained Depen-dency Tracking In 13th USENIX Symposium on Networked SystemsDesign and Implementation (NSDI 16) USENIXAssociation Santa ClaraCA httpswwwusenixorgconferencensdi16technical-sessionspresentationnetravali

[42] OpenDNS 2016 A Faster Internet The Global Internet Speedup (2016)httpafasterinternetcom

[43] Jitendra Padhye Victor Firoiu Don Towsley and Jim Kurose 1998Modeling TCP Throughput A Simple Model and Its Empirical Valida-tion In Proceedings of the ACM SIGCOMM rsquo98 Conference on Applica-tions Technologies Architectures and Protocols for Computer Commu-nication (SIGCOMM rsquo98) ACM 303ndash314 DOIhttpsdoiorg101145285237285291

[44] Vern Paxson 1996 End-to-end Routing Behavior in the InternetSIGCOMM Comput Commun Rev 26 4 (Aug 1996) 25ndash38 DOIhttpsdoiorg101145248157248160

[45] Ingmar Poese Benjamin Frank Bernhard Ager Georgios Smaragdakisand Anja Feldmann 2010 Improving Content Delivery Using Provider-aided Distance Information In Proceedings of the 10th ACM SIGCOMMConference on Internet Measurement (IMC rsquo10) ACM 22ndash34 DOIhttpsdoiorg10114518791411879145

[46] Florian Streibelt Jan Boumlttger Nikolaos Chatzis Georgios Smaragdakisand Anja Feldmann 2013 Exploring EDNS-client-subnet Adopters inYour Free Time In Proceedings of IMC rsquo13 (IMC rsquo13)

[47] Ao-Jan Su David R Choffnes Aleksandar Kuzmanovic and Fabiaacuten EBustamante 2006 Drafting Behind Akamai (Travelocity-based De-touring) SIGCOMM Comput Commun Rev 36 4 (Aug 2006) 435ndash446DOIhttpsdoiorg10114511516591159962

[48] Ao-Jan Su andAleksandar Kuzmanovic 2008 ThinningAkamai In Pro-ceedings of the 8th ACM SIGCOMM Conference on Internet Measurement(IMC rsquo08) ACM 29ndash42 DOIhttpsdoiorg10114514525201452525

[49] Stefano Vissicchio Olivier Tilmans Laurent Vanbever and JenniferRexford 2015 Central Control Over Distributed Routing In Proceed-ings of the 2015 ACM Conference on Special Interest Group on DataCommunication (SIGCOMM rsquo15) ACM 43ndash56 DOIhttpsdoiorg10114527859562787497

[50] Xiao Sophia Wang Arvind Krishnamurthy and David Wetherall 2016Speeding up Web Page Loads with Shandian In 13th USENIX Sym-posium on Networked Systems Design and Implementation (NSDI 16)USENIX Association Santa Clara CA 109ndash122 httpswwwusenixorgconferencensdi16technical-sessionspresentationwang

  • Abstract
  • 1 Introduction
  • 2 Premise
    • 21 Background
    • 22 Respecting CDN Policies and Incentives for Adoption
    • 23 Subnet Assimilation
    • 24 Identifying Latency Valleys
      • 3 Exploring Valleys
        • 31 Testing for Valleys
        • 32 Valley Prevalence
          • 4 Drongo System Overview
            • 41 Window Size
            • 42 Data Collection
            • 43 Decision Making
              • 5 Drongo Evaluation
                • 51 Parameter Selection
                • 52 Per-Provider Performance
                  • 6 Related Work
                  • 7 Discussion
                  • 8 Conclusions
                  • References
Page 11: Drongo: Speeding Up CDNs with Subnet Assimilation from the Clientnetworks.cs.northwestern.edu/website/publications/drongo/... · 2018-12-22 · CDNs attempt to improve web and streaming

Drongo CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea

02 04 06 08 10valley threshold

00

02

04

06

08

fract

ion

clie

nts

affe

cted

vfge02 vfge04 vfge06 vfge08 vfge10

Figure 9 Percentage of clients where subnet assimila-tion was performed

02 04 06 08 10valley threshold

085

090

095

100

late

ncy

ratio

Google(08)CloudFront(08)

Alibaba(04)CDNetworks(10)

ChinaNetCtr(06)CubeCDN(10)

Figure 10 Per-provider system performance for allqueries Optimal vf is set for each provider and notedin parentheses

small the number of available valleys becomes so small thatthe performance becomes unpredictable ie outlier behaviorcan dominate causing the spike in average latency ratios forvalley thresholds under 02

Finally Figure 9 plots the percent of clients for whichDrongo performed subnet assimilation for at least one providerNecessarily the less strict the frequency constraint is (vf ge

02 is the least strict constraint) the more frequently Drongoacts As shown in Figure 9 6993 of our clients were affectedby Drongo with vf and vt set at the the peak aggregate per-formance values (10 and 095 respectively) found above

Summary In this section we selected parametersvf = 1and vt = 095 where Drongo reaches its peak aggregateperformance gains of 518

52 Per-Provider PerformanceIn the previous subsection we used a constant parameter setacross all six providers in our analysis In this section weanalyze Drongo on a per-provider basis By choosing an op-timal parameter for each provider as we do below Drongorsquosaggregate performance gains increases to 585 In Figure 10

Google

(0807)CloudFront

(0808) Alibaba

(0409)

CDNetworks

(1009)ChinaNetCtr

(06085)CubeCDN

(10095)

02040608101214

late

ncy

ratio

Figure 11 Per-provider system performance forqueries where subnet assimilation was applied Paren-theses contain optimal values for each respectiveprovider formatted (vf vt )

we show how Drongo impacts the average performance ofeach provider while setting valley frequency to each individ-ual providerrsquos respective optimal value (noted in parenthesesunder each provider name) Note that for most providersthe individual optimal valley frequency lies near the valueobtained in Section 51 (10) While the intricacies of eachproviderrsquos behavior clearly hinge on opaque characteristicsof their respective networks we offer several explanations forthe observed performance CDNetworks which sees smallgains across all valley threshold values further validates thehypothesis proposed in Section 323 regarding anycast net-works Meanwhile Google the largest provider of our seton a global scale also experienced the second highest peakaverage gains from Drongo is not well served by Google itis likely that there exist nearby subnets being directed to dif-ferent replicas In other words Drongo has great opportunityfor improving CDNs that use fine grained subnet mapping

Finally Figure 11 showsDrongorsquos performance per-providerexclusively in scenarios when it applies subnet assimilationComparing to the results shown in Figure 6 for the PlanetLab experiments we observe significant differences Most no-tably the latency valley ratios are much smaller in Figure 11than in Figure 6 For example the Googlersquos median latencyratio is close to 10 in Figure 6 indicating insigificant gainsOn the contrary Figure 11 Googlersquos median latency ratio isaround 05 implying gains of 50 (10 minus 05) and up to anorder of magnitude in edge cases Considering all providersDrongo-influenced replica selections are on average 2489better performing than Drongo-free selectionsThere are three reasons for this behavior First Figure 6

shows the lower-bound system performance for PlanetLab asthe best CR is always used for comparison Such a restrictionis not imposed in the the RIPE Atlas scenario Second theRIPE set is more diverse and includes widely distributed end-points thus allowing CDNs greater opportunity for subnetmapping error Third contrary to the Planet Lab scenario

CoNEXT rsquo17 December 12ndash15 2017 Incheon Republic of Korea M Warrior et al

where we used all the valleys under the valley threshold of10 here we use optimal vt values such that Drongo suc-cessfully filters out most ldquoshallow valleys that would diluteperformance gains

6 RELATEDWORKGiven a large number of CDNs with vastly different deploy-ment and performance in different territories CDN brokeringhas emerged as a way to improve user-perceived CDNsrsquo per-formance eg [7 34] By monitoring the performance ofmultiple CDNs brokers are capable of determining whichparticular CDN works better for a particular client at a pointin time Necessarily this approach requires content providersto contract with multiple CDNs to host and distribute theircontent Contrary to CDN brokering Drongo manages toimprove performance of each individual CDN Still it workscompletely independently from CDNs and it is readily de-ployable on the clients Moreover Drongo is completelycompatible with CDN brokering since it in principle hasthe capacity to improve the performance of whichever CDNis selected by a broker

There has been tremendous effort expended in the ongoingbattle to make web pages load faster improve streamingperformance etc As a result many advanced client-basedprotocols and systems have emerged from both industry(eg QUIC SPDY and HTTP2) and academia [24 41 50] inthe Web domain and likewise in streaming eg [37] WhileDrongo is also a client-based system it is generic and helpsspeed-up all CDN-hosted content By reducing the CDNlatency experienced by end users it systematically improvesall the above systems and protocols which in turn helps allassociated applicationsRecent work has demonstrated that injecting fake infor-

mation can help protocols achieve better performance Oneexample is routing [49] where fake nodes and links are intro-duced into an underlying linkstate routing protocol so thatrouters compute their own forwarding tables based on theaugmented topology Similar ideas are shown useful in thecontext of traffic engineering [29] where robust and efficientnetwork utilization is accomplished via carefully dissemi-nated bogus information (ldquolies) While similar in spirit tothese approaches Drongo operates in a completely differentdomain aiming to address the CDN pitfalls In addition tousing ldquofake (source subnet) information in order to improveCDN decisions Drongo also invests efforts in discoveringvalley-prone subnets and determining conditions when usingthem is beneficial

7 DISCUSSIONA mass deployment of Drongo is non-trivial and carrieswith it some complexities that deserve careful consideration

There is some concern that maintainence of CDN allocationpolicies may still be compromised despite our efforts to re-spect their mechanisms in Drongorsquos design We propose thata broadly deployed form of Drongo could be carefully tunedto effect only the most extreme cases Figure 11 shows thatsubnet assimilation carries some risk of a loss in performanceso it is in clientsrsquo best interests to apply it conservativelyFirst we know from Figure 9 that the number of clients us-ing assimilated subnets can be significantly throttled withstrict parameteres Further in todayrsquos Internet many clientsare already free to choose arbitrary LDNS servers ([25 36])Many of these servers do not make use of the client subnetoption at all thus rendering the nature of their side-effects mdashpotentially disrupting policies enforced only in DNS mdash equiv-alent to that of Drongo It is difficult to say whether or notDrongorsquos ultimate impact on CDN policies would be anymore sigificant than the presence of such clients

Mass adoption also raises scalability concerns, particularly regarding the amount of measurement traffic being sent into the network. To keep the number of measurements small while ensuring their freshness, a distributed peer-to-peer component, where clients in the same subnet share trial data, could be incorporated into Drongo's design. We leave this modification for future work.
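
A minimal sketch of what such a component might look like is given below, assuming clients in the same subnet publish recent trial results to a shared cache; the record format and the freshness window are our own illustrative assumptions, since this design is left to future work.

```python
# Hypothetical sketch of the future-work idea above: clients in the same subnet
# share recent trial results so each client issues fewer measurements.
# The record fields and the freshness window are illustrative assumptions.

import time
from collections import defaultdict

FRESHNESS_WINDOW_S = 24 * 3600  # reuse results for roughly a day (timescale of days)

class SubnetTrialCache:
    """In-memory stand-in for a peer-to-peer store keyed by client subnet."""

    def __init__(self):
        # subnet -> list of (timestamp, upstream_subnet, latency_ratio)
        self._by_subnet = defaultdict(list)

    def publish(self, subnet, upstream_subnet, latency_ratio):
        self._by_subnet[subnet].append((time.time(), upstream_subnet, latency_ratio))

    def fresh_trials(self, subnet):
        """Return only trials recent enough to still be trusted."""
        cutoff = time.time() - FRESHNESS_WINDOW_S
        return [(up, r) for ts, up, r in self._by_subnet[subnet] if ts >= cutoff]

cache = SubnetTrialCache()
cache.publish("203.0.113.0/24", "198.51.100.0/24", 0.42)
# A neighboring client in 203.0.113.0/24 can reuse this trial instead of re-measuring:
print(cache.fresh_trials("203.0.113.0/24"))
```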

8 CONCLUSIONS
In this paper, we proposed the first approach that enables clients to actively measure CDNs and effectively improve their replica selection decisions, without requiring any changes to the CDNs and while respecting CDNs' policies. We have introduced and explored latency valleys: scenarios where replicas suggested to upstream subnets outperform those provided to a client's own subnet. We have asserted that latency valleys are common phenomena, and we have found them across all the CDNs we have tested, in 26%–76% of routes. We showed that valley-prone upstream subnets are easily found from the client, are simple to identify with few and infrequent measurements, and, once found, are persistent over timescales of days.

We have introduced Drongo, a client-side system that leverages the performance gains of latency valleys by identifying valley-prone subnets. Our measurements show that Drongo can improve requests' latency by up to an order of magnitude. Moreover, Drongo achieves this with exceptionally low overhead: a mere 5 measurements suffice for timescales of days. Using experimentally derived parameters, Drongo affects the replica selection of requests made by 69.93% of clients, improving their latency by 24.89% in the median case. Drongo's significant impact on these requests translates into an overall improvement in client-perceived aggregate CDN performance.

REFERENCES
[1] 2016. Akamai. (2016). http://www.akamai.com
[2] 2016. Alexa Top Sites. (2016). https://aws.amazon.com/alexa-top-sites
[3] 2016. Alibaba Cloud CDN. (2016). https://intl.aliyun.com/product/cdn
[4] 2016. Amazon CloudFront – Content Delivery Network (CDN). (2016). https://aws.amazon.com/cloudfront
[5] 2016. Amazon Route 53. (2016). https://aws.amazon.com/route53
[6] 2016. AWS Global Infrastructure. (2016). https://aws.amazon.com/about-aws/global-infrastructure
[7] 2016. Cedexis. (2016). http://www.cedexis.com
[8] 2016. Conviva. (2016). http://www.conviva.com
[9] 2016. The Cost of Latency. (2016). http://perspectives.mvdirona.com/2009/10/the-cost-of-latency
[10] 2016. GoDaddy Premium DNS. (2016). https://www.godaddy.com/domains/dns-hosting.aspx
[11] 2016. Google CDN Platform. (2016). https://cloud.google.com/cdn/docs
[12] 2016. Google Cloud Platform: Cloud DNS. (2016). https://cloud.google.com/dns
[13] 2016. Google Public DNS. (2016). https://developers.google.com/speed/public-dns/docs/using?hl=en
[14] 2016. Netdirekt Content Hosting - CDN. (2016). http://www.netdirekt.com.tr/cdn-large.html
[15] 2016. Neustar DNS Services. (2016). https://www.neustar.biz/services/dns-services
[16] 2016. OpenDNS. (2016). https://www.opendns.com
[17] 2016. RIPE Atlas - RIPE Network Coordination Centre. (2016). https://atlas.ripe.net
[18] 2016. Shopzilla: faster page load time = 12 percent revenue increase. (2016). http://www.strangeloopnetworks.com/resources/infographics/web-performance-and-ecommerce/shopzilla-faster-pages-12-revenue-increase
[19] 2016. Sweden's Greta wants to disrupt the multi-billion CDN market. (2016). https://techcrunch.com/2016/08/30/greta
[20] 2016. Verisign Managed DNS. (2016). http://www.verisign.com/en_US/security-services/dns-management/index.xhtml
[21] 2016. Verizon Digital Media Services. (2016). https://www.verizondigitalmedia.com
[22] 2016. Verizon ROUTE: Fast, Reliable Enterprise-Class Services for Domain Name System (DNS). (2016). https://www.verizondigitalmedia.com/platform/route
[23] Bernhard Ager, Wolfgang Mühlbauer, Georgios Smaragdakis, and Steve Uhlig. 2010. Comparing DNS Resolvers in the Wild. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 15–21. DOI: https://doi.org/10.1145/1879141.1879144
[24] Michael Butkiewicz, Daimeng Wang, Zhe Wu, Harsha V. Madhyastha, and Vyas Sekar. 2015. Klotski: Reprioritizing Web Content to Improve User Experience on Mobile Devices. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 439–453. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/butkiewicz
[25] Matt Calder, Xun Fan, Zi Hu, Ethan Katz-Bassett, John Heidemann, and Ramesh Govindan. 2013. Mapping the Expansion of Google's Serving Infrastructure. In Proceedings of the 2013 Conference on Internet Measurement Conference (IMC '13). ACM, 313–326. DOI: https://doi.org/10.1145/2504730.2504754
[26] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531–537. DOI: https://doi.org/10.1145/2815675.2815717
[27] Matt Calder, Ashley Flavel, Ethan Katz-Bassett, Ratul Mahajan, and Jitendra Padhye. 2015. Analyzing the Performance of an Anycast CDN. In Proceedings of the 2015 ACM Conference on Internet Measurement Conference (IMC '15). ACM, 531–537. DOI: https://doi.org/10.1145/2815675.2815717
[28] F. Chen, R. Sitaraman, and M. Torres. 2015. End-User Mapping: Next Generation Request Routing for Content Delivery. In Proceedings of ACM SIGCOMM '15. London, UK.
[29] Marco Chiesa, Gábor Rétvári, and Michael Schapira. 2016. Lying Your Way to Better Traffic Engineering. In Proceedings of the 2016 ACM CoNEXT (CoNEXT '16). ACM.
[30] ChinaNetCenter. 2016. ChinaNetCenter - Network. (2016). http://en.chinanetcenter.com/pages/technology/g2-network-map.php
[31] Brent Chun, David Culler, Timothy Roscoe, Andy Bavier, Larry Peterson, Mike Wawrzoniak, and Mic Bowman. 2003. PlanetLab: An Overlay Testbed for Broad-coverage Services. SIGCOMM Comput. Commun. Rev. 33, 3 (July 2003), 3–12. DOI: https://doi.org/10.1145/956993.956995
[32] C. Contavalli, W. van der Gaast, D. Lawrence, and W. Kumari. 2016. Client Subnet in DNS Queries. RFC 7871. RFC Editor.
[33] Tobias Flach, Ethan Katz-Bassett, and Ramesh Govindan. 2012. Quantifying Violations of Destination-based Forwarding on the Internet. In Proceedings of the 2012 ACM Conference on Internet Measurement Conference (IMC '12). ACM, 265–272. DOI: https://doi.org/10.1145/2398776.2398804
[34] Aditya Ganjam, Faisal Siddiqui, Jibin Zhan, Xi Liu, Ion Stoica, Junchen Jiang, Vyas Sekar, and Hui Zhang. 2015. C3: Internet-Scale Control Plane for Video Quality Optimization. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). USENIX Association, Oakland, CA, 131–144. https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/ganjam
[35] Cheng Huang, Ivan Batanov, and Jin Li. 2012. A Practical Solution to the client-LDNS Mismatch Problem. SIGCOMM Comput. Commun. Rev. 42, 2 (March 2012), 35–41. DOI: https://doi.org/10.1145/2185376.2185381
[36] Cheng Huang, Angela Wang, Jin Li, and Keith W. Ross. 2008. Measuring and Evaluating Large-scale CDNs (Paper Withdrawn at Microsoft's Request). In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 15–29. http://dl.acm.org/citation.cfm?id=1452520.1455517
[37] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. 2014. A Buffer-based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. In Proceedings of the 2014 ACM Conference on SIGCOMM (SIGCOMM '14). ACM, 187–198. DOI: https://doi.org/10.1145/2619239.2626296
[38] Rupa Krishnan, Harsha V. Madhyastha, Sushant Jain, Sridhar Srinivasan, Arvind Krishnamurthy, Thomas Anderson, and Jie Gao. 2009. Moving Beyond End-to-End Path Information to Optimize CDN Performance. In Internet Measurement Conference (IMC). Chicago, IL, 190–201. http://research.google.com/archive/imc191/imc191.pdf
[39] Zhuoqing Morley Mao, Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang. 2002. A Precise and Efficient Evaluation of the Proximity Between Web Clients and Their Local DNS Servers. In Proceedings of the General Track of the Annual Conference on USENIX Annual Technical Conference (ATEC '02). USENIX Association, Berkeley, CA, USA, 229–242. http://dl.acm.org/citation.cfm?id=647057.759950
[40] Matthew K. Mukerjee, Ilker Nadi Bozkurt, Bruce Maggs, Srinivasan Seshan, and Hui Zhang. 2016. The Impact of Brokers on the Future of Content Delivery. In Proceedings of the 15th ACM Workshop on Hot Topics in Networks (HotNets '16). ACM, 127–133. DOI: https://doi.org/10.1145/3005745.3005749
[41] Ravi Netravali, Ameesh Goyal, James Mickens, and Hari Balakrishnan. 2016. Polaris: Faster Page Loads Using Fine-grained Dependency Tracking. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/netravali
[42] OpenDNS. 2016. A Faster Internet: The Global Internet Speedup. (2016). http://afasterinternet.com
[43] Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose. 1998. Modeling TCP Throughput: A Simple Model and Its Empirical Validation. In Proceedings of the ACM SIGCOMM '98 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM '98). ACM, 303–314. DOI: https://doi.org/10.1145/285237.285291
[44] Vern Paxson. 1996. End-to-end Routing Behavior in the Internet. SIGCOMM Comput. Commun. Rev. 26, 4 (Aug. 1996), 25–38. DOI: https://doi.org/10.1145/248157.248160
[45] Ingmar Poese, Benjamin Frank, Bernhard Ager, Georgios Smaragdakis, and Anja Feldmann. 2010. Improving Content Delivery Using Provider-aided Distance Information. In Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement (IMC '10). ACM, 22–34. DOI: https://doi.org/10.1145/1879141.1879145
[46] Florian Streibelt, Jan Böttger, Nikolaos Chatzis, Georgios Smaragdakis, and Anja Feldmann. 2013. Exploring EDNS-client-subnet Adopters in Your Free Time. In Proceedings of IMC '13 (IMC '13).
[47] Ao-Jan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabián E. Bustamante. 2006. Drafting Behind Akamai (Travelocity-based Detouring). SIGCOMM Comput. Commun. Rev. 36, 4 (Aug. 2006), 435–446. DOI: https://doi.org/10.1145/1151659.1159962
[48] Ao-Jan Su and Aleksandar Kuzmanovic. 2008. Thinning Akamai. In Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC '08). ACM, 29–42. DOI: https://doi.org/10.1145/1452520.1452525
[49] Stefano Vissicchio, Olivier Tilmans, Laurent Vanbever, and Jennifer Rexford. 2015. Central Control Over Distributed Routing. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication (SIGCOMM '15). ACM, 43–56. DOI: https://doi.org/10.1145/2785956.2787497
[50] Xiao Sophia Wang, Arvind Krishnamurthy, and David Wetherall. 2016. Speeding up Web Page Loads with Shandian. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). USENIX Association, Santa Clara, CA, 109–122. https://www.usenix.org/conference/nsdi16/technical-sessions/presentation/wang
