
Challenges in Net Neutrality Violation Detection: A Case Study of Wehe Tool and Improvements

Vinod S. Khandkar and Manjesh K. Hanawal

Industrial Engineering and Operations Research, Indian Institute of Technology Bombay, Mumbai, India

{vinod.khandkar, mhanawal}@iitb.ac.in

Abstract—We consider the problem of detecting deliberate traffic discrimination on the Internet. Given the complex nature of the Internet, deliberate discrimination is not easy to detect, and tools developed so far suffer from various limitations. We study challenges in detecting the violations (focusing on HTTPS traffic) and discuss possible mitigation approaches. We focus on ‘Wehe,’ the most recent tool developed to detect net-neutrality violations. Wehe hosts traffic from all services of interest on a common server and replays it to mimic the behavior of the traffic from the original servers. Despite Wehe’s vast utility and possible influence on policy decisions, its mechanisms are not yet validated by others. In this work, we highlight critical weaknesses in Wehe where its replay traffic is not correctly classified as the intended service by network middleboxes. We validate this observation using a commercial traffic shaper. We propose a new method in which the SNI parameter is set appropriately in the initial TLS handshake to overcome this weakness. Using commercial traffic shapers, we validate that setting the SNI makes the middleboxes classify the replay traffic correctly as the intended traffic. Our new approach thus provides a more realistic method for detecting neutrality violations of HTTPS traffic.

Index Terms—Net neutrality, traffic differentiation, detection tools

I. INTRODUCTION

Net neutrality is a guiding principle promoting the “equal” treatment of all packets over the Internet. However, for economic benefits, ISPs may apply traffic differentiation to a specific service, user, content provider, or any other traffic group on the Internet without making any public declaration. This gives rise to a need for tools that can detect such malicious activities over the Internet.

Traffic differentiation (TD) detection involves the coalescence of many elements such as end-systems (user-client and server), probing traffic generation, expected network responses, and TD detection algorithms. These are interdependent components or operations. Hence, in developing TD detection tools, one faces challenges such as crafting Internet traffic and conditioning the measured network response so that it suits the detection algorithm. Moreover, as HTTPS traffic becomes prevalent, it is unclear what policies the middle-boxes apply for discrimination, as the payload provides no signatures. Our first goal is to study various challenges in designing a reliable TD detection mechanism for HTTPS traffic.

Several methods have been proposed in the literature to detect network neutrality violations, as documented in recent surveys [1], [2]. The literature is rich with various forms of discrimination and its detection, like discrimination of content providers [3], end-users [4], and specific services like BitTorrent [5]. Our interest in this work is discrimination of services, specifically streaming services (both audio and video), which are potential candidates for discrimination due to their commercial value and high bandwidth requirements.

Recently, several tools have been developed to detect discrimination of services, such as NANO [6], ChkDiff [7], Netpolice [8], Shaperprobe [9], Packsen [10], Glasnost [11], Bonafide [12], and Wehe [13]. NANO is based on passive measurements, and all others are based on either active or differential probes. Wehe is the latest tool and overcomes many of the drawbacks of the other tools. However, due to the Internet’s complexity, many issues persist that prevent its use as a reliable tool.

Wehe [13] follows a client-server architecture in which the content of services of interest (YouTube, Netflix, etc.) is copied (or sniffed) from their respective original servers and hosted on a common server (referred to as the replay server). To check if a particular service is discriminated against, the client accesses the content of that service from the replay server. The server transfers the requested data with timing relations that mimic the original service’s data transfer characteristics over the Internet. The client accesses content from the replay server with and without a VPN and compares the two transfers to ascertain if there was any difference in quality.

The basic idea of Wehe is that a connection over a VPN is encapsulated and will not be subjected to any discrimination, as the middleboxes cannot classify the traffic correctly. In contrast, content accessed without a VPN can be classified correctly due to the signatures induced by the timing relations and can be subjected to discrimination policies. Despite the Wehe tool’s vast utility and possible influence on policy decisions, its mechanisms are not yet fully validated by other researchers.

We investigate the performance of Wehe’s differentiation detection on traffic using the HTTPS protocol, which is predominantly used by all services for security reasons and to avoid detection. We noticed that Wehe uses port 80 (HTTP) for all communications. Middleboxes process all the traffic on this port as non-encrypted traffic and may not classify it correctly, as they will not see any signatures (recall that the replayed traffic is HTTPS). On the other hand, when Wehe accesses traffic over port 443 (HTTPS), the traffic is not classified as the intended traffic, as we demonstrate using a commercial traffic shaper. As middleboxes do not see traffic from Wehe’s replay server as coming from the original server, they may not apply the intended traffic discrimination policies. Thus Wehe may fail to detect deliberate discrimination, resulting in false negatives.

arXiv:2102.04196v2 [cs.CY] 24 Oct 2021

To overcome the limitations of Wehe for HTTPS traffic, we propose a new method in which traffic is accessed on port 443, but with a modification in the TLS handshake. In the TLS handshake, we explicitly send the Server Name Indication (SNI) corresponding to the actual service. We demonstrate that the HTTPS traffic gets correctly classified by the middleboxes with this modification. Our method ensures that the middle-boxes classify traffic from replay servers in the same way they classify traffic from the original servers for each service, and apply the same discrimination policies (if any) to traffic from both replay and original servers. Thus, our method makes it possible to detect discrimination of HTTPS traffic in a realistic setting. Our contributions can be summarized as:

1) We study various challenges associated with the detection of discrimination of HTTPS traffic. We categorize these challenges by their source, e.g., protocol and operational environment.

2) We demonstrate limitations of Wehe in the detection of discrimination of HTTPS traffic. Specifically, we demonstrate, using a commercial traffic shaper, that HTTPS traffic is not correctly classified in the Wehe setup.

3) We propose a mechanism in which traffic of services accessed from the replay server is treated as if it originated from the actual servers. Thus traffic from the replay server is also subjected to the same discrimination policies applied to the actual service.

4) We validate, using a commercial traffic shaper, that the replay traffic of our mechanism is treated in the same way as traffic originating from the actual servers.
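The SNI-based idea in the paragraph preceding this list can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the example hostname www.youtube.com, and the choice to disable certificate verification (a replay server would presumably not hold the original service's certificate) are assumptions made only for illustration. The helper client_hello_with_sni builds the first handshake message in memory, which shows that the server name travels in plaintext where a middlebox could inspect it.

```python
import socket
import ssl


def build_replay_context() -> ssl.SSLContext:
    """TLS client context for a test replay server (hypothetical helper).

    Verification is disabled because the replay server is assumed not to
    present the original service's certificate.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False          # must be cleared before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def client_hello_with_sni(service_sni: str) -> bytes:
    """Return the raw ClientHello bytes produced for the given SNI value."""
    ctx = build_replay_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=service_sni)
    try:
        tls.do_handshake()              # writes ClientHello, then needs a reply
    except ssl.SSLWantReadError:
        pass
    return outgoing.read()


def open_replay_connection(replay_ip: str, service_sni: str, port: int = 443):
    """Connect to the replay server's IP while advertising the service's SNI."""
    raw = socket.create_connection((replay_ip, port), timeout=10)
    return build_replay_context().wrap_socket(raw, server_hostname=service_sni)
```

For example, client_hello_with_sni("www.youtube.com") returns bytes that contain the hostname unencrypted, while open_replay_connection would send the same name to whatever replay server address is supplied.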

The paper is organized as follows. Sec. II provides the necessary background and related work. Sec. III describes the identified challenges in developing a measurement setup for TD detection. Sec. IV describes the Wehe tool and its mechanisms in the context of the identified challenges, and Sec. V provides the validation results. Our proposed method is described in Sec. VI, and Sec. VII gives the conclusions.

II. RELATED WORK AND BACKGROUND

As net-neutrality regulations have been hotly debated in many countries, several authors have proposed methods to detect net neutrality violations or deliberate traffic discrimination on the Internet. Recent surveys [1], [2] cover the tools and the techniques they use. The literature is also rich with various forms of discrimination and its detection, like discrimination of content providers [3], end-users [4], and specific services like BitTorrent [5]. Our focus in this work is on detecting discrimination of streaming services (both audio and video), which constitute a major portion of Internet traffic and consume significant bandwidth. We discuss some of the tools and techniques that are most relevant to our work.

Fig. 1. Data replay techniques

A. Existing TD detection tools

There are two commonly used techniques for detecting TD in users’ Internet traffic. One type of approach passively monitors traffic [6]. In such cases, the TD result is not immediately available to the user. Instead, the tool provides the aggregated result of traffic differentiation over the given ISP. Another type of detection technique uses specially crafted probing traffic, called active probes. It analyses the network response to the probing traffic to detect any anomaly. [6]–[10] describe measurement setups based on such active probing. They use traffic parameters such as packet loss, latency, packet sequence, or pattern to identify network operation characteristics or detect anomalies.

Some tools use multiple types of probing traffic, called active differential probes. While one probing traffic type undergoes standard network middle-box processing, the other probing traffic type is supposed to evade any traffic differentiation by not being amenable to middle-box processing. Typically, in such differential probe methods, one traffic type mimics the characteristics of the traffic of services accessed from the original server (like YouTube, Netflix). The other is a reference or control traffic. In this method, network responses to both traffic streams are compared to check if the performance of either stream suffered against the other. [11]–[13] are examples of such probing techniques. The differential probe method uses a common server to store the content of services of interest, and this content is collected from the original servers. The common server then replays the content and mimics the characteristics that original servers impart in the transfer of their content, like maintaining timing relationships between data packets. In the differential probe, traffic from the replay server must be treated in the same way as traffic from the original servers. When it comes to HTTPS traffic, the current tools do not ensure that traffic from replay servers is treated as if it originated from an actual server, and our work fills this gap.

B. Traffic replay mechanisms

The traffic replay mechanism mimics the client and server-side behavior for a given application data exchange and the underlying protocol. There are many traffic replay tools available. Tcpreplay [22] is one such replay tool that mimics the transport layer behavior for a given stream of transport layer packets. Another example of a layer-specific replay is Flowreplay, which runs at the application layer. The layer-specific replay tools are often protocol dependent. The RolePlayer technique proposed in [23] is capable of replaying application layer data in a protocol-independent manner. The selection of the replay layer (refer to Fig. 1) is crucial, as it affects the receiver side’s data collection as well as the expected network response. The TCP layer replay adversely affects the traffic analysis, as it requires special permission to collect traffic data for analysis.

(a) Using internet browser (b) Using user-client

Fig. 2. Traffic classification of Gmail video play traffic with direct browser accesses and from user-client (TLS handshake not same as browser based access)

III. CHALLENGES IN TD DETECTION MEASUREMENT SETUP DEVELOPMENT

A setup for detecting TD primarily consists of a probing traffic generator, a traffic data capturing system, and a TD detection engine. In this section, we describe challenges in engineering each of these components.

A. Network responses

The network response to the probing traffic is a fundamental input to the TD detection mechanism. The type of network response depends on the underlying methodology of the tool. Even once the methodology is fixed, the expected response from the network changes with the network configuration. Often, network nodes do not respond as expected to network management messages or do not classify the probing traffic in a specific manner. This happens either because the associated Internet standard allows deviations from the typical response, or because of proprietary network resource management policies over which Internet standards have no control.

Efficient network resource management is crucial for networks. It is exercised through various network middle-boxes across networks using traffic management practices (TMPs). These devices use either Deep Packet Inspection (DPI) [24] based or Deep Flow Inspection (DFI) [25] based traffic classification for applying TMPs.

Fig. 3. Traffic classification validation

We validated the traffic classification of traffic flows from the YouTube server using a commercial traffic shaper, as shown in Fig. 3. Our validation setup consists of a user-client capable of downloading the content from a given service’s original server (e.g., Gmail server). The traffic shaper is placed in-line between the user-client and the YouTube server for validating the classification of the data traffic flow. The classification of the traffic flow, both when using the user-client and when accessing it with an Internet browser, is recorded using the commercial traffic shaper’s user interface.

Fig. 2 shows the outcome of the experiments as visible in the commercial traffic shaper’s user interface. The experiments are performed using an Internet browser and using a user-client whose TLS handshake is not the same as that performed by the Internet browser. The traffic shaper’s user-interface window shows the Internet traffic activity for the last 10 minutes in the left window and the service-wise classification of Internet traffic in the right window. The commercial traffic shaper successfully detects the Gmail traffic for experiments using the Internet browser (Fig. 2(a)). However, it could not detect any Gmail traffic for the experiment using the user-client. Instead, the traffic shaper wrongly detects Gmail’s traffic as YouTube traffic in the latter experiment (Fig. 2(b)).

This indicates that the commercial traffic shaper classifies the Internet traffic correctly when it is accessed using an Internet browser. However, classification based on traffic characteristics alone is inconsistent and sometimes wrong. As mentioned earlier, the traffic classification governs the network response. Thus generating a network response similar to that of the original service is a challenging task.

B. TD Detection

The TD detection algorithm is the core engine of the measurement setup for TD detection. Most of the time, it needs a specific type of input, derived from the observed network response, for its proper operation. The average throughput curves of probing traffic, or specific traffic characteristics of a sequence of network management response packets such as inter-packet times, are examples of such input. TD detection involves comparing such network responses across different traffic streams. However, these network responses may vary across services for reasons other than deliberate traffic differentiation.

The end-to-end connection between client and server for Internet services is not dedicated. The best-effort nature of IP layer packet forwarding results in packets from the same traffic stream taking different paths with different congestion environments. The performance comparison of streams experiencing different congestion is not reliable.

Internet services employ various mechanisms to cope with the fluctuation in available bandwidth to provide a seamless end-user experience. Dynamic adaptive streaming over HTTP (DASH) is one such technique that modifies traffic characteristics such as speed, or content characteristics such as coding rate. Each streaming service uses techniques tailored to its requirements, and these are proprietary [27]. Measurement setups such as passive monitoring systems face the challenge of normalizing the performance of various streaming services for the differences in their bandwidth fluctuation coping techniques. While active probing methods overcome this issue using customised DASH or traffic replay, their probing traffic tends to saturate the available bandwidth, similar to point-to-point (p2p) traffic. Such traffic streams may lose their relevance as original service traffic.

The network middle-boxes apply QoS-based traffic management practices (TMPs) to efficiently allocate network resources across different types of services while maintaining their bandwidth requirements. Thus different types of services may be allocated different bandwidths depending on the current network load, and one has to take into account various confounding factors.

Fig. 4 shows the effect of various confounding factors on the performance of Internet services (named A, B, C, D). In the figure, the total height of each bar (consisting of the colored and grayed portions) shows the throughput for each streaming service as governed by the transmission strategy (DASH in the case of streaming services) of its server. The colored portion (other than grey) shows the throughput experienced at the client side. The grey portion is the throughput lost in the network: the upper part of the grayed portion shows the throughput lost due to network congestion, and the lower grayed portion shows that lost due to network traffic management. Note that the server of each stream could be sending the traffic at a different rate with its own proprietary DASH, and that the rate received by user-clients depends on the different levels of degradation the streams experience in the network. The white portion in the throughput performance bar of service D shows the throughput lost due to deliberate degradation. Identifying this discrimination of stream D just by comparing the throughput experienced at the user-client will be misleading due to non-uniformity in the transmission rates at the source and network effects. We validated this aspect using three different streaming services, namely Netflix, YouTube, and PrimeVideo. We plotted the running average throughput of the traffic of these services, as shown in Fig. 5. It can be seen that each service fetches data differently depending on its congestion environment and its server’s data transmission scheme, i.e., DASH. Thus the performances of these services are not comparable without normalising the effect of confounding factors.

Fig. 4. Variation in performance of services while traversing the Internet

Fig. 5. Running average throughput for streaming services

Normalising the effects of all confounding factors poses a challenge for TD detection based on comparing network responses across services.
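The running average throughput curves mentioned for Fig. 5 can be computed from a packet log in a few lines. The sketch below is only an illustration under assumed inputs: the function name and the (timestamp, bytes) sample format are hypothetical and are not taken from the measurement setup described here.

```python
def running_average_throughput(samples):
    """Cumulative average throughput over time.

    samples: list of (timestamp_seconds, payload_bytes), sorted by time
             (an assumed log format).
    Returns a list of (elapsed_seconds, average_bits_per_second), where the
    average is taken over everything received since the first sample.
    """
    if not samples:
        return []
    start = samples[0][0]
    total_bytes = 0
    curve = []
    for timestamp, nbytes in samples:
        total_bytes += nbytes
        elapsed = timestamp - start
        if elapsed > 0:                 # skip the zero-length first interval
            curve.append((elapsed, total_bytes * 8 / elapsed))
    return curve
```

Plotting such curves for several services side by side shows how differently each one fetches data, which is why a direct comparison across services needs normalisation.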

C. Other challenges

Internet services use a specific port number for communication, as per the port reservations defined in Internet standards [28], e.g., port 80 for HTTP traffic and port 22 for SSH (Secure Shell) traffic. Thus the port number used in the transmission of probing data plays a vital role in traffic classification by network middle-boxes [29]. Sending the correct data on the pre-assigned port number for a given service, and making it as authentic as the original service’s traffic stream, is a challenging task. It requires a thorough understanding of network traffic classification on that port.

IV. CASE STUDY: WEHE - TD DETECTION TOOL FOR MOBILE ENVIRONMENT

Wehe [13] is a tool for the detection of traffic differentiation over mobile networks (e.g., cellular and WiFi). It is available as an app on the Android and iOS platforms. The user database of the Wehe tool consists of 126,249 users across 2,735 ISPs in 183 countries/regions, generating 1,045,413 crowd-sourced measurements. A European national telecom regulator, the US FTC and FCC, US Senators, and numerous US state legislators have used the Wehe tool’s findings.

The tool supports TD detection for many popular services such as Netflix and YouTube. The tool runs TD detection tests by coordinating with its server, called the “replay server”. The replay server keeps track of active users and maps replay runs to the correct user’s service.

A. Traffic generation

Wehe uses the “record-and-replay” method for generating probing traffic. The user-client exchanges the probing traffic with the replay server as per the replay script, which captures the application-level traffic behavior, including the port number, data sequence, and timing dependencies, from the original service’s network logs. Preserving timing is a crucial feature of Wehe’s approach. It expects network devices to use this information when no other means of classifying applications is available, e.g., for HTTPS encrypted data transfer.
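To make the timing-preservation idea concrete, the following Python sketch converts a recorded trace into a replay schedule of inter-message delays. It is a simplified illustration, not Wehe's replay script format: the trace layout (timestamp, sender, payload) and the function name are assumptions.

```python
def replay_schedule(trace):
    """Turn absolute timestamps into per-message send delays.

    trace: list of (timestamp_seconds, sender, payload) in recorded order
           (an assumed trace layout).
    Returns a list of (delay_seconds, sender, payload); a replayer would wait
    delay_seconds after the previous message before sending the next one.
    """
    schedule = []
    previous = None
    for timestamp, sender, payload in trace:
        # First message is sent immediately; never schedule a negative wait.
        delay = 0.0 if previous is None else max(0.0, timestamp - previous)
        schedule.append((delay, sender, payload))
        previous = timestamp
    return schedule
```

A replay loop would then call time.sleep(delay) before each send on whichever side (client or server) owns that message, so the gaps between messages match the recording.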

The Wehe tool uses two types of probing traffic streams. While one stream (original replay) is the same as the original service’s application-level network trace, the other traffic stream (control replay) differs substantially from the first. In one approach, Wehe uses a VPN channel to send the second probing traffic stream. This approach uses the meddle VPN [30] framework for data transfer and server-side packet capture. Another approach uses the bit-reversed version of the first traffic stream, sent on the same channel. Currently, Wehe uses the latter approach due to its superior results.
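A control payload of the second kind can be derived from the original payload with a one-line transformation. The sketch below assumes that "bit-reversed" means flipping every bit of every byte; the text does not spell out the exact operation, so this interpretation and the function name are assumptions.

```python
def invert_bits(payload: bytes) -> bytes:
    """Flip every bit of the payload (0 becomes 1 and 1 becomes 0).

    The result has the same length as the input, so packet sizes are kept,
    but any plaintext keywords a classifier might match on are destroyed.
    """
    return bytes(byte ^ 0xFF for byte in payload)
```

Because the operation is its own inverse, applying invert_bits twice returns the original payload, and the control stream keeps the same message sizes as the original replay.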

B. Over the network response

Wehe is a differential detector tool that compares the network responses for two types of traffic streams generated by the tool. The service-specific information present in the original replay allows network devices with Deep Packet Inspection (DPI) capability to identify and classify the service correctly. So, the original replay’s traffic performance over the Internet closely resembles that of the original application traffic on the same network. While the original replay is exposed to detection by network devices, the control replay, i.e., the traffic stream with bit-reversed data, is “not detectable” for classification. Thus it is expected that the control replay traffic evades content-based, application-specific traffic differentiation. The performances of two such traffic streams (detectable and non-detectable) differ if network devices apply different traffic management or traffic differentiation to each traffic stream as per content-based classification.

C. TD detection scenario expectations

Wehe compares the throughput performances of the original replay and the control replay to detect TD. The methodology uses throughput as the comparison metric due to its sensitivity to bandwidth-limiting traffic shaping. However, the tool expects that the TD detection algorithm does not detect throughput-based TD for traffic streams with traffic rates below the shaping rate. The rationale is that the shaper cannot affect the performance of such an application stream.
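One way to compare the throughput samples of the two replays is a two-sample Kolmogorov-Smirnov statistic, i.e., the largest gap between their empirical distribution functions. The text above does not specify which statistical comparison is used, so the choice of this statistic, the function name, and the input format are assumptions of this sketch.

```python
def ks_statistic(original, control):
    """Largest gap between the empirical CDFs of two throughput samples.

    original, control: non-empty lists of throughput values (e.g., Mbit/s).
    Returns a value in [0, 1]; 0 means identical distributions, 1 means the
    samples do not overlap at all.
    """
    a, b = sorted(original), sorted(control)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap
```

A detector built on this would flag differentiation only when the statistic is large, and, following the expectation stated above, only when the replay's own sending rate exceeds the suspected shaping rate.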

Often, both traffic streams are affected by other factors such as signal strength and congestion. These create irregularities in the received performance due to bandwidth volatility, which is reported to lead to incorrect differentiation detection. The tool performs multiple test replays to overcome the effect of bandwidth volatility.

D. Operational requirements

The Wehe server needs a side-channel for each client to associate it with precisely one app replay. This side-channel supplies information about replay runs to the server. Each user directly connected to the Wehe replay server is uniquely identifiable on the server side by the IP address of its side-channel, which maps each replay to exactly one app.

The other operational requirement is that the Wehe client-server communication uses customized socket connections with specific keep-alive behavior. Sometimes, the use of translucent proxies by the user-client modifies this behavior. The replay server copes with this situation by handling such unexpected connections. Protocol-specific proxies, e.g., an HTTP proxy, connect the user-client to the server through themselves on specific port numbers, e.g., 80/443 for HTTP/HTTPS. Nevertheless, they allow the user-client to connect to the server directly for connections using other protocols. The side-channels of Wehe do not use HTTP/HTTPS connections. So the IP address for the same user differs between the side-channel and the replay runs. The Wehe server detects such connections and indicates them to the user-client using a special message. The special message triggers the exchange of further communication with a customized header.

E. Challenges of validating Wehe

The TD mechanism of Wehe is straightforward to use, but the requirements change when the tool itself is being validated. The validation process may need to launch only one type of replay for different services during one test, or may need to launch all replays in parallel. These requirements are unrelated to TD detection, Wehe’s primary goal, so they are understandably not supported. Hence validating Wehe’s working in such scenarios needs a specific client-server setup. Here the challenge is to isolate the Wehe mechanism relevant to the intended scenario so that the resulting system still mimics Wehe’s actual behavior.

Moreover, Wehe does not provide error/failure notifications in all scenarios. Instead, it prompts the user to reopen the app. As a result, the validation setup loses vital feedback information regarding the error/failure induced by its validation scenario.

V. SHORTCOMING OF WEHE ON HTTPS TRAFFIC

Our study focuses on validating the network responses for the replayed traffic streams, TD detection, and operational feasibility in various network configurations. While operational feasibility is validated using the publicly available “Wehe” Android app on the Google Play Store, TD detection scenarios are validated using theoretical arguments. The validation of network responses requires bandwidth analysis of the received traffic stream. This analysis requires the network logs for the specific replay performed as per the validation scenario. A replay done on the device while multiple other streaming services run in parallel is one such scenario. The Wehe app does not immediately provide such network logs for the replays after the completion of tests. So, we implemented a user-client and server that mimic the behavior of the Wehe tool.

(a) Using internet browser (b) Using user-client

Fig. 6. Traffic classification of replayed YouTube traffic

We use a client-server setup similar to the setup shown in Fig. 3 for validating Wehe. In the current setup, we replaced the original service’s server with the replay server. The user-client and replay server are connected through a commercial traffic shaper. Moreover, our setup has a provision to perform multiple replays in parallel. Our validation setup does not need administrative channels and overheads, e.g., side-channels. Our server only ever needs to support a single user-client. The validation of scenarios with multiple clients uses the Wehe App directly because those scenarios require no associated traffic analysis.

The remainder of this section describes the results of the validation.

A. Original service’s traffic emulation

The network responses depend on the application of network policies based on correct classification of the probing traffic by network middle-boxes, as mentioned in Sec. III-A. We validated the classification of Wehe’s emulated traffic using a commercial traffic shaper. The classification of emulated traffic is observed using the user interface of the commercial traffic shaper.

To perform the experiment, the YouTube application data is replayed from the replay server to the user-client through the commercial traffic shaper. During data transfer, the commercial traffic shaper’s user-interface window is observed for the presence of YouTube traffic. We also accessed YouTube traffic using an internet browser and observed the traffic shaper’s classification outcome to baseline our traffic classification observations.

Fig. 6 shows the traffic classification outcome as shown by the commercial traffic shaper’s user-interface window for traffic accessed directly using an internet browser from a YouTube server. It shows the internet activity in the left window and the classification of the corresponding traffic in the right window.

Fig. 6(a) shows that traffic accessed using an internet browser is correctly classified as YouTube. This is in line with the commercial traffic shaper’s intended behaviour.

Fig. 6(b) shows the traffic classification outcome for traffic accessed using the user-client. It shows no evidence of YouTube traffic being transferred over the communication link. Moreover, it classifies the same traffic as HTTPS traffic. The outcome of this experiment shows that not all network middle-boxes can correctly classify Wehe’s replayed traffic.

B. Effect of data rate in replay script

The probing traffic generation impacts the network response as expected by the TD detection algorithm. Wehe uses the traffic trace from the original service for generating replay scripts that preserve the application data and its timing relationship. This replay script is used over the original network and also on networks that are differently geo-located. As the traffic shaping rate varies across networks for the same service (as mentioned in [32]), the traffic rate preserved in the replay script can be different from the traffic shaping rate of the currently considered network. In particular, the replay traffic rate can be lower than the traffic shaping rate.

The Wehe methodology does not detect traffic differentiation if the replay script’s traffic rate is lower than the network’s shaping rate, as the shaper then does not affect the traffic stream. Such replay scripts can never detect traffic shaping on such networks. Thus the Wehe tool’s TD detection capability is limited by the replay script’s probing traffic rate.
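The argument above can be illustrated with a minimal sketch of a token-bucket shaper. This is an illustration under stated assumptions, not Wehe's code or the behaviour of any particular commercial shaper: the function names `shape` and `max_delay`, the bucket size, and the packet sizes and rates are all hypothetical values chosen for the example.

```python
def shape(packets, rate_bps, burst_bytes):
    """Token-bucket shaper (illustrative model).

    packets: list of (arrival_time_s, size_bytes), sorted by arrival time.
    Returns the departure time of each packet, served in FIFO order.
    Assumes every packet is no larger than burst_bytes.
    """
    tokens = burst_bytes   # bucket starts full
    clock = 0.0            # time at which `tokens` was last updated
    departures = []
    for arrival, size in packets:
        start = max(arrival, clock)
        # Refill tokens for the time elapsed since the last departure.
        tokens = min(burst_bytes, tokens + (start - clock) * rate_bps / 8)
        if tokens < size:
            # Not enough tokens: wait until the deficit has been refilled.
            start += (size - tokens) * 8 / rate_bps
            tokens = size
        tokens -= size
        clock = start
        departures.append(start)
    return departures


def max_delay(packets, departures):
    """Largest queueing delay added by the shaper, in seconds."""
    return max(d - a for (a, _), d in zip(packets, departures))


# Hypothetical replay script: one 1250-byte packet every 10 ms (1 Mbit/s).
replay = [(i * 0.01, 1250) for i in range(100)]

# Shaping rate above the replay rate: the stream passes untouched.
unshaped = max_delay(replay, shape(replay, 2_000_000, 3000))
# Shaping rate below the replay rate: packets are delayed.
shaped = max_delay(replay, shape(replay, 500_000, 3000))
print(unshaped, shaped)
```

In this model the 1 Mbit/s replay sees no added delay behind a 2 Mbit/s shaper, so a comparison between original and control replays has nothing to detect, whereas the same replay behind a 0.5 Mbit/s shaper is visibly delayed.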

C. Usage of port number 80

The network responses are driven by the network middle-boxes’ perception of the probing traffic (refer to Sec. III-A). The replay script preserves the data in the applications’ original network trace. The original application servers use port 80 for plain-text data and port 443 for encrypted data transfer. The Wehe replay script directly uses the encrypted data from the application’s network trace and transmits it on port 80. In such cases, Wehe expects its original replay traffic stream to be classified correctly by network devices using the encrypted application data. This is impossible for such data on port 80, as encrypted traffic data cannot expose its identity to the network device. Thus the Wehe tool cannot generate the required traffic streams for services running on port 443 due to its default use of port 80 for replay runs.

D. Traffic load governed network behavior

Note that scarcity of resources prompts networks to apply certain network traffic management, especially in heavy network load scenarios, that is beneficial for all active services on the network, e.g., QoS-based traffic management (refer to Sec. III-A). We validated the effect of such traffic management on the performance of both control and original replays. The validation uses the following three scenarios:

• Replaying only Wehe’s two traffic streams without any load on the network (Fig. 7(a))

• Replaying Wehe’s three traffic streams with one additional streaming service running in parallel (Fig. 7(b))

• Replaying Wehe’s three traffic streams with two additional streaming services running in parallel (Fig. 7(c))

(a) Only Wehe (b) Wehe plus one service (c) Wehe plus two services

Fig. 7. Effect of network load on Wehe’s traffic stream performances

Fig. 7(a) shows that the traffic streams generated by the Wehe tool perform the same when there is no additional network load. As network load increases, the performance of the control replay deviates from that of the original replay, toward the higher side (Fig. 7(b)). While the performance of the control replay deviates further from the original replay, now on the lower side, the two original replays still show similar performance, as shown in Fig. 7(c). This invalidates the Wehe tool’s expectation that the control replay is not differentiated. It also invalidates the tool’s claim of detecting TD due to total bandwidth.

E. Wehe tests from multiple devices within the same sub-net

The side-channels are introduced in the Wehe design to support multiple user-clients simultaneously. Side channels also assist in identifying the mapping between a user-client and a combination of IP addresses and ports. This is useful in the case of networks using NATs [33]. We validated Wehe’s support for multiple clients and NAT-enabled networks using two different tests. First, we connected two user-clients from within the same subnet, i.e., clients sharing the same public IP address. In one test, the Wehe tool tests the same service on both devices, e.g., the Wehe App on both devices tests for YouTube. The result shows that the Wehe test completed on only one device while the Wehe App closed abruptly on the other device. We repeated the same scenario, but this time Wehe tests different services, e.g., Wehe on one device tests YouTube while the other tests Netflix. We found that the Wehe test on one device completes properly while the Wehe test on the other device shows an error on the screen, informing the user that another client is already performing the test, as shown in Fig. 9. These tests show that Wehe does not support multiple devices if they share the same IP address. While a side-channel is useful to identify each replay from a user-client connected directly to the Wehe replay server, it is not useful in networks using NAT devices. Multiple users share the same IP address in the case of NAT. In such cases, the side channel cannot uniquely map each replay run to a client. This limits the usage of Wehe to only one active client per replay server, ISP, and application. This limitation is documented by the Wehe developers as well.
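The collision described above can be sketched with a toy model of a server that tracks active replays by public IP address only. This is a simplified illustration of the observed behaviour, not Wehe's actual server logic; the class name `IpKeyedRegistry`, the client labels, and the addresses (taken from the documentation range 203.0.113.0/24) are hypothetical.

```python
class IpKeyedRegistry:
    """Toy model: active replays are tracked by the client's public IP only."""

    def __init__(self):
        self.active = {}  # public IP -> client currently running a replay

    def start_replay(self, public_ip, client_id):
        """Return True if the replay may start, False if the IP is taken."""
        owner = self.active.get(public_ip)
        if owner is not None and owner != client_id:
            # A different client behind the same NAT is already testing.
            return False
        self.active[public_ip] = client_id
        return True


registry = IpKeyedRegistry()
first = registry.start_replay("203.0.113.7", "phone-A")   # behind the NAT
second = registry.start_replay("203.0.113.7", "phone-B")  # same public IP
other = registry.start_replay("203.0.113.99", "phone-C")  # different network
print(first, second, other)
```

In this model the second device behind the same NAT is rejected even though it is a distinct client, which mirrors the "another client is already performing the test" outcome reported above.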

F. Effect of device network load on TD detection

Wehe’s replay server uses the same timings between application data transfers as the original application traffic. Such a transmission strategy is expected not to exhaust the available bandwidth. Hence the effect of source rate modulation due to the traffic rate overshooting the available bandwidth is unlikely. As a result, original and control replays show similar traffic performance unless deliberately modified by network policies, i.e., TD.

Nevertheless, this expectation is not always satisfied, as the traffic data rate is modulated by the network load at the user device while performing Wehe tests. Such perturbations create discrepancies, as the effect of the time-varying network load on the probing traffic is itself time-varying and may not always be the same. Because Wehe replays back-to-back, the two probing traffic streams (original and control replay) are affected differently by the current network load. Under such network load on the device side, the notion of services not exhausting available bandwidth ceases to hold, along with its benefits for TD detection. It is necessary to normalise such confounding factors (refer to Sec. III-B) before TD detection.

VI. TD DETECTION OF HTTPS TRAFFIC

As described in Sec. V, the Wehe tool suffers from a critical weakness: its original replay traffic does not achieve the intended traffic classification because most modern streaming services are encrypted, as demonstrated in Sec. V-A. In this section, we propose a method to overcome this shortcoming and make the tool useful for detecting discrimination of HTTPS traffic.

Payload-based traffic classification techniques do not apply to encrypted traffic. Commercial traffic shapers primarily use the Server Name Indication (SNI)-based traffic classification technique to overcome the difficulty of classifying encrypted traffic. It is used in a non-collaborative manner [29] with other classification techniques due to its higher accuracy [31].

(a) Gmail video play (b) YouTube replay traffic

Fig. 8. Traffic classification of Gmail video play and YouTube replay traffic with appropriate SNI

Fig. 9. Wehe apps running on multiple devices within the same subnet and testing different service

Fig. 10. TLS handshake sequence

Transport Layer Security (TLS) [26] is an Internet protocol that provides channel security for transport-layer communication. The typical TLS handshake is shown in Fig. 10. SNI is a parameter in the initial TLS handshake message named “ClientHello”. It is used on the server side to select service-specific SSL certificates; thus, it uniquely identifies the service. As “ClientHello” is the first handshake message, its contents, including SNI, are unencrypted. Almost all streaming services use TLS for secure channel establishment, which makes SNI-based traffic classification applicable to almost all streaming services. SNI is therefore the primary parameter used by traffic shapers to classify traffic exchanged over the public internet on the standard HTTPS port 443, i.e., encrypted traffic.
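To make the role of SNI concrete, the following minimal sketch uses Python's standard `ssl` module to generate a ClientHello in memory and then reads the server name back out of the raw bytes, as an on-path device could. The helper names `build_client_hello` and `extract_sni` are hypothetical, certificate verification is disabled only because a replay server would not hold the original service's certificate (an assumption of this sketch), and the parser assumes the whole ClientHello fits in one TLS record.

```python
import ssl


def build_client_hello(sni_hostname):
    """Return the raw bytes of a ClientHello carrying the given SNI."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Assumption for a replay setup only: the server cannot present the
    # original service's certificate, so verification is switched off.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    incoming = ssl.MemoryBIO()  # bytes from the server (none in this sketch)
    outgoing = ssl.MemoryBIO()  # bytes the client wants to send
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=sni_hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: ClientHello is written, client now awaits a reply
    return outgoing.read()


def extract_sni(record):
    """Parse the server_name extension from a single-record ClientHello."""
    # Record header: type(1) version(2) length(2); handshake: type(1) len(3).
    if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
        return None
    pos = 9 + 2 + 32                                        # version + random
    pos += 1 + record[pos]                                  # session_id
    pos += 2 + int.from_bytes(record[pos:pos + 2], "big")   # cipher_suites
    pos += 1 + record[pos]                                  # compression
    end = pos + 2 + int.from_bytes(record[pos:pos + 2], "big")
    pos += 2
    while pos + 4 <= end:
        ext_type = int.from_bytes(record[pos:pos + 2], "big")
        ext_len = int.from_bytes(record[pos + 2:pos + 4], "big")
        body = record[pos + 4:pos + 4 + ext_len]
        if ext_type == 0:  # server_name extension
            # body: list length(2), name type(1), name length(2), name
            name_len = int.from_bytes(body[3:5], "big")
            return body[5:5 + name_len].decode("ascii")
        pos += 4 + ext_len
    return None


hello = build_client_hello("www.youtube.com")
print(extract_sni(hello))
```

No key material is needed to recover the name, which is why a middle-box can classify the flow from the first client packet alone.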

Before demonstrating the usability of SNI in the classification of encrypted Wehe replay traffic, we first validated the importance of the SNI parameter in the classification of traffic from YouTube’s original server. We used the setup shown in Fig. 3. We repeated the experiment in Sec. III-A with the user-client performing the TLS handshake using the appropriate value of the SNI parameter extracted from the original service’s network logs. Fig. 8(a) shows the outcome of the experiment: the correct classification of Gmail traffic. Note that the same traffic was wrongly classified as YouTube traffic when transferred without SNI (refer to Fig. 2(b)).

Based on the above validation, we propose to use SNI in Wehe’s replay traffic to overcome its failure to achieve correct replay traffic classification. To validate this point, we repeated the experiment in Sec. V-A with the appropriate SNI (extracted from the original YouTube service’s traffic) used in the TLS handshake. We used the same experiment setup as in Sec. V. Fig. 8(b) shows the outcome of this experiment. As seen, the traffic shaper correctly detects the YouTube replay traffic. This observation establishes that using service-specific SNI in TLS handshake messages leads to the correct classification of replay traffic.
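The difference between the two replay outcomes can be sketched with a toy SNI-based classifier. The rule table and the fallback label below are illustrative assumptions, not the rules of the commercial traffic shaper used in the experiments; `classify` and `RULES` are hypothetical names.

```python
from typing import Optional

# Hypothetical suffix rules; a real shaper's signature set is not public.
RULES = {
    "youtube.com": "YouTube",
    "googlevideo.com": "YouTube",
    "netflix.com": "Netflix",
}


def classify(sni: Optional[str]) -> str:
    """Map an SNI value to a service label, or fall back to generic HTTPS."""
    if sni:
        host = sni.lower().rstrip(".")
        for suffix, service in RULES.items():
            # Match the domain itself or any subdomain of it.
            if host == suffix or host.endswith("." + suffix):
                return service
    return "HTTPS"


with_sni = classify("www.youtube.com")  # replay that sets the service's SNI
without_sni = classify(None)            # replay that sends no SNI
print(with_sni, without_sni)
```

Under these assumed rules, a replay that carries the service-specific SNI is labelled as the service, while a replay without SNI falls through to the generic label, analogous to the outcomes in Fig. 8(b) and Fig. 6(b).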

The Wehe tool employing our new SNI-based mechanism to emulate the original service’s traffic can detect discrimination of HTTPS traffic and also work with a larger set of network middle-boxes, including those that fail to classify replay traffic correctly based on traffic characteristics alone.

VII. CONCLUSION

Net neutrality violation detection is the need of the hour. As many ISPs are also content providers these days, they compete with each other, which can lead to one deliberately discriminating against the services of another to gain market share. However, users should have the freedom to choose services as per their wishes. Our work considered various challenges in the detection of traffic discrimination in HTTPS traffic.

As a case study, we validated Wehe, one of the latest tools available to detect traffic differentiation. The described challenges helped us divide the entire tool into multiple interdependent components and validate them independently. Our validation using a commercial traffic shaper revealed that traffic in the Wehe setup may not mimic the characteristics of HTTPS traffic accessed from the original servers. Hence, middle-boxes may not subject it to the intended discrimination. Thus, Wehe may not detect discrimination of HTTPS traffic. Our new method, which uses the appropriate SNI parameter value in the initial TLS handshake message, overcomes this shortcoming. Hence our work provides a mechanism to detect a wide range of possible discriminations on the Internet.


REFERENCES

[1] T. Garrett, L. E. Setenareski, L. M. Peres, L. C. E. Bona, and E. P. Duarte, “Monitoring network neutrality: A survey on traffic differentiation detection,” IEEE Communications Surveys & Tutorials, vol. 20, no. 3, pp. 2486–2517, 2018.

[2] V. Nguyen, D. Mohammed, M. Omar, and P. Dean, “Net neutrality around the globe: A survey,” in 2020 3rd International Conference on Information and Computer Technologies (ICICT), 2020, pp. 480–488.

[3] Ramneek, P. Hosein, W. Choi, and W. Seok, “Detecting network neutrality violations through packet loss statistics,” in 2015 17th Asia-Pacific Network Operations and Management Symposium (APNOMS), 2015, pp. 404–407.

[4] D. Li, F. Tian, M. Zhu, L. Wang, and L. Sun, “A novel framework for analysis of global network neutrality based on packet loss rate,” in 2015 International Conference on Cloud Computing and Big Data (CCBD), 2015, pp. 297–304.

[5] M. Dischinger, A. Mislove, A. Haeberlen, and K. P. Gummadi, “Detecting bittorrent blocking,” in Proceedings of the 8th ACM SIGCOMM Conference on Internet Measurement, Oct. 2008, pp. 3–8.

[6] M. B. Tariq, M. Motiwala, N. Feamster, and M. Ammar, “Detecting network neutrality violations with causal inference,” in Proceedings of the 5th International Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’09. New York, NY, USA: Association for Computing Machinery, 2009, pp. 289–300. [Online]. Available: https://doi.org/10.1145/1658939.1658972

[7] R. Ravaioli, G. Urvoy-Keller, and C. Barakat, “Towards a general solution for detecting traffic differentiation at the internet access,” in 2015 27th International Teletraffic Congress, Sep. 2015, pp. 1–9.

[8] Y. Zhang, Z. M. Mao, and M. Zhang, “Detecting traffic differentiation in backbone isps with netpolice,” in Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, ser. IMC ’09. New York, NY, USA: ACM, 2009, pp. 103–115. [Online]. Available: http://doi.acm.org/10.1145/1644893.1644905

[9] P. Kanuparthy and C. Dovrolis, “Shaperprobe: End-to-end detection of isp traffic shaping using active methods,” in Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement Conference, Nov. 2011, pp. 473–482.

[10] U. Weinsberg, A. Soule, and L. Massoulie, “Inferring traffic shaping and policy parameters using end host measurements,” in Proceedings of IEEE INFOCOM, 2011.

[11] M. Dischinger, M. Marcon, S. Guha, K. Gummadi, R. Mahajan, and S. Saroiu, “Glasnost: Enabling end users to detect traffic differentiation,” in 7th USENIX Conf. Networked Systems Design and Implementation, Apr. 2010, pp. 27–27.

[12] V. Bashko, N. Melnikov, A. Sehgal, and J. Schonwalder, “Bonafide: A traffic shaping detection tool for mobile networks,” in 2013 IFIP/IEEE International Symposium on Integrated Network Management (IM 2013), 2013, pp. 328–335.

[13] A. Molavi Kakhki, A. Razaghpanah, A. Li, H. Koo, R. Golani, D. Choffnes, P. Gill, and A. Mislove, “Identifying traffic differentiation in mobile networks,” in Proceedings of the 2015 Internet Measurement Conference, Oct. 2015, pp. 239–251.

[14] R. Narisetty and D. Gurkan, “Identification of network measurement challenges in openflow-based service chaining,” in 39th Annual IEEE Conference on Local Computer Networks Workshops, 2014, pp. 663–670.

[15] E. Balestrieri, L. De Vito, F. Lamonaca, F. Picariello, S. Rapuano, and I. Tudosa, “Research challenges in measurement for internet of things systems,” ACTA IMEKO, vol. 7, p. 82, 01 2019.

[16] E. Karoly. (2000, Feb.) Qos performance validation in real life scenarios. [Online]. Available: http://archive.opengroup.org/tech/qos/conference/q101/karoly.pdf

[17] S. Molnar, P. Megyesi, and G. Szabo, “How to validate traffic generators?” in 2013 IEEE International Conference on Communications Workshops (ICC), 2013, pp. 1340–1344.

[18] G. Lu, Y. Chen, S. Birrer, F. E. Bustamante, C. Y. Cheung, and X. Li, “End-to-end inference of router packet forwarding priority,” in IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications, 2007, pp. 1784–1792.

[19] R. Mahajan, M. Zhang, L. Poole, and V. Pai, “Uncovering performance differences among backbone isps with netdiff,” in Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation, 2008, pp. 205–218.

[20] (2008, Jun.) Ookla, measuring and understanding broadband: Speed, quality and application. [Online]. Available: https://www.ookla.com/docs/UnderstandingBroadbandMeasurement.pdf

[21] V. S. Khandkar and M. K. Hanawal, “Detection of traffic discrimination in the internet,” in 2020 International Conference on COMmunication Systems & NETworkS (COMSNETS), 2020, pp. 677–679.

[22] Tcpreplay: Pcap editing and replay tools for *nix. [Online]. Available: https://github.com/appneta/tcpreplay

[23] W. Cui, V. Paxson, N. Weaver, and R. Katz, “Protocol-independent adaptive replay of application dialog,” in 13th Annual Network and Distributed System Security Symposium (NDSS), Feb. 2006, pp. 27–27.

[24] R. T. El-Maghraby, N. M. Abd Elazim, and A. M. Bahaa-Eldin, “A survey on deep packet inspection,” in 2017 12th International Conference on Computer Engineering and Systems (ICCES), 2017, pp. 188–197.

[25] M. AlSabah, K. Bauer, and I. Goldberg, “Enhancing tor’s performance using real-time traffic classification,” in Proceedings of the 2012 ACM Conference on Computer and Communications Security, ser. CCS ’12. New York, NY, USA: Association for Computing Machinery, 2012, pp. 73–84. [Online]. Available: https://doi.org/10.1145/2382196.2382208

[26] E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.3,” Internet Requests for Comments, IETF, RFC 8446, August 2018. [Online]. Available: https://tools.ietf.org/pdf/rfc8446.pdf

[27] C. Zhou, C. Lin, and Z. Guo, “mdash: A markov decision-based rate adaptation approach for dynamic http streaming,” IEEE Transactions on Multimedia, vol. 18, no. 4, pp. 738–751, Apr. 2016.

[28] T. Joe, L. Eliot, M. Allison, K. Markku, O. Kumiko, S. Martin, E. Lars, M. Alexey, E. Wes, Z. Alexander, T. Brian, I. Jana, M. Allison, T. Michael, K. Eddie, and N. Yoshifumi. (2020, Sept) Service name and transport protocol port number registry. [Online]. Available: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml

[29] G. Keinan. (2018, January) Optimizing enterprise networks through sd-avc (software define application visibility and control). [Online]. Available: https://www.ciscolive.com/c/dam/r/ciscolive/emea/docs/2018/pdf/BRKCRS-2502.pdf

[30] A. Rao, J. Sherry, A. Legout, A. Krishnamurthy, W. Dabbous, and D. Choffnes, “Meddle: middleboxes for increased transparency and control of mobile traffic,” in CoNEXT Student Workshop, ACM, 12 2012, pp. 65–66.

[31] R. Liu and X. Yu, “A survey on encrypted traffic identification,” in Proceedings of the 2020 International Conference on Cyberspace Innovation of Advanced Technologies, ser. CIAT 2020. New York, NY, USA: Association for Computing Machinery, 2020, pp. 159–163. [Online]. Available: https://doi.org/10.1145/3444370.3444564

[32] F. Li, A. A. Niaki, D. Choffnes, P. Gill, and A. Mislove, “A large-scale analysis of deployed traffic differentiation practices,” in Proceedings of the ACM Special Interest Group on Data Communication, ser. SIGCOMM ’19. New York, NY, USA: Association for Computing Machinery, 2019, pp. 130–144. [Online]. Available: https://doi.org/10.1145/3341302.3342092

[33] P. Srisuresh and K. Egevang, “Traditional IP Network Address Translator (Traditional NAT),” Internet Requests for Comments, IETF, RFC 3022, Jan 2001. [Online]. Available: https://tools.ietf.org/pdf/rfc3022.pdf