

Proceedings on Privacy Enhancing Technologies ; 2020 (1):235–255

Tobias Pulls* and Rasmus Dahlberg

Website Fingerprinting with Website Oracles

Abstract: Website Fingerprinting (WF) attacks are a subset of traffic analysis attacks where a local passive attacker attempts to infer which websites a target victim is visiting over an encrypted tunnel, such as the anonymity network Tor. We introduce the security notion of a Website Oracle (WO) that gives a WF attacker the capability to determine whether a particular monitored website was among the websites visited by Tor clients at the time of a victim’s trace. Our simulations show that combining a WO with a WF attack, which we refer to as a WF+WO attack, significantly reduces false positives for about half of all website visits and for the vast majority of websites visited over Tor. The measured false positive rate is on the order of one false positive per million classified website traces for websites around Alexa rank 10,000. Less popular monitored websites show orders of magnitude lower false positive rates.

We argue that WOs are inherent to the setting of anonymity networks and should be an assumed capability of attackers when assessing WF attacks and defenses. Sources of WOs are abundant and available to a wide range of realistic attackers, e.g., due to the use of DNS, OCSP, and real-time bidding for online advertisement on the Internet, as well as the abundance of middleboxes and access logs. Access to a WO indicates that the evaluation of WF defenses in the open world should focus on the highest possible recall an attacker can achieve. Our simulations show that augmenting the Deep Fingerprinting WF attack by Sirinam et al. [60] with access to a WO significantly improves the attack against five state-of-the-art WF defenses, rendering some of them largely ineffective in this new WF+WO setting.

Keywords: website fingerprinting, website oracles, traffic analysis, security model, design

DOI 10.2478/popets-2020-0013
Received 2019-02-28; revised 2019-09-15; accepted 2019-09-16.

*Corresponding Author: Tobias Pulls: Karlstad University, E-mail: [email protected]
Rasmus Dahlberg: Karlstad University, E-mail: [email protected]

1 Introduction

A Website Fingerprinting (WF) attack is a type of traffic analysis attack where an attacker attempts to learn which websites are visited through encrypted network tunnels, such as the low-latency anonymity network Tor [20] or Virtual Private Networks (VPNs), by analysing the encrypted network traffic [11, 26, 27, 35, 48, 62]. The analysis considers only the size and timing of encrypted packets sent over the network to and from a target client. This makes it possible for attackers that only have the limited capability of observing the encrypted network traffic (sometimes referred to as a local eavesdropper) to perform WF attacks. Sources of such capabilities include ISPs, routers, network interface cards, WiFi hotspots, and guard relays in the Tor network, among others. Access to encrypted network traffic is typically not well-protected over the Internet because it is already in a form that is considered safe to expose to attackers due to the use of encryption.

The last decade has seen significant work on improved WF attacks (e.g., [8, 25, 60, 70]) and defenses (e.g., [6, 7, 31, 36]), accompanied by an ongoing debate on whether the real-world impact of these attacks justifies the deployment of defenses, in particular surrounding Tor (e.g., [30, 49, 71]). There are significant real-world challenges for an attacker to successfully perform WF attacks, such as the sheer size of the web (about 200 million active websites)¹, detecting the beginning of website loads in encrypted network traces, background traffic, maintaining a realistic and fresh training data set, and dealing with false positives.

Compared to most VPN implementations, Tor has some basic but rather ineffective defenses in place against WF attacks, such as padding packets to a constant size and randomized HTTP request pipelining [8, 20, 70]. Furthermore, Tor recently started implementing a framework for circuit padding machines to make it easier to implement traffic analysis defenses² based on adaptive padding [31, 59]. However, the unclear real-world impact of WF attacks makes deployment of proposed effective (and often prohibitively costly in terms of bandwidth and/or latency overheads) WF defenses a complicated topic, both for researchers to reach consensus on and for the Tor Project to decide upon.

¹ Netcraft January 2019 Web Server Survey, https://web.archive.org/web/20190208081915/https://news.netcraft.com/archives/category/web-server-survey/.

1.1 Introducing Website Oracles

In this paper, we introduce the security notion of a Website Oracle (WO) that can be used by attackers to augment any WF attack. A WO answers “yes” or “no” to the question “was a particular website visited over Tor at this point in time?”. We show through simulation that such a capability (access to a WO) greatly reduces the false positive rate for an attacker attempting to fingerprint the majority of websites and website visits through the Tor network. The reduction is so great that our simulations suggest that false positives are no longer a significant reason why WF attacks lack real-world impact. This is in particular the case for onion services, where the estimated number of websites is a fraction of that of the “regular” web [28].

Our simulations are based on the privacy-preserving network measurement results of the live Tor network in early 2018 by Mani et al. [38]. Besides simulating WOs, we also identify a significant number of potential sources of WOs that are available to a wide range of attackers, such as nation state actors, advertisement networks (including their customers), and operators of relays in the Tor network. Some particularly practical sources, due to DNS and how onion services are accessed, can be used by anyone with modest computing resources.

We argue that sources of WOs are inherent in Tor due to its design goal of providing anonymous and not unobservable communication: observable anonymity sets are inherent for anonymity [32, 50, 53], and a WO can be viewed as simply being able to query for membership in the destination/recipient anonymity set (the potential websites visited by a Tor client). The solution to the effectiveness of WF+WO attacks is therefore not to eliminate all sources, which would be impossible without unobservable communication [32, 50, 53], but to assume that an attacker has WO access when evaluating the effectiveness of WF attacks and defenses, even for weak attackers like local (passive) eavesdroppers.

² Tor blog post by Nick Mathewson, https://web.archive.org/web/20190208085701/https://blog.torproject.org/new-release-tor-0401-alpha.

The introduction of a WO in the setting of WF attacks is similar to how encryption schemes are constructed to be secure in the presence of an attacker with access to encryption and decryption oracles (chosen plaintext and ciphertext attacks, respectively) [23, 43, 52]. This is motivated by the real-world prevalence of such oracles, and the high impact on security when paired with other weaknesses of the encryption schemes: e.g., Bleichenbacher [4] padding oracle attacks remain an issue in modern cryptosystems today despite being discovered about twenty years ago [40, 56].

1.2 Contributions and Structure

Further background on anonymity, Tor, and WF is presented in Section 2. Section 3 defines a WO and describes two generic constructions for combining a WO with any WF attack. Our generic constructions are a type of Classify-Verify method by Stolerman et al. [61], first used in the context of WF attacks by Juarez et al. [30] and later by Greschbach et al. [24]. Section 4 presents a number of sources of WOs that can be used by a wide range of attackers. We focus on practical sources based on DNS and onion service directories in Tor, offering probabilistic WOs that anyone can use with modest resources. We describe how we simulate access to a WO throughout the rest of the paper in Section 5, based on Tor network measurement data from Mani et al. [38].

Section 6 experimentally evaluates the performance of augmenting the state-of-the-art WF attack Deep Fingerprinting (DF) by Sirinam et al. [60] with WO access using one of our generic constructions. We show significantly improved classification performance against unprotected Tor as well as against traces defended with the WF defenses WTF-PAD by Juarez et al. [31] and Walkie-Talkie by Wang and Goldberg [72], concluding that the defenses are ineffective in this new setting where an attacker has access to a WO. Further, we also evaluate DF with WO access against Wang et al.’s dataset [70] with simulated traces for the constant-rate WF defenses CS-BuFLO and Tamaraw by Cai et al. [6, 7]. Our results show that constant-rate defenses are overall effective defenses but not efficient due to the significant induced overheads. We then evaluate two configurations of the WF defense DynaFlow by Lu et al. [36], observing similar effectiveness as CS-BuFLO but at lower overheads approaching those of WTF-PAD and Walkie-Talkie.


In Section 7 we discuss our results, focusing on the impact on false positives with WO access, how imperfect sources for WOs impact WF+WO attacks, limitations of our work, and possible mitigations. Our simulations indicate that WF defenses should be evaluated against WF attacks based on how they minimise recall. We present related work in Section 8, including how WF+WO attacks relate to traffic correlation and confirmation attacks. Section 9 briefly concludes this paper.

2 Background

Here we present background on terminology, the anonymity network Tor, and WF attacks and defenses.

2.1 Anonymity and Unobservability

Anonymity is the state of a subject not being identifiable from an attacker's perspective within the anonymity set of possible subjects that performed an action such as sending or receiving a message [50]. For an anonymity network, an attacker may not be able to determine who sent a message into the network (providing a sender anonymity set of all possible senders) and, conversely, may not be able to determine the recipient of a message from the network out of all possible recipients in the recipient anonymity set. Inherent for anonymity is that the subjects in an anonymity set change based on what the attacker observes, e.g., when some subjects send or receive messages [32, 53]. In gist, anonymity is concerned with hiding the relationship between a sender and recipient, not its existence.

Unobservability is a strictly stronger notion than anonymity [32, 50, 53]. In addition to anonymity of the relationship between a sender and recipient, unobservability also requires that an attacker (not acting as either the sender or recipient) cannot sufficiently distinguish if there is a sender or recipient or not [50]. Perfect unobservability is therefore the state of an attacker being unable to determine if a sender/recipient should be part of the anonymity set or not.

2.2 Tor

Tor is a low-latency anonymity network for anonymising TCP streams with about eight million daily users, primarily used for anonymous browsing, censorship circumvention, and providing anonymous (onion) services [20, 38]. Because Tor is designed to be usable for low-latency tasks such as web browsing, its threat model and design do not consider powerful attackers, e.g., global passive adversaries that can observe all network traffic on the Internet [18, 20]. However, less powerful attackers such as ISPs and ASes that observe a fraction of network traffic on the Internet are in scope.

Users typically use Tor Browser, a customised version of Mozilla Firefox (bundled with a local relay), as a client that sends traffic through three relays when browsing a website on the regular Internet: a guard, middle, and exit relay. Traffic from the client to the exit is encrypted in multiple layers as part of fixed-size cells such that only the guard relay knows the IP-address of the client and only the exit relay knows the destination website. There are about 7000 public relays at the time of writing, all available in the consensus generated periodically by the network. The consensus is public and therefore anyone can trivially determine if traffic is coming from the Tor network by checking if the IP-address is in the consensus. Note that the encrypted network traffic in Tor is exposed to network adversaries as well as relays as it traverses the Internet. Figure 1 depicts the setting just described, highlighting the anonymity sets of users of Tor Browser and the possible destination websites.

Fig. 1. Using Tor to browse to a website, where an attacker observes the encrypted traffic into the Tor network for a target user, attempting to determine the website the user is visiting.

2.3 Website Fingerprinting

As mentioned in the introduction, attacks that analyse the encrypted network traffic (a trace) between a Tor client and a guard relay with the goal to detect the website a client is visiting are referred to as website fingerprinting (WF) attacks. Figure 1 shows the typical location of the attacker, who can also be the guard itself. WF attacks are evaluated in either the closed or the open world. In the closed world, an attacker monitors a number of websites and it is the goal of the attacker to determine which website out of all the possible monitored websites a target is visiting. The open world is like the closed world with one significant change: the target user may also visit unmonitored websites. This means that in the open world the attacker may also classify a trace as unmonitored in addition to monitored, posing a significantly greater challenge for the attacker in a more realistic setting than the closed world. The ratio between monitored and unmonitored traces in a dataset is further a significant challenge for WF attacks when assessing their real-world significance for Tor [30]. Typically, WF attacks are evaluated on the frontpages of websites: webpage fingerprinting is presumably much more challenging because there are orders of magnitude more webpages than websites. Unless otherwise stated, we only consider the frontpages of websites in this paper.

2.3.1 Website Fingerprinting Attacks

Prior to WF attacks being considered for use on Tor, they were used against HTTPS [11], web proxies [27, 62], SSH tunnels [35], and VPNs [26]. For Tor, WF attacks are typically based on machine learning and can be categorized by whether they use deep learning or not.

Traditional WF attacks in the literature use manually engineered features extracted from both the size and timing of packets (and/or cells) sent by Tor. State of the art attacks with manually engineered features are Wang-kNN [70], CUMUL [46], and k-FP [25]. For reference, Wang-kNN has 1225 features, CUMUL 104 features, and k-FP 125 features. In terms of accuracy, k-FP appears to have a slight edge over the other two, but all three report over 90% accuracy against significantly sized datasets. As traditional WF attacks progressed, the features, more than the type of machine learning method, have been shown to be vital for the success of attacks, with an emerging consensus on which features are important (e.g., coarse features like the number of incoming and outgoing packets) [12, 25, 46].

Deep learning was first used for WF attacks by Abe and Goto in 2016 [2]. Relatively quickly, Rimmer et al. reached parity with traditional WF attacks, lending credence to the emerging consensus that the research community had found the most important features for WF [55]. However, Sirinam et al. [60] recently improved significantly on other WF attacks with Deep Fingerprinting (DF), including against the WTF-PAD and Walkie-Talkie defenses; DF is, at the time of writing, considered state-of-the-art. DF is based on a Convolutional Neural Network (CNN) with a customized architecture for WF.

Each packet trace as input to DF is simply a constant size (5000) list of cells (or packets) and their direction (positive for outgoing, negative for incoming), ignoring size and timings. Based on the input, the CNN learns features on its own: we do not know what they are, other than preliminary work indicating that the CNN gives more weight to input early in the trace [39].
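The input representation described above can be sketched in a few lines of Python. This is a minimal illustration, not the DF authors' preprocessing code: the function name `to_df_input` is hypothetical, and padding short traces with zeros is an assumption made here to reach the constant length.

```python
from typing import List

DF_INPUT_LENGTH = 5000  # constant input size stated in the text


def to_df_input(directions: List[int], length: int = DF_INPUT_LENGTH) -> List[int]:
    """Map a cell sequence to a fixed-length list of directions.

    `directions` holds one entry per cell: a positive value for an outgoing
    cell and a negative value for an incoming cell. Sizes and timings are
    deliberately not part of the representation.
    """
    # Keep only the sign of each entry (+1 outgoing, -1 incoming).
    signs = [1 if d > 0 else -1 for d in directions if d != 0]
    # Truncate traces that are longer than the fixed input size.
    signs = signs[:length]
    # Assumption: pad short traces with zeros up to the fixed input size.
    return signs + [0] * (length - len(signs))
```

For example, `to_df_input([1, -1, -1, 1])` yields a list of 5000 entries that starts with `1, -1, -1, 1` and is zero afterwards.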

The last layer of the CNN-based architecture of DF is a softmax function: it assigns (relative) probabilities to each class as the output of classification. These probabilities allow a threshold to be defined for the final classification in the open world, requiring that the probability of the most likely class is above the threshold to classify as a monitored website.
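As a rough sketch of the thresholding step, the Python below applies a softmax to raw class scores and only accepts a monitored class if its probability is above a threshold. The function names and the convention of reserving one class index for unmonitored are assumptions for illustration; they are not taken from the DF implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw class scores into relative probabilities that sum to one."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_open_world(probs: Sequence[float],
                        unmonitored_index: int,
                        threshold: float) -> int:
    """Return the index of the predicted class in the open world.

    A monitored class is only returned if it is the most likely class and
    its probability is above `threshold`; otherwise the trace is classified
    as unmonitored.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    if best != unmonitored_index and probs[best] > threshold:
        return best
    return unmonitored_index
```

Raising the threshold trades recall for fewer false positives, which is why the same trained classifier can serve attackers with different requirements.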

2.3.2 Website Fingerprinting Defenses

WF defenses for Tor modify the timing and number of (fixed-size) cells sent over Tor when a website is visited. The modifications are done by injecting dummy traffic and introducing artificial delays. Defenses can typically be classified as either based on constant-rate traffic or not, where constant rate defenses force all traffic to fit a pre-determined structure, forming collision sets for websites where their traffic traces appear identical to an attacker. Non-constant rate defenses simply more-or-less randomly inject dummy traffic and/or artificial delays with the hope of obfuscating the resulting network traces. WF defenses are typically compared in terms of their induced bandwidth (BOH) and time (TOH) overheads compared to no defense. Further, different WF defenses make more or less realistic and/or practical assumptions, making comparing overheads necessary but not nearly sufficient for reaching conclusions.
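Overheads are commonly reported as a relative increase over the undefended case. The sketch below assumes that convention; the exact definitions differ between papers, and the helper name `overhead` is hypothetical.

```python
def overhead(defended: float, undefended: float) -> float:
    """Relative overhead of a defended trace compared to no defense.

    For BOH pass the amount of data (e.g., number of cells) and for TOH
    pass the load time. A result of 0.31 corresponds to 31% overhead.
    Assumption: overhead is the relative increase over the undefended value.
    """
    if undefended <= 0:
        raise ValueError("undefended value must be positive")
    return (defended - undefended) / undefended
```

Under this assumed definition, a trace that grows from 1000 to 1310 cells has a BOH of 31%.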

We briefly describe WF defenses that we later use to evaluate the impact of attackers performing enhanced WF attacks with access to WOs:

Walkie-Talkie by Wang and Goldberg [72] puts Tor Browser into half duplex mode and pads traffic such that different websites result in the same cell sequences. This creates a collision set between a visited website and a target decoy website which results in the same cell sequence with the defense. Their evaluation shows 31% BOH and 34% TOH. Collision sets grow beyond size two at the cost of BOH.

WTF-PAD by Juarez et al. [31] is based on the idea of adaptive padding [59] where fake padding is injected only when there is no real traffic to send. The defense is simulated on collected packet traces and its design is the foundation of the circuit padding framework recently implemented in Tor. The simulations report 50-60% BOH and 0% TOH.

CS-BuFLO by Cai et al. [6] is a constant rate defense where traffic is always sent at a constant rate between a sender and receiver, improving on prior work by Dyer et al. [21]. Their evaluation shows 220-270% BOH and 270-340% TOH.

Tamaraw by Cai et al. [7] is another constant rate defense that further improves on CS-BuFLO. In the evaluation by Wang and Goldberg, they report 103% BOH and 140% TOH for Tamaraw [72].

DynaFlow by Lu et al. [36] is a dynamic constant-rate defense that allows for the defense to adjust its parameters (notably the “inter-packet interval”) based on configuration and on the observed traffic. The evaluation shows an overall improvement over Tamaraw when configured to use similar overheads.

The primary downside of defenses like Walkie-Talkie that depend on creating collision sets for websites is that they require up-to-date knowledge of the target website(s) to create collisions with (to know how to morph the traffic traces): this is a significant practical issue for deployment [45, 70, 72]. Constant rate defenses like CS-BuFLO and Tamaraw are easier to deploy but suffer from significant overheads [6, 7]. WTF-PAD is hard to implement both efficiently and effectively in practice due to only being simulated on packet traces as-is and also being vulnerable to attacks like Deep Fingerprinting [31, 60]. DynaFlow shows great promise, but requires changes at the client (Tor Browser, local relay, or both) and at exit relays to combine packets with payloads smaller than Tor’s cell size [36]. Without combined packets its advantage in terms of overhead compared to Tamaraw likely shrinks.

2.4 Challenges for WF Attacks in Practice

A number of practical challenges for an attacker performing WF attacks have been highlighted over the years, notably comprehensively so by Mike Perry of the Tor Project [49] and Juarez et al. [30]. Wang and Goldberg have shown that several of the highlighted challenges, such as maintaining a fresh data set and determining when websites are visited, are practical to overcome [71]. What remains are two notably significant challenges: distinguishing between different goals of the attacker and addressing false positives.

For attacker goals when performing WF attacks, an attacker may want to detect website visits with the goal of censoring access to them, to identify all users that visit particular websites, or to identify every single website visited by a target [49]. Clearly, these goals put different constraints on the attacker. For censorship, classification must happen before content is actually allowed to be completely transferred to the victim. For monitoring only a select number of websites the attacker has the most freedom, while detecting all website visits by a victim requires the attacker to have knowledge of all possible websites on the web.

For addressing false positives there are a number of aspects to take into account. First, the web has millions of websites that could be visited by a victim (not the case for onion services [28]), and each website has a significant number of webpages that are often dynamically generated and frequently changed [30, 49]. Secondly, how often victims visit websites that are monitored by an attacker, i.e., the base rate, is unknown to the attacker. With a low base rate, even a small false positive rate of a WF attack overwhelms an attacker with orders of magnitude more false positives than true positives, leaving WF attacks impractical for most attacker goals in practice.
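The effect of the base rate can be made concrete with a small calculation. The Python sketch below computes the precision of a classifier from an assumed base rate, true positive rate, and false positive rate; the function name and all numbers in the usage note are illustrative assumptions, not measurements from this paper.

```python
def precision(base_rate: float, tpr: float, fpr: float) -> float:
    """Fraction of traces classified as monitored that really are monitored.

    base_rate: fraction of all traces that are visits to monitored websites
    tpr:       true positive rate (recall) of the WF attack
    fpr:       false positive rate of the WF attack
    """
    true_positives = base_rate * tpr             # monitored, classified as such
    false_positives = (1.0 - base_rate) * fpr    # unmonitored, misclassified
    return true_positives / (true_positives + false_positives)
```

With a hypothetical base rate of 1 in 10,000, a true positive rate of 0.95, and a false positive rate of 0.001, `precision(0.0001, 0.95, 0.001)` is below 0.1: false positives outnumber true positives roughly ten to one even though the false positive rate itself looks small.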

3 Website Oracles

We first define a WO and then present two generic constructions for use with WF attacks based on the kind of output the WF attack supports.

3.1 Defining Website Oracles

Definition 1. A website oracle answers true or false to the question “was a particular monitored website w visited over the Tor network at time t?”.

A WO considers only websites and not webpages for w, but note that even for webpage fingerprinting, being able to narrow down the possible websites that webpages belong to through WO access is a significant advantage to an attacker. The time t refers to a period of time or timeframe during which a visit should have taken place. Notably, different sources of WOs may provide different resolutions for time, forcing an attacker to consider a timeframe in which a visit could have taken place. For example, timestamps in Apache or nginx access logs use regular Unix timestamps as default (i.e., seconds), while CDNs like Cloudflare maintain logs with Unix nanosecond precision. Further, there are inherent limitations in approximating t for the query when the attacker in addition to WO access can only directly observe traffic from the victim into Tor. We explore this later in Section 5.3.
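To illustrate how the time resolution of a source widens the timeframe that has to be considered, the following Python sketch models a WO backed by a list of logged visit times. The class `LogBackedOracle`, its parameters, and the symmetric window around t are hypothetical choices made for illustration; they do not describe the simulation used later in the paper.

```python
import bisect
from typing import Dict, Iterable, List


class LogBackedOracle:
    """Hypothetical WO built from per-website visit timestamps (seconds)."""

    def __init__(self, visits: Dict[str, Iterable[float]], resolution: float):
        # Sort once so that every query is a binary search.
        self._visits: Dict[str, List[float]] = {
            site: sorted(times) for site, times in visits.items()
        }
        # Coarsest unit the source records, e.g. 1.0 for whole seconds.
        self._resolution = resolution

    def query(self, site: str, t: float, timeframe: float) -> bool:
        """Return True if `site` has a logged visit in the timeframe around t."""
        # Widen the window by the log resolution: a coarse timestamp may be
        # off by up to one unit from the actual time of the visit.
        low = t - timeframe / 2 - self._resolution
        high = t + timeframe / 2 + self._resolution
        times = self._visits.get(site, [])
        i = bisect.bisect_left(times, low)  # first logged visit >= low
        return i < len(times) and times[i] <= high
```

A source with second resolution therefore forces a window of at least a couple of seconds, during which other Tor users may have visited the same website.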

One important limitation we place on the use of a WO with WF is that the attacker can only query the WO for monitored websites. The open world setting is intended to capture a more realistic setting for evaluating attacks, and inherent in this is that the attacker cannot train on (or even enumerate) all possible websites on the web. The ability to enumerate and query all possible websites would give the adversary a capability in line with a global passive adversary performing correlation attacks, which is clearly outside of the threat model of Tor [20]. We further relate correlation and confirmation attacks to WF+WO attacks in Section 8.

Definition 1 defines the ideal WO: it never fails to observe a monitored website visit, it has no false positives, and it can answer for an arbitrary t. This is similar to how encryption and decryption oracles always encrypt and decrypt when modelling security for encryption schemes [23, 43, 52]. In practice, sources of all of these oracles may be more or less ideal and challenging for an attacker to use. Nevertheless, the prevalence of sources of these imperfect oracles motivates the assumption of an attacker with access to an ideal oracle. Similarly, for WOs, we motivate this assumption in Sections 4 and 5, in particular with respect to a timeframe on the order of (milli)seconds. Section 7 further considers non-ideal sources of WOs and the effect on WF+WO attacks, both when the WO can produce false positives and when the source only observes a fraction of visits to monitored websites.

3.2 Generic Website Fingerprinting Attacks with Website Oracles

As mentioned in Section 2.3, a WF attack is a classifier that is given as input a packet trace and provides as output a classification. The classification is either a monitored site or a class representing unmonitored (in the open world). Figure 2 shows the setting where an attacker capable of performing WF attacks also has access to a WO. We define a generic construction for WF+WO attacks that works with any WF attack in the open world in Definition 2:

Definition 2 (Binary verifier). Given a website oracle o and WF classification c of a trace collected at time t, if c is a monitored class, query the oracle o(c, t). Return c if the oracle returns true, otherwise return the unmonitored class.

Fig. 2. WF+WO attacks, where the WO infers membership of a particular website w in the website anonymity set of all possible websites visited over Tor during a particular timeframe t.

Note that the WO is only queried when the WF classification is for a monitored website and that Definition 2 is a generalisation of the “high precision” DefecTor attack by Greschbach et al. [24]. In terms of precision and false positives, the above generic WF+WO construction is strictly superior to a WF attack without a WO. Assume that the WF classification incorrectly classified an unmonitored trace as monitored; then there is only a probability that a WO also returns true, depending on the probability that someone else visited the website in the same timeframe over Tor. If it does not, then a false positive is prevented. That is, a WF attack without WO access is identical to a WF attack with access to a useless WO that always returns true; any improvements beyond that will only help the attacker in ruling out false positives. We consider the impact on recall later.
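The binary verifier of Definition 2 is small enough to write down directly. The following Python sketch is one possible rendering; the `UNMONITORED` label, the type alias `Oracle`, and the function name are hypothetical names chosen for illustration.

```python
from typing import Callable

UNMONITORED = "unmonitored"  # hypothetical label for the unmonitored class

# A website oracle o(w, t): was monitored website w visited at time t?
Oracle = Callable[[str, float], bool]


def binary_verifier(oracle: Oracle, c: str, t: float) -> str:
    """Combine a WF classification c of a trace collected at time t with a WO."""
    if c == UNMONITORED:
        # The oracle is only queried for monitored classifications.
        return c
    # Keep the monitored classification only if the oracle confirms a visit.
    return c if oracle(c, t) else UNMONITORED
```

Passing an oracle that always returns true, e.g. `lambda w, t: True`, reproduces the plain WF attack, which matches the observation above that such a WO is useless but never harmful for precision.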

We can further refine the use of WOs for the subset of WF attacks that support providing as output an ordered list of predictions in decreasing likelihood, optionally with probabilities, as shown in Definition 3:

Definition 3 (List verifier). Given an ordered list of predictions in the open world and a website oracle:

    for top prediction p in list do
        if p is unmonitored or oracle says p visited then
            return list
        move p to last in list and optionally update probabilities

First, we observe that if the WF attack thinks that it is most likely an unmonitored website, then we accept that because a WO can only teach us something new about monitored websites. Secondly, if the most likely prediction has been visited according to the WO then we also accept that classification result. Finally, all that is left to do is to consider this while repeatedly iterating over the top predictions: if the top classification is a monitored website that has not been visited according to the WO, then move it from the top of the list and optionally update probabilities (if applicable, then also set p = 0.0 before updating) and try again. By definition, we will either hit the case of a monitored website that has been visited according to the WO or an unmonitored prediction. As mentioned in Section 2.3, WF output that has some sort of probability or threshold associated with classifications is useful for attackers with different requirements with respect to false positives and negatives.
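The list verifier of Definition 3 can be sketched as follows. This is an illustrative sketch under stated assumptions: predictions are `(label, probability)` pairs sorted by decreasing probability, the label `UNMONITORED` is hypothetical, and the probability update is a plain renormalisation chosen here for simplicity (Section 6.1 describes the rescaled softmax the paper uses with DF).

```python
from typing import Callable, List, Tuple

UNMONITORED = "unmonitored"  # hypothetical label for the unmonitored class

Prediction = Tuple[str, float]


def list_verifier(predictions: List[Prediction], t: float,
                  oracle: Callable[[str, float], bool]) -> List[Prediction]:
    """Demote monitored top predictions that the oracle did not see at time t."""
    preds = list(predictions)
    # At most len(preds) demotions are possible, so the loop is bounded even if
    # the unmonitored class is missing from the list.
    for _ in range(len(preds)):
        label, _prob = preds[0]
        if label == UNMONITORED or oracle(label, t):
            return preds
        # Rejected by the oracle: set p = 0.0, move it last, then update the rest.
        preds = preds[1:] + [(label, 0.0)]
        total = sum(p for _, p in preds)
        if total > 0:
            preds = [(name, p / total) for name, p in preds]  # simple renormalisation
    return preds
```

The returned list keeps the original relative order of all predictions that were not rejected, so an attacker can still apply a threshold to the (updated) top probability.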

One could consider a third approach based on repeatedly querying a WO to first determine if any monitored websites have been visited and then train an optimised classifier (discarding monitored websites that we know have not been visited). While this may give a minor improvement, our results later in this paper as well as earlier work show that confusing monitored websites with each other is a minor issue compared to mistaking an unmonitored website for a monitored one [24, 30, 70].

4 Sources of Website Oracles

There is a wide range of potential sources of WOs. Table 1 summarizes a selection of sources that are more thoroughly detailed in Appendix C. The table shows the availability of the source, i.e., if the attacker needs to query the source in near real-time as a website visit occurs or if it can be accessed retroactively, e.g., through a legal request. We also estimate qualitatively the false positive rate of the source, its coverage of websites it can monitor (or fraction of Tor network traffic, depending on source), as well as the estimated effort to access the source. Finally, the table gives an example of an actor with access to the source.

Next we focus on a number of sources of WOs that we find particularly relevant: several due to DNS in Section 4.1, the DHT of Tor onion directory services in Section 4.2, and real-time bidding platforms in Section 4.3.

4.1 DNS

Before a website visit the corresponding domain name must be resolved to an IP address. For a user that uses Tor Browser, the exit relay of the current circuit resolves the domain name. If the DNS record of the domain name is already cached in the DNS cache of the exit relay, then the exit relay uses that record. Otherwise the domain name is resolved and subsequently cached using whichever DNS resolution mechanism the exit relay has configured. Based on this process we present three sources of WOs that work for unpopular websites.

Fig. 3. The two cases when deciding on a classifier’s threshold.

4.1.1 Shared Pending DNS Resolutions

If an exit relay is asked to resolve a domain name that is uncached it will create a list of pending connections waiting for the domain resolution to finish. If another connection asks that the same domain name be resolved, it is added to the list of pending connections. When a result is available all pending connections are informed. This is the basis of a WO: if a request to resolve a domain name returns a record more quickly than previously measured by the attacker for uncached entries, the entry was either pending resolution at the time of the request or already cached. Notably this works regardless of whether exit relays have DNS caches or not. However, the timing constraints of shared pending connections are significant and thus a practical hurdle to overcome.

4.1.2 Tor’s DNS Cache at Exit Relays

If an unpopular website is visited by a user, the resolved domain name will likely be cached by a single exit relay. We performed 411 exitmap [73] measurements between April 1–10 (2019), collecting on average 3544 (un)cached data points for each exit relay using a domain name under our control that is not in use by anyone else.

Table 1. Comparison of a number of WO sources based on their estimated time of availability (when attacker likely has to collect data, i.e., retroactively or real-time), False Positive Rate (FPR), coverage of website/network visits, and primary entities with access.

| Source | Availability | FPR | Coverage | Effort | Access |
|---|---|---|---|---|---|
| Dragnet surveillance programmes | retroactive | negl. | high | high | intelligence agencies |
| Content Delivery Networks | retroactive | negl. | high | high | operators |
| Real-time bidding | real-time (retroactive) | negl. | high | modest | customers (operator) |
| Webserver access logs | retroactive | negl. | high | medium | operators |
| Middleboxes | retroactive [1] | negl. | medium | medium | operators |
| OCSP | retroactive | low | high | medium | few CAs, plaintext |
| 8.8.8.8 operator | retroactive | low [24] | 16.8% of visits | high | Google, plaintext |
| 1.1.1.1 operator | retroactive | low [24] | 7.4% of visits | high | Cloudflare, plaintext |
| Exit relays | real-time | negl. | low | low | operators |
| Exit relays DNS cache | real-time | medium | high | medium | anyone |
| Query DNS resolvers | real-time | high | low | low | anyone |
| Onion v2 (v3) | real-time | negl. | high (low) | low (high) | anyone |

Fig. 4. The difference between (un)cached standard deviation and mean times without any absolute values, i.e., a negative value implies that the uncached time is smaller than the cached time.

Given a labelled data set of (un)cached times for each exit relay, we can construct distinct per-relay classifiers that predict whether a measured time corresponds to an (un)cached domain name. While there are many different approaches that could be used to build such a classifier, we decided to use a simple heuristic that should result in few or no false positives: output ‘cached’ iff no uncached query has ever been this fast before. Figure 3 shows the idea of this classifier in greater detail, namely create a threshold that is the minimum of the largest cached time and the smallest uncached time and then say cached iff the measured time is smaller than the threshold. Regardless of how well this heuristic performs (see below), it should be possible to construct other classifiers that exploit the trend of smaller resolve times and less standard deviation for cached queries (Figure 4). For example, 69.1% of all exit relays take at least 50 ms more time to resolve an uncached domain on average.
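The per-relay threshold heuristic described above can be written down directly. The sketch below follows the wording of the text (threshold is the minimum of the largest cached time and the smallest uncached time); the function names and the use of milliseconds are assumptions made for illustration.

```python
from typing import Sequence


def fit_threshold(cached_ms: Sequence[float], uncached_ms: Sequence[float]) -> float:
    """Threshold = min(largest cached time, smallest uncached time)."""
    return min(max(cached_ms), min(uncached_ms))


def is_cached(measured_ms: float, threshold_ms: float) -> bool:
    """Say 'cached' iff the measured resolve time is strictly below the threshold."""
    return measured_ms < threshold_ms
```

When the two labelled sets do not overlap, the threshold equals the largest cached time; when they overlap, it equals the smallest uncached time, so no uncached training sample is ever labelled as cached. The price is that some cached measurements are missed, which is the trade-off the heuristic accepts to avoid false positives.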

To estimate an upper bound on how effective the composite classifier of all per-relay classifiers could be without any false positives using our heuristic, we applied ten-fold cross-validation to simply exclude every exit relay that had false positives during any fold and then weighted the observed bandwidth for the remaining classifiers by the individual true positive rates. This gives us an estimate of how much bandwidth we could predict true positives for without having any false positives. By comparing it to the total exit bandwidth of the Tor network, we obtain an estimated upper bound true positive rate for the composite classifier of 17.3%.
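The bandwidth-weighted estimate above amounts to a short computation. The following sketch assumes a hypothetical per-relay record (`RelayResult`) holding the relay's observed bandwidth, its cross-validated true positive rate, and the number of false positives seen in any fold; the field names are illustrative and not taken from the paper's code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RelayResult:
    bandwidth: float           # observed exit bandwidth of the relay
    true_positive_rate: float  # per-relay TPR from cross-validation
    false_positives: int       # false positives observed in any fold


def composite_tpr_upper_bound(relays: Iterable[RelayResult],
                              total_exit_bandwidth: float) -> float:
    """Bandwidth-weighted TPR over relays whose classifier had no false positives."""
    usable = sum(r.bandwidth * r.true_positive_rate
                 for r in relays if r.false_positives == 0)
    return usable / total_exit_bandwidth
```

Relays with any false positive contribute nothing to the numerator but still count towards the total exit bandwidth in the denominator, which is why the result is a conservative fraction of the whole network.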

When an attacker measures if a domain is cached or not the domain will, after the measurement, be cached for up to an hour (current highest caching duration in Tor, independent of TTL) at every exit. However, if an attacker can cause an exit to run low on memory, the entire DNS cache will be removed (instead of only parts of it) due to a bug in the out-of-memory manager of Tor. We have reported this to the Tor Project3. We further discuss in Section 7 how frequently an attacker on average can be expected to query a WO.

4.1.3 Caching at Recursive DNS Resolvers

For a website that is unpopular enough, there is a high chance that nobody on the web visited the website within a given timeframe. This is the basis of our next idea for a WO, which is not exclusive to the Tor network: wait a couple of seconds after observing a connection, then probe all recursive DNS resolvers of Tor exits that can be accessed to determine whether any monitored website was cached approximately at the time of observing the connection by inspecting TTLs.

In 2016 Greschbach et al. [24] showed that remote DNS resolvers like Google’s 8.8.8.8 receive a large portion of all DNS traffic that exit relays generate. To better understand how the DNS resolver landscape looks today, we repeated their experiment setup for 35 hours on February 17–18 (2019), measuring every 30 minutes. Our results show that Google (16.8%) and Cloudflare (7.4%) are both popular. Many exits use a same-AS resolver which is presumably the ISP (42.3%), while other exits resolve themselves (15.2%) or use a remote DNS resolver that we did not identify (18.2%). Further, we note that there is at least one RIPE Atlas4 network measurement probe in the same AS as 53.3% of all exits, providing access to many of the same DNS resolvers as used by exits from a similar network vantage point.

3 https://trac.torproject.org/projects/tor/ticket/29617

Instead of using RIPE Atlas nodes we opted for a different approach which is strictly worse: query Google’s and Cloudflare’s DNS resolvers from VMs in 16 Amazon EC2 regions. With a simple experiment of first visiting a unique domain (once again under our control and only used by us) using torify curl and then querying the DNS resolvers from each Amazon VM to observe TTLs, we got true positive rates of 2.9% and 0.9% for Google and Cloudflare with 1000 repetitions. While this may seem low, the cost for an attacker is at the time of writing about 2 USD per day using on-demand pricing. Using an identical setup we were also able to find a subset of monitored websites that yield alarmingly high true positive rates: 61.4% (Google) and 8.0% (Cloudflare). Presumably this was due to the cached entries being shared over a wider geographical area for some reason (however, not globally). Regardless, coupled with the fact that anyone can globally purge the DNS caches of Google5 and Cloudflare6 for arbitrary domain names, this is a noteworthy WO source.

4.2 Onion Service Directories in Tor

To access an onion service a user first obtains the service’s descriptor from a Distributed Hash Table (DHT) maintained by onion service directories. From the descriptor the user learns of introduction points selected by the host of the onion service in the Tor network that are used to establish a connection to the onion service in a few more steps [64, 65] that are irrelevant here. Observing a request for the descriptor of a monitored onion service is a source for a WO. To observe visits for a target (known) onion service in the DHT, a relay first has to be selected as one out of six or eight (depending on version) relays to host the descriptor in the DHT, and then the victim has to select that relay to retrieve the descriptor. For v2 of onion services, the location in the DHT is deterministic [64] and an attacker can position its relays in such a way to always be selected for hosting target descriptors. Version 3 of onion services addresses this issue by randomising the process every 24 hours [65], forcing an attacker to host a significant number of relays to get a WO for onion services with high coverage. At the time of writing, there are about 3,500 relays operating as onion service directories.

4 https://web.archive.org/web/20190228145306/https://atlas.ripe.net/
5 https://web.archive.org/web/20190228150306/https://developers.google.com/speed/public-dns/cache
6 https://web.archive.org/web/20190228150344/https://1.1.1.1/purge-cache/

4.3 Real-Time Bidding

Real-Time Bidding (RTB) is an approach towards online advertisement that allows a publisher to auction ad space to advertisers on a per-visit basis in real time [68]. Google’s Display Network includes more than two million websites that reach 90% of all Internet users7, and an advertiser that uses RTB must respond within ≈100 ms to submitted bid requests8 containing information such as the three first network bytes of an IPv4 address, the second-level domain name of the visited website, and the user agent string. While the exact information available to the bidder depends on the ad platform and the publisher’s advertisement settings, anonymous modes provide less revenue9. Combined with many flavours of pre-targeting such as IP and location filtering [67], it is likely that the bidder knows whether a user used Tor while accessing a monitored website. Vines et al. [67] further note that “35% of the DSPs also allow arbitrary IP white- and blacklisting (Admedo, AdWords, Bing, BluAgile, Criteo, Centro, Choozle, Go2Mobi, Simpli.fi)”. Finally, observe that an attacker need not win a bid to use RTB as a WO.

7 https://web.archive.org/web/20190228122431/https://support.google.com/google-ads/answer/2404191?hl=en&ref_topic=3121944%5C
8 https://web.archive.org/web/20190228122615/https://developers.google.com/authorized-buyers/rtb/downloads/realtime-bidding-proto
9 https://web.archive.org/web/20190228123602/https://support.google.com/admanager/answer/2913411?hl=en&ref_topic=2912022


5 Simulating Website Oracles

To be able to simulate access to a WO for arbitrary monitored websites we need to simulate the entire website anonymity set of Tor, because the anonymity set is what a WO queries for membership. We opt for simulation for ethical reasons. The simulation has three key parts: how website visits are distributed, the number of visits to websites over Tor, and the timeframe (resolution) of the oracle source. Note that the first two parts are easy for an attacker to estimate by simply observing traffic from live Tor exit relays, something we cannot trivially do as researchers adhering to Tor’s research safety guidelines10. Another option available to an attacker is to repeatedly query a WO to learn about the popularity of its monitored websites and based on those figures infer the utility of the WO. We opted to not perform such measurements ourselves, despite access to several WOs, due to fears of inadvertently harming Tor users. Instead we base our simulations on results from the privacy-preserving measurements of the Tor network in early 2018 by Mani et al. [38].

5.1 How Website Visits are Distributed

Table 2 shows the average inferred website popularity from Mani et al. [38]. The average percentage does not add up to 100%, presumably due to the privacy-preserving measurement technique or rounding errors. Their results show that torproject.org is very popular (perhaps due to a bug in software using Tor), and beyond that focus on Alexa’s11 top one million most popular websites as bins. The “other” category is for websites identified as not part of Alexa’s top one million websites ranking. For the rest of the analysis (not simulation) in this paper we exclude torproject.org: for one, that Tor users visit that website is unlikely to be an interesting fact for an attacker to monitor, and its overrepresentation (perhaps due to a bug) will skew our analysis. Excluding torproject.org, about one third of all website visits go to Alexa (0,1k], one third to Alexa (1k,1m], and one third to other websites. The third column of Table 2 contains adjusted average percentages.

Table 2. Inferred average website popularity for the entire Tor network early 2018, from Mani et al. [38, Figure 2].

| Website | Average primary domain (%) | Without torproject.org (%) |
|---|---|---|
| torproject.org | 40.1 | |
| Alexa (0,10] | 8.4 | 13.9 |
| Alexa (10,100] | 5.1 | 8.4 |
| Alexa (100,1k] | 6.2 | 10.3 |
| Alexa (1k,10k] | 4.3 | 7.1 |
| Alexa (10k,100k] | 7.7 | 12.7 |
| Alexa (100k,1m] | 7.0 | 11.6 |
| other | 21.7 | 35.9 |

10 https://web.archive.org/web/20190213130918/https://research.torproject.org/safetyboard.html
11 https://www.alexa.com/topsites

In our simulations for website visits we treat the entries in column two of Table 2 as bins of a histogram with the relative size indicated by the average website popularity. After randomly selecting a bin (weighted by popularity), in the case of an Alexa range we uniformly select a website within the range, and for the other category we uniformly select from one million other websites. This is a conservative choice given that there are hundreds of millions of active websites on the Internet. Uniformly selecting within a bin will make the more popular websites in the bin likely underrepresented while less popular websites in the bin get overrepresented. However, we typically simulate an attacker that monitors ≈100 websites and use the website popularity as the starting rank of the first monitored website. For the most popular websites, monitoring 100 websites covers the entire or significant portions of the bins (Alexa ≤1k), and for less popular websites (Alexa >1k), as our results later show, this does not matter.
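The two-step sampling described above (pick a bin by weight, then pick a website uniformly within it) can be sketched as below. The bin shares come from column two of Table 2; everything else is an assumption for illustration: websites are represented by integer ids, id 0 stands in for torproject.org, and ids 1,000,001 to 2,000,000 stand in for the one million "other" websites.

```python
import random
from typing import List

# (label, lowest id exclusive, highest id inclusive, average share in %)
BINS = [
    ("torproject.org", -1, 0, 40.1),        # id 0 is a stand-in (assumption)
    ("Alexa (0,10]", 0, 10, 8.4),
    ("Alexa (10,100]", 10, 100, 5.1),
    ("Alexa (100,1k]", 100, 1_000, 6.2),
    ("Alexa (1k,10k]", 1_000, 10_000, 4.3),
    ("Alexa (10k,100k]", 10_000, 100_000, 7.7),
    ("Alexa (100k,1m]", 100_000, 1_000_000, 7.0),
    ("other", 1_000_000, 2_000_000, 21.7),  # one million stand-in ids (assumption)
]
WEIGHTS = [share for *_, share in BINS]


def sample_visit(rng: random.Random) -> int:
    """Pick a bin weighted by popularity, then a website uniformly within the bin."""
    _label, low, high, _share = rng.choices(BINS, weights=WEIGHTS, k=1)[0]
    return rng.randint(low + 1, high)  # ids in the half-open interval (low, high]


def simulate_visits(n: int, seed: int = 0) -> List[int]:
    """Draw n simulated website visits with a seeded generator."""
    rng = random.Random(seed)
    return [sample_visit(rng) for _ in range(n)]
```

`random.choices` normalises the weights itself, so the shares do not need to add up to exactly 100.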

5.2 The Number of Website Visits

Mani et al. also inferred with a 95% confidence interval that (104 ± 36) × 10^6 initial streams are created during a 24 hour period in the entire Tor network [38]. Based on this, in our simulation we assume 140 million website visits per day that are distributed as described above and occur uniformly throughout the day. While assuming uniformity is naive, we selected the upper limit of the confidence interval to somewhat negate any unreasonable advantage to the attacker.

5.3 A Reasonable Timeframe

Wang and Goldberg show that it is realistic to assume that an attacker can determine the start of a webpage load even in the presence of background noise and multiple concurrent website visits [71]. An attacker can further determine if a circuit is used for onion services or not [34, 47]. Now, consider an attacker that observes traffic between a Tor client and its guard. The initial stream contains the first HTTP GET request for a typical website visit. The request will be the first outgoing packet as part of a website visit once a connection has been established. The moment the request arrives at the destination is the time that an oracle, e.g., one instantiated by access logs, would record as the time of visit. Clearly, the exact time is between the request and the response packets and the attacker observes the timing of those packets. So what is a realistic timeframe for the attacker to use when it queries a WO?

Fig. 5. Time differences between start, log, and stop events when visiting a website over HTTP(S) using Tor.

Between January 22–30 (2019) we performed Round-Trip Time (RTT) measurements using four Amazon EC2 instances that ran their own nginx HTTP(S) servers to visit themselves over Tor (with torify curl) using a fresh circuit for each visit. This allowed us easy access to start and stop times for the RTT measurement, as well as the time a request appeared in the nginx access log (without any clock-drift). In total we collected 21,497 HTTP traces and 21,492 HTTPS traces, where each trace contains start, log, and stop timestamps. Our results are shown in Figure 5. It is clear that log-to-stop times are independent of HTTP(S). More than half of all log-to-stop times (54.5%) are within a 100 ms window (see 40–140 ms), and nearly all log-to-stop times are less than 1000 ms.

Based on our experiment results we consider three timeframes relevant: 10 ms, 100 ms, and 1000 ms. First, 10 ms is relevant as close to optimal for any attacker. On average, there are only 17 website visits during a 10 ms window in the entire Tor network. 100 ms is our default for the WF experiments we perform: we consider it realistic for many sources of WOs (e.g., Cloudflare logs and real-time bidding). We also consider a 1000 ms timeframe relevant due to the prevalence of sources of WOs with a resolution in seconds (e.g., due to Unix timestamps or TTLs for DNS). Based on our simulations and the different timeframes, Appendix A contains an analysis of the utility of WOs using Bayes’ law. Appendix B presents some key lessons from the simulation, in particular that while the resolution and resulting timeframe is an important metric in our simulation, it is minor in comparison to the overall website popularity in Tor of the monitored websites.
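The link between the assumed daily visit count and the size of the anonymity set per timeframe can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope model rather than the paper's simulator: it uses the 140 million visits per day from Section 5.2, assumes visits are independent and uniform within a bin, and the function names are introduced here. Dividing 140 million by the number of 10 ms windows in a day gives roughly 16, close to the figure of 17 quoted above.

```python
VISITS_PER_DAY = 140_000_000          # simulation assumption from Section 5.2
MS_PER_DAY = 24 * 60 * 60 * 1000


def expected_visits(timeframe_ms: float) -> float:
    """Expected number of website visits in the whole network during one timeframe."""
    return VISITS_PER_DAY * timeframe_ms / MS_PER_DAY


def prob_site_visited(bin_share: float, bin_size: int, timeframe_ms: float) -> float:
    """Probability that at least one visit in the timeframe goes to one given website.

    bin_share is the fraction of all visits that land in the website's bin and
    bin_size is the number of websites in that bin (uniform selection within the bin).
    """
    per_visit = bin_share / bin_size
    n = expected_visits(timeframe_ms)
    return 1.0 - (1.0 - per_visit) ** n


if __name__ == "__main__":
    # Example: a website in the Alexa (10k,100k] bin (7.7% of visits, 90,000 websites).
    for tf in (10, 100, 1000):
        print(tf, expected_visits(tf), prob_site_visited(0.077, 90_000, tf))
```

Under this model the probability grows roughly linearly with the timeframe while it stays small, but changes by orders of magnitude between bins, which is in line with the lesson that website popularity matters more than resolution.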

6 Deep Fingerprinting with Website Oracles

We first describe how we augment the Deep Fingerprinting (DF) attack by Sirinam et al. [60] with WO access. Next we evaluate the augmented classifier on three different datasets with five different WF defenses. Source code and datasets for simulating WF+WO attacks as well as steps to reproduce all of the following results using DF are available at github.com/pylls/wfwo.

6.1 The Augmented Classifier

As covered in the background (Section 2.3), DF is a CNN where the last layer is a softmax. The output is an array of probabilities for each possible class. Compared to the implementation of DF used by Sirinam et al., we changed DF to not first use binary classification in the open world to determine if it is an unmonitored trace or not, but rather such that there is one class for each monitored website and one for unmonitored. Conceptually, this slightly lowers the performance of DF in our analysis, but our metrics show that mistaking one monitored website for another is insignificant for the datasets used in the analysis of this paper. The principal source of false positives is mistaking an unmonitored website for a monitored one.

(a) No defense. (b) Walkie-Talkie [72]. (c) WTF-PAD [31].

Fig. 6. Attack simulation for Deep Fingerprinting (DF) with website oracles (100 ms timeframe) on Sirinam et al.’s dataset [60]. The lines in each sub-figure show DF with and without website oracle access for different starting Alexa ranks for monitored websites.

Given the probability of each possible class as output of DF, we used the second generic construction (Definition 3) from Section 3.2 to combine DF with a WO. To update the remaining probabilities after removing a (monitored) prediction with the help of the WO, we use a softmax again. However, due to how the softmax function is defined, it emphasizes differences in values above one and actually de-emphasizes values between zero and one12. This is problematic for us because all values we send through the softmax are probabilities that per definition are between zero and one. To account for this, we first divide each probability with the maximum probability and multiply with a constant before performing the softmax. Through trial-and-error, a constant of five gave us a reasonable threshold in probabilities. Note that this does not in any way affect the order of likely classes from DF, it simply puts the probabilities in a span that makes it easier for us to retain a threshold value between zero and one after multiple calls to the softmax function.
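The rescaling step described above (divide by the maximum, multiply by a constant, then apply softmax) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the code from the paper's repository; the function names are introduced here and the default constant of five is the value reported in the text.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rescaled_softmax(probs: Sequence[float], constant: float = 5.0) -> List[float]:
    """Divide by the maximum probability, scale by a constant, then apply softmax."""
    top = max(probs)
    if top <= 0.0:
        raise ValueError("at least one probability must be positive")
    return softmax([constant * p / top for p in probs])
```

Because the transformation is strictly increasing in each input, the ranking of classes is unchanged; a prediction whose probability was set to 0.0 after being rejected by the WO receives the smallest output value.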

6.2 WTF-PAD and Walkie-Talkie

We use the original dataset of Sirinam et al. [60] that consists of 95 monitored websites with 1,000 instances each as well as 20,000 unmonitored websites (95x1k+20k). The dataset is split 8:1:1 for training, validation, and testing, respectively. The dataset, together with our change to DF to not do binary classification, means that our testing dataset is unbalanced in terms of instances per class. Therefore we show precision-recall curves generated by varying the threshold for DF with and without WO access.

Figure 6 shows the results of DF and DF+WO with a simulated WO on Sirinam et al.’s dataset with no defense (Figure 6a), Walkie-Talkie (Figure 6b), and WTF-PAD (Figure 6c). For the WO we use a 100 ms timeframe and plot the results for different starting Alexa ranks of the 95 monitored websites. Regardless of defense or not, we observe that for Alexa ranks 1k and less popular websites the precision is perfect (1.0) regardless of threshold. This indicates that, for an attacker monitoring frontpages of websites, a 100 ms WO significantly reduces false positives for two-thirds of all website visits made over Tor, for the vast majority of potentially monitored frontpages of websites. Recall is also slightly improved.

12 https://en.wikipedia.org/w/index.php?title=Softmax_function&oldid=883834589

For Walkie-Talkie we observe a significant improvement in precision due to WO access. Wang and Goldberg note that the use of popular websites as decoy (non-sensitive) websites protects less-popular sensitive websites due to the base rate: an attacker claiming that the user visited the less-popular website is (per definition) likely wrong, given that the attacker is able to detect both potential website visits [72]. Access to a WO flips this observation on its head: if a WO detects the sensitive less-popular website, the base rate works in reverse. The probability of an unpopular website being both misclassified and visited in the timeframe is small for all but the most popular websites. The key question becomes one of belief in the base rate of the network and that of the target user, as analysed in Appendix A.

Further, WO access improves both recall and precision for all monitored websites against WTF-PAD. WTF-PAD only provides a three percentage points decrease in recall compared to no defense for monitored websites with Alexa ranks 1k and above.

6.3 CS-BuFLO and Tamaraw

To evaluate the constant-rate defenses CS-BuFLO and Tamaraw by Cai et al. [6, 7] we use Wang et al.’s dataset in the open world [70]. The dataset consists of 100 monitored websites with 90 instances each and 9000 unmonitored sites (100x90+9k), that we randomly split (stratified) into 8:1:1 for training, validation, and testing. We had to increase the length of the input to DF for this dataset, from 5000 to 25000, to ensure that we capture most of the dataset. To get defended traces for CS-BuFLO and Tamaraw we use the slightly modified implementations as part of Cherubin’s framework [12].

(a) No defense. (b) CS-BuFLO [6], with reported 67.2% BOH and 575.6% TOH [12]. (c) Tamaraw [7], with reported 256.7% BOH and 341.4% TOH [12].

Fig. 7. Attack simulation for Deep Fingerprinting (DF) [60] with website oracles (100 ms timeframe) on Wang et al.’s dataset [70]. The lines in each sub-figure show DF with and without website oracle access for different starting Alexa ranks for monitored websites.

(a) No defense. (b) DynaFlow [36] config 1, with measured 59% BOH and 24% TOH. (c) DynaFlow [36] config 2, with measured 109% BOH and 30% TOH.

Fig. 8. Attack simulation for Deep Fingerprinting (DF) [60] with website oracles (100 ms timeframe) on Lu et al.’s dataset [36]. The lines in each sub-figure show DF with and without website oracle access for different starting Alexa ranks for monitored websites.

Figure 7 shows the results of our simulations. DF alone is also highly effective against the original Wang dataset, as expected, and our attack simulation shows that we can further improve it with access to website oracles. Most importantly, both CS-BuFLO and Tamaraw offer protection against DF with and without oracle access by significantly lowering recall. Tamaraw offers an order of magnitude better defense in terms of recall. As implemented in the framework by Cherubin, CS-BuFLO and Tamaraw reportedly have BOH 67.2% and 256.7%, and TOH 575.6% and 341.4%, respectively. This kind of overhead is likely prohibitively large for real-world deployment in Tor [6, 7, 31, 60, 72].

6.4 DynaFlow

DynaFlow is a dynamic constant-rate defense by Lu et al. [36] with two configurations that result in different overheads and levels of protection. Lu et al. gathered their own dataset of 100 monitored websites with 90 instances each and 9000 unmonitored websites (100x90+9k, same as Wang et al.’s [70]) to be able to combine smaller packets, as discussed briefly in Section 2.3. As for CS-BuFLO and Tamaraw, we had to increase the length of the input to DF for this dataset to 25000 to ensure that we capture most of the dataset.

Figure 8 shows the results of our simulations for no defense as well as the two configurations of DynaFlow. As for Wang et al.’s dataset [70], we see as expected that DF is highly effective and WO access further improves the attack. Further, both configurations of DynaFlow are effective defenses, comparable to CS-BuFLO with significantly lower overheads at first glance. However, note that the comparison is problematic due to DynaFlow combining smaller packets. The extra overhead for config 2 over 1 is not wasted: recall is significantly reduced, more than halved for regular DF and slightly less than half with a WO.

7 Discussion

For defenses that are based on the idea of creating collision sets between packet traces generated by websites, oracle access is equivalent to being able to perform set intersection between the set of websites in a collision set and monitored websites visited at the time of fingerprinting. As the results from Section 6 show, some defenses can significantly reduce the recall of WF attacks with WOs, but not the precision for the majority of websites and website visits in Tor.

Next, in Section 7.1 we further cement that our simulations show that WOs significantly reduce false positives, highlighting that a WF+WO attacker has to query a WO surprisingly infrequently when classifying unmonitored traces. Section 7.2 discusses the impact of imperfect WO sources with limited observability and false positives on the joint WF+WO attack. Finally, Section 7.3 covers limitations of our work, and Section 7.4 discusses possible mitigations.

7.1 A Large Unmonitored Dataset

We look at the number of false positives for a large test dataset consisting of only unmonitored websites (representing target users with base rate 0, i.e., users who never visit any monitored website). Using the dataset of Greschbach et al. [24], we trained DF on 100x100 monitored and 10k unmonitored websites (8:2 stratified split for validation), resulting in about 80% validation accuracy after 30 epochs (so DF is clearly successful also on this dataset). We then tested the trained classifier on only 100k unmonitored traces, with and without oracle access (100 ms resolution) for different assumptions of the popularity of the monitored websites. With ten repetitions of the above experiment, we observed a false positive rate in the order of 10^-6 for monitored websites with Alexa popularity 10k. Excluding torproject.org, this indicates that an attacker would have close to no false positives for about half of all website visits in Tor, according to the distribution of Mani et al. [38] (see Section 5.1). Without access to a WO, DF had a false positive rate in the order of 10^-3 to 10^-4, depending on the threshold used by the attacker.

Recall how WOs are used as part of WF+WO attacks in Section 3.2: the WO is only used if the WF attack classifies a trace as monitored¹³. This means that, in the example above, the WO is used on average every $10^{3}$ to $10^{4}$ traces, to (potentially) rule out a false positive. Clearly, this means that WO sources that can only be used infrequently, e.g., due to caching as in DNS, are still valuable for an attacker.

13 For WF attacks like DF that produce a list of probabilities (Definition 3), just assume that the attacker picks the threshold and checks if the probability is above it as part of the if-statement before using the WO.
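The control flow described above (consult the WO only when the WF classifier reports a monitored website with sufficient confidence) can be sketched as follows. This is a minimal illustration, not the attack definition from Section 3.2: the function name `wf_wo_classify`, the `oracle` callable, and the choice to fall back to the unmonitored class when the WO rejects a guess are all assumptions made for the example.

```python
from typing import Callable, Sequence


def wf_wo_classify(
    probs: Sequence[float],
    threshold: float,
    oracle: Callable[[int], bool],
    unmonitored: int,
) -> int:
    """Combine a WF classifier output with a website oracle (illustrative).

    probs:       per-class probabilities from the WF classifier
    threshold:   attacker-chosen confidence threshold
    oracle:      hypothetical callable, True if the site was visited over Tor
    unmonitored: index of the unmonitored class
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    # The oracle is never consulted for unmonitored or low-confidence guesses.
    if best == unmonitored or probs[best] < threshold:
        return unmonitored
    # Assumption: a guess rejected by the oracle is treated as unmonitored.
    return best if oracle(best) else unmonitored
```

Because the oracle call sits behind the threshold check, a WO source that tolerates only occasional queries is only exercised when the classifier would otherwise report a monitored visit.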

Fig. 9. How limited WO observability affects the final recall of a WF+WO attack for five different WO false positive rates.

7.2 Imperfect Website Oracle Sources

Our analysis considered an ideal source of a WO that observes all visits to targeted monitored websites of the attacker and that produces no false positives. Next, using the same dataset and setup as for Figure 6a with an Alexa starting rank of $10^{4}$, we simulate the impact on recall and the False-Negative-to-Positive-rate¹⁴ (FNP) of the joint WF+WO attack for five false positive rates of the WO and a fraction of observed website visits.

Figure 9 shows the impact on the joint recall in the above setting. We see that recall is directly proportional to the fraction of observed visits, as per the results of Greschbach et al. [24]. Further, false positives for the WO have a positive impact on the fraction of recall, counteracting visits missed due to limited observability. For the same reason, a larger timeframe or monitoring more popular websites would also improve recall.

Figure 10 shows the impact on the joint FNP. Note that lower FNP is better for an attacker. We see that limited observability has no impact on FNP. This makes sense, because a WO cannot confirm anything it does not observe. The FNP fraction for the joint FNP is roughly proportional to the FP of the WO. We also see that the FNP fraction is consistently above the FP of the WO: this is because, beyond the simulated FP, there is a slight probability that someone else (in our simulation of the Tor network) visited the website for each classified trace. A larger timeframe or monitoring more popular websites would also increase FNP.

14 For a classifier with multiple monitored classes and an unmonitored class (as for our modified DF, see Section 6.1), FNP captures the case when the classifier classifies an unmonitored testing trace as any monitored class. In addition to FNP, such a classifier can also confuse one monitored website for another. Both these cases are false positives [69].

Fig. 10. How limited WO observability affects the final False-Negative-to-Positive-rate (FNP) of a WF+WO attack for five different WO false positive rates. Lower is better.

From the above results, our simulations indicate that even with a deeply imperfect WO source an attacker can gain a significant advantage in terms of reduced false positives at a comparatively small cost of recall. For example, given a WO with 50% observability and a 50% false positive rate, the resulting WF+WO attack has about 75% of the recall of the WF attack and slightly more than half the false positives.
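The 75% and "slightly more than half" figures in the example above can be reproduced with a back-of-the-envelope model. The sketch below is not the simulation used for Figures 9 and 10. It assumes that the WO answers "visited" either because it observed a real visit or, independently, because of a WO false positive. The function names and the `p_other_visit` parameter (the probability that some other Tor user visited the site in the timeframe) are introduced only for illustration.

```python
def p_oracle_yes(p_visit: float, observability: float, wo_fp: float) -> float:
    """Probability that an imperfect WO answers 'visited' (simplified model).

    p_visit:       probability that a real visit happened in the timeframe
    observability: fraction of real visits the WO source observes
    wo_fp:         false positive rate of the WO itself
    """
    p_seen = p_visit * observability
    return p_seen + (1.0 - p_seen) * wo_fp


def joint_rates(wf_recall, wf_fp, observability, wo_fp, p_other_visit=0.0):
    """Scale WF recall and false positives by the WO's 'visited' probability."""
    # True positive: the victim did visit the site, so p_visit = 1.
    recall = wf_recall * p_oracle_yes(1.0, observability, wo_fp)
    # False positive: only other users (if anyone) visited the site.
    false_positives = wf_fp * p_oracle_yes(p_other_visit, observability, wo_fp)
    return recall, false_positives
```

With 50% observability and a 50% WO false positive rate, the recall factor is 0.5 + 0.5 · 0.5 = 0.75, and the false positive factor is 0.5 plus a small term that depends on how likely it is that someone else visited the website. In this simplified model that small term also scales with observability, a second-order effect.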

7.3 Limitations

As discussed in Section 2.3, there are a number of practical limitations in general for WF attacks. Regarding attacker goals, WOs are likely less useful for the purpose of censorship than for other goals. Many sources of WOs cannot be accessed in real-time, giving little utility for an attacker that needs to make a near real-time censorship decision. An attacker that only wants to detect visits to a few selected monitored websites gains significant utility from WOs, as long as the detection does not have to be in real-time. It is also noteworthy that an attacker that wants to detect all possible website visits by a victim can use the WO to in essence “close the world” from all possible websites to only those visited over Tor while the victim is actively browsing. Granted, this requires a source for the WO that is slightly different from our definition, but some do offer this: e.g., an attacker that gains comprehensive control over the DNS resolvers used by Tor exits [24].

When it comes to false positives, a significant limitation of our simulations is that we consider fingerprinting the frontpages of websites and not specific webpages. Several sources of WOs are not able to detect webpage visits. This is also true for subsequent webpage visits on the same website after first visiting the frontpage of a website (e.g., DNS and OCSP will be cached). An attacker with the goal of detecting each such page visit will thus suffer more false positives or fail at detecting them for some sources of WOs.

7.4 Mitigations

The best defense against WOs is WF defenses that significantly reduce the recall of WF attacks. In particular, if an attacker can significantly reduce the website anonymity set after accounting for information from the WO, then attacks are likelier to succeed. This implies that most websites need to (at least have the potential to) result in the same network traces, as we see with DynaFlow, Tamaraw, and CS-BuFLO.

For onion websites we note that the DHT source of a WO from Section 4 is inherent to the design of onion services in Tor. Defenses that try to make it harder to distinguish between regular website visits and visits to onion websites should also consider this WO source as part of their analysis, in particular for v2 onion services.

Finally, some sources of WOs could be minimized. If you run a potentially sensitive website: do not use RTB ads, staple OCSP, have as few DNS entries as possible¹⁵ with a high TTL, do not use CDNs, do not retain any access logs, and consider if your website, web server, or operating system have any information leaks that can be used as an oracle. If you run a Tor exit, consider not using Google or Cloudflare for your DNS but instead use your ISP’s resolver if possible [24].

8 Related Work

The combination of a WF attack with a WO is a type of Classify-Verify method as proposed by Stolerman et al. [61], which in turn is a type of rejection function as described by Chow [13]. Such a method was first used in the context of WF by Juarez et al. [30] and later by Greschbach et al. [24] to augment WF attacks with inferences from observed DNS traffic. Note that the attack by Greschbach et al. can be seen as a probabilistic WO due to the attacker under their threat model only observing a fraction of DNS traffic from the Tor network. Our work builds upon and generalises their work where DNS traffic is just one of many possible sources to infer website visits from. Further, our DNS-based sources are usable by anyone instead of relatively strong network attackers (or Google or Cloudflare).

15 As noted by Greschbach et al. [24], websites may have several unique domain names. Each of those could be used independently to query several sources (e.g., DNS) of WOs.

All anonymity networks produce anonymity sets (per definition) that change with observations by an attacker over time [53]. Modelling the behaviour of an anonymity system (as a mix), what the attacker observes, and how the anonymity sets change over time allows us to reason about how the attacker can perform traffic analysis and break the anonymity provided by the system [19, 32, 58]. Attacks along these lines are many with more-or-less consistent terminology, including intersection attacks, (statistical) disclosure attacks, and traffic confirmation attacks [3, 15–17, 33, 53, 54, 66].

WOs are nothing more than applying the notion of anonymity sets to the potential destination websites visited over an anonymity network like Tor and giving an attacker the ability to query this anonymity set for membership for a limited number of monitored websites. The way we use WOs in our generic attacks is not to learn long-term statistically unlikely relationships between senders and recipients in a network. Rather, the WO is only used to learn part of the anonymity set at the time of the attack. That an attacker can observe anonymity sets is not novel; what is novel in our work is how we apply it to the WF domain and argue for its inclusion as a core attacker capability when modelling WF attacks and defenses.

Murdoch and Danezis showed how to use observed latency in Tor as an oracle to perform traffic analysis attacks [42]. Chakravarty et al. detailed similar attacks but based on bandwidth estimation [10] and Mittal et al. using throughput estimation [41]. Attackers in these cases do not need to be directly in control of significant fractions of Tor, but rather use network measurements to infer the state of the network and create an oracle that an attacker can utilize, similar to WOs.

Correlation of input and output flows is at the core of many attacks on anonymity networks like Tor [5, 29, 63]. Flow correlation attacks correlate traffic on the network layer, considering packet sizes and timing of sent traffic. The RAPTOR attack by Sun et al. [63] needs about 100 MB of data sent over five minutes to correlate flows with high accuracy. The recent state-of-the-art attack DeepCorr by Nasr et al. [44] (based on deep learning like Deep Fingerprinting by Sirinam et al. [60]) needs only about 900 KB of data (900 packets) for comparable accuracy to RAPTOR. While flow correlation attacks like RAPTOR and DeepCorr operate on the network layer, WF+WO attacks can be viewed as application layer correlation attacks. WF attacks extract the application-layer data (the website) while WOs reconstruct parts of the anonymity set of possible monitored websites visited. WF attacks need to observe most of the traffic generated when visiting a website that goes into the anonymity network. While a WO does not have to directly view any of the output flows of the network, it needs to be able to infer if a particular website was visited during a period of time, as shown in Section 4.

9 Conclusions

WF+WO attacks use the base rate of all users of the network against victims, significantly reducing false positives in the case of all but the most popular websites visited over Tor. This is troubling in many ways, in part because presumably many sensitive website visits are to unpopular websites used only by local communities in regions of the world where the potential consequences of identification are the worst.

The threat model of Tor explicitly states that Tor does not consider attackers that can observe both incoming and outgoing traffic [20]. Clearly, a WO gives the capability to infer what the outgoing traffic of the network encodes on the application layer (the website visits). This is in a sense a violation of Tor’s threat model when combined with a WF attacker that also observes incoming traffic. However, we argue that because of the plethora of possible ways for an attacker to infer membership in the anonymity sets of Tor, WOs should be considered within scope simply because Tor asserts that it is an anonymity network.

While the real-world impact of WF attacks on Tor users remains an open question, our simulations show that false positives can be significantly reduced by many attackers with little extra effort for some WO sources. Depending on WO source, this comes at a trade-off of less recall. For many attackers and attacker goals, however, this is a worthwhile trade. To us, the threat of WF attacks appears more real than ever, especially when also considering recent advances by deep learning based attacks like DF [60] and DeepCorr [44].

Acknowledgements

We would like to thank Jari Appelgren, Roger Dingledine, Nicholas Hopper, Marc Juarez, George Kadianakis, Linus Nordberg, Mike Perry, Erik Wästlund, and the PETS reviewers for their valuable feedback. Simulations were performed using the Swedish National Infrastructure for Computing (SNIC) at High Performance Computing Center North (HPC2N). This research was funded by the Swedish Internet Foundation and the Knowledge Foundation of Sweden.


References

[1] C. Abdelberi, T. Chen, M. Cunche, E. D. Cristofaro, A. Friedman, and M. A. Kâafar. Censorship in the wild: Analyzing internet filtering in Syria. In IMC, 2014.

[2] K. Abe and S. Goto. Fingerprinting attack on Tor anonymity using deep learning. Proceedings of the Asia-Pacific Advanced Network, 42:15–20, 2016.

[3] O. Berthold, A. Pfitzmann, and R. Standtke. The disadvantages of free MIX routes and how to overcome them. In International Workshop on Design Issues in Anonymity and Unobservability, 2000.

[4] D. Bleichenbacher. Chosen ciphertext attacks against protocols based on the RSA encryption standard PKCS #1. In CRYPTO, 1998.

[5] N. Borisov, G. Danezis, P. Mittal, and P. Tabriz. Denial of service or denial of security? In CCS, 2007.

[6] X. Cai, R. Nithyanand, and R. Johnson. CS-BuFLO: A congestion sensitive website fingerprinting defense. In WPES, 2014.

[7] X. Cai, R. Nithyanand, T. Wang, R. Johnson, and I. Goldberg. A systematic approach to developing and evaluating website fingerprinting defenses. In CCS, 2014.

[8] X. Cai, X. C. Zhang, B. Joshi, and R. Johnson. Touching from a distance: website fingerprinting attacks and defenses. In CCS, 2012.

[9] Y. Cao, Z. Qian, Z. Wang, T. Dao, S. V. Krishnamurthy, and L. M. Marvel. Off-path TCP exploits of the challenge ACK global rate limit. IEEE/ACM Trans. Netw., 26(2):765–778, 2018.

[10] S. Chakravarty, A. Stavrou, and A. D. Keromytis. Traffic analysis against low-latency anonymity networks using available bandwidth estimation. In ESORICS, 2010.

[11] H. Cheng and R. Avnur. Traffic analysis of SSL encrypted web browsing. Project paper, University of Berkeley, 1998.

[12] G. Cherubin. Bayes, not naïve: Security bounds on website fingerprinting defenses. PoPETs, 2017.

[13] C. Chow. On optimum recognition error and reject tradeoff. IEEE Transactions on information theory, 16(1), 1970.

[14] T. Chung, J. Lok, B. Chandrasekaran, D. R. Choffnes, D. Levin, B. M. Maggs, A. Mislove, J. P. Rula, N. Sullivan, and C. Wilson. Is the web ready for OCSP must-staple? In IMC, 2018.

[15] G. Danezis. Statistical disclosure attacks. In Security and Privacy in the Age of Uncertainty, IFIP SEC, 2003.

[16] G. Danezis. The traffic analysis of continuous-time mixes. In PET, 2004.

[17] G. Danezis and A. Serjantov. Statistical disclosure or intersection attacks on anonymity systems. In Information Hiding, 6th International Workshop, IH, 2004.

[18] D. Das, S. Meiser, E. Mohammadi, and A. Kate. Anonymity trilemma: Strong anonymity, low bandwidth overhead, low latency - choose two. In IEEE S&P, 2018.

[19] C. Díaz, S. Seys, J. Claessens, and B. Preneel. Towards measuring anonymity. In PET, 2002.

[20] R. Dingledine, N. Mathewson, and P. F. Syverson. Tor: The second-generation onion router. In USENIX Security, 2004.

[21] K. P. Dyer, S. E. Coull, T. Ristenpart, and T. Shrimpton. Peek-a-boo, I still see you: Why efficient traffic analysis countermeasures fail. In IEEE S&P, 2012.

[22] R. Ensafi, J. C. Park, D. Kapur, and J. R. Crandall. Idle port scanning and non-interference analysis of network protocol stacks using model checking. In USENIX Security, 2010.

[23] S. Goldwasser and S. Micali. Probabilistic encryption. J. Comput. Syst. Sci., 28(2):270–299, 1984.

[24] B. Greschbach, T. Pulls, L. M. Roberts, P. Winter, and N. Feamster. The effect of DNS on Tor’s anonymity. In NDSS, 2017.

[25] J. Hayes and G. Danezis. k-fingerprinting: A robust scalable website fingerprinting technique. In USENIX Security, 2016.

[26] D. Herrmann, R. Wendolsky, and H. Federrath. Website fingerprinting: attacking popular privacy enhancing technologies with the multinomial naïve-bayes classifier. In CCSW, 2009.

[27] A. Hintz. Fingerprinting websites using traffic analysis. In PET, 2002.

[28] R. Jansen, M. Juárez, R. Galvez, T. Elahi, and C. Díaz. Inside job: Applying traffic analysis to measure Tor from within. In NDSS, 2018.

[29] A. Johnson, C. Wacek, R. Jansen, M. Sherr, and P. F. Syverson. Users get routed: traffic correlation on Tor by realistic adversaries. In CCS, 2013.

[30] M. Juárez, S. Afroz, G. Acar, C. Díaz, and R. Greenstadt. A critical evaluation of website fingerprinting attacks. In CCS, 2014.

[31] M. Juárez, M. Imani, M. Perry, C. Díaz, and M. Wright. Toward an efficient website fingerprinting defense. In ESORICS, 2016.

[32] D. Kesdogan, D. Agrawal, and S. Penz. Limits of anonymity in open environments. In Information Hiding, 5th International Workshop, IH, 2002.

[33] D. Kesdogan and L. Pimenidis. The hitting set attack on anonymity protocols. In Information Hiding, 6th International Workshop, IH, 2004.

[34] A. Kwon, M. AlSabah, D. Lazar, M. Dacier, and S. Devadas. Circuit fingerprinting attacks: Passive deanonymization of Tor hidden services. In USENIX Security, 2015.

[35] M. Liberatore and B. N. Levine. Inferring the source of encrypted HTTP connections. In CCS, 2006.

[36] D. Lu, S. Bhat, A. Kwon, and S. Devadas. Dynaflow: An efficient website fingerprinting defense based on dynamically-adjusting flows. In WPES, 2018.

[37] D. Lyon. Surveillance, Snowden, and big data: Capacities, consequences, critique. Big Data & Society, 1(2), 2014.

[38] A. Mani, T. Wilson-Brown, R. Jansen, A. Johnson, and M. Sherr. Understanding Tor usage with privacy-preserving measurement. In IMC, 2018.

[39] N. Mathews, P. Sirinam, and M. Wright. Understanding feature discovery in website fingerprinting attacks. In 2018 IEEE Western New York Image and Signal Processing Workshop (WNYISPW), pages 1–5. IEEE, 2018.

[40] R. Merget, J. Somorovsky, N. Aviram, C. Young, J. Fliegenschmidt, J. Schwenk, and Y. Shavitt. Scalable scanning and automatic classification of TLS padding oracle vulnerabilities. In USENIX Security, 2019. To appear.

[41] P. Mittal, A. Khurshid, J. Juen, M. Caesar, and N. Borisov. Stealthy traffic analysis of low-latency anonymous communication using throughput fingerprinting. In CCS, 2011.


[42] S. J. Murdoch and G. Danezis. Low-cost traffic analysis of Tor. In IEEE S&P, 2005.

[43] M. Naor and M. Yung. Public-key cryptosystems provably secure against chosen ciphertext attacks. In Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, 1990.

[44] M. Nasr, A. Bahramali, and A. Houmansadr. Deepcorr: Strong flow correlation attacks on Tor using deep learning. In CCS, 2018.

[45] R. Nithyanand, X. Cai, and R. Johnson. Glove: A bespoke website fingerprinting defense. In WPES, 2014.

[46] A. Panchenko, F. Lanze, J. Pennekamp, T. Engel, A. Zinnen, M. Henze, and K. Wehrle. Website fingerprinting at internet scale. In NDSS, 2016.

[47] A. Panchenko, A. Mitseva, M. Henze, F. Lanze, K. Wehrle, and T. Engel. Analysis of fingerprinting techniques for Tor hidden services. In WPES, 2017.

[48] A. Panchenko, L. Niessen, A. Zinnen, and T. Engel. Website fingerprinting in onion routing based anonymization networks. In WPES, 2011.

[49] M. Perry. A critique of website traffic fingerprinting attacks, https://web.archive.org/web/20190208082403/https://blog.torproject.org/critique-website-traffic-fingerprinting-attacks.

[50] A. Pfitzmann and M. Hansen. A terminology for talking about privacy by data minimization: Anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management. 34, 01 2010.

[51] Z. Qian, Z. M. Mao, and Y. Xie. Collaborative TCP sequence number inference attack: how to crack sequence number under a second. In CCS, 2012.

[52] C. Rackoff and D. R. Simon. Non-interactive zero-knowledge proof of knowledge and chosen ciphertext attack. In CRYPTO, 1991.

[53] J. Raymond. Traffic analysis: Protocols, attacks, design issues, and open problems. In International Workshop on Design Issues in Anonymity and Unobservability, 2000.

[54] M. G. Reed, P. F. Syverson, and D. M. Goldschlag. Anonymous connections and onion routing. IEEE Journal on Selected Areas in Communications, 16(4):482–494, 1998.

[55] V. Rimmer, D. Preuveneers, M. Juárez, T. van Goethem, and W. Joosen. Automated website fingerprinting through deep learning. In NDSS, 2018.

[56] E. Ronen, R. Gillham, D. Genkin, A. Shamir, D. Wong, and Y. Yarom. The 9 lives of Bleichenbacher’s CAT: new cache attacks on TLS implementations. IACR Cryptology ePrint Archive, 2018:1173, 2018.

[57] Q. Scheitle, O. Hohlfeld, J. Gamba, J. Jelten, T. Zimmermann, S. D. Strowes, and N. Vallina-Rodriguez. A long way to the top: Significance, structure, and stability of internet top lists. In IMC, 2018.

[58] A. Serjantov and G. Danezis. Towards an information theoretic metric for anonymity. In PET, 2002.

[59] V. Shmatikov and M. Wang. Timing analysis in low-latency mix networks: Attacks and defenses. In ESORICS, 2006.

[60] P. Sirinam, M. Imani, M. Juárez, and M. Wright. Deep fingerprinting: Undermining website fingerprinting defenses with deep learning. In CCS, 2018.

[61] A. Stolerman, R. Overdorf, S. Afroz, and R. Greenstadt. Classify, but verify: Breaking the closed-world assumption in stylometric authorship attribution. In IFIP Working Group, volume 11, page 64, 2013.

[62] Q. Sun, D. R. Simon, Y. Wang, W. Russell, V. N. Padmanabhan, and L. Qiu. Statistical identification of encrypted web browsing traffic. In IEEE S&P, 2002.

[63] Y. Sun, A. Edmundson, L. Vanbever, O. Li, J. Rexford, M. Chiang, and P. Mittal. RAPTOR: routing attacks on privacy in Tor. In USENIX Security, 2015.

[64] Tor Project. Tor rendezvous specification - version 2, https://gitweb.torproject.org/torspec.git/tree/rend-spec-v2.txt.

[65] Tor Project. Tor rendezvous specification - version 3, https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt.

[66] C. Troncoso, B. Gierlichs, B. Preneel, and I. Verbauwhede. Perfect matching disclosure attacks. In PETS, 2008.

[67] P. Vines, F. Roesner, and T. Kohno. Exploring ADINT: using ad targeting for surveillance on a budget - or - how alice can buy ads to track bob. In WPES, 2017.

[68] J. Wang, W. Zhang, and S. Yuan. Display advertising with real-time bidding (RTB) and behavioural targeting. Foundations and Trends in Information Retrieval, 2017.

[69] T. Wang. Website Fingerprinting: Attacks and Defenses. PhD thesis, University of Waterloo, 2015.

[70] T. Wang, X. Cai, R. Nithyanand, R. Johnson, and I. Goldberg. Effective attacks and provable defenses for website fingerprinting. In USENIX Security, 2014.

[71] T. Wang and I. Goldberg. On realistically attacking Tor with website fingerprinting. PoPETs, 2016(4):21–36, 2016.

[72] T. Wang and I. Goldberg. Walkie-talkie: An efficient defense against passive website fingerprinting attacks. In USENIX Security, 2017.

[73] P. Winter, R. Köwer, M. Mulazzani, M. Huber, S. Schrittwieser, S. Lindskog, and E. R. Weippl. Spoiled onions: Exposing malicious Tor exit relays. In PETS, 2014.

A Bayes’ Law for Estimating Utility of Website Oracles

To reason about the advantage to an attacker of having access to a WO, we estimate the conditional probability of a target user visiting a monitored website. For conditional probability we know that:

$$P(C_0 \cap C_1) = P(C_0 \mid C_1)\,P(C_1) \tag{1}$$

For a hypothesis $H$ given conditional evidence $E$, Bayes’ theorem states that:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)} \tag{2}$$

Assume that $E = E_0 \cap E_1$, then:

$$P(H \mid E_0 \cap E_1) = \frac{P(E_0 \cap E_1 \mid H)\,P(H)}{P(E_0 \cap E_1)} \tag{3}$$

Substituting $P(E_0 \cap E_1)$ with (1) we get:

$$P(H \mid E_0 \cap E_1) = \frac{P(E_0 \cap E_1 \mid H)\,P(H)}{P(E_0 \mid E_1)\,P(E_1)} \tag{4}$$


Fig. 11. The conditional probability as a function of user base rate and website popularity (Alexa) for three different timeframes: (a) 10 ms, (b) 100 ms, and (c) 1000 ms.

For a timeframe $t$, we define:

$H$: the probability that target user(s) visited website $w$ over Tor in $t$

$E_0$: the probability that target user(s) visited a website over Tor in $t$

$E_1$: the probability that someone visited website $w$ over Tor in $t$

We see that $P(E_0 \cap E_1 \mid H) = 1$ by definition and get:

$$P(H \mid E_0 \cap E_1) = \frac{P(H)}{P(E_0 \mid E_1)\,P(E_1)} \tag{5}$$

Consider $P(E_0 \mid E_1)$: while the conditional $E_1$ may have some minor effect on user behaviour (in particular for overall popular websites), we assume that the popularity of using Tor to visit a particular website (by any of the users of Tor) has negligible impact on $E_0$ and treat $E_0$ and $E_1$ as independent:

$$P(H \mid E_0 \cap E_1) = \frac{P(H)}{P(E_0)\,P(E_1)} \tag{6}$$

We can further refine $P(H)$ as being composed of at least:

$$P(H) = P(E_0 \cap B_w) = P(E_0 \mid B_w)\,P(B_w) \tag{7}$$

Where $P(B_w)$ is the base rate (prior) of the user(s) visiting website $w$ out of all possible websites they visit ($P(E_0)$). We again assume (perhaps naively) that $E_0$ is also independent of $B_w$, which gives us:

$$P(H \mid E_0 \cap E_1) = \frac{P(E_0)\,P(B_w)}{P(E_0)\,P(E_1)} = \frac{P(B_w)}{P(E_1)} \tag{8}$$

In other words, if an attacker learns that target user(s) visited a website ($E_0$) over Tor and that website $w$ was also visited over Tor by some user ($E_1$), then we can estimate the probability that it was target user(s) that visited website $w$ ($H$) as the ratio between the base rate (prior) for visiting $w$ of target user(s) ($B_w$) and the probability that someone visited the website over Tor ($E_1$), all within a timeframe $t$.
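The final ratio can be evaluated directly. The helper below is an illustrative sketch only: the function name is ours, and clamping the result to 1 is an added assumption, needed because the independence approximations do not guarantee that the ratio stays within [0, 1].

```python
def conditional_visit_probability(base_rate: float, p_someone_visited: float) -> float:
    """Estimate P(H | E0 and E1) as P(B_w) / P(E_1), per the derivation above.

    base_rate:         P(B_w), prior that the target visits website w
    p_someone_visited: P(E_1), probability that anyone visited w over Tor in t
    """
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be in [0, 1]")
    if not 0.0 < p_someone_visited <= 1.0:
        raise ValueError("p_someone_visited must be in (0, 1]")
    # Assumption: clamp to 1, since the approximation can exceed a probability.
    return min(1.0, base_rate / p_someone_visited)
```

For a fixed base rate, the estimate grows as the website becomes less popular (smaller P(E_1)) or the timeframe shrinks, which is the qualitative trend the derivation predicts.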

Figure 11 shows the results for simulating the probability $P(H \mid E_0 \cap E_1)$ for different website popularities of $w$, base rates, and timeframes. We see that with a realistic timeframe of 100 ms, for all base rates but $10^{-6}$ there is non-zero conditional probability (and therefore utility of WO access) for Alexa top 100k or less popular websites, which covers about half of all website visits over Tor (excluding torproject.org).

B Lessons from Simulation

With the ability to simulate access to WOs we can now simulate the entire website anonymity set for Tor. To get a better understanding of why WOs are so useful for an attacker performing WF attacks, we look at two results from the simulation below.

B.1 Time Until Website Visited over Tor

Figure 12 shows the time until there is a 50% probability that a website has been visited over Tor depending on website popularity (Alexa, as discussed in Section 5.1). Within ten seconds, we expect that most of Alexa top 1k has been visited. Recall that this represents about one third of all website visits over Tor. The less popular websites on Alexa top one-million represent another third of all visits, quickly approaching hundreds of seconds between visits. For the remaining third of all website visits we expect them to be even less frequent.
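To build intuition for the "time until 50% probability" metric, one can model visits to a single website as a Poisson process with a constant rate. This is an assumption made only for this sketch, not how the simulation behind Figure 12 is implemented, and the function names are hypothetical. Under that model the probability of at least one visit within a window follows an exponential law, and the 50% point is ln(2) divided by the rate.

```python
import math


def probability_visited(visits_per_second: float, seconds: float) -> float:
    """P(at least one visit within `seconds`), assuming Poisson arrivals."""
    if visits_per_second < 0 or seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    return 1.0 - math.exp(-visits_per_second * seconds)


def time_until_half_probability(visits_per_second: float) -> float:
    """Seconds until there is a 50% chance of at least one visit."""
    if visits_per_second <= 0:
        raise ValueError("rate must be positive")
    return math.log(2) / visits_per_second
```

Under this model, a website that receives one visit per hundred seconds on average reaches the 50% point after roughly 69 seconds.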

B.2 Visits Until First False Positive

Fig. 12. The simulated time until there is a 50% probability that a website for different Alexa ranks has been visited over Tor.

Fig. 13. The number of website visits until there is a 50% probability that a website oracle would contribute to a false positive.

Assume that target user(s) have a base rate of 0, i.e., they never visit the attacker’s monitored websites. With WO access, we can determine how many (naively assumed independent) website visits it at least takes until there is a 50% chance that the attacker’s classifier gets a false positive. This is because if the attacker’s website classifier without oracle access always returns a false positive, then the false positive rate of the WF+WO attack will be determined by when the WO says that the website (incorrectly classified as monitored) has been visited. Figure 13 shows the expected number of visits by the victim(s) for different timeframes based on the popularity of the monitored websites. Note that the attacker by definition chooses which websites are monitored and can therefore take the probability of false positives into account.
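Under the independence assumption stated above, the number of visits until a 50% chance of a false positive follows from the geometric distribution. The sketch below is illustrative: `p_yes` (the per-visit probability that the WO reports the monitored website as visited) and the function name are assumptions for this example, and the values plotted in Figure 13 come from the simulation rather than this closed form.

```python
import math


def visits_until_half_fp(p_yes: float) -> int:
    """Smallest n such that 1 - (1 - p_yes)**n >= 0.5.

    Assumes the worst-case WF classifier described in the text (it always
    outputs 'monitored'), so a false positive occurs whenever the WO says yes.
    """
    if not 0.0 < p_yes <= 1.0:
        raise ValueError("p_yes must be in (0, 1]")
    if p_yes == 1.0:
        return 1
    return math.ceil(math.log(0.5) / math.log(1.0 - p_yes))
```

A smaller `p_yes`, as with a less popular monitored website or a shorter timeframe, makes the number of visits grow roughly as ln(2) / `p_yes`.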

C Sources of Website Oracles

There is a wide range of possible sources to instantiate WOs. Here we present some details on a selection of sources, far from exhaustive.

C.1 Surveillance Programmes

Intelligence agencies operate surveillance programmes that perform bulk collection and retention of communications metadata, including web-browsing [37]. For example, the Snowden revelations included Marina:

Of the more distinguishing features, Marina has the ability to look back on the last 365 days’ worth of DNI (Digital Network Intelligence) metadata seen by the Sigint collection system, regardless whether or not it was tasked for collection.¹⁶

Another example is the prevalence of nation states monitoring Internet traffic that crosses geographic borders. For example, China operates the Great Firewall of China that is also used for censorship purposes. Due to the nature of Tor and how exits are selected, visits to websites that are not operated by world-wide reaching hosting providers are highly likely to cross multiple nation borders as traffic goes from an exit to the website. It is also worth highlighting that for any sensitive website hosted within a country where a state actor is interested in identifying visitors, the state actor is likely to capture traffic to that website, because the Tor traffic crosses its borders more often than not.

C.2 Content Delivery Networks

Content Delivery Networks (CDNs), such as Akamai, Google, and Amazon, host different types of content for a significant fraction of all websites on the Internet [57]. Inherently, all requests for these resources are easily identified as coming from Tor exits, and depending on content, things like unique identifiers and HTTP referrer headers enable the CDN provider to infer the website the content is hosted on.

C.3 Internet Giants

Internet giants like Google, Apple, Facebook, Amazon, Microsoft, and Cloudflare make up a large fraction of the web as we know it. For example, the use of Google Analytics is wide-spread, so is hosting in clouds provided by several of these giants, and Cloudflare with its “cloud network platform” hosts over 13 million domains¹⁷. While some of them may do what is in their power to protect the valuable data they process and retain, they are still subject to many legal frameworks across the world that might not offer the best of protections for, say, access logs pertaining to “anonymous” users of Tor when requested by authorities of nation states. As another example, Cloudflare offers a nice API for their customers to get their access logs with Unix nanosecond precision. The logs are retained for up to seven days¹⁸, giving ample time for legal requests.

16 https://web.archive.org/web/20190227151029/https://www.theguardian.com/world/2013/sep/30/nsa-americans-metadata-year-documents

17 https://web.archive.org/web/20190227165133/https://www.cloudflare.com/

C.4 Access Logs of Web Servers

The vast majority of web servers retain access logs by default. Typically, they provide Unix timestamps with seconds as the resolution (the case for Apache and nginx). Further, the access logs may be shipped to centralised security information and event management (SIEM) systems for analysis, with varying retention times and rigour in storage. For example, it is common to “anonymize” logs by removing parts of the IP-addresses and then retaining them indefinitely, as is the case for Google who removes part of IP addresses in logs after nine months¹⁹.
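To illustrate how retained access logs could serve as a WO with seconds resolution, the sketch below checks whether any logged request falls within a given timeframe. It assumes the timestamp appears in square brackets in the common `dd/Mon/yyyy:HH:MM:SS +zzzz` layout used by the default Apache and nginx log formats. The function name and the sample log lines are hypothetical, and a real attacker would additionally filter on source addresses belonging to Tor exits, which is omitted here.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterable

# Matches the bracketed timestamp field, e.g. [27/Feb/2019:16:51:33 +0000]
TIMESTAMP = re.compile(r"\[([^\]]+)\]")

# Hypothetical example lines (documentation-range IP addresses).
SAMPLE_LOG = [
    '203.0.113.7 - - [27/Feb/2019:16:51:33 +0000] "GET / HTTP/1.1" 200 612',
    '198.51.100.9 - - [27/Feb/2019:16:58:02 +0000] "GET / HTTP/1.1" 200 612',
]


def log_oracle(lines: Iterable[str], start: datetime, timeframe: timedelta) -> bool:
    """Return True if any request was logged in [start, start + timeframe)."""
    end = start + timeframe
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue
        try:
            ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # skip lines whose timestamp uses another layout
        if start <= ts < end:
            return True
    return False
```

Since the log resolution is one second, a `timeframe` shorter than a second cannot be answered more precisely than the enclosing second.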

C.5 Middleboxes

Network middleboxes that observe, analyse, and potentially retain network traffic abound. Especially in more oppressive countries, middleboxes are often used for censorship or dragnet surveillance, e.g., as seen with Blue Coat in Syria [1].

C.6 OCSP Responders

Chung et al. [14] found in a recent study that 95.4% of all certificates support the Online Certificate Status Protocol (OCSP), which allows a client to query the responsible CA in real-time for a certificate’s revocation status via HTTP. As such, the browsed website will be exposed to the CA in question. From a privacy standpoint, this could be solved if the server stapled a recently fetched OCSP response with the served certificate. Unfortunately, only 35% of Alexa’s top-one-million websites use OCSP stapling [14].

Unless an OCSP response is stapled while visiting a website in a default configuration of the Tor Browser, the status of a certificate is checked in real-time using

18 https://web.archive.org/web/20190227165850/https://developers.cloudflare.com/logs/faq/
19 https://web.archive.org/web/20190227170903/https://policies.google.com/technologies/retention

OCSP. As such, any CA that issued a certificate for a website without OCSP stapling could instantiate a WO with an RTT-based resolution. Similarly, any actor that observes most OCSP traffic (which is in plaintext due to HTTP) gets the same capability. To better understand who could instantiate a WO based on OCSP, we performed preliminary traceroute measurements²⁰ on the RIPE Atlas network towards four OCSP responders that are hosted by particularly large CAs: Let’s Encrypt, Sectigo, DigiCert, and GoDaddy. Let’s Encrypt and Sectigo are fronted by a variety of actors (mainly due to CDN caching), while DigiCert is fronted by a single CDN. Requests towards GoDaddy’s OCSP responder always end up in an AS hosted by GoDaddy.

C.7 Tor Exit Relays

Anyone can run a Tor exit relay and have it be used by all Tor users. Obviously, the operator of the exit relay can observe when its relay is used and the destination websites. At the time of writing, the consumed exit bandwidth of the entire Tor network is around 50 Gbit/s. This makes the necessary investment for an attacker that wishes to get a decent chunk of exit bandwidth more a question of stealthily deploying new exit relays than prohibitively large monetary costs.
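As a rough illustration of what a "decent chunk" means, the sketch below treats the attacker's share of consumed exit traffic as a proxy for the probability that a given visit exits through an attacker-controlled relay. This is a simplifying assumption: it ignores exit policies and how path selection actually weights relays. The function names and the 5 Gbit/s figure are hypothetical.

```python
def exit_share(attacker_gbps, total_gbps=50.0):
    """Fraction of consumed exit traffic carried by the attacker's relays."""
    if not 0 <= attacker_gbps <= total_gbps:
        raise ValueError("attacker traffic must lie between 0 and the total")
    return attacker_gbps / total_gbps

def p_observe_any(share, n_visits):
    """Probability that at least one of n independent visits uses an attacker exit."""
    return 1.0 - (1.0 - share) ** n_visits

# Hypothetical attacker carrying 5 Gbit/s of the roughly 50 Gbit/s total.
share = exit_share(5.0)
p_three = p_observe_any(share, 3)
```

Under these assumptions, the share grows linearly with the exit traffic the attacker carries, while the chance of observing at least one of several visits to the same website grows faster than the share itself.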

C.8 Information Leaks

More sophisticated attackers can look for information leaks at the application, network, and operating system levels that allow them to infer that websites have been visited. Application level information leaks are particularly of concern for onion services: any observable state that can be tied to a new visitor is a WO for an onion visit (this is not the case for “regular” websites). Such state can include online status or the number of online users of a service, any observable activity with timestamps, a predictable caching structure, and so on. Similar information leaks can also occur on the network and operating system level [9, 22, 51].
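The application-level case can be sketched as follows, assuming an attacker that periodically polls a publicly visible online-user counter of an onion service. The sampled values, the function names, and the rule that an increase in the counter signals a new visitor are illustrative assumptions; a real service may expose state that is noisier or delayed.

```python
def visit_candidates(samples):
    """Return timestamps at which the observed online-user count increased.

    samples: list of (unix_time_s, online_users) pairs sorted by time.
    """
    return [t for (_, prev), (t, cur) in zip(samples, samples[1:]) if cur > prev]

def leak_oracle(samples, t, resolution_s):
    """True if some increase was observed within resolution_s seconds of t."""
    return any(abs(c - t) <= resolution_s for c in visit_candidates(samples))

# Hypothetical samples polled once per second.
SAMPLES = [(100, 3), (101, 3), (102, 4), (103, 4), (104, 2)]
```

The polling interval bounds the resolution of such a WO: a counter sampled once per second cannot distinguish visits that arrive within the same second.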

20 Every RIPE Atlas probe used its configured DNS resolver(s). In total we requested 2048 WW-probes for one-off measurements.