Mohsen Imani*, Mehrdad Amirabadi, and Matthew Wright · 2018. 9. 3.

2018; 2018 (x):1–20

Mohsen Imani*, Mehrdad Amirabadi, and Matthew Wright

Modified Relay Selection and Circuit Selection for Faster Tor

Abstract: Users of the Tor anonymity system suffer from less-than-ideal performance, in part because circuit building and selection processes are not tuned for speed. In this paper, we examine both the process of selecting among pre-built circuits and the process of selecting the path of relays for use in building new circuits to improve performance while maintaining anonymity. First, we show that having three pre-built circuits available allows the Tor client to identify fast circuits and improves median time to first byte (TTFB) by 15% over congestion-aware routing, the current state-of-the-art method. Second, we propose a new path selection algorithm that includes broad geographic location information together with bandwidth to reduce delays. In Shadow simulations, we find 20% faster median TTFB and 11% faster median total download times over congestion-aware routing for accessing webpage-sized objects. Our security evaluations show that this approach leads to better or equal security against a generic relay-level adversary compared to Tor, but increased vulnerability to targeted attacks. We explore this trade-off and find settings of our system that offer good performance, modestly better security against a generic adversary, and only slightly more vulnerability to a targeted adversary.

Keywords: Tor network, performance, relay selection, circuit selection


1 Introduction

Tor provides anonymity for millions of users around the world by routing their traffic over paths selected from approximately 7,000 volunteer-run relays (footnote 1). Tor effectively hides the user among all the users, so having more users and more traffic enhances anonymity for all [7, 11]. Unfortunately, Tor users often face large delays and long download times, which can discourage users and thereby reduce anonymity. In this paper, we examine two approaches to improve Tor performance and evaluate them in terms of both performance and security.

*Corresponding Author: Mohsen Imani: The University of Texas at Arlington, E-mail: [email protected]
Mehrdad Amirabadi: The University of Texas at Arlington, E-mail: [email protected]
Matthew Wright: Rochester Institute of Technology, E-mail: [email protected]

1 https://metrics.torproject.org/, accessed October 2017

Circuit Selection. The client’s traffic in Tor goes through a three-hop encrypted channel, called a circuit. When the user makes a request, such as for a webpage, Tor attaches the new stream (by opening a SOCKS connection) to a circuit. The Tor client builds circuits preemptively based on the client’s use or immediately if there is no current circuit to handle the stream. Tor currently does not use any performance criteria in selecting a circuit. In this paper, we evaluate using the length of the circuits, their congestion, the Round Trip Time (RTT), or a combination of them in choosing a fast circuit. We also find that the number of available circuits in Tor is often small, between one and three circuits, such that picking the best circuit for performance does not have much effect in practice. As the number of available circuits increases, the chance of finding a fast, high-performance circuit should increase. To this end, for each circuit selection criterion we study, we evaluate the impact of more available circuits in terms of both performance and security.

Relay Selection. For circuit selection to be effective, some of the available circuits must be reasonably high performing. To improve the chances of this, we modify the relay selection mechanism to build short and high-bandwidth circuits. Tor clients select paths in a way that balances traffic load among the relays according to their advertised bandwidths, but they do not take into account the locations of relays relative to the clients, their destinations, or the other relays in the path. Paths can jump around the globe, which is intuitively good for anonymity but measurably bad for performance.

Prior work has examined improving path selection in Tor for better performance, considering factors such as bandwidth [29], congestion [34], latency [28], and location [8].

Wacek et al. performed a comprehensive study of path selection [33], and they found that congestion-aware routing [34] offers the best combination of performance and anonymity among the tested approaches. They also found that approaches that emphasized latency but failed to consider bandwidth had poor performance, and they suggested that an approach that optimized both latency and bandwidth could do better than any of their tested approaches. In this paper, we take on this suggestion and explore designs that address both criteria.

arXiv:1608.07343v3 [cs.CR] 30 Aug 2018

We make the following contributions:

– We define nine circuit selection approaches using the geographical length, circuit delay, congestion, or a combination of these. We evaluate each of the approaches and compare them experimentally.

– In our relay selection approach, combined weighting, we explore the design of a single weighting function that balances bandwidth and geographical inter-node distance. We examine the design issues in our approach and compare it with the state of the art.

– To prevent delays, it is important to build circuits in advance of their use [33]. Since we want to use destination location information to better inform our path-selection strategy, we build circuits in advance, using popular destinations as the end points. We then select from among these circuits based on our findings in circuit selection approaches.

– We show the results of experiments on our approaches in Shadow, following the methodology of Jansen et al. [19], and we examine a range of parameters. We find a number of settings in our proposed methods that offer reasonable anonymity and significant performance improvements over congestion-aware routing, the current state of the art in Tor path selection. In particular, our recommended approach provides a 20% reduction in median time to first byte and an 11% reduction in median time to last byte compared to congestion-aware routing.

– We also measure the security of our approaches using the Gini coefficient and entropy on first-and-last combinations, the rate of path compromise in the presence of relay-level and AS-level adversaries, and the rate of path compromise under four targeted relay-level attacks. We find that both of our approaches provide anonymity in line with Tor at settings that also provide significant performance improvements. Our recommended approach has a slightly better Gini coefficient and entropy than Tor, with slightly fewer compromised paths against our attackers.

2 Related work

Researchers have addressed Tor performance issues in a variety of ways, such as modifying circuit scheduling [31], congestion control [34], traffic splitting [9], and incentives to encourage users to offer their bandwidth [13, 22]. In this section, we first briefly overview Tor’s current path selection mechanism and then discuss prior work on enhancing performance in Tor from the circuit selection and relay selection points of view.

Tor. Tor is a volunteer-based overlay network providing anonymity online. Details are available at http://www.torproject.org/ and in the original design paper [12]. In Tor, the choice of relays is governed by a complex weighting function (footnote 2) that includes various considerations and the bandwidths of the relays. Weighting by bandwidths serves to balance load in the system, as relays have huge variance in advertised bandwidth, with the bottom quintile under 2 Mbps, a median of about 10 Mbps, and a maximum of 1 Gbps as of May 2016 (footnote 3).
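As a rough illustration of bandwidth weighting alone, the sketch below draws a relay with probability proportional to its advertised bandwidth. It is not Tor's actual weighting function, which applies further position-dependent considerations described in dir-spec.txt; the relay names, bandwidth values, and function names are hypothetical.

```python
import random

def selection_probabilities(bandwidths):
    """Map each relay to bandwidth / total bandwidth."""
    total = sum(bandwidths.values())
    return {name: bw / total for name, bw in bandwidths.items()}

def pick_relay(bandwidths, rng=random):
    """Draw one relay with probability proportional to its bandwidth."""
    names = list(bandwidths)
    weights = [bandwidths[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical relays spanning the range quoted above (values in Mbps).
relays = {"relayA": 2.0, "relayB": 10.0, "relayC": 1000.0}
probs = selection_probabilities(relays)
chosen = pick_relay(relays, random.Random(0))
```

With such a skewed bandwidth distribution, almost all of the selection probability falls on the fastest relay, which is why bandwidth weighting balances load but concentrates traffic.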

Circuit Selection. Can et al. [31] propose a circuit scheduling mechanism that gives high priority to interactive traffic over bulk traffic on the same connection. This mechanism has been deployed in Tor relays, but it has no impact on the client. Our circuit selection approaches are designed to improve the performance from the client side and are thus orthogonal to scheduling in the relays.

Wang et al. introduce node latency as a parameter to measure a relay’s congestion [34]. In their approach, congestion-aware routing (CAR), the client calculates congestion delay using both active and opportunistic methods. It then uses the measured latency to avoid congested nodes during path selection and to avoid selecting congested paths. They use both short-term and long-term congestion in their work, where short-term congestion is caused by current traffic levels and long-term congestion is caused by the relay’s bandwidth. Their results show improvement in quality of service and load balancing. In our evaluation of circuit selection methods, we also examine the use of congestion times and compare them with RTTs and circuit lengths.

The current Tor client measures the Circuit Build Time (CBT), i.e., the time to construct the circuit, and uses this to discard slow circuits whose CBT is above a client-specific threshold. Annessi and Schmiedecker [10] propose that Tor should use circuit round-trip times (RTTs) instead of CBTs to eliminate slow circuits. In this method, the circuit RTT is actively measured after the circuit is built, and if it is longer than a timeout, the circuit is discarded from future use. In their study, this provided only a 3% improvement in the time to download the first byte, with mixed anonymity results. Our strategy in this paper is different from both approaches. Rather than examining circuits after their creation to discard them or keep them, we instead try to pick high-performing circuits in the first place.

2 Full details at https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt.
3 http://metrics.torproject.org/

Relay Selection. A number of improvements to Tor path selection have been investigated [8, 28, 29]. Wacek et al. examine Tor path selection in a comprehensive study with experiments running many simultaneous clients [33]. They create a model of the Tor network to evaluate recently published papers modifying path selection and show results for throughput, time to last byte (TTLB), and round-trip time (RTT). They tested Tor; Snader/Borisov [29]; Unweighted Tor, in which Tor relays are selected uniformly at random; Coordinate [28], in which path selection is based on estimated pair-wise latencies; LASTor [8]; and Congestion-Aware [34]. Their investigation shows that path selection algorithms that do not consider bandwidth as a factor in relay selection have poor performance. Congestion-Aware had nearly the best performance in throughput and time-to-first-byte, plus it had anonymity approximately in line with Tor and significantly better than other high-performing algorithms. We thus select it for comparison in our work.

Improving performance can also affect attackers, potentially providing them better attacking opportunities and more accurate measurements. There are several attacks that use latency and throughput information to de-anonymize Tor users [16, 18, 26]. Geddes et al. [17] introduced a new class of attacks, called induced throttling, that exploit performance-enhancing mechanisms to throttle and unthrottle a circuit and identify the user. They evaluated the vulnerability of performance improvements, such as congestion control and traffic admission control, to these attacks, and they found that there are highly effective attacks that can uniquely identify users. While this does not directly affect our approaches, we recognize that there is generally a trade-off between anonymity and performance.

3 Model and Goals

3.1 Network Model

Testing new path and circuit selection strategies on the live Tor network is challenging and could compromise users’ anonymity or harm their performance. We perform our simulations in Shadow [4, 21], a discrete-event simulator that runs the Tor code in a complete, but scaled-down, network. Shadow simulates the underlying network and considers network attributes such as packet loss, upstream and downstream bandwidth, jitter, latency, and network edges. In our performance evaluations, we used a scaled-down model of Tor, which consists of 1100 clients and 220 relays; this scaled-down model was built based on the procedures of Jansen et al. [19] and measurements from the live Tor network (from July 2015). In our security evaluations, we use a larger scaled-down model of 2127 relays with one client at a time.

3.2 Attacker Model

As with prior work in Tor performance [8, 28, 29, 33], our attention is more on performance characteristics than on attacks. We only seek to validate that our approach does not significantly weaken the anonymity provided by Tor currently. We evaluate the security of our proposed mechanisms in terms of both relay-level and network-level adversaries.

Relay-Level Adversary Model. In the relay-level adversary model, we assume that the adversary is running some Tor relays in the network with the goal of getting into the guard and exit positions of some circuits. An adversary in such a position can observe the entry and exit traffic and correlate them to link the clients to their destinations. To evaluate the security of our proposed circuit selection mechanism, we simulate our proposed method, CAR, and vanilla Tor in Shadow and randomly mark one of our guards and one of our exits as malicious relays. We then extract the streams and identify which ones were compromised. We repeat this process 10 times and measure the compromise rates across all 10 repetitions.

To evaluate the security of our proposed relay selection mechanism at the relay level, we first follow the approach of Wacek et al., who use the Gini coefficient and entropy as measures of the diversity of paths taken by each of their studied approaches [33]. We consider a high-bandwidth attacker who adds a modest number of high-bandwidth ORs to the Tor network. Since our path selection algorithm uses distance as well as bandwidth, and therefore picks nearby high-bandwidth ORs more often, this attacker aims to capture a large number of circuits. We also consider four targeted attack strategies in which the attacker targets a specific client, a specific destination, a specific client and destination, or no specific target. In all these strategies, the attacker places his relays in the target’s exact location to minimize distance and maximize the chance of being selected. Our targeted attack scenarios are thus worst cases.
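For reference, the two diversity measures named above can be computed as in the following sketch. It uses the standard textbook definitions of the Gini coefficient and Shannon entropy applied to relay selection probabilities; the exact computation used by Wacek et al. is not given in this text, so treat the function names and inputs as assumptions.

```python
import math

def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n

def entropy_bits(probabilities):
    """Shannon entropy in bits; higher means more diverse selection."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # every relay equally likely
skewed = [0.0, 0.0, 0.0, 1.0]        # one relay always chosen
```

A lower Gini coefficient and a higher entropy both indicate that selections are spread more evenly across relays, which is the sense in which these measures stand in for path diversity.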

Network-Level Adversary Model. The adversary can control some network components like ASes or IXPs. If the entry traffic and exit traffic of an anonymous connection traverse the adversary’s network components, the adversary observes both sides of the traffic and deanonymizes the clients. We evaluate the security of our circuit selection and relay selection mechanisms at the network level. For both, we simulate each of the proposed mechanisms in Shadow and extract the streams, including their paths. To determine the compromised streams, we use the algorithm proposed by Qiu and Gao [27] to infer the AS paths on both the entry side of the circuit (between clients and guards) and the exit side (between exits and servers). Qiu and Gao’s algorithm exploits known paths from BGP tables to improve the inferred paths. In measuring the compromise rates, we consider the possibility of an asymmetric traffic correlation attack that can happen between the data path and the ack path, which is one of the RAPTOR attacks proposed by Sun et al. [30].

3.3 Design Goals

We seek an algorithm that meets the following goals:
1. Interactive use like web browsing should be significantly faster than Tor and prior work.
2. Performance for bulk downloads should not be significantly slowed compared to Tor.
3. Anonymity should be similar to what Tor currently provides against our selected attack models.
4. Usage should be fairly distributed among relays according to their available capacities.
5. Clients should be able to select paths with little computational or other overhead.
6. Circuits should be available to the client for attaching streams to when needed.
7. We should avoid downloading large amounts of additional information from the directory servers.

We emphasize web traffic since delays in interactive use are more harmful to the user experience than delays in bulk downloads. We consider both response time, measured as time to first byte (TTFB), and total download time, measured as time to last byte (TTLB).

Note that we do not seek the optimal latency for circuits. Although having accurate latency information instead of geographic distance could further improve performance, the gains might be marginal given requirements for bandwidth and path diversity. Further, obtaining and distributing accurate pairwise latency information may be expensive due to the necessary measurements and directory server overhead.

4 Circuit Selection

When a Tor client issues a request, the new stream is handled by one of the available circuits. In this section we explain how Tor tries to provide some available circuits for new streams and how it attaches the streams to the circuits. Then we explain how the stream attachment can be improved by increasing the number of circuits and considering performance criteria in circuit selection.

Pre-Built Circuits. As the user browses the Web with Tor, the Tor client opens new circuits so that later streams can be attached to those circuits without delay. Since different exits support different sets of ports, the Tor client aims to keep open two circuits to cover any port that the user has used recently. In practice, one or two circuits are typically available at any given time.

On-Demand Circuits. Sometimes the user’s requested streams are not supported by the currently available circuits, or all available circuits are older than 10 minutes and considered dirty. In this case, Tor builds a circuit for the unhandled stream and attaches the stream to this circuit. These streams experience more delay than streams using pre-built circuits because they must also wait for the circuit build time.

Tor Stream Attachment. When a new stream is created, the Tor client selects the most recently created circuit or creates a new circuit if needed and attaches the new stream to it. Then all communication on that stream, including DNS resolution, goes through the circuit.

4.1 Performance in Circuit Selection

The Tor client does not use performance as a criterion when selecting from available circuits for attaching a stream. Wang et al. [34] propose to use the least congested circuit, but there are several possible performance characteristics to use instead. Also, we know of no study testing the effect of changing the number of circuits on performance-based selection.

To investigate the effect of circuit selection on Tor performance, we evaluate both the number of available circuits for the streams and the way we choose the best circuit among the available circuits. To set the number of circuits, we check once per second that there are at least N circuits to support all recently used ports. If there are fewer than N circuits, then we start building circuits to reach the threshold. We compare vanilla Tor, which typically offers one or two circuits, with making between three and five circuits available. Given some number of circuits, we can then use various methods to select the best one. We compare various combinations of geographic circuit length, congestion, and round trip time (RTT).
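The once-per-second check can be sketched as follows. This is a minimal illustration, not the modified Tor code: it assumes each circuit is represented by a dictionary with a hypothetical `ports` set (the ports its exit supports) and interprets the threshold as N circuits per recently used port.

```python
def circuits_to_build(open_circuits, recent_ports, n_target):
    """Return, per recently used port, how many circuits are still missing.

    open_circuits: list of dicts with a "ports" set (hypothetical layout).
    recent_ports:  ports the user has used recently.
    n_target:      the threshold N from the text.
    """
    needed = {}
    for port in recent_ports:
        supporting = sum(1 for c in open_circuits if port in c["ports"])
        if supporting < n_target:
            needed[port] = n_target - supporting
    return needed

# Example: two open circuits, N = 3, ports 80 and 443 used recently.
open_circuits = [{"ports": {80, 443}}, {"ports": {80}}]
missing = circuits_to_build(open_circuits, {80, 443}, n_target=3)
```

A client loop would call this once per second and launch the reported number of circuits; a single new circuit whose exit supports several ports could satisfy more than one entry at once, which this sketch does not account for.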

Metrics. To find the total geographic circuit length (or simply length) $L$ between the client and destination, we compute:

$$L = D_{CG} + D_{GM} + D_{ME} + D_{Exit} \tag{1}$$

$D_{CG}$, $D_{GM}$, $D_{ME}$, and $D_{Exit}$ are shown in Figure 1a for when the destination IP address is known and Figure 1b for when the IP address is not yet known.
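Equation 1 can be illustrated with the sketch below for the case where the destination location is known. It assumes each hop is given as a (latitude, longitude) pair in degrees and that distances are great-circle distances computed with the haversine formula; the text does not state which distance measure is used, so both choices are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def circuit_length_km(client, guard, middle, exit_, destination):
    """L = D_CG + D_GM + D_ME + D_Exit for a known destination."""
    hops = [client, guard, middle, exit_, destination]
    return sum(great_circle_km(p, q) for p, q in zip(hops, hops[1:]))

# A quarter of the equator, used here only as a sanity check.
quarter = great_circle_km((0.0, 0.0), (0.0, 90.0))
```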

We use the opportunistic circuit measurements and latency model proposed by Wang et al. [34] to measure the circuit round-trip times and congestion times. Congestion time $T_c$ is measured as:

$$T_c = RTT - RTT_{min} \tag{2}$$

where $RTT$ is the round-trip time and $RTT_{min}$ is the minimum RTT observed over that circuit. Wang et al. [34] showed that five measurements can effectively identify congested circuits. We thus measure and store the mean of the last five $T_c$ measurements as the congestion time of the circuit.
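A per-circuit tracker for Equation 2 might look like the sketch below. The class and method names are hypothetical, and one detail is an assumption: earlier congestion samples are not recomputed when a new minimum RTT is observed, since the text does not say how that case is handled.

```python
from collections import deque

class CongestionTracker:
    """Tracks RTT_min and the mean of the last few T_c samples."""

    def __init__(self, window=5):
        self.rtt_min = None
        self.samples = deque(maxlen=window)  # keeps only the newest samples

    def record(self, rtt):
        """Add one RTT measurement (seconds) for this circuit."""
        if self.rtt_min is None or rtt < self.rtt_min:
            self.rtt_min = rtt
        self.samples.append(rtt - self.rtt_min)  # T_c = RTT - RTT_min

    def congestion_time(self):
        """Mean of the stored T_c samples, or None if nothing was measured."""
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)

tracker = CongestionTracker()
for rtt in [0.30, 0.20, 0.25, 0.40, 0.20, 0.30]:
    tracker.record(rtt)
```

In this example the first sample falls out of the five-measurement window, so the congestion time is the mean of the last five values only.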

4.2 Attaching Streams to Circuits

We consider nine different methods for handling streams using circuit length, congestion, and RTT.

1. Congestion Only: Pick by lowest congestion time.
2. Length Only: Pick the shortest circuit.
3. RTT Only: Pick the circuit with the lowest RTT.
4. Congestion Then Length: Select the two lowest congestion times and pick the shorter one.
5. RTT Then Length: Select the two lowest RTTs and pick the shorter one.
6. Length Then Congestion: Select the two shortest circuits and pick the lower congestion time.
7. Length Then RTT: Select the two shortest circuits and pick the lower RTT.
8. RTT Then Congestion: Select the two circuits with lowest RTT and pick the lower congestion time.
9. Congestion Then RTT: Select the two circuits with lowest congestion time and pick the lower RTT.
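All nine methods share one shape: rank by a first metric, optionally keep the best two, and break the tie with a second metric. The sketch below captures that shape under the assumption that each circuit is a dictionary with hypothetical `rtt`, `congestion`, and `length` fields and that the candidate list is non-empty; the example values are invented.

```python
def select_circuit(circuits, first_key, second_key=None, shortlist=2):
    """Pick a circuit by one metric, or by two metrics in sequence."""
    ranked = sorted(circuits, key=lambda c: c[first_key])
    if second_key is None:
        return ranked[0]                      # the "X Only" methods
    # The "X Then Y" methods: best two by X, then the lower Y.
    return min(ranked[:shortlist], key=lambda c: c[second_key])

circuits = [
    {"id": "c1", "rtt": 0.30, "congestion": 0.02, "length": 9000},
    {"id": "c2", "rtt": 0.22, "congestion": 0.05, "length": 15000},
    {"id": "c3", "rtt": 0.25, "congestion": 0.01, "length": 12000},
]

rtt_only = select_circuit(circuits, "rtt")
rtt_then_length = select_circuit(circuits, "rtt", "length")
length_then_congestion = select_circuit(circuits, "length", "congestion")
```

Here RTT Only picks c2, while RTT Then Length shortlists c2 and c3 and picks c3 because it is shorter.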

Since these circuit selection mechanisms are deterministic, given a set of candidate circuits, only one circuit from the set will be used. These strategies will exploit the best circuit for the full 10-minute window in which the circuit can be used. Since this means that the other circuits will go unused, we have the OP close any circuits that go unused for five minutes after their creation, leading to new circuits being opened. By itself, this might improve performance, as inferior circuits are closed in favor of untested circuits that may be better (or worse).

5 Circuit Selection Performance

We now evaluate the nine methods of selecting circuits for stream attachment and compare them with Tor and CAR.

5.1 Network Configuration

We largely follow the experimental procedures suggested by Jansen et al. [19] and describe them here in brief. Shadow runs actual Tor code for accurate modeling; we used Tor version 0.2.5.12, modifying it as necessary to implement our methods and CAR. To generate a realistic Tor network topology, Shadow comes with topology generation tools that model a private Tor network based on a validated research study [19]. We used these tools and data from the Tor metrics portal to generate our private Tor network. Our Tor network includes 1100 clients, 220 Tor relays (52 exit relays, including exit-guard relays, and 49 guard relays), three directory authorities, and 220 HTTP destination servers.

Fig. 1. Distances used to compute circuit length: (a) when the dest. IP is known; (b) when the dest. IP is unknown. [Diagram of the entry, middle, and exit relays annotated with the distances D_CG, D_GM, D_ME, and D_Exit.]

Fig. 2. Circuit Selection: TTLB and TTFB for web clients. [Two panels: median of TTFB for web clients and median of TTLB for web clients, in seconds, for each of the nine selection methods; legend: 3 circuits, 4 circuits, 5 circuits, No-change, CAR, vanilla.]

Shadow uses an underlying topology that models the Internet. The default topology shipped with Shadow is very small, consisting of only 183 vertices and 17,000 edges, and is not a good representation of the Internet. For all the simulations in this paper we used the Internet topology used by Jansen et al. [20]. This topology is built using techniques from recent research in modeling Tor topologies [19, 23], traceroute data from CAIDA [2], and data from the Tor Metrics Portal [32] and Alexa [1], and it includes 699,029 vertices and 1,338,590 edges. In our simulation, we tried different ratios of clients to relays, i.e., different congestion levels, and different average packet loss rates in the Internet topology and compared our results with Torperf data [32]. We found that a clients-to-relays ratio of 5:1 with 0.0025% packet loss provides TTFB and TTLB results comparable to Torperf data.

Our clients run Tor code in client-only mode and are distributed around the world in line with Tor usage statistics. We have two types of clients in our experiments: web clients and bulk clients. The 900 web clients download 320 KiB of data and simulate web surfing behavior by waiting between 1 and 20 seconds, chosen uniformly at random, before starting the next download. The 100 bulk clients download 5 MiB of data without pausing between the end of a download and the start of the next one. This methodology is proposed in [19].

5.2 CAR: Congestion-Aware Routing

To compare our methods, we also simulated CAR, the circuit selection technique of Wang et al. [34]. They proposed opportunistic and active probing techniques to measure congestion times, which are derived from RTTs as in Equation 2, and use these measurements to mitigate congestion using both an instant response for temporary congestion and a long-term response for low-bandwidth conditions. In our simulation, we follow the method of Wacek et al. [33], who also simulated CAR and ignored the long-term response due to its small impact on performance.

When attaching streams to circuits in CAR, we randomly select three circuits from the circuit list and pick the one that has the smallest mean congestion time from the five most recent measurements. If the mean of the last five congestion times is more than 0.5 seconds for a circuit, we stop using the circuit for new streams.
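The attachment rule above can be sketched as follows. The circuit representation (a dictionary with a hypothetical `mean_congestion` field holding the mean of the last five congestion times, in seconds) is an assumption, as is the order of operations: this sketch drops circuits over the 0.5-second cutoff before sampling, which the text does not specify.

```python
import random

CONGESTION_CUTOFF_S = 0.5  # circuits above this are not used for new streams

def car_attach(circuits, rng=random, sample_size=3):
    """Sample up to three usable circuits and return the least congested."""
    usable = [c for c in circuits
              if c["mean_congestion"] <= CONGESTION_CUTOFF_S]
    if not usable:
        return None  # caller would need to build a new circuit
    candidates = rng.sample(usable, min(sample_size, len(usable)))
    return min(candidates, key=lambda c: c["mean_congestion"])

circuits = [
    {"id": "c1", "mean_congestion": 0.10},
    {"id": "c2", "mean_congestion": 0.60},  # over the cutoff
    {"id": "c3", "mean_congestion": 0.30},
]
picked = car_attach(circuits, random.Random(1))
```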


5.3 Performance Results

Figure 2 shows the median time-to-first-byte (TTFB) and time-to-last-byte (TTLB) for web clients. vanilla represents unmodified Tor circuit selection, and No change represents the case in which we do not modify the number of circuits from Tor, which typically has one or two circuits available at a time.

As shown in Figure 2, RTT is the best criterion for choosing the circuit, with RTT Only as the best method overall. RTT Only has 15% lower TTFB than CAR (22% lower than vanilla) for three circuits and 22% lower TTFB than CAR (27% lower than vanilla) for five circuits. RTT Only also has 9% lower TTLB than CAR (13.8% lower than vanilla) for three circuits and 12% lower TTLB than CAR (16% lower than vanilla) for five circuits. We speculate that RTT is the best criterion because it effectively captures both propagation delays and congestion time (queuing delays and transmission delays).

As expected, CAR is better than vanilla, and Congestion Only with no change in the number of circuits performs the same as CAR, as both use the same criteria. Length turns out to be less effective compared to RTT or congestion times, particularly when the number of circuits is small. We note that Length Then RTT performs fairly well for three or more circuits. Length may be suitable for gauging broad performance information, such as comparing a circuit with multiple intercontinental hops to one with no such hops, but poor at predicting the best circuit otherwise.

The TTFB and TTLB for RTT Then Congestion are slightly better than for Congestion Then RTT, which indicates that RTT can narrow down candidate circuits better than congestion times. Results of both RTT Then Congestion and Congestion Then RTT are worse than RTT Only and show that mixing congestion time with RTTs will not provide better performance than using RTT by itself.

5.4 Circuit Creation Analysis

In our circuit selection strategies we build more circuits than Tor normally does, so it is important to understand the load this imposes on the network. Unfortunately, Shadow does not provide results regarding the load on nodes. To estimate changes in load, we compare the strategies based on the number of created circuits and used circuits. To see how many circuits our clients build, we simulate the circuit selection strategies in Shadow for one hour of simulated time, which leads to about 40 minutes of activity after 20 minutes of initialization. We extract the number of general-purpose circuits built by our web clients. Figure 3 shows the CDF of created general-purpose circuits and the CDF of used circuits, i.e., the circuits actually being used. It compares web clients in vanilla, CAR, RTT Only with the same number of circuits as Tor, and RTT Only with 3-5 circuits. The median number of created circuits in RTT Only is 17, 23, and 29 as we increase the number of circuits from 3 to 5. RTT Only builds this many circuits because it proactively checks every second that there are N circuits available for each recently used port and kills unused circuits after five minutes. Figure 3b shows how many of these created circuits have been used in transferring data. The median for used circuits in vanilla, CAR, and RTT Only with no change in the number of circuits is around four circuits, which means that they use all the circuits created and attach some stream to them. The median in RTT Only is 8, 10, and 13 circuits, respectively.

6 Security Analysis

In this section we examine the security of these circuit selection strategies, considering both relay-level and network-level adversaries. Our performance results showed that RTT Only outperforms all the other circuit selection strategies and CAR. Therefore, in this section, we focus on the security analysis of RTT Only.

6.1 Relay-Level Adversary

In the relay-level adversary model, we assume that the adversary runs both guard and exit relays in the hope that his relays simultaneously occupy the guard and exit positions in some circuits. If the adversary holds both the guard and exit positions on a circuit, he can apply a traffic correlation attack and link the client to her destinations. These circuits and the streams attached to them are called compromised circuits and compromised streams, respectively. To analyze the security of the RTT Only strategy, we need to have access to RTTs (which include propagation delays, queuing delays, and transmission delays), which means we need to simulate a whole network. To do this, we again use Shadow to simulate the Tor network, and we use the same Tor network configuration as our performance evaluations in Section 5.1, which consists of 52 exit relays, including exit-guard relays, and 49 guard relays.

Fig. 3. CDF of the number of circuits created (a) and used (b) for web clients. [Two panels: (a) number of circuits per client and (b) number of used circuits per client, each against cumulative probability; legend: 3 circuits, 4 circuits, 5 circuits, CAR, No-change, vanilla.]

Fig. 4. Relay adversary: the distribution of compromise rates. [Fraction of compromised streams for each circuit strategy: 3circs, 4circs, 5circs, CAR, No-ch, vanilla.]

For the relay-level adversary, we randomly mark 10% of our guard bandwidth and 10% of our exit bandwidth as malicious guards and malicious exit relays in the network, then we simulate CAR, vanilla, and using three to five circuits with RTT Only. We run 10 simulations, where the malicious guards and exits change in each run. Ten simulations for each case is reasonable considering that we have 52 exit relays and 49 guards, and that each simulation takes 11 hours. Figure 4 shows the distribution of stream compromise rates for clients. The median rate of compromised streams is almost the same for vanilla, CAR, and RTT Only with no change in the number of circuits because the clients build almost the same number of circuits. As the number of circuits increases, the median rate of compromised streams increases due to the increase in circuits created by the clients. When the number of circuits increases, the chance of creating a circuit that has malicious relays in both its guard and exit positions increases. As we see, RTT Only with 5 circuits has the highest rate of compromised streams.
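The per-client compromise rate described above can be computed from extracted streams as in this sketch. It assumes each stream is reduced to a hypothetical (guard, exit) pair of relay identifiers and that the sets of malicious guards and exits are given; how relays are sampled to reach 10% of bandwidth is not shown.

```python
from statistics import median

def compromise_rate(streams, bad_guards, bad_exits):
    """Fraction of streams whose guard AND exit are both malicious."""
    if not streams:
        return 0.0
    hits = sum(1 for guard, exit_ in streams
               if guard in bad_guards and exit_ in bad_exits)
    return hits / len(streams)

def median_client_rate(streams_by_client, bad_guards, bad_exits):
    """Median of the per-client compromise rates."""
    return median(compromise_rate(s, bad_guards, bad_exits)
                  for s in streams_by_client.values())

# Invented example data: three clients, one malicious guard and exit.
streams_by_client = {
    "clientA": [("g1", "e1"), ("g1", "e2")],
    "clientB": [("g2", "e1")],
    "clientC": [("g1", "e1")],
}
rate = median_client_rate(streams_by_client, {"g1"}, {"e1"})
```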

Fig. 5. Network adversary: CDF of compromise rates. [Plot omitted. x-axis: fraction of streams with compromised guard and exit; y-axis: CDF; curves: vanilla, 3 circuits, 4 circuits, 5 circuits, CAR, No-change.]

6.2 Network Level Adversary

In the network-level adversary model, we assume that the adversary controls an Autonomous System (AS). If the entry traffic (traffic between the client and guard) and exit traffic (traffic between the exit and server) traverse a common AS, that AS can apply a traffic correlation attack and link the client to her destinations.

To analyze the security of circuit selection strategies, we used our Shadow simulation results from the relay-level adversary for CAR, vanilla, and each number of circuits in RTT Only. Because we did not add any relays to the network in evaluating the relay-level adversary model and only marked existing relays as malicious, we can re-use those results in this section. For each circuit selection approach, we extract all the streams generated by clients. The simulations generated approximately 730,000 streams for each circuit selection approach, or about 700 streams per client. For each stream, we used the algorithm proposed by Qiu and Gao [27] to infer the AS paths between clients and guards and between exits and servers. This algorithm exploits known paths from BGP tables to improve the inferred paths. In measuring the compromise rates, we consider the possibility of an asymmetric traffic correlation attack that can happen between the data path and ack path, which is one of the RAPTOR attacks proposed by Sun et al. [30].
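The compromise condition for the network-level adversary, including the asymmetric case, can be sketched as a set intersection. This is an illustrative reading of the text, not the authors' code: the function name `as_compromised` and the AS numbers are hypothetical, and the AS paths are assumed to have already been inferred (for example with the Qiu and Gao algorithm) for both directions of each segment.

```python
from typing import Iterable


def as_compromised(
    entry_forward: Iterable[int],
    entry_reverse: Iterable[int],
    exit_forward: Iterable[int],
    exit_reverse: Iterable[int],
) -> bool:
    """Return True if some AS can observe both the entry side and the exit side.

    Forward and reverse AS paths may differ, so an AS that sees only the data
    path on one side and only the ack path on the other still counts
    (the asymmetric correlation case).
    """
    entry_side = set(entry_forward) | set(entry_reverse)
    exit_side = set(exit_forward) | set(exit_reverse)
    return bool(entry_side & exit_side)


# Hypothetical AS numbers: AS 4 appears only on the two reverse paths.
asymmetric_hit = as_compromised([1, 2, 3], [3, 4, 1], [5, 6], [6, 4, 5])
no_overlap = as_compromised([1, 2], [2, 1], [5, 6], [6, 5])
```

Taking the union of both directions before intersecting is what makes the check cover asymmetric routing; checking only the forward paths would miss the first example.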

Figure 5 shows the cumulative distribution of compromise rates for each strategy. As we see, when we increase the number of circuits from 3 to 5, the median compromise rate increases from 27.2% to 28.1%, which is 0.7% to 4% worse than CAR. CAR performs 1% better than vanilla. These results show that using RTT and increasing the number of circuits, even up to five, has a very limited effect on security against the network-level adversary.

7 Relay Selection

In this section, we describe a method for path selection in which we assign weights to relays based on a combination of bandwidth and geographical distances. This approach extends the idea of Tor path selection, which uses weights based on various factors in probabilistic relay selection. The goal of the combined weighting approach is to build circuits that still have high bandwidth relays, ensuring load balancing and good throughput, but also relatively shorter paths between the client and her destinations.

7.1 Weight Function

In a large and growing network like Tor, which consists of around 7000 nodes as of August 2016, examining all possible paths to find ones with these characteristics would be expensive. Clustering relays into geographic areas, as in LASTor [8], can lead to uneven distribution of bandwidth between clusters. Instead, we approach the problem much like Tor’s current algorithm by selecting one relay at a time, starting with the exit node, then the entry node, and finally the middle node. This greedy approach may miss the optimal path, for some definition of optimal, but our design aims to select from a wide range of paths with good performance and to avoid poorly performing paths. A broader selection of paths should help us maintain anonymity.

We calculate the weight of each relay using the following function:

$$w = \alpha \times w_B + (1 - \alpha) \times w_D$$

where $w_B$ is a measure of the relay’s bandwidth and $w_D$ is a measure of distance. In this function, α is a parameter that we can use to tune the share of bandwidth and distance in the weights. As α increases, the importance of bandwidth to the weight increases, and as α decreases, the importance of distance to the weight increases. $w_B$ and $w_D$ for relay $i$ are computed as follows:

$$w_{B_i} = \frac{B_i}{B_{max}}, \qquad w_{D_i} = 1 - \frac{D_i}{D_{max}}$$

Here, $B_i$ is the relay’s weighted bandwidth, and $B_{max}$ is the maximum weighted bandwidth among all relays. Tor assigns weights for each position in the circuit, and these weights bias the relay selection for circuits to distribute more load to higher-bandwidth relays. $D_i$ is a distance that is computed differently depending on the selected relay’s role in the circuit as exit, middle, or entry. The maximum value of $D_i$ over all relays is $D_{max}$. We subtract the ratio from 1 so as to weight short distances more than long ones. Note that both $w_B$ and $w_D$ will be between 0 and 1.

Figure 6 shows how we compute the distance for relays for each position in the circuit. We seek to minimize total distance from the client to the destination by minimizing the intermediate pairs of distances that are added by each relay in the sequence used by Tor: exit, entry, middle. Intuitively, selecting one of these relays with a large distance to its neighbors extends the path away from a straight line between client and destination, which would theoretically be the ideal path.

Since we use geographic locations, we compute D using the great-circle distance between two points on a sphere from their longitudes and latitudes. In choosing the circuit’s exit node, we compute D for all relays as follows:

$$D_{exit} = (1 - \lambda) \times D_{\text{client-exit}} + \lambda \times D_{\text{exit-dest}}$$

For the circuit’s entry node:

$$D_{entry} = \lambda \times D_{\text{client-entry}} + (1 - \lambda) \times D_{\text{entry-exit}}$$

And for the circuit’s middle node:

$$D_{middle} = D_{\text{entry-middle}} + D_{\text{middle-exit}}$$

$D_{max}$ in $w_D$ is the maximum computed D among the set of relays for each position (exit, entry, or middle). The use of $D_{max}$ and $B_{max}$ ensures that $w_D$ and $w_B$ both range between 0 and 1 for more straightforward calculations.
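To make the weight function concrete, the following sketch computes the combined weight w for candidate exit relays. It is an illustration under stated assumptions, not the authors' implementation: the names `great_circle_km` and `exit_weights`, the relay tuples, and the coordinates are hypothetical, the haversine formula with a mean Earth radius of 6371 km stands in for the great-circle computation, and the handling of the degenerate case $D_{max} = 0$ is our own choice.

```python
import math
from typing import Dict, List, Tuple

Location = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(a: Location, b: Location) -> float:
    """Great-circle distance between two points, via the haversine formula."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def exit_weights(
    relays: List[Tuple[str, float, Location]],
    client: Location,
    dest: Location,
    alpha: float,
    lam: float,
) -> Dict[str, float]:
    """Combined weight w = alpha * w_B + (1 - alpha) * w_D for each exit candidate.

    relays holds (name, weighted_bandwidth, location) tuples.
    """
    # D_exit = (1 - lambda) * D_client-exit + lambda * D_exit-dest
    dist = {
        name: (1 - lam) * great_circle_km(client, loc)
        + lam * great_circle_km(loc, dest)
        for name, _, loc in relays
    }
    b_max = max(bw for _, bw, _ in relays)
    d_max = max(dist.values())
    weights = {}
    for name, bw, _ in relays:
        w_b = bw / b_max
        # Assumption: if every candidate has distance 0, treat w_D as 1.
        w_d = 1 - dist[name] / d_max if d_max > 0 else 1.0
        weights[name] = alpha * w_b + (1 - alpha) * w_d
    return weights


# Hypothetical scenario: one relay at the client's location, one far away.
client_loc, dest_loc = (40.0, -100.0), (50.0, 10.0)
candidates = [("near", 50.0, (40.0, -100.0)), ("far", 100.0, (-30.0, 150.0))]
bandwidth_only = exit_weights(candidates, client_loc, dest_loc, alpha=1.0, lam=0.5)
distance_only = exit_weights(candidates, client_loc, dest_loc, alpha=0.0, lam=0.5)
```

With α = 1.0 the weights reduce to the normalized bandwidths, and with α = 0.0 the farthest candidate receives weight 0. The resulting weights could then be passed to a weighted sampler such as `random.choices` for probabilistic selection; the entry and middle positions would follow the same pattern with their own distance formulas.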

λ is a tuning parameter that enables us to change the share of the distance between different nodes on the path. As shown in Figure 7, as λ goes up, the entry nodes and exit nodes move toward the source and destination, respectively, and a large portion of the path between the source and destination is covered by inter-relay connections. In this case, the selected guards are close to the clients, which can decrease the threat of network-level adversaries. In particular, it causes the AS paths between clients and guards to be shorter and involve fewer ASes, thereby decreasing the chance that a common AS appears on both sides of the traffic, i.e. between the client and guard and between the exit and server. We will evaluate the effect of nearby guards in Section 9.2 and Appendix A. As λ decreases, the path’s inter-relay portion shrinks.

Fig. 6. Distance for relays for each position. [Diagram omitted. Panels: (a) Exit, showing D_client-exit and D_exit-dest; (b) Entry, showing D_client-entry and D_entry-exit; (c) Middle, showing D_entry-middle and D_middle-exit.]

Fig. 7. The effect of λ. [Diagram omitted. Panels: (a) large value of λ, (b) small value of λ; each shows the entry, middle, and exit relays.]

Fig. 8. Average added distance over the shortest path for different values of λ. [Plot omitted. x-axis: λ from 0.0 to 0.9; y-axis: dev_λ from 0.00 to 0.10.]

To evaluate how close the short paths selected by our algorithm are to optimal, we used a scaled-down Tor network with 147 exit nodes, 700 middle nodes, and 170 entry nodes, a user located in the central US, and 100 destinations from the Alexa Top 100 websites [1]. We found short paths between the user and all destinations with our method for different values of λ. To measure how far the paths found by our method are from the actual shortest path to each of these 100 destinations (d), we used the mean absolute percentage deviation (MAPD):

$$dev_\lambda = \frac{1}{100} \sum_{d=1}^{100} \frac{|L_{\lambda d} - L_d|}{L_d}$$

where $L_{\lambda d}$ is the length of the shortest path to destination d found by our method for a given value of λ, and $L_d$ is the length of the actual shortest path to destination d. Figure 8 shows the average deviations $dev_\lambda$ for different values of λ, and we see that λ = 0.5 has the lowest deviation, at just 0.19% longer on average. This indicates that our greedy algorithm produces short paths close to the optimal ones.
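The MAPD computation above is short enough to sketch directly. The function name `mapd` and the path lengths in the example are hypothetical; only the formula comes from the text.

```python
from typing import Sequence


def mapd(found_lengths: Sequence[float], shortest_lengths: Sequence[float]) -> float:
    """Mean absolute percentage deviation between found and shortest path lengths.

    Returns a fraction (0.05 means 5%), averaged over all destinations.
    """
    if not found_lengths or len(found_lengths) != len(shortest_lengths):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(
        abs(found - shortest) / shortest
        for found, shortest in zip(found_lengths, shortest_lengths)
    )
    return total / len(found_lengths)


# Hypothetical lengths in km for two destinations: one path is 10% longer
# than the optimum, the other matches it exactly.
example_dev = mapd([110.0, 100.0], [100.0, 100.0])
```

In the evaluation described above, the sequences would each hold 100 entries, one per destination, and the computation would be repeated for every tested value of λ.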

In Appendix A, we use our relay selection approach to implement the nearby guard proposal, a defense mechanism proposed by Sun et al. [30], on TorPS, the Tor path simulator [23].

7.2 Preemptively built circuits

When computing distances, we need to know the location of the final destination. The destination address can be either an IP address or a DNS hostname. If the address is an IP address, we can find the destination’s location by using IP geolocation databases (we use Maxmind [3] in this paper) and find the shortest path to the destination. Most of the time, however, the addresses are DNS hostnames. To find the location, we first need to perform DNS resolution to get the IP address. In the DNS resolution process, the lookup returns the content provider or replica server closest to the party performing it, which in Tor’s case means the one closest to the exit node.

In Tor, however, the client saves time by having a number of circuits already built and available for new connections. Building a new circuit takes time for numerous protocol messages and public-key cryptographic operations. Wacek et al. showed that LASTor, which builds new circuits once the destination location is discovered, suffers from significant added delay as a result [33].

Fig. 9. The location of the existing servers in Alexa Top 1000 websites. Red stars show the cluster centroids.

To solve this issue, we build circuits in advance that shorten the path between the source and the most popular destinations online. To identify popular destinations, we use the Alexa Top 1000 websites [1]. Because some of these sites use CDNs or replica servers, we visited them from different places in the world. In particular, we selected 13 PlanetLab nodes based on Tor users’ statistics [6]. From each node, we visited all of the 1000 sites, got the addresses of all fetchable elements in their front pages, and resolved their addresses if needed. For example, from our PlanetLab node on the US west coast, we got 46,306 hostname addresses, leading to 2894 unique IP addresses from 346 unique locations. We clustered the obtained locations into four clusters using the k-means algorithm (we got better performance with four clusters than with three or five), and we use the centroids of these clusters as target destinations. Figure 9 shows all obtained locations and our cluster centroids. From these four target destinations, we mark the one closest to the client as the default destination, i.e. when we have no other information, we assume that the user is more likely to visit sites located closer to her. We have the client build circuits in advance of their use with short paths to these four target destinations. We check our circuit list once per second to ensure that we have at least one circuit to each of these destinations, so that a new connection can be handled as quickly as possible.
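The clustering and default-destination steps can be sketched with a plain Lloyd's k-means. This is a simplified illustration, not the procedure used in the paper: the names `kmeans` and `default_destination` are hypothetical, the (latitude, longitude) pairs are treated as planar coordinates with Euclidean distance (an assumption that ignores the Earth's curvature and longitude wrap-around), the initial centroids are supplied by the caller, and the example uses two clusters instead of four to stay small.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude), treated as planar here


def kmeans(points: Sequence[Point], initial: Sequence[Point], iterations: int = 20) -> List[Point]:
    """Lloyd's algorithm: assign each point to its nearest centroid, then re-average."""
    centroids = list(initial)
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                lat = sum(m[0] for m in members) / len(members)
                lon = sum(m[1] for m in members) / len(members)
                updated.append((lat, lon))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return centroids


def default_destination(client: Point, centroids: Sequence[Point]) -> Point:
    """Pick the target destination (cluster centroid) closest to the client."""
    return min(centroids, key=lambda c: math.dist(client, c))


# Hypothetical server locations forming two obvious groups.
servers = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
targets = kmeans(servers, initial=[(0.0, 0.0), (10.0, 10.0)])
default = default_destination((1.0, 1.0), targets)
```

A client would then keep at least one pre-built circuit per target destination and fall back to the default when nothing is known about the requested site. A production version would more likely use great-circle distances for both steps.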

7.3 Guard Selection

In Tor, each client selects and uses one guard consistently for a period of nine to ten months. This means that when selecting the path, the guard relay is already selected before the exit. We thus cannot use the exit’s location to help pick the entry and must modify our algorithm. In selecting the guard, instead of using the exit node’s location, we use the closest of the four target destinations to the client to compute $D_{entry}$, which becomes:

$$D_{entry} = \lambda \times D_{\text{client-entry}} + (1 - \lambda) \times D_{\text{entry-target}}$$

This can reveal some information about the client’s location, e.g. through fingerprinting attacks that identify the guard from the exit [18]. Since we only have four popular destinations, however, the anonymity sets for clients’ locations will be quite large. We evaluate the security of our design in Section 9. Further performance improvement can be achieved by using $D_{\text{guard-exit}}$ instead of $D_{\text{client-exit}}$ in computing $D_{exit}$, because we have already selected the guard relay and know its location. This helps us avoid long circuits when the guard is relatively far from the client.

During this research, we found a bug in the Tor source code that was causing Tor clients to choose guards from all the available relays, not only the ones with the Guard flag. This harmed both anonymity and performance. The bug was reported to the Tor Project (see footnote 4) and was fixed in Tor version 0.2.7.6. We use the corrected code in all of our simulations.

7.4 Attaching streams to the circuits

In Section 4.2, we introduced and evaluated different strategies of attaching streams to the circuits, and we found that using RTT Only offers the best performance. Our relay selection mechanism tries to find short paths between the client and the destination. Therefore, we use RTT then length to pick the fast and short circuits in the rest of the work.

An alert reader may wonder why we do not use RTT to construct circuits given its strong performance in selecting completed circuits. We note that RTT is not available before constructing the circuit, and so it cannot be used when picking relays. Sherr et al. used estimated latencies for their relay selection algorithm [28], but Wacek et al. found that CAR was more effective than this technique [33]. In our approach, we seek to build more circuits, most of which should have reasonable performance, and then use RTT measurements to select the current best circuit from this set. This provides the benefits of using RTT with less risk of not having any high-performing circuits.
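A stream-attachment rule in the spirit of "RTT then length" might look like the sketch below. The precise definition belongs to Section 4.2 and is not reproduced here, so this is only one plausible reading: circuits are ordered by measured RTT, and path length breaks ties. The `Circuit` class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Circuit:
    name: str
    rtt_ms: float  # most recent round-trip time measurement
    length_km: float  # summed geographic length of the circuit's hops


def pick_circuit(circuits: Sequence[Circuit]) -> Circuit:
    """Choose the pre-built circuit with the lowest RTT, preferring shorter paths on ties."""
    if not circuits:
        raise ValueError("no pre-built circuits available")
    return min(circuits, key=lambda c: (c.rtt_ms, c.length_km))


# Hypothetical set of three pre-built circuits.
available = [
    Circuit("a", rtt_ms=120.0, length_km=9000.0),
    Circuit("b", rtt_ms=80.0, length_km=12000.0),
    Circuit("c", rtt_ms=80.0, length_km=7000.0),
]
chosen = pick_circuit(available)
```

Because RTT is only known for circuits that already exist, this selection runs over the pre-built set rather than during relay selection, which is the division of labor the paragraph above describes.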

4 https://trac.torproject.org/projects/tor/ticket/17772



Fig. 10. Median TTFB and TTLB for web clients. [Plot omitted. Panels: (a) TTFB, (b) TTLB; x-axis: α from 0.0 to 1.0; y-axis: seconds; curves: λ = 0.97, CAR, vanilla.]

8 Performance Evaluation

In this section, we evaluate the performance of our proposed relay selection method compared with Tor and congestion-aware routing (CAR), the current state of the art [33]. In all the experiments in this section, we use Shadow with the same network configuration as used in Section 5. We evaluate the performance of our method as α and λ vary.

The effect of α. To evaluate different values of α, we set λ = 0.97 and vary α from 0.0 to 1.0. We chose a high λ to stretch circuits between the clients and destinations. Figure 10 shows the median of web clients’ TTLB and TTFB with respect to α, where we plot only the median of the results for readability. Changing α from 0.0 to 1.0 yields 7% to 24% improvement in web clients’ TTFB compared to CAR, and 2% to 12% improvement in web clients’ TTLB compared to CAR. Relay bandwidth is thus the more important factor in improving performance, which matches findings from Wacek et al. [33]. We will explore the trade-offs in security in Section 9.

The effect of λ. Parameter λ controls the elasticity of the path. As λ increases, the circuit stretches out, with guards moving toward the clients and exits moving toward the destinations. To evaluate the effect of λ, we set α = 0.0. We find that performance changes by less than 2% as λ varies, and that it is best for λ between 0.4 and 0.5.

9 Security analysis

We now examine the impact of our path selection strategies on anonymity using three models. First, we study broad system-wide measures of anonymity. We then examine path compromise rates in the presence of an AS adversary. Finally, we study a set of targeted attacks in the relay-level adversary model.

Strategy    Gini Coef.   Entropy
vanilla     0.725        9.702
α = 1.0     0.724        9.713
α = 0.9     0.590        10.326
α = 0.8     0.490        10.609
α = 0.7     0.444        10.735
α = 0.6     0.401        10.802
α = 0.5     0.370        10.872
α = 0.15    0.335        10.933
α = 0.1     0.338        10.944
α = 0.0     0.341        10.945

Table 1. Relay adversary. System-wide security results.

9.1 System-wide Security Metrics

We simulated the proposed path-selection strategies in a 2127-relay model of the Tor network, built by sampling approximately one-third of the nodes of each type (exit, entry, middle) from a descriptor file from December 2015. In our simulations, 200 clients are placed according to statistics about users from the Tor metrics portal, and each client constructs 27,000 paths. In our location-based approaches, the clients select paths using the four target destinations as described in Section 7.2. We created in total 5.4 million paths for each value of α, with λ = 0.97.

To measure anonymity, we focus on end-to-end traffic confirmation attacks in which the adversary controls both the exit and entry relays in a circuit. We measured the Gini coefficient and Shannon entropy of the exit-entry combinations occurring on selected circuits. The Gini coefficient is a measure of the equality of relay selection, where 0 represents pure equality (i.e. each relay is selected uniformly at random) and 1 represents a state of complete inequality (i.e. a given relay is always selected) [29, 33]. Table 1 shows the results for Gini coefficient and entropy.

Fig. 11. AS Adversary. The median stream compromised rate as α and λ vary. [Plot omitted. Panels: (a) as α varies, with curves λ = 0.97, CAR, vanilla; (b) as λ varies, with curves α = 0, CAR, vanilla; y-axis: median compromise rate.]

With our static path selection simulation, we cannot produce equivalent results for CAR, since circuits change dynamically in that method. Wacek et al. report that CAR has a lower (i.e., better) Gini coefficient than Tor but slightly lower (i.e., worse) entropy [33]. As shown in Table 1, as α decreases, the Gini coefficient decreases and the entropy increases. As we expect, for α = 1.0, which selects the relays based only on bandwidth, the results are nearly the same as vanilla. The most dramatic difference in anonymity occurs for higher values of α, e.g. from α = 1.0 to α = 0.8, where the Gini coefficient drops from 0.724 to 0.490 and entropy rises by 0.9 bits.

Since decreasing α means increasing the share of distance in the selection weights, we see that emphasizing distance in the weights improves the system-wide security metrics. On the other hand, according to Figure 10, small values of α offer lower performance. We thus face a trade-off between security and performance, where decreasing α improves security but offers fewer performance benefits. Values of α between 0.5 and 0.8 offer both good security and good performance, with a Gini coefficient between 0.370 and 0.490 and almost 20% improvement in TTFB compared to CAR.
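The two system-wide metrics can be computed from selection counts of exit-entry pairs, as in the sketch below. This is illustrative only: the function names and the example paths are hypothetical, the Gini coefficient uses the standard rank-based formula for sorted counts, and including pairs that were never selected (with count 0) is our assumption about how the distribution is defined.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def pair_counts(paths: Sequence[Tuple[str, str]], exits: Sequence[str], entries: Sequence[str]) -> List[int]:
    """Count how often each possible (exit, entry) pair was selected, zeros included."""
    seen = Counter(paths)
    return [seen[(e, g)] for e in exits for g in entries]


def gini(counts: Sequence[float]) -> float:
    """Gini coefficient: 0 for perfectly equal counts, approaching 1 for total inequality."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked / (n * total) - (n + 1) / n


def shannon_entropy_bits(counts: Sequence[float]) -> float:
    """Shannon entropy, in bits, of the empirical distribution given by counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical selections: (exit, entry) pairs from four simulated paths.
example_paths = [("e1", "g1"), ("e1", "g1"), ("e1", "g1"), ("e2", "g1")]
example_counts = pair_counts(example_paths, exits=["e1", "e2"], entries=["g1", "g2"])
example_gini = gini(example_counts)
example_entropy = shannon_entropy_bits(example_counts)
```

For a finite set of n pairs, this Gini formula tops out at (n - 1)/n rather than exactly 1, so values are best compared across strategies evaluated on the same relay set, as in Table 1.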

9.2 AS Adversary

In this section, we evaluate the security of our approach in the presence of AS-level adversaries. As in our security analysis in Section 6.2, we use Shadow with the same configuration as in previous sections and carry out simulations for different α and λ values.

To evaluate the effect of α, we fix λ = 0.97 and vary α from 0.0 to 1.0. For each value of α, we extract all the generated streams along with their attached circuits in the simulation. For all the streams, we find the AS paths between the clients and guards and between the exits and the destinations, in both directions, using the algorithm proposed by Qiu and Gao [27]. We consider the possibility of an asymmetric traffic correlation attack that can happen between the data path and ack path. Figure 11a shows the median stream compromise rates. As we see, as α increases from 0, the compromise rate decreases until α reaches around 0.5, where we have the minimum compromise rate. After α = 0.5, the compromise rate increases again until we reach the worst case at α = 1.0, which is close to vanilla’s and CAR’s compromise rates. The compromise rate in CAR is slightly better than in vanilla.

To evaluate the effect of λ on security, we set α = 0.0 and vary λ from 0 to 1.0. Figure 11b shows the median compromise rate as λ varies. As we expect, when λ increases, the compromise rate generally decreases. This results from stretching the circuits closer to the communication end points, which in turn reduces the chance of common ASes appearing on both the entry and exit sides of the traffic.

9.3 Targeted Attacks in Relay-Level Adversary

To further explore how path selection strategies perform against attacks in the relay-level model, we now examine four types of attacks in the network and measure how often adversaries can compromise a circuit. We assume that a circuit is compromised if both the exit and entry nodes are controlled by the adversary.

In the targeted attacks, we consider a high-bandwidth adversary that owns a few high-bandwidth relays such that its bandwidth is a considerable fraction of the network’s total bandwidth. In particular, we start with our 2127-relay model of the Tor network as described in Section 9.1. We select a random bandwidth in the range [20 MiB/s, 220 MiB/s], where 220 MiB/s is the maximum bandwidth in our model network. A malicious relay with this bandwidth is added to the Tor network, where the location of the malicious relay is based on our attack strategies. This process is repeated until the target attacker bandwidth for this run of the experiment is reached. For each of 200 runs at each bandwidth setting, we place one client into the network using a location based on Tor metrics data and have it pick 9,000 paths for each of our tested strategies. We use the four popular destinations as described in Section 7.2 for our strategies. In our evaluations, we consider four different attack strategies as follows:

1. Targeted Client: In this attack strategy, in each run in which we add the client and malicious relays, all the malicious guards are located in the exact location of the client, just as if the adversary could run all of his guards in the client’s room. The malicious exit relays are randomly placed in locations based on the geographical distribution of Tor relays.

2. Targeted Destination: In this attack strategy, in each run in which we add the malicious relays, all the malicious exits are located in the exact location of one randomly selected popular destination, just as if the adversary could run all of his exit relays in the same server room. The malicious guard relays are randomly placed in locations based on the geographical distribution of Tor relays.

3. Targeted Destination and Client: In this attack strategy, in each run in which we add the client and the malicious relays, all the malicious exits are located in the exact location of one randomly selected popular destination, and all the malicious guards are located in the exact location of the client.

4. Non-targeted: In this attack strategy, in each run in which we add the malicious relays, all the malicious relays are randomly placed in locations based on the geographical distribution of Tor relays.
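The adversary construction and the four placement strategies above can be sketched as follows. This is a hedged illustration, not the experiment code: the function names, strategy labels, and coordinates are hypothetical, bandwidths are assumed to be drawn uniformly from the stated range, the last relay is allowed to overshoot the target, and random placement is approximated by sampling from a list of existing relay locations.

```python
import random
from typing import List, Sequence, Tuple

Location = Tuple[float, float]


def build_adversary(target_mibps: float, rng: random.Random,
                    low: float = 20.0, high: float = 220.0) -> List[float]:
    """Add malicious relays with random bandwidth until the target total is reached."""
    relays: List[float] = []
    total = 0.0
    while total < target_mibps:
        bandwidth = rng.uniform(low, high)  # assumption: uniform over [low, high]
        relays.append(bandwidth)
        total += bandwidth
    return relays


def place(strategy: str, role: str, client: Location, destination: Location,
          relay_locations: Sequence[Location], rng: random.Random) -> Location:
    """Location of a malicious relay of the given role ('guard' or 'exit')."""
    if role == "guard" and strategy in ("targeted_client", "targeted_destination_and_client"):
        return client
    if role == "exit" and strategy in ("targeted_destination", "targeted_destination_and_client"):
        return destination
    # Otherwise follow the geographical distribution of existing relays.
    return rng.choice(list(relay_locations))


# Hypothetical run: a 500 MiB/s adversary and three existing relay locations.
rng = random.Random(0)
adversary = build_adversary(500.0, rng)
existing = [(52.5, 13.4), (48.9, 2.3), (37.8, -122.4)]
guard_loc = place("targeted_client", "guard", (40.0, -100.0), (50.0, 10.0), existing, rng)
```

How the adversary's bandwidth is split between guard and exit relays is not specified in this section, so the sketch leaves that choice to the caller.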

Figure 12 shows the fraction of compromised paths with respect to the percentage of total bandwidth controlled by the adversary’s relays for vanilla, α = 0.5, α = 0.8, and α = 0.9. As shown in Figure 12, the compromise rate for Targeted Destination is almost the same as the compromise rate in vanilla. For Targeted Client and for Targeted Destination and Client, the compromise rates for α = 0.5, 0.8, and 0.9 are worse than vanilla, and as the adversary’s bandwidth fraction increases, the gap between them and vanilla increases. For both Targeted Client and Targeted Destination and Client, α = 0.8 has almost the same compromise rate as α = 0.9 but a better compromise rate than α = 0.5. For Non-targeted, we observed the same compromise rate as vanilla for α = 0.5, 0.8, and 0.9, because in this attack malicious relays are randomly located in the network and all relay selection methods pick the malicious relays with the same probability.

We also note that this greater vulnerability to targeted attacks is a trade-off against the modest, but widespread, security benefits of using α = 0.8 on Gini coefficient, entropy, and compromise rates compared with α = 1.0, for example. Greater emphasis on bandwidth leads to better performance and more resilience to targeted attacks, while greater emphasis on distance leads to more diffuse spreading of load on the network.

10 Discussion

In this section, we discuss the implications of our findings and the scope for future research.

Circuit Selection. We evaluated the impact of different numbers of pre-built circuits on Tor performance, and found that having at least three pre-built circuits ready results in a significant improvement compared to vanilla Tor. Preparing more than three circuits, however, does not provide much additional benefit and may also add more load on the network. Our circuit selection mechanisms also kill unused circuits after five minutes, which raises the rate of exploring for better circuits.

Relay Selection. In relay selection, combined weighting provides a trade-off between performance and anonymity. When the weights emphasize bandwidth (high α), combined weighting provides higher performance, but the resulting paths are less diverse. As α goes down and the weights lean toward distance, the performance improvement decreases, but the created circuits are more diverse. Low values of α also suffer from a greater chance of targeted relay-level attacks. Overall, we think that combined weighting with α = 0.8 provides the best trade-off between performance and anonymity. The best value of α may vary with network configuration, bandwidth distribution, geographical dispersion of relays, and the client’s location. We will examine setting α more carefully in future work.


Fig. 12. Targeted Attacks: Fraction of circuits compromised. X-axis shows the percentage of exit and guard bandwidth controlled by the attacker.

Nearby guards. Our evaluations showed that the proposed defense, picking guards close to the clients, does not affect an AS-level adversary. The AS path between the clients and guards is not highly correlated with the geographical distance between them. The AS path length between the guards and clients depends on the clients’ networks, the guards’ networks, and their ASes’ relationships with other ASes. Moreover, the clients and guards are not uniformly distributed on the globe and on the network. For example, a single AS, AS16276, contributes more than 170 guards to the Tor network, which is 16% of all the guards in January 2015. The other issue is guard rotation, as currently Tor clients change their guard after 9 to 10 months. In our 10-month TorPS simulations, the median number of guard changes for clients was five, with a minimum of two and a maximum of 29. Thus, even if the client is secure due to a short AS path, after guard rotation she may pick a guard that has a long AS path and get compromised. We evaluate the nearby guards in Appendix A.

References

[1] Alexa. http://www.alexa.com.
[2] CAIDA data. http://www.caida.org/data.
[3] Maxmind IP geolocation database. http://dev.maxmind.com/geoip/legacy/geolite/.
[4] Shadow. http://shadow.github.io/.
[5] Tor metric portal. https://metrics.torproject.org/userstats-relay-table.html.
[6] Tor users statistics. https://metrics.torproject.org/userstats-relay-table.html.
[7] A. Acquisti, R. Dingledine, and P. Syverson. On the economics of anonymity. In FC, Jan. 2003.
[8] M. Akhoondi, C. Yu, and H. V. Madhyastha. LASTor: A low-latency AS-aware Tor client. In IEEE S&P, 2012.
[9] M. Alsabah, K. Bauer, T. Elahi, and I. Goldberg. The path less travelled: Overcoming Tor’s bottlenecks with traffic splitting. In PETS, July 2013.
[10] R. Annessi and M. Schmiedecker. NavigaTor: Finding faster paths to anonymity. In Euro S&P, 2016.
[11] R. Dingledine and N. Mathewson. Anonymity loves company: Usability and the network effect. In WEIS, June 2006.
[12] R. Dingledine, N. Mathewson, and P. Syverson. Tor: The next-generation onion router. In USENIX Security, Aug. 2004.
[13] R. Dingledine, D. S. Wallach, et al. Building incentives into Tor. In FC, pages 238–256. Springer, 2010.
[14] M. Edman and P. Syverson. AS-awareness in Tor path selection. In CCS, 2009.
[15] M. Edman and P. F. Syverson. AS-awareness in Tor path selection. In CCS, November 2009.
[16] N. Evans, R. Dingledine, and C. Grothoff. A practical congestion attack on Tor using long paths. In USENIX Security, August 2009.
[17] J. Geddes, R. Jansen, and N. Hopper. How low can you go: Balancing performance with anonymity in Tor. In PETS, July 2013.
[18] N. Hopper, E. Y. Vasserman, and E. Chan-Tin. How much anonymity does network latency leak? ACM TISSEC, 13(2), February 2010.
[19] R. Jansen, K. Bauer, N. Hopper, and R. Dingledine. Methodically modeling the Tor network. In CSET 2012, August 2012.
[20] R. Jansen, J. Geddes, C. Wacek, M. Sherr, and P. Syverson. Never been KIST: Tor’s congestion management blossoms with kernel-informed socket transport. In USENIX Security, Aug. 2014.
[21] R. Jansen and N. Hopper. Shadow: Running Tor in a box for accurate and efficient experimentation. In NDSS, February 2012.
[22] R. Jansen, N. Hopper, and Y. Kim. Recruiting new Tor relays with BRAIDS. In CCS, 2010.
[23] A. Johnson, C. Wacek, R. Jansen, M. Sherr, and P. Syverson. Users get routed: Traffic correlation on Tor by realistic adversaries. In CCS, 2013.
[24] A. Johnson, C. Wacek, R. Jansen, M. Sherr, and P. Syverson. Users get routed: Traffic correlation on Tor by realistic adversaries. In CCS 2013, November 2013.
[25] B. N. Levine, M. Reiter, C. Wang, and M. Wright. Timing analysis in low-latency mix systems. In FC, Feb. 2004.
[26] P. Mittal, A. Khurshid, J. Juen, M. Caesar, and N. Borisov. Stealthy traffic analysis of low-latency anonymous communication using throughput fingerprinting. In CCS, October 2011.
[27] J. Qiu and L. Gao. AS path inference by exploiting known AS paths. In GLOBECOM, 2005.
[28] M. Sherr, M. Blaze, and B. T. Loo. Scalable link-based relay selection for anonymous routing. In PETS, August 2009.
[29] R. Snader and N. Borisov. A tune-up for Tor: Improving security and performance in the Tor network. In NDSS, February 2008.
[30] Y. Sun, A. Edmundson, L. Vanbever, O. Li, J. Rexford, M. Chiang, and P. Mittal. RAPTOR: Routing attacks on privacy in Tor. In USENIX Security, Aug. 2015.
[31] C. Tang and I. Goldberg. An improved algorithm for Tor circuit scheduling. In CCS, October 2010.
[32] The Tor Project. Tor metrics portal. http://metrics.torproject.org/.
[33] C. Wacek, H. Tan, K. Bauer, and M. Sherr. An empirical evaluation of relay selection in Tor. In NDSS, February 2013.
[34] T. Wang, K. Bauer, C. Forero, and I. Goldberg. Congestion-aware path selection for Tor. In FC, February 2012.

A Nearby Guards

Tor is known to be vulnerable to adversaries observing the entry and exit traffic, since a simple traffic correlation attack [24, 25, 30] can link the user to the destination (an end-to-end attack). If an adversary sits on the first and the last hop of the path, the adversary can see both sides of the traffic and carry out the traffic correlation attack. The adversary may perform this attack by controlling components of the network, like Autonomous Systems (ASes) or Internet Exchange Points (IXPs). As a defense against such AS-level adversaries, Sun et al. suggest picking guards close to the clients [30]. This results in having fewer ASes in the path on the entry side of the traffic and can decrease the chance that an adversary can observe both ends of the traffic.

A.1 Attacker Model.

We evaluate the security of nearby guards in terms ofboth relay-level and network-level adversaries.

Relay-Level Adversary Model. We examine the security of nearby guards in the presence of relay-level adversaries. We use TorPS in this evaluation and consider the case in which the attacker targets a specific user. We analyze two relay-level adversaries: a high-bandwidth adversary and a low-bandwidth adversary. In the low-bandwidth case, the attacker adds one low-bandwidth guard to the network and places it in the target’s location; in the high-bandwidth case, the attacker adds one high-bandwidth guard in the target’s location. These two evaluations can show how much bandwidth and distance matter in compromising streams.

Network-Level Adversary Model. To analyze the security of nearby guards at the network level, we simulate the proposed algorithm in TorPS to get the paths. We follow the same methodology as for our network-level adversary in circuit selection and relay selection to obtain the AS paths and measure compromise rates.

A.2 Network Model.

To evaluate the security consequences of using guard nodes close to the client, we use TorPS, a Tor path simulator that uses realistic models to mimic users' web browsing behavior and historical data to model the network. TorPS builds circuits using Tor's path selection code and real consensuses and server descriptors. In our TorPS simulation, we consider 30 client ASes, picked from the top client ASes found by Edman et al. [15] and Tor Metrics [6]. We ran the simulations over two time periods: one month of Tor descriptor data, from February 2015 to March 2015, with 500 clients in each of the 30 client ASes (15,000 clients total), and 10 months of Tor descriptor data, from February 2015 to November 2015, with 100 clients in each of the 30 client ASes (3,000 clients total).

A.3 Implementation of nearby guards

Malicious ASes can perform traffic correlation attacks passively over the traffic passing through them, or they could use one of the RAPTOR attacks [30] to actively hijack the traffic and put themselves on the path between the guard and the client. As a defense against these attacks, Sun et al. proposed picking guards close to the clients [30], since short paths mean fewer ASes and thus less chance of malicious ASes being on the path between the client and the guard. This approach has never been evaluated. On the other hand, if clients select nearby guards, a relay-level adversary can run some guards close to a targeted client and increase its chance of being selected. This suggests a trade-off between the two attacks. In this section, we examine this trade-off in detail.

Fig. 13. Time to first compromise in 10-month simulations. [x-axis: days from first stream (Mar to Dec); y-axis: cumulative probability (×10⁻²); series: α=0.0 low, α=0.0 high, α=0.5 low, α=0.5 high, α=1.0 high]

We partially evaluated the effect of picking close guards in Section 9.2, and we showed that a large value of λ reduces the compromise rate in the presence of an AS-level adversary. Since changing λ also affects the path to the destination, we now isolate the study to picking guards close to the client.

Fig. 14. AS path length. [x-axis: client-guard distance (km), 0 to 16,000; y-axis: AS path length (ASes), 0 to 9]

To examine the impact of close guards, we use the TorPS simulator [23] to generate streams and build circuits. TorPS is a path simulator that uses real Tor data and realistic models to mimic Tor path selection behavior. We used the typical user model in the TorPS configuration, which consists of Gmail, Google Chat, Google Calendar, Docs, Facebook, and web search activity. We consider 30 client ASes in our simulation and pick these ASes so that they cover both the top client ASes identified by Edman et al. [14] and the geographical distribution of Tor users [5]. In our model, we place 100 users in each AS (3,000 users total). We modified the TorPS path selection module to implement the selection of nearby guards. We change the weight of candidate relays for the guard position in the relay selection module according to our weighting function from Section 7.1 (w = α × wB + (1 − α) × wD). The only difference is in Dentry: we remove the parameter λ and redefine the distance as Dentry = Dclient−entry, the distance between client and guard. In other words, we ignore the distance to the destination because we want to evaluate only the proximity of guards to the clients, not to the destinations. Note that we only change the weights for guard selection; the selection process for the other positions, exit and middle, remains the same as in vanilla Tor.
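As a rough illustration of the guard weighting described above, the following sketch computes w = α × wB + (1 − α) × wD for a few made-up guards. The exact definitions of wB and wD are given in Section 7.1 and are not reproduced here; the normalizations used below (bandwidth divided by the maximum bandwidth, and one minus distance divided by the maximum distance) are assumptions made only for this example, as are the relay names and numbers.

```python
def guard_weights(guards, alpha):
    """Return {name: w} with w = alpha * w_B + (1 - alpha) * w_D.

    guards: list of (name, bandwidth, client_guard_distance_km) tuples.
    The normalizations of w_B and w_D are illustrative assumptions,
    not the definitions from Section 7.1.
    """
    max_bw = max(bw for _, bw, _ in guards)
    # Fall back to 1.0 if every guard is at distance zero.
    max_dist = max(dist for _, _, dist in guards) or 1.0
    weights = {}
    for name, bw, dist in guards:
        w_b = bw / max_bw              # assumed: faster guard -> larger w_B
        w_d = 1.0 - dist / max_dist    # assumed: closer guard -> larger w_D
        weights[name] = alpha * w_b + (1.0 - alpha) * w_d
    return weights


# Hypothetical guards: (name, bandwidth in MBps, distance to client in km).
GUARDS = [
    ("far_fast", 50.0, 8000.0),
    ("mid", 20.0, 4000.0),
    ("near_slow", 2.0, 0.0),   # D_entry = D_client-entry = 0
]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, guard_weights(GUARDS, alpha))
```

Under these assumed normalizations, a slow guard placed at the client's location receives the largest weight at α = 0.0 and a small weight at α = 1.0, which is the effect the relay-level evaluation is designed to probe.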

A.4 Relay-level adversary

In evaluating the threat of relay-level adversaries, we use the methodology of Johnson et al. [23] and consider an adversary running a guard relay.⁵ When α is low, the guard's bandwidth does not play a significant role in the selection weight. On the other hand, for high values of α, the guard's bandwidth plays a notable role in the weights and guard selection. Given these facts, we consider a high-bandwidth and a low-bandwidth adversary in our evaluations. The high-bandwidth adversary injects a guard relay with 55 MBps bandwidth, which is in the top 5% of guard bandwidths as of February 2015. The low-bandwidth adversary adds a guard relay with 2 MBps bandwidth, which is in the bottom 10% of guard bandwidths and is the minimum bandwidth for getting the guard flag. In both cases, for each client we place the malicious guard at the client's precise location (i.e., Dentry = Dclient−entry = 0).

⁵ We only analyze guard compromise, the case in which clients select the malicious guard, not full compromise, in which clients pick both a malicious guard and a malicious exit. We assume that if the client picks a malicious guard, some of her streams passing through that guard will be compromised.

Fig. 15. Fraction of compromised streams as α varies in one-month simulations. [x-axis: α, 0 to 1; y-axis: fraction of compromised streams (×10⁻³); series: low BW adversary, high BW adversary]

Figure 13 shows the cumulative probability of the time until the first stream is compromised for the different cases. As expected, the low-bandwidth and high-bandwidth adversaries have the same time to first compromise for α = 0.0, because bandwidth contributes nothing to the weights and only distance matters. For the high-bandwidth adversary, the time to first compromise decreases as α increases; α = 1.0 gives the worst (shortest) time to the first compromised guard. For the low-bandwidth adversary, when α is low the malicious guard's weight is higher and it has a greater chance of being picked. As shown in the figure, the time to first compromised guard is almost the same for α equal to 0.5 and 0.0 because wD dominates the weights. For α = 1.0 with the low-bandwidth adversary, none of the clients picked the malicious guard during the 10-month simulation.
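A curve like the ones in Figure 13 can be derived from per-client simulation output roughly as follows. This is only a sketch: the function name, the input format (one day index per client that was ever compromised), and the sample numbers are assumptions for illustration and do not come from TorPS.

```python
def first_compromise_cdf(first_days, num_clients, horizon_days):
    """Cumulative probability that a client has had a compromised stream by each day.

    first_days: day index (0-based, counted from the client's first stream) of the
        first compromised stream, one entry per client that was ever compromised.
    num_clients: total number of simulated clients, compromised or not.
    horizon_days: last day index to report.
    """
    counts = [0] * (horizon_days + 1)
    for day in first_days:
        if 0 <= day <= horizon_days:
            counts[day] += 1
    cdf = []
    compromised_so_far = 0
    for count in counts:
        compromised_so_far += count
        cdf.append(compromised_so_far / num_clients)
    return cdf


# Hypothetical output: 3,000 clients, four of which were compromised at some point.
curve = first_compromise_cdf([3, 40, 40, 120], num_clients=3000, horizon_days=300)
print(curve[-1])  # fraction of clients compromised by the end of the horizon
```

Clients that are never compromised contribute only to the denominator, so the curve levels off below 1, as the curves in the figure do.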

Figure 15 shows the fraction of compromised streams as α changes for one-month simulations of both adversary models. The compromise rate stays high for the high-bandwidth adversary as α changes, because the malicious relay has a high selection weight for both high and low values of α. The compromise rate decreases for the low-bandwidth adversary as α increases, because the malicious relay's selection weight is low at high values of α.

A.5 AS level adversary

In this section we assume that the adversary controls a single AS and applies the traffic correlation attack in order to de-anonymize users. Picking guards close to the clients is assumed to shorten the AS path between the client and the guard. As a result, the chance of a common AS appearing on both the entry and exit sides of the traffic should decrease. To evaluate this theory, we run TorPS modified with our guard selection scenario for different values of α, for one month and for 10 months, with the same configurations as in the previous simulations. We find that the fraction of compromised streams is very high in all cases and almost completely independent of α. Clients will have a compromised stream within one hour of using Tor, and picking nearby guards does not help.
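The compromise condition for a single-AS adversary can be sketched as a membership test on the two AS paths of a stream. The function names, the data layout, and the AS numbers below are hypothetical; how the AS paths themselves are inferred is outside the scope of this sketch.

```python
def is_compromised(entry_as_path, exit_as_path, adversary_as):
    """A stream is exposed if the adversary's AS is on both sides of the circuit."""
    return adversary_as in entry_as_path and adversary_as in exit_as_path


def compromised_fraction(streams, adversary_as):
    """Fraction of streams whose entry-side and exit-side AS paths both contain the AS."""
    if not streams:
        return 0.0
    hits = sum(is_compromised(entry, exit_, adversary_as) for entry, exit_ in streams)
    return hits / len(streams)


# Hypothetical streams: (client-to-guard AS path, exit-to-destination AS path).
STREAMS = [
    ([64500, 64501, 64502], [64510, 64501, 64511]),  # 64501 on both sides
    ([64500, 64503], [64510, 64501, 64511]),         # exit side only
    ([64500, 64501], [64512, 64513]),                # entry side only
    ([64504, 64501, 64505], [64501, 64514]),         # 64501 on both sides
]

print(compromised_fraction(STREAMS, adversary_as=64501))
```

Shortening the entry-side path lowers this fraction only if it actually removes ASes from that path.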

An important reason for this is that the number of ASes is not significantly reduced by having shorter distances. To see the relationship between the geographical distance (between guards and clients) and AS path lengths (the number of ASes between guards and clients), we extracted all guards picked by our clients for α = 0.0 and computed the distance and the AS path length for each guard-client pair. Fig. 14 shows a scatter plot of guard-client distances and their AS path lengths. There is no clear relationship or correlation between distance and AS path length; for both short and long distances, the majority of paths have three or four ASes.
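One way to quantify the relationship plotted in Fig. 14 is sketched below. The text does not say how geographic distance was computed or whether a correlation coefficient was calculated, so both the great-circle (haversine) distance and the Pearson coefficient used here are assumptions, and the sample pairs are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (client-guard distance in km, AS path length) pairs.
distances = [120.0, 900.0, 4500.0, 8000.0, 12000.0]
as_path_lengths = [3, 4, 3, 4, 3]
print(pearson(distances, as_path_lengths))
```

A coefficient near zero on the real data would be consistent with the scatter plot, in which AS path lengths cluster at three or four regardless of distance.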

B Modified Tuning Function

In this appendix, we describe our modification to the Snader-Borisov tuning function. Figure 16a shows the original Snader-Borisov family of functions for different values of s.

In this function, a performance-concerned user still has some chance of choosing low-bandwidth relays. Moreover, when the number of performance-concerned users increases, high-bandwidth relays would be over-utilized and become congested, which would result in these users actually experiencing worse performance. We modified this function to address these shortcomings as shown:

\[
f_s : [0, 1] \to [0, 1], \qquad
f_s(x) = \frac{1 - p^{sx}}{1 - p^{s}} \times \left( 1 - \frac{1 - G_{min}}{S_{max}} \times s \right) \tag{3}
\]

In this function, Gmin (Gmin ∈ [0, 1]) determines the smallest pool of high-weight relays that users are allowed to use at s = Smax (Smax is the maximum acceptable selection parameter), and p lets us control the curvature of the graph. Figures 16b and 16c show the modified tuning function. The function presented by Snader et al. [29] is a special case of our modified tuning function with p = 2 and Gmin = 1.
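The modified tuning function can be written down directly as a short function. The sketch below is illustrative only: the function and parameter names are ours, and the s = 0 branch uses the limit of the first factor as s approaches zero, which is x, a case the formula itself leaves undefined.

```python
def modified_tuning(x, s, p=2.0, g_min=1.0, s_max=20.0):
    """Modified Snader-Borisov tuning function f_s(x) for x in [0, 1].

    Requires p > 0 and p != 1. With p = 2 and g_min = 1 this reduces to the
    original Snader-Borisov function (1 - 2**(s*x)) / (1 - 2**s).
    """
    if s == 0:
        return x  # limit of (1 - p**(s*x)) / (1 - p**s) as s -> 0
    curve = (1.0 - p ** (s * x)) / (1.0 - p ** s)
    pool_scale = 1.0 - (1.0 - g_min) / s_max * s   # shrinks from 1 to g_min
    return curve * pool_scale


# With Gmin = 0.25 and Smax = 20 (the settings of Figure 16), the maximum of
# f_s, reached at x = 1, shrinks as s grows.
for s in (3, 6, 9, 12, 16, 20):
    print(s, modified_tuning(1.0, s, g_min=0.25, s_max=20.0))
```

At x = 1 the first factor equals 1, so f_s(1) = 1 − (1 − Gmin) × s / Smax; for s = Smax this is exactly Gmin, which is where the lower bound on the relay pool comes from.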

Fig. 16. (a) Snader-Borisov family of functions; (b) Our tuning function as s varies; (c) Our tuning function as p varies. Gmin = 0.25 and Smax = 20. [All panels plot f(x) against x ∈ [0, 1]; (a) curves for sb = 3, 6, 9, 12, 16, 20; (b) curves for s = 3, 6, 9, 12, 16, 20 with G = 0.25; (c) curves for p = 1.1, 1.3, 1.5, 1.7, 1.9]

Controlling the curvature is useful when we need to balance the load on high-weight relays if they are getting congested. Decreasing the curvature decreases the probability of choosing high-weight relays, which means we can control their load. This function also lets users exclude some portion of the low-weight relays depending on their selection parameter, but the pool cannot shrink below the top (Gmin × 100) percent of relays, which prevents users from choosing among only a small set of high-weight relays. As shown in Figure 16b, if we have n relays, then as s increases from zero to Smax, the pool from which users are allowed to select relays shrinks from all n relays to Gmin × n relays.
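The shrinking pool can be illustrated by mapping a uniform draw x to a position in a list of n relays sorted by descending weight. Using ⌊n × f_s(x)⌋ as the index follows the convention of the original Snader-Borisov scheme; that the modified function is applied in exactly this way is an assumption of this sketch, as are the function names.

```python
import math


def tuning_fn(x, s, p=2.0, g_min=0.25, s_max=20.0):
    """Modified tuning function; s = 0 uses the limiting value f_0(x) = x."""
    if s == 0:
        return x
    curve = (1.0 - p ** (s * x)) / (1.0 - p ** s)
    return curve * (1.0 - (1.0 - g_min) / s_max * s)


def relay_index(x, n, s, **params):
    """Index into a list of n relays sorted by descending weight (0 = highest)."""
    return min(n - 1, math.floor(n * tuning_fn(x, s, **params)))


n = 100
for s in (0, 10, 20):
    # Largest index reachable over a grid of draws x in [0, 1).
    reachable = max(relay_index(i / 1000, n, s) for i in range(1000))
    print(s, reachable)
```

With n = 100, Gmin = 0.25, and Smax = 20, the largest reachable index drops from the end of the list at s = 0 to below 25 at s = Smax, so selection is confined to roughly the top Gmin × n relays.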

C Breaking Ties

In Section 7, we introduced a weighting function that mixes geographical distance and bandwidth to select relays for the circuit. Using a combined function to weight relays requires combining two rather different quantities, one being a measure of edges and another being a measure of the node, in a non-standard way. In this section, we propose a simple alternative: we first use one metric, bandwidth or distance, to narrow down the list of relays and then use the other metric to “break the tie” and pick among the smaller list.

C.1 Distance-first

In this approach, we first select a set of relays with low distance (where distance is defined depending on the relay's position in the circuit) and then select a high-bandwidth relay from this set. As in combined weighting, we assign weights wB and wD to each relay as defined in Equation (3). To choose a relay for each position in the circuit, we first fill a bucket with k relays, selected based on their distance weights wD. In particular, we use our relay selection function (see Section 7) with weights wD to pick k relays from among all relays flagged for the given position in the circuit, i.e. exit, entry, or middle. Note that this is a probabilistic selection, not a deterministic one, i.e. we do not take the k closest relays. Then among the k relays in the bucket, we again use our relay selection function with bandwidth weights wB to pick the relay.

The key parameter in this method is the bucket size k. A large bucket will contain many relays, so more of them are likely to have high bandwidth, but it also makes it more likely that we pick a relay farther from the best path. A small bucket is less likely to contain poor choices in terms of distance, but it is also less likely to include high-bandwidth relays. We choose k = ⌊√n⌋, where n is the number of relays available for the specific position in the circuit. This setting puts a greater focus on distance than on bandwidth.

C.2 Bandwidth-first

The other way to select relays in this approach is to use bandwidth to narrow down the list of relays and then select from among these high-bandwidth relays to get low distance. In particular, we use our relay selection function with bandwidth weights wB to get a bucket of k relays. We then apply our relay selection function with distance weights wD to select the relay. Here, a large bucket size means potentially shorter paths but lower bandwidths, while a small bucket size should yield high-bandwidth choices with little optimization for distance. We again choose k = ⌊√n⌋, which should ensure high bandwidth on average.
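Both tie-breaking orders share the same two-stage structure, which can be sketched with a single helper. Note the substitution: the relay selection function of Section 7 is not reproduced here, so the sketch uses plain weight-proportional sampling without replacement in its place. The dictionary keys, relay names, and weights are hypothetical, and all weights are assumed to be positive.

```python
import math
import random


def weighted_sample(relays, weight_key, k, rng):
    """Draw k distinct relays, each draw proportional to relay[weight_key]."""
    pool = list(relays)
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [relay[weight_key] for relay in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(index))
    return chosen


def tie_break_select(relays, first_key, second_key, rng=random):
    """Fill a bucket of k = floor(sqrt(n)) relays using first_key, then pick by second_key."""
    k = max(1, math.isqrt(len(relays)))
    bucket = weighted_sample(relays, first_key, k, rng)
    return weighted_sample(bucket, second_key, 1, rng)[0]


# Hypothetical candidates for one circuit position.
RELAYS = [{"name": f"r{i}", "w_b": 1.0 + i, "w_d": 16.0 - i} for i in range(16)]

rng = random.Random(7)
print(tie_break_select(RELAYS, "w_d", "w_b", rng)["name"])  # distance-first
print(tie_break_select(RELAYS, "w_b", "w_d", rng)["name"])  # bandwidth-first
```

Swapping the two keys switches between distance-first and bandwidth-first; in both cases the first stage is probabilistic, so the bucket is not simply the k best relays by the first metric.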

D Targeted Attacks in Relay Selection Adversaries

Figure 17 shows the compromised circuit rates under different relay-level adversaries.


Fig. 17. Fraction of compromised paths. Panels: (a) Targeted Destination and Client; (b) Targeted Clients; (c) Targeted Destination; (d) Non-targeted. [x-axis: 0 to 20 (label not recoverable); y-axis: fraction, with panels (c) and (d) scaled by 10⁻²; series: α = 0.0, 0.1, 0.15, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, and Vanilla]
