Holding all the ASes: Identifying and Circumventing the Pitfalls of AS-aware Tor Client Design



Rishab Nithyanand Rachee Singh Shinyoung Cho Phillipa Gill

Stony Brook University
{rnithyanand, racsingh, shicho, phillipa}@cs.stonybrook.edu

ABSTRACT

Traffic correlation attacks to de-anonymize Tor users are possible when an adversary is in a position to observe traffic entering and exiting the Tor network. Recent work has brought attention to the threat of these attacks by network-level adversaries (e.g., Autonomous Systems). We perform a historical analysis to understand how the threat from AS-level traffic correlation attacks has evolved over the past five years. We find that despite a large number of new relays added to the Tor network, the threat has grown. This points to the importance of increasing AS-level diversity in addition to capacity of the Tor network.

We identify and elaborate on common pitfalls of AS-aware Tor client design and construction. We find that succumbing to these pitfalls can negatively impact three major aspects of an AS-aware Tor client – (1) security against AS-level adversaries, (2) security against relay-level adversaries, and (3) performance. Finally, we propose and evaluate a Tor client – Cipollino – which avoids these pitfalls using the state of the art in network measurement. Our evaluation shows that Cipollino is able to achieve better security against network-level adversaries while maintaining security against relay-level adversaries and performance characteristics comparable to the current Tor client.

1. INTRODUCTION

As governments and organizations increase their commitment to mass surveillance and online tracking, the Tor anonymity network has become the de facto technology for preserving anonymity and privacy on the Internet, with nearly two million daily users [1].

Tor's popularity has made it a prime target for attacks and also increases the importance of improving its defenses. In this paper, we focus on a long-standing class of attacks known as traffic-correlation attacks. In a traffic-correlation attack, an adversary correlates the characteristics of traffic (e.g., packet sizes, inter-packet timings, etc.) entering and exiting the Tor network. Successfully correlating these flows results in the de-anonymization of Tor users – i.e., it becomes possible to identify the destination server being contacted by a Tor user.

Traffic correlation attacks have been known for over a decade [2] but, as our study shows, Tor is still incredibly vulnerable. Worse yet, the recent Snowden leaks have confirmed that the NSA and GCHQ, in collusion with several Internet Service Providers (ISPs), have actively been working to implement network-level attacks in the wild [3, 4, 5]. In order to launch a traffic correlation attack, an adversary needs to be able to observe network traffic on (1) the path between the Tor user and the entry (relay) to the Tor network and (2) the path between the exit (relay) from the Tor network and the destination server. Such attacks have been shown to be feasible for both relay-level adversaries [6, 7, 8] and network-level adversaries such as Autonomous Systems (ASes) [2, 9, 10, 11, 12, 13].

While the problem of relay-level traffic-correlation attacks has been mitigated and solutions have been integrated into the current Tor client [14, 15], the problem of defending against network-level traffic-correlation attacks remains unsolved. While numerous defenses have been proposed [2, 16, 17, 18], none have been successfully adopted in practice. We identify five pitfalls that render existing AS-aware Tor clients insecure, impractical, or both. We characterize the impact of these pitfalls and propose a modified Tor client that is able to mitigate them.

In this paper we make three major contributions.

Measuring the threat (Section 3).

We perform a current and a historical analysis to understand how the threat from AS-level adversaries has evolved over the past five years. From these measurements, we make the following observations:

• When considering Tor clients used specifically for the purpose of loading webpages, 31% of the circuits constructed by the Tor client in our experiments were found to be vulnerable to AS-level correlation attacks. However, due to aggressive circuit re-use by the Tor client, 58% of the websites loaded in our experiments were vulnerable to de-anonymization. When considering Tor clients used for a mix of applications (Web, BitTorrent, IRC, email, etc.), 30% of the circuits were found to be vulnerable.

• From our historical analysis, we find that the threat faced by Tor clients has grown. In the context of clients used for loading webpages, we found the fraction of vulnerable circuits used by the client increased from 38% (2010) to 41% (2015). In the context of clients used for a mix of applications, we found the fraction of vulnerable circuits increased more drastically – from 21% (2010) to 35% (2015). These results show that the threat has been increasing in spite of a massive growth in the size of the Tor network.

Evaluating existing defenses (Section 4).

We identify five pitfalls in the design of AS-aware Tor clients: (1) a lack of accurate Internet path data, and not considering (2) the impact of asymmetric routing on the Internet, (3) the impact of BGP hijack and interception vulnerabilities, (4) relay-level adversaries, or (5) capacity of Tor relays. We characterize how these pitfalls impact the security and performance of existing AS-aware solutions.

Improving security and performance of AS-aware Tor (Section 5).

Based on our evaluation, we design and construct Cipollino, an AS-aware Tor client which carefully avoids previous pitfalls while improving security and performance, compared to the current state-of-the-art. In particular, we show that only 1.4% of all the webpages loaded by the Cipollino client were vulnerable to AS-level attacks, compared to 58% with the vanilla Tor client. Further, Cipollino reduces the attack surface for relay-based adversaries by 80% relative to the state-of-the-art AS-aware Tor client (Astoria). Finally, in terms of performance, the Cipollino client achieves median page-load times that are seven seconds faster than the Astoria Tor client and only 1.6 seconds slower than the vanilla Tor client.

2. BACKGROUND

In this section, we give an overview of the current state of Tor relay selection and circuit construction algorithms. Then we present our adversary model, which considers active and passive network-level traffic correlation attacks.

Tor is a low-latency onion routing network that currently consists of 7.1K relays and has nearly two million daily users [1]. When a user connects to a destination server via the Tor client, the client typically establishes the connection using a nested and encrypted three-relay circuit. The first relay, called the entry-relay, communicates directly with the Tor user. The last relay, called the exit-relay, communicates directly with the destination server. The key idea is that no single relay is simultaneously aware of the identities of both the source (Tor user) and the destination of the circuit.

Tor relay selection.

The three relays in a Tor circuit are selected according to the following constraints [19]: (1) no relay may be selected twice in the same circuit, (2) no two relays belonging to the same family (advertised by the relays) may be selected as part of the same circuit, and (3) no two relays belonging to the same /16 subnet may be chosen as part of the same circuit.

In addition to the above constraints, Tor is (by default) configured to select an entry-relay from a restricted set of guard relays that are stable and have good performance metrics. When the Tor client is configured to use guards as entry-relays, it maintains an ordered list of guards and selects the first usable (online) relay in this list to serve as its entry-relay [15]. Guards help mitigate the threat of relay-level attacks such as the predecessor attack [20], selective denial-of-service attacks [21], and relay-level correlation attacks [8]. For the middle- and exit-relay positions in the circuit, relays are selected based on their available bandwidths. While the middle-relay is selected from the set of all available relays, exit-relays are chosen from a smaller subset of relays which have an exit flag. Since relays are chosen with probability proportional to their available bandwidths, the problem of overloading small sets of relays is avoided.
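The constraints and bandwidth weighting described above can be sketched in a few lines. This is a simplified illustration, not Tor's actual path-selection code: the `Relay` record, its field names, and `pick_circuit` are hypothetical, families are modeled as a single label per relay, and guard selection is reduced to taking the first relay in an ordered guard list.

```python
import ipaddress
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Relay:
    # Hypothetical, simplified relay record (not Tor's descriptor format).
    nickname: str
    ip: str
    bandwidth: int          # available bandwidth, arbitrary units
    family: str = ""        # empty string means "no declared family"
    is_exit: bool = False


def slash16(ip: str) -> ipaddress.IPv4Network:
    """Return the /16 network that contains an IPv4 address."""
    return ipaddress.ip_network(f"{ip}/16", strict=False)


def compatible(a: Relay, b: Relay) -> bool:
    """True if a and b may share a circuit under constraints (1)-(3)."""
    if a.nickname == b.nickname:                # (1) same relay
        return False
    if a.family and a.family == b.family:       # (2) same family
        return False
    if slash16(a.ip) == slash16(b.ip):          # (3) same /16 subnet
        return False
    return True


def weighted_choice(relays, rng):
    """Pick one relay with probability proportional to its bandwidth."""
    return rng.choices(relays, weights=[r.bandwidth for r in relays], k=1)[0]


def pick_circuit(guards, relays, rng=random):
    """Pick (entry, middle, exit): first listed guard, then weighted draws."""
    entry = guards[0]  # assumes the ordered guard list is non-empty and online
    exits = [r for r in relays if r.is_exit and compatible(r, entry)]
    exit_relay = weighted_choice(exits, rng)
    middles = [r for r in relays
               if compatible(r, entry) and compatible(r, exit_relay)]
    middle = weighted_choice(middles, rng)
    return entry, middle, exit_relay
```

The sketch assumes that at least one compatible exit and middle relay exist; a real client would also have to handle the case where the candidate lists are empty.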

Circuit construction and usage.

Since relays willing to serve as Tor exits have the ability to specify which ports and IP addresses they are not willing to establish connections to, not all circuits constructed by a Tor client are usable for incoming connection requests. To deal with this, the Tor client pre-emptively constructs circuits so that at least two are available for every destination port seen in the past hour. This allows connection requests to be served by existing circuits as soon as they are received. In the event that the client receives a request that cannot be satisfied by any available circuit, it constructs a new circuit using an exit-relay that can serve the IP and port specific to that request.
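The pre-emptive construction rule can be illustrated with a small sketch. The function and parameter names are hypothetical, and exit policies are reduced to a set of allowed ports per open circuit; the two-circuit target and one-hour window come from the description above.

```python
def ports_needing_circuits(port_last_seen, circuit_ports, now,
                           window=3600, target=2):
    """Return {port: missing_count} for recently seen ports.

    port_last_seen: dict mapping destination port -> last time seen (seconds).
    circuit_ports:  list of sets; each set holds the ports that one open
                    circuit's exit-relay is willing to serve (simplified).
    """
    missing = {}
    for port, seen in port_last_seen.items():
        if now - seen > window:
            continue  # port was not requested within the window
        usable = sum(1 for allowed in circuit_ports if port in allowed)
        if usable < target:
            missing[port] = target - usable
    return missing
```

For example, with ports 80 and 443 seen recently, one circuit that serves both and one that serves only port 80, the function reports that one more circuit is needed for port 443.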

As a pre-condition to launch network-level correlation attacks, an attacker (e.g., an Autonomous System (AS)) needs to be present on one of the paths entering the Tor network and one of the paths exiting it. Figure 1 illustrates this condition. Here, to de-anonymize a Tor client, an AS needs to be present on one of the solid path segments and on one of the dashed path segments.

Figure 1: Condition required for launching traffic-correlation attacks: An AS needs to be present on one of the two solid path segments – i.e., path segment A or B – and on one of the two dashed path segments – i.e., path segment C or D.

More formally, if $P_{SRC \leftrightarrow EN}$ is the set of ASes on the forward and reverse paths between the Tor client (source) and the selected Tor entry-relay and, similarly, $P_{EX \leftrightarrow DST}$ is the set of ASes on the paths between the selected Tor exit-relay and the destination, then we say that a Tor circuit is vulnerable to de-anonymization via traffic-correlation if there is some AS $A$ such that:

$$A \in \{ P_{SRC \leftrightarrow EN} \cap P_{EX \leftrightarrow DST} \} \qquad (1)$$

An adversarial AS may satisfy Equation 1 through passive or active means.

Passive adversaries.

An AS may find itself in a position to launch a traffic-correlation attack simply as a result of the AS-level topology and the relationships (i.e., customer-provider or peer-peer) it shares with other ASes. In order to defend against attacks from passive adversaries, it is sufficient to have an accurate snapshot of the ASes that occur in the sets $P_{SRC \leftrightarrow EN}$ and $P_{EX \leftrightarrow DST}$ for each choice of EN and EX. Given this information, a correlation attack can be avoided by simply selecting an entry- and exit-relay for which there is no AS $A$ which satisfies Equation 1 (if such an entry- and exit-relay combination exists).

Active adversaries.

Due to the dynamics and insecurities of the BGP protocol, ASes may also actively seek to place themselves in a position to launch traffic-correlation attacks. For example, an AS may hijack or intercept traffic sent to the prefix associated with the client, entry-relay, exit-relay, or destination server. Such targeted hijacks and interceptions potentially allow adversaries to place themselves on any of the four paths illustrated in Figure 1. Defending against such adversaries is more challenging due to the need for access to real-time control-plane data to identify ASes that are likely to be hijacking or intercepting traffic. This is in addition to the snapshots of AS-level paths required for defending against passive adversaries.
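The passive-adversary condition and the corresponding defense reduce to set operations over AS-level paths. The sketch below is illustrative only: the function names and the dictionary layout are assumptions, the AS numbers in the usage note come from the range reserved for documentation, and it presumes that accurate path sets are already available, which is the hard part in practice.

```python
def is_vulnerable(src_entry_ases, exit_dst_ases):
    """Equation 1: some AS lies on both the entry side and the exit side."""
    return bool(set(src_entry_ases) & set(exit_dst_ases))


def safe_pairs(entries, exits, entry_paths, exit_paths):
    """List (entry, exit) pairs for which no AS satisfies Equation 1.

    entry_paths[en]: ASes on forward and reverse paths, client <-> entry en.
    exit_paths[ex]:  ASes on forward and reverse paths, exit ex <-> destination.
    """
    return [(en, ex) for en in entries for ex in exits
            if not is_vulnerable(entry_paths[en], exit_paths[ex])]
```

For instance, if the client-to-entry paths for a hypothetical entry `en1` cross AS64496 and AS64497, and the exit-to-destination paths for a hypothetical exit `ex1` cross AS64497 and AS64499, then the pair (`en1`, `ex1`) is excluded because AS64497 appears on both sides. An empty result corresponds to the case where no safe entry- and exit-relay combination exists.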

3. MEASURING THE THREAT

In this section we describe our methodology for measuring the potential threat from AS-level adversaries. We use a combination of live experiments on the current Tor network and simulations that capture a variety of user workloads on snapshots of the Tor network from 2010 to 2016. Table 1 summarizes our experimental setup.

We use a combination of live experiments using VPN vantage points and simulations to understand the threat to Tor in practice, at scale, and over time.

Table 1: Summary of our experiments to study the threat posed by AS-level adversaries. (Columns: Period, Setting, Workload, Streams Tested.)

In our experiments, we consider the fact that due to regional differences in AS-level topologies, Tor clients in different regions face varying levels of vulnerability. Therefore, we consider Tor clients located in ten countries: Brazil (BR), China (CN), Germany (DE), Spain (ES), France (FR), England (GB), Italy (IT), Russia (RU), Ukraine (UA), and the United States (US). This list of locations was obtained by performing an intersection of the countries with the largest number of Tor users [1] and the countries ranking the lowest on the Freedom House Internet freedom rankings [22].

Live experiments with VPNs.

In each live experiment, Crawler Incantatus [23] (a Selenium-based web-crawler) and the Tor client were used to load webpages from within each country, using a commercial VPN.

Simulating Tor behavior with TorPS.

Since the VPN vantage points only provide us a limited view of each country, in our simulations we considered Tor clients located in 100 of the most popular (in terms of end-users [24]) ASes in each country.

We use the Tor Path Simulator (TorPS) to analyze the vulnerability of the Tor network, while considering the massive growth in the Tor network between 2010 and 2015. TorPS is a realistic Python-based Tor simulator which uses archives of previously published Tor server-descriptors and consensuses from the CollecTor project [25] to model historical states of the Tor network. Given (1) the set of server descriptors corresponding to the period of the experiment and (2) the set of streams generated by the user (each stream consists of a set of IP addresses, ports, and connection request times), the TorPS simulator constructs circuits for each connection request within the stream, according to a chosen client model (in our case the vanilla Tor client). This allows us to predict the relays that would have been selected by the Tor client, given a specific network state from the past.

Each experiment was executed in the live or simulated setting in each of the ten countries. Additionally, the simulations were used to obtain a picture of the vulnerability of the Tor client based on network states obtained for the Tor network between 2010 and 2015. Logs were maintained to track the circuits established by the live or simulated Tor client.

When the Tor client selects relays for a circuit, it may only select exit relays that have agreed to transport the type of traffic to be sent over the circuit (e.g., some exits restrict commonly abused ports such as port 25 – SMTP). As a result, the vulnerability of the Tor client can depend on the applications used by the Tor user. We consider two different client workloads described below.

Web model.

For each experiment using the web user model, 200 websites were loaded by the Tor client. The list of 200 websites was dependent on the client location – i.e., it comprised the local Alexa Top 100 sites [26] and 100 country-specific sensitive (likely to be blocked or monitored) webpages obtained from the Citizen Lab testing list repository. In the case of simulated Tor clients, streams that were used as input to the TorPS simulator were constructed using the IPs and ports observed in the live experiments.

Mixed (application) model.

For each experiment considering a mixed user model, we considered clients that used Tor for a mix of Web, P2P (BitTorrent), e-mail and IRC chat for an hour-long period. The purpose of these experiments was to understand if the security of the Tor client was affected when users required connections to non-HTTP(S) ports.

To measure the threat posed by network-level attackers, we need to be able to identify the different networks (i.e., ASes) traversed by packets sent between the Tor entry- / exit-relay and client / destination server. However, ISPs generally treat their routing information and relationships as trade secrets, making predictions based on simulations inaccurate. To mitigate this problem, we use a novel path prediction toolkit – PathCache [27]. The main idea behind PathCache is to perform AS-level path prediction by utilizing existing publicly available measurements obtained from data-plane measurement platforms such as RIPE Atlas [28], iPlane [29], CAIDA Ark [30], and control-plane measurement platforms such as RouteViews [31], RIPE RIS [32], and many others.

In the remainder of our experiments, we consider a circuit constructed by a Tor client to be vulnerable if the set of ASes $A$ in Equation 1 is non-empty. Here, we use the PathCache framework to identify the ASes on $P_{SRC \leftrightarrow EN}$ and $P_{EX \leftrightarrow DST}$.

M1: Measuring vanilla Tor's vulnerability to AS-level adversaries (web model).

In this experiment we measured the fraction of vulnerable circuits constructed by the vanilla Tor client and the fraction of websites that use one of these vulnerable circuits. The results, for each of the ten countries, are illustrated in Figure 2. The experiments were conducted using a VPN vantage point in each of the ten countries, while loading 200 webpages (from each).

Figure 2: Per-country breakdown of fraction of vulnerable websites and circuits [M1]. (Axes: Country vs. Vulnerable Fraction; series: Websites, Circuits.)

Figure 3: Per-country breakdown of fraction of circuits found to be vulnerable with the Web and mixed user models [M2]. (Axes: Country vs. Vulnerable Fraction of circuits; series: Mixed model, Web model.)

Observation:

While only 31% of the circuits constructed by the Tor client are vulnerable to AS-level adversaries, we find that due to aggressive circuit re-use and concentration of websites in a few ASes, a larger fraction (58%) of all websites loaded by the clients end up using a vulnerable circuit.
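The gap between the two percentages follows from how circuits are re-used: a single vulnerable circuit can carry many page loads. The toy sketch below uses made-up data (the circuit identifiers and the assignment of sites to circuits are hypothetical, not measurements from our experiments) to show how the circuit-level and website-level fractions can diverge.

```python
def vulnerable_fractions(site_to_circuit, vulnerable_circuits):
    """Return (fraction of circuits vulnerable, fraction of sites vulnerable)."""
    circuits = set(site_to_circuit.values())
    bad_circuits = circuits & set(vulnerable_circuits)
    bad_sites = [s for s, c in site_to_circuit.items() if c in bad_circuits]
    return (len(bad_circuits) / len(circuits),
            len(bad_sites) / len(site_to_circuit))
```

If six sites are loaded over four circuits and the one vulnerable circuit happens to carry three of the sites, a quarter of the circuits but half of the sites are exposed.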

M2: Measuring current vulnerability to AS-level adversaries (mixed model).

In this experiment we measured the fraction of vulnerable circuits constructed by the vanilla Tor client when it was used for a mix of loading webpages, sending email, communicating via IRC chat, and downloading files using BitTorrent. The results for each of the ten countries are illustrated in Figure 3. The experiments were simulated using the TorPS simulator and a user model based on streams generated by the above applications. 100 of the most populous (in terms of end-users) ASes [24] in each of the ten countries were selected as Tor client locations.

Observation:

We find that although the average vulnerability of mixed application clients (30%) in the countries is similar to web-only clients (31%), the average vulnerability of clients in DE, FR, and UA is most affected by considering mixed application traffic. This implies that the few exit-relays that allow communication over non-HTTP(S) ports enable at least one AS to perform a traffic correlation attack, given clients located in these countries.

Figure 4: The average, current minimum, and current maximum fraction of vulnerable circuits constructed by vanilla Tor when considering web model clients located in each of ten countries and the Tor network between 2010 and 2015 [M3]. (Axes: Year vs. Fraction of vulnerable circuits; series: Avg Vulnerability, Min Vulnerability (FR), Max Vulnerability (CN).)

M3: Measuring historical vulnerability to AS-level adversaries (web model).

In this experiment we measured the fraction of vulnerable circuits constructed by the vanilla Tor client when loading 200 webpages from each of our ten countries, while considering the changing landscape of the Tor ecosystem between 2010 and 2015. In each country we consider clients located in the 100 most populous ASes [24]. Figure 4 illustrates our results. Here, we show the average fraction of vulnerable circuits for clients in all 1000 ASes, the country whose 100 ASes had the least average vulnerability (FR), and the country whose 100 ASes had the highest average vulnerability (CN).

Observation:

Most countries have an average of 25-45% of their circuits remaining vulnerable to AS-level attackers. China is an exception, with an average of 50-60% of its circuits remaining vulnerable. Further, in spite of the addition of nearly 6K new relays in the Tor network (since 2010), the average threat from AS-level adversaries has grown – from 38% of all circuits being vulnerable in 2010 to 41% in 2015.

M4: Measuring historical vulnerability to AS-level adversaries (mixed model).

Here, we use the same settings as experiment M3, only changing the user model – i.e., while M3 calculated the fraction of vulnerable circuits for users loading 200 webpages in each country, here we consider users who perform a variety of non-HTTP(S) communication via Tor (e.g., IRC, email, BitTorrent). The results are illustrated in Figure 5.

Observation:

We find that the threat faced by clients that use Tor for a mix of non-Web applications is currently slightly lower than web-only Tor clients, in general. However, the threat has been growing at a significantly faster rate. We see in the last five years that the average threat (in terms of vulnerable circuits constructed in the course of our experiments) has increased from 21% to 35%.

Figure 5: The average, current minimum, and current maximum fraction of vulnerable circuits constructed by vanilla Tor when considering mixed application model clients located in each of ten countries and the Tor network between 2010 and 2015 [M4]. (Axes: Year vs. Fraction of vulnerable circuits; series: Avg Vulnerability, Min Vulnerability (IT), Max Vulnerability (CN).)

Figure 6: The growth of the Tor network in terms of capacity, number of relays, and number of ASes.

Discussion.

Our results indicate that the threat from de-anonymization by AS-level adversaries is significant, regardless of client location and what the Tor client is used for (web or mixed models). Although the threat faced by clients used for non-Web purposes is slightly lower, we find that it is growing at a faster rate than Web-only clients. This is due to the small number of new non-Web supporting exit-relays being added to the Tor network.

Investigating further into the reason for the growth of the threat from AS-level adversaries in spite of the massive growth of the Tor network, we find that while the network has grown, the diversity of the ASes in the network has not increased. This is illustrated in Figure 6. Here, we see that while the number of relays in the network has grown to nearly 250% and the capacity of the network has grown to over 3000% of their 2010 values, the number of ASes in the network has lagged behind (growing to only 160% of its 2010 value).
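One way to read these growth figures is as a change in how many relays each AS hosts on average. The short calculation below uses only the rounded percentages quoted above (so the result is approximate) and introduces the variable names purely for illustration.

```python
# Growth relative to 2010 (2010 = 1.0), using the rounded figures quoted above.
relays_growth = 2.5    # relays grew to nearly 250% of the 2010 value
ases_growth = 1.6      # ASes grew to only 160% of the 2010 value

# Average relays per AS, relative to 2010: each AS now hosts more relays.
relays_per_as_growth = relays_growth / ases_growth
print(f"relays per AS: about {relays_per_as_growth:.2f}x the 2010 value")
```

Under these figures the average number of relays per AS grew by roughly half, which is consistent with new relays concentrating in ASes that were already part of the network.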

Take-away:

The Tor network faces a fundamental problem when dealing with AS-level attackers: the lack of AS-level diversity in the network. In the absence of a specific client-based solution for constructing AS-aware circuits, the threat from AS-level attackers is only expected to increase.

4. PITFALLS OF AS-AWARE TOR CLIENTS

In this section we survey previous work to identify five common pitfalls (P1-P5) in the design and construction of AS-aware Tor clients. We empirically demonstrate the negative consequences of each.

The core component of any AS-aware Tor client is its path-prediction toolkit. The Tor client must accurately identify ASes on the paths from and to the selected entry- and exit-relays to build circuits that avoid network-level correlation attacks. Designers of AS-aware clients have three main options for predicting paths between pairs of ASes:

Data-plane measurements:

Data-plane measurement tools such as traceroute allow measurement of exact paths between a source and destination host. However, this requires control of the source host, which may not always be possible (e.g., it is not possible to traceroute between the exit-relay and destination server) and has a high latency cost, making it infeasible for clients to perform on-demand.

Control-plane measurements:

Paths may also be obtained via control-plane measurement infrastructure such as BGP monitors (e.g., RIPE [32], RouteViews [31]). However, they (like data-plane infrastructure) are limited by the location and peers of the BGP monitors.

Algorithmic simulations:

This approach relies on several simplified assumptions about Internet routing. Typically, algorithmic simulators use empirically derived AS-level topologies, inferred inter-AS relationships (e.g., customer-provider or peer-peer), and a simplified model of Internet routing policies (e.g., [33, 34]). While algorithmic simulators are able to predict AS-level paths between any pair of ASes, their accuracy compares unfavorably with paths obtained from data- and control-plane measurements. This is due to the incompleteness of AS-level topologies and the absence of ground-truth while inferring AS relationships.

Table 2 illustrates the design choices of previous efforts to measure and defend against threats from AS-level attackers. Here, we see that all previous work, with the exception of LASTor [17] and Juen et al. [12], relied solely on algorithmic path simulators to identify threatening ASes.

Footnote: The client was not made available by the authors after multiple requests. We try to objectively evaluate the paper based on descriptions in the text.

Table 2: Comparison of the measurement methodologies and defense contributions of the state-of-the-art. X indicates the corresponding criterion was not considered and √ indicates that it was. The data-plane, control-plane, and simulations columns are the path prediction approaches (P1).

| Work | Data-plane (P1) | Control-plane (P1) | Simulations (P1) | Asymmetric routes (P2) | BGP insecurities (P3) | Relay-level attacks (P4) | Load balancing (P5) |
| Feamster & Dingledine [2] | X | X | √ [35, 36] | √ | X | X | X |
| Edman & Syverson [16] | X | X | √ [37, 36] | √ | X | X | X |
| LASTor [17] | X | √ | X | X | X | X | X |
| Johnson et al. [11] | X | X | √ [37, 36] | X | X | √ | X |
| Juen et al. [12] | √ | X | √ [37, 36] | X | X | X | X |
| Astoria [18] | X | X | √ [33, 34] | √ | X | X | √ |
| Cipollino [This paper] | √ | √ | √ [33, 34] | √ | √ | √ | √ |

To understand the impact of inaccurate path predictions, we test the accuracy of the state-of-the-art simulator [34] which relies on the Gao-Rexford routing model [33]. For our experiment, 225 pairs of exit-relay and destination ASes were chosen from the circuits constructed by the vanilla Tor client in our VPN experiment (M1). For each pair, a traceroute was executed from the AS containing the exit-relay (vantage points were obtained using RIPE Atlas probes). IPs from each traceroute hop were resolved to their ASes using up-to-date BGP announcement data to produce AS-level paths. These AS-level paths were compared with the AS-level paths obtained by the algorithmic simulator. Figure 7 shows the result of comparing measured with simulated paths. We find that straightforward application of simulation can lead to overestimating the number of ASes present 80% of the time. Worse yet, 40% of the time simulated paths actually miss ASes contained in the paths, potentially leaving the client vulnerable to traffic-correlation attacks.

Figure 7: Number of ASes over- or under-estimated by the state-of-the-art algorithmic simulator [P1]. (Axes: Number of ASes vs. CDF; series: Overestimated, Underestimated.)

Recent work by Sun et al. [13] demonstrated, via high accuracy AS-level correlation attacks on the Tor network, that the threat from AS-level attackers was higher than previously anticipated. This is primarily because of two factors: Adversaries can (1) exploit the asymmetry of routing on the Internet – i.e., exploit the fact that their presence on the forward- or reverse-paths at either end of the network is sufficient to launch an attack and (2) perform manipulation of routes via BGP hijacks and interceptions to place themselves on targeted paths. In this section, we consider the impact of adversaries on asymmetric routes. In the following section, we discuss the impact of BGP hijacks on Tor.

From Table 2 we find that the possibility of asymmetric routes was considered in several previous works. However, defending against these attackers is challenging since it requires knowledge of reverse network paths, many of which cannot be measured directly. This compounds P1 for AS-aware Tor clients as these paths need to be predicted as well.

We measure the consequences of not considering an adversary that exploits asymmetric paths. To do so, we repeat experiment M1, but this time we consider an attacker that can exploit only forward paths – i.e., we say that a circuit is vulnerable to de-anonymization if there is some AS $A$ such that: $A \in \{ P_{SRC \rightarrow EN} \cap P_{EX \rightarrow DST} \}$. Figure 8 compares the fraction of websites marked as vulnerable against a forward-path exploiting (symmetric) adversary model with our (asymmetric) adversary model. We find that operating under the assumption of symmetric routing (i.e., considering only forward-path exploiting adversaries) results in significant threat under-estimation, with circuits to 17% of all websites identified as safe when they were in fact vulnerable.

Figure 8: Fraction of websites using vulnerable circuits against a symmetric and asymmetric adversary [P2]. (Axes: Country vs. Vulnerable Fraction of Websites; series: Asymmetric model, Symmetric model.)

The potential for BGP hijacks and interceptions to compromise Tor traffic was highlighted by Sun et al. [13]. In this section, we measure how vulnerable Tor relays are to BGP hijacks and interceptions by sets of malicious ASes. For this experiment, we considered 10K pairs of (source, entry) ASes and 10K pairs of (exit, destination) ASes. The source ASes were randomly selected from the 1000 popular ASes (100 in each of ten countries) used in experiments M2-M4 while the entry and exit ASes were selected from the set of all Tor entry and exit relays, respectively. Destination ASes were randomly chosen from the set of all destination ASes seen in experiment M1 (when loading 200 webpages in each of ten countries). For our adversary (i.e., ASes attempting to launch hijack and interception attacks), we selected the 16 malicious ASes identified in previous work [38] as popular ASes for hosting illegal content, botnet C&C servers, and other malicious resources. For each pair of ASes we use heuristics from Goldberg et al. [39] to check which of the 16 malicious ASes are capable of hijacking or intercepting traffic between the pair of ASes.

We first characterize the ability of the malicious ASes to hijack traffic for a chosen path. Figure 9a demonstrates the hijack and interception success rates of each of the 16 ASes considered in this experiment. Here we see that two ASes – ASN 9002: RETN (UA), ASN 29131: RapidSwitch (GB) – achieve high hijack and interception success rates of nearly 50%. The case of ASN 9002 can be explained by its high customer cone size (3271 customer ASes). On the other hand, ASN 29131 is a smaller AS with only one customer AS; however, it peers with seven other large ASes having an AS rank under 1K (based on customer cone sizes).

Next we wanted to understand how vulnerable given Tor relays are to attack. Specifically, Figure 9b shows the fraction of hijack/interception attempts that were successful for each relay, in ascending order. Each one of the relays we consider is susceptible to at least 20% of hijacks and 12% of interception attempts.
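Returning to the two adversary models compared under P2 earlier in this section, the difference between them can be made concrete with a short sketch. The function names, the dictionary keys, and the AS numbers in the usage note are illustrative assumptions; the four path sets are presumed to be known, which in practice requires predicting reverse paths.

```python
def shared_ases(entry_side, exit_side):
    """ASes present on both the entry side and the exit side of a circuit."""
    return set(entry_side) & set(exit_side)


def classify(fwd_entry, rev_entry, fwd_exit, rev_exit):
    """Compare the forward-only model with the forward-and-reverse model.

    fwd_entry: ASes on the path client -> entry-relay
    rev_entry: ASes on the path entry-relay -> client
    fwd_exit:  ASes on the path exit-relay -> destination
    rev_exit:  ASes on the path destination -> exit-relay
    """
    forward_only = shared_ases(fwd_entry, fwd_exit)
    both = shared_ases(set(fwd_entry) | set(rev_entry),
                       set(fwd_exit) | set(rev_exit))
    return {
        "vulnerable_forward_only": bool(forward_only),
        "vulnerable_asymmetric": bool(both),
        "missed_by_forward_only": both - forward_only,
    }
```

A circuit whose only shared AS sits on the two reverse paths is reported as safe by the forward-only check but as vulnerable by the asymmetric one; circuits of this kind account for the under-estimation discussed above.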

We argue that a client which utilizes a smaller number of relays to serve connection requests, over a period of time, is less likely to encounter a malicious relay in the Tor network. Thus, a Tor client that uses a smaller number of relays is more secure against adversarial relays.

We observe that many proposed defenses [2, 16, 17, 18] do not consider the impact of AS-aware relay selection on the security of the client against relay-level adversaries. This is problematic because many AS-aware clients build circuits on a per-destination basis, as opposed to reusing a smaller set of existing circuits. This results in them leveraging a large set of relays over time.

Figure 9: Threats from omitting BGP insecurities in the adversary model [P3]. (a) Fraction of successful hijack and interception attacks by each chosen malicious AS against selected Tor relays (x-axis: AS number). (b) Fraction of successful hijack and interception attacks by 16 malicious ASes against each chosen Tor relay (x-axis: relay percentile).

To illustrate the impact of destination-based circuits, we conduct an experiment using the Astoria Tor client [18]. The Astoria Tor client performs on-demand circuit construction for each new destination AS that it encounters, while re-using valid (live) circuits for previously seen ASes. In this experiment, we use our VPN end points and crawler to load the 200 pages of the Web user model (i.e., the same settings as M1) using the Astoria and vanilla Tor clients. We log the number of unique relays utilized by the Astoria and Tor clients to serve the page loads. We find that circuits generated by Astoria utilize nearly five times more relays than the vanilla Tor client; i.e., Astoria utilized 3,104 unique relays compared to the 623 relays used by the Tor client. This is a drastic increase in the potential for encountering malicious relays over the vanilla Tor client.
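To make the link between relay count and relay-level risk concrete, the following is a minimal sketch and not part of the original evaluation. It assumes, purely for illustration, that each relay a client uses is malicious independently with the same probability; real Tor relay selection is bandwidth-weighted, so this is a simplification. The function name and the assumed malicious fraction are hypothetical; only the relay counts (623 and 3,104) come from the experiment above.

```python
def exposure_probability(num_relays: int, malicious_fraction: float) -> float:
    """Probability of touching at least one malicious relay.

    Simplifying assumption (not from the paper): every relay used is
    malicious independently with probability `malicious_fraction`.
    """
    return 1.0 - (1.0 - malicious_fraction) ** num_relays


# Unique relay counts reported in the experiment above.
TOR_RELAYS = 623
ASTORIA_RELAYS = 3104

# Hypothetical value chosen only to illustrate the trend.
ASSUMED_MALICIOUS_FRACTION = 0.001

relay_ratio = ASTORIA_RELAYS / TOR_RELAYS
p_tor = exposure_probability(TOR_RELAYS, ASSUMED_MALICIOUS_FRACTION)
p_astoria = exposure_probability(ASTORIA_RELAYS, ASSUMED_MALICIOUS_FRACTION)

print(f"Astoria uses {relay_ratio:.2f}x as many relays as vanilla Tor")
print(f"exposure (Tor):     {p_tor:.3f}")
print(f"exposure (Astoria): {p_astoria:.3f}")
```

Under this simplified model, exposure grows monotonically with the number of distinct relays used, which is the intuition behind pitfall P4.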

We find that much of previous work [2, 16, 17] does not consider the capacities of relays chosen as part of AS-aware circuits. We argue that relay capacity is important to consider to prevent custom relay selection schemes from overloading low-capacity relays and reducing performance across the population of Tor users.

Pitfall | Solution
P1. Simulated network paths | PathCache empirical data
P2. Ignoring route asymmetry | Including reverse paths in decision making
P3. Ignoring BGP hijacks | Realtime BGP data
P4. Increasing risk of relay adversaries | Reuse safe circuits between destinations
P5. Overloading Tor relays | Load balance across safe circuits

Table 3: Overview of how Cipollino mitigates the pitfalls of prior AS-aware Tor clients.

As an example of the impact of ignoring relay capacities, Wacek et al. [40] performed a study to analyze the throughput of various Tor relay selection strategies and found that: (1) strategies that ignored relay capacities had significant drops in both client and network throughput, and (2) while LASTor had better performance than vanilla Tor when considering round-trip times on established circuits, the throughput of the client when used for page loads was 70% less than the Tor client (compared to 25% more than Tor as demonstrated in the original work by Akhoondi et al.).

The large disparity in performance reported in the two evaluations arises because Akhoondi et al. [17] sent only HTTP HEAD requests in their experiments (as opposed to downloading complete webpages or documents). In addition to being unrepresentative of typical web traffic, such evaluations do not sufficiently stress all the relays chosen as part of a circuit and, as a result, do not reveal the issues associated with capacity-agnostic relay selection.

5. THE CIPOLLINO TOR CLIENT

Based on the pitfalls we identified in the prior section, we design Cipollino, an AS-aware Tor client that uses state-of-the-art network measurements and optimizations to mitigate the pitfalls. Table 3 summarizes how Cipollino addresses each of the pitfalls described in Section 4. We elaborate on each in the following sections.

We reduce our dependence on algorithmic simulators by using PathCache [27], a system that aggregates existing data- and control-plane measurements to predict paths. We fall back to simulations only when a path query cannot be answered using measurement data. Repeated querying of the PathCache server every time a circuit needs to be built is (1) time consuming and (2) reveals destinations of interest to a third party (e.g., the PathCache server). To avoid this, the Cipollino client subscribes to daily updates of the routing graphs maintained by the PathCache server and locally computes paths between ASes. This is beneficial for two other reasons:

1. Offline verification of paths: The meta-data for each edge in the routing graphs maintained by PathCache includes information regarding the source of the edge (i.e., to indicate the edge was observed in a traceroute from the RIPE Atlas network, control-plane data from RouteViews, etc.) and the measurement ID corresponding to the source. This is useful for the client to verify the authenticity of a random subset of the daily updated paths supplied by the Cipollino aggregator.

2. Low communication overhead: The routing graph updates are between 5-15 MB/day. This is feasible for clients in most settings. Additionally, it allows clients to identify safe circuits even if the PathCache server is not immediately reachable.

Figure 10: Number of ASes over- or under-estimated by PathCache, when compared to exact AS-level paths obtained by traceroutes [P1]. (CDF; x-axis: number of ASes; series: overestimated, underestimated.)

To understand the benefits of PathCache, we evaluate PathCache on two criteria: (1) accuracy of predicted paths and (2) the fraction of paths that PathCache is able to answer using empirical data (vs. simulations). We measure the number of ASes over- or under-estimated when compared with 225 traceroutes that were not already aggregated by PathCache. Figure 10 illustrates the results of this experiment. We find that PathCache is significantly more accurate than the state-of-the-art algorithmic simulator (cf. Figure 7). Most importantly, with 84% of all paths having no missing ASes (no underestimations), PathCache is much less likely to create vulnerable circuits due to incorrect path predictions.

To understand how often PathCache is able to predict paths using empirical data, we queried PathCache for paths between (1) 1,000 source ASes (100 of the most populous ASes in each of the ten countries) and the ASes of all entry-relays in the Tor network (265K path queries) and (2) between the ASes of all exit-relays in the Tor network and all the destination ASes seen in our 2,000 web-page loads (312K path queries). Table 4 shows the percentage of paths that were predicted by PathCache using empirical data. Here we see that PathCache is able to achieve reasonable coverage when considering high capacity entry- and exit-relays (34-36%). This implies a higher accuracy of paths predicted for organically generated Tor circuits, as the Tor client will tend to use these higher capacity relays.

N | Coverage (Percentage), SRC ↔ EN | Coverage (Percentage), EX ↔ DST
10 | 36.6 | 34.0
25 | 32.1 | 32.4
50 | 27.6 | 31.3
100 | 23.2 | 29.4

Table 4: Percentage of paths predicted by PathCache when considering only the top N percentile of relays (by bandwidth) [P1].

Figure 11: Per-country breakdown of SRC ← EN and EN ← SRC paths predicted by PathCache [P1]. (y-axis: fraction of queries answered by PathCache; x-axis: country; series: SRC to EN, EN to SRC.)

In Figure 11 we see the per-country breakdown of the fraction of path requests satisfied by PathCache. Interestingly, we see BR and CN in particular having a very small fraction of paths between their 100 AS sources and the Tor entry-relays. We speculate that this is due to blocking of communication with Tor entry-relays in these countries. This prevents traceroutes (that PathCache uses as a basis for path prediction) from successfully traversing paths from client ASes in these countries to Tor entry-relay ASes.

Depending on client location, PathCache is able to answer between 15-50% of all queries issued to it by the Tor client. Importantly, the paths returned for these queries are unlikely to under-estimate the presence of an AS. Additionally, coverage increases significantly when considering higher capacity Tor relays. These factors make it a good alternative to relying on simulations for path prediction.

Cipollino considers an adversary model that includes the possibility of ASes exploiting (1) asymmetric routes and (2) BGP insecurities. To explain how we deal with such adversaries, we describe how Cipollino verifies the safety of a given circuit below.

• Mapping destination IP addresses to ASNs and prefixes:

Given a circuit and a destination IP address, the Cipollino client first uses an up-to-date offline IP-to-ASN database (based on BGP announcements) to obtain the AS numbers associated with the network of the client, entry-relay, exit-relay, and requested destination IP. This database (sourced and updated by CAIDA) is supplied and updated by the PathCache daily updates. Following this, Cipollino generates two pairs of ASes and two pairs of prefixes: $(AS_{EN}, AS_{SRC})$, $(AS_{EX}, AS_{DST})$, $(Pre_{EN}, Pre_{SRC})$, and $(Pre_{EX}, Pre_{DST})$.

• BGP anomaly detection:

In order to detect hijacks and interceptions in near-real-time, Cipollino receives hourly (customizable in the client configuration) feeds from BGPStream [41] of current BGP routing anomalies. In particular, BGPStream produces a live stream of ongoing Multiple Origin AS (MOAS) anomalies, which occur when a prefix is being announced by multiple origin ASes. We use MOAS as an indicator of potentially anomalous routing behavior as a proof of concept. Beyond the scope of this paper, we are working to develop more accurate detection methods for hijacks and interceptions, which could be incorporated into Cipollino [42]. This feed of ASes is used to identify ASes that are likely to be hijacking or intercepting traffic to any of the prefixes in the previously generated pairs, $(Pre_{EN}, Pre_{SRC})$ and $(Pre_{EX}, Pre_{DST})$. Any AS $X$ that is suspected to be hijacking or intercepting traffic to the prefix associated with the entry-relay is added to the set $H_{EN}$. Similarly, the sets $H_{EX}$, $H_{SRC}$, and $H_{DST}$ are populated.

• Path prediction:

The Cipollino client uses the locally stored PathCache destination-based graphs to obtain the set of ASes on the $SRC \leftrightarrow EN$ and $EX \leftrightarrow DST$ paths. Additionally, the ASes occurring on the paths $H_{EN} \leftarrow SRC$ and $EN \leftarrow H_{EN}$ are added to $SRC \leftrightarrow EN$. This accounts for all ASes that are able to view traffic characteristics in the event of a successful interception (and hijack) of traffic to $EN$. The same process is repeated for $H_{EX}$, $H_{SRC}$, and $H_{DST}$.

• Circuit safety marking:

After the paths are computed, a circuit is marked as safe iff the sets $SRC \leftrightarrow EN$ and $EX \leftrightarrow DST$ have no intersection.

The circuit safety verification procedure shows that Cipollino does not mark a circuit as safe to serve a given destination unless no AS is in a position to view traffic characteristics at both ends of the circuit, after accounting for route asymmetry and potential hijacks.
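The verification steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Cipollino implementation: the function names, the `PathFn` callback (a stand-in for local PathCache path prediction), and the dictionary-based MOAS feed format are all hypothetical. The sketch collects forward and reverse paths plus detours through suspected hijackers on each side, and marks the circuit safe only if the two observer sets are disjoint.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical stand-in for PathCache: returns the AS-level path a -> b.
PathFn = Callable[[int, int], List[int]]


def moas_suspects(prefix: str, expected_origin: int,
                  origins_by_prefix: Dict[str, Set[int]]) -> Set[int]:
    """ASes other than the expected origin seen announcing `prefix`."""
    return set(origins_by_prefix.get(prefix, set())) - {expected_origin}


def observers(a: int, b: int, hijackers_a: Iterable[int],
              hijackers_b: Iterable[int], path: PathFn) -> Set[int]:
    """All ASes that may observe traffic between ASes a and b."""
    # Route asymmetry: consider both the forward and the reverse path.
    seen = set(path(a, b)) | set(path(b, a))
    # Traffic destined to b may be diverted through a suspected hijacker h.
    for h in hijackers_b:
        seen |= set(path(a, h)) | set(path(h, b))
    # Likewise for traffic destined to a.
    for h in hijackers_a:
        seen |= set(path(b, h)) | set(path(h, a))
    return seen


def circuit_is_safe(src: int, en: int, ex: int, dst: int,
                    hijackers: Dict[str, Set[int]], path: PathFn) -> bool:
    """Safe iff no AS can observe both the entry and the exit segment."""
    entry_side = observers(src, en, hijackers.get("SRC", set()),
                           hijackers.get("EN", set()), path)
    exit_side = observers(ex, dst, hijackers.get("EX", set()),
                          hijackers.get("DST", set()), path)
    return entry_side.isdisjoint(exit_side)


# Toy topology (made up): AS 11 appears only on the two reverse paths.
TOY_PATHS = {
    (1, 2): [1, 10, 2], (2, 1): [2, 11, 1],
    (3, 4): [3, 12, 4], (4, 3): [4, 11, 3],
}


def toy_path(a: int, b: int) -> List[int]:
    return TOY_PATHS.get((a, b), [a, b])


forward_only_safe = set(toy_path(1, 2)).isdisjoint(toy_path(3, 4))
asymmetric_safe = circuit_is_safe(1, 2, 3, 4, {}, toy_path)
print("forward-only check says safe:", forward_only_safe)
print("asymmetric check says safe:  ", asymmetric_safe)
```

In the toy topology, a forward-path-only check would accept the circuit, while the check that also includes reverse paths rejects it because AS 11 sits on both reverse paths.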

To reduce the number of relays it uses, Cipollino employs a circuit pre-building strategy similar to the vanilla Tor client. Cipollino pre-emptively constructs a fixed (and configurable) number of circuits. In addition to the benefit of reduced utilization of relays, two other arguments for pre-emptive circuit construction come from the following observations drawn from previous work by Nithyanand et al. [18]:

• For over 50% of all client locations and destination ASes considered, at least 50% of all possible entry- and exit-relay combinations were safe from correlation attacks by AS-level adversaries. Therefore, by pre-building a number of circuits, we are very likely to find at least one safe circuit for a given destination AS.

• Constructing a new circuit is significantly more expensive than verifying the safety of an existing circuit, due to the need for estimating the paths between all possible (source, entry-relay) and (exit-relay, destination) pairs. Therefore, by pre-emptively constructing circuits, Cipollino reduces the need to construct on-demand destination-aware circuits.

To understand how circuit pre-building affects the number of relays used by Cipollino, we consider the 200 Web pages loaded in the Web user model with Cipollino configured to pre-build and always maintain 4, 16, and 64 live and usable circuits. Figure 12 compares the number of relays used in each setting with the vanilla Tor client and Astoria. When Cipollino is configured to only pre-build and maintain 4 active circuits, it utilizes 786 relays (compared to the 623 relays used by Tor). This is significantly lower than Astoria (3,104 relays).

Figure 12: Number of unique relays used in circuits constructed by each client while loading 200 webpages in each of ten countries [P4].

Figure 13 also illustrates that pre-building circuits results in the need for constructing fewer on-demand and destination-aware circuits.
In this experiment, 1,000 Cipollino clients were simulated (with locations in the 100 most populous ASes in each of ten countries) and issued connection requests for destinations associated with 200 country-specific webpages. Here we see that 50% of the clients were able to avoid on-demand circuit construction for at least 86% of the connection requests, when just four circuits were pre-built.

Figure 13: Distribution of the fraction of connection requests that were able to find a safe and usable circuit from 4, 16, and 64 circuits pre-built by the Cipollino client [P4].
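The per-client quantity plotted in Figure 13 can be sketched as below. This is an illustrative reconstruction, not the authors' simulation code: `prebuilt_hit_fraction` and the `is_safe_for` predicate are hypothetical names, and the toy circuits and destinations are made up.

```python
from typing import Callable, Hashable, Iterable, Sequence


def prebuilt_hit_fraction(
    destinations: Iterable[Hashable],
    prebuilt: Sequence[Hashable],
    is_safe_for: Callable[[Hashable, Hashable], bool],
) -> float:
    """Fraction of requests served by some safe pre-built circuit."""
    requests = list(destinations)
    if not requests:
        return 0.0
    hits = sum(
        1 for dst in requests
        if any(is_safe_for(circuit, dst) for circuit in prebuilt)
    )
    return hits / len(requests)


# Toy data (made up): (circuit, destination) pairs considered unsafe.
UNSAFE = {("c1", "A"), ("c2", "A"), ("c1", "B")}
fraction = prebuilt_hit_fraction(
    ["A", "B", "C", "D"],
    ["c1", "c2"],
    lambda circuit, dst: (circuit, dst) not in UNSAFE,
)
print(f"requests served without on-demand construction: {fraction:.2f}")
```

Computing this value once per simulated client and plotting its distribution would give a curve of the kind shown in Figure 13.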

Figure 14: Distribution of circuit allocation times [P4]. (CDF; x-axis: circuit allocation time in seconds; series: Astoria, Tor, Cipollino.)

Reusing circuits, when possible, also improves the performance of Cipollino as compared with other AS-aware Tor clients. Figure 14 shows the elapsed time between the arrival of a connection request and the allocation of a circuit to satisfy the request. As expected, since the vanilla Tor client always uses an existing circuit, it is significantly faster than Astoria and Cipollino, requiring under 0.1 seconds to allocate a circuit to over 99% of incoming connection requests. Within the same time constraints we see that the Cipollino Tor client is able to satisfy 60% of its requests, while the Astoria client can only satisfy 21%.

Pre-emptive circuit construction yields two primary benefits. First, it is responsible for a roughly 75% reduction in the number of relays utilized by the AS-aware client (786 relays with four pre-built circuits versus 3,104 for Astoria), compared to AS-aware clients that do not do pre-emptive construction, resulting in improved security against relay-level adversaries. Second, it results in reduced circuit allocation times when an existing circuit is reused.

Load balancing is explicitly performed in two cases: (1) when constructing and replenishing Cipollino's reserve of pre-built circuits and (2) when there are multiple safe circuits available for a connection request. In the first case, Cipollino exactly mimics the load-balancing approach utilized by the vanilla Tor client; i.e., relays are selected in a circuit with probability proportional to their bandwidth capacity. The second case, however, is more nuanced. When there are multiple safe entry- and exit-relay options, $(en_1, ex_1), \ldots, (en_n, ex_n)$, Cipollino selects the $i$-th entry- and exit-relay combination with probability $Pr_i$, where:

$$Pr_i = \frac{BW_{en_i} \times BW_{ex_i}}{\sum_{j=1}^{n} BW_{en_j} \times BW_{ex_j}} \qquad (2)$$

Here $BW_{en_j}$ and $BW_{ex_j}$ are the advertised bandwidths of the entry- and exit-relay associated with the $j$-th safe relay combination. This weighting of combinations works to ensure that each entry- and exit-relay is selected with probability proportional to its advertised bandwidth (when only considering safe relay options).

Figure 15: Distributions of the bandwidths of the relays selected by each Tor client [P5]. (CDF; x-axis: relay bandwidth in MBps; series: Astoria, Tor, Cipollino.)

Figure 15 compares the effect of the load-balancing approaches used by the vanilla Tor client, Astoria, and Cipollino. We find that they are all able to effectively ensure that relays do not get overloaded. Further, Cipollino does not perform any worse than Astoria, despite its reuse of existing safe circuits.

In this section we describe the complete architecture of Cipollino. We then complete our evaluation of the security and performance of the complete Cipollino client.
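The bandwidth-weighted choice among safe relay combinations given by Equation (2) can be sketched as follows. This is a minimal illustration rather than the client's actual code: the function names are hypothetical, and each safe combination is represented simply as a pair of advertised bandwidths (entry, exit).

```python
import random
from typing import List, Optional, Sequence, Tuple

# Each safe combination is (BW_en, BW_ex): advertised bandwidths of its
# entry- and exit-relay.
BandwidthPair = Tuple[float, float]


def selection_probabilities(pairs: Sequence[BandwidthPair]) -> List[float]:
    """Pr_i = BW_en_i * BW_ex_i / sum_j (BW_en_j * BW_ex_j), as in Eq. (2)."""
    weights = [bw_en * bw_ex for bw_en, bw_ex in pairs]
    total = sum(weights)
    if total <= 0:
        raise ValueError("need at least one pair with positive bandwidth")
    return [w / total for w in weights]


def pick_combination(pairs: Sequence[BandwidthPair],
                     rng: Optional[random.Random] = None) -> int:
    """Return the index of one safe combination, drawn according to Pr_i."""
    rng = rng or random.Random()
    probs = selection_probabilities(pairs)
    return rng.choices(range(len(pairs)), weights=probs, k=1)[0]


# Made-up bandwidths for three safe (entry, exit) combinations.
safe_pairs = [(10.0, 20.0), (30.0, 10.0), (5.0, 100.0)]
probabilities = selection_probabilities(safe_pairs)
chosen = pick_combination(safe_pairs, random.Random(7))
print("probabilities:", probabilities)
print("chosen index:", chosen)
```

For the made-up bandwidths the products are 200, 300, and 500, so the third combination is picked half of the time.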

Cipollino architecture.

Cipollino consists of three main components: (1) an AS-level path aggregation toolkit (PathCache), (2) a circuit allocator, and (3) a circuit builder. The interaction between these components is illustrated in Figure 16.

The Cipollino client maintains a compact local repository of destination-based routing graphs. These are updated by the PathCache servers on a daily (or configurable) basis. The PathCache path-stitching algorithms are used on these graphs to identify ASes that are in a position to observe traffic flowing between a given source and destination AS.

Figure 16: Architecture of the Cipollino Tor client. (Diagram labels: Measurement Aggregator; control-plane measurement aggregator; data-plane measurement aggregator; RouteViews; RIPE RIS; RIPE Atlas; iPlane; CAIDA Ark; anomaly detector; BGPStream; REST API; daily updates (~5-15 MB); Cipollino Client; compact routing graphs; algorithmic simulator; circuit allocator; circuit builder; request for connection to DST; connection to DST via circuit C.)

When the Cipollino client receives a request for a connection to a destination IP and port, the circuit allocator uses the PathCache stitching algorithms and graphs to identify if there are any pre-built circuits that are not vulnerable to traffic correlation attacks by ASes. If exactly one of the safe circuits is able to serve the requested IP and port of the destination, then the circuit is used to satisfy the connection request. If there are multiple such circuits, then one of them is chosen in accordance with our load-balancing scheme described in the previous section.

In the event that none of the pre-built circuits is able to satisfy the connection, the circuit builder constructs a circuit specifically for the requested connection. The circuit builder also performs relay selection in a way that achieves load-balancing across all relays in the Tor network. Additionally, the circuit builder handles the worst-case scenario, when there are no safe circuits that may be built. In this case, the circuit builder borrows the linear program proposed by the Astoria Tor client to ensure that no single adversary is able to de-anonymize a large number of circuits.
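The allocation flow described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Circuit` type, the `allocate` function, and its callbacks are hypothetical names; the exit policy is reduced to a set of allowed ports; and the worst-case linear program borrowed from Astoria is not modeled (it would live inside the `build_on_demand` callback).

```python
import random
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Circuit:
    entry: str
    exit: str
    bw_entry: float              # advertised bandwidth of the entry-relay
    bw_exit: float               # advertised bandwidth of the exit-relay
    exit_ports: FrozenSet[int]   # simplified exit policy (assumption)


def allocate(
    prebuilt: Sequence[Circuit],
    dst_port: int,
    is_safe: Callable[[Circuit], bool],
    build_on_demand: Callable[[], Circuit],
    rng: Optional[random.Random] = None,
) -> Circuit:
    """Pick a circuit for one connection request."""
    rng = rng or random.Random()
    # Keep pre-built circuits that can serve the port and are AS-safe.
    usable = [c for c in prebuilt
              if dst_port in c.exit_ports and is_safe(c)]
    if len(usable) == 1:
        return usable[0]
    if usable:
        # Several candidates: weight by the entry x exit bandwidth product.
        weights = [c.bw_entry * c.bw_exit for c in usable]
        return rng.choices(usable, weights=weights, k=1)[0]
    # No usable pre-built circuit: fall back to the circuit builder.
    return build_on_demand()


# Made-up circuits for illustration.
c1 = Circuit("en1", "ex1", 10.0, 10.0, frozenset({443}))
c2 = Circuit("en2", "ex2", 5.0, 5.0, frozenset({80, 443}))
fresh = Circuit("en9", "ex9", 1.0, 1.0, frozenset({22}))

picked = allocate([c1, c2], 80, lambda c: True, lambda: fresh)
print("circuit for port 80:", picked.entry, "->", picked.exit)
```

Passing the safety check and the builder as callbacks keeps the allocator independent of how paths are predicted, which mirrors the separation between the circuit allocator and circuit builder in Figure 16.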

Cipollino security against AS-level adversaries.

We compare the security of the circuits constructed by Cipollino, Astoria, and the vanilla Tor client while performing 200 page-loads in each of ten different client locations (same settings as M1). The results are shown in Figure 17. From these results we see that the Cipollino client circuits provide more security against AS-level traffic-correlation adversaries. Only 1.4% of all webpages loaded by the Cipollino client utilized a vulnerable circuit, compared to 11% and 57% for the Astoria and vanilla Tor clients, respectively.

Cipollino page-load times.

To give a complete picture of the performance of the Cipollino client we consider the time required to load a complete web-page (including third-party content). Figure 18 shows the cumulative distribution of page-load times of 2,000 web-pages in ten client locations for the Cipollino, Astoria, and Tor clients. We find that the times required for loading pages using the Cipollino and Tor clients are quite closely matched, with the median page-load time differing by only 1.6 seconds, while the Astoria Tor client is nearly 7 seconds slower.

Figure 17: Security of Cipollino and other clients against AS-level adversaries. (y-axis: fraction of vulnerable websites; x-axis: country; series: Astoria, Tor, Cipollino.)

Figure 18: Distribution of page-load times. (CDF; x-axis: page load time in seconds; series: Astoria, Tor, Cipollino.)

6. CONCLUSIONS

In this paper we analyzed the threat faced by Tor clients from AS-level adversaries from a current and historical perspective. We found that the current threat is high, with around 30% of all Tor circuits created in our experiments remaining vulnerable to de-anonymization by AS-level correlation attacks, regardless of whether the Tor client is used for web browsing or other applications. Further, our historical analysis points to a fundamental problem with the Tor network: the lack of growth of AS-level diversity. Without specific efforts from the Tor project to increase diversity of relays or incorporate AS-awareness in the Tor client, our study shows that the threat is bound to increase.

Our survey of previous work identified five common pitfalls associated with the design and construction of AS-aware Tor clients. We show how each of these pitfalls results in high under-estimation of the threat from AS-level adversaries, or increased vulnerability to active (AS-level) and passive (relay-level) adversaries, or poor performance characteristics.

We find that our AS-aware Tor client, Cipollino, designed specifically to address these pitfalls, improves the current state-of-the-art by achieving better security against network-level adversaries. Specifically, by using a data- and control-plane measurement infrastructure whenever possible, Cipollino reduces the fraction of vulnerable webpage loads from 57% (vanilla Tor) and 11% (Astoria) to 1.4%. Additionally, by incorporating the concept of circuit pre-building and circuit re-use, the Cipollino client significantly reduces the threat faced from malicious relays. As a consequence of circuit pre-building and re-use, the Cipollino client is also able to achieve performance characteristics comparable with the vanilla Tor client.

Data and source-code release:

In an effort to enable reproducibility and ease future comparative evaluation efforts, the following resources will be made available on acceptance of this work: the Cipollino Tor client, the destination-based graphs provided by PathCache during the time of this study, and the Web and mixed-application user-models used in our simulations.

7. REFERENCES

[1] Tor Metrics. Tor Project: Anonymity Online. Available at https://metrics.torproject.org.
[2] Nick Feamster and Roger Dingledine. Location Diversity in Anonymity Networks. In Proceedings of the 2004 ACM Workshop on Privacy in the Electronic Society, WPES ’04, pages 66–76, New York, NY, USA, 2004. ACM.
[3] How the NSA Attacks Tor/Firefox Users With QUANTUM and FOXACID.
[4] The ’Tor Stinks’ Presentation.
[5] NSA Stores Metadata of Millions of Web Users for a Year, Secret Files Show.
[6] Xiang Cai, Xin Cheng Zhang, Brijesh Joshi, and Rob Johnson. Touching from a distance: Website fingerprinting attacks and defenses. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS ’12, pages 605–616, New York, NY, USA, 2012. ACM.
[7] Tao Wang and Ian Goldberg. Improved website fingerprinting on Tor. In Proceedings of the 12th ACM Workshop on Workshop on Privacy in the Electronic Society, WPES ’13, pages 201–212, New York, NY, USA, 2013. ACM.
[8] Tao Wang, Xiang Cai, Rishab Nithyanand, Rob Johnson, and Ian Goldberg. Effective attacks and provable defenses for website fingerprinting. Pages 143–157, San Diego, CA, August 2014. USENIX Association.
[9] Steven J. Murdoch and George Danezis. Low-Cost Traffic Analysis of Tor. In Proceedings of the 2005 IEEE Symposium on Security and Privacy, SP ’05, pages 183–195, Washington, DC, USA, 2005. IEEE Computer Society.
[10] Steven J. Murdoch and Piotr Zieliński. Sampled Traffic Analysis by Internet-exchange-level Adversaries. In Proceedings of the 7th International Conference on Privacy Enhancing Technologies, PET ’07, pages 167–183, Berlin, Heidelberg, 2007. Springer-Verlag.
[11] Aaron Johnson, Chris Wacek, Rob Jansen, Micah Sherr, and Paul Syverson. Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS ’13, pages 337–348, New York, NY, USA, 2013. ACM.
[12] Joshua Juen, Aaron Johnson, Anupam Das, Nikita Borisov, and Matthew Caesar. Defending Tor from Network Adversaries: A Case Study of Network Path Prediction. Proceedings on Privacy Enhancing Technologies, 2015(2):1–17, 2015.
[13] Yixin Sun, Anne Edmundson, Laurent Vanbever, Oscar Li, Jennifer Rexford, Mung Chiang, and Prateek Mittal. RAPTOR: Routing Attacks on Privacy in Tor. Pages 271–286, August 2015.
[14] Xiang Cai, Rishab Nithyanand, Tao Wang, Rob Johnson, and Ian Goldberg. A systematic approach to developing and evaluating website fingerprinting defenses. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS ’14, pages 227–238, New York, NY, USA, 2014. ACM.
[15] Roger Dingledine, Nicholas Hopper, George Kadianakis, and Nick Mathewson. One Fast Guard for Life (or 9 months). 2014.
[16] Matthew Edman and Paul Syverson. AS-awareness in Tor Path Selection. In Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS ’09, pages 380–389, New York, NY, USA, 2009. ACM.
[17] Masoud Akhoondi, Curtis Yu, and Harsha V. Madhyastha. LASTor: A Low-Latency AS-Aware Tor Client. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP ’12, pages 476–490, Washington, DC, USA, 2012. IEEE Computer Society.
[18] Rishab Nithyanand, Oleksii Starov, Adva Zair, Phillipa Gill, and Michael Schapira. Measuring and Mitigating AS-level Adversaries against Tor. Network and Distributed System Security (NDSS), to appear, 2016.
[19] Torspec – Tor’s Protocol Specifications. https://gitweb.torproject.org/torspec.git/tree/path-spec.txt.
[20] Matthew K. Wright, Micah Adler, Brian Neil Levine, and Clay Shields. The Predecessor Attack: An Analysis of a Threat to Anonymous Communications Systems. ACM Trans. Inf. Syst. Secur., 7(4):489–522, November 2004.
[21] Nikita Borisov, George Danezis, Prateek Mittal, and Parisa Tabriz. Denial of Service or Denial of Security? In Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS ’07, pages 92–102, New York, NY, USA, 2007. ACM.
[22] Freedom House. Freedom on the Net 2014.
[23] Crawler Incantatus. https://bitbucket.org/rishabn/crawler-incantatus/overview.
[24] APNIC. Visible ASNs: Customer Populations (Est.). http://stats.labs.apnic.net/aspop.
[25] CollecTor. https://collector.torproject.org.
[26] Alexa top sites.
[27] PathCache: Avoiding Redundant Traceroutes.
[28] RIPE NCC. RIPE Atlas. http://atlas.ripe.net.
[29] Harsha V. Madhyastha, Tomas Isdal, Michael Piatek, Colin Dixon, Thomas Anderson, Arvind Krishnamurthy, and Arun Venkataramani. iPlane: an information plane for distributed services. In OSDI, 2006.
[30] CAIDA. Archipelago Measurement Infrastructure.
[31] Advanced Network Technology Center. University of Oregon Route Views Project.
[32] RIPE NCC. Routing Information Service (RIS).
[33] Lixin Gao and Jennifer Rexford. Stable Internet Routing Without Global Coordination. IEEE/ACM Transactions on Networking (TON), 9(6):681–692, 2001.
[34] Phillipa Gill, Michael Schapira, and Sharon Goldberg. Modeling on Quicksand: Dealing with the Scarcity of Ground Truth in Interdomain Routing Data. SIGCOMM Comput. Commun. Rev., 42(1):40–46, January 2012.
[35] Z Morley Mao, Lili Qiu, Jia Wang, and Yin Zhang. On AS-level path inference. In ACM SIGMETRICS Performance Evaluation Review, volume 33, pages 339–349. ACM, 2005.
[36] Lixin Gao. On Inferring Autonomous System Relationships in the Internet. IEEE/ACM Trans. Netw., 9(6):733–745, December 2001.
[37] J. Qiu and Lixin Gao. CAM04-4: AS Path Inference by Exploiting Known AS Paths. In Global Telecommunications Conference, 2006. GLOBECOM ’06. IEEE, pages 1–5, Nov 2006.
[38] Maria Konte, Roberto Perdisci, and Nick Feamster. ASwatch: An AS reputation system to expose bulletproof hosting ASes. In Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication, SIGCOMM ’15, pages 625–638, New York, NY, USA, 2015. ACM.
[39] Sharon Goldberg, Michael Schapira, Peter Hummon, and Jennifer Rexford. How secure are secure interdomain routing protocols. In Proceedings of the ACM SIGCOMM 2010 Conference, SIGCOMM ’10, pages 87–98, New York, NY, USA, 2010. ACM.
[40] Chris Wacek, Henry Tan, Kevin S. Bauer, and Micah Sherr. An Empirical Evaluation of Relay Selection in Tor. 2013.
[41] BGPStream. https://bgpstream.caida.org.
[42] CAIDA.