Adversaries monitoring Tor traffic crossing their jurisdictional border and reconstructing Tor circuits
Herman Galteland∗ and Kristian Gjøsteen
Department of Mathematical Sciences, Norwegian University of Science and Technology, NTNU
{herman.galteland, kristian.gjosteen}@ntnu.no
January 22, 2020
Abstract
We model and analyze passive adversaries that monitor Tor traffic crossing the border of a jurisdiction the adversary controls. We show by simulation that a single jurisdiction is able to connect incoming and outgoing traffic at its border, tracking the traffic, and that a coalition of jurisdictions is able to reconstruct parts of the Tor network, revealing user-website connections. We use two algorithms to estimate the capabilities of the adversaries: the first simulates a Tor network, and the second analyzes data from the simulation and reconstructs the network.
Keywords.
Onion routing, anonymity, simulations.
The Onion Router (Tor) protocol [9] is a well-established onion routing system that tries to provide a low-latency communications channel while also defending against network-level adversaries trying to reveal who is talking to whom. It is well understood how the Tor network behaves when an adversary compromises a fraction of the onion routers, and in particular that if the entire network is monitored little or no security is left.

In this paper we analyze the power of (coalitions of) less powerful adversaries who do not monitor onion router traffic directly, but instead partition the network into jurisdictions and monitor traffic crossing from one partition into another.

These kinds of adversaries are interesting because they are real, in particular in the form of programs to monitor traffic crossing borders. In 2008 the Swedish parliament passed a bill allowing the Swedish National Defence Radio Establishment (Försvarets radioanstalt) to monitor both wireless and cable signals passing the Swedish border [17]. Denmark [18], France [6], and the United Kingdom [29] all have similar laws on how to gather and store digital information.

In 2016, the Norwegian government appointed a group of experts to investigate whether or not the Norwegian Intelligence Service should be allowed access to communication crossing the Norwegian border, similar to the Swedish National Defence Radio Establishment. The investigating report concluded that the Norwegian Intelligence Service should be allowed to monitor the Norwegian border [11]; however, this has not yet been put into effect. It seems likely that other countries either have or plan to have similar programs.

∗ This work is funded by Nasjonal sikkerhetsmyndighet (NSM).

1.1 Related work

Formal analysis of the Tor protocol comes in two variants. The first uses an abstract model of the protocol and gives security bounds based on a worst-case adversary [2, 13, 15, 16, 19, 20]. The second includes a detailed description of the protocols in the analysis when proving the security bound [3, 22, 37].

Adversaries that observe both ends of a Tor circuit can connect the user with the website it is communicating with [25, 28]. The literature has considered adversaries controlling an autonomous system (AS) [14, 34], an Internet exchange point (IXP) [26], and several ASes and IXPs [22, 27]. It has been shown that ASes can observe both ends of Tor circuits [37]. Tor path selection algorithms have been proposed to avoid detection by ASes [1, 12].

An adversary with access to timing, packet size, and directionality of packets sent over an encrypted HTTP tunnel can reveal the identity of the server and user by traffic analysis attacks [4, 7, 21, 24, 33, 39]. Countermeasures to traffic analysis attacks include padding messages [8] and morphing Tor traffic to mimic traffic associated with other servers [39]. Note that hiding the packet length is insufficient [10].
Tor network simulators [31, 35] make it possible to analyze the effectiveness of adversaries against the Tor protocol.

A stepping stone is an intermediate node used by an attacker to conceal their identity. Algorithms used to detect stepping stones analyze streams of traffic to confirm or reject the existence of intermediate nodes between the analyzed traffic streams [5, 38].
In this paper we model and discuss a specific adversary against the Tor protocol. The jurisdictional adversary is similar to an adversary controlling ASes or IXPs; however, ASes and IXPs are typically located inside a jurisdiction, whereas we consider a passive adversary that only monitors traffic crossing the border of a jurisdiction. Further, an adversary controlling an AS or an IXP would see all traffic inside its network, whereas an adversary monitoring jurisdictional borders would not.

We simulate a Tor network, which includes the adversaries monitoring and storing traffic crossing their border. A chosen coalition of jurisdictions tries to reconstruct the simulated Tor network by analyzing the stored data using traffic analysis. We do not morph the Tor traffic, since the adversaries are only interested in the existence of traffic and not in what it looks like.

Algorithms used to detect stepping stones analyze streams of traffic to find intermediate nodes between the streams. Similarly, our reconstruction algorithm attempts to connect streams of traffic between known onion routers to recreate circuits. The techniques used to detect stepping stones could be used to detect onion routers. The difference between the stepping stone literature and our work is the adversary we are modeling and analyzing: we assume that the location of all onion routers is already known, and we want to connect Tor traffic to reconstruct Tor circuits.
The model for our overlay network of Tor is given in Section 3; this model is used to classify the types of traffic and connections the jurisdictional adversaries can observe and create. The simulation algorithm is described in Section 4 and the reconstruction algorithm in Section 5. In Section 6 we present the reconstruction test results. We conclude with a possible countermeasure against the adversaries and summarize the adversaries in Section 7. We include the parameters used for the reconstruction results in Appendix A.

2 Background
The Onion Router protocol is an anonymous communication protocol [9]. The Tor protocol uses intermediate nodes called onion routers to achieve anonymity. A user establishes a circuit of onion routers in the Tor network to communicate with a server, where each onion router only knows the identity of its neighboring nodes. The first onion router of a circuit is a guard node G. When a user creates a new circuit, the user picks the guard node from a small set of onion routers; the default Tor configuration is three guard nodes. The last onion router communicates with the server on behalf of the user and is called an exit node E. The Tor protocol does not ensure encryption between the exit node and the server, and a malicious actor could abuse the information. Only a few onion routers get marked as an exit, and it is believed that the majority of the exit nodes are honest. The intermediate node, between the guard and the exit, is a relay node R. We let circuit node refer to any of the nodes in a circuit. For simplicity we assume that all circuits consist of one user, three onion routers, and one server, which is the default Tor configuration. Restricting circuits to contain only three onion routers is not essential for our reconstruction algorithm.

The user establishes a secret key with each onion router in the circuit and encrypts data in layers when sending it to the server, where each onion router removes one layer of encryption before relaying the data to the next node. When the server sends data back to the user, each onion router encrypts the data and adds a layer. Note that we are only interested in the flow of information and will not include any encryption in our simulation.

We describe the overlay network of the Tor network and use this model to determine what types of traffic a jurisdiction could observe.
The overlay network of the Tor network describes how information is sent between circuit nodes. A node in the overlay network represents a jurisdiction, and an edge represents a communication connection between two jurisdictions. A jurisdiction contains a number of circuit nodes, and we assume each jurisdiction knows which nodes are located inside its border. Information is sent in the overlay network when circuit nodes communicate. If two communicating circuit nodes are located in the same jurisdiction, then no information is sent in the overlay network. If the two circuit nodes are located in two different jurisdictions, a network path in the overlay network is chosen. This path shows how the traffic is sent between jurisdictions, from the first jurisdiction containing the sender circuit node to the last jurisdiction containing the receiver circuit node, and determines which jurisdictions are able to observe the traffic data sent between the two circuit nodes.

Note that we make a simplification. Traffic between two circuit nodes inside a jurisdiction could very well cross the jurisdiction's borders in the physical network. In fact, since routing is dynamic, it could cross borders one day and not cross borders the next. Hence, the adversaries get less information in our model than in the real world.
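The overlay model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the network path is chosen, so the sketch assumes a shortest path found by breadth-first search, and the function names `network_path` and `observers` as well as the jurisdiction labels are hypothetical.

```python
from collections import deque


def network_path(edges, src, dst):
    """Shortest path of jurisdictions from src to dst (assumption: BFS)."""
    if src == dst:
        return [src]
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            # Walk the parent links back to src and reverse the result.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in sorted(neighbours.get(current, ())):
            if nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return None  # the two jurisdictions are not connected


def observers(edges, location, sender, receiver):
    """Jurisdictions that observe a packet sent from sender to receiver."""
    src, dst = location[sender], location[receiver]
    if src == dst:
        return []  # model simplification: no border is crossed
    return network_path(edges, src, dst)
```

With edges `[("J1", "J2"), ("J2", "J3")]`, a user in `J1`, and a guard in `J3`, all three jurisdictions observe the packet, while traffic between two nodes in the same jurisdiction is observed by nobody.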
As a Tor user communicates with a website they both generate traffic data. The user sends data packets to the first node of the circuit, which forwards them to the next node in the circuit, and so forth until the website receives the user's packets; the same holds for packets sent by the website. Packet data sent between two circuit nodes is transferred over a network path, and all jurisdictions on the path observe the packets' metadata. We assume a jurisdiction learns the sender, receiver, and direction of the packet and has the timestamp for when it observed the packet. A packet is observed when it crosses the border of a jurisdiction. A packet can be incoming (entering the jurisdiction), outgoing (leaving the jurisdiction), or passing through a jurisdiction.

Figure 1: An example of the overlay network with a Tor circuit. U denotes a user, G a guard node, R a relay node, E an exit node, W a website, $J_1, \dots, J_5$ denote five distinct jurisdictions, and the $t_i$ denote timestamps for packets traveling between two circuit nodes. $J_1$ and $J_5$ observe Type 1 traffic, an endpoint. $J_2$ observes Type 0 traffic, passing through. $J_3$ observes Type 2 traffic, where an incoming and an outgoing packet share a common node G and their timestamp difference is close to an expected value. $J_4$ observes Type 3 traffic, where the observed incoming and outgoing packets do not share a node but their timestamp difference is close to an expected value. The solid line shows the network paths and the dashed line shows the Tor circuit.

The jurisdictional adversaries want to reconstruct the Tor circuits to reveal the user and the website, breaking the relationship anonymity [30] of the Tor protocol. Using the observed packet data, a jurisdiction can combine incoming and outgoing packets using traffic analysis. When an onion router receives a packet it will either encrypt or decrypt it, to add or remove an onion layer. This cryptographic computation takes time, so there will be a timestamp difference between the observed incoming and outgoing packet. If the timestamp difference is close to an expected value, then the observed incoming and outgoing packets are most likely part of the same circuit, which means they can be connected.

The time it takes to send a packet over a network cable is negligible compared to the time it takes for a circuit node to do its cryptographic computations, and we assume that sending packets over a cable takes no time at all.

We classify the types of connections a jurisdiction can create from its observed packets into four categories, see Figure 1 for a visual description:

Type 0
A single packet passing through the jurisdiction, where neither the sender nor the receiver node of the packet is located inside the jurisdiction's borders.
Type 1
A single packet ending in the jurisdiction, where either the sender or the receiver node of the packet is located inside the jurisdiction and is an endpoint of the circuit (a user or a website).
Type 2
One incoming and one outgoing packet share a common node inside the jurisdiction, and the packets' timestamp difference is close to an expected value.
Type 3
One incoming and one outgoing packet that do not share a common node, but their timestamp difference is close to an expected value.

Packets that can be connected are combined and stored as partial circuits; each partial circuit contains a path of circuit nodes, representing a partial Tor circuit, and timestamps. We say a partial circuit has length $n$ if its path consists of $n$ circuit nodes. The timestamps are collected from the packet(s) the partial circuit is created from, together with the timestamps of any other packets that would produce the same partial circuit. (Many similar packets, each with one timestamp, make one partial circuit with many timestamps.) The timestamps may be ordered into different sets to show the direction and flow of traffic over the partial circuit.

Figure 2: Examples of partial circuits and how they can be connected. (a) Type 4: two partial circuits that share two nodes, and there are enough timestamps that are equal. (b) Type 5: two partial circuits that share one node, and there are enough timestamp differences that are close to an expected value. (c) Type 6: two partial circuits that do not share any nodes; however, there are enough timestamp differences that are close to an expected value. U denotes a user, G a guard node, R a relay node, E an exit node, W a website, and the $t_i$ denote timestamps for packets traveling between two circuit nodes.

It is very unlikely that a single jurisdiction is able to reveal the sender and receiver of any circuit. A coalition, however, can combine their analyzed partial circuits and potentially create full Tor circuits, breaking the relationship anonymity. The colluding jurisdictions share their partial circuits with each other and try to combine them. We want to track the packets traveling from one jurisdiction to the next and look for partial circuits with paths that overlap, such that we can make a longer path by combining them. We classify the types of connections a coalition can create from its partial circuits, see Figure 2 for a visual description:
Type 4
Two partial circuits with paths that share two common nodes, where the two nodes are located at the end of the first and at the beginning of the second, and there are enough timestamp pairs, one from each partial circuit, that are identical.
Type 5
Two partial circuits with paths that share a common node, where that node is located at the end of the first and at the beginning of the second, and there are enough timestamp pairs, one from each partial circuit, that have a timestamp difference that is close to an expected value.
Type 6
Two partial circuits with paths that do not share any common nodes, but there are enough timestamp pairs, one from each partial circuit, that have a timestamp difference that is close to an expected value.

Two partial circuits that combine into a Type 4 connection should have identical timestamps. They share two nodes, and the timestamps of packets sent between these two nodes should be present in both partial circuits.

Note that Type 5 and Type 6 connections are similar to Type 2 and Type 3 connections; the only difference is that partial circuits are being connected instead of single packets, and in the reconstruction algorithm we reuse many of the methods for connecting these types of traffic.

We describe our algorithm simulating the Tor network. Information is sent over our overlay network, detailed in Section 3. In our simulator we assume that the adversaries are able to recognize Tor traffic, and we only generate Tor traffic, since onion routers usually only send Tor traffic to each other and all onion routers are known. Further, we assume that the adversaries have analyzed the distribution of timing patterns of Tor traffic. They will use this knowledge to statistically connect traffic entering and exiting their jurisdiction.

We initiate a network of jurisdictions, place onion routers, and define the user and website distributions. When a user communicates with a website, creates a new circuit, or destroys circuits, we generate data. All traffic generated is sent between circuit nodes in the form of packets. Any data crossing the border of a jurisdiction is observed and stored, either as incoming or outgoing traffic. This data will be used in the reconstruction algorithm, detailed in Section 5.

We try to simulate the real world as best as we can, where information about the Tor network is gathered from the Tor data analysis website "Tor metrics" [36].
Each initialized jurisdiction represents a real-world country. Two jurisdictions are connected by an edge if they share a border or if they are connected to the same underwater internet cable [32]. Guard, relay, and exit nodes are placed in jurisdictions, where the location of each node is gathered from Tor metrics [36].

Users and websites are not placed in a jurisdiction at initialization; they are chosen and created during the simulation. The distribution of Tor users is gathered from Tor metrics [36], and a user can communicate with a website from any jurisdiction; hence, the websites are uniformly distributed.
The simulation runs for $n$ iterations, and each iteration generates data for a user that communicates with a website.

We pick either an existing ready user or create a new one. If there is a ready user, we create a new circuit if its existing circuit has been up for more than 10 minutes. If there is no ready user, we make a new user with a fresh circuit. Creating circuits generates traffic data.

When a user and a website communicate they send data over the circuit, where neighboring nodes in the circuit send packets to each other. A packet includes a timestamp, the sender and receiver node, the circuit ID of the Tor circuit, and the packet length. It has the form

Timestamp Sender > Receiver (Circuit ID) Length

Note that the circuit ID is only used to verify the output of the reconstruction algorithm.
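As an illustration of the packet record, the following sketch writes and parses lines of the form given above. The concrete encoding is an assumption, not taken from the implementation: timestamps are floats in seconds, node names contain no whitespace, and the circuit ID and length are integers. The names `Packet`, `to_line`, and `parse_line` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    timestamp: float   # assumed unit: seconds
    sender: str
    receiver: str
    circuit_id: int    # only used to verify reconstruction output
    length: int

    def to_line(self):
        """Serialize as 'Timestamp Sender > Receiver (Circuit ID) Length'."""
        return (f"{self.timestamp:.6f} {self.sender} > {self.receiver} "
                f"({self.circuit_id}) {self.length}")


_LINE = re.compile(r"^(\S+) (\S+) > (\S+) \((\d+)\) (\d+)$")


def parse_line(line):
    """Parse one stored record back into a Packet."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a packet record: {line!r}")
    ts, sender, receiver, cid, length = match.groups()
    return Packet(float(ts), sender, receiver, int(cid), int(length))
```

For example, `Packet(12.5, "G1", "R7", 42, 512).to_line()` gives `12.500000 G1 > R7 (42) 512`, and `parse_line` recovers the same record.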
A global TIME parameter is used to maintain the order of the users' activity; it is increased by a positive value between each iteration. The global parameter stays fixed during an iteration while the active user's time continues. As a packet is forwarded in the circuit the nodes do cryptographic operations on it. Although we do not do the actual computations, we add an onion router delay to the user's time. Similarly, we add a sender delay between packets sent from the user and the website.

A lognormal distribution is used to sample these delays. Each onion router uses its own distribution to compute its delays, and that distribution is sampled from a family of lognormal distributions. We do not know which distribution each onion router is using. We only know the family of distributions, which is chosen such that the circuit round-trip latency of the simulated network is close to the real Tor network's latency [36].

When the user has finished its activity in the current iteration it gets ready for its next activity by waiting, as if reading the website it is communicating with. An activity delay, which is uniformly distributed, is added to the user's time. Whenever the TIME parameter is larger than the user's time, including the activity delay, the user can be chosen as a ready user.

4.4 Write observed traffic data
Jurisdictions on the network path between two circuit nodes observe and store the traffic. Each jurisdiction writes data to file as either incoming or outgoing traffic; this data will be used in the reconstruction algorithm.
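The timestamps that jurisdictions store come from the delay model described earlier in this section, which can be sketched as follows. The parameter ranges defining the family of lognormal distributions are placeholders chosen for illustration only, not the values used in the paper, and the function names are hypothetical. `random.lognormvariate` is the Python standard library sampler.

```python
import random


def sample_router_params(rng, mu_range=(-7.0, -6.0), sigma_range=(0.3, 0.6)):
    """Pick one lognormal distribution (mu, sigma) from a hypothetical family."""
    return rng.uniform(*mu_range), rng.uniform(*sigma_range)


def forward_through_circuit(rng, start_time, router_params):
    """Return the time at which a packet leaves each onion router in turn.

    Cable transfer is assumed to take no time, so only the onion router
    delays move the clock forward.
    """
    times = []
    now = start_time
    for mu, sigma in router_params:
        now += rng.lognormvariate(mu, sigma)  # onion router delay
        times.append(now)
    return times
```

Each onion router keeps its own `(mu, sigma)` pair for the whole simulation, which mirrors the statement that every router uses its own distribution drawn from a known family.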
The runtime of the simulation algorithm is mainly dominated by writing and storing all data generated; the number of iterations $n$ determines how much data is generated. For each iteration we generate traffic data for one Tor user as it communicates with a website. Using the simulation parameters in Table 1, one iteration will on average generate 14,000 packets. However, more than one jurisdiction observes each packet, and on average 60,000 packets are stored per iteration.

The wall-clock runtime of the simulation algorithm is roughly 5 minutes for $n = 1000$, 4 hours for $n = 10000$, and 11 hours for $n = 100000$.

We describe our reconstruction algorithm, where a coalition of jurisdictions wants to reconstruct Tor circuits and reveal the user and the website.

We partially reconstruct a simulated Tor network, created using the algorithm described in Section 4, using the packets generated in the simulation. Each jurisdiction (from a chosen set of collaborators) processes its observed data to make partial circuits, which will be connected further to create full circuits. The jurisdictions' output is verified by comparing it with the real circuits created in the simulation. We assume all Tor circuits have length five; this is not essential for our algorithm.
Each jurisdiction processes its observed packets to make partial circuits, using the classification discussed in Section 3. All packets are labeled either as incoming (entering the jurisdiction) or outgoing (leaving the jurisdiction). We iterate over all incoming packets and try to combine each one with an outgoing packet. We only look at the outgoing packets that are close to the incoming packet with respect to time. We first look for trivial connections: incoming or outgoing packets that can be classified as Type 0 or Type 1 (packets passing through or packets ending inside the jurisdiction). If the incoming packet is not a trivial connection, we look for an outgoing packet that shares a common node with the incoming packet, that is, we try to make a Type 2 connection. If there are no outgoing packets with a common node, we look for Type 3 connections, where we want to find the outgoing packet that fits best with the incoming packet based on their timestamp difference. For Type 2 and Type 3 connections we combine packets if their timestamp difference is close to some expected value; this value is the expected onion router delay and is derived from the family of lognormal distributions used in the simulation algorithm.

All connections made are stored as partial circuits, which contain the following information: a path of circuit nodes, four sets of timestamps, a probability score, and a list of circuit IDs.

The four sets of timestamps show the flow of data traveling over the path. Two of the sets represent packets traveling in one direction, say from left to right, where one contains incoming timestamps on the left side and one contains outgoing timestamps on the right side. The remaining two sets are similar, with the opposite direction (from right to left).

The probability score is used to evaluate how likely it is that the partial circuit is part of a Tor circuit. The score is based on the time difference of the packets the partial circuit is made from. Partial circuits made from Type 0 and Type 1 connections have a score of zero. Otherwise, the score is the output of the probability density function of a lognormal distribution with the time difference as input. The distribution is derived from the family of lognormal distributions. The closer the difference is to the expected onion router delay, the higher the score. For Type 2 and Type 3 connections we only connect an incoming packet with the outgoing packet that results in the highest score. A partial circuit's probability score is equal to the sum of the scores of its packet pairs.

To verify our data we use the Tor circuit IDs, where each circuit ID is collected from the packets used to make the partial circuit. Each Tor circuit has one unique ID, but a partial circuit can have more than one ID stored. Note that we do not use the circuit IDs during the reconstruction process; they are only used to verify the output.

When a jurisdiction has analyzed all of its packets we discard all partial circuits that are most likely not part of a true Tor circuit, that is, those with a low probability score. We set the discard limit based on trial and error: with a small discard limit we get a high false positive rate, and with a large discard limit we get a low reconstruction rate.
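The scoring step for Type 2 and Type 3 connections can be illustrated with a short sketch. It assumes a single lognormal distribution with parameters `mu` and `sigma` standing in for the distribution derived from the family, and a fixed time `window` that limits which outgoing packets count as close in time; these parameters and the function names are hypothetical.

```python
import math
from bisect import bisect_right


def lognormal_pdf(x, mu, sigma):
    """Density of a lognormal distribution; zero for non-positive x."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def best_outgoing(incoming_ts, outgoing_ts, mu, sigma, window):
    """Find the outgoing timestamp that best matches one incoming packet.

    outgoing_ts must be sorted. Only outgoing packets observed after the
    incoming one and within `window` are considered. Returns a
    (score, timestamp) pair, or None if no candidate exists.
    """
    lo = bisect_right(outgoing_ts, incoming_ts)
    hi = bisect_right(outgoing_ts, incoming_ts + window)
    best = None
    for ts in outgoing_ts[lo:hi]:
        score = lognormal_pdf(ts - incoming_ts, mu, sigma)
        if best is None or score > best[0]:
            best = (score, ts)
    return best
```

Summing the returned scores over all packet pairs of a partial circuit gives a value that can be compared against the discard limit.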
Colluding jurisdictions share their partial circuits with each other to create full Tor circuits. We start by finding partial circuits that share two nodes, to find Type 4 connections. Then we look for partial circuits that share one node, to make Type 5 connections. Finally, we look for partial circuits that have enough timestamp pairs, one from each partial circuit, with a timestamp difference close to the expected value, that is, Type 6 connections. If any new partial circuit was created while looking for these three types of connections, we restart the partial circuit combination process, which continues until there are no more circuits to combine.

We create a new partial circuit when we combine two, where we keep some of the data and discard some. We combine the two paths, and any overlapping nodes are merged together. The new partial circuit only needs four sets of timestamps, where we take two from the first partial circuit, say the two leftmost, and two from the second, say the two rightmost. We calculate a new probability score for the circuit, where we use the timestamps that are going to be discarded to calculate the score, using the same method we use for scoring packets. All circuit IDs contained in the two partial circuits are included in the new one.

The reconstruction algorithm terminates when there are no more partial circuits to combine. The output is all partial circuits of length three or longer with a large enough probability score. Partial circuits of length five are the potential full Tor circuits.
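A simplified sketch of how two partial circuits might be combined for Type 4 and Type 5 connections is given below. It is not the authors' implementation: each partial circuit is reduced to a path and a single sorted timestamp list per side (`out_ts` on the right end of the first, `in_ts` on the left end of the second) instead of four sets, and the `tolerance` and `min_pairs` thresholds are hypothetical.

```python
from bisect import bisect_left


def merge_paths(left, right):
    """Merge two paths if the end of `left` overlaps the start of `right`.

    Returns (merged_path, overlap) with overlap 2 or 1, or None.
    """
    for overlap in (2, 1):
        if (len(left) >= overlap and len(right) >= overlap
                and left[-overlap:] == right[:overlap]):
            return left + right[overlap:], overlap
    return None


def count_matching_pairs(ts_left, ts_right, expected, tolerance):
    """Count left timestamps that have a right timestamp at the expected offset."""
    matches = 0
    for t in ts_left:
        target = t + expected
        i = bisect_left(ts_right, target - tolerance)
        if i < len(ts_right) and ts_right[i] <= target + tolerance:
            matches += 1
    return matches


def try_combine(left, right, expected_delay, tolerance, min_pairs):
    """Try a Type 4 (two shared nodes) or Type 5 (one shared node) connection."""
    merged = merge_paths(left["path"], right["path"])
    if merged is None:
        return None
    path, overlap = merged
    # Type 4: timestamps should be identical; Type 5: one router delay apart.
    expected = 0.0 if overlap == 2 else expected_delay
    pairs = count_matching_pairs(left["out_ts"], right["in_ts"],
                                 expected, tolerance)
    if pairs < min_pairs:
        return None
    return {"path": path, "type": 4 if overlap == 2 else 5, "pairs": pairs}
```

A Type 6 connection would skip the overlap test and rely only on the timestamp comparison, which is why it requires checking every pair of partial circuits.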
From the simulation we store all Tor circuits created, including their circuit IDs, and we use this to check the output of the reconstruction algorithm.

We compare the partial circuit's path with all simulated Tor circuits with a circuit ID that is equal to one of the partial circuit's stored IDs. The partial circuit is correct if the reconstructed path is equal to, or part of, one of the simulated paths. That is, a partial circuit is considered correct if its path is part of a simulated Tor circuit's path and it was created using packet data that was indeed sent over that Tor circuit.

We split the output into two groups: incorrect partial circuits, showing the false positive rate, and correct partial circuits, from which we get the reconstruction rate and the relationship revealing rate. The reconstruction rate shows how much of the simulated Tor circuits the algorithm reconstructed, and the relationship revealing rate shows how many user-website connections we found.

5.4 Runtime
The main contributing factor to the runtime of the reconstruction algorithm is the number of packets; the second is the number of timestamps. The algorithm can be split into two parts: the first analyzes packets to make partial circuits, and the second analyzes and combines partial circuits.

Let $\mathcal{J}$ denote the set of colluding jurisdictions, cooperating in analyzing packets and partial circuits. Each jurisdiction $J \in \mathcal{J}$ analyzes its incoming and outgoing packets to combine them and stores the timestamps. Let $p$ denote the number of incoming packets $J$ has stored; we assume $J$ also has $p$ outgoing packets. We look through all $p$ incoming packets and compare each of them with some of the outgoing packets. We only connect packets that have timestamps close to each other, and we keep a short list of outgoing packets with timestamps close to the current incoming packet's. Iterating over the packets is at most $O(p \log p)$. For each packet we connect we store its timestamp in a sorted list, and inserting a timestamp is $O(p/k)$, where $k$ is the number of partial circuits made by jurisdiction $J$. The runtime of the packet analysis algorithm, for each jurisdiction $J$, is $O((p \log p)/k)$, where $p$ is the number of packets (incoming or outgoing) $J$ has stored and $k$ is the number of partial circuits $J$ has made. The number of colluding jurisdictions $|\mathcal{J}|$ is a small constant. As an optimization, we store all partial circuits a jurisdiction makes, so we only need to run the packet analysis algorithm for a jurisdiction once. (If we want to change the set of collaborating jurisdictions we do not have to redo the packet analysis every time.)

All partial circuits made by the colluding jurisdictions are analyzed in the second part of the reconstruction algorithm. Let $K$ denote the total number of partial circuits. When we combine partial circuits that share at least one common node we have an $O(K \log K)$ method for iterating through them, similar to the packet analysis method. However, when we combine partial circuits that do not share any common node we need an $O(K^2)$ method for iterating through them, since we want to see whether each partial circuit fits with every other one, based on their combined path and timestamps. If the combined path of two partial circuits is logical, that is, it looks like a Tor circuit, then we evaluate their timestamps and give a probability score for how well the two partial circuits fit together. Let $t$ denote the number of timestamps a partial circuit has. Evaluating the timestamps is $O(t \log t)$, using the method for packet analysis, and if the new probability score is high enough we create a new partial circuit and insert timestamps into sorted lists, which is $O(t)$. The runtime of the partial circuit analysis is therefore $O(K^2 t \log t)$.

The wall-clock runtime of the reconstruction algorithm is three hours for a simulated network with 1000 iterations and five jurisdictions, and two and a half days for a simulated network with 10000 iterations and five jurisdictions.

Our implementation is not perfect, and here we mention possible improvements to the reconstruction algorithm. The implementation is written in Python, and an implementation written in a different language could be more efficient. Furthermore, each jurisdiction analyzes its own data, so the packet analysis can be run in parallel.
In the results we look at the false positive rate, the reconstruction rate, and the relationship revealing rate. The false positive rate shows how many of the partial circuits are incorrect; the reconstruction rate and the relationship revealing rate only consider the partial circuits that are correct. The reconstruction rate shows how much of the simulated Tor circuits is reconstructed; partial circuits of length three or longer are used to find the reconstruction rate. The relationship revealing rate shows how many partial circuits reveal the user-website connection.

[Figure 3 plots: percentage (reconstruction %, relationship revealing %, false positive %) against number of jurisdictions. (a) Reconstructing a simulated network with a large number of iterations. (b) Reconstructing a simulated network with a small number of iterations.]

Figure 3: Comparing reconstruction results. The more iterations used in the simulation, the more data is generated; more data means more connections can be made, both correct and false ones.
We simulate a Tor network with a specified number of iterations and use the output of the simulation algorithm as input to the reconstruction algorithm. We run the simulation algorithm once and the reconstruction algorithm several times. Every time we reconstruct we change the number of colluding jurisdictions. We include two reconstruction tests: in the first we look at how the number of simulation iterations affects the reconstruction results, and in the second we look at how the coalition size affects the reconstruction results. The parameters used for the simulations are in Table 1, and the parameters used for the reconstructions are in Table 2.

In the first reconstruction test we compare reconstruction results from a simulation with a large number of iterations with reconstruction results from a simulation with a small number of iterations, see Figure 3. The average number of active Tor users in the simulation algorithm is close to the reported number of active users on Tor Metrics. The number of iterations specified for a simulation changes how many users have been active, but the average number of active users should still be the same for all simulations. We see that all three rates are higher in the reconstruction results of the larger simulation. The larger simulation generates more data and the reconstruction algorithm has more to process, which means the reconstruction algorithm can make more connections and more errors. By setting the cutoff bound for the probability score higher in the larger reconstruction we would get a lower false positive, reconstruction, and relationship revealing rate, and a result closer to the smaller reconstruction. Hence, having a larger number of iterations in the simulation only means longer computation time, and we will therefore only run the reconstruction algorithm on the smaller simulation for the second reconstruction test.

In the second reconstruction test we look at how the coalition size affects the reconstruction results, see Figure 4.
The simulated network is always the same; we only change the number of jurisdictions cooperating in each run of the reconstruction algorithm. After the simulation algorithm has completed, the jurisdictions are sorted by the amount of data they have stored, from largest to smallest. The coalition chosen for the n'th run of the reconstruction algorithm consists of the first 5n jurisdictions in the sorted list. This means that for each run of the reconstruction algorithm the coalition consists of the jurisdictions used in the previous run plus five new ones. For each new run the five new jurisdictions have less and less data to contribute to the coalition; hence, all three rates flatten out and stabilize.

Figure 4: Reconstruction results for an increasing number of jurisdictions. (Axes: number of jurisdictions versus percentage; curves: reconstruction, relationship revealing, false positive.)

We can only speculate as to why the reconstruction rate peaks at 65 percent and the relationship revealing rate peaks at 10 percent in the second reconstruction test. This is partly because of how the Tor network builds circuits and partly because of our implementation. If a Tor circuit's traffic does not cross the border of a jurisdiction, then no data is recorded and the circuit can never be reconstructed. The jurisdictions used for the reconstruction simply do not observe enough data. Furthermore, we believe that each jurisdiction discards too much information when analyzing its own data. When we combine incoming and outgoing traffic we get a pile of leftovers, traffic that has not been used to reconstruct. A better algorithm could possibly reduce the amount of leftover traffic and find ways to use the leftovers.
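The coalition schedule described above (sort jurisdictions by stored data, then take the first 5n for the n'th run) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `coalition_for_run`, the dictionary layout, and the tie-breaking by name are assumptions made for the example.

```python
from typing import Dict, List


def coalition_for_run(data_stored: Dict[str, float], n: int, step: int = 5) -> List[str]:
    """Return the coalition used in the n'th reconstruction run (n starts at 1).

    data_stored maps a jurisdiction name to the amount of data it recorded
    during the simulation (hypothetical layout, units do not matter here).
    """
    if n < 1:
        raise ValueError("run index n must be at least 1")
    # Sort jurisdictions from most to least stored data; break ties by name
    # so that the ordering is deterministic (an assumption of this sketch).
    ranked = sorted(data_stored, key=lambda j: (-data_stored[j], j))
    # The n'th coalition is the first step * n jurisdictions of that list.
    return ranked[: step * n]


if __name__ == "__main__":
    # Made-up volumes for twelve jurisdictions, purely for demonstration.
    stored = {f"J{i:02d}": 100.0 / (i + 1) for i in range(12)}
    for run in (1, 2, 3):
        coalition = coalition_for_run(stored, run)
        print(run, len(coalition), coalition[-1])
```

Because each coalition is a prefix of the same sorted list, every run contains the previous run's jurisdictions, and the newly added members are always the ones with the least data among those selected so far, which is consistent with the flattening of the rates noted above.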
To be able to reconstruct a Tor circuit, the jurisdictional adversaries need to observe traffic sent to and from the user and the website. If the traffic sent from the user to the guard node does not cross a jurisdictional border, then no traffic can be observed and the circuit can never be fully reconstructed; the same holds for the communication between the exit node and the website. If the user specifies its Tor circuit such that the only traffic crossing jurisdictional borders is sent between onion routers, then the adversaries cannot see the user or the website and are never able to connect them.

The following Tor circuit selection prevents the jurisdictional adversaries from breaking the relationship anonymity. A user U wants to connect to a website W. The user chooses the onion routers as follows: the guard node G should be situated in the same jurisdiction as the user U, the relay node R can be in any jurisdiction, and the exit node E should be located inside the same jurisdiction as the website W. See Figure 5.

Note that this path selection only avoids the jurisdictional adversaries; it is possible that other types of adversaries could break the relationship anonymity if the users use this path selection. For example, an adversary corrupting single onion routers can see traffic sent inside a jurisdiction if a circuit visits a corrupted node, and can possibly see the user or website of the circuit.
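The path selection rule above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `Router` type and `select_path` function are hypothetical, routers are chosen uniformly at random, and real Tor path selection (bandwidth weighting, relay flags, family constraints) is not modelled.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Router:
    name: str
    jurisdiction: str


def select_path(
    routers: List[Router],
    user_jurisdiction: str,
    website_jurisdiction: str,
    rng: Optional[random.Random] = None,
) -> Optional[Tuple[Router, Router, Router]]:
    """Pick (guard, relay, exit) so that U-G and E-W traffic stays inside
    one jurisdiction each. Returns None if no such path exists."""
    rng = rng or random.Random()
    # Guard G: must be in the same jurisdiction as the user U.
    guards = [r for r in routers if r.jurisdiction == user_jurisdiction]
    if not guards:
        return None
    guard = rng.choice(guards)
    # Exit E: must be in the same jurisdiction as the website W.
    exits = [r for r in routers
             if r.jurisdiction == website_jurisdiction and r != guard]
    if not exits:
        return None
    exit_node = rng.choice(exits)
    # Relay R: any jurisdiction, but distinct from the guard and the exit.
    relays = [r for r in routers if r not in (guard, exit_node)]
    if not relays:
        return None
    return guard, rng.choice(relays), exit_node


if __name__ == "__main__":
    # Made-up routers and jurisdiction labels, for demonstration only.
    network = [Router("g1", "NO"), Router("m1", "SE"),
               Router("e1", "US"), Router("e2", "US")]
    print(select_path(network, "NO", "US", random.Random(0)))
```

With such a path, the only border-crossing links are G to R and R to E, so a jurisdictional adversary sees neither endpoint. The sketch returns `None` when the user's or the website's jurisdiction hosts no suitable onion router, in which case this defence is simply unavailable.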
Figure 5: Path selection where the jurisdictional adversaries are unable to connect the user U with the website W, since they can only observe traffic sent between the guard node G, the relay node R, and the exit node E.

We claim that the best attack the jurisdictional adversary can do is to passively observe Tor traffic, and that a coalition of jurisdictions is therefore a passive global adversary against the Tor network.

Tor uses a TLS connection between circuit nodes (except between the exit node and the website), which provides confidentiality and message integrity [8] and implies that Tor is IND-CCA [23]. Any active attack against messages sent between circuit nodes will be detected and prevented. The best the adversaries can do is to passively observe Tor traffic, and possibly stop traffic.

A jurisdictional adversary indirectly monitors all onion routers inside its jurisdiction. If the jurisdictional adversaries cooperate in reconstructing Tor circuits they quickly become global, since together they monitor a large portion of the onion routers. In addition, a large set of jurisdictions has the power to reveal the relationship of a circuit if they choose to do so.
References

[1] Masoud Akhoondi, Curtis Yu, and Harsha V. Madhyastha. LASTor: A Low-latency AS-aware Tor Client. IEEE/ACM Trans. Netw., 22(6):1742–1755, December 2014.
[2] Michael Backes, Aniket Kate, Praveen Manoharan, Sebastian Meiser, and Esfandiar Mohammadi. AnoA: A framework for analyzing anonymous communication protocols. In Proceedings of the 2013 IEEE 26th Computer Security Foundations Symposium, CSF '13, pages 163–178, Washington, DC, USA, 2013. IEEE Computer Society.
[3] Michael Backes, Aniket Kate, Sebastian Meiser, and Esfandiar Mohammadi. (Nothing else) MATor(s): Monitoring the Anonymity of Tor's Path Selection. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS '14, pages 513–524, New York, NY, USA, 2014. ACM.
[4] George Dean Bissias, Marc Liberatore, David Jensen, and Brian Neil Levine. Privacy vulnerabilities in encrypted http streams. In Proceedings of the 5th International Conference on Privacy Enhancing Technologies, PET'05, pages 1–11, Berlin, Heidelberg, 2006. Springer-Verlag.
[5] Avrim Blum, Dawn Song, and Shobha Venkataraman. Detection of interactive stepping stones: Algorithms and confidence bounds. In Erland Jonsson, Alfonso Valdes, and Magnus Almgren, editors, Recent Advances in Intrusion Detection, pages 258–277, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg.
[6] Conseil d'Etat. Loi no. 2015-912 du 24 juillet 2015. Le livre VIII "Du renseignement", 2015.
[7] Thomas Demuth. A passive attack on the privacy of web users using standard log information. In Roger Dingledine and Paul Syverson, editors, Proceedings of Privacy Enhancing Technologies workshop (PET 2002). Springer-Verlag, LNCS 2482, April 2002.
[8] T. Dierks and E. Rescorla. The transport layer security (TLS) protocol version 1.2. RFC 5246, RFC Editor, August 2008.
[9] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: The second-generation onion router. In Proceedings of the 13th Conference on USENIX Security Symposium - Volume 13, SSYM'04, pages 21–21, Berkeley, CA, USA, 2004. USENIX Association.
[10] Kevin P. Dyer, Scott E. Coull, Thomas Ristenpart, and Thomas Shrimpton. Peek-a-boo, i still see you: Why efficient traffic analysis countermeasures fail. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP '12, pages 332–346, Washington, DC, USA, 2012. IEEE Computer Society.
[11] Digitalt grenseforsvar. https://forsvaret.no/etjenesten/dgf. Accessed: 2017-10-04.
[12] Matthew Edman and Paul Syverson. As-awareness in Tor path selection. In Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS '09, pages 380–389, New York, NY, USA, 2009. ACM.
[13] Nathan S. Evans, Roger Dingledine, and Christian Grothoff. A Practical Congestion Attack on Tor Using Long Paths. In Proceedings of the 18th Conference on USENIX Security Symposium, SSYM'09, pages 33–50, Berkeley, CA, USA, 2009. USENIX Association.
[14] Nick Feamster and Roger Dingledine. Location diversity in anonymity networks. In Proceedings of the 2004 ACM Workshop on Privacy in the Electronic Society, WPES '04, pages 66–76, New York, NY, USA, 2004. ACM.
[15] Joan Feigenbaum, Aaron Johnson, and Paul Syverson. A model of onion routing with provable anonymity. In Proceedings of the 11th International Conference on Financial Cryptography and 1st International Conference on Usable Security, FC'07/USEC'07, pages 57–71, Berlin, Heidelberg, 2007. Springer-Verlag.
[16] Joan Feigenbaum, Aaron Johnson, and Paul Syverson. Probabilistic analysis of onion routing in a black-box model. ACM Trans. Inf. Syst. Secur., 15(3):14:1–14:28, November 2012.
[17] Försvarsdepartementet. lag (2008:717), 2008.
[18] Forsvarsminister. Lov nr. 602 af 12-06-2013, 2013.
[19] Nethanel Gelernter and Amir Herzberg. On the limits of provable anonymity. In Proceedings of the 12th ACM Workshop on Workshop on Privacy in the Electronic Society, WPES '13, pages 225–236, New York, NY, USA, 2013. ACM.
[20] Alejandro Hevia and Daniele Micciancio. An indistinguishability-based characterization of anonymous channels. In Nikita Borisov and Ian Goldberg, editors, Privacy Enhancing Technologies: 8th International Symposium, PETS 2008 Leuven, Belgium, July 23-25, 2008 Proceedings, pages 24–43, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
[21] Andrew Hintz. Fingerprinting websites using traffic analysis. In Proceedings of the 2nd International Conference on Privacy Enhancing Technologies, PET'02, pages 171–178, Berlin, Heidelberg, 2003. Springer-Verlag.
[22] Aaron Johnson, Chris Wacek, Rob Jansen, Micah Sherr, and Paul Syverson. Users get routed: Traffic correlation on Tor by realistic adversaries. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & , CCS '13, pages 337–348, New York, NY, USA, 2013. ACM.
[23] Jonathan Katz and Moti Yung. Unforgeable encryption and chosen ciphertext secure modes of operation. In Gerhard Goos, Juris Hartmanis, Jan van Leeuwen, and Bruce Schneier, editors, Fast Software Encryption: 7th International Workshop, FSE 2000 New York, NY, USA, April 10–12, 2000 Proceedings, pages 284–299, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.
[24] Marc Liberatore and Brian Neil Levine. Inferring the source of encrypted http connections. In Proceedings of the 13th ACM Conference on Computer and Communications Security, CCS '06, pages 255–263, New York, NY, USA, 2006. ACM.
[25] Steven J. Murdoch and George Danezis. Low-cost traffic analysis of Tor. In Proceedings of the 2005 IEEE Symposium on Security and Privacy, SP '05, pages 183–195, Washington, DC, USA, 2005. IEEE Computer Society.
[26] Steven J. Murdoch and Piotr Zieliński. Sampled traffic analysis by internet-exchange-level adversaries. In Proceedings of the 7th International Conference on Privacy Enhancing Technologies, PET'07, pages 167–183, Berlin, Heidelberg, 2007. Springer-Verlag.
[27] Rishab Nithyanand, Oleksii Starov, Phillipa Gill, Adva Zair, and Michael Schapira. Measuring and Mitigating AS-level Adversaries Against Tor. In Proceedings of the Network and Distributed Security Symposium - NDSS '16. Internet Society, February 2016.
[28] Lasse Øverlier and Paul Syverson. Locating hidden servers. In Proceedings of the 2006 IEEE Symposium on Security and Privacy, SP '06, pages 100–114, Washington, DC, USA, 2006. IEEE Computer Society.
[29] Parliament of the United Kingdom. Investigatory powers act 2016, 2016.
[30] Andreas Pfitzmann and Marit Hansen. A terminology for talking about privacy by data minimization: Anonymity, unlinkability, undetectability, unobservability, pseudonymity, and identity management, 2010.
[31] shadow-plugin-tor wiki. https://github.com/shadow/shadow-plugin-tor/wiki. Accessed: 2018-02-19.
[32] Submarine Cable Map. Accessed: 2018-10-25.
[33] Qixiang Sun, Daniel R. Simon, Yi-Min Wang, Wilf Russell, Venkata N. Padmanabhan, and Lili Qiu. Statistical identification of encrypted web browsing traffic. In Proceedings of the 2002 IEEE Symposium on Security and Privacy, SP '02, Washington, DC, USA, 2002. IEEE Computer Society.
[34] Yixin Sun, Anne Edmundson, Laurent Vanbever, Oscar Li, Jennifer Rexford, Mung Chiang, and Prateek Mittal. RAPTOR: Routing attacks on privacy in Tor. In Proceedings of the 24th USENIX Security Symposium, August 2015.
[35] The Tor path simulator, torps. https://github.com/torps/torps. Accessed: 2018-06-07.
[36] Tor Metrics. https://metrics.torproject.org/. Accessed: 2018-10-23.
[37] Chris Wacek, Henry Tan, Kevin S. Bauer, and Micah Sherr. An empirical evaluation of relay selection in Tor. In NDSS, 2013.
[38] Xinyuan Wang, Douglas S. Reeves, and S. Felix Wu. Inter-packet delay based correlation for tracing encrypted connections through stepping stones. In Proceedings of ESORICS 2002, pages 244–263, October 2002.
[39] C. V. Wright, L. Ballard, S. E. Coull, F. Monrose, and G. M. Masson. Spot me if you can: Uncovering spoken phrases in encrypted voip conversations. In , pages 35–49, May 2008.
Simulation and reconstruction parameters

Table 1: Parameters and data of the simulations used in the results.

                    Iterations    Packets sent    Data stored
Small simulation    1000          14 , , 560      0.69 GB
Large simulation    10000         141 , , 190     5.16 GB

Table 2: Parameters and data of the reconstruction algorithm.

Probability score lower bound    Jurisdictions