A Real-time Defense against Website Fingerprinting Attacks
Shawn Shan
University of Chicago
Arjun Nitin Bhagoji
University of Chicago
Haitao Zheng
University of Chicago
Ben Y. Zhao
University of Chicago
Abstract
Anonymity systems like Tor are vulnerable to Website Fingerprinting (WF) attacks, where a local passive eavesdropper infers the victim's activity. Current WF attacks based on deep learning classifiers have successfully overcome numerous proposed defenses. While recent defenses leveraging adversarial examples offer promise, these adversarial examples can only be computed after the network session has concluded, and thus offer users little protection in practical settings. We propose Dolos, a system that modifies user network traffic in real time to successfully evade WF attacks. Dolos injects dummy packets into traffic traces by computing input-agnostic adversarial patches that disrupt the deep learning classifiers used in WF attacks. Patches are then applied to alter and protect user traffic in real time. Importantly, these patches are parameterized by a user-side secret, ensuring that attackers cannot use adversarial training to defeat Dolos. We experimentally demonstrate that Dolos provides 94+% protection against state-of-the-art WF attacks under a variety of settings. Compared to prior defenses, Dolos offers higher protection, lower information leakage, and lower bandwidth overhead. Finally, we show that Dolos is robust against a variety of adaptive countermeasures to detect or disrupt the defense.
Website fingerprinting (WF) attacks are traffic analysis attacks that allow eavesdroppers to identify websites visited by a user, despite the use of privacy tools such as VPNs or the Tor anonymity system [30, 74]. The attacker identifies webpages in an encrypted connection by analyzing and recognizing network traffic patterns. These attacks have grown more powerful over time, improving in accuracy and scale. The most recent variants can overwhelm existing defenses by training deep neural network (DNN) classifiers to identify the destination website given a network trace. In real-world settings, WF attacks have proven effective at identifying traces in the wild from a large number of candidate websites using limited data [7, 66].

There is a long list of defenses that have been proposed and then later defeated by DNN-based WF attacks. First, a class of defenses obfuscates traces by introducing randomness [23, 26, 36, 75]. These obfuscation-based defenses have been proven ineffective (<60% protection) against DNN-based attacks [7, 65]. Other defenses have proposed randomizing HTTP requests [17] or scattering traffic redirection across different Tor nodes [21, 31]. Again, these defenses provide poor protection against DNN-based attacks (<50% protection). Unsurprisingly, the success of DNN attacks has derailed efforts to deploy WF defenses on Tor (e.g., Tor stopped the implementation of WTF-PAD after it was broken by the DF attack [55, 65]).

Against these strong DNN attacks, the only defenses to show promise are recent proposals that apply adversarial examples to mislead WF classification models [34, 58]. Adversarial examples are known weaknesses of DNNs, where a small, carefully tuned change to an input can dramatically alter the DNN's output. These vulnerabilities have been studied intensely in the adversarial ML community, and are now generally regarded as a fundamental property of DNNs [35, 63]. Unfortunately, defenses based on adversarial examples have one glaring limitation.
To be effective, adversarial examples must be crafted individually for each input [48]. In the WF context, the "input" is the entire network traffic trace. Thus a defense built using adversarial examples requires the entire traffic trace to be complete before it can compute the precise perturbation necessary to mislead the attacker's WF classifier. This is problematic, since real-world attackers observe user traffic in real time, and are unaffected by a defense that can only act after the fact.

In this paper, we propose Dolos, a practical and effective defense against WF attacks that can be applied to network traffic in real time. The key insight in
Dolos is the application of the concept of trace-agnostic patches to WF defenses, derived from the concept of adversarial patches. While adversarial examples are customized for each input, adversarial patches can cause misclassifications when applied to a wide range of inputs.

Figure 1: A WF attacker, positioned between the user and the Tor network, eavesdrops on user network traffic. After the connection terminates, the attacker classifies the entire network trace using a pretrained WF attack classifier.

In the context of a WF defense, "patches" are pre-computed sequences of dummy packets that protect all network traces of visits to a specific website. A patch is applied to an active, ongoing network connection, i.e., in "real time." As we will show,
Dolos generates patches parameterized by a user-side secret such as a private key. Unlike traditional patches or universal perturbations, patches parameterized by a user's secret cannot be overcome by attackers unless they compromise the user's secret.

Our work describes experiences in designing and evaluating Dolos, and makes four key contributions:

• We propose Dolos, a new WF defense that is highly effective against the strongest attacks using deep learning classifiers. More importantly, Dolos precomputes patches before the network connection, and applies the patch to protect the user in real time.

• We introduce a secret-based parameterization mechanism for adversarial patches, which allows a user to generate randomized patches that cannot be correctly guessed without knowledge of the user-side secret. We show that this gives Dolos strong resistance to WF attacks that apply adversarial training with known adversarial patches and perturbations, which can otherwise defeat patches and universal perturbations [50].

• We evaluate Dolos under a variety of settings. Regardless of how the attacker trains its classifier (handcrafted features, deep learning, or adversarial training), Dolos provides 94+% protection against the attack. Compared to three state-of-the-art defenses [26, 36, 58], Dolos significantly outperforms all three in key metrics: overhead, protection performance, and information leakage.

• Finally, we consider attackers with full knowledge of our defense (i.e., full access to source code but not the user secret), and demonstrate that Dolos is robust against multiple adaptive attacks and countermeasures.
In WF attacks, an attacker tries to identify the destination of user traffic routed through an encrypted connection to Tor or a VPN. The attacker passively eavesdrops on the user connection and, once the session is complete, feeds the network trace as input to a machine learning classifier to identify the website visited. Even though packets across the connection are encrypted and padded to the same size, attackers can distinguish traces as sequences of packets and their directions. The attacker's machine learning classifier is trained on packet traces generated by visiting a large, pre-determined set of websites beforehand.
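The trace representation described above can be made concrete. The following sketch (our own illustration; the function name and the fixed trace length are assumptions, though lengths around 5000 packets are common in the WF literature) converts a captured packet log into the fixed-length sequence of packet directions that WF classifiers consume:

```python
# Sketch: convert a captured packet log into the fixed-length sequence of
# packet directions (+1 outgoing, -1 incoming) that WF classifiers consume.
# The function name and the fixed length are our assumptions; trace
# lengths around 5000 packets are common in the WF literature.

def to_direction_trace(packets, length=5000):
    """packets: list of (timestamp, signed_size) with sign = direction."""
    trace = [1 if size > 0 else -1 for _, size in packets]
    trace = trace[:length]                   # truncate long sessions
    trace += [0] * (length - len(trace))     # zero-pad short sessions
    return trace

# A short session: three outgoing requests, two incoming responses.
session = [(0.00, 512), (0.01, -1500), (0.02, 512),
           (0.05, -1500), (0.06, 512)]
trace = to_direction_trace(session, length=8)
# trace == [1, -1, 1, -1, 1, 0, 0, 0]
```

Zero-padding short sessions to a fixed length lets a single classifier consume sessions of any size.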
Attacks via Hand-crafted Features.
Panchenko et al. [53] proposed the first effective WF attack against Tor traffic, using a support vector machine with hand-crafted features. Follow-up work proposed stronger attacks [11, 30, 32, 43, 44, 52, 70, 74] by improving the feature set and using different classifier architectures. The most effective attacks based on hand-crafted features, such as k-NN [74], CUMUL [52], and k-FP [30], achieve over 90% accuracy in identifying websites based on network traces.
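As a simplified illustration of hand-crafted features, the sketch below computes CUMUL-style features: the cumulative sum of signed packet sizes, subsampled at n evenly spaced points. The real CUMUL feature set [52] is richer (e.g., it adds aggregate counts); this sketch is our own.

```python
# Simplified sketch of CUMUL-style features [52]: the cumulative sum of
# signed packet sizes, subsampled at n evenly spaced points. The real
# CUMUL feature set is richer (e.g., it adds aggregate counts); this
# sketch is our own illustration.

def cumul_features(signed_sizes, n=100):
    cum, total = [], 0
    for s in signed_sizes:
        total += s
        cum.append(total)
    # Subsample the cumulative curve at n evenly spaced positions.
    idx = [round(i * (len(cum) - 1) / (n - 1)) for i in range(n)]
    return [cum[i] for i in idx]

features = cumul_features([512, -1500, -1500, 512, -1500], n=5)
# features == [512, -988, -2488, -1976, -3476]
```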
Attacks via DNN Classifiers.
Recent work [1, 7, 62, 65] leverages deep neural networks (DNNs) to perform more powerful WF attacks. DNNs automatically extract features from raw network traces, and outperform previous WF attacks based on hand-crafted features. Two of the most successful DNN-based attacks are Deep Fingerprinting (DF) [65] and Var-CNN [7]. DF leverages a deep convolutional neural network for classification, and reaches over 98% accuracy on undefended traces, or traces defended using existing defenses (§2.2). Var-CNN [7] further improves attack performance by using a large residual network architecture, and can achieve high performance even with limited training data. Later, in §6, our experiments show that Dolos is effective at defending against both of these state-of-the-art attacks, as well as against those using hand-crafted features.
WF defenses modify (add, delay, or reroute) packets to prevent identification of the destination website through trace analysis. Broadly speaking, defenses either obfuscate traces based on expert-designed noise heuristics or leverage adversarial perturbations to evade machine learning classifiers.
Defenses via Trace Obfuscation.
A number of defenses aim to obfuscate traces to increase the difficulty of classification. This obfuscation is performed either at the application layer or the network layer. Since these defenses do not specifically target the attacker's classifier, they generally afford low protection (<60%) against state-of-the-art WF attacks.

In application layer obfuscation, the defender introduces randomness into HTTP requests or the Tor routing algorithm (e.g., [18, 21, 31]). Application layer defenses generally make strong assumptions that are often unrealistic in practice, such as target websites implementing customized HTTP protocols [18] or attackers only being able to observe traffic at a single Tor entry node [21, 31]. These defenses provide less than 60% protection against DNN-based attacks such as DF and Var-CNN [7, 65].

In network layer obfuscation, the defender inserts dummy packets into network traces to make website identification more challenging. Early defenses [9, 10, 23] used constant-rate padding to reduce the information leakage caused by time gaps and traffic volume, but these methods led to large bandwidth overhead (>100%). Later defenses instead compute a supersequence, a longer packet trace that contains subsequences of different websites' traces. The strongest supersequence defense, Walkie-Talkie, achieves 50% protection against DNN attacks. Overall, against strong WF attacks, network layer obfuscation defenses either induce extremely large overheads (>100%) or offer low protection (<60%).

Defenses via Adversarial Perturbation.
Goodfellow et al. first proposed evasion attacks against DNNs, where an attacker causes a DNN to misclassify inputs by adding small, adversarial perturbations [27] to them. Such attacks have been widely studied in many domains, e.g., computer vision [13, 15, 16, 39, 72], natural language processing [24, 83], and malware detection [28, 68]. Recent WF defenses [34, 58] use adversarial perturbations to defeat DNN-based attacks.

The challenge facing adversarial perturbation-based WF defenses is that adversarial perturbations are computed with respect to a given input, which in the WF context is the full network trace. Thus, computing the adversarial perturbation necessary to protect a network connection requires the defender to know the entire trace before the connection is even made. This limitation renders the defense impractical for protecting user traces in real time.

Mockingbird [58] suggests using a database of recent traces to compute perturbations, and applying them to new traces. Yet it is widely accepted in the ML literature that adversarial perturbations are input specific and rarely transfer [48]. We confirm this experimentally using Mockingbird's publicly released code: for the same website, perturbations calculated on one trace offer an average of 18% protection to a different trace from the same website. We tested 10 pairs of traces per website, over 10 websites randomly sampled from the same dataset used by Mockingbird [58].

In a concurrent manuscript, Nasr et al. [50] propose solving this limitation by precomputing universal adversarial perturbations for unseen traces, enabling the protection of live traces. However, an attacker aware of this defense can also compute these universal perturbations, and adversarially train their models against them to improve robustness. This countermeasure causes a significant drop in protection, to 76%. We show that our defense outperforms [50] (§6.4).

Figure 2: Example traces observed by a WF attacker when a Tor user u visits the same website. Each sample trace (x_u) records the packet direction (black bar: incoming packet, white bar: outgoing packet) of the first 500 packets in the session. While the Tor user visits the same website in all four sessions, the traces observed by the attacker vary across sessions.

We consider the problem of defending against website fingerprinting attacks. A user u wishes to use the Tor network to visit websites privately. An attacker, after eavesdropping on u's traffic and collecting a trace of u's website visit (x_u), attempts to use x_u to determine (or classify) the destination website that u has just visited. To defend against such attacks, the defender seeks to inject "obfuscation" traffic into the Tor network, such that the traffic trace observed by the attacker leads to a wrong classification result.

Threat Model.
We use the same threat model adopted by existing WF attacks and defenses.

• The attacker, positioned between the user and Tor guard nodes, can only observe the user's network traffic but not modify it. Furthermore, the attacker can tap into user connections over a period of time and may see traffic from multiple website requests.

• We consider WF attacks that operate on packet directions. This assumption is consistent with many previous defenses [21, 31, 34, 58, 75]. (Since Tor's traffic is encrypted and padded to the same size [32], only packet directions and time gaps leak information about the destination website.) For each of u's website sessions, the attacker collects its trace x_u as a sequence of packet directions (i.e., marking each outgoing packet as +1 and each incoming packet as -1). Figure 2 shows four examples where a Tor user visits the same website at different times. While the Tor user visits the same website in these four sessions, the traces observed by the attacker vary across sessions.

• We consider closed-world WF attacks, where the user only visits a set of websites known to the attacker. These attacks are strictly stronger than open-world attacks, in which the user may also visit websites outside the attacker's known set.
Defender Capabilities.
We make two assumptions about the defender.

• The defender can actively inject dummy packets into user traffic. With cooperation from both the user and a node in the Tor circuit (i.e., a Tor bridge), the defender can inject packets in both directions (outgoing and incoming). (Tor bridges are often used to generate incoming packets for WF defenses [26, 36, 50, 58].)

• The defender has no knowledge of the WF classifier used by the attacker.

Success Metrics. To successfully protect users against WF attacks, a WF defense needs to meet multiple criteria. First, it should successfully defend against the most powerful known attacks (i.e., those based on DNNs), and reduce their attack success to near zero. Second, it should do so without adding unreasonable overhead to users' network traffic. Third, it needs to be effective in realistic deployments, where users cannot predict the order of packets in a real-time network flow. Finally, a defense should be robust to adaptive attacks designed with full knowledge of the defense (i.e., with source code access). Earlier, in §2.2, we provided a detailed summary of existing WF defenses and their limitations with respect to these success metrics.
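The first two criteria can be phrased as simple quantities. Below is a minimal sketch of how they might be scored (our own formulation; the paper's full evaluation methodology appears in §6):

```python
# Sketch of the first two success metrics (our formulation, not the
# paper's evaluation code): protection success rate and bandwidth overhead.

def protection_rate(true_sites, predicted_sites):
    """Fraction of defended traces the attacker fails to classify
    correctly, i.e., 1 - attack accuracy."""
    hits = sum(t == p for t, p in zip(true_sites, predicted_sites))
    return 1.0 - hits / len(true_sites)

def bandwidth_overhead(n_dummy_packets, n_original_packets):
    """Fraction of extra (dummy) traffic added by the defense."""
    return n_dummy_packets / n_original_packets

# The attacker identifies only 1 of 5 defended visits correctly:
rate = protection_rate(["W1", "W2", "W3", "W4", "W5"],
                       ["W1", "W9", "W9", "W9", "W9"])
# rate == 0.8; a 30-packet patch on a 100-packet trace -> 0.3 overhead.
overhead = bandwidth_overhead(30, 100)
```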
In this paper, we present
Dolos, a new WF defense that exploits the inherent weaknesses of deep learning classifiers to provide a highly effective defense that protects network traces in real time, resists a variety of countermeasures, and incurs low overhead compared to existing defenses. In the following, we describe the key concepts behind Dolos and its design considerations. We present the detailed design of Dolos later, in §5.
Our new WF defense is inspired by the novel concept of adversarial patches [8] in computer vision. Adversarial patches are a special form of artifact which, when added to the input of a deep learning classifier, causes the input to be misclassified. Adversarial patches differ from adversarial perturbations in that patches are both input-agnostic and location-agnostic, i.e., a patch causes misclassification when applied to an input regardless of the value of the input or the location where the patch is applied. Thus adversarial patches are "universal," and can be pre-computed without full knowledge of the input. Existing works have already developed adversarial patches that bypass facial recognition classifiers [67, 73, 80].

Figure 3: Sample images showing the difference between adversarial perturbations and adversarial patches.

In computer vision, an adversarial patch is formed as a fixed-size pattern on the image [8]. Figure 3 shows an example of an adversarial patch next to an example of an adversarial perturbation. Given knowledge of the target classifier, one can search for adversarial patches under specific constraints such as patch color or intensity. For instance, for a DNN model F, a targeted adversarial patch p_adv is computed via the following optimization:

p_adv = argmin_p E_{x ∈ X, l ∈ L} loss(F(Π(p, x, l)), y_t)    (1)

where X is the set of training images, L is a distribution of locations in the image, y_t is the target label, and Π is the function that applies the patch to a random location of an image. This optimization is performed over all training images in order to make the patch effective across images.

We propose to defend against WF attacks by adding adversarial traffic patches to user traffic traces, causing attackers to misclassify destination websites. To the best of our knowledge, our work is the first to use adversarial patches to defend against WF attacks.
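To make Eq. (1) concrete, the sketch below solves a toy instance of it: a targeted patch is optimized against a small linear softmax classifier (standing in for F), averaging gradients over random inputs x and random patch locations l. All model, dimension, and hyperparameter choices here are our own illustrative assumptions; real attacks and defenses target deep networks.

```python
import numpy as np

# Toy instance of Eq. (1): optimize a targeted patch p against a small
# 2-class linear softmax model F (a stand-in for a DNN), so that writing
# p at a random location l of any input x pushes F toward target y_t = 1.
# All sizes, the model, and hyperparameters are illustrative assumptions.
rng = np.random.default_rng(0)
n, k = 100, 40                         # input length, patch length
locations = [0, 10, 20]                # the distribution L of patch locations
W = 0.1 * rng.standard_normal((2, n))  # weights of the toy classifier F

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

patch = np.zeros(k)
for _ in range(400):                   # SGD on E_{x,l} loss(F(Pi(p,x,l)), y_t)
    grad = np.zeros(k)
    for _ in range(8):                 # mini-batch of (x, l) samples
        x = rng.choice([-1.0, 1.0], size=n)
        l = locations[rng.integers(len(locations))]
        x[l:l + k] = patch             # Pi: apply the patch at location l
        probs = softmax(W @ x)
        # gradient of -log p_target w.r.t. x, restricted to the patch slice
        dx = W.T @ (probs - np.array([0.0, 1.0]))
        grad += dx[l:l + k]
    patch = np.clip(patch - 0.5 * grad / 8, -1.0, 1.0)  # relaxed to [-1, 1]

# Evaluate the same patch on fresh inputs at any trained location.
hits = 0
for _ in range(200):
    x = rng.choice([-1.0, 1.0], size=n)
    l = locations[rng.integers(len(locations))]
    x[l:l + k] = patch
    hits += int(np.argmax(W @ x) == 1)
success_rate = hits / 200
```

With a generous budget (here the patch overwrites 40 of 100 positions), a single input-agnostic patch reliably forces the target label on inputs it has never seen.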
Trace-agnostic.
This is the key property of adversarial patches and why they are a natural defense against WF attacks. Given a user u and a website W, one can design a patch that works on any network trace produced when u visits W. We note that, like adversarial perturbations, patches can be computed as the solution to a constrained optimization problem. Empirical studies have not found any limitations on the number of unique patches that can be computed for any targeted misclassification task. (While input-dependent patches have been considered [37], patches are largely utilized and analyzed in an input-agnostic context.)

Leveraging this property, the defender can pre-compute, for u, a set of W-specific adversarial patches. Once u starts to visit W, the defender fetches a pre-computed patch and injects, in real time, the corresponding dummy packets into u's live traffic. Furthermore, since patches are built using diverse training data, they are inherently robust against moderate levels of website updates and/or network dynamics. The defender periodically re-computes new patches, or does so upon detecting significant changes in website content and/or user networking environments.

Here, we describe new design considerations that arise from applying adversarial patches to network traffic traces. For a specific user/website pair (user u, website W), the defender first uses sample traces of u visiting W to compute a patch p for the pair. At run time, when u initiates a connection to W, the defender fetches p and follows the corresponding insertion schedule to add dummy packets into u's Tor traffic. No original packets are dropped or modified. Therefore, to generate adversarial patches for traffic traces, we follow the optimization process defined by Eq. (1), but change the Π function for patch injection. Note that patches are generated for each specific user and website pair (u, W).

Another key change is the "perturbation budget" (i.e., the maximum change to the input), which is now defined as the bandwidth overhead introduced by the dummy packets. Patches for images are often limited by a small perturbation budget, whereas WF defenses deployed on Tor already tolerate considerably larger bandwidth overheads [55, 65].

Strong Model Transferability of Patches.
Here, model transferability refers to the well-known phenomenon that ML classifiers trained for similar tasks share similar behaviors and vulnerabilities, even if they are trained with different architectures or training data [82]. Existing work has shown that, when applied to an input, an adversarial perturbation or patch computed for a given DNN model will transfer across models [22, 56, 69]. In addition, a perturbed or patched input will also transfer to non-DNN models such as SVMs and random forests [14, 22]. The level of transferability is particularly strong for large perturbation sizes [22, 56], as is the case for adversarial patches on network traffic. Leveraging this strong model transferability, we can build a practical WF patch without knowing the actual classifier used by WF attackers. The defender can compute adversarial patches using local WF models, and they should succeed against attacks using other WF classifiers.
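Transferability can be demonstrated on a toy example: below, a perturbation computed only against model A also degrades an independently trained model B on the same synthetic task. Both models are simple logistic regressions, and all settings are our own illustrative assumptions, not experiments from the paper.

```python
import numpy as np

# Toy demonstration of transferability: a perturbation computed only
# against model A also fools an independently trained model B.
# Both models are logistic regressions on synthetic data; all settings
# here are illustrative assumptions, not the paper's experiments.
dim = 20
w_true = np.random.default_rng(1).standard_normal(dim)

def make_data(n, seed):
    r = np.random.default_rng(seed)
    X = r.standard_normal((n, dim))
    return X, (X @ w_true > 0).astype(float)

def train_logreg(X, y, steps=500, lr=0.1):
    w = np.zeros(dim)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)   # gradient of logistic loss
    return w

w_A = train_logreg(*make_data(500, seed=2))   # defender's local model
w_B = train_logreg(*make_data(500, seed=3))   # attacker's model, new data

X_test, y_test = make_data(300, seed=4)
# FGSM-style perturbation computed ONLY from model A's weights:
eps = 1.0
X_adv = X_test - eps * np.sign(w_A) * np.sign(y_test - 0.5)[:, None]

def accuracy(w, X, y):
    return float(np.mean(((X @ w) > 0).astype(float) == y))

clean_acc_B = accuracy(w_B, X_test, y_test)
adv_acc_B = accuracy(w_B, X_adv, y_test)   # degraded, despite never seeing A
```

Because the two models learn similar decision boundaries, the perturbation crafted against A carries over to B, which is the property that lets a defender succeed without knowing the attacker's classifier.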
A standard response by WF attackers to our patch-based defense is to take patches generated by the defense, build and label patched user traces, and use these traces to (re)train an attack classifier that bypasses the defense. In the adversarial ML literature, this is termed "adversarial training," and is regarded as the most reliable defense against adversarial attacks, including adversarial perturbations, adversarial patches, and universal perturbations.

In our experiments, we find that existing WF defenses fail under such attacks. For [21], protection drops to 55%. Noteworthy is the fact that [50], which uses universal perturbations, is particularly vulnerable to this attack. An attacker aware of the defense can compute the universal perturbation themselves, and use it for adversarial training. Our results show that the protection rate of [50] under adversarial training drops dramatically, from 92% to 16%.

To resist adaptive attackers using adversarial training, we propose a novel secret-based patch generation mechanism that makes it nearly impossible for the attacker to reproduce the same patch as the defender. Specifically, the defender first computes a user-side secret S based on private keys, nonces, and website-specific identifiers, and then uses it to parameterize the optimization process of patch generation. The result is that different secrets generate significantly different patches. When applied to the same network trace, the resulting patched traces will also display significantly disjoint representations in both input and feature spaces. Without access to the user-side secret, adversarial training using patches generated by the WF attacker will have little effect, and Dolos will continue to protect users from the WF attack.

Dolos
In this section we present the design of
Dolos , starting froman overview, followed by detailed descriptions of its two keycomponents: patch generation and patch injection.
Consider the task of protecting a user u's visits to website W. Dolos implements this protection by injecting an adversarial patch p_{W,T} into u's live traffic when visiting W, such that when the (defended) network trace is analyzed by a WF attacker, its classifier will conclude that u is visiting T (a website different from W). Here T is a configurable defense parameter.

Dolos includes two key steps: patch generation, which computes an adversarial traffic patch (p_{W,T}), and patch injection, which injects a pre-computed patch into u's live traffic as u is visiting W. This is also shown in Figure 4.

Figure 4: Our proposed Dolos system that protects a user u from WF attacks. First, Dolos precomputes a patch and its packet insertion schedule to protect u's visits to website W, using u's secret and a feature extractor. Second, when u is visiting website W, Dolos defends the user's traces in real time by inserting dummy packets according to the precomputed patch and schedule.

To generate a patch, Dolos inspects the WF feature space and searches for potential adversarial patches that can effectively "move" the feature representation of a patched trace of u visiting W close to the feature representation of the (unpatched) traces of u visiting T. When these two feature representations are sufficiently similar, WF attacker classifiers will identify the patched traces of u visiting W as traces of visits to T. The above optimization can be formulated as follows:

p_{W,T} = argmin_p E_{x ∈ X_W, x' ∈ X_T, s ∈ S} D(Φ(Π(p, x, s)), Φ(x'))
subject to |p| ≤ p_budget    (2)

where p_budget defines the maximum patch overhead, X_W (X_T) defines a collection of unpatched instances of u visiting W (T), S defines the set of feasible schedules to inject a patch into live traffic, and Π(p, x, s) defines the patch injection function that injects a patch p into the live traffic x under a schedule s. Finally, Φ(·) refers to the local WF feature extractor used by Dolos, while D is a feature distance measure (an ℓ norm in our implementation). Figure 5 provides an abstract illustration of the patch and the injection results.

Randomized Design.
To prevent attackers from extracting or reverse-engineering our patches, we design Dolos to incorporate randomization into both patch generation and injection. In §5.2 and §5.3, we describe how Dolos uses a user-side secret to configure T, S, Π(p, x, s), and the optimization process of Eq. (2), implementing patches that are robust against adaptive attacks.

Choosing Φ(·). As discussed earlier,
Dolos does not assume knowledge of the WF classifiers used by the attackers. Instead, Dolos operates directly on the feature space and uses a feature extractor Φ, essentially a partial neural network trained on the same or a similar task. Dolos can train Φ locally or use a pre-trained WF classifier from a trusted party (e.g., Tor). Given an input x, Dolos uses the outputs of an intermediate layer of Φ as x's feature vector, which quantifies distance in the feature space. A well-trained Φ helps Dolos tolerate web content dynamics and functions well with a wide range of websites, both known and unknown [62].
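A minimal sketch of this design: Φ is the truncated prefix of a network, and features are read from an intermediate (here, hidden) layer. The toy 2-layer network, its random weights, and the choice of ℓ2 as the distance D are our illustrative assumptions.

```python
import numpy as np

# Sketch of the feature extractor Phi: a partial network whose
# intermediate-layer activations serve as the feature vector, with an
# l2 norm as the distance D. The toy 2-layer network and its random
# weights are illustrative assumptions, not a trained WF model.
rng = np.random.default_rng(7)
W1 = 0.05 * rng.standard_normal((64, 500))   # hidden layer, kept by Phi
W2 = 0.05 * rng.standard_normal((10, 64))    # final layer, discarded by Phi

def phi(trace):
    """Phi(x): intermediate-layer features of a +/-1 direction trace."""
    return np.maximum(W1 @ np.asarray(trace, dtype=float), 0.0)  # ReLU

def feature_distance(x1, x2):
    return float(np.linalg.norm(phi(x1) - phi(x2)))  # l2, for illustration

trace_a = rng.choice([-1.0, 1.0], size=500)
trace_b = rng.choice([-1.0, 1.0], size=500)
d_same = feature_distance(trace_a, trace_a)   # 0.0
d_diff = feature_distance(trace_a, trace_b)   # > 0 for distinct traces
```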
Patch generation follows the optimization process defined by Eq. (2). A novel contribution of Dolos is to use a secret to configure the patch generation process, so that the resulting patches are strictly conditioned on this secret. This means that when we apply patches generated with different secrets to the same original trace, the resulting patched traces will be significantly different in both input and WF feature spaces. The secret can be a one-way hash of a user's private key, a time-based nonce, and an optional website-specific modifier. This secret allows Dolos to compute multiple, distinct patches based on specific users, destination websites, and time. A defender can periodically recompute patches with updated nonces to prevent longitudinal attacks that try to identify a common patch across multiple connections to the same website. This controlled randomization prevents attackers from observing the true distribution of the patched traces over time or across multiple traces.

Figure 5: An abstract illustration of an adversarial traffic patch and how it is injected into user traffic traces. Black/white bars mark the packet directions (out/in) of original packets; red/blue bars mark the (out/in) directions of dummy packets (i.e., the patch). Dolos injects the patch into a live user trace following a specific schedule. Here we show the results when the same patch is injected using two different schedules (1 & 2) and random packet flipping (see §5.3).
Parameterized Patch Generation.
When generating a patch, Dolos uses a secret S to determine T (the target website) and the exact number of dummy packets to be injected, i.e., the length of the patch |p_{W,T}|.

Choosing T: Dolos collects a large pool of candidate target websites (400,000 in our implementation) from the Internet. To protect user u's visits to W, Dolos uses u's secret S to "randomly" select a website from the candidate pool whose feature representation is far from that of the original trace, i.e., E_{x ∈ X_W} Φ(x) is largely dissimilar from E_{x ∈ X_T} Φ(x). In our implementation, we first calculate the feature distance (ℓ norm) between W and each candidate in the large pool, identify the top 75th percentile as a reduced candidate pool for W, and use the secret S as a random seed to select one website from the reduced pool.

Choosing |p_{W,T}|: To further obfuscate the appearance of our patches, we use the same secret S to select the patch length (i.e., the number of dummy packets to be injected into the user traffic). Specifically, given p_budget (the maximum overhead ratio), we "randomly" pick a patch length between (p_budget − ε, p_budget), where ε is a defense parameter.

Patch Optimization.
When solving the patch optimization problem defined by Eq. (2), we use Stochastic Gradient Descent (SGD) [64], batching samples from X_W. We note that in order to apply SGD, we need to relax the constraints on p_{W,T} and allow it to lie in [−1, 1]^n, i.e., we allow continuous values between −1 and +1. This method is widely used to solve discrete optimization problems in domains like natural language processing [54, 61, 76] and malware classification [38], and avoids the intractability of combinatorial optimization [40]. Finally, we note that the optimization also takes into account the patch injection process (e.g., Π(p, x, s)), which we describe next in §5.3.

Managing Secrets.
Note that
Dolos manages secrets and their original components (private keys, nonces, website-specific identifiers) following standard private key management recommendations [5]. Secrets are stored only on the user's device and are updated periodically.
The patch injection component has two goals: 1) making the patch input-agnostic and location-agnostic, in order to deploy our WF defense on live traffic, and 2) obfuscating the patched traces at run time to prevent attackers from detecting and removing patches from observed traces (to recover the original trace), or from dismantling patches via trace segmentation to reduce their coverage and effectiveness.

Existing solutions (used by adversarial patches for images) do not meet these goals. They simply inject a given patch (as a fixed block) at some location in the trace. In our problem context, an attacker can easily recognize the fixed pattern by observing the user's traces over a period of time. Instead, we propose to combine a segment-based patch injection method with run-time packet-level obfuscation.
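The two mechanisms, described in detail in the following subsections, can be sketched together as follows. The segment and mini-patch sizes follow the examples given in the text (M_x = 100, M_p = 30); the function names and the uniform choice of offsets are our assumptions.

```python
import random

# Sketch of both run-time mechanisms: segment-based injection places each
# fixed mini-patch block at a scheduled offset inside its trace segment,
# and packet flipping reverses the direction of a few dummy packets per
# visit. M_x = 100 and M_p = 30 follow the examples in the text; the
# function names and uniform offsets are our assumptions.

def flip_packets(patch, n_flips, rng):
    patch = list(patch)
    for i in rng.sample(range(len(patch)), n_flips):
        patch[i] = -patch[i]                       # out <-> in
    return patch

def inject(trace, patch, m_x=100, m_p=30, rng=None):
    rng = rng or random.Random()
    defended = []
    for seg_no, start in enumerate(range(0, len(trace), m_x)):
        segment = trace[start:start + m_x]
        mini = patch[seg_no * m_p:(seg_no + 1) * m_p]
        offset = rng.randrange(len(segment) + 1)   # schedule s: block offset
        defended.extend(segment[:offset] + mini + segment[offset:])
    return defended

rng = random.Random(42)
trace = [1, -1] * 250                                   # 500 original packets
patch = flip_packets([1, -1] * 75, n_flips=5, rng=rng)  # 150 dummy packets
defended = inject(trace, patch, rng=rng)
# len(defended) == 650: all original packets kept, dummies inserted
```

Note that injection only adds packets; no original packets are dropped or modified, matching the constraint stated in §4.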
Segment-based Patch Injection.
A pre-computed patch p is a sequence of dummy packets (+1s and -1s) designed to be injected into the user's live traffic x. We first break p into multiple segments of equal size M_p (e.g., 30 packets), referred to as "mini-patches." Each mini-patch is assigned to protect a segment of x of size M_x (e.g., 100 packets). We configure the patch generation process to ensure that, within each segment of x, the corresponding mini-patch stays as a fixed block. Patches are location-agnostic, so they produce the same effect regardless of their location within the segment. Therefore, given M_x and M_p, the injection function Π(p, x, s) in Eq. (2) depends on s, an injection schedule that defines the randomly chosen location of each mini-patch within x's segments (see Figure 5). S defines all possible sets of mini-patch locations. An advantage of splitting the patch into mini-patches is that it protects against attackers trying to infer the website by searching for subsequences of packets. Our results confirm this hypothesis later, in §7.1.

Run-time Patch Obfuscation using Packet Flips.
When a single patch p is used to protect u's visits to W over some window of time, the same p could appear in multiple patched traces. While our segment-based injection hides p within the patched traces, it does not change p; a resourceful attacker could thus potentially recover p using advanced trace analysis techniques. To further strengthen the obfuscation, we apply random "packet flipping" to make p differ across visits to W. Specifically, in each visit session, we randomly choose a small set of dummy packets in p and flip their directions (out to in, in to out). We ensure that this random flipping operation is accounted for by the patch generation process (Eq. (2)), so that it does not affect the patch's effectiveness. Later in §7, we show that this random flipping deters countermeasures that leverage frequency analysis to estimate p.

Deployment Considerations.
At run time, the user and Tor bridge follow a simple protocol to protect traces. As soon as the user u requests a website W, Dolos sends the pre-computed patch p_{W,T} (after random flipping) and the current insertion schedule s to the Tor bridge through an encrypted, fixed-length (padded) network tunnel. The user and the Tor bridge then coordinate to send dummy packets to each other to achieve the protection.

In this section, we perform a systematic evaluation of Dolos under a variety of WF attack scenarios. Specifically, we evaluate Dolos against i) state-of-the-art DNN WF attacks whose classifiers use either the same or different feature extractors as Dolos (§6.2), and ii) non-DNN WF attacks that use handcrafted features (§6.3). Furthermore, we compare Dolos against existing WF defenses under these attacks (§6.4) and in terms of information leakage, which estimates the number of potential vulnerabilities facing any WF defense, including those not yet exploited by existing WF attacks (§6.5). Overall, our results show that Dolos is highly effective against state-of-the-art WF attacks (≥ 94% attack protection at a 30% bandwidth overhead). It largely outperforms existing defenses in all three key metrics: protection success rate, bandwidth overhead, and information leakage. Finally, under a simplified 2-class setting, we show that Dolos is provably robust with a sufficiently high bandwidth overhead. Our theoretical result and proof, which rely on the theory of optimal transport, appear in the Appendix.

WF Datasets.
Our experiments use two well-known WF datasets, Sirinam and Rimmer (see Table 1), which are commonly used by prior work to evaluate WF attacks and defenses [7, 33, 50, 62, 65]. Both datasets contain Tor users' website traces (as data) and the corresponding websites (as labels). Sirinam was collected by Sirinam et al. around February 2016 and includes 86,000 traces covering 95 websites [65] from the Alexa top website list [3]. Rimmer was collected by Rimmer et al. [62] in January 2017 and covers 2 million traces for visits to Alexa's top 900 websites. The two datasets partially overlap in their labels. Following prior work on WF attacks and defenses, we pad each trace in the datasets to a fixed length of 5000.
Dataset Name   # Websites   # Training Traces   # Testing Traces
Sirinam        95           76K                 10K
Rimmer         900          2M                  257K

Table 1: Two WF datasets used by our experiments.
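The fixed-length trace representation described above can be sketched as follows. Representing traces as +1/-1 packet-direction sequences follows the paper; the choice of 0 as the padding value is an assumption, since the datasets only specify the target length.

```python
TRACE_LEN = 5000  # fixed trace length used by prior WF work

def pad_trace(directions, length=TRACE_LEN, pad_value=0):
    """Pad (or truncate) a +1/-1 packet-direction sequence to a fixed
    length. The padding value is an assumption, not specified by the
    datasets themselves."""
    t = list(directions)[:length]
    return t + [pad_value] * (length - len(t))

# a short trace is padded out to 5000 entries ...
trace = pad_trace([1, -1, -1, 1])
# ... and an overly long one is truncated to 5000
long_trace = pad_trace([1] * 6000)
```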
Dolos Configuration. Next we describe how we configure Dolos's feature extractor Φ, the patch injection parameters, and the user-side secret. We build four feature extractors using two well-known DNN architectures for web trace analysis, Deep CNN (from the DF attack) and ResNet-18 (from the Var-CNN attack), and train them on the above two WF datasets. We also apply standard adversarial patch training [4, 50, 60] to fine-tune these feature extractors for 20 epochs, which helps increase the transferability of our adversarial patches. (We show the protection results of Dolos when using standard feature extractors without adversarial patch training in Table 9 in the Appendix.) Table 2 lists the resulting Φs; we name them after the model architecture and training dataset.

Model Architecture F    Training Data X   Feature Extractor Φ
Deep CNN (DF)           Sirinam           DF-Sirinam
ResNet-18 (Var-CNN)     Sirinam           VarCNN-Sirinam
Deep CNN (DF)           Rimmer            DF-Rimmer
ResNet-18 (Var-CNN)     Rimmer            VarCNN-Rimmer

Table 2: The four feature extractors (Φ) used in our experiments, their model architecture and training dataset.

When injecting patches, we set the mini-patch length M_p to 10 and the trace segment length M_x to M_p / R, where R represents the bandwidth overhead of the defense (0 < R < 1). We vary the value of M_p and find that its impact on the defense performance is insignificant; thus we empirically choose a value of 10. By default, we set the packet flipping rate β = 0.2, i.e., at run time 20% of the dummy packets in each mini-patch are flipped to the opposite direction. The impact of β on possible countermeasures is discussed later in §7.1. We use a separate dataset that contains traces from 400,000 different websites as our pool of target websites [62]. Finally, since our patch generation depends on the user-side secret, we repeat each experiment 10 times, each using a randomly generated user-side secret (per website). We report the average and standard deviation values. Overall, the standard deviations are consistently low across our experiments, i.e., < 1% for protection success rate.
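The segment-based injection and packet flipping of §5, instantiated with these parameters, can be sketched as follows. This is a simplified illustration: `inject_patch`, the seeded RNG standing in for the injection schedule s and the β-fraction flips, are our own naming and simplifications, and real mini-patch placement is additionally constrained by the patch generation process of Eq. (2).

```python
import random

M_P = 10             # mini-patch length
R = 0.30             # bandwidth overhead of the defense
M_X = int(M_P / R)   # trace segment length (33 packets per mini-patch)
BETA = 0.2           # packet flipping rate

def inject_patch(trace, patch, seed=None):
    """Split `patch` into mini-patches, flip a BETA fraction of each
    mini-patch's dummy-packet directions, and insert each mini-patch as a
    block at a random offset within its assigned trace segment."""
    rng = random.Random(seed)
    minis = [patch[i:i + M_P] for i in range(0, len(patch), M_P)]
    segments = [trace[i:i + M_X] for i in range(0, len(trace), M_X)]
    out = []
    for seg, mini in zip(segments, minis):
        # run-time obfuscation: flip ~BETA of the dummy packets
        mini = [-d if rng.random() < BETA else d for d in mini]
        # injection schedule s: random location within this segment
        offset = rng.randrange(len(seg) + 1)
        out.extend(seg[:offset] + mini + seg[offset:])
    # trailing segments without an assigned mini-patch pass through unchanged
    out.extend(trace[len(minis) * M_X:])
    return out

trace = [1, -1] * 60   # 120-packet toy trace
patch = [1] * 20       # two mini-patches of 10 dummy packets each
defended = inject_patch(trace, patch, seed=7)
```

Note how the defended trace grows by exactly `len(patch)` packets, matching the definition of the bandwidth overhead R.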
Attack Configuration. We consider two types of WF attacks: 1) non-DNN attacks using handcrafted features, i.e., k-NN [74], k-FP [30], and CUMUL [52], and 2) DNN-based attacks, i.e., DF [65] and Var-CNN [7]. We implement these attacks following their original implementations. Consistent with prior WF defenses [21, 26, 33, 58, 75], we assume that the attacker trains their classifiers on defended traces, i.e., traces patched using our defense. To generate and label such traces, the attacker downloads Dolos and runs it on their own traces when visiting a variety of websites. The attacker must supply some user-side secret to run Dolos, which we refer to as S_attack. Since the attacker has no knowledge of the user-side secret S_defense being used by Dolos to protect the current user u, we have S_defense ≠ S_attack. A relevant countermeasure is for attackers to enumerate many secrets when training the attack classifier, which we discuss later in §7.3.

Intersection Attacks.
We also consider intersection attacks in our evaluation of Dolos. Here the attacker assumes the victim visits the same websites regularly, and monitors the victim's traffic over a longer time period (e.g., days). With this information, the attacker can make better inferences about the user's traces. We test Dolos against the intersection attack used by a previous WF defense [58] and find that the attack is ineffective against Dolos. More details about our experiments and results can be found in the Appendix.
Evaluation Metrics. We test Dolos against various WF attacks using the testing traces of the two WF datasets (see Table 1). We evaluate Dolos using three metrics: 1) protection success rate, defined as the WF attack's misclassification rate on defended traces; 2) bandwidth overhead, R = (patch length) / (original trace length); and 3) information leakage, which measures the amount of potential vulnerability of any WF defense [17, 42]. We also examine the computation cost of Dolos: the average time required to compute a patch is 19 s on an Nvidia Titan X GPU and 43 s on an eight-core i9 CPU machine.

Dolos vs. DNN-based WF Attacks
Our experiments consider three attack scenarios:

• Matching attack/defense: the attacker and Dolos operate on the same original trace data X (e.g., Sirinam) and use the same model architecture F (e.g., DF). Dolos trains its feature extractor Φ using model F and data X, while the attacker trains its attack classifier using model F and the defended version of X.

• Mismatching attack/defense: the attacker and Dolos use different X and/or F when building their attack classifier and feature extractor, respectively.

• Defense effectiveness over time: Dolos starts applying a patch p to protect u's visits to W at day 0; the attacker runs fresh WF attacks in the subsequent days, training attack classifiers on defended traces freshly generated each day.

Figure 6: Against DNN-based attacks, Dolos's protection success rate increases rapidly with its bandwidth overhead R; Dolos achieves a > 97% protection rate when R reaches 30%. Assuming matching attack/defense.

Figure 7: Worst-case analysis of the impact of secret collision on Dolos, where the target feature representations T_attack and T_defense are Nth nearest neighbors in the feature space. Assuming matching attack/defense.

Figure 8: Dolos's effectiveness over time against fresh attacks after deploying a patch to protect u visiting W at day 0. The same patch is able to resist attacks that train their classifiers on newly generated defended traces in subsequent days.

[Table 3: Dolos's protection success rate against different WF attacks, for each combination of Dolos feature extractor (DF-Sirinam, DF-Rimmer, VarCNN-Sirinam, VarCNN-Rimmer) and attack (k-NN, k-FP, CUMUL, DF, Var-CNN), reported as mean ± standard deviation. The bold entries are results under matching attack/defense. Numeric entries not recovered in this copy.]
Scenario 1: Matching Attack/Defense. Figure 6 plots Dolos's protection success rate against its bandwidth overhead, for each of the four (F, X) combinations listed in Table 2. The standard deviation values are small (< 1%). Dolos consistently achieves a 97% or higher protection rate when the defense overhead R ≥ 30%. Dolos is even more effective against WF attacks whose classifiers are trained on original (undefended) traces, i.e., a 98% protection success rate at a 15% overhead. For the rest of the paper, we use R = 30% as the default configuration.
Likelihood of Secret Collisions:
In the above experiments, we randomly select the secret pair (S_attack, S_defense) and show that, in general, as long as S_attack ≠ S_defense, Dolos is highly effective against WF attacks. Next, we also run a worst-case analysis that examines cases where the combination of S_attack and S_defense leads to heavy collision in the WF feature space. That is, the patches generated from S_attack and S_defense move the feature representation of the original trace to the target feature representations of websites T_attack and T_defense, respectively, but the two targets are close in the feature space (with respect to the ℓ2 distance). Here we ask: how "close" do T_attack and T_defense need to be in order to break our defense? We answer this question in Figure 7 by plotting Dolos's protection success rate when T_attack is the Nth nearest label to T_defense in the feature space, for each of the four (F, X) combinations. We see that as long as T_attack is beyond the top 20 nearest neighbors of T_defense, Dolos maintains a > 96% protection success rate. Since our pool of target websites is very large (400,000 websites), the probability of an attacker finding a T_attack that weakens our defense is very low (p_bad ≈ 20/400,000 = 5 × 10⁻⁵). Later in §7, we show that even when the attacker trains their classifiers on defended traces produced by a large number of secrets, the impact on Dolos is still minimal.
Scenario 2: Mismatching Attack/Defense.
We now consider the more general scenario where the attacker and Dolos use different X and/or F to train their classifiers and feature extractors, respectively.

Dataset   Defense Name      Bandwidth   Protection Success Rate Against WF Attacks
                            Overhead    k-NN   k-FP   CUMUL   DF    Var-CNN   Worst Case
Sirinam   WTF-PAD           54%         87%    56%    69%     10%   11%       10%
          FRONT             80%         96%    68%    72%     31%   34%       31%
          Mockingbird       52%         94%    89%    91%     69%   73%       69%
          Dolos             30%         98%    99%    97%     96%   95%       95%
Rimmer    WTF-PAD           61%         84%    58%    72%     14%   11%       11%
          FRONT             72%         97%    62%    68%     37%   39%       37%
          Mockingbird       57%         96%    87%    90%     71%   79%       71%
          Blind Adversary   11%         -      -      -       -     76%*      -
          Dolos             10%         95%    92%    93%     87%   92%       87%
          Dolos             30%         99%    98%    98%     97%   98%       97%

Table 4: Comparing bandwidth overhead and protection success rate of WTF-PAD, FRONT, Mockingbird, Blind Adversary, and Dolos. *We take this number from the original paper [50], as the authors have not released their source code.

Here we consider two existing DNN-based WF attacks, DF and Var-CNN, trained on Sirinam or Rimmer, and configure Dolos to use one of the four feature extractors listed in Table 2. Using the test data of Sirinam and Rimmer, we evaluate Dolos against these WF attacks (DF and Var-CNN) and list Dolos's protection success rate in Table 3. As a reference, we also include the results of matching attack/defense (Scenario 1), marked in bold. Overall, Dolos remains highly effective (> 94% protection rate) against attacks using different training data and/or model architecture. This shows that our adversarial patches, generated from a local feature extractor, successfully transfer to a variety of attack classifiers.
Scenario 3: Defense Effectiveness Over Time. Next, we evaluate Dolos against freshly generated attacks over time. Here Dolos computes and deploys a patch p to protect u's visits to W at day 0; the attacker continues to run WF attacks in the subsequent days, each day training the attack classifier on defended traces generated that day. We use this experiment to examine the robustness of Dolos's patches under web content dynamics and network dynamics, also referred to as concept drift. Our experiment uses the concept drift dataset provided by Rimmer et al. [62], which was collected along with Rimmer. This dataset consists of 200 websites (a subset of the 900 websites in Rimmer), repeatedly collected over a six-week period (day 0, 3 days, 10 days, 14 days, 28 days, 42 days). We run Dolos using each of the four feature extractors to produce a patch at day 0. The attacker uses the Var-CNN classifier and trains it on fresh defended traces generated on days 3, 10, 14, 28, and 42. We see that the protection success rate remains consistently high over the six-week period (Figure 8).

Dolos vs. non-DNN Attacks
While Dolos targets the inherent vulnerability of DNN-based WF attacks, we show that the adversarial patches produced by Dolos are also highly effective against non-DNN WF attacks. Here we consider the three most effective non-DNN attacks: k-NN [74], k-FP [30], and CUMUL [52]. Table 3 lists the protection success rate of Dolos under these attacks, for each of the four feature extractors. Our results align with existing findings: adversarial patches designed for DNN models also transfer to non-DNN models [14, 22].
Dolos vs. Prior WF Defenses

Table 4 lists the performance of Dolos and four state-of-the-art defenses (WTF-PAD [36], FRONT [26], Mockingbird [58], and Blind Adversary [50], as described in §2). We evaluate them against five attacks on the two WF datasets. For Dolos, we use VarCNN-Rimmer as the local feature extractor.
WTF-PAD. WTF-PAD is reasonably effective against traditional ML attacks, but performs poorly against any DNN-based attack, i.e., its protection success rate drops to 10%. These findings align with existing observations [36, 65].
FRONT. FRONT is effective against non-DNN attacks but fails against DNN attacks. In our experiments, FRONT induces a larger bandwidth overhead than reported in its original paper [26]. The discrepancy arises because FRONT's overhead is dataset dependent; a separate paper [33] reports the same overhead as ours when applying FRONT to Sirinam.

Mockingbird.
Mockingbird is effective against non-DNN attacks and reasonably effective against DNN attacks (71% protection success). As stated earlier, the biggest drawback of Mockingbird is that the defense needs access to the full trace beforehand, making it impractical to deploy in the real world.
Blind Adversary. We were unable to obtain the source code of Blind Adversary at the time of writing. In the original paper, Blind Adversary is evaluated on the same dataset (Rimmer) using a similar robust training technique, and the authors report results for an 11% overhead under the white-box setting. Using the same setting and a similar overhead, Dolos achieves a 92% protection success rate whereas Blind Adversary achieves 76%.
Information Leakage

Recent works [17, 42] argue that WF defenses need to be evaluated beyond protection success rate, because existing WF attacks may not expose the hidden vulnerabilities of a proposed defense. Li et al. [42] propose an information leakage estimation framework (WeFDE) that measures a given defense's information leakage over a set of features. We measure Dolos's information leakage following the WeFDE framework. WeFDE computes the mutual information of trace features between different classes; a feature has high information leakage if it distinguishes traces of different websites. However, we cannot directly apply this analyzer to measure the leakage of Dolos, because Dolos does not seek to make traces from different websites indistinguishable from each other: an attacker can separate the traces defended by Dolos but has no way of knowing which websites the traces belong to. Thus, to measure the information leakage of Dolos, we need to obtain the overall distribution of defended traces agnostic to secrets, i.e., defended traces generated under all possible secrets. We approximate this distribution using traces generated with a large number of secrets, and feed the aggregated traces to WeFDE for information leakage analysis. We use all websites from Rimmer. For each website, we use DF-Sirinam and 80 different secrets to generate defended traces; we find that enumerating more than 80 secrets has limited impact on the information leakage results. We aggregate the traces before feeding them into WeFDE. We compare Dolos with three previous state-of-the-art defenses: WTF-PAD, FRONT, and Mockingbird. We measure the leakage (in bits) on two sets of features: 1) handcrafted features from the WeFDE paper [42], and 2) DNN features from a model trained on the defended traces of a given defense.
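WeFDE's per-feature leakage score is built on the mutual information between a feature and the website label. As a simplified illustration only (WeFDE itself uses kernel density estimators over continuous features; this sketch uses a plain histogram estimate over discrete values):

```python
import math
from collections import Counter

def mutual_information(feature_values, labels):
    """Histogram-based estimate, in bits, of I(feature; website label).
    A simplification of WeFDE's density-based estimator."""
    n = len(labels)
    p_f = Counter(feature_values)
    p_l = Counter(labels)
    p_fl = Counter(zip(feature_values, labels))
    mi = 0.0
    for (f, l), c in p_fl.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((p_f[f] / n) * (p_l[l] / n)))
    return mi

# a feature that perfectly separates two websites leaks 1 bit ...
leaky = mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"])
# ... while a constant feature leaks nothing
flat = mutual_information([0, 0, 0, 0], ["a", "a", "b", "b"])
```

Aggregating defended traces generated under many secrets, as described above, drives this quantity down because no single feature value remains tied to one website.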
Leakage on Hand Crafted Features. We first measure the information leakage on the default set of 3043 handcrafted features from WeFDE. This feature set covers various categories, e.g., packet n-grams, counts, and bursts. Figure 9 shows the empirical cumulative distribution function (ECDF) of information leakage across the features. The curve of Dolos rises much faster than the curves of the other defenses; for Dolos, no feature leaks more than about 1 bit.

Leakage on Features Trained on Defended Traces.
For each defense, we measure the information leakage on the feature space of Var-CNN models trained on the defended traces. For Dolos, the defended traces used by the attacker are generated with a different secret. Figure 10 shows the ECDF of the leakage across all features. Overall, this feature set leaks more information than the handcrafted features in Figure 9. Again, the curve of Dolos rises faster than the curves of the other defenses. The gap between Dolos and the other defenses is larger on this feature set, showing that Dolos is more resilient against models trained on defended traces, thanks to the randomness introduced by the secret.
In this section, we explore additional countermeasures that could be launched by attackers with complete knowledge of Dolos. We consider three classes of countermeasures: detecting Dolos patches, preprocessing inputs to disrupt Dolos patches, and boosting WF classifier robustness. Unless otherwise specified, experiments in this section run Var-CNN-based attacks on the Rimmer dataset, and the defender uses the DF-Sirinam feature extractor to generate patches (see §6).
Detecting Dolos Patches

An attacker can apply data analysis techniques to detect the presence of patches in network traces. Detection can lead to possible identification of patches and their removal.
Frequency Analysis.
An attacker who observes multiple visits to the same website by the same user might identify the defender's patch sequences using frequency analysis, if the same patch is applied to multiple traces over a period of time. In practice, this frequency analysis might be challenging because the location of the patch is randomized, Dolos randomly flips a subset of the dummy packets each time the patch is applied, and packet sequences in patches can blend in naturally with unaltered network traces.

We test the feasibility of this countermeasure. We assume the attacker has gathered network traces from 100 separate visits by the same user to a single website. For each trace, the attacker enumerates all packet sequences of the mini-patch length (known to the attacker). To address the random flipping, the attacker merges packet sequences whose Hamming distance is smaller than the flipping ratio (also known to the attacker). This produces a set of packet sequences for each trace; the sequence of each mini-patch should appear in every set. As the flip ratio increases, however, packet sequences from the patch start to blend in with common packet sequences found frequently in benign (unpatched) network traces, making their identification and removal difficult. For example, for each website in Rimmer, we take 100 original network traces and perform frequency analysis. With a flip ratio of 0.2, a website has on average 45% of its packet sequences showing up in every set as false positives that look like potential patches. An aggressive attacker could remove all such high-frequency packet sequences before classifier inference, but doing so reduces the attacker's classifier accuracy to 7%. We perform this test with different values of the flip ratio β; Table 8 shows Dolos's protection success rate against a normal Var-CNN attack and against the frequency analysis attack. When the flip ratio is ≥ 0.2, the frequency analysis countermeasure offers no benefit against our defense.

Figure 9: The ECDF of information leakage on hand crafted features from WeFDE (Undefended, WTF-PAD, Mockingbird, FRONT, Dolos).

Figure 10: The ECDF of information leakage on features from models trained on defended traces (Undefended, WTF-PAD, Mockingbird, FRONT, Dolos).

Figure 11: Protection performance drops slightly as the attacker trains on traces defended by increasing numbers of secrets.

[Table 5: Protection success rate remains high as the attacker drops or flips a portion of packets before classification. Numeric entries not recovered in this copy.]

[Table 6: Protection success rate remains high as the attacker trims a portion of packets at the end of each trace before classification. Numeric entries not recovered in this copy.]

Adversarial Training Epochs   Protection Success Rate
20                            96%
40                            95%
100                           94%
200                           95%

Table 7: Protection success rate remains high when transferring to increasingly more robust models.

[Table 8: Impact of the flip ratio on the frequency analysis attack. Numeric entries not recovered in this copy.]
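The attacker's frequency analysis procedure described above (enumerate windows of mini-patch length, merge windows within the Hamming-distance budget implied by the flip ratio, and intersect across traces) might look like the following sketch; all names are illustrative.

```python
def windows(trace, m):
    """All contiguous packet subsequences of mini-patch length m."""
    return {tuple(trace[i:i + m]) for i in range(len(trace) - m + 1)}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def candidate_patches(traces, m, flip_ratio):
    """Sequences that appear, up to flip_ratio * m flipped packets,
    in every observed trace -- the attacker's patch candidates."""
    threshold = int(flip_ratio * m)
    common = windows(traces[0], m)
    for t in traces[1:]:
        ws = windows(t, m)
        common = {
            c for c in common
            if any(hamming(c, w) <= threshold for w in ws)
        }
    return common

# toy example: a fixed 4-packet "mini-patch" hidden in every trace
mini = [1, 1, -1, 1]
traces = [[-1, -1] * 5 + mini + [-1, -1] * 5 for _ in range(3)]
found = candidate_patches(traces, m=4, flip_ratio=0.2)
```

As in the experiment above, `found` contains the true mini-patch but also benign sequences that recur in every trace, illustrating why the false-positive rate makes this countermeasure costly.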
Anomaly Detection.
We also consider attackers using traditional anomaly detection techniques to distinguish patches from normal packet sequences. We compute 80 distinct patches using DF-Sirinam for each of the 900 websites in Rimmer, for a total of 72,000 patches, and compare these patches to 72,000 natural packet sequences of the same length chosen from original traces at random offsets. First, we run 2-means clustering on the feature space of the sequences (features extracted by DF-Sirinam). The resulting clusters contain 47% and 53% patches respectively, failing to distinguish patches from normal sequences. Second, we try to separate patches using supervised training: we train a DF classifier on patch and natural packet sequences. The classifier achieves 58% accuracy, only slightly better than random guessing.
Preprocessing Inputs to Disrupt Patches

A simple but potentially effective approach to defeating adversarial patches is to transform inputs before using them for training and inference [12, 25]. In the case of WF attacks, we consider three possible transformations: i) adding "noise" by randomly flipping packets, ii) adding "noise" by randomly dropping packets, and iii) truncating network traces after the first N packets. Note that the attacker processes the traces locally and does not modify any packets on the network. Our tests show that none of these transformations impacts our defense in any meaningful way. Flipping random packets in the trace degrades the classification accuracy of the attacker's classifier by 22%, but Dolos remains > 96% successful (Table 5). Randomly dropping packets decreases the protection success rate by at most 2%, but more significantly degrades the attacker's classification accuracy, by 32% (Table 5). Finally, truncating the trace degrades the attacker's classification accuracy while Dolos's protection success remains > 94% (Table 6).
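The three transformations can be sketched directly on +1/-1 direction traces; the function names are illustrative, not from the paper's implementation.

```python
import random

def flip_packets(trace, ratio, rng):
    """Randomly flip packet directions (out <-> in)."""
    return [-p if rng.random() < ratio else p for p in trace]

def drop_packets(trace, ratio, rng):
    """Randomly drop a fraction of packets."""
    return [p for p in trace if rng.random() >= ratio]

def truncate(trace, n):
    """Keep only the first n packets of the trace."""
    return trace[:n]

rng = random.Random(42)
trace = [1, -1] * 500
flipped = flip_packets(trace, 0.1, rng)
dropped = drop_packets(trace, 0.1, rng)
short = truncate(trace, 200)
```

Because each transformation perturbs patch and non-patch packets alike, it degrades the attacker's own classifier at least as much as it disrupts the patch, which is consistent with the results in Tables 5 and 6.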
Boosting WF Classifier Robustness

Next, we evaluate the feasibility of techniques that improve the robustness of attacker classifiers against adversarial patches.
Training on Multiple Patches.
In §6, the attacker trained their classifier on traces protected by a patch generated from a single secret, and failed to achieve high attack performance. Here, we consider a more general adversarial training approach that trains the attacker's model on patched traces generated from multiple distinct secrets. Training against multiple targets moves the model closer to full coverage of the space of potential adversarial patches. The attacker uses the Dolos source code (with the DF-Sirinam feature extractor) to generate defended traces using N randomly selected secrets for each website in Rimmer, and trains a Var-CNN classifier. On the defender side, the traces are protected using the DF-Sirinam feature extractor but a different secret. Figure 11 shows that as the attacker trains on patched traces generated from more secrets, their model gains a small amount of robustness. At its lowest point, the efficacy of Dolos patches drops to 87%. Across all the countermeasures we tested, this is the most effective.
Robust Attack Classifier.
In adversarial patch training, a model is iteratively retrained on adversarial patches generated against the model itself, with the goal of becoming robust to any type of adversarial patch. This is similar to, but distinct from, training on defended traces, which are generated by the defender's model. We evaluate Dolos against increasingly robust classifiers on the attacker side. Model robustness correlates directly with the number of epochs of adversarial training [41, 45, 71, 77]. In our experiment, the defender uses the DF-Sirinam feature extractor (adversarially trained for 20 epochs) to generate patches, and we test the defense against Var-CNN attack classifiers of varying robustness (adversarially trained for 20 to 200 epochs). Table 7 shows that the protection success rate remains ≥ 94% against all the robust classifiers and does not trend downwards as the model becomes more robust. This shows that generic adversarial training is less effective than training on defended traces, likely because the latter is more targeted towards the specific perturbations generated by the defender.
Training Orthogonal Classifiers.
Another countermeasure is for the attacker to explicitly avoid the features used by the defender to generate patches, and to find other features for trace classification. If successful, this would produce a classifier largely resistant to the patch. One approach is to build an attack classifier whose feature space is orthogonal to the defender's feature extractor: the attacker adds a loss term to model training that minimizes the neuron cosine similarity at an intermediate layer of the model. We train such a classifier using Rimmer and the Var-CNN model architecture. In our tests, the classifier achieves only 8% normal classification accuracy after 20 epochs of training. This suggests there are not enough alternative, orthogonal features that can accurately identify destination websites from network traces.
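The orthogonality loss term described above can be illustrated as follows. This pure-Python sketch computes only the penalty value; in a real training loop it would be a differentiable term over intermediate-layer activations, and the function names here are our own.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def orthogonality_penalty(attacker_feats, defender_feats):
    """Extra loss term: mean |cosine similarity| between the attacker
    classifier's intermediate-layer features and the defender extractor's
    features for the same batch of traces. Minimizing this pushes the two
    feature spaces towards orthogonality."""
    sims = [
        abs(cosine_similarity(a, d))
        for a, d in zip(attacker_feats, defender_feats)
    ]
    return sum(sims) / len(sims)

# identical features are maximally aligned; orthogonal ones incur no penalty
high = orthogonality_penalty([[1.0, 0.0]], [[1.0, 0.0]])
low = orthogonality_penalty([[1.0, 0.0]], [[0.0, 1.0]])
```

Adding this penalty to the classification loss forces the attacker to trade predictive features for orthogonal ones, which is why the resulting classifier's accuracy collapses in our tests.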
(We do not consider the case where the user's and attacker's secrets match, since its probability is extremely small, i.e., ≤ 1/K in the cases we tested.)

Other Countermeasures Against Adversarial Patches. There are other defenses against adversarial patches explored in the computer vision domain. However, most of them are limited to small input perturbations, less than 5% of the input [2, 19, 81], and others only work on contiguous patches [29, 47, 49]. To the best of our knowledge, no effective defense exists against the larger (30% of input) or non-contiguous adversarial patches induced by Dolos. While it is always possible that the community will develop more effective defenses against larger adversarial patches, there exists a proven lower bound on adversarial robustness that grows as the size of the perturbation increases [6]. Thus, it is difficult to be robust against large input perturbations without sacrificing classification accuracy.
The primary contribution of Dolos is an effective defense against website fingerprinting attacks (both traditional ML-based and DNN-based) that can run in real time to protect users. Our work is the first to apply the concept of adversarial patches to WF defenses. However, there are questions we have yet to study in detail. First, while most recent defenses and attacks focus on a direction-only threat model and ignore information leakage through time gaps [21, 31, 34, 58, 65, 75], some recent WF attacks [7, 59] also utilize the time gaps between packets to classify websites. We believe Dolos can be extended to defend against attacks that utilize time gaps, and plan to address this in ongoing work. Second, we have not yet studied Dolos deployed in the wild; real measurements and tests in the wild may reveal additional considerations, leading to further fine-tuning of our system design.
References

[1] Abe, K., and Goto, S. Fingerprinting attack on Tor anonymity using deep learning. APAN 42 (2016), 15–20.
[2] Akhtar, N., Liu, J., and Mian, A. Defense against universal adversarial perturbations. In Proc. of CVPR (2018), pp. 3389–3398.
[3] , 2017.
[4] Bagdasaryan, E., and Shmatikov, V. Blind backdoors in deep learning models. arXiv preprint arXiv:2005.03823 (2020).
[5] Barker, E., Barker, E., Burr, W., Polk, W., Smid, M., et al. Recommendation for key management: Part 1: General. NIST, 2006.
[6] Bhagoji, A. N., Cullina, D., and Mittal, P. Lower bounds on adversarial robustness from optimal transport. In Proc. of NeurIPS (2019), pp. 7498–7510.
[7] Bhat, S., Lu, D., Kwon, A., and Devadas, S. Var-CNN: A data-efficient website fingerprinting attack based on deep learning. PoPETS 2019, 4 (2019), 292–310.
[8] Brown, T. B., Mané, D., Roy, A., Abadi, M., and Gilmer, J. Adversarial patch. arXiv preprint arXiv:1712.09665 (2017).
[9] Cai, X., Nithyanand, R., and Johnson, R. CS-BuFLO: A congestion sensitive website fingerprinting defense. In Proc. of WPES (2014), pp. 121–130.
[10] Cai, X., Nithyanand, R., Wang, T., Johnson, R., and Goldberg, I. A systematic approach to developing and evaluating website fingerprinting defenses. In Proc. of CCS (2014), pp. 227–238.
[11] Cai, X., Zhang, X. C., Joshi, B., and Johnson, R. Touching from a distance: Website fingerprinting attacks and defenses. In Proc. of CCS (2012), pp. 605–616.
[12] Carlini, N., and Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proc. of AISec (2017).
[13] Carlini, N., and Wagner, D. Towards evaluating the robustness of neural networks. In Proc. of IEEE S&P (2017).
[14] Charles, Z., Rosenberg, H., and Papailiopoulos, D. A geometric perspective on the transferability of adversarial directions. In Proc. of AISTATS (2019), PMLR, pp. 1960–1968.
[15] Chen, P.-Y., Sharma, Y., Zhang, H., Yi, J., and Hsieh, C.-J. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In Proc. of AAAI (2018).
[16] Chen, S.-T., Cornelius, C., Martin, J., and Chau, D. H. P. ShapeShifter: Robust physical adversarial attack on Faster R-CNN object detector. In Proc. of ECML PKDD (2018), Springer, pp. 52–68.
[17] Cherubin, G. Bayes, not naïve: Security bounds on website fingerprinting defenses. PoPETS 2017, 4 (2017), 215–231.
[18] Cherubin, G., Hayes, J., and Juarez, M. Website fingerprinting defenses at the application layer. PoPETS 2017, 2 (2017), 186–203.
[19] Chiang, P.-Y., Ni, R., Abdelkader, A., Zhu, C., Studor, C., and Goldstein, T. Certified defenses for adversarial patches. arXiv preprint arXiv:2003.06693 (2020).
[20] Danezis, G. Statistical disclosure attacks. In Proc. of IFIP SEC (2003), Springer, pp. 421–426.
[21] De la Cadena, W., Mitseva, A., Hiller, J., Pennekamp, J., Reuter, S., Filter, J., Engel, T., Wehrle, K., and Panchenko, A. TrafficSliver: Fighting website fingerprinting attacks with traffic splitting. In Proc. of CCS (2020), pp. 1971–1985.
[22] Demontis, A., Melis, M., Pintor, M., Jagielski, M., Biggio, B., Oprea, A., Nita-Rotaru, C., and Roli, F. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In Proc. of USENIX Security (2019), pp. 321–338.
[23] Dyer, K. P., Coull, S. E., Ristenpart, T., and Shrimpton, T. Peek-a-boo, I still see you: Why efficient traffic analysis countermeasures fail. In Proc. of IEEE S&P (2012), IEEE, pp. 332–346.
[24] Ebrahimi, J., Rao, A., Lowd, D., and Dou, D. HotFlip: White-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751 (2017).
[25] Feinman, R., Curtin, R. R., Shintre, S., and Gardner, A. B. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410 (2017).
[26] Gong, J., and Wang, T. Zero-delay lightweight defenses against website fingerprinting. In Proc. of USENIX Security (2020), pp. 717–734.
[27] Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
[28] Grosse, K., Papernot, N., Manoharan, P., Backes, M., and McDaniel, P. Adversarial examples for malware detection. In Proc. of ESORICS (2017), Springer, pp. 62–79.
[29] Hayes, J. On visible adversarial perturbations & digital watermarking. In Proc. of CVPR (2018), pp. 1597–1604.
[30] Hayes, J., and Danezis, G. k-fingerprinting: A robust scalable website fingerprinting technique. In Proc. of USENIX Security (2016), pp. 1187–1203.
[31] Henri, S., Garcia-Aviles, G., Serrano, P., Banchs, A., and Thiran, P. Protecting against website fingerprinting with multihoming. PoPETS 2020, 2 (2020), 89–110.
[32] Herrmann, D., Wendolsky, R., and Federrath, H. Website fingerprinting: Attacking popular privacy enhancing technologies with the multinomial naïve-Bayes classifier. In Proc. of CCSW (2009), pp. 31–42.
[33] Holland, J. K., and Hopper, N. RegulaTor: A powerful website fingerprinting defense. arXiv preprint arXiv:2012.06609 (2020).
[34] Hou, C., Gou, G., Shi, J., Fu, P., and Xiong, G. WF-GAN: Fighting back against website fingerprinting attack using adversarial learning. In Proc. of ISCC (2020), IEEE, pp. 1–7.
[35] Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., and Madry, A. Adversarial examples are not bugs, they are features. In Proc. of NeurIPS (2019).
[36] Juarez, M., Imani, M., Perry, M., Diaz, C., and Wright, M. Toward an efficient website fingerprinting defense. In Proc. of ESORICS (2016), Springer, pp. 27–46.
[37] Karmon, D., Zoran, D., and Goldberg, Y. LaVAN: Localized and visible adversarial noise. In Proc. of ICML (2018), PMLR, pp. 2507–2515.
[38] Kolosnjaji, B., Demontis, A., Biggio, B., Maiorca, D., Giacinto, G., Eckert, C., and Roli, F. Adversarial malware binaries: Evading deep learning for malware detection in executables. In Proc. of EUSIPCO (2018), IEEE, pp. 533–537.
[39] Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016).
[40] Lee, J. A First Course in Combinatorial Optimization, vol. 36. Cambridge University Press, 2004.
[41] Li, B., Wang, S., Jana, S., and Carin, L. Towards understanding fast adversarial training. arXiv preprint arXiv:2006.03089 (2020).
[42] Li, S., Guo, H., and Hopper, N. Measuring information leakage in website fingerprinting attacks and defenses. In Proc. of CCS (2018), pp. 1977–1992.
[43] Liberatore, M., and Levine, B. N. Inferring the source of encrypted HTTP connections. In Proc. of CCS (2006), pp. 255–263.
[44] Lu, L., Chang, E.-C., and Chan, M. C. Website fingerprinting and identification using ordered feature sequences. In Proc. of ESORICS (2010), Springer, pp. 199–214.
[45] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In Proc. of ICLR (2018).
[46] Mallesh, N., and Wright, M. An analysis of the statistical disclosure attack and receiver-bound cover. Computers & Security 30, 8 (2011), 597–612.
[47] McCoyd, M., Park, W., Chen, S., Shah, N., Roggenkemper, R., Hwang, M., Liu, J. X., and Wagner, D. Minority reports defense: Defending against adversarial patches. arXiv preprint arXiv:2004.13799 (2020).
[48] Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., and Frossard, P. Universal adversarial perturbations. In Proc. of CVPR (2017), pp. 1765–1773.
[49] Naseer, M., Khan, S., and Porikli, F. Local gradients smoothing: Defense against localized adversarial attacks. In Proc. of WACV (2019), pp. 1300–1307.
[50] Nasr, M., Bahramali, A., and Houmansadr, A. Blind adversarial network perturbations. arXiv preprint arXiv:2002.06495 (2020).
[51] Nithyanand, R., Cai, X., and Johnson, R. Glove: A bespoke website fingerprinting defense. In Proc. of WPES (2014), pp. 131–134.
[52] Panchenko, A., Lanze, F., Pennekamp, J., Engel, T., Zinnen, A., Henze, M., and Wehrle, K. Website fingerprinting at internet scale. In Proc. of NDSS (2016).
[53] Panchenko, A., Niessen, L., Zinnen, A., and Engel, T. Website fingerprinting in onion routing based anonymization networks. In Proc. of WPES (2011), pp. 103–114.
[54] Papernot, N., McDaniel, P., Swami, A., and Harang, R. Crafting adversarial input sequences for recurrent neural networks. In Proc. of MILCOM (2016), IEEE, pp. 49–54.
[55] Perry, M. Tor protocol specification proposal, 2015. https://gitweb.torproject.org/torspec.git/tree/proposals/254-padding-negotiation.txt.
[56] Petrov, D., and Hospedales, T. M. Measuring the transferability of adversarial examples. arXiv preprint arXiv:1907.06291 (2019).
[57] Pydi, M. S., and Jog, V. Adversarial risk via optimal transport and optimal couplings. In Proc. of ICML (2020), pp. 7814–7823.
[58] Rahman, M. S., Imani, M., Mathews, N., and Wright, M. Mockingbird: Defending against deep-learning-based website fingerprinting attacks with adversarial traces. TIFS (2020), 1594–1609.
[59] Rahman, M. S., Sirinam, P., Mathews, N., Gangadhara, K. G., and Wright, M. Tik-Tok: The utility of packet timing in website fingerprinting attacks. PoPETS 2020, 3 (2020), 5–24.
[60] Rao, S., Stutz, D., and Schiele, B. Adversarial training against location-optimized adversarial patches. arXiv preprint arXiv:2005.02313 (2020).
[61] Ren, S., Deng, Y., He, K., and Che, W. Generating natural language adversarial examples through probability weighted word saliency. In Proc. of ACL (2019), pp. 1085–1097.
[62] Rimmer, V., Preuveneers, D., Juarez, M., Van Goethem, T., and Joosen, W. Automated website fingerprinting through deep learning. In Proc. of NDSS (2018).
[63] Shafahi, A., Huang, W. R., Studer, C., Feizi, S., and Goldstein, T. Are adversarial examples inevitable? In Proc. of ICLR (2019).
[64] Shalev-Shwartz, S., and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[65] Sirinam, P., Imani, M., Juarez, M., and Wright, M. Deep fingerprinting: Undermining website fingerprinting defenses with deep learning. In Proc. of CCS (2018), pp. 1928–1943.
[66] Sirinam, P., Mathews, N., Rahman, M. S., and Wright, M. Triplet fingerprinting: More practical and portable website fingerprinting with n-shot learning. In Proc. of CCS (2019), pp. 1131–1148.
[67] Song, D., Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Tramer, F., Prakash, A., and Kohno, T. Physical adversarial examples for object detectors. In Proc. of WOOT (2018).
[68] Suciu, O., Coull, S. E., and Johns, J. Exploring adversarial examples in malware detection. In Proc. of SPW (2019), IEEE, pp. 8–14.
[69] Suciu, O., Mărginean, R., Kaya, Y., Daumé III, H., and Dumitraş, T. When does machine learning fail? Generalized transferability for evasion and poisoning attacks. In Proc. of USENIX Security (2018).
[70] Sun, Q., Simon, D. R., Wang, Y.-M., Russell, W., Padmanabhan, V. N., and Qiu, L. Statistical identification of encrypted web browsing traffic. In Proc. of IEEE S&P (2002), IEEE, pp. 19–30.
[71] Tramer, F., and Boneh, D. Adversarial training and robustness for multiple perturbations. In Proc. of NeurIPS (2019), pp. 5866–5876.
[72] Uesato, J., O'Donoghue, B., Oord, A. v. d., and Kohli, P. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666 (2018).
[73] Wallace, E., Feng, S., Kandpal, N., Gardner, M., and Singh, S. Universal adversarial triggers for attacking and analyzing NLP. arXiv preprint arXiv:1908.07125 (2019).
[74] Wang, T., Cai, X., Nithyanand, R., Johnson, R., and Goldberg, I. Effective attacks and provable defenses for website fingerprinting. In Proc. of USENIX Security (2014), pp. 143–157.
[75] Wang, T., and Goldberg, I. Walkie-talkie: An efficient defense against passive website fingerprinting attacks. In Proc. of USENIX Security (2017), pp. 1375–1390.
[76] Wang, X., Jin, H., and He, K. Natural language adversarial attacks and defenses in word level. arXiv preprint arXiv:1909.06723 (2019).
[77] Wong, E., Rice, L., and Kolter, J. Z. Fast is better than free: Revisiting adversarial training. In Proc. of ICLR (2020).
[78] Wright, M. K., Adler, M., Levine, B. N., and Shields, C. The predecessor attack: An analysis of a threat to anonymous communications systems. TISSEC 7, 4 (2004), 489–522.
[79] Wright, M. K., Adler, M., Levine, B. N., and Shields, C. Passive-logging attacks against anonymous communications systems. TISSEC 11, 2 (2008), 1–34.
[80] Wu, Z., Lim, S.-N., Davis, L., and Goldstein, T. Making an invisibility cloak: Real world adversarial attacks on object detectors. arXiv preprint arXiv:1910.14667 (2019).
[81] Xiang, C., Bhagoji, A. N., Sehwag, V., and Mittal, P. PatchGuard: Provable defense against adversarial patches using masks on small receptive fields. arXiv preprint arXiv:2005.10884 (2020).
[82] Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. How transferable are features in deep neural networks? In Proc. of NeurIPS (2014).
[83] Zhang, W. E., Sheng, Q. Z., Alhazmi, A., and Li, C. Adversarial attacks on deep-learning models in natural language processing: A survey. TIST 11, 3 (2020), 1–41.

A Effectiveness against Intersection Attacks
Intersection attacks are popular attacks against anonymity systems [20, 46, 78, 79]. In the context of website fingerprinting, an intersection attacker assumes the victim visits the same websites regularly, and monitors the victim's network traffic over a longer time period (e.g., every day over multiple days). Using this additional information, the attacker can make better inferences about the user's traces. We test Dolos against the intersection attack used by a previous defense [58].

For each of the victim's browsing traces, the attacker logs the top-k results of the attack classifier (i.e., the k output websites assigned the highest probability for the trace). If a website consistently appears in the top-k results, the attacker may conclude that this site is in fact the website the user is visiting. We use the same attack setup as [58]: the attacker observes 5 separate visits to the same website and, for each visit, saves the top-10 labels predicted by the classifier (using the Var-CNN attack). The attack is successful if the correct label is the most frequently appearing label in the joint list of 50 labels (5 sets of top-10 labels). We test on 20 randomly selected websites from the Rimmer dataset. In all cases, the frequency of the correct website is far from that of the most frequent one (in the best case it ranks 4th out of 41 websites), and the correct website never appears in all 5 rounds. Thus, we conclude that intersection attacks are not effective against Dolos.

B Theoretical Justification of Defense
We show that Dolos provides provable robustness guarantees when both the attacker and defender use the same fixed feature extractor Φ.

We use recent theoretical results [6, 57] on learning 2-class classifiers in the presence of adversarial examples, which show that as the strength of the attacker (the defender, in our case) increases, the 0-1 loss of any classifier is lower bounded by a transportation cost between the conditional distributions of the two classes. In the case of image classification, which is the example considered in previous work, the budget is typically too small to observe interesting behavior in terms of this lower bound. However, since we consider network traffic traces, to which much larger amounts of perturbation can be added, we encounter non-trivial regimes of this bound. This implies that with a sufficiently large bandwidth, no classifier used by the attacker will be able to distinguish between traces from the source and target classes. In order to demonstrate this, we make the following assumptions:

1. The attacker is attempting to distinguish whether a trace x belongs to the original class W or the target class T, with distributions P_W and P_T respectively, both of which may be defended (i.e., adversarially perturbed).

2. The attacker uses a classifier function F acting over the feature space Φ(X), where F can be any measurable function and Φ : X → R^k is any fixed feature extractor. The resulting end-to-end classifier is represented by the tuple (Φ, F).

3. The defender has an ℓ2 norm perturbation budget of ε_Φ in the feature space, matching the choice of Dist(·,·) in Eq. 1. The feature space budget is related to the input space bandwidth overhead R by a mapping function M that maps balls of radius R in the input space to balls of radius at least ε_Φ in the feature space.

Given these assumptions, we can now state the following theorem, adapted from Bhagoji et al. [6]:

Theorem 1 (Upper bound on attacker success). For any pair of classes W and T with conditional probability distributions P_W and P_T, and the joint distribution P over X, and with a fixed feature extractor Φ, the attack success rate of any classifier F is

    ASR((Φ, F), P, R) ≤ (1/2) (1 + C_{ε_Φ}(Φ(P_W), Φ(P_T))).    (3)

Proof.
The end-to-end classifier has a fixed feature extractor Φ and a classifier function F that can be optimized over. To determine an upper bound on the attack success rate achievable by a classifier F, we have

    ASR((Φ, F), P, R) = E_{x ∼ P} [ min_{‖x̃ − x‖ ≤ R} 1((Φ, F)(x̃) = y) ]
                      ≤ E_{Φ(x) ∼ Φ(P)} [ min_{‖Φ(x̃) − Φ(x)‖ ≤ ε_Φ} 1(F(Φ(x̃)) = y) ]
                      = ASR(F, Φ(P), ε_Φ).

The ≤ arises from a conservative estimate of the distance moved in feature space. Having transformed the attack success rate calculation to one over the feature space, we can now directly apply Theorem 1 from [6], which gives

    max_F ASR(F, Φ(P), ε_Φ) = (1/2) (1 + C_{ε_Φ}(Φ(P_W), Φ(P_T))).

From [6],

    C_{ε_Φ}(Φ(P_W), Φ(P_T)) = inf_{P_WT ∈ Π(P_W, P_T)} E_{(x_W, x_T) ∼ P_WT} [ 1(‖Φ(x_W) − Φ(x_T)‖ ≥ ε_Φ) ],    (4)

where Π(P_W, P_T) is the set of joint distributions over X_W × X_T with marginals P_W and P_T.

Dataset | Defender's Feature Extractor | Protection Success Rate Against WF Attacks
        |                              | k-NN | k-FP | CUMUL | DF  | Var-CNN
Sirinam | DF (Sirinam)                 | 96%  | 97%  | 93%   | 95% | 91%
Sirinam | DF (Rimmer)                  | 96%  | 94%  | 97%   | 92% | 93%
Sirinam | VarCNN (Sirinam)             | 94%  | 92%  | 95%   | 93% | 96%
Sirinam | VarCNN (Rimmer)              | 97%  | 96%  | 95%   | 95% | 94%
Rimmer  | DF (Sirinam)                 | 94%  | 92%  | 95%   | 93% | 92%
Rimmer  | DF (Rimmer)                  | 95%  | 94%  | 96%   | 97% | 91%
Rimmer  | VarCNN (Sirinam)             | 94%  | 97%  | 98%   | 94% | 92%
Rimmer  | VarCNN (Rimmer)              | 96%  | 95%  | 96%   | 95% | 97%

Table 9: Protection performance of Dolos using a non-robust feature extractor against different WF attacks, when transferring to classifiers trained on different datasets and/or architectures.

Figure 12: Upper bound on the effectiveness of any attack classifier for a fixed feature extractor Φ, averaged over 500 choices of source-target pairs. (Axes: feature space L2 distance ε_Φ vs. two-class attack success rate.)
Figure 13: Variation in the distance moved in feature space with the input bandwidth overhead, averaged over 100 different targets for a fixed source. (Axes: bandwidth overhead ratio vs. feature space L2 distance ε_Φ.)

Figure 14: Variation in the distance moved in feature space with the input bandwidth overhead, averaged over a single target for 20 different sources. (Axes: bandwidth overhead ratio vs. feature space L2 distance ε_Φ.)

The main takeaway from the above theorem is that the better separated the perturbed feature vectors from the two classes are, the higher the attack success rate will be. Thus, from the defender's perspective, the bandwidth R has to be sufficient to ensure that the resulting ε_Φ leads to low separability.

Empirical upper bounds.
We compute the feature space distances between 500 different pairs of source W and target T websites and plot the maximum attack success rate as the budget ε_Φ in the feature space is varied (Figure 12). We use a robust feature extractor trained on the Rimmer dataset to derive this upper bound. With a feature space budget of 2.5, the attack success rate drops to 50%, which for a two-class classification problem implies that no classifier can distinguish between the two classes. It now remains to be established that a reasonable input bandwidth overhead can lead to a feature space budget of 2.5 (Figures 13 and 14), making it a conservative estimate for ε_Φ in Theorem 1.

Remarks. We note that our analysis here is restricted to the 2-class setting; thus the conclusions drawn may not apply to all source-target pairs. We hypothesize that this explains the lower-than-100% protection rate at the bandwidth overheads R we evaluate.
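As a rough illustration of how the empirical bound of Figure 12 can be evaluated, the sketch below estimates the transport cost of Eq. (4) from finite samples of feature vectors. This is not the paper's implementation: the function name, the use of SciPy's assignment solver, and the toy Gaussian features are all our own assumptions. The key observation is that for equal-size empirical samples, the infimum over couplings reduces to a minimum-cost bipartite matching with 0/1 costs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def empirical_asr_bound(feat_w, feat_t, eps):
    """Empirical estimate of the Theorem 1 upper bound.

    feat_w, feat_t: (n, k) arrays of feature vectors Phi(x) sampled from
    the source class W and the target class T (equal sample sizes).
    eps: the defender's feature-space L2 budget (eps_Phi in Eq. (4)).
    """
    # Pairwise L2 distances between the two sets of feature vectors.
    dists = np.linalg.norm(feat_w[:, None, :] - feat_t[None, :, :], axis=-1)
    # Cost is 1 for pairs whose feature distance is at least eps (Eq. (4)).
    cost = (dists >= eps).astype(float)
    # Min-cost perfect matching realizes the infimum over empirical couplings.
    rows, cols = linear_sum_assignment(cost)
    c_eps = cost[rows, cols].mean()   # empirical transport cost C_eps
    return 0.5 * (1.0 + c_eps)       # upper bound on attack success rate

# Toy usage: two well-separated Gaussian clusters in an 8-d feature space.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(50, 8))
t = rng.normal(5.0, 0.1, size=(50, 8))
print(empirical_asr_bound(w, t, eps=0.5))   # small budget: classes separable, bound 1.0
print(empirical_asr_bound(w, t, eps=50.0))  # huge budget: bound 0.5, no classifier beats chance
```

With a budget far smaller than the cluster separation, every matched pair incurs cost 1 and the bound is vacuous (1.0); once the budget exceeds all pairwise distances, the cost vanishes and no classifier can do better than random guessing (0.5), mirroring the behavior of Figure 12.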