A Real-time Defense against Website Fingerprinting Attacks
Shawn Shan
University of Chicago
Arjun Nitin Bhagoji
University of Chicago
Haitao Zheng
University of Chicago
Ben Y. Zhao
University of Chicago
Abstract
Anonymity systems like Tor are vulnerable to Website Fingerprinting (WF) attacks, where a local passive eavesdropper infers the victim's activity. Current WF attacks based on deep learning classifiers have successfully overcome numerous proposed defenses. While recent defenses leveraging adversarial examples offer promise, these adversarial examples can only be computed after the network session has concluded, and thus offer users little protection in practical settings. We propose Dolos, a system that modifies user network traffic in real time to successfully evade WF attacks. Dolos injects dummy packets into traffic traces by computing input-agnostic adversarial patches that disrupt the deep learning classifiers used in WF attacks. Patches are then applied to alter and protect user traffic in real time. Importantly, these patches are parameterized by a user-side secret, ensuring that attackers cannot use adversarial training to defeat Dolos. We experimentally demonstrate that Dolos provides 94+% protection against state-of-the-art WF attacks under a variety of settings. Compared to prior defenses, Dolos offers higher protection, lower information leakage, and lower bandwidth overhead. Finally, we show that Dolos is robust against a variety of adaptive countermeasures to detect or disrupt the defense.
Website fingerprinting (WF) attacks are traffic analysis attacks that allow eavesdroppers to identify websites visited by a user, despite the use of privacy tools such as VPNs or the Tor anonymity system [30, 74]. The attacker identifies webpages in an encrypted connection by analyzing and recognizing network traffic patterns. These attacks have grown more powerful over time, improving in accuracy and scale. The most recent variants can overwhelm existing defenses by training deep neural network (DNN) classifiers to identify the destination website given a network trace. In real-world settings, WF attacks have proven effective at identifying traces in the wild from a large number of candidate websites using limited data [7, 66].

There is a long list of defenses that have been proposed and then later defeated by DNN-based WF attacks. First, a class of defenses obfuscates traces by introducing randomness [23, 26, 36, 75]. These obfuscation-based defenses have been proven ineffective (<60% protection) against DNN-based attacks [7, 65]. Other defenses have proposed randomizing HTTP requests [17] or scattering traffic redirection across different Tor nodes [21, 31]. Again, these defenses provide poor protection against DNN-based attacks (<50% protection). Unsurprisingly, the success of DNN attacks has derailed efforts to deploy WF defenses on Tor (e.g., Tor stopped the implementation of WTF-PAD after it was broken by the DF attack [55, 65]).

Against these strong DNN attacks, the only defenses to show promise are recent proposals that apply adversarial examples to mislead WF classification models [34, 58]. Adversarial examples are known weaknesses of DNNs, where a small, carefully tuned change to an input can dramatically alter the DNN's output. These vulnerabilities have been studied intensely in the adversarial ML community, and are now generally regarded as a fundamental property of DNNs [35, 63]. Unfortunately, defenses based on adversarial examples have one glaring limitation.
To be effective, adversarial examples must be crafted individually for each input [48]. In the WF context, the "input" is the entire network traffic trace. Thus a defense built using adversarial examples requires the entire traffic trace to be complete before it can compute the precise perturbation necessary to mislead the attacker's WF classifier. This is problematic, since real-world attackers observe user traffic in real time, and are unaffected by a defense that can only act after the fact.

In this paper, we propose Dolos, a practical and effective defense against WF attacks that can be applied to network traffic in real time. The key insight in
Dolos is the application of the concept of trace-agnostic patches to WF defenses, derived from the concept of adversarial patches. While adversarial examples are customized for each input, adversarial patches can cause misclassifications when applied to a wide range of inputs.

Figure 1: A WF attacker, positioned between the user and the Tor network, eavesdrops on user network traffic. After the connection terminates, the attacker classifies the entire network trace using a pretrained WF attack classifier.

In the context of a WF defense, "patches" are pre-computed sequences of dummy packets that protect all network traces of visits to a specific website. A patch is applied to an active, ongoing network connection, i.e., in "real time." As we will show,
Dolos generates patches parameterized by a user-side secret such as a private key. Unlike traditional patches or universal perturbations, patches parameterized by a user's secret cannot be overcome by attackers unless they compromise the user's secret.

Our work describes experiences in designing and evaluating Dolos, and makes four key contributions:

• We propose Dolos, a new WF defense that is highly effective against the strongest attacks using deep learning classifiers. More importantly, Dolos precomputes patches before the network connection, and applies the patch to protect the user in real time.

• We introduce a secret-based parameterization mechanism for adversarial patches, which allows a user to generate randomized patches that cannot be correctly guessed without knowledge of the user-side secret. We show that this gives Dolos strong resistance to WF attacks that apply adversarial training with known adversarial patches and perturbations, which can otherwise defeat patches and universal perturbations [50].

• We evaluate Dolos under a variety of settings. Regardless of how the attacker trains its classifier (handcrafted features, deep learning, or adversarial training), Dolos provides 94+% protection against the attack. Compared to three state-of-the-art defenses [26, 36, 58], Dolos significantly outperforms all three in key metrics: overhead, protection performance, and information leakage.

• Finally, we consider attackers with full knowledge of our defense (i.e., full access to source code but not the user secret), and demonstrate that Dolos is robust against multiple adaptive attacks and countermeasures.
In WF attacks, an attacker tries to identify the destination of user traffic routed through an encrypted connection to Tor or a VPN. The attacker passively eavesdrops on the user connection and, once the session is complete, feeds the network trace as input to a machine learning classifier to identify the website visited. Even though packets across the connection are encrypted and padded to the same size, attackers can distinguish traces as sequences of packets and their directions. The attacker's machine learning classifier is trained on packet traces generated by visiting a large, pre-determined set of websites beforehand.
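The trace representation described above can be made concrete. The following sketch (our own illustration; the function name and the fixed trace length are assumptions, though lengths around 5000 packets are common in the WF literature) converts a captured packet log into the fixed-length sequence of packet directions that WF classifiers consume:

```python
# Sketch: convert a captured packet log into the fixed-length sequence of
# packet directions (+1 outgoing, -1 incoming) that WF classifiers consume.
# The function name and the fixed length are our assumptions; trace
# lengths around 5000 packets are common in the WF literature.

def to_direction_trace(packets, length=5000):
    """packets: list of (timestamp, signed_size) with sign = direction."""
    trace = [1 if size > 0 else -1 for _, size in packets]
    trace = trace[:length]                   # truncate long sessions
    trace += [0] * (length - len(trace))     # zero-pad short sessions
    return trace

# A short session: three outgoing requests, two incoming responses.
session = [(0.00, 512), (0.01, -1500), (0.02, 512),
           (0.05, -1500), (0.06, 512)]
trace = to_direction_trace(session, length=8)
# trace == [1, -1, 1, -1, 1, 0, 0, 0]
```

Zero-padding short sessions to a fixed length lets a single classifier consume sessions of any size.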
Attacks via Hand-crafted Features.
Panchenko et al. [53] proposed the first effective WF attack against Tor traffic, using a support vector machine with hand-crafted features. Follow-up work proposed stronger attacks [11, 30, 32, 43, 44, 52, 70, 74] by improving the feature set and using different classifier architectures. The most effective attacks based on hand-crafted features, such as k-NN [74], CUMUL [52], and k-FP [30], achieve over 90% accuracy in identifying websites based on network traces.
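As a simplified illustration of hand-crafted features, the sketch below computes CUMUL-style features: the cumulative sum of signed packet sizes, subsampled at n evenly spaced points. The real CUMUL feature set [52] is richer (e.g., it adds aggregate counts); this sketch is our own.

```python
# Simplified sketch of CUMUL-style features [52]: the cumulative sum of
# signed packet sizes, subsampled at n evenly spaced points. The real
# CUMUL feature set is richer (e.g., it adds aggregate counts); this
# sketch is our own illustration.

def cumul_features(signed_sizes, n=100):
    cum, total = [], 0
    for s in signed_sizes:
        total += s
        cum.append(total)
    # Subsample the cumulative curve at n evenly spaced positions.
    idx = [round(i * (len(cum) - 1) / (n - 1)) for i in range(n)]
    return [cum[i] for i in idx]

features = cumul_features([512, -1500, -1500, 512, -1500], n=5)
# features == [512, -988, -2488, -1976, -3476]
```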
Attacks via DNN Classifiers.
Recent work [1, 7, 62, 65] leverages deep neural networks (DNNs) to perform more powerful WF attacks. DNNs automatically extract features from raw network traces, and outperform previous WF attacks based on hand-crafted features. Two of the most successful DNN-based attacks are Deep Fingerprinting (DF) [65] and Var-CNN [7]. DF leverages a deep convolutional neural network for classification, and reaches over 98% accuracy on undefended traces, or traces defended using existing defenses (§2.2). Var-CNN [7] further improves attack performance by using a large residual network architecture, and can achieve high performance even with limited training data. Later, in §6, our experiments show that Dolos is effective at defending against both of these state-of-the-art attacks, as well as against those using hand-crafted features.
WF defenses modify (add, delay, or reroute) packets to prevent identification of the destination website through trace analysis. Broadly speaking, defenses either obfuscate traces based on expert-designed noise heuristics or leverage adversarial perturbations to evade machine learning classifiers.
Defenses via Trace Obfuscation.
A number of defenses aim to obfuscate traces to increase the difficulty of classification. This obfuscation is performed either at the application layer or the network layer. Since these defenses do not specifically target the attacker's classifier, they generally afford low protection (<60%) against state-of-the-art WF attacks.

In application layer obfuscation, the defender introduces randomness into HTTP requests or the Tor routing algorithm (e.g., [18, 21, 31]). Application layer defenses generally make strong assumptions that are often unrealistic in practice, such as target websites implementing customized HTTP protocols [18] or attackers only being able to observe traffic at a single Tor entry node [21, 31]. These defenses provide less than 60% protection against DNN-based attacks such as DF and Var-CNN [7, 65].

In network layer obfuscation, the defender inserts dummy packets into network traces to make website identification more challenging. Early defenses [9, 10, 23] used constant-rate padding to reduce the information leakage caused by time gaps and traffic volume, but these methods led to large bandwidth overhead (>100%). Later defenses instead compute a supersequence, a longer packet trace that contains subsequences of different websites' traces. The strongest supersequence defense, Walkie-Talkie, achieves 50% protection against DNN attacks. Overall, against strong WF attacks, network layer obfuscation defenses either induce extremely large overheads (>100%) or offer low protection (<60%).

Defenses via Adversarial Perturbation.
Goodfellow et al. first proposed evasion attacks against DNNs, where an attacker causes a DNN to misclassify inputs by adding small, adversarial perturbations [27] to them. Such attacks have been widely studied in many domains, e.g., computer vision [13, 15, 16, 39, 72], natural language processing [24, 83], and malware detection [28, 68]. Recent WF defenses [34, 58] use adversarial perturbations to defeat DNN-based attacks.

The challenge facing adversarial perturbation-based WF defenses is that adversarial perturbations are computed with respect to a given input, which in the WF context is the full network trace. Thus, computing the adversarial perturbation necessary to protect a network connection requires the defender to know the entire trace before the connection is even made. This limitation renders the defense impractical for protecting user traces in real time.

Mockingbird [58] suggests using a database of recent traces to compute perturbations, and applying them to new traces. Yet it is widely accepted in the ML literature that adversarial perturbations are input specific and rarely transfer [48]. We confirm this experimentally using Mockingbird's publicly released code: for the same website, perturbations calculated on one trace offer an average of 18% protection to a different trace from the same website. We tested 10 pairs of traces per website, over 10 websites randomly sampled from the same dataset used by Mockingbird [58].

In a concurrent manuscript, Nasr et al. [50] propose solving this limitation by precomputing universal adversarial perturbations for unseen traces, enabling the protection of live traces. However, an attacker aware of this defense can also compute these universal perturbations, and adversarially train their models against them to improve robustness. This countermeasure causes a significant drop in protection, to 76%. We show that our defense outperforms [50] (§6.4).

Figure 2: Example traces observed by a WF attacker when a Tor user u visits the same website. Each sample trace (x_u) records the packet direction (black bar: incoming packet, white bar: outgoing packet) of the first 500 packets in the session. While the Tor user visits the same website in all four sessions, the traces observed by the attacker vary across sessions.

We consider the problem of defending against website fingerprinting attacks. A user u wishes to use the Tor network to visit websites privately. An attacker, after eavesdropping on u's traffic and collecting a trace of u's website visit (x_u), attempts to use x_u to determine (or classify) the destination website that u has just visited. To defend against such attacks, the defender seeks to inject "obfuscation" traffic into the Tor network, such that the traffic trace observed by the attacker leads to a wrong classification result.

Threat Model.
We use the same threat model adopted by existing WF attacks and defenses.

• The attacker, positioned between the user and Tor guard nodes, can only observe the user's network traffic but not modify it. Furthermore, the attacker can tap into user connections over a period of time and may see traffic from multiple website requests.

• We consider WF attacks that operate on packet directions. This assumption is consistent with many previous defenses [21, 31, 34, 58, 75]. (Since Tor's traffic is encrypted and padded to the same size [32], only packet directions and time gaps leak information about the destination website.) For each of u's website sessions, the attacker collects its trace x_u as a sequence of packet directions (i.e., marking each outgoing packet as +1 and each incoming packet as -1). Figure 2 shows four examples where a Tor user visits the same website at different times. While the Tor user visits the same website in these four sessions, the traces observed by the attacker vary across sessions.

• We consider closed-world WF attacks, where the user only visits a set of websites known to the attacker. These attacks are strictly stronger than open-world attacks, in which the user may also visit websites outside the attacker's known set.
Defender Capabilities.
We make two assumptions about the defender.

• The defender can actively inject dummy packets into user traffic. With cooperation from both the user and a node in the Tor circuit (i.e., a Tor bridge), the defender can inject packets in both directions (outgoing and incoming). (Tor bridges are often used to generate incoming packets for WF defenses [26, 36, 50, 58].)

• The defender has no knowledge of the WF classifier used by the attacker.

Success Metrics. To successfully protect users against WF attacks, a WF defense needs to meet multiple criteria. First, it should successfully defend against the most powerful known attacks (i.e., those based on DNNs), and reduce their attack success to near zero. Second, it should do so without adding unreasonable overhead to users' network traffic. Third, it needs to be effective in realistic deployments, where users cannot predict the order of packets in a real-time network flow. Finally, a defense should be robust to adaptive attacks designed with full knowledge of the defense (i.e., with source code access). Earlier, in §2.2, we provided a detailed summary of existing WF defenses and their limitations with respect to these success metrics.
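The first two criteria can be phrased as simple quantities. Below is a minimal sketch of how they might be scored (our own formulation; the paper's full evaluation methodology appears in §6):

```python
# Sketch of the first two success metrics (our formulation, not the
# paper's evaluation code): protection success rate and bandwidth overhead.

def protection_rate(true_sites, predicted_sites):
    """Fraction of defended traces the attacker fails to classify
    correctly, i.e., 1 - attack accuracy."""
    hits = sum(t == p for t, p in zip(true_sites, predicted_sites))
    return 1.0 - hits / len(true_sites)

def bandwidth_overhead(n_dummy_packets, n_original_packets):
    """Fraction of extra (dummy) traffic added by the defense."""
    return n_dummy_packets / n_original_packets

# The attacker identifies only 1 of 5 defended visits correctly:
rate = protection_rate(["W1", "W2", "W3", "W4", "W5"],
                       ["W1", "W9", "W9", "W9", "W9"])
# rate == 0.8; a 30-packet patch on a 100-packet trace -> 0.3 overhead.
overhead = bandwidth_overhead(30, 100)
```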
In this paper, we present
Dolos, a new WF defense that exploits the inherent weaknesses of deep learning classifiers to provide a highly effective defense that protects network traces in real time, resists a variety of countermeasures, and incurs low overhead compared to existing defenses. In the following, we describe the key concepts behind Dolos and its design considerations. We present the detailed design of Dolos later, in §5.
Our new WF defense is inspired by the novel concept of adversarial patches [8] in computer vision. Adversarial patches are a special form of artifact which, when added to the input of a deep learning classifier, causes the input to be misclassified. Adversarial patches differ from adversarial perturbations in that patches are both input-agnostic and location-agnostic, i.e., a patch causes misclassification when applied to an input regardless of the value of the input or the location where the patch is applied. Thus adversarial patches are "universal," and can be pre-computed without full knowledge of the input. Existing works have already developed adversarial patches that bypass facial recognition classifiers [67, 73, 80].

Figure 3: Sample images showing the difference between adversarial perturbations and adversarial patches.

In computer vision, an adversarial patch is formed as a fixed-size pattern on the image [8]. Figure 3 shows an example of an adversarial patch next to an example of an adversarial perturbation. Given knowledge of the target classifier, one can search for adversarial patches under specific constraints such as patch color or intensity. For instance, for a DNN model F, a targeted adversarial patch p_adv is computed via the following optimization:

p_adv = argmin_p E_{x ∈ X, l ∈ L} loss(F(Π(p, x, l)), y_t)    (1)

where X is the set of training images, L is a distribution of locations in the image, y_t is the target label, and Π is the function that applies the patch to a random location of an image. This optimization is performed over all training images in order to make the patch effective across images.

We propose to defend against WF attacks by adding adversarial traffic patches to user traffic traces, causing attackers to misclassify destination websites. To the best of our knowledge, our work is the first to use adversarial patches to defend against WF attacks.
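To make Eq. (1) concrete, the sketch below solves a toy instance of it: a targeted patch is optimized against a small linear softmax classifier (standing in for F), averaging gradients over random inputs x and random patch locations l. All model, dimension, and hyperparameter choices here are our own illustrative assumptions; real attacks and defenses target deep networks.

```python
import numpy as np

# Toy instance of Eq. (1): optimize a targeted patch p against a small
# 2-class linear softmax model F (a stand-in for a DNN), so that writing
# p at a random location l of any input x pushes F toward target y_t = 1.
# All sizes, the model, and hyperparameters are illustrative assumptions.
rng = np.random.default_rng(0)
n, k = 100, 40                         # input length, patch length
locations = [0, 10, 20]                # the distribution L of patch locations
W = 0.1 * rng.standard_normal((2, n))  # weights of the toy classifier F

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

patch = np.zeros(k)
for _ in range(400):                   # SGD on E_{x,l} loss(F(Pi(p,x,l)), y_t)
    grad = np.zeros(k)
    for _ in range(8):                 # mini-batch of (x, l) samples
        x = rng.choice([-1.0, 1.0], size=n)
        l = locations[rng.integers(len(locations))]
        x[l:l + k] = patch             # Pi: apply the patch at location l
        probs = softmax(W @ x)
        # gradient of -log p_target w.r.t. x, restricted to the patch slice
        dx = W.T @ (probs - np.array([0.0, 1.0]))
        grad += dx[l:l + k]
    patch = np.clip(patch - 0.5 * grad / 8, -1.0, 1.0)  # relaxed to [-1, 1]

# Evaluate the same patch on fresh inputs at any trained location.
hits = 0
for _ in range(200):
    x = rng.choice([-1.0, 1.0], size=n)
    l = locations[rng.integers(len(locations))]
    x[l:l + k] = patch
    hits += int(np.argmax(W @ x) == 1)
success_rate = hits / 200
```

With a generous budget (here the patch overwrites 40 of 100 positions), a single input-agnostic patch reliably forces the target label on inputs it has never seen.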
Trace-agnostic.
This is the key property of adversarial patches and why they are a natural defense against WF attacks. Given a user u and a website W, one can design a patch that works on any network trace produced when u visits W. We note that, like adversarial perturbations, patches can be computed as the solution to a constrained optimization problem. Empirical studies have not found any limitations on the number of unique patches that can be computed for any targeted misclassification task. (While input-dependent patches have been considered [37], patches are largely utilized and analyzed in an input-agnostic context.)

Leveraging this property, the defender can pre-compute, for u, a set of W-specific adversarial patches. Once u starts to visit W, the defender fetches a pre-computed patch and injects, in real time, the corresponding dummy packets into u's live traffic. Furthermore, since patches are built using diverse training data, they are inherently robust against moderate levels of website updates and/or network dynamics. The defender periodically re-computes new patches, or does so upon detecting significant changes in website content and/or user networking environments.

Here, we describe new design considerations that arise from applying adversarial patches to network traffic traces. For a specific user/website pair (user u, website W), the defender first uses sample traces of u visiting W to compute a patch p for the pair. At run time, when u initiates a connection to W, the defender fetches p and follows the corresponding insertion schedule to add dummy packets into u's Tor traffic. No original packets are dropped or modified. Therefore, to generate adversarial patches for traffic traces, we follow the optimization process defined by Eq. (1), but change the Π function for patch injection. Note that patches are generated for each specific user and website pair (u, W).

Another key change is the "perturbation budget" (i.e., the maximum change to the input), which is now defined as the bandwidth overhead introduced by the dummy packets. Patches for images are often limited by a small perturbation budget, whereas WF defenses deployed on Tor already tolerate considerably larger bandwidth overheads [55, 65].

Strong Model Transferability of Patches.
Here, model transferability refers to the well-known phenomenon that ML classifiers trained for similar tasks share similar behaviors and vulnerabilities, even if they are trained with different architectures or training data [82]. Existing work has shown that, when applied to an input, an adversarial perturbation or patch computed for a given DNN model will transfer across models [22, 56, 69]. In addition, a perturbed or patched input will also transfer to non-DNN models such as SVMs and random forests [14, 22]. The level of transferability is particularly strong for large perturbation sizes [22, 56], as is the case for adversarial patches on network traffic. Leveraging this strong model transferability, we can build a practical WF patch without knowing the actual classifier used by WF attackers. The defender can compute adversarial patches using local WF models, and they should succeed against attacks using other WF classifiers.
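Transferability can be demonstrated on a toy example: below, a perturbation computed only against model A also degrades an independently trained model B on the same synthetic task. Both models are simple logistic regressions, and all settings are our own illustrative assumptions, not experiments from the paper.

```python
import numpy as np

# Toy demonstration of transferability: a perturbation computed only
# against model A also fools an independently trained model B.
# Both models are logistic regressions on synthetic data; all settings
# here are illustrative assumptions, not the paper's experiments.
dim = 20
w_true = np.random.default_rng(1).standard_normal(dim)

def make_data(n, seed):
    r = np.random.default_rng(seed)
    X = r.standard_normal((n, dim))
    return X, (X @ w_true > 0).astype(float)

def train_logreg(X, y, steps=500, lr=0.1):
    w = np.zeros(dim)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)   # gradient of logistic loss
    return w

w_A = train_logreg(*make_data(500, seed=2))   # defender's local model
w_B = train_logreg(*make_data(500, seed=3))   # attacker's model, new data

X_test, y_test = make_data(300, seed=4)
# FGSM-style perturbation computed ONLY from model A's weights:
eps = 1.0
X_adv = X_test - eps * np.sign(w_A) * np.sign(y_test - 0.5)[:, None]

def accuracy(w, X, y):
    return float(np.mean(((X @ w) > 0).astype(float) == y))

clean_acc_B = accuracy(w_B, X_test, y_test)
adv_acc_B = accuracy(w_B, X_adv, y_test)   # degraded, despite never seeing A
```

Because the two models learn similar decision boundaries, the perturbation crafted against A carries over to B, which is the property that lets a defender succeed without knowing the attacker's classifier.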
A standard response by WF attackers to our patch-based defense is to take patches generated by the defense, build and label patched user traces, and use these traces to (re)train an attack classifier that bypasses the defense. In the adversarial ML literature, this is termed "adversarial training," and is regarded as the most reliable defense against adversarial attacks, including adversarial perturbations, adversarial patches, and universal perturbations.

In our experiments, we find that existing WF defenses fail under such attacks. For [21], protection drops to 55%. Noteworthy is the fact that [50], which uses universal perturbations, is particularly vulnerable to this attack. An attacker aware of the defense can compute the universal perturbation themselves, and use it for adversarial training. Our results show that the protection rate of [50] under adversarial training drops dramatically, from 92% to 16%.

To resist adaptive attackers using adversarial training, we propose a novel secret-based patch generation mechanism that makes it nearly impossible for the attacker to reproduce the same patch as the defender. Specifically, the defender first computes a user-side secret S based on private keys, nonces, and website-specific identifiers, and then uses it to parameterize the optimization process of patch generation. The result is that different secrets generate significantly different patches. When applied to the same network trace, the resulting patched traces will also display significantly disjoint representations in both input and feature spaces. Without access to the user-side secret, adversarial training using patches generated by the WF attacker will have little effect, and Dolos will continue to protect users from the WF attack.

Dolos
In this section we present the design of
Dolos , starting froman overview, followed by detailed descriptions of its two keycomponents: patch generation and patch injection.
Consider the task of protecting a user u's visits to website W. Dolos implements this protection by injecting an adversarial patch p_{W,T} into u's live traffic when visiting W, such that when the (defended) network trace is analyzed by a WF attacker, its classifier will conclude that u is visiting T (a website different from W). Here T is a configurable defense parameter.

Dolos includes two key steps: patch generation, which computes an adversarial traffic patch (p_{W,T}), and patch injection, which injects a pre-computed patch into u's live traffic as u is visiting W. This is also shown in Figure 4.

Figure 4: Our proposed Dolos system that protects a user u from WF attacks. First, Dolos precomputes a patch and its packet insertion schedule to protect u's visits to website W, using u's secret and a feature extractor. Second, when u is visiting website W, Dolos defends the user's traces in real time by inserting dummy packets according to the precomputed patch and schedule.

To generate a patch, Dolos inspects the WF feature space and searches for potential adversarial patches that can effectively "move" the feature representation of a patched trace of u visiting W close to the feature representation of the (unpatched) traces of u visiting T. When these two feature representations are sufficiently similar, WF attacker classifiers will identify the patched traces of u visiting W as traces of visits to T. The above optimization can be formulated as follows:

p_{W,T} = argmin_p E_{x ∈ X_W, x' ∈ X_T, s ∈ S} D(Φ(Π(p, x, s)), Φ(x'))
subject to |p| ≤ p_budget    (2)

where p_budget defines the maximum patch overhead, X_W (X_T) defines a collection of unpatched instances of u visiting W (T), S defines the set of feasible schedules to inject a patch into live traffic, and Π(p, x, s) defines the patch injection function that injects a patch p into the live traffic x under a schedule s. Finally, Φ(·) refers to the local WF feature extractor used by Dolos, while D is a feature distance measure (an ℓ norm in our implementation). Figure 5 provides an abstract illustration of the patch and the injection results.

Randomized Design.
To prevent attackers from extracting or reverse-engineering our patches, we design Dolos to incorporate randomization into both patch generation and injection. In §5.2 and §5.3, we describe how Dolos uses a user-side secret to configure T, S, Π(p, x, s), and the optimization process of Eq. (2), implementing patches that are robust against adaptive attacks.

Choosing Φ(·). As discussed earlier,
Dolos does not assume knowledge of the WF classifiers used by the attackers. Instead, Dolos operates directly on the feature space and uses a feature extractor Φ, essentially a partial neural network trained on the same or a similar task. Dolos can train Φ locally or use a pre-trained WF classifier from a trusted party (e.g., Tor). Given an input x, Dolos uses the outputs of an intermediate layer of Φ as x's feature vector, which quantifies distance in the feature space. A well-trained Φ helps Dolos tolerate web content dynamics and functions well with a wide range of websites, both known and unknown [62].
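A minimal sketch of this design: Φ is the truncated prefix of a network, and features are read from an intermediate (here, hidden) layer. The toy 2-layer network, its random weights, and the choice of ℓ2 as the distance D are our illustrative assumptions.

```python
import numpy as np

# Sketch of the feature extractor Phi: a partial network whose
# intermediate-layer activations serve as the feature vector, with an
# l2 norm as the distance D. The toy 2-layer network and its random
# weights are illustrative assumptions, not a trained WF model.
rng = np.random.default_rng(7)
W1 = 0.05 * rng.standard_normal((64, 500))   # hidden layer, kept by Phi
W2 = 0.05 * rng.standard_normal((10, 64))    # final layer, discarded by Phi

def phi(trace):
    """Phi(x): intermediate-layer features of a +/-1 direction trace."""
    return np.maximum(W1 @ np.asarray(trace, dtype=float), 0.0)  # ReLU

def feature_distance(x1, x2):
    return float(np.linalg.norm(phi(x1) - phi(x2)))  # l2, for illustration

trace_a = rng.choice([-1.0, 1.0], size=500)
trace_b = rng.choice([-1.0, 1.0], size=500)
d_same = feature_distance(trace_a, trace_a)   # 0.0
d_diff = feature_distance(trace_a, trace_b)   # > 0 for distinct traces
```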
Patch generation follows the optimization process defined by Eq. (2). A novel contribution of Dolos is to use a secret to configure the patch generation process, so that the resulting patches are strictly conditioned on this secret. This means that when we apply patches generated with different secrets to the same original trace, the resulting patched traces will be significantly different in both input and WF feature spaces. The secret can be a one-way hash of a user's private key, a time-based nonce, and an optional website-specific modifier. This secret allows Dolos to compute multiple, distinct patches based on specific users, destination websites, and time. A defender can periodically recompute patches with updated nonces to prevent longitudinal attacks that try to identify a common patch across multiple connections to the same website. This controlled randomization prevents attackers from observing the true distribution of the patched traces over time or across multiple traces.

Figure 5: An abstract illustration of an adversarial traffic patch and how it is injected into user traffic traces. Black/white bars mark the packet directions (out/in) of original packets; red/blue bars mark the (out/in) directions of dummy packets (i.e., the patch). Dolos injects the patch into a live user trace following a specific schedule. Here we show the results when the same patch is injected using two different schedules (1 & 2) and random packet flipping (see §5.3).
Parameterized Patch Generation.
When generating a patch, Dolos uses a secret S to determine T (the target website) and the exact number of dummy packets to be injected, i.e., the length of the patch |p_{W,T}|.

Choosing T: Dolos collects a large pool of candidate target websites (400,000 in our implementation) from the Internet. To protect user u's visits to W, Dolos uses u's secret S to "randomly" select a website from the candidate pool whose feature representation is far from that of the original trace, i.e., E_{x ∈ X_W} Φ(x) is largely dissimilar from E_{x ∈ X_T} Φ(x). In our implementation, we first calculate the feature distance (ℓ norm) between W and each candidate in the large pool, identify the top 75th percentile as a reduced candidate pool for W, and use the secret S as a random seed to select one website from the reduced pool.

Choosing |p_{W,T}|: To further obfuscate the appearance of our patches, we use the same secret S to select the patch length (i.e., the number of dummy packets to be injected into the user traffic). Specifically, given p_budget (the maximum overhead ratio), we "randomly" pick a patch length between (p_budget − ε, p_budget), where ε is a defense parameter.

Patch Optimization.
When solving the patch optimization problem defined by Eq. (2), we use Stochastic Gradient Descent (SGD) [64], batching samples from X_W. We note that in order to apply SGD, we need to relax the constraints on p_{W,T} and allow it to lie in [−1, 1]^n, i.e., we allow continuous values between −1 and +1. This method is widely used to solve discrete optimization problems in domains like natural language processing [54, 61, 76] and malware classification [38], and avoids the intractability of combinatorial optimization [40]. Finally, we note that the optimization also takes into account the patch injection process (e.g., Π(p, x, s)), which we describe next in §5.3.

Managing Secrets.
Note that
Dolos manages secrets and their original components (private keys, nonces, website-specific identifiers) following standard private key management recommendations [5]. Secrets are stored only on the user's device and are updated periodically.
The patch injection component has two goals: 1) making the patch input-agnostic and location-agnostic, in order to deploy our WF defense on live traffic, and 2) obfuscating the patched traces at run time to prevent attackers from detecting and removing patches from observed traces (to recover the original trace), or from dismantling patches via trace segmentation to reduce their coverage and effectiveness.

Existing solutions (used by adversarial patches for images) do not meet these goals. They simply inject a given patch (as a fixed block) at some location in the trace. In our problem context, an attacker can easily recognize the fixed pattern by observing the user's traces over a period of time. Instead, we propose to combine a segment-based patch injection method with run-time packet-level obfuscation.
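The two mechanisms, described in detail in the following subsections, can be sketched together as follows. The segment and mini-patch sizes follow the examples given in the text (M_x = 100, M_p = 30); the function names and the uniform choice of offsets are our assumptions.

```python
import random

# Sketch of both run-time mechanisms: segment-based injection places each
# fixed mini-patch block at a scheduled offset inside its trace segment,
# and packet flipping reverses the direction of a few dummy packets per
# visit. M_x = 100 and M_p = 30 follow the examples in the text; the
# function names and uniform offsets are our assumptions.

def flip_packets(patch, n_flips, rng):
    patch = list(patch)
    for i in rng.sample(range(len(patch)), n_flips):
        patch[i] = -patch[i]                       # out <-> in
    return patch

def inject(trace, patch, m_x=100, m_p=30, rng=None):
    rng = rng or random.Random()
    defended = []
    for seg_no, start in enumerate(range(0, len(trace), m_x)):
        segment = trace[start:start + m_x]
        mini = patch[seg_no * m_p:(seg_no + 1) * m_p]
        offset = rng.randrange(len(segment) + 1)   # schedule s: block offset
        defended.extend(segment[:offset] + mini + segment[offset:])
    return defended

rng = random.Random(42)
trace = [1, -1] * 250                                   # 500 original packets
patch = flip_packets([1, -1] * 75, n_flips=5, rng=rng)  # 150 dummy packets
defended = inject(trace, patch, rng=rng)
# len(defended) == 650: all original packets kept, dummies inserted
```

Note that injection only adds packets; no original packets are dropped or modified, matching the constraint stated in §4.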
Segment-based Patch Injection.
A pre-computed patch p is a sequence of dummy packets (+1s and -1s) designed to be injected into the user's live traffic x. We first break p into multiple segments of equal size M_p (e.g., 30 packets), referred to as "mini-patches." Each mini-patch is assigned to protect a segment of x of size M_x (e.g., 100 packets). We configure the patch generation process to ensure that, within each segment of x, the corresponding mini-patch stays as a fixed block. Patches are location-agnostic, so they produce the same effect regardless of their location within the segment. Therefore, given M_x and M_p, the injection function Π(p, x, s) in Eq. (2) depends on s, an injection schedule that defines the randomly chosen location of each mini-patch within x's segments (see Figure 5). S defines all possible sets of mini-patch locations. An advantage of splitting the patch into mini-patches is that it protects against attackers trying to infer the website by searching for subsequences of packets. Our results confirm this hypothesis later, in §7.1.

Run-time Patch Obfuscation using Packet Flips.
When a single patch p is used to protect u's visits to W over some window of time, the same p could appear in multiple patched traces. While our segment-based injection hides p within the patched traces, it does not change p; a resourceful attacker could thus potentially recover p using advanced trace analysis techniques. To further strengthen the obfuscation, we apply random "packet flipping" to make p differ across visits to W. Specifically, in each visit session, we randomly choose a small set of dummy packets in p and flip their directions (out to in, in to out). We ensure that this random flipping operation is accounted for by the patch generation process (Eq. (2)), so that it does not affect the patch's effectiveness. Later in §7, we show that this random flipping deters countermeasures that leverage frequency analysis to estimate p.

Deployment Considerations.
At run time, the user and Tor bridge follow a simple protocol to protect traces. As soon as the user u requests a website W, Dolos sends the pre-computed patch p_{W,T} (after random flipping) and the current insertion schedule s to the Tor bridge through an encrypted, fixed-length (padded) network tunnel. The user and the Tor bridge then coordinate to send dummy packets to each other to achieve the protection.

In this section, we perform a systematic evaluation of Dolos under a variety of WF attack scenarios. Specifically, we evaluate Dolos against i) state-of-the-art DNN WF attacks whose classifiers use either the same or different feature extractors as Dolos (§6.2), and ii) non-DNN WF attacks that use handcrafted features (§6.3). Furthermore, we compare Dolos against existing WF defenses under these attacks (§6.4) and in terms of information leakage, which estimates the number of potential vulnerabilities facing any WF defense, including those not yet exploited by existing WF attacks (§6.5). Overall, our results show that Dolos is highly effective against state-of-the-art WF attacks (≥ 94% attack protection at a 30% bandwidth overhead). It largely outperforms existing defenses in all three key metrics: protection success rate, bandwidth overhead, and information leakage. Finally, under a simplified 2-class setting, we show that Dolos is provably robust with a sufficiently high bandwidth overhead. Our theoretical result and proof, which rely on the theory of optimal transport, appear in the Appendix.

WF Datasets.
Our experiments use two well-known WF datasets, Sirinam and Rimmer (see Table 1), which are commonly used by prior work to evaluate WF attacks and defenses [7, 33, 50, 62, 65]. Both datasets contain Tor users' website traces (as data) and the corresponding websites (as labels). Sirinam was collected by Sirinam et al. around February 2016 and includes 86,000 traces covering 95 websites [65] from the Alexa top website list [3]. Rimmer was collected by Rimmer et al. [62] in January 2017 and covers 2 million traces for visits to Alexa's top 900 websites. The two datasets partially overlap in their labels. Following prior work on WF attacks and defenses, we pad each trace in the datasets to a fixed length of 5000.
Dataset Name   # Websites   # Training Traces   # Testing Traces
Sirinam        95           76K                 10K
Rimmer         900          2M                  257K

Table 1: Two WF datasets used by our experiments.
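The fixed-length trace representation described above can be sketched as follows. Representing traces as +1/-1 packet-direction sequences follows the paper; the choice of 0 as the padding value is an assumption, since the datasets only specify the target length.

```python
TRACE_LEN = 5000  # fixed trace length used by prior WF work

def pad_trace(directions, length=TRACE_LEN, pad_value=0):
    """Pad (or truncate) a +1/-1 packet-direction sequence to a fixed
    length. The padding value is an assumption, not specified by the
    datasets themselves."""
    t = list(directions)[:length]
    return t + [pad_value] * (length - len(t))

# a short trace is padded out to 5000 entries ...
trace = pad_trace([1, -1, -1, 1])
# ... and an overly long one is truncated to 5000
long_trace = pad_trace([1] * 6000)
```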
Dolos Configuration. Next we describe how we configure Dolos's feature extractor Φ, the patch injection parameters, and the user-side secret. We build four feature extractors using two well-known DNN architectures for web trace analysis, Deep CNN (from the DF attack) and ResNet-18 (from the Var-CNN attack), and train them on the above two WF datasets. We also apply standard adversarial patch training [4, 50, 60] to fine-tune these feature extractors for 20 epochs, which helps increase the transferability of our adversarial patches. (We show the protection results of Dolos when using standard feature extractors without adversarial patch training in Table 9 in the Appendix.) Table 2 lists the resulting Φs; we name them after the model architecture and training dataset.

Model Architecture F    Training Data X   Feature Extractor Φ
Deep CNN (DF)           Sirinam           DF-Sirinam
ResNet-18 (Var-CNN)     Sirinam           VarCNN-Sirinam
Deep CNN (DF)           Rimmer            DF-Rimmer
ResNet-18 (Var-CNN)     Rimmer            VarCNN-Rimmer

Table 2: The four feature extractors (Φ) used in our experiments, their model architecture and training dataset.

When injecting patches, we set the mini-patch length M_p to 10 and the trace segment length M_x to M_p / R, where R represents the bandwidth overhead of the defense (0 < R < 1). We vary the value of M_p and find that its impact on the defense performance is insignificant; thus we empirically choose a value of 10. By default, we set the packet flipping rate β = 0.2, i.e., at run time 20% of the dummy packets in each mini-patch are flipped to the opposite direction. The impact of β on possible countermeasures is discussed later in §7.1. We use a separate dataset that contains traces from 400,000 different websites as our pool of target websites [62]. Finally, since our patch generation depends on the user-side secret, we repeat each experiment 10 times, each using a randomly generated user-side secret (per website). We report the average and standard deviation values. Overall, the standard deviations are consistently low across our experiments, i.e., < 1% for protection success rate.
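The segment-based injection and packet flipping of §5, instantiated with these parameters, can be sketched as follows. This is a simplified illustration: `inject_patch`, the seeded RNG standing in for the injection schedule s and the β-fraction flips, are our own naming and simplifications, and real mini-patch placement is additionally constrained by the patch generation process of Eq. (2).

```python
import random

M_P = 10             # mini-patch length
R = 0.30             # bandwidth overhead of the defense
M_X = int(M_P / R)   # trace segment length (33 packets per mini-patch)
BETA = 0.2           # packet flipping rate

def inject_patch(trace, patch, seed=None):
    """Split `patch` into mini-patches, flip a BETA fraction of each
    mini-patch's dummy-packet directions, and insert each mini-patch as a
    block at a random offset within its assigned trace segment."""
    rng = random.Random(seed)
    minis = [patch[i:i + M_P] for i in range(0, len(patch), M_P)]
    segments = [trace[i:i + M_X] for i in range(0, len(trace), M_X)]
    out = []
    for seg, mini in zip(segments, minis):
        # run-time obfuscation: flip ~BETA of the dummy packets
        mini = [-d if rng.random() < BETA else d for d in mini]
        # injection schedule s: random location within this segment
        offset = rng.randrange(len(seg) + 1)
        out.extend(seg[:offset] + mini + seg[offset:])
    # trailing segments without an assigned mini-patch pass through unchanged
    out.extend(trace[len(minis) * M_X:])
    return out

trace = [1, -1] * 60   # 120-packet toy trace
patch = [1] * 20       # two mini-patches of 10 dummy packets each
defended = inject_patch(trace, patch, seed=7)
```

Note how the defended trace grows by exactly `len(patch)` packets, matching the definition of the bandwidth overhead R.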
Attack Configuration. We consider two types of WF attacks: 1) non-DNN attacks using handcrafted features, i.e., k-NN [74], k-FP [30], and CUMUL [52], and 2) DNN-based attacks, i.e., DF [65] and Var-CNN [7]. We implement these attacks following their original implementations. Consistent with prior WF defenses [21, 26, 33, 58, 75], we assume that the attacker trains their classifiers on defended traces, i.e., traces patched using our defense. To generate and label such traces, the attacker downloads Dolos and runs it on their own traces when visiting a variety of websites. The attacker must supply some user-side secret to run Dolos, which we refer to as S_attack. Since the attacker has no knowledge of the user-side secret S_defense being used by Dolos to protect the current user u, we have S_defense ≠ S_attack. A relevant countermeasure is for attackers to enumerate many secrets when training the attack classifier, which we discuss later in §7.3.

Intersection Attacks.
We also consider intersection attacks in our evaluation of Dolos. Here the attacker assumes the victim visits the same websites regularly, and monitors the victim's traffic over a longer time period (e.g., days). With this information, the attacker can make better inferences about the user's traces. We test Dolos against the intersection attack used by a previous WF defense [58] and find that the attack is ineffective against Dolos. More details about our experiments and results can be found in the Appendix.
Evaluation Metrics. We test Dolos against various WF attacks using the testing traces of the two WF datasets (see Table 1). We evaluate Dolos using three metrics: 1) protection success rate, defined as the WF attack's misclassification rate on defended traces; 2) bandwidth overhead, R = (patch length) / (original trace length); and 3) information leakage, which measures the amount of potential vulnerability of any WF defense [17, 42]. We also examine the computation cost of Dolos: the average time required to compute a patch is 19 s on an Nvidia Titan X GPU and 43 s on an eight-core i9 CPU machine.

Dolos vs. DNN-based WF Attacks
Our experiments consider three attack scenarios:

• Matching attack/defense: the attacker and Dolos operate on the same original trace data X (e.g., Sirinam) and use the same model architecture F (e.g., DF). Dolos trains its feature extractor Φ using model F and data X, while the attacker trains its attack classifier using model F and the defended version of X.

• Mismatching attack/defense: the attacker and Dolos use different X and/or F when building their attack classifier and feature extractor, respectively.

• Defense effectiveness over time: Dolos starts applying a patch p to protect u's visits to W at day 0; the attacker runs fresh WF attacks in the subsequent days, training attack classifiers on defended traces freshly generated each day.

Figure 6: Against DNN-based attacks, Dolos's protection success rate increases rapidly with its bandwidth overhead R; Dolos achieves a > 97% protection rate when R reaches 30%. Assuming matching attack/defense.

Figure 7: Worst-case analysis of the impact of secret collision on Dolos, where the target feature representations T_attack and T_defense are Nth nearest neighbors in the feature space. Assuming matching attack/defense.

Figure 8: Dolos's effectiveness over time against fresh attacks after deploying a patch to protect u visiting W at day 0. The same patch is able to resist attacks that train their classifiers on newly generated defended traces in subsequent days.

[Table 3: Dolos's protection success rate against different WF attacks, for each combination of Dolos feature extractor (DF-Sirinam, DF-Rimmer, VarCNN-Sirinam, VarCNN-Rimmer) and attack (k-NN, k-FP, CUMUL, DF, Var-CNN), reported as mean ± standard deviation. The bold entries are results under matching attack/defense. Numeric entries not recovered in this copy.]
Scenario 1: Matching Attack/Defense. Figure 6 plots Dolos's protection success rate against its bandwidth overhead, for each of the four (F, X) combinations listed in Table 2. The standard deviation values are small (< 1%). Dolos consistently achieves a 97% or higher protection rate when the defense overhead R ≥ 30%. Dolos is even more effective against WF attacks whose classifiers are trained on original (undefended) traces, i.e., a 98% protection success rate at a 15% overhead. For the rest of the paper, we use R = 30% as the default configuration.
Likelihood of Secret Collisions:
In the above experiments, we randomly select the secret pair (S_attack, S_defense) and show that, in general, as long as S_attack ≠ S_defense, Dolos is highly effective against WF attacks. Next, we also run a worst-case analysis that examines cases where the combination of S_attack and S_defense leads to heavy collision in the WF feature space. That is, the patches generated from S_attack and S_defense move the feature representation of the original trace to the target feature representations of websites T_attack and T_defense, respectively, but the two targets are close in the feature space (with respect to the ℓ2 distance). Here we ask: how "close" do T_attack and T_defense need to be in order to break our defense? We answer this question in Figure 7 by plotting Dolos's protection success rate when T_attack is the Nth nearest label to T_defense in the feature space, for each of the four (F, X) combinations. We see that as long as T_attack is beyond the top 20 nearest neighbors of T_defense, Dolos maintains a > 96% protection success rate. Since our pool of target websites is very large (400,000 websites), the probability of an attacker finding a T_attack that weakens our defense is very low (p_bad ≈ 20/400,000 = 5 × 10⁻⁵). Later in §7, we show that even when the attacker trains their classifiers on defended traces produced by a large number of secrets, the impact on Dolos is still minimal.
Scenario 2: Mismatching Attack/Defense.
We now consider the more general scenario where the attacker and Dolos use different X and/or F to train their classifiers and feature extractors, respectively.

Dataset   Defense Name      Bandwidth   Protection Success Rate Against WF Attacks
                            Overhead    k-NN   k-FP   CUMUL   DF    Var-CNN   Worst Case
Sirinam   WTF-PAD           54%         87%    56%    69%     10%   11%       10%
          FRONT             80%         96%    68%    72%     31%   34%       31%
          Mockingbird       52%         94%    89%    91%     69%   73%       69%
          Dolos             30%         98%    99%    97%     96%   95%       95%
Rimmer    WTF-PAD           61%         84%    58%    72%     14%   11%       11%
          FRONT             72%         97%    62%    68%     37%   39%       37%
          Mockingbird       57%         96%    87%    90%     71%   79%       71%
          Blind Adversary   11%         -      -      -       -     76%*      -
          Dolos             10%         95%    92%    93%     87%   92%       87%
          Dolos             30%         99%    98%    98%     97%   98%       97%

Table 4: Comparing bandwidth overhead and protection success rate of WTF-PAD, FRONT, Mockingbird, Blind Adversary, and Dolos. *We take this number from the original paper [50], as the authors have not released their source code.

Here we consider two existing DNN-based WF attacks, DF and Var-CNN, trained on Sirinam or Rimmer, and configure Dolos to use one of the four feature extractors listed in Table 2. Using the test data of Sirinam and Rimmer, we evaluate Dolos against these WF attacks (DF and Var-CNN) and list Dolos's protection success rate in Table 3. As a reference, we also include the results of matching attack/defense (Scenario 1), marked in bold. Overall, Dolos remains highly effective (> 94% protection rate) against attacks using different training data and/or model architecture. This shows that our adversarial patches, generated from a local feature extractor, successfully transfer to a variety of attack classifiers.
Scenario 3: Defense Effectiveness Over Time. Next, we evaluate Dolos against freshly generated attacks over time. Here Dolos computes and deploys a patch p to protect u's visits to W at day 0; the attacker continues to run WF attacks in the subsequent days, each day training the attack classifier on defended traces generated that day. We use this experiment to examine the robustness of Dolos's patches under web content dynamics and network dynamics, also referred to as concept drift. Our experiment uses the concept drift dataset provided by Rimmer et al. [62], which was collected along with Rimmer. This dataset consists of 200 websites (a subset of the 900 websites in Rimmer), repeatedly collected over a six-week period (day 0, 3 days, 10 days, 14 days, 28 days, 42 days). We run Dolos using each of the four feature extractors to produce a patch at day 0. The attacker uses the Var-CNN classifier and trains it on fresh defended traces generated on days 3, 10, 14, 28, and 42. We see that the protection success rate remains consistently high over the six-week period (Figure 8).

Dolos vs. non-DNN Attacks
While Dolos targets the inherent vulnerability of DNN-based WF attacks, we show that the adversarial patches produced by Dolos are also highly effective against non-DNN WF attacks. Here we consider the three most effective non-DNN attacks: k-NN [74], k-FP [30], and CUMUL [52]. Table 3 lists the protection success rate of Dolos under these attacks, for each of the four feature extractors. Our results align with existing findings: adversarial patches designed for DNN models also transfer to non-DNN models [14, 22].
Dolos vs. Prior WF Defenses

Table 4 lists the performance of Dolos and four state-of-the-art defenses (WTF-PAD [36], FRONT [26], Mockingbird [58], and Blind Adversary [50], as described in §2). We evaluate them against five attacks on the two WF datasets. For Dolos, we use VarCNN-Rimmer as the local feature extractor.
WTF-PAD. WTF-PAD is reasonably effective against traditional ML attacks, but performs poorly against any DNN-based attack, i.e., its protection success rate drops to 10%. These findings align with existing observations [36, 65].
FRONT. FRONT is effective against non-DNN attacks but fails against DNN attacks. In our experiments, FRONT induces a larger bandwidth overhead than reported in its original paper [26]. The discrepancy arises because FRONT's overhead is dataset dependent; a separate paper [33] reports the same overhead as ours when applying FRONT to Sirinam.

Mockingbird.
Mockingbird is effective against non-DNN attacks and reasonably effective against DNN attacks (71% protection success). As stated earlier, the biggest drawback of Mockingbird is that the defense needs access to the full trace beforehand, making it impractical to deploy in the real world.
Blind Adversary. We were unable to obtain the source code of Blind Adversary at the time of writing. In the original paper, Blind Adversary is evaluated on the same dataset (Rimmer) using a similar robust training technique, and the authors report results for an 11% overhead under the white-box setting. Using the same setting and a similar overhead, Dolos achieves a 92% protection success rate whereas Blind Adversary achieves 76%.
Information Leakage

Recent works [17, 42] argue that WF defenses need to be evaluated beyond protection success rate, because existing WF attacks may not expose the hidden vulnerabilities of a proposed defense. Li et al. [42] propose an information leakage estimation framework (WeFDE) that measures a given defense's information leakage over a set of features. We measure Dolos's information leakage following the WeFDE framework. WeFDE computes the mutual information of trace features between different classes; a feature has high information leakage if it distinguishes traces of different websites. However, we cannot directly apply this analyzer to measure the leakage of Dolos, because Dolos does not seek to make traces from different websites indistinguishable from each other: an attacker can separate the traces defended by Dolos but has no way of knowing which websites the traces belong to. Thus, to measure the information leakage of Dolos, we need to obtain the overall distribution of defended traces agnostic to secrets, i.e., defended traces generated under all possible secrets. We approximate this distribution using traces generated with a large number of secrets, and feed the aggregated traces to WeFDE for information leakage analysis. We use all websites from Rimmer. For each website, we use DF-Sirinam and 80 different secrets to generate defended traces; we find that enumerating more than 80 secrets has limited impact on the information leakage results. We aggregate the traces before feeding them into WeFDE. We compare Dolos with three previous state-of-the-art defenses: WTF-PAD, FRONT, and Mockingbird. We measure the leakage (in bits) on two sets of features: 1) handcrafted features from the WeFDE paper [42], and 2) DNN features from a model trained on the defended traces of a given defense.
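WeFDE's per-feature leakage score is built on the mutual information between a feature and the website label. As a simplified illustration only (WeFDE itself uses kernel density estimators over continuous features; this sketch uses a plain histogram estimate over discrete values):

```python
import math
from collections import Counter

def mutual_information(feature_values, labels):
    """Histogram-based estimate, in bits, of I(feature; website label).
    A simplification of WeFDE's density-based estimator."""
    n = len(labels)
    p_f = Counter(feature_values)
    p_l = Counter(labels)
    p_fl = Counter(zip(feature_values, labels))
    mi = 0.0
    for (f, l), c in p_fl.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((p_f[f] / n) * (p_l[l] / n)))
    return mi

# a feature that perfectly separates two websites leaks 1 bit ...
leaky = mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"])
# ... while a constant feature leaks nothing
flat = mutual_information([0, 0, 0, 0], ["a", "a", "b", "b"])
```

Aggregating defended traces generated under many secrets, as described above, drives this quantity down because no single feature value remains tied to one website.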
Leakage on Hand Crafted Features. We first measure the information leakage on the default set of 3043 handcrafted features from WeFDE. This feature set covers various categories, e.g., packet n-grams, counts, and bursts. Figure 9 shows the empirical cumulative distribution function (ECDF) of information leakage across the features. The curve of Dolos rises much faster than the curves of the other defenses; for Dolos, no feature leaks more than about 1 bit.

Leakage on Features Trained on Defended Traces.
For each defense, we measure the information leakage on the feature space of Var-CNN models trained on the defended traces. For Dolos, the defended traces used by the attacker are generated with a different secret. Figure 10 shows the ECDF of the leakage across all features. Overall, this feature set leaks more information than the handcrafted features in Figure 9. Again, the curve of Dolos rises faster than the curves of the other defenses. The gap between Dolos and the other defenses is larger on this feature set, showing that Dolos is more resilient against models trained on defended traces, thanks to the randomness introduced by the secret.
In this section, we explore additional countermeasures that could be launched by attackers with complete knowledge of Dolos. We consider three classes of countermeasures: detecting Dolos patches, preprocessing inputs to disrupt Dolos patches, and boosting WF classifier robustness. Unless otherwise specified, experiments in this section run Var-CNN-based attacks on the Rimmer dataset, and the defender uses the DF-Sirinam feature extractor to generate patches (see §6).
Detecting Dolos Patches

An attacker can apply data analysis techniques to detect the presence of patches in network traces. Detection can lead to possible identification of patches and their removal.
Frequency Analysis.
An attacker who observes multiple visits to the same website by the same user might identify the defender's patch sequences using frequency analysis, if the same patch is applied to multiple traces over a period of time. In practice, this frequency analysis might be challenging because the location of the patch is randomized, Dolos randomly flips a subset of the dummy packets each time the patch is applied, and packet sequences in patches can blend in naturally with unaltered network traces.

We test the feasibility of this countermeasure. We assume the attacker has gathered network traces from 100 separate visits by the same user to a single website. For each trace, the attacker enumerates all packet sequences of the mini-patch length (known to the attacker). To address the random flipping, the attacker merges packet sequences whose Hamming distance is smaller than the flipping ratio (also known to the attacker). This produces a set of packet sequences for each trace; the sequence of each mini-patch should appear in every set. As the flip ratio increases, however, packet sequences from the patch start to blend in with common packet sequences found frequently in benign (unpatched) network traces, making their identification and removal difficult. For example, for each website in Rimmer, we take 100 original network traces and perform frequency analysis. With a flip ratio of 0.2, a website has on average 45% of its packet sequences showing up in every set as false positives that look like potential patches. An aggressive attacker could remove all such high-frequency packet sequences before classifier inference, but doing so reduces the attacker's classifier accuracy to 7%. We perform this test with different values of the flip ratio β; Table 8 shows Dolos's protection success rate against a normal Var-CNN attack and against the frequency analysis attack. When the flip ratio is ≥ 0.2, the frequency analysis countermeasure offers no benefit against our defense.

Figure 9: The ECDF of information leakage on hand crafted features from WeFDE (Undefended, WTF-PAD, Mockingbird, FRONT, Dolos).

Figure 10: The ECDF of information leakage on features from models trained on defended traces (Undefended, WTF-PAD, Mockingbird, FRONT, Dolos).

Figure 11: Protection performance drops slightly as the attacker trains on traces defended by increasing numbers of secrets.

[Table 5: Protection success rate remains high as the attacker drops or flips a portion of packets before classification. Numeric entries not recovered in this copy.]

[Table 6: Protection success rate remains high as the attacker trims a portion of packets at the end of each trace before classification. Numeric entries not recovered in this copy.]

Adversarial Training Epochs   Protection Success Rate
20                            96%
40                            95%
100                           94%
200                           95%

Table 7: Protection success rate remains high when transferring to increasingly more robust models.

[Table 8: Impact of the flip ratio on the frequency analysis attack. Numeric entries not recovered in this copy.]
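The attacker's frequency analysis procedure described above (enumerate windows of mini-patch length, merge windows within the Hamming-distance budget implied by the flip ratio, and intersect across traces) might look like the following sketch; all names are illustrative.

```python
def windows(trace, m):
    """All contiguous packet subsequences of mini-patch length m."""
    return {tuple(trace[i:i + m]) for i in range(len(trace) - m + 1)}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def candidate_patches(traces, m, flip_ratio):
    """Sequences that appear, up to flip_ratio * m flipped packets,
    in every observed trace -- the attacker's patch candidates."""
    threshold = int(flip_ratio * m)
    common = windows(traces[0], m)
    for t in traces[1:]:
        ws = windows(t, m)
        common = {
            c for c in common
            if any(hamming(c, w) <= threshold for w in ws)
        }
    return common

# toy example: a fixed 4-packet "mini-patch" hidden in every trace
mini = [1, 1, -1, 1]
traces = [[-1, -1] * 5 + mini + [-1, -1] * 5 for _ in range(3)]
found = candidate_patches(traces, m=4, flip_ratio=0.2)
```

As in the experiment above, `found` contains the true mini-patch but also benign sequences that recur in every trace, illustrating why the false-positive rate makes this countermeasure costly.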
Anomaly Detection.
We also consider attackers using traditional anomaly detection techniques to distinguish patches from normal packet sequences. We compute 80 distinct patches using DF-Sirinam for each of the 900 websites in Rimmer, for a total of 72,000 patches, and compare these patches to 72,000 natural packet sequences of the same length chosen from original traces at random offsets. First, we run 2-means clustering on the feature space of the sequences (features extracted by DF-Sirinam). The resulting clusters contain 47% and 53% patches respectively, failing to distinguish patches from normal sequences. Second, we try to separate patches using supervised training: we train a DF classifier on patch and natural packet sequences. The classifier achieves 58% accuracy, only slightly better than random guessing.
Preprocessing Inputs to Disrupt Patches

A simple but potentially effective approach to defeating adversarial patches is to transform inputs before using them for training and inference [12, 25]. In the case of WF attacks, we consider three possible transformations: i) adding "noise" by randomly flipping packets, ii) adding "noise" by randomly dropping packets, and iii) truncating network traces after the first N packets. Note that the attacker processes the traces locally and does not modify any packets on the network. Our tests show that none of these transformations impacts our defense in any meaningful way. Flipping random packets in the trace degrades the classification accuracy of the attacker's classifier by 22%, but Dolos remains > 96% successful (Table 5). Randomly dropping packets decreases the protection success rate by at most 2%, but more significantly degrades the attacker's classification accuracy, by 32% (Table 5). Finally, truncating the trace degrades the attacker's classification accuracy while Dolos's protection success remains > 94% (Table 6).
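The three transformations can be sketched directly on +1/-1 direction traces; the function names are illustrative, not from the paper's implementation.

```python
import random

def flip_packets(trace, ratio, rng):
    """Randomly flip packet directions (out <-> in)."""
    return [-p if rng.random() < ratio else p for p in trace]

def drop_packets(trace, ratio, rng):
    """Randomly drop a fraction of packets."""
    return [p for p in trace if rng.random() >= ratio]

def truncate(trace, n):
    """Keep only the first n packets of the trace."""
    return trace[:n]

rng = random.Random(42)
trace = [1, -1] * 500
flipped = flip_packets(trace, 0.1, rng)
dropped = drop_packets(trace, 0.1, rng)
short = truncate(trace, 200)
```

Because each transformation perturbs patch and non-patch packets alike, it degrades the attacker's own classifier at least as much as it disrupts the patch, which is consistent with the results in Tables 5 and 6.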
Boosting WF Classifier Robustness

Next, we evaluate the feasibility of techniques that improve the robustness of attacker classifiers against adversarial patches.
Training on Multiple Patches.
In §6, the attacker trained their classifier on traces protected by a patch generated from a single secret, and failed to achieve high attack performance. Here, we consider a more general adversarial training approach that trains the attacker's model on patched traces generated from multiple distinct secrets. Training against multiple targets moves the model closer to full coverage of the space of potential adversarial patches. The attacker uses the Dolos source code (with the DF-Sirinam feature extractor) to generate defended traces using N randomly selected secrets for each website in Rimmer, and trains a Var-CNN classifier. On the defender side, the traces are protected using the DF-Sirinam feature extractor but a different secret. Figure 11 shows that as the attacker trains on patched traces generated from more secrets, their model gains a small amount of robustness. At its lowest point, the efficacy of Dolos patches drops to 87%. Across all the countermeasures we tested, this is the most effective.
Robust Attack Classifier.
In adversarial patch training, a model is iteratively retrained on adversarial patches generated against the model itself, with the goal of becoming robust to any type of adversarial patch. This is similar to, but distinct from, training on defended traces, which are generated by the defender's model. We evaluate Dolos against increasingly robust classifiers on the attacker side. Model robustness correlates directly with the number of epochs of adversarial training [41, 45, 71, 77]. In our experiment, the defender uses the DF-Sirinam feature extractor (adversarially trained for 20 epochs) to generate patches, and we test the defense against Var-CNN attack classifiers of varying robustness (adversarially trained for 20 to 200 epochs). Table 7 shows that the protection success rate remains ≥ 94% against all the robust classifiers and does not trend downwards as the model becomes more robust. This shows that generic adversarial training is less effective than training on defended traces, likely because the latter is more targeted towards the specific perturbations generated by the defender.
Training Orthogonal Classifiers.
Another countermeasure is for the attacker to explicitly avoid the features used by the defender to generate patches, and to find other features for trace classification. If successful, this would produce a classifier largely resistant to the patch. One approach is to build an attack classifier whose feature space is orthogonal to the defender's feature extractor: the attacker adds a loss term to model training that minimizes the neuron cosine similarity at an intermediate layer of the model. We train such a classifier using Rimmer and the Var-CNN model architecture. In our tests, the classifier achieves only 8% normal classification accuracy after 20 epochs of training. This suggests there are not enough alternative, orthogonal features that can accurately identify destination websites from network traces.
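The orthogonality loss term described above can be illustrated as follows. This pure-Python sketch computes only the penalty value; in a real training loop it would be a differentiable term over intermediate-layer activations, and the function names here are our own.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def orthogonality_penalty(attacker_feats, defender_feats):
    """Extra loss term: mean |cosine similarity| between the attacker
    classifier's intermediate-layer features and the defender extractor's
    features for the same batch of traces. Minimizing this pushes the two
    feature spaces towards orthogonality."""
    sims = [
        abs(cosine_similarity(a, d))
        for a, d in zip(attacker_feats, defender_feats)
    ]
    return sum(sims) / len(sims)

# identical features are maximally aligned; orthogonal ones incur no penalty
high = orthogonality_penalty([[1.0, 0.0]], [[1.0, 0.0]])
low = orthogonality_penalty([[1.0, 0.0]], [[0.0, 1.0]])
```

Adding this penalty to the classification loss forces the attacker to trade predictive features for orthogonal ones, which is why the resulting classifier's accuracy collapses in our tests.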
(We do not consider the case where the user's and attacker's secrets match, since its probability is extremely small, i.e., ≤ 1/K in the cases we tested.)

Other Countermeasures Against Adversarial Patches. There are other defenses against adversarial patches explored in the computer vision domain. However, most of them are limited to small input perturbations, less than 5% of the input [2, 19, 81], and others only work on contiguous patches [29, 47, 49]. To the best of our knowledge, no effective defense exists against the larger (30% of input) or non-contiguous adversarial patches induced by Dolos. While it is always possible that the community will develop more effective defenses against larger adversarial patches, there exists a proven lower bound on adversarial robustness that grows as the size of the perturbation increases [6]. Thus, it is difficult to be robust against large input perturbations without sacrificing classification accuracy.
The primary contribution of Dolos is an effective defense against website fingerprinting attacks (both traditional ML-based and DNN-based) that can run in real time to protect users. Our work is the first to apply the concept of adversarial patches to WF defenses. However, there are questions we have yet to study in detail. First, while most recent defenses and attacks focus on a direction-only threat model and ignore information leakage through time gaps [21, 31, 34, 58, 65, 75], some recent WF attacks [7, 59] also utilize the time gaps between packets to classify websites. We believe Dolos can be extended to defend against attacks that utilize time gaps, and plan to address this in ongoing work. Second, we have not yet studied Dolos deployed in the wild; real measurements and tests in the wild may reveal additional considerations, leading to further fine-tuning of our system design.
References

[1] Abe, K., and Goto, S. Fingerprinting attack on Tor anonymity using deep learning. APAN 42 (2016), 15–20.
[2] Akhtar, N., Liu, J., and Mian, A. Defense against universal adversarial perturbations. In Proc. of CVPR (2018), pp. 3389–3398.
[3] , 2017.
[4] Bagdasaryan, E., and Shmatikov, V. Blind backdoors in deep learning models. arXiv preprint arXiv:2005.03823 (2020).
[5] Barker, E., Barker, E., Burr, W., Polk, W., Smid, M., et al. Recommendation for key management: Part 1: General. NIST, 2006.
[6] Bhagoji, A. N., Cullina, D., and Mittal, P. Lower bounds on adversarial robustness from optimal transport. In Proc. of NeurIPS (2019), pp. 7498–7510.
[7] Bhat, S., Lu, D., Kwon, A., and Devadas, S. Var-CNN: A data-efficient website fingerprinting attack based on deep learning. PoPETS 2019, 4 (2019), 292–310.
[8] Brown, T. B., Mané, D., Roy, A., Abadi, M., and Gilmer, J. Adversarial patch. arXiv preprint arXiv:1712.09665 (2017).
[9] Cai, X., Nithyanand, R., and Johnson, R. CS-BuFLO: A congestion sensitive website fingerprinting defense. In Proc. of WPES (2014), pp. 121–130.
[10] Cai, X., Nithyanand, R., Wang, T., Johnson, R., and Goldberg, I. A systematic approach to developing and evaluating website fingerprinting defenses. In Proc. of CCS (2014), pp. 227–238.
[11] Cai, X., Zhang, X. C., Joshi, B., and Johnson, R. Touching from a distance: Website fingerprinting attacks and defenses. In Proc. of CCS (2012), pp. 605–616.
[12] Carlini, N., and Wagner, D. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proc. of AISec (2017).
[13] Carlini, N., and Wagner, D. Towards evaluating the robustness of neural networks. In Proc. of IEEE S&P (2017).
[14] Charles, Z., Rosenberg, H., and Papailiopoulos, D. A geometric perspective on the transferability of adversarial directions. In Proc. of AISTATS (2019), PMLR, pp. 1960–1968.
[15] Chen, P.-Y., Sharma, Y., Zhang, H., Yi, J., and Hsieh, C.-J. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In Proc. of AAAI (2018).
[16] Chen, S.-T., Cornelius, C., Martin, J., and Chau, D. H. P. ShapeShifter: Robust physical adversarial attack on Faster R-CNN object detector. In Proc. of ECML PKDD (2018), Springer, pp. 52–68.
[17] Cherubin, G. Bayes, not naïve: Security bounds on website fingerprinting defenses. PoPETS 2017, 4 (2017), 215–231.
[18] Cherubin, G., Hayes, J., and Juarez, M. Website fingerprinting defenses at the application layer. PoPETS 2017, 2 (2017), 186–203.
[19] Chiang, P.-Y., Ni, R., Abdelkader, A., Zhu, C., Studor, C., and Goldstein, T. Certified defenses for adversarial patches. arXiv preprint arXiv:2003.06693 (2020).
[20] Danezis, G. Statistical disclosure attacks. In Proc. of IFIP SEC (2003), Springer, pp. 421–426.
[21] De la Cadena, W., Mitseva, A., Hiller, J., Pennekamp, J., Reuter, S., Filter, J., Engel, T., Wehrle, K., and Panchenko, A. TrafficSliver: Fighting website fingerprinting attacks with traffic splitting. In Proc. of CCS (2020), pp. 1971–1985.
[22] Demontis, A., Melis, M., Pintor, M., Jagielski, M., Biggio, B., Oprea, A., Nita-Rotaru, C., and Roli, F. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In Proc. of USENIX Security (2019), pp. 321–338.
[23] Dyer, K. P., Coull, S. E., Ristenpart, T., and Shrimpton, T. Peek-a-boo, I still see you: Why efficient traffic analysis countermeasures fail. In Proc. of IEEE S&P (2012), IEEE, pp. 332–346.
[24] Ebrahimi, J., Rao, A., Lowd, D., and Dou, D. HotFlip: White-box adversarial examples for text classification. arXiv preprint arXiv:1712.06751 (2017).
[25] Feinman, R., Curtin, R. R., Shintre, S., and Gardner, A. B. Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410 (2017).
[26] Gong, J., and Wang, T. Zero-delay lightweight defenses against website fingerprinting. In Proc. of USENIX Security (2020), pp. 717–734.
[27] Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
[28] Grosse, K., Papernot, N., Manoharan, P., Backes, M., and McDaniel, P. Adversarial examples for malware detection. In Proc. of ESORICS (2017), Springer, pp. 62–79.
[29] Hayes, J. On visible adversarial perturbations & digital watermarking. In Proc. of CVPR (2018), pp. 1597–1604.
[30] Hayes, J., and Danezis, G. k-fingerprinting: A robust scalable website fingerprinting technique. In Proc. of USENIX Security (2016), pp. 1187–1203.
[31] Henri, S., Garcia-Aviles, G., Serrano, P., Banchs, A., and Thiran, P. Protecting against website fingerprinting with multihoming. PoPETS 2020, 2 (2020), 89–110.
[32] Herrmann, D., Wendolsky, R., and Federrath, H. Website fingerprinting: Attacking popular privacy enhancing technologies with the multinomial naïve-Bayes classifier. In Proc. of CCSW (2009), pp. 31–42.
[33] Holland, J. K., and Hopper, N. RegulaTor: A powerful website fingerprinting defense. arXiv preprint arXiv:2012.06609 (2020).
[34] Hou, C., Gou, G., Shi, J., Fu, P., and Xiong, G. WF-GAN: Fighting back against website fingerprinting attack using adversarial learning. In Proc. of ISCC (2020), IEEE, pp. 1–7.
[35] Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., and Madry, A. Adversarial examples are not bugs, they are features. In Proc. of NeurIPS (2019).
[36] Juarez, M., Imani, M., Perry, M., Diaz, C., and Wright, M. Toward an efficient website fingerprinting defense. In Proc. of ESORICS (2016), Springer, pp. 27–46.
[37] Karmon, D., Zoran, D., and Goldberg, Y. LaVAN: Localized and visible adversarial noise. In Proc. of ICML (2018), PMLR, pp. 2507–2515.
[38] Kolosnjaji, B., Demontis, A., Biggio, B., Maiorca, D., Giacinto, G., Eckert, C., and Roli, F. Adversarial malware binaries: Evading deep learning for malware detection in executables. In Proc. of EUSIPCO (2018), IEEE, pp. 533–537.
[39] Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016).
[40] Lee, J. A First Course in Combinatorial Optimization, vol. 36. Cambridge University Press, 2004.
[41] Li, B., Wang, S., Jana, S., and Carin, L. Towards understanding fast adversarial training. arXiv preprint arXiv:2006.03089 (2020).
[42] Li, S., Guo, H., and Hopper, N. Measuring information leakage in website fingerprinting attacks and defenses. In Proc. of CCS (2018), pp. 1977–1992.
[43] Liberatore, M., and Levine, B. N. Inferring the source of encrypted HTTP connections. In Proc. of CCS (2006), pp. 255–263.
[44] Lu, L., Chang, E.-C., and Chan, M. C. Website fingerprinting and identification using ordered feature sequences. In Proc. of ESORICS (2010), Springer, pp. 199–214.
[45] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. In Proc. of ICLR (2018).
[46] Mallesh, N., and Wright, M. An analysis of the statistical disclosure attack and receiver-bound cover. Computers & Security 30, 8 (2011), 597–612.
[47] McCoyd, M., Park, W., Chen, S., Shah, N., Roggenkemper, R., Hwang, M., Liu, J. X., and Wagner, D. Minority reports defense: Defending against adversarial patches. arXiv preprint arXiv:2004.13799 (2020).
[48] Moosavi-Dezfooli, S.-M., Fawzi, A., Fawzi, O., and Frossard, P. Universal adversarial perturbations. In Proc. of CVPR (2017), pp. 1765–1773.
[49] Naseer, M., Khan, S., and Porikli, F. Local gradients smoothing: Defense against localized adversarial attacks. In Proc. of WACV (2019), pp. 1300–1307.
[50] Nasr, M., Bahramali, A., and Houmansadr, A. Blind adversarial network perturbations. arXiv preprint arXiv:2002.06495 (2020).
[51] Nithyanand, R., Cai, X., and Johnson, R. Glove: A bespoke website fingerprinting defense. In Proc. of WPES (2014), pp. 131–134.
[52] Panchenko, A., Lanze, F., Pennekamp, J., Engel, T., Zinnen, A., Henze, M., and Wehrle, K. Website fingerprinting at internet scale. In Proc. of NDSS (2016).
[53] Panchenko, A., Niessen, L., Zinnen, A., and Engel, T. Website fingerprinting in onion routing based anonymization networks. In Proc. of WPES (2011), pp. 103–114.
[54] Papernot, N., McDaniel, P., Swami, A., and Harang, R. Crafting adversarial input sequences for recurrent neural networks. In Proc. of MILCOM (2016), IEEE, pp. 49–54.
[55] Perry, M. Tor protocol specification proposal, 2015. https://gitweb.torproject.org/torspec.git/tree/proposals/254-padding-negotiation.txt.
[56] Petrov, D., and Hospedales, T. M. Measuring the transferability of adversarial examples. arXiv preprint arXiv:1907.06291 (2019).
[57] Pydi, M. S., and Jog, V. Adversarial risk via optimal transport and optimal couplings. In Proc. of ICML (2020), pp. 7814–7823.
[58] Rahman, M. S., Imani, M., Mathews, N., and Wright, M. Mockingbird: Defending against deep-learning-based website fingerprinting attacks with adversarial traces. TIFS (2020), 1594–1609.
[59] Rahman, M. S., Sirinam, P., Mathews, N., Gangadhara, K. G., and Wright, M. Tik-Tok: The utility of packet timing in website fingerprinting attacks. PoPETS 2020, 3 (2020), 5–24.
[60] Rao, S., Stutz, D., and Schiele, B. Adversarial training against location-optimized adversarial patches. arXiv preprint arXiv:2005.02313 (2020).
[61] Ren, S., Deng, Y., He, K., and Che, W. Generating natural language adversarial examples through probability weighted word saliency. In Proc. of ACL (2019), pp. 1085–1097.
[62] Rimmer, V., Preuveneers, D., Juarez, M., Van Goethem, T., and Joosen, W. Automated website fingerprinting through deep learning. In Proc. of NDSS (2018).
[63] Shafahi, A., Huang, W. R., Studer, C., Feizi, S., and Goldstein, T. Are adversarial examples inevitable? In Proc. of ICLR (2019).
[64] Shalev-Shwartz, S., and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[65] Sirinam, P., Imani, M., Juarez, M., and Wright, M. Deep fingerprinting: Undermining website fingerprinting defenses with deep learning. In Proc. of CCS (2018), pp. 1928–1943.
[66] Sirinam, P., Mathews, N., Rahman, M. S., and Wright, M. Triplet fingerprinting: More practical and portable website fingerprinting with n-shot learning. In Proc. of CCS (2019), pp. 1131–1148.
[67] Song, D., Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Tramer, F., Prakash, A., and Kohno, T. Physical adversarial examples for object detectors. In Proc. of WOOT (2018).
[68] Suciu, O., Coull, S. E., and Johns, J. Exploring adversarial examples in malware detection. In Proc. of SPW (2019), IEEE, pp. 8–14.
[69] Suciu, O., Mărginean, R., Kaya, Y., Daumé III, H., and Dumitraş, T. When does machine learning fail? Generalized transferability for evasion and poisoning attacks. In Proc. of USENIX Security (2018).
[70] Sun, Q., Simon, D. R., Wang, Y.-M., Russell, W., Padmanabhan, V. N., and Qiu, L. Statistical identification of encrypted web browsing traffic. In Proc. of IEEE S&P (2002), IEEE, pp. 19–30.
[71] Tramer, F., and Boneh, D. Adversarial training and robustness for multiple perturbations. In Proc. of NeurIPS (2019), pp. 5866–5876.
[72] Uesato, J., O'Donoghue, B., Oord, A. v. d., and Kohli, P. Adversarial risk and the dangers of evaluating against weak attacks. arXiv preprint arXiv:1802.05666 (2018).
[73] Wallace, E., Feng, S., Kandpal, N., Gardner, M., and Singh, S. Universal adversarial triggers for attacking and analyzing NLP. arXiv preprint arXiv:1908.07125 (2019).
[74] Wang, T., Cai, X., Nithyanand, R., Johnson, R., and Goldberg, I. Effective attacks and provable defenses for website fingerprinting. In Proc. of USENIX Security (2014), pp. 143–157.
[75] Wang, T., and Goldberg, I. Walkie-talkie: An efficient defense against passive website fingerprinting attacks. In Proc. of USENIX Security (2017), pp. 1375–1390.
[76] Wang, X., Jin, H., and He, K. Natural language adversarial attacks and defenses in word level. arXiv preprint arXiv:1909.06723 (2019).
[77] Wong, E., Rice, L., and Kolter, J. Z. Fast is better than free: Revisiting adversarial training. In Proc. of ICLR (2020).
[78] Wright, M. K., Adler, M., Levine, B. N., and Shields, C. The predecessor attack: An analysis of a threat to anonymous communications systems. TISSEC 7, 4 (2004), 489–522.
[79] Wright, M. K., Adler, M., Levine, B. N., and Shields, C. Passive-logging attacks against anonymous communications systems. TISSEC 11, 2 (2008), 1–34.
[80] Wu, Z., Lim, S.-N., Davis, L., and Goldstein, T. Making an invisibility cloak: Real world adversarial attacks on object detectors. arXiv preprint arXiv:1910.14667 (2019).
[81] Xiang, C., Bhagoji, A. N., Sehwag, V., and Mittal, P. PatchGuard: Provable defense against adversarial patches using masks on small receptive fields. arXiv preprint arXiv:2005.10884 (2020).
[82] Yosinski, J., Clune, J., Bengio, Y., and Lipson, H. How transferable are features in deep neural networks? In Proc. of NeurIPS (2014).
[83] Zhang, W. E., Sheng, Q. Z., Alhazmi, A., and Li, C. Adversarial attacks on deep-learning models in natural language processing: A survey. TIST 11, 3 (2020), 1–41.

A Effectiveness against Intersection Attacks
Intersection attacks are popular attacks against anonymity systems [20, 46, 78, 79]. In the context of website fingerprinting, an intersection attacker assumes the victim visits the same websites regularly, and monitors the victim's network traffic over a longer time period (e.g., every day over multiple days). Using this additional information, the attacker can make better inferences about the user's traces. We test Dolos against the intersection attack used by a previous defense [58].

For each of the victim's browsing traces, the attacker logs the top-k results of the attack classifier (i.e., the k output websites assigned the highest probability for the trace). If a website consistently appears in the top-k results, the attacker may conclude that this site is in fact the website the user is visiting. We use the same attack setup as [58]: the attacker observes 5 separate visits to the same website and, for each visit, saves the top-10 labels predicted by the classifier (using the Var-CNN attack). The attack is successful if the correct label is the most frequently appearing label in the joint list of 50 labels (5 sets of top-10 labels). We test on 20 randomly selected websites from the Rimmer dataset. In all cases, the frequency of the correct website is far from that of the most frequent one (in the best case it ranks 4th out of 41 websites), and the correct website never appears in all 5 rounds. Thus, we conclude that intersection attacks are not effective against Dolos.

B Theoretical Justification of Defense
We show that Dolos provides provable robustness guarantees when both the attacker and defender use the same fixed feature extractor Φ.

We use recent theoretical results [6, 57] on learning 2-class classifiers in the presence of adversarial examples, which show that as the strength of the attacker (the defender, in our case) increases, the 0-1 loss of any classifier is lower bounded by a transportation cost between the conditional distributions of the two classes. In the case of image classification, which is the example considered in previous work, the budget is typically too small to observe interesting behavior in terms of this lower bound. However, since we consider network traffic traces, to which much larger amounts of perturbation can be added, we encounter non-trivial regimes of this bound. This implies that with a sufficiently large bandwidth, no classifier used by the attacker will be able to distinguish between traces from the source and target classes. In order to demonstrate this, we make the following assumptions:

1. The attacker is attempting to distinguish whether a trace x belongs to the original class W or the target class T, with distributions P_W and P_T respectively, both of which may be defended (i.e., adversarially perturbed).

2. The attacker uses a classifier function F acting over the feature space Φ(X), where F can be any measurable function and Φ : X → R^k is any fixed feature extractor. The resulting end-to-end classifier is represented by the tuple (Φ, F).

3. The defender has an ℓ2 norm perturbation budget of ε_Φ in the feature space, matching the choice of Dist(·,·) in Eq. 1. The feature space budget is related to the input space bandwidth overhead R by a mapping function M that maps balls of radius R in the input space to balls of radius at least ε_Φ in the feature space.

Given these assumptions, we can now state the following theorem, adapted from Bhagoji et al. [6]:

Theorem 1 (Upper bound on attacker success). For any pair of classes W and T with conditional probability distributions P_W and P_T, and the joint distribution P over X, and with a fixed feature extractor Φ, the attack success rate of any classifier F is

    ASR((Φ, F), P, R) ≤ (1/2) (1 + C_{ε_Φ}(Φ(P_W), Φ(P_T))).    (3)

Proof.
The end-to-end classifier has a fixed feature extractor Φ and a classifier function F that can be optimized over. To determine an upper bound on the attack success rate achievable by a classifier F, we have

    ASR((Φ, F), P, R) = E_{x ∼ P} [ min_{‖x̃ − x‖ ≤ R} 1((Φ, F)(x̃) = y) ]
                      ≤ E_{Φ(x) ∼ Φ(P)} [ min_{‖Φ(x̃) − Φ(x)‖ ≤ ε_Φ} 1(F(Φ(x̃)) = y) ]
                      = ASR(F, Φ(P), ε_Φ).

The ≤ arises from a conservative estimate of the distance moved in feature space. Having transformed the attack success rate calculation to one over the feature space, we can now directly apply Theorem 1 from [6], which gives

    max_F ASR(F, Φ(P), ε_Φ) = (1/2) (1 + C_{ε_Φ}(Φ(P_W), Φ(P_T))).

From [6],

    C_{ε_Φ}(Φ(P_W), Φ(P_T)) = inf_{P_WT ∈ Π(P_W, P_T)} E_{(x_W, x_T) ∼ P_WT} [ 1(‖Φ(x_W) − Φ(x_T)‖ ≥ ε_Φ) ],    (4)

where Π(P_W, P_T) is the set of joint distributions over X_W × X_T with marginals P_W and P_T.

Dataset | Defender's Feature Extractor | Protection Success Rate Against WF Attacks
        |                              | k-NN | k-FP | CUMUL | DF  | Var-CNN
Sirinam | DF (Sirinam)                 | 96%  | 97%  | 93%   | 95% | 91%
Sirinam | DF (Rimmer)                  | 96%  | 94%  | 97%   | 92% | 93%
Sirinam | VarCNN (Sirinam)             | 94%  | 92%  | 95%   | 93% | 96%
Sirinam | VarCNN (Rimmer)              | 97%  | 96%  | 95%   | 95% | 94%
Rimmer  | DF (Sirinam)                 | 94%  | 92%  | 95%   | 93% | 92%
Rimmer  | DF (Rimmer)                  | 95%  | 94%  | 96%   | 97% | 91%
Rimmer  | VarCNN (Sirinam)             | 94%  | 97%  | 98%   | 94% | 92%
Rimmer  | VarCNN (Rimmer)              | 96%  | 95%  | 96%   | 95% | 97%

Table 9: Protection performance of Dolos using a non-robust feature extractor against different WF attacks, when transferring to classifiers trained on different datasets and/or architectures.

Figure 12: Upper bound on the effectiveness of any attack classifier for a fixed feature extractor Φ, averaged over 500 choices of source-target pairs. (Axes: feature space L2 distance ε_Φ vs. two-class attack success rate.)
Figure 13: Variation in the distance moved in feature space with the input bandwidth overhead, averaged over 100 different targets for a fixed source. (Axes: bandwidth overhead ratio vs. feature space L2 distance ε_Φ.)

Figure 14: Variation in the distance moved in feature space with the input bandwidth overhead, averaged over a single target for 20 different sources. (Axes: bandwidth overhead ratio vs. feature space L2 distance ε_Φ.)

The main takeaway from the above theorem is that the better separated the perturbed feature vectors from the two classes are, the higher the attack success rate will be. Thus, from the defender's perspective, the bandwidth R has to be sufficient to ensure that the resulting ε_Φ leads to low separability.

Empirical upper bounds.
We compute the feature space distances between 500 different pairs of source W and target T websites and plot the maximum attack success rate as the budget ε_Φ in the feature space is varied (Figure 12). We use a robust feature extractor trained on the Rimmer dataset to derive this upper bound. With a feature space budget of 2.5, the attack success rate drops to 50%, which for a two-class classification problem implies that no classifier can distinguish between the two classes. It now remains to be established that a reasonable input bandwidth overhead can lead to a feature space budget of 2.5 (Figures 13 and 14), making it a conservative estimate for ε_Φ in Theorem 1.

Remarks. We note that our analysis here is restricted to the 2-class setting; thus the conclusions drawn may not apply to all source-target pairs. We hypothesize that this explains the lower-than-100% protection rate at the bandwidth overheads R we evaluate.
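As a rough illustration of how the empirical bound of Figure 12 can be evaluated, the sketch below estimates the transport cost of Eq. (4) from finite samples of feature vectors. This is not the paper's implementation: the function name, the use of SciPy's assignment solver, and the toy Gaussian features are all our own assumptions. The key observation is that for equal-size empirical samples, the infimum over couplings reduces to a minimum-cost bipartite matching with 0/1 costs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def empirical_asr_bound(feat_w, feat_t, eps):
    """Empirical estimate of the Theorem 1 upper bound.

    feat_w, feat_t: (n, k) arrays of feature vectors Phi(x) sampled from
    the source class W and the target class T (equal sample sizes).
    eps: the defender's feature-space L2 budget (eps_Phi in Eq. (4)).
    """
    # Pairwise L2 distances between the two sets of feature vectors.
    dists = np.linalg.norm(feat_w[:, None, :] - feat_t[None, :, :], axis=-1)
    # Cost is 1 for pairs whose feature distance is at least eps (Eq. (4)).
    cost = (dists >= eps).astype(float)
    # Min-cost perfect matching realizes the infimum over empirical couplings.
    rows, cols = linear_sum_assignment(cost)
    c_eps = cost[rows, cols].mean()   # empirical transport cost C_eps
    return 0.5 * (1.0 + c_eps)       # upper bound on attack success rate

# Toy usage: two well-separated Gaussian clusters in an 8-d feature space.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(50, 8))
t = rng.normal(5.0, 0.1, size=(50, 8))
print(empirical_asr_bound(w, t, eps=0.5))   # small budget: classes separable, bound 1.0
print(empirical_asr_bound(w, t, eps=50.0))  # huge budget: bound 0.5, no classifier beats chance
```

With a budget far smaller than the cluster separation, every matched pair incurs cost 1 and the bound is vacuous (1.0); once the budget exceeds all pairwise distances, the cost vanishes and no classifier can do better than random guessing (0.5), mirroring the behavior of Figure 12.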