Adversarial CAPTCHAs

Chenghui Shi, Xiaogang Xu, Shouling Ji, Kai Bu, Jianhai Chen, Raheem Beyah, and Ting Wang

• C. Shi, X. Xu, K. Bu, and J. Chen are with the College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027, China. E-mail: {chenghuishi, xiaogangxu, kaibu, chenjh919}@zju.edu.cn
• S. Ji is with the College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310027, China, and with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA. E-mail: [email protected]
• R. Beyah is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA. E-mail: [email protected]
• T. Wang is with the Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA 18015, USA. E-mail: [email protected]
Abstract—Following the principle of "to set one's own spear against one's own shield," we study how to design adversarial CAPTCHAs in this paper. We first identify the similarities and differences between adversarial CAPTCHA generation and existing adversarial example (image) generation research. Then, we propose a framework for text-based and image-based adversarial CAPTCHA generation on top of state-of-the-art adversarial image generation techniques. Finally, we design and implement an adversarial CAPTCHA generation and evaluation system, named aCAPTCHA, which integrates 10 image preprocessing techniques, 9 CAPTCHA attacks, 4 baseline adversarial CAPTCHA generation methods, and 8 new adversarial CAPTCHA generation methods. To examine the performance of aCAPTCHA, extensive security and usability evaluations are conducted. The results demonstrate that the generated adversarial CAPTCHAs can significantly improve the security of normal CAPTCHAs while maintaining similar usability. To facilitate CAPTCHA security research, we also open source the aCAPTCHA system, including the source code, trained models, datasets, and the usability evaluation interfaces.
Index Terms—CAPTCHA, Adversarial Image, Deep Learning, Usable Security.
1 INTRODUCTION
CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a type of challenge-response test in computing used to distinguish between humans and automated programs (machines). The first generation of CAPTCHA was invented in 1997, while the term "CAPTCHA" was first coined in 2002 [1] [2]. Ever since its invention, CAPTCHA has been widely used to improve the security of websites and various online applications and to prevent the abuse of online services, such as phishing, bots, spam, and Sybil attacks.
Existing CAPTCHA Schemes.
In general, existing popular CAPTCHAs can be classified into four categories:

(1) Text-based CAPTCHA. Text-based CAPTCHA schemes ask users to recognize a string of distorted characters with/without an obfuscated background [7] [8]. Due to its simplicity and high efficiency, text-based CAPTCHA is the most widely deployed and accepted form, both now and in the foreseeable future [7] [8].
(2) Image-based CAPTCHA. Image-based CAPTCHA is another popular scheme, which usually asks users to select one or more images with specific semantic meanings from a set of candidate images [25]. It is motivated by the intuition that, compared with a string of characters, images carry much richer information and have a larger variation space. Meanwhile, there are still many hard, open problems in image perception and interpretation, especially in noisy environments. Thus, to some extent, image-based CAPTCHA is more secure than text-based CAPTCHA. Nevertheless, to the best of our knowledge, a comprehensive comparative analysis of the security and usability of text- and image-based CAPTCHAs is still missing. Recently, many variants of image-based CAPTCHAs were proposed, such as slide-based CAPTCHA, which asks users to slide a puzzle piece to the right part of an image [56], and click-based CAPTCHA, which asks users to click specific semantic regions of an image [55].
(3) Audio-based CAPTCHA. Audio-based CAPTCHA asks users to recognize the voice content in a piece of audio [1] [2]. In most practical applications, audio-based CAPTCHA is used together with text-based CAPTCHA as a complementary means, mainly because of its usability issues, especially for non-native speakers of the audio language.
(4) Video-based CAPTCHA. Video-based CAPTCHA is a new kind of CAPTCHA that asks users to finish a content-based video labeling task [34]. It is usually more complex and takes users more time to finish correctly compared with other forms of CAPTCHAs. Thus, it is not widely adopted and is seldom seen in practice.

There are also other proposals for CAPTCHA design, e.g., game-based CAPTCHA [54] and inference-based CAPTCHA [57]. However, they are not widely deployed yet for various reasons, e.g., security issues, accessibility limitations, and performance issues. In this paper, our study mainly focuses on text- and image-based CAPTCHAs. The reason is evident: they are the most accepted and widely used CAPTCHAs up to now and in the foreseeable future, so the study of their security and usability has greater potential implications for practical applications.
Issues of CAPTCHAs and Motivation.
Generally speaking, a CAPTCHA can be evaluated according to its security performance, which refers to the strength and resilience of the CAPTCHA against various attacks, and its usability performance, which refers to how user friendly the CAPTCHA is [1] [2]. From the security perspective, it is not news to see reports that a CAPTCHA scheme has been broken by some attack [1] [2]. The evolution of CAPTCHAs always moves forward in a spiral, constantly accompanied by emerging attacks. For text-based CAPTCHAs, the security goal of the earliest versions was to defend against Optical Character Recognition (OCR) based attacks; therefore, many distortion techniques (e.g., varied fonts, varied font sizes, and rotation) were applied. Over the last decade, machine learning algorithms have become more and more powerful. Following the seminal work demonstrating that computers can outperform humans in recognizing characters, even under severe distortion, many successful attacks on text-based CAPTCHAs were proposed, including both generic attacks that target multiple text-based CAPTCHA schemes [7] [8] and specialized attacks that target one kind of text-based CAPTCHA [24]. Although it is possible to improve the security of text-based CAPTCHAs by increasing the distortion and obfuscation levels, their usability would be significantly affected [7] [8].

The same dilemma exists for image-based CAPTCHAs. With the prosperity of machine learning research, especially recent progress in deep learning, Deep Neural Networks (DNNs) have achieved impressive success in image classification/recognition, matching or even outperforming the cognitive ability of humans in complex tasks with thousands of classes [16]. Along with such progress, many DNN-based attacks have been proposed to crack image-based CAPTCHAs with very high success probability, as demonstrated by a large number of reports [31]. To defend against existing attacks, the intuition is to rely on high-level image semantics and develop more complex image-based CAPTCHAs, e.g., recognizing an image object by utilizing its surrounding context [30]. Leaving the security gains aside, such designs usually induce poor usability [1] [2]. To make things worse, unlike text-based CAPTCHAs, it is difficult, if not impossible, for designers to generate specific images with required semantic meanings through certain rules. In other words, it is too labor-intensive to collect labeled images at a large scale.

In summary, existing text- and image-based CAPTCHAs are facing challenges from both the security and the usability perspectives. It is desirable to develop a new CAPTCHA scheme that achieves high security while preserving proper usability, i.e., one that seeks a better balance between security and usability.

Our Methodology and Contributions.
To address the dilemma of existing text- and image-based CAPTCHAs, we start by analyzing state-of-the-art attacks. It is not surprising that most, if not all, of the attacks on text- and image-based CAPTCHAs are based on machine learning techniques, and the latest and most powerful ones are mainly based on deep learning, typically CNNs. This is mainly because the development of CAPTCHA attacks is rooted in the progress of machine learning research, as discussed before.

On the other hand, with the progress of machine learning research, researchers have found that many machine learning models, especially neural networks, are vulnerable to adversarial examples, which are defined as elaborately (maliciously, from the model's perspective) crafted inputs that are imperceptible to humans but can fool a machine learning model into producing undesirable behavior, e.g., incorrect outputs [39]. Inspired by this fact, is it possible for us to design a new kind of CAPTCHA by proactively attacking existing CAPTCHA attacks, i.e., "to set one's own spear against one's own shield"?

Following this inspiration, we study how to generate text- and image-based CAPTCHAs based on adversarial learning, i.e., text-based adversarial CAPTCHAs and image-based adversarial CAPTCHAs, that are resilient to state-of-the-art CAPTCHA attacks while preserving high usability. Specifically, we have three main objectives in the design: (1) security, which implies that the developed CAPTCHAs can effectively defend against state-of-the-art attacks, especially the powerful deep learning based attacks; (2) usability, which implies that the developed CAPTCHAs should be usable in practice and maintain a good user experience; and (3) compatibility, which implies that the proposed CAPTCHA generation scheme is compatible with existing text- and image-based CAPTCHA deployments and applications.

With the above goals in mind, we study how to inject human-tolerable, preprocessing-resilient (i.e., not removable by CAPTCHA attacks) perturbations into traditional CAPTCHAs. Specifically, we design and implement a novel system, aCAPTCHA, to generate and evaluate text- and image-based adversarial CAPTCHAs.

Our main contributions can be summarized as follows.

(1) Following our design principle, we propose a framework for generating adversarial CAPTCHAs on top of existing adversarial example (image) generation techniques. Specifically, we propose four text-based and four image-based adversarial CAPTCHA generation methods. Then, we design and implement a comprehensive adversarial CAPTCHA generation and evaluation system, named aCAPTCHA, which integrates 10 image preprocessing techniques, 9 CAPTCHA attacks, 4 baseline adversarial CAPTCHA generation methods, and 8 new adversarial CAPTCHA generation methods. aCAPTCHA can be used for the generation, security evaluation, and usability evaluation of both text- and image-based adversarial CAPTCHAs.

(2) To examine the performance of the adversarial CAPTCHAs generated by aCAPTCHA, we conducted extensive security and usability evaluations. The results demonstrate that the generated adversarial CAPTCHAs can significantly improve the security of normal CAPTCHAs while maintaining similar usability.

(3) We open source the aCAPTCHA system at [60], including the source code, trained models, datasets, and the interfaces for usability evaluation. We expect that aCAPTCHA can facilitate CAPTCHA security research and shed light on designing more secure and usable adversarial CAPTCHAs.
2 BACKGROUND
In this section, we briefly introduce adversarial examplesand the corresponding defense technologies.
Neural networks have achieved great performance in a wide range of application domains, especially image recognition. However, recent work has discovered that existing machine learning models, including neural networks, are vulnerable to adversarial examples. Specifically, suppose we have a classifier F with model parameters θ. Let x be an input to the classifier with corresponding ground-truth prediction y. An adversarial example x′ is an instance in the input space that is close to x according to some distance metric d(x, x′) but causes the classifier F_θ to produce an incorrect output. Adversarial examples that affect one model often affect another model, even if the two models have different architectures or were trained on different training sets, as long as both models were trained to perform the same task [43].

Prior work that considers adversarial examples under a number of threat models can be broadly classified into two categories: white-box attacks, where the adversary has full knowledge of the model F_θ, including the model architecture and parameters, and black-box attacks, where the adversary has no or little knowledge of the model F_θ. The construction of an adversarial example depends mainly on the gradient information of the target model. In the white-box setting [10], [11], [41], the gradient of the model is always visible to the attacker, so it is easy for an attacker to generate adversarial examples. In the black-box setting [43], [44], [50], attackers cannot obtain gradient information directly. There are usually two ways to generate adversarial examples in this case. The first is to approximate the gradient information through query operations [50], i.e., sending an image to the target model and obtaining the output distribution; after many rounds of queries, attackers can approximate the target model's gradient and generate adversarial examples. The second is to take advantage of the transferability of adversarial examples [43]: as mentioned above, adversarial examples that affect one model often affect another. An attacker can train his own local model, generate adversarial examples against the local model with white-box methods, and transfer them to a victim model of which he has limited knowledge. In this paper, we rely on the second method, which corresponds to the black-box setting, to generate adversarial CAPTCHAs against machine learning based attacks.

Due to the security threats caused by adversarial examples, improving the robustness of deep neural networks against adversarial perturbation has been an active field of research, and various defensive techniques against adversarial examples have been proposed. We roughly divide them into three categories.
(1) Adversarial Training [41], [45].
The idea is simple and effective: one can retrain neural networks directly on adversarial examples until the model learns to classify them correctly. This makes the network robust against the adversarial examples in the test set and improves the overall generalization capability of the network. However, it does not resolve the problem completely, as adversarial training can only be effective against the specific adversarial example generation algorithms used in the retraining phase. Moreover, adversarial training has been shown to be difficult at a large scale, e.g., the ImageNet scale.
(2) Gradient Masking [42], [49].
This method tries to prevent an attacker from accessing the useful gradient information of a model. As mentioned above, the construction of an adversarial example depends mainly on the gradient information of the target model; without useful gradient information, it is hard for attackers to perform an attack. However, gradient masking is usually not effective against black-box attacks, because an adversary can run his attack algorithm on an easy-to-attack model and transfer the resulting adversarial examples to the hard-to-attack model.
(3) Input Transformation [47], [48], [51].
This kind of method generally does not change the structure of the neural network. The main idea is to preprocess or transform the input data, e.g., by image cropping, rescaling, and bit-depth reduction, in order to remove adversarial perturbation, and then feed the transformed image to an unmodified classifier (a sketch of such transformations follows below). This method is easy to circumvent by white-box attacks, because attackers can mirror the transformation in the attack algorithm, e.g., by considering similar operations during adversarial example generation. Against black-box attacks, it can provide good protection. However, input transformation cannot eliminate the adversarial perturbation in the input data; it only decreases the attack success rate.

In general, it is a fundamental problem that neural networks are vulnerable to adversarial perturbation. Existing defense methods only mitigate the attacks to some extent. Thus, dedicated in-depth research is still expected in this area.
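As an illustration of such input transformations, the following sketch implements bit-depth reduction and rescaling with PIL and NumPy. The function names and parameter values are ours, not from any particular defense implementation.

```python
# A minimal sketch of input-transformation defenses (bit-depth
# reduction and rescaling), assuming 8-bit grayscale inputs.
import numpy as np
from PIL import Image

def reduce_bit_depth(img: Image.Image, bits: int = 3) -> Image.Image:
    """Quantize pixel values to 2**bits levels to wash out small perturbations."""
    arr = np.asarray(img, dtype=np.uint8)
    step = 256 // (2 ** bits)
    arr = (arr // step) * step            # drop the low-order bits
    return Image.fromarray(arr)

def rescale(img: Image.Image, factor: float = 0.5) -> Image.Image:
    """Downscale then upscale, discarding high-frequency detail."""
    w, h = img.size
    small = img.resize((max(1, int(w * factor)), max(1, int(h * factor))))
    return small.resize((w, h))
```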
3 SYSTEM OVERVIEW
Fig. 1. System overview of aCAPTCHA.

In this section, we present the system architecture of aCAPTCHA, which is shown in Fig. 1. Basically, it consists of seven modules:
Image Preprocessing (IPP) Module.
In this module, we implement 10 widely used standard image preprocessing techniques for CAPTCHA security analysis, including 9 filters (BLUR, DETAIL, EDGE ENHANCE, SMOOTH, SMOOTH MORE, GaussianBlur, MinFilter, MedianFilter, and ModeFilter) and one standard image binarization method. Basically, all these preprocessing techniques can be used to remove noise from an image.
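The nine filter names match those of PIL's ImageFilter module, so a reimplementation of IPP could look like the following sketch; the binarization threshold of 128 and the filter kernel sizes are illustrative assumptions on our part.

```python
# A sketch of the IPP module's preprocessing, assuming the nine filters
# come from PIL's ImageFilter module (their names match).
from PIL import Image, ImageFilter

FILTERS = {
    "BLUR": ImageFilter.BLUR,
    "DETAIL": ImageFilter.DETAIL,
    "EDGE_ENHANCE": ImageFilter.EDGE_ENHANCE,
    "SMOOTH": ImageFilter.SMOOTH,
    "SMOOTH_MORE": ImageFilter.SMOOTH_MORE,
    "GaussianBlur": ImageFilter.GaussianBlur(radius=2),
    "MinFilter": ImageFilter.MinFilter(3),
    "MedianFilter": ImageFilter.MedianFilter(3),
    "ModeFilter": ImageFilter.ModeFilter(3),
}

def preprocess(img: Image.Image, name: str, binarize: bool = False) -> Image.Image:
    """Apply one named filter, optionally followed by binarization ('B')."""
    out = img.filter(FILTERS[name])
    if binarize:
        # Simple global threshold; 128 is an illustrative choice.
        out = out.convert("L").point(lambda p: 255 if p > 128 else 0)
    return out
```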
Text-based CAPTCHA Attack (TCA) Module.
In this module, we implement 5 text-based CAPTCHA attacks, including two traditional machine learning based attacks (SVM and KNN) and three state-of-the-art DNN-based attacks (LeNet [12], MaxoutNet [13], and NetInNet [14]). In aCAPTCHA, TCA has two main functions. First, it provides the necessary model information for generating text-based adversarial CAPTCHAs, i.e., for the following TCG module. Second, it can be employed to evaluate the resilience of text-based CAPTCHAs against actual attacks.
Image-based CAPTCHA Attack (ICA) Module.
Similar to TCA, we implement 4 state-of-the-art image-based CAPTCHA attacks in this module (NetInNet [14], VGG [15], GoogleNet [17], and ResNet [18]). It is used to provide the necessary model information for generating image-based adversarial CAPTCHAs and to evaluate the resilience of image-based CAPTCHAs against actual attacks.
Text-based Adversarial CAPTCHA Generation (TCG) Module.
In this module, we first implement 4 state-of-the-art adversarial example (image) generation algorithms to serve as the baseline. Then, we analyze the limitations of applying existing adversarial image generation techniques to generate text-based adversarial CAPTCHAs. Finally, based on our analysis, we propose 4 new text-based adversarial CAPTCHA generation algorithms.
Image-based Adversarial CAPTCHA Generation (ICG) Module.
In this module, we first analyze the limitations of existing adversarial image generation techniques for generating image-based adversarial CAPTCHAs. Then, we implement 4 image-based adversarial CAPTCHA generation algorithms by improving existing techniques.
CAPTCHA Security Evaluation (CSE) Module.
Leveraging TCA and ICA, this module is used to evaluate the resilience and robustness of text- and image-based CAPTCHAs against state-of-the-art attacks.
CAPTCHA Usability Evaluation (CUE) Module.
This module is mainly used for evaluating the usability of text- and image-based CAPTCHAs.

aCAPTCHA takes a fully modular design and is thus easily extendable: we can freely add emerging attacks to TCA/ICA and/or add newly proposed adversarial CAPTCHA generation algorithms to TCG/ICG.
In the remainder of this paper, for the text-based evaluation scenario, we employ MNIST (Modified National Institute of Standards and Technology database) [3]. MNIST is a large database of 70,000 handwritten digit images and is widely used by the research community as a benchmark to evaluate text-based CAPTCHAs' security and usability [8] [3]. For the image-based evaluation scenario, we employ another image benchmark dataset, ImageNet ILSVRC-2012 (the dataset used for the 2012 ImageNet Large Scale Visual Recognition Challenge) [4] [5]. The employed ImageNet ILSVRC-2012 contains 50,000 hand-labeled photographs from 1,000 categories, with 50 photographs from each category.¹
1. The dataset used here is actually a subset of ImageNet ILSVRC-2012, which is sufficient for our purpose.
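For concreteness, the following is an illustrative sketch (ours, not aCAPTCHA's released code) of how length-4 text CAPTCHAs can be assembled from MNIST digit images for the evaluations below; a random array stands in for the real MNIST data, and the side-by-side layout is an assumption.

```python
# Illustrative assembly of a length-4 text CAPTCHA from MNIST digits.
import numpy as np

def assemble_captcha(digit_imgs: np.ndarray, length: int = 4):
    """Concatenate `length` randomly chosen 28x28 digit images side by side."""
    idx = np.random.choice(len(digit_imgs), size=length, replace=True)
    captcha = np.concatenate([digit_imgs[i] for i in idx], axis=1)  # 28 x 28*length
    return captcha, idx  # image plus the indices of the ground-truth digits

digits = np.random.randint(0, 256, size=(70000, 28, 28), dtype=np.uint8)  # MNIST stand-in
img, labels = assemble_captcha(digits)
```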
4 TEXT-BASED ADVERSARIAL CAPTCHAS

With the design goals in mind and following our design principle, we show the design of TCG step by step below.
In fact, CAPTCHAs can be viewed as a special case of images. Following the design principle and goals, a straightforward idea is therefore to generate text-based adversarial CAPTCHAs using existing adversarial image generation techniques. Accordingly, we implement 4 baseline adversarial image generation algorithms in TCG. Before delving into the details, we define some useful notation.
We first present the necessary notation in the context of generating adversarial images. To be consistent with existing research, we use the same notation system as in [11]. We represent a neural network as a function F(x) = y, where x ∈ R^{n×n} is the input image and y ∈ R^m is the corresponding output.² Define F to be the full neural network including the softmax function, and let Z(x) = z be the output of all the layers except the softmax. According to y, F, which can be viewed as a classifier, assigns x a class label C(x). Let C*(x) be the correct label of x.

As in [10] [11], we use L_p norms to measure the similarity of x, x′ ∈ R^{n×n}: L_p = ||x − x′||_p = (Σ_{i=1}^{n} Σ_{j=1}^{n} |x_{i,j} − x′_{i,j}|^p)^{1/p}. According to this definition, the L_2 distance measures the Euclidean distance between x and x′; the L_0 distance measures the number of coordinates (i, j) such that x_{i,j} ≠ x′_{i,j}; and the L_∞ distance measures the maximum change to any of the coordinates, i.e., ||x − x′||_∞ = max{|x_{1,1} − x′_{1,1}|, ..., |x_{n,n} − x′_{n,n}|}.

2. Note that x does not need to be a square image; this setting is for simplicity.

Recently, many attacks have been proposed to generate adversarial examples (adversarial images in our context) against neural networks [40] [38]. For our purpose, those attacks can serve as adversarial CAPTCHA generation methods. In TCG, we implement four such state-of-the-art attacks as our baseline methods.
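These three metrics translate directly into code. The following is a minimal NumPy transcription (our illustration), assuming the two images are arrays of identical shape:

```python
# Direct transcriptions of the L_0, L_2, and L_infinity metrics above.
import numpy as np

def l0(x: np.ndarray, x_adv: np.ndarray) -> int:
    """Number of coordinates where the two images differ."""
    return int(np.count_nonzero(x != x_adv))

def l2(x: np.ndarray, x_adv: np.ndarray) -> float:
    """Euclidean distance between the two images."""
    return float(np.sqrt(np.sum((x - x_adv) ** 2)))

def linf(x: np.ndarray, x_adv: np.ndarray) -> float:
    """Maximum change to any single coordinate."""
    return float(np.max(np.abs(x - x_adv)))
```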
JSMA.
In [10], Papernot et al. proposed the Jacobian-based Saliency Map Attack (JSMA) to generate adversarial images. JSMA is a greedy algorithm. Suppose l is the target class for image x. To obtain x′ such that x′ ≠ x and C(x′) = l, JSMA performs the following steps: (1) set x′ = x; (2) based on the gradient ∇Z(x′)_l, compute a saliency map in which each value indicates the impact of the corresponding pixel on the resulting classification; (3) according to the saliency map, select the most important pixel and modify it to increase the likelihood of class l; and (4) repeat steps (2) and (3) until C(x′) = l or more than a set threshold of pixels have been modified.

Note that JSMA is also capable of generating untargeted adversarial images. For that purpose, we only have to (1) let l = C(x) and change the goal to finding x′ such that x′ ≠ x and C(x′) ≠ l, and (2) select for modification the pixel that most decreases the likelihood of class l.
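To make the loop concrete, here is a minimal, simplified PyTorch sketch of the untargeted variant (our illustration, not the aCAPTCHA implementation): the saliency map is reduced to the gradient of the true-class logit, and eps and max_pixels are illustrative choices.

```python
# A simplified untargeted JSMA-style loop; Z is assumed to return
# pre-softmax logits of shape (1, m), and x is a float tensor in [0, 1].
import torch

def jsma_untargeted(Z, x, label, eps=1.0, max_pixels=40):
    x_adv = x.clone()
    for _ in range(max_pixels):
        x_adv = x_adv.detach().requires_grad_(True)
        logits = Z(x_adv)
        if logits.argmax().item() != label:      # C(x') != l: success
            break
        logits[0, label].backward()               # saliency proxy: dZ_l/dx
        grad = x_adv.grad.view(-1)
        i = torch.argmax(grad.abs())              # most influential pixel
        with torch.no_grad():
            # Step the pixel against the gradient to lower the l-logit.
            x_adv.view(-1)[i] -= eps * torch.sign(grad[i])
            x_adv.clamp_(0.0, 1.0)
    return x_adv.detach()
```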
Carlini-Wagner Attacks.
Aiming at generating high-quality adversarial images, Carlini and Wagner [11] introduced three powerful attacks tailored to the L_0, L_2, and L_∞ metrics, respectively. All three attacks are optimization-based and can be targeted or untargeted. Taking the untargeted L_2 attack as an example, it can be formalized as the optimization problem

minimize ||δ||_2 + c · F(x + δ), such that x + δ ∈ [0, 1]^{n×n},

i.e., for image x, the attack seeks a perturbation δ that is small in magnitude and at the same time fools the classifier F. In this formalization, c is a hyperparameter that balances the two parts of the objective function, and the constraint ensures that the generated adversarial image is valid.

As discussed before, it intuitively seems that existing adversarial image generation algorithms, e.g., JSMA and the Carlini-Wagner attacks, can be applied to generate adversarial CAPTCHAs directly. Following this intuition, we conduct a preliminary evaluation as follows. (i) Leveraging MNIST and standard CAPTCHA generation techniques [2], we randomly generate 10,000 CAPTCHAs of length 4, i.e., each CAPTCHA is composed of 4 characters from MNIST; denote this set of CAPTCHAs by C. (ii) Suppose LeNet from TCA is the employed CAPTCHA attack. We use LeNet (trained on 50,000 CAPTCHAs for 20,000 rounds with batch size 50) to attack the CAPTCHAs in C. The Success Attack Rate (SAR), defined as the portion of successfully recognized CAPTCHAs in C, is [...]. (iii) Against LeNet, we generate the adversarial versions of the CAPTCHAs in C using JSMA, L_0, L_2, and L_∞, denoted by C_J, C_0, C_2, and C_∞, respectively. (iv) We use LeNet, together with possible preprocessing techniques from the IPP module, to attack C_J, C_0, C_2, and C_∞. The corresponding SARs are shown in Table 1, where "−" means the corresponding preprocessing is not applied and B denotes image binarization.

[TABLE 1. Performance of baseline algorithms vs. LeNet: SAR of LeNet on C_J, C_0, C_2, and C_∞ under each IPP filter, without (−) and with (B) binarization. Rows: − (no filter), BLUR, DETAIL, EDGE ENHANCE, SMOOTH, SMOOTH MORE, GaussianBlur, MinFilter, MedianFilter, ModeFilter. Recoverable first-column entries: BLUR 5.15%, DETAIL 17.80%, EDGE ENHANCE 9.05%, SMOOTH 43.36%, SMOOTH MORE 37.71%, GaussianBlur 49.70%, MinFilter 0.15%, MedianFilter 24.31%, ModeFilter 20.84%; the remaining values were not recoverable.]

From Table 1, we observe that, without image preprocessing, the adversarial CAPTCHAs generated by all the baseline algorithms can significantly reduce the SAR of LeNet, e.g., L_2 reduces the SAR of LeNet from [...] to [...]. This implies that the idea of applying adversarial CAPTCHAs to defend against modern attacks is promising. Unfortunately, even leaving usability aside, the security of these adversarial CAPTCHAs can also be significantly degraded by image preprocessing. For instance, when attacking C_∞, the SAR of LeNet is raised from [...] to [...] after applying the SMOOTH filter, and to [...] after further applying image binarization, which is similar to its performance on normal CAPTCHAs. This implies that the perturbations in these adversarial CAPTCHAs can be removed by image preprocessing, i.e., the perturbations added by the baseline algorithms are not resilient/robust to image preprocessing.

We analyze the reasons from two aspects. From the perspective of breaking CAPTCHAs, text-based CAPTCHAs are monotonous compared with image-based CAPTCHAs: character shape is the only useful information in a text-based CAPTCHA, while other information, such as character colors and background pictures, is useless. Thus, adversaries can employ multiple kinds of techniques, e.g., filtering and image binarization, to remove noise and irrelevant information.
From the perturbation generation perspective, preprocessing such as filtering and binarization can theoretically be bypassed with a minor modification of the adversarial example generation algorithm, e.g., adding to the beginning of the neural network another convolutional layer with one output channel that performs similar filtering [52]. However, such a modification hugely increases the noise added to the CAPTCHAs. If we only consider the filtering operation, the adversarial examples generated with this minor modification would not affect human recognition; but when we consider both filtering and binarization, the resulting adversarial examples become unrecognizable to humans. Therefore, existing adversarial image generation techniques cannot keep the balance between usability and security for text-based CAPTCHAs.

In the previous subsection, we analyzed the limitations of existing techniques for generating adversarial CAPTCHAs. Aiming at generating more robust and usable text-based adversarial CAPTCHAs, in this subsection we propose four new methods based on existing techniques.

Our design mainly follows two guidelines. First, according to our analysis, perturbations added in the space domain are frail to image preprocessing. Therefore, we consider adding perturbations in the frequency domain. Space domain perturbation can be considered a local change to an image, while frequency domain perturbation is a kind of global change, which is more difficult to remove; i.e., frequency domain perturbation is intuitively more resilient to image preprocessing. Certainly, when conducting frequency domain perturbation, we should be aware of its possible impact on usability. Second, when generating adversarial CAPTCHAs, instead of trying to add human-imperceptible perturbations, we focus on adding human-tolerable perturbations. This gives us more freedom to design more secure and faster adversarial CAPTCHA generation methods. Specifically, based on JSMA, L_0, L_2, and L_∞, we propose 4 text-based adversarial CAPTCHA generation algorithms, denoted by JSMA^f, L_0^f, L_2^f, and L_∞^f, respectively.

JSMA^f. We show the design of JSMA^f in Algorithm 1. Basically, JSMA^f follows a procedure similar to the untargeted JSMA. We remark on the differences as follows.

Algorithm 1: JSMA^f
Input: x, the original CAPTCHA; C*(x), the label of x; F, a classifier; ϕ, a mask.
Output: x′, the adversarial CAPTCHA
1: x′ ← x, l ← C*(x);
2: while F(x′) == l do
3:   x′_f ← FFT(x′);
4:   compute a saliency map S based on the gradient ∇Z(x′_f)_l;
5:   S ← S × ϕ;
6:   based on S, select the pixel, denoted by x′_f[i][j], that most decreases the likelihood of l;
7:   modify x′_f[i][j] and its neighbors to decrease the likelihood of l;
8:   x′ ← IFFT(x′_f);

First, in Steps 3-4, we transform a CAPTCHA to the frequency domain by the Fast Fourier Transform (FFT) and then compute a saliency map. This enables us to elaborately inject perturbations into the CAPTCHA in the frequency domain, as expected.

Second, after a CAPTCHA is transformed into the frequency domain, its high-frequency part usually corresponds to the margins of characters and other non-vital information, while the low-frequency part usually corresponds to the fundamental shape information of the characters. Furthermore, as indicated before, changes made in the frequency domain induce global changes to an image. Therefore, to reduce the possible impact on the usability of a CAPTCHA, we introduce a mask matrix ϕ in Algorithm 1, which has the same size as x. ϕ has values of 1 in the high-frequency part and 0 in the low-frequency part. Then, as shown in Steps 5-6, we filter out the pixels in the low-frequency part and only consider changing pixels in the high-frequency part.

Third, after selecting the candidate pixel, instead of modifying one pixel at a time as in JSMA, we modify the candidate pixel and its neighbors, as shown in Step 7. This design is mainly based on the fact that close pixels in the frequency domain exhibit partial similarity [58], i.e., neighboring pixels in the frequency domain have very similar properties and features. Therefore, modifying the candidate pixel and its neighbors significantly accelerates the adversarial CAPTCHA generation process while not harmfully affecting its quality (recall that we aim to use user-tolerable, rather than as-little-as-possible, perturbations).

Finally, we apply an Inverse FFT (IFFT) to the CAPTCHA in the frequency domain and transform it back to the space domain, as shown in Step 8.

L_0^f, L_2^f, and L_∞^f. Basically, L_0^f, L_2^f, and L_∞^f follow the same procedures as L_0, L_2, and L_∞, respectively, except that all operations are performed in the frequency domain. The differences are the same as those between JSMA^f and JSMA. Therefore, we omit their algorithm descriptions here while implementing them in TCG.
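As an illustration of the core frequency-domain step of Algorithm 1 (Steps 3-8), the following NumPy sketch applies a masked perturbation around a selected coefficient. It is a simplified stand-in for the saliency-guided update, and the strength and neighborhood values are illustrative.

```python
# A minimal sketch of the frequency-domain update (Algorithm 1, Steps 3-8),
# assuming an unshifted 2-D FFT and a 0/1 mask phi that marks the
# modifiable high-frequency region.
import numpy as np

def perturb_freq(x: np.ndarray, phi: np.ndarray, i: int, j: int,
                 strength: float = 5.0, radius: int = 1) -> np.ndarray:
    xf = np.fft.fft2(x)                                  # Step 3: FFT
    r0, c0 = max(0, i - radius), max(0, j - radius)
    sl = (slice(r0, i + radius + 1), slice(c0, j + radius + 1))
    # Steps 5-7: change the selected coefficient and its neighbors, but
    # only where the mask allows (phi is 0 on the low-frequency part).
    xf[sl] += strength * phi[sl]
    return np.real(np.fft.ifft2(xf))                     # Step 8: IFFT
```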
Now, we evaluate the security performance of JSMA^f, L_0^f, L_2^f, and L_∞^f, leaving their usability evaluation to Section 7. The evaluation procedure is generally the same as in Section 4.2. In all the evaluations of this subsection, we employ MNIST to randomly generate CAPTCHAs of length 4. For each attack in TCA, we use 50,000 normal CAPTCHAs for training; specifically, for the DNN based attacks LeNet, MaxoutNet, and NetInNet, the batch size is 50 and each model is trained for 20,000 rounds. For each scenario, we use 1,000 CAPTCHAs for testing. When generating an adversarial CAPTCHA, we set the inner [...]×[...] area as the high-frequency part and the rest as the low-frequency part for the mask ϕ. Each evaluation is repeated three times, and the average is reported as the final result.

First, we evaluate the performance of JSMA^f, L_0^f, L_2^f, and L_∞^f without any image preprocessing. For this group of evaluations, we (i) leverage JSMA^f, L_0^f, L_2^f, and L_∞^f to generate adversarial CAPTCHAs against LeNet, MaxoutNet, and NetInNet, respectively, and (ii) leverage the attacks in the TCA module to attack these adversarial CAPTCHAs. The results are shown in Table 2, where Normal indicates the SAR of each attack on the normal CAPTCHAs (non-adversarial versions).

[TABLE 2. Performance of JSMA^f, L_0^f, L_2^f, and L_∞^f (no image preprocessing): SARs of SVM, KNN, LeNet, MaxoutNet, and NetInNet on normal CAPTCHAs and on adversarial CAPTCHAs generated against LeNet, MaxoutNet, and NetInNet. The numeric entries were not recoverable.]

From Table 2, we have the following observations. (1) All the attacks in TCA are very powerful when attacking normal CAPTCHAs. However, when they attack the adversarial CAPTCHAs generated by JSMA^f, L_0^f, L_2^f, or L_∞^f, none of them can break any adversarial CAPTCHA. This result is as expected and further demonstrates the advantage of applying adversarial CAPTCHAs to improve security. (2) The CAPTCHAs generated by JSMA^f, L_0^f, L_2^f, and L_∞^f have very good transferability, i.e., the adversarial CAPTCHAs generated against one neural network model are transferable to another neural network or to traditional machine learning models. This demonstrates the good robustness of the adversarial CAPTCHAs generated by JSMA^f, L_0^f, L_2^f, and L_∞^f.

Now, we go further by considering both image filtering and image binarization, common operations in breaking text-based CAPTCHAs. The full results are shown in Table 3, from which we have the following observations. (1) SVM and KNN cannot break any CAPTCHAs generated by JSMA^f, L_0^f, L_2^f, or L_∞^f, even after image preprocessing. This implies that adversarial CAPTCHAs achieve very good security against traditional machine learning model based attacks. (2) The DNN based attacks LeNet, MaxoutNet, and NetInNet become more powerful with image filtering and binarization and can break adversarial CAPTCHAs to some extent in several scenarios. Still, adversarial CAPTCHAs are obviously more secure than normal ones when considering the SARs of these attacks. Further, comparing the results in Table 3 with those in Table 1, the adversarial CAPTCHAs generated by JSMA^f, L_0^f, L_2^f, and L_∞^f are also much more secure than the ones generated by JSMA, L_0, L_2, and L_∞. (3) As in the previous evaluations, the adversarial CAPTCHAs maintain adequate transferability, which implies that they have stable robustness.

[TABLE 3. Performance of JSMA^f, L_0^f, L_2^f, and L_∞^f (Filter + B): SARs of SVM/KNN, LeNet, MaxoutNet, and NetInNet under each IPP filter, for adversarial CAPTCHAs generated against LeNet, MaxoutNet, and NetInNet. Most numeric entries were not recoverable.]

Finally, we discuss why the frequency-based methods perform better than the space-based methods for text-based CAPTCHAs. According to the CAPTCHAs we generated (as shown in Fig. 2), after adding noise in the frequency domain, the shapes and edges of the characters change in a way that cannot be recovered by filtering and binarization. Furthermore, as we protect the low-frequency part of the image, the fundamental shapes of the characters in the CAPTCHAs do not change; thus, humans can still recognize them easily.
[TABLE 4. Security of image-based adversarial CAPTCHAs: SARs of NetInNet, GoogleNet, VGG, and ResNet50 on normal CAPTCHAs and on adversarial CAPTCHAs generated by JSMA^i, L_0^i, L_2^i, and L_∞^i against NetInNet, GoogleNet, VGG, and ResNet50. The numeric entries were not recoverable.]

[TABLE 5. Security of image-based adversarial CAPTCHAs vs. filters: SARs of NetInNet, GoogleNet, VGG, and ResNet50 under each IPP filter, for adversarial CAPTCHAs generated by JSMA^i, L_0^i, L_2^i, and L_∞^i against each model. Most numeric entries were not recoverable.]

5 IMAGE-BASED ADVERSARIAL CAPTCHAS
For image-based adversarial CAPTCHA generation, we follow the same design principles as in the text-based scenario. Furthermore, just as existing adversarial image generation techniques are not suitable for generating text-based adversarial CAPTCHAs, they are not suitable for image-based adversarial CAPTCHAs either, for similar reasons. Existing adversarial image generation techniques mainly aim to attack neural network models by adding as little (human-imperceptible) perturbation as possible to an image. However, we stand on the defensive side and generate adversarial CAPTCHAs to improve security. This implies that we may inject as much perturbation into an image-based adversarial CAPTCHA as we like, as long as it remains user-tolerable (user-recognizable). In addition, generation speed may not be a concern for existing techniques. Although it is not a main constraint for CAPTCHA generation either, since we can generate the CAPTCHAs offline, we still expect to generate many CAPTCHAs quickly (since we may need to update our CAPTCHAs periodically to improve system security). Therefore, we take efficiency into consideration in adversarial CAPTCHA generation.

Image-based CAPTCHAs also differ from text-based ones: they carry much richer semantic information, which enables us to apply more processing techniques. Therefore, we do not have to transform an image-based CAPTCHA to the frequency domain. To some extent, it is relatively easier to generate image-based adversarial CAPTCHAs than text-based ones.
Here, similar to the text-based scenario, we implement four image-based adversarial CAPTCHA generation methods based on JSMA, L_0, L_2, and L_∞, denoted by JSMA^i, L_0^i, L_2^i, and L_∞^i, respectively.

JSMA^i. We show the design of JSMA^i in Algorithm 2, which basically follows the same procedure as JSMA.

Algorithm 2: JSMA^i
Input: x, the original CAPTCHA; C*(x), the label of x; F, a classifier; K, the noise level.
Output: x′, the adversarial CAPTCHA
1: x′ ← x, l ← C*(x);
2: while F(x′) == l or K > 0 do
3:   compute a saliency map S based on the gradient ∇Z(x′)_l;
4:   based on S, select the pixel, denoted by x′[i][j], that most decreases the likelihood of l;
5:   modify x′[i][j] and its neighbors to decrease the likelihood of l;
6:   K ← K − 1;

Following our design principle, we make two changes. First, we introduce an integer parameter K to control the least perturbation that should be made. This means that, in our design, we try to inject as much perturbation as possible as long as the CAPTCHA remains user-tolerable (certainly, K is an empirical value that can be decided based on some preliminary usability testing). Second, as in the text-based scenario, we modify multiple pixels simultaneously to accelerate the generation process.

L_0^i, L_2^i, and L_∞^i. The procedures of L_0^i, L_2^i, and L_∞^i are the same as those of L_0, L_2, and L_∞, except that we choose a smaller step size and fewer iterations to accelerate the CAPTCHA generation process. This also implies that our perturbation injection scheme may not be optimal compared with the original L_0, L_2, and L_∞. As explained before, we do not aim to add as little perturbation as possible like the original algorithms; in the other direction, we try to inject more perturbation, quickly, as long as the CAPTCHA remains user-tolerable.
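The control flow of Algorithm 2 can be sketched as follows (again a simplified PyTorch illustration rather than the aCAPTCHA implementation); note how the loop continues while the classifier is still correct or the noise budget K remains:

```python
# A sketch of Algorithm 2: keep perturbing while F(x')==l or K>0,
# modifying a pixel and its neighbors each round. Z, eps, and radius
# carry over the assumptions of the earlier JSMA sketch; x is assumed
# to have shape (1, H, W) with values in [0, 1].
import torch

def jsma_i(Z, x, label, K=50, eps=0.1, radius=1):
    x_adv, k = x.clone(), K
    while True:
        x_adv = x_adv.detach().requires_grad_(True)
        logits = Z(x_adv)
        if logits.argmax().item() != label and k <= 0:   # loop guard
            break
        logits[0, label].backward()
        g = x_adv.grad[0]                                # (H, W) saliency proxy
        r, c = divmod(torch.argmax(g.abs()).item(), g.shape[-1])
        r0, c0 = max(0, r - radius), max(0, c - radius)
        with torch.no_grad():
            # Modify the selected pixel together with its neighbors.
            nb = x_adv[0, r0:r + radius + 1, c0:c + radius + 1]
            nb -= eps * torch.sign(g[r0:r + radius + 1, c0:c + radius + 1])
            x_adv.clamp_(0.0, 1.0)
        k -= 1
    return x_adv.detach()
```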
Now, we evaluate the security performance of JSMA^i, L_0^i, L_2^i, and L_∞^i, leaving their usability evaluation to the next section. In the evaluation, we employ ImageNet ILSVRC-2012 to generate all the needed CAPTCHAs. Meanwhile, we use the pretrained models (all trained on the data in ImageNet ILSVRC-2012) of the attacks in ICA to examine the security of the generated adversarial CAPTCHAs, i.e., we use the attacks in ICA to recognize the generated CAPTCHAs. These pretrained models have state-of-the-art performance and are available at the Caffe Model Zoo [6]. For each evaluation scenario, we use 1,000 CAPTCHAs for testing. Each evaluation is repeated three times, and the average is reported as the final result.

We first evaluate the security of the adversarial CAPTCHAs generated by JSMA^i, L_0^i, L_2^i, and L_∞^i without considering any image preprocessing. The results are shown in Table 4, where Normal indicates the SAR of each attack against normal CAPTCHAs; in the remaining scenarios, we first generate adversarial CAPTCHAs against the neural network model of one attack, e.g., VGG, and then use the different attacks to attack them. The default setting is K = 50 for JSMA^i and K = 100 for L_0^i, L_2^i, and L_∞^i (note that the original L_0, L_2, and L_∞ also have a parameter that controls the noise level; we denote it by K in L_0^i, L_2^i, and L_∞^i for consistency).

From Table 4, we have the following observations. First, for image-based CAPTCHAs, adversarial learning techniques can significantly improve their security. This further confirms our design principle: to set one's own spear against one's own shield. Second, the generated adversarial CAPTCHAs demonstrate adequate transferability, i.e., the adversarial CAPTCHAs generated against one neural network model also exhibit good resilience to the other attacks. Thus, they are robust.

Under the same settings as Table 4, we examine the security performance of JSMA^i, L_0^i, L_2^i, and L_∞^i against the attacks in ICA combined with image preprocessing. Note that, since all these CAPTCHAs are color images, we do not consider image binarization here. We show the results in Table 5. Basically, the same conclusions can be drawn from Table 5 as from Table 4. In addition, we find that image filtering has little impact on the security of the adversarial CAPTCHAs generated by JSMA^i, L_0^i, L_2^i, or L_∞^i, i.e., they are very robust.

Now, we consider the impact of different perturbation (noise) levels on the security of the generated adversarial CAPTCHAs. Taking JSMA^i as an example, we show partial results in Table 6, from which we make the following observations. First, in most scenarios, adding more noise achieves better security, which is consistent with our intuition; however, according to the results, the security improvement is slight in most cases. Second, as before, the generated adversarial CAPTCHAs are resilient and robust to the various attacks.

[TABLE 6. Security of image-based adversarial CAPTCHAs vs. noise level: SARs of NetInNet, GoogleNet, VGG, and ResNet50 under each IPP filter for JSMA^i with K = 20, 30, 40, and 50. Most numeric entries were not recoverable.]

6 ADAPTIVE SECURITY ANALYSIS
In Sections 4 and 5, we evaluated the security performance of aCAPTCHA when attackers have no idea whether any defense has been implemented. In this section, we analyze in depth the adaptive methods that could be applied against aCAPTCHA.
In a practical scenario, we assume the threat model satisfies all of the following assumptions.
Knowledge of Adversarial Example Generation and Defense:
The attacker has full knowledge of adversarial example generation and defense schemes; they can obtain this information from the research community and by other means.
No Knowledge of CAPTCHA Generation:
The attacker can realize that the CAPTCHAs have been updated by adding adversarial noise, but they do not know the specific model and method used to generate the adversarial CAPTCHAs.

No Access to the Source Images:
The attacker can access all generated adversarial CAPTCHAs but not their sources; they have no knowledge of the particular images used for generating the adversarial CAPTCHAs.

From the aCAPTCHA generation perspective, we do not know which model the attacker uses. From the attacker's perspective, it is also reasonable to assume that they do not know the specific method we use. In summary, it is a black-box attack versus a black-box defense.
When attackers are aware of the possible existence of a defense, they will try other state-of-the-art methods against adversarial CAPTCHAs. As discussed in Section 2, there are three types of defensive techniques against adversarial examples: adversarial training, gradient masking, and input transformation. Attackers can adopt these techniques to improve their attacks. Below, we introduce one representative method for each type of defense.
Ensemble Adversarial Training [45]: This method augments a model's training data with adversarial examples crafted on other static pre-trained models. As a result, minimizing the training loss implies increased robustness to black-box attacks from some set of models. In particular, the model trained by this method won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks. We believe this method is one of the most powerful choices against adversarial CAPTCHAs.
Defense Distillation [42]: This method is a gradient masking based defense technique. Defensive distillation modifies the softmax function to include a temperature constant T:

softmax(x, T)_i = e^{x_i/T} / Σ_j e^{x_j/T}.   (1)

One first trains a teacher model on the training set using softmax at temperature T, then uses the teacher model to label each instance in the training set with soft labels (the output vector of the teacher model), again using softmax at temperature T, and finally trains the distilled model on these soft labels, once more using softmax at temperature T. Distillation can potentially increase the accuracy on the test set as well as the robustness against adversarial examples.
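For reference, Equation (1) can be implemented in a few lines; the numerical stabilization below is standard practice, not part of [42].

```python
# A direct implementation of Equation (1), numerically stabilized.
import numpy as np

def softmax_T(x: np.ndarray, T: float) -> np.ndarray:
    z = x / T
    z = z - z.max()                 # stabilize exp without changing the result
    e = np.exp(z)
    return e / e.sum()

# At T = 100 (the strong-defense setting used later), the output
# distribution is much flatter than at T = 1:
print(softmax_T(np.array([2.0, 1.0, 0.1]), T=1.0))
print(softmax_T(np.array([2.0, 1.0, 0.1]), T=100.0))
```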
Thermometer Encoding [47]: Image binarization and filtering are actually representative instances of input transformation [51], and in Section 4 we demonstrated that our text-based adversarial CAPTCHAs are resistant to them. Thus, we consider a more effective method here. In contrast to prior work, which viewed adversarial examples as blind spots of neural networks, Goodfellow et al. [26] argued that adversarial examples exist because neural networks behave in a largely linear manner. The purpose of thermometer encoding is to break this linearity. Given an image x, for each pixel color x_{(i,j,c)}, the l-level thermometer encoding τ(x_{(i,j,c)}) is an l-dimensional vector whose k-th component (k = 1, ..., l) is

τ(x_{(i,j,c)})_k = 1 if x_{(i,j,c)} > k/l, and 0 otherwise.   (2)

For example, for a 10-level thermometer encoding, we have τ(0.57) = 1111100000. We then use thermometer encoding to train a model.
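Equation (2) is equally direct to implement; the following sketch reproduces the τ(0.57) example:

```python
# A direct transcription of Equation (2): the k-th component of the
# l-level code is 1 exactly when the pixel value exceeds k/l (k = 1..l).
import numpy as np

def thermometer(pixel: float, l: int = 10) -> np.ndarray:
    return (pixel > np.arange(1, l + 1) / l).astype(np.uint8)

assert "".join(map(str, thermometer(0.57))) == "1111100000"
```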
Generally, the evaluation procedure is the same as in Section 4.4. In all the evaluations of this subsection, we employ MNIST to randomly generate CAPTCHAs of length 4. For each scenario, we use 1,000 CAPTCHAs for testing. When generating an adversarial CAPTCHA, we set the inner [...]×[...] area as the low-frequency part and the rest as the high-frequency part for the mask ϕ. Each evaluation is repeated three times, and the average is reported as the final result.

Specifically, we use MaxoutNet to generate the adversarial CAPTCHAs. For ensemble adversarial training, we use MaxoutNet, NetInNet, and LeNet to generate adversarial examples with JSMA^f, L_0^f, L_2^f, and L_∞^f, respectively, and use these examples to train a LeNet model. In Table 7, EnAdv. Training means we do not use the adversarial examples crafted on MaxoutNet, while EnAdv. Training+ does. For defense distillation, we set T to 100, which is the strong defense setting. For thermometer encoding, we set l to 16, the same as in the original paper. In addition, image binarization is used in all of these tests.

The results are shown in Table 7, from which we make the following observations. First, defense distillation, which is based on gradient masking, is not suitable for black-box defense; this result is consistent with our analysis that gradient masking is not an effective solution against black-box attacks. Second, thermometer encoding shows limited value in recognizing adversarial examples, which may be due to the large perturbations we inject. Third, ensemble adversarial training largely improves the SAR, especially in the EnAdv. Training+ setting. However, in practice, it is hard for attackers to know which methods and models are used in adversarial CAPTCHA generation, which restricts the practical effect of ensemble adversarial training. Overall, the generated adversarial CAPTCHAs are resilient to state-of-the-art defense methods.

[TABLE 7. Performance of adversarial CAPTCHAs against adaptive attacks: SARs on normal CAPTCHAs (Normal) and on adversarial CAPTCHAs generated by JSMA^f, L_0^f, L_2^f, and L_∞^f, for − (no adaptation), EnAdv. Training, EnAdv. Training+, Distillation, and Therm. Encoding. Recoverable Normal-column entries: EnAdv. Training 96.95%, Distillation 94.36%, Therm. Encoding 92.39%; the remaining values were not recoverable.]

Now, we discuss why the results in this paper are better than those of previous work (where attacks based on the transferability of adversarial examples did not perform well). First, we stand on the defensive side and follow the rule of injecting as much perturbation as possible while the adversarial CAPTCHAs remain human-tolerable; larger perturbation magnitudes usually cause a stronger defensive effect. Second, recognizing a CAPTCHA consists of multiple recognition tasks carried out simultaneously: when the success rate of a single recognition task decreases, the overall success rate decreases exponentially. For example, if the successful recognition rate for a single character is r, the expected successful recognition rate for a text-based CAPTCHA of length four is r^4, and for one of length six it is only r^6.

Then, we consider why the improvement of attacks based on these state-of-the-art defense techniques is limited. We inject large perturbations into the CAPTCHAs, and input transformations such as image rescaling and bit-depth reduction can only eliminate part of the perturbation; the remaining perturbation is still effective in downgrading the recognition model. Further, in this paper, we generate adversarial CAPTCHAs against local models trained by ourselves, instead of attacking the target model directly. There is no widely accepted explanation for the phenomenon that an adversarial example generated on one model is often misclassified by other models, and the existing adversarial example defense strategies, e.g., gradient masking based methods, cannot perform well against transfer attacks. Adversarial training, especially ensemble adversarial training, is regarded as the most effective defense strategy against black-box attacks; however, it requires the attacker to guess the methods and collect enough of the source images used in adversarial CAPTCHA generation, which implies a large potential cost. Overall, it is difficult, if not impossible, for existing adversarial example defense techniques to break our adversarial CAPTCHAs.

In this section, we do not conduct further evaluations for image-based adversarial CAPTCHAs, because training models on ImageNet requires a lot of computational resources. Furthermore, we believe that image-based adversarial CAPTCHAs are more secure than text-based adversarial CAPTCHAs.
On the one hand, image-based CAPTCHAs contain rich and important information that plays a key role in image classification. Thus, attackers cannot apply radical image preprocessing, such as image binarization, which increases the dimensionality of the space of adversarial examples. On the other hand, many state-of-the-art adversarial example detection techniques fail on, or are hard to deploy on, large-scale datasets, e.g., ImageNet. This indirectly enhances the security of image-based adversarial CAPTCHAs.
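To illustrate why binarization counts as radical preprocessing here, consider the following minimal sketch; the function name and the fixed threshold are ours (a real attack pipeline might use an adaptive, e.g., Otsu, threshold).

```python
import numpy as np

def binarize(image, threshold=0.5):
    """Map a grayscale image in [0, 1] to a two-level {0, 1} image.

    For character CAPTCHAs this discards shading, and with it much of
    the injected perturbation; for natural images it also destroys the
    texture and color cues that classification depends on, which is why
    an attacker cannot afford it in the image-based setting.
    """
    return (image > threshold).astype(np.float32)
```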
Fig. 2. Examples of aCAPTCHA: (a) text-based adversarial CAPTCHAs; (b) image-based adversarial CAPTCHAs. The text-based CAPTCHA is generated by JSMA_f and the image-based one by JSMA_i with K = 50.

7 USABILITY EVALUATION

We have examined the security performance of aCAPTCHA from multiple perspectives in Sections 4, 5, and 6. In this section, we conduct experiments to evaluate the usability performance of aCAPTCHA. As in the security evaluation, we employ MNIST and ImageNet ILSVRC-2012 to generate normal and adversarial CAPTCHAs for the text- and image-based scenarios, respectively.
To evaluate the usability of aCAPTCHA, we set the baseline as the usability of normal text- and image-based CAPTCHAs.
Methodology.
To conduct our evaluation, we construct a real-world website [60], whose evaluation webpage adapts to both PC and mobile clients, to deploy normal and adversarial CAPTCHAs and collect the evaluation data. Then, we recruit volunteer users to do the evaluation. Each user is asked to finish the evaluation in six steps.
Step 1: providing some general statistical information, including gender, age range, and education level.
Step 2: finishing 10 tasks of recognizing randomly generated text-based normal CAPTCHAs, including 5 CAPTCHAs of length 4 and 5 CAPTCHAs of length 6.
Step 3: finishing 10 tasks of recognizing randomly generated text-based adversarial CAPTCHAs, including 5 CAPTCHAs of length 4 and 5 CAPTCHAs of length 6. To simplify our evaluation, we employ JSMA_f to generate the adversarial CAPTCHAs.
Step 4: finishing 5 tasks of recognizing randomly generated image-based normal CAPTCHAs. For each recognition task, we first randomly select two images belonging to the same category from ILSVRC-2012, and set one as the source image and the other as the target image. Then, we randomly select nine images from ILSVRC-2012 whose categories differ from that of the target image, and mix the target image with these nine images to form a candidate set (a sketch of this construction is given after this list). Finally, given the source image, we ask a user to recognize the target image from the candidate set.
Step 5: finishing 25 tasks of recognizing randomly generated image-based adversarial CAPTCHAs at five difficulty levels, with each difficulty level having 5 tasks. The procedure of each task in this step is the same as in Step 4, except that the images used here are the adversarial versions. For simplicity, we employ JSMA_i (based on NetInNet) to generate the adversarial versions of the source image and the images in the candidate set. As shown in Section 5, we can control the noise level of JSMA_i using K. Hence, in this step, we set up five difficulty levels with K = 10, 20, 30, 40, 50, respectively, and each user does 5 tasks per difficulty level.
Step 6: providing feedback on the evaluation. After a user finishes the previous five steps, we show her/him the evaluation result, including how many and which tasks she/he failed, etc. Then, we collect feedback by asking questions such as: which CAPTCHA is more difficult to recognize?
For each task in Steps 2 and 3, if all the characters in a CAPTCHA are correctly recognized, we define that the task has been successfully finished. For each task in Steps 4 and 5, if a user correctly selects the target image, we define that the task has been successfully finished. After a user finishes all six steps, the results are transferred to the website server. The adversarial CAPTCHAs used in the tests are visualized in Figure 2.
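The task construction in Steps 4 and 5 can be summarized by the following sketch; the data layout (a mapping from category labels to image identifiers), the function name, and the assumption that each category contains at least two images are ours.

```python
import random

def build_image_task(images_by_category, n_distractors=9):
    """Build one image-based CAPTCHA task (Steps 4 and 5).

    images_by_category maps a category label to a list of image ids
    (e.g., ILSVRC-2012 file names). Returns the source image, the
    target image, and a shuffled 10-image candidate set.
    """
    category = random.choice(list(images_by_category))
    # Two same-category images: one shown as the source, one hidden
    # in the candidate set as the target.
    source, target = random.sample(images_by_category[category], 2)
    others = [c for c in images_by_category if c != category]
    distractors = [random.choice(images_by_category[c])
                   for c in random.sample(others, n_distractors)]
    candidates = distractors + [target]
    random.shuffle(candidates)
    return source, target, candidates
```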
Ethical Discussion.
In our usability evaluation, human subjects are involved. Therefore, we consulted with the IRB office about potential ethical issues. Since we strictly limit ourselves to collecting only necessary information and no Personally Identifiable Information (PII) is collected, our evaluation was approved by the IRB.
After moving the usability evaluation website online, we recruited 125 volunteer users, as shown in Table 8. Specifically, the users include 43 females and 82 males, and most of them are aged between 16 and 30. Furthermore, almost all the users' education levels are high school or above. Following the evaluation procedure, all 125 users successfully finished the evaluation (most of them through smartphones). We then collected all the results to our server.

TABLE 8
User statistics.

gender      female  male
            43      82
age         [16-20] [21-30] [31-40] [41-50] [51-60]
            76      40      1       3       5
education   primary school  high school  B.S.  M.S.  Ph.D.
            1               17           85    12    10

Based on the collected data, we show the main results in Table 9, where ι denotes the length of a text-based CAPTCHA, K indicates the noise (difficulty) level of an image-based adversarial CAPTCHA, and success rate, average time, and median time measure the average success probability, the average time consumption, and the median time consumption of all users on the corresponding task, respectively.

TABLE 9
Usability of aCAPTCHA.

                Text-based CAPTCHAs             Image-based CAPTCHAs
                Normal         Adversarial      Normal   Adversarial
                ι=4    ι=6     ι=4    ι=6                K=10   K=20   K=30   K=40   K=50
Success rate    ?.8%   87.2%   88.0%  82.2%     80.0%    79.2%  81.6%  80.0%  80.8%  80.?%
Average time    8.6s   9.7s    8.6s   10.2s     19.7s    15.3s  12.3s  12.8s  11.8s  11.5s
Median time     7.1s   7.8s    6.2s   8.4s      16.0s    10.9s  9.4s   9.4s   8.8s   9.4s

From Table 9, we have the following observations.
For text-based CAPTCHAs, although the adversarial versions can significantly improve the security performance as shown in Section 4, their recognition success rate remains high and is only slightly lower than that of the normal versions. Meanwhile, it takes users similar time to recognize normal and adversarial CAPTCHAs. These results suggest that text-based adversarial and normal CAPTCHAs have similar usability. In addition, given that long CAPTCHAs usually have better security than short ones [7], we also find that long text-based CAPTCHAs cost more recognition time and have a lower success rate than short ones (consistent with our intuition). This implies that there is a tradeoff between security and usability.
For image-based CAPTCHAs, the advantage of the adversarial versions is more evident. Adversarial CAPTCHAs have similar or even better success rates than the normal ones in all cases, and the success rates of adversarial CAPTCHAs at different noise (difficulty) levels are also similar. This suggests that image-based CAPTCHAs are more robust to adversarial perturbations. Given the obvious security advantage shown in Section 5, image-based adversarial CAPTCHAs are more promising than normal ones. Another interesting observation is that adversarial CAPTCHAs cost less recognition time than the normal versions, which is somewhat unexpected. We conjecture the reasons as follows: (i) the deliberate adversarial perturbation has little impact on the quality of images with respect to human recognition; and (ii) as the evaluation goes on, users become more and more familiar with the tasks and thus finish them faster.
Now, we take a closer look at the success rates of different users based on their statistical categories. The results are shown in Fig. 3, from which we can see that, in most scenarios, users from different statistical categories exhibit similar success rates on both adversarial and normal CAPTCHAs. This further demonstrates the generality of aCAPTCHA.
In summary, according to our evaluation, the CAPTCHAs generated by aCAPTCHA, especially the image-based adversarial CAPTCHAs, have similar usability to the normal versions. Together with the security evaluation of aCAPTCHA in Sections 4 and 5, this demonstrates that aCAPTCHA is promising in addressing the dilemma of existing text- and image-based CAPTCHAs.

Fig. 3. Success rate vs. statistical category: (a) gender; (b) age; (c) education.
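For reference, a minimal sketch of how the three metrics reported in Table 9 can be computed from the collected per-task records; the record format is our assumption.

```python
import numpy as np

def summarize(records):
    """Aggregate per-task records into the metrics of Table 9.

    records: list of (success, seconds) pairs collected for one
    CAPTCHA condition (e.g., adversarial, length 4) across all users.
    """
    success = np.array([s for s, _ in records], dtype=float)
    seconds = np.array([t for _, t in records], dtype=float)
    return {
        "success_rate": success.mean(),    # fraction of solved tasks
        "average_time": seconds.mean(),    # mean solving time (s)
        "median_time": np.median(seconds)  # robust to slow outliers
    }
```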
Following the evaluation procedure, we solicit feedback from users after they finish the CAPTCHA recognition tasks. The first question is: which CAPTCHA is the most difficult to recognize?
The results are shown in Fig. 4.
• From Fig. 4(a), in the text-based context, a portion of the users think that adversarial and normal CAPTCHAs have similar difficulty, some think that adversarial CAPTCHAs are more difficult to recognize, and, interestingly, some think that the normal versions are more difficult. This indicates that, from the users' point of view, adversarial CAPTCHAs do not obviously increase the recognition difficulty.
• From Fig. 4(b), in the image-based context, the users who think adversarial and normal CAPTCHAs have similar difficulty take the largest portion, while the remaining six options each account for a small share. Again, no adversarial CAPTCHA is obviously more difficult than the normal versions, indicating that image-based adversarial and normal CAPTCHAs have similar difficulty.

Fig. 4. Difficulty of normal and adversarial CAPTCHAs: (a) text-based CAPTCHAs; (b) image-based CAPTCHAs.

In Step 6 of the evaluation, if a user has one or more failures in Steps 2-5, we show her/him the failed tasks and ask, for each failed task, "what is the most probable reason for this failure?" We provide five choices for this question: incorrectly recognized the source image, cannot find the target image, found more than one target image, mistakes, and other reasons. After analyzing the collected data, we find that some users successfully finished all the CAPTCHA recognition tasks without any failure; the feedback of the remaining users is shown in Fig. 5.

Fig. 5. Reasons for failed recognition.

From Fig. 5, we can find that most failures are caused by either failing to recognize the source image or failing to find the target image. We conjecture the main reason is that some of the randomly selected ILSVRC-2012 images might be semantically improper, i.e., their semantic meanings are difficult to understand and distinguish. Furthermore, most of the users finished the evaluation on their smartphones, and the relatively small screens may harm the recognizability of the images.

8 DISCUSSION
Remarks on aCAPTCHA.
Different from traditional CAPTCHA designs, which mainly focus on defending against attacks in a passive manner, we design aCAPTCHA following a more proactive principle: to set one's own spear against one's own shield. Specifically, in terms of the model of state-of-the-art CAPTCHA attacks, we designed and implemented text- and image-based adversarial CAPTCHAs.
When implementing adversarial CAPTCHAs, we also follow a different methodology from that of existing adversarial image generation techniques. The main reason, as discussed before, is that we stand in a different position. Existing adversarial image generation techniques focus on attacking in a hidden manner; for instance, some methods generate an adversarial image that differs from the original in only one pixel [19], a difference impossible for humans to identify. In contrast, we follow the rule of injecting as much perturbation as possible while the adversarial CAPTCHAs remain human-tolerable. In this way, we find a better balance between CAPTCHA security and usability, as demonstrated by our evaluation results.
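A minimal sketch of this perturbation-maximizing principle follows, assuming an attack direction (e.g., a loss-gradient sign from an FGSM/JSMA-style attack) is already available; the tolerability predicate and the search grid are placeholders for a perceptual-similarity threshold or a pilot user study.

```python
import numpy as np

def amplify_perturbation(image, direction, is_tolerable, eps_grid):
    """Scale an adversarial direction to the largest tolerable size.

    Instead of hiding the perturbation, keep enlarging it and return
    the strongest version that still passes the human-tolerability
    check; eps_grid is an increasing schedule such as
    np.linspace(0.05, 0.5, 10).
    """
    best = image
    for eps in eps_grid:
        candidate = np.clip(image + eps * np.sign(direction), 0.0, 1.0)
        if not is_tolerable(candidate, image):
            break                 # the previous step was the maximum
        best = candidate
    return best
```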
One thing deserves further emphasis: aCAPTCHA is not designed as a replacement but as an enhancement of existing CAPTCHA systems. According to our design, aCAPTCHA can be seamlessly combined with deployed text- and image-based CAPTCHA systems; the only change is to replace the normal CAPTCHAs with their adversarial versions. Therefore, we believe aCAPTCHA has great applicability. In fact, we have contacted several Internet companies to introduce aCAPTCHA. They are all very interested in it, and two of them have shown the intention to integrate aCAPTCHA into their systems.
Finally, we believe open source is an important way to promote computer science research, especially in the CAPTCHA defense domain. Therefore, we make the aCAPTCHA system publicly available at [60], including the source code, trained models, datasets, and the usability evaluation interfaces.
Limitations and Future Work.
As an attempt to design adversarial CAPTCHAs, we believe aCAPTCHA can be improved in many respects. We discuss the limitations of this work along with future work below.
First, in the design of aCAPTCHA, we only integrate the popular attacks on text- and image-based CAPTCHAs. Also, following our design principle, we propose and implement four text-based and four image-based adversarial CAPTCHA generation methods, respectively. Note that all these designs and implementations are meant to demonstrate the advantages of adversarial CAPTCHAs. Furthermore, aCAPTCHA employs a modular design style, which makes it easy to integrate new techniques. Hence, we will add more attacks as well as more adversarial CAPTCHA generation methods, especially emerging techniques, to aCAPTCHA. We believe its open source nature will facilitate this improvement process.
Second, as discussed, adversarial CAPTCHAs expect human-tolerable instead of human-imperceptible perturbations. However, in our evaluation, we set the human-tolerable perturbation based on our experience and preliminary experiments, i.e., we do not yet have a standard to quantify human-tolerable perturbation. Therefore, more dedicated research is expected on understanding and quantifying the tradeoff between CAPTCHA security and usability.
Third, in this paper, we do not consider CAPTCHAs being outsourced to human labor. By design, CAPTCHAs are simple and easy for humans to solve while hard for automated bots. This quality makes them easy to outsource to the global unskilled labor market, and this type of attack is hard to prevent, since the function of CAPTCHAs is only to distinguish machines from humans. Designing complementary systems against human labor attacks is another interesting future research topic.

9 RELATED WORK
Text-based CAPTCHAs.
The robustness of text-based CAPTCHAs has always been an active research field. In [23], Chellapilla and Simard studied the security of early text-based CAPTCHAs and proposed an effective machine learning based attack to break them. In [7], Bursztein et al. conducted a systematic study on the security of text-based CAPTCHAs with anti-segmentation techniques. In [22], Yan and Ahmad found that the Crowding Characters Together (CCT) mechanism could improve the security of CAPTCHAs. However, such security mechanisms were soon broken by a group of attacks that leverage better machine learning techniques [21] [20]. Recently, Gao et al. demonstrated another simple yet powerful machine learning based attack that can break a wide range of text-based CAPTCHAs [8]. In a word, text-based CAPTCHA attacks keep emerging while the defense research is far from enough.
Image-based CAPTCHAs.
As another popular topic, image-based CAPTCHAs have also drawn a lot of attention [28] [27] [29]. In [25], Chew and Tygar proposed three image-based CAPTCHA schemes, which are still in wide use now. On the other hand, in [53], Golle developed a machine learning based attack against the Asirra CAPTCHA. Moreover, in [30], Zhu et al. systematically studied the design of image-based CAPTCHAs and showed an attack that breaks 12 existing CAPTCHA schemes. Following another track, Sivakorn et al. designed a novel attack that leverages online image annotation services and libraries [31]. Similar to the text-based CAPTCHA scenario, more defense research is also expected to secure image-based CAPTCHAs.
Other CAPTCHAs.
There are also many other forms of CAPTCHAs, such as audio-based CAPTCHAs [37], video-based CAPTCHAs [34], and game-based CAPTCHAs [36]. However, those CAPTCHAs are not widely employed in practice, mainly because of usability issues. Furthermore, there also exist plenty of attacks that can break them [33] [35] [54].
DeepCAPTCHA.
In [59], Osadchy et al. introduced a new image-based CAPTCHA scheme designed to resist machine learning attacks. It adds Immutable Adversarial Noise (IAN) to correctly classified images; the noise deceives deep learning tools and cannot be removed by image filtering. However, DeepCAPTCHA is different from our approach. In general, DeepCAPTCHA is a new type of image-based CAPTCHA scheme that can provide high security, while our aCAPTCHA system is designed to enhance existing CAPTCHA schemes. Furthermore, the proposed IAN, which is resistant to filtering attacks, cannot be used in text-based CAPTCHA generation. In this work, we consider more state-of-the-art adversarial example defense strategies and propose several new methods to generate text- and image-based adversarial CAPTCHAs.
reCAPTCHA.
The reCAPTCHA service offered by Google is the most widely used CAPTCHA service. It is a multi-stage CAPTCHA system [31]. In the first round of authentication, Google leverages information about a user's activities to correlate requests with users that have previously interacted with any of its services. If the user is deemed legitimate, she/he is not required to solve a challenge; otherwise, the user needs to further solve the given text- or image-based CAPTCHA correctly. reCAPTCHA and aCAPTCHA do not conflict: aCAPTCHA can be used to further improve reCAPTCHA's security.
Defenses against Adversarial Examples.
The robustness of machine learning models against adversarial examples has recently been an active research field. In [39], Szegedy et al. found that adversarial training increases the robustness of a model by augmenting training data with adversarial examples. In [46], Madry et al. showed that adversarially trained models can be more robust against white-box attacks if the perturbation during training closely maximizes the model's loss. In [45], Tramer et al. proposed ensemble adversarial training, a technique that augments training data with perturbations transferred from other models, which can to some extent make a model resistant to black-box attacks.
Another way to defend against adversarial perturbations is input transformation, which tries to eliminate the perturbation in the input without changing the model structure. Xu et al. [51] proposed feature squeezing, i.e., reducing the color bit depth and spatial smoothing; these simple strategies are inexpensive and can serve as complements to other defenses. In [48], Guo et al. ensembled various input transformations to counter adversarial images. However, these methods are not strong against white-box attacks and can be broken by minor modifications [49]. Moreover, Athalye et al. [49] showed that gradient masking is an incomplete defense against adversarial examples, and many state-of-the-art gradient masking schemes can be successfully circumvented by their attacks.
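For illustration, a minimal sketch of the two feature squeezers of Xu et al. [51] mentioned above; the parameter defaults are ours.

```python
import numpy as np
from scipy.ndimage import median_filter

def squeeze(image, bits=4, window=2):
    """The two feature squeezers of Xu et al. [51], sketched.

    Bit-depth reduction quantizes pixel values in [0, 1] down to
    2**bits levels; median smoothing removes isolated perturbed
    pixels. A detector then flags inputs on which the model's
    predictions before and after squeezing disagree strongly.
    """
    levels = 2 ** bits - 1
    quantized = np.round(image * levels) / levels   # bit-depth reduction
    return median_filter(quantized, size=window)    # spatial smoothing
```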
10 CONCLUSION
In this paper, we study the generation of adversarial CAPTCHAs. First, we propose a framework for generating text- and image-based adversarial CAPTCHAs. Then, we design and implement aCAPTCHA, a comprehensive adversarial CAPTCHA generation and evaluation system, which integrates 10 image preprocessing techniques, 9 CAPTCHA attacks, 4 baseline adversarial CAPTCHA generation methods, and 8 new adversarial CAPTCHA generation methods, and which can be used for the generation, security evaluation, and usability evaluation of adversarial CAPTCHAs. To evaluate the performance of aCAPTCHA, we conduct extensive experiments. The results demonstrate that the adversarial CAPTCHAs generated by aCAPTCHA can significantly improve the security of normal CAPTCHAs while maintaining similar usability. Finally, we open source aCAPTCHA to facilitate CAPTCHA security research.

REFERENCES
[7] E. Bursztein et al., Text-based CAPTCHA Strengths and Weaknesses, ACM CCS 2011.
[8] H. Gao, J. Yan, et al., A Simple Generic Attack on Text Captchas, NDSS 2016.
[9] K. A. Kluever and R. Zanibbi, Balancing Usability and Security in a Video CAPTCHA, SOUPS 2009.
[10] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, The Limitations of Deep Learning in Adversarial Settings, IEEE Euro S&P 2016.
[11] N. Carlini and D. Wagner, Towards Evaluating the Robustness of Neural Networks, IEEE S&P 2017.
[12] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based Learning Applied to Document Recognition, Proceedings of the IEEE, 1998.
[13] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, Maxout Networks, ICML 2013.
[14] M. Lin, Q. Chen, and S. Yan, Network In Network, ICLR 2014.
[15] K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-scale Image Recognition, arXiv:1409.1556, 2014.
[16] K. He et al., Delving Deep into Rectifiers: Surpassing Human-level Performance on ImageNet Classification, ICCV 2015.
[17] C. Szegedy, W. Liu, et al., Going Deeper with Convolutions, CVPR 2015.
[18] K. He, X. Zhang, S. Ren, and J. Sun, Deep Residual Learning for Image Recognition, CVPR 2016.
[19] J. Su, D. V. Vargas, and S. Kouichi, One Pixel Attack for Fooling Deep Neural Networks, arXiv 2017.
[20] E. Bursztein, J. Aigrain, A. Moscicki, and J. C. Mitchell, The End is Nigh: Generic Solving of Text-based CAPTCHAs, WOOT 2014.
[21] A. S. El Ahmad, J. Yan, and M. Tayara, The Robustness of Google CAPTCHAs, Technical Report, Newcastle University, 2011.
[22] J. Yan and A. S. El Ahmad, A Low-cost Attack on a Microsoft CAPTCHA, CCS 2008.
[23] K. Chellapilla and P. Y. Simard, Using Machine Learning to Break Visual Human Interaction Proofs (HIPs), NIPS 2005.
[24] H. Gao, W. Wang, J. Qi, X. Wang, X. Liu, and J. Yan, The Robustness of Hollow CAPTCHAs, CCS 2013.
[25] M. Chew and J. D. Tygar, Image Recognition CAPTCHAs, ICIS 2004.
[26] I. J. Goodfellow et al., Multi-digit Number Recognition from Street View Imagery Using Deep Convolutional Neural Networks, arXiv 2013.
[27] Y. Rui and Z. Liu, Artifacial: Automated Reverse Turing Test Using Facial Features, Multimedia Systems 2004.
[28] R. Datta, J. Li, and J. Z. Wang, IMAGINATION: A Robust Image-based CAPTCHA Generation System, ACM MM 2005.
[29] J. Elson, J. R. Douceur, J. Howell, and J. Saul, Asirra: A CAPTCHA that Exploits Interest-aligned Manual Image Categorization, CCS 2007.
[30] B. B. Zhu, J. Yan, et al., Attacks and Design of Image Recognition CAPTCHAs, CCS 2010.
[31] S. Sivakorn, I. Polakis, and A. D. Keromytis, I Am Robot: (Deep) Learning to Break Semantic Image CAPTCHAs, IEEE Euro S&P 2016.
[32] S. Solanki, G. Krishnan, V. Sampath, and J. Polakis, In (Cyber)Space Bots Can Hear You Speak: Breaking Audio CAPTCHAs Using OTS Speech Recognition, AISec 2017.
[33] K. Bock, D. Patel, G. Hughey, and D. Levin, unCaptcha: A Low-resource Defeat of reCaptcha's Audio Challenge, WOOT 2017.
[34] K. A. Kluever and R. Zanibbi, Balancing Usability and Security in a Video CAPTCHA, SOUPS 2009.
[35] M. Mohamed, S. Gao, et al., On the Security and Usability of Dynamic Cognitive Game CAPTCHAs, Journal of Computer Security 2017.
[36] M. Mohamed, N. Sachdeva, et al., A Three-way Investigation of a Game-CAPTCHA: Automated Attacks, Relay Attacks and Usability, ASIA CCS 2014.
[37] H. Gao, H. Liu, D. Yao, X. Liu, and U. Aickelin, An Audio CAPTCHA to Distinguish Humans from Computers, ISECS 2010.
[38] N. Akhtar and A. Mian, Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey, arXiv 2018.
[39] C. Szegedy, W. Zaremba, et al., Intriguing Properties of Neural Networks, CoRR 2013.
[40] X. Yuan, P. He, Q. Zhu, R. R. Bhat, and X. Li, Adversarial Examples: Attacks and Defenses for Deep Learning, arXiv 2017.
[41] I. J. Goodfellow, J. Shlens, and C. Szegedy, Explaining and Harnessing Adversarial Examples, arXiv 2014.
[42] N. Papernot, P. McDaniel, et al., Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks, IEEE S&P 2016.
[43] N. Papernot, P. McDaniel, and I. Goodfellow, Transferability in Machine Learning: From Phenomena to Black-box Attacks Using Adversarial Samples, arXiv 2016.
[44] N. Papernot, P. McDaniel, et al., Practical Black-box Attacks against Machine Learning, ASIA CCS 2017.
[45] F. Tramer, A. Kurakin, et al., Ensemble Adversarial Training: Attacks and Defenses, ICLR 2018.
[46] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, Towards Deep Learning Models Resistant to Adversarial Attacks, arXiv 2017.
[47] J. Buckman, A. Roy, C. Raffel, and I. Goodfellow, Thermometer Encoding: One Hot Way to Resist Adversarial Examples, ICLR 2018.
[48] C. Guo, M. Rana, M. Cisse, and L. van der Maaten, Countering Adversarial Images Using Input Transformations, ICLR 2018.
[49] A. Athalye, N. Carlini, and D. Wagner, Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, ICML 2018.
[50] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin, Black-box Adversarial Attacks with Limited Queries and Information, ICML 2018.
[51] W. Xu, D. Evans, and Y. Qi, Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks, NDSS 2018.
[52] N. Carlini and D. Wagner, Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods, arXiv 2017.
[53] P. Golle, Machine Learning Attacks Against the Asirra CAPTCHA, CCS 2008.
[54] Y. Xu, G. Reynaga, et al., Security and Usability Challenges of Moving-Object CAPTCHAs: Decoding Codewords in Motion, USENIX Security 2012.
[55] K. F. Hwang, C. C. Huang, and G. N. You, A Spelling Based CAPTCHA System by Using Click, ISBAST 2012.
[56] P. N. Aleksandrovich, N. I. Alekseevich, V. M. Vladimirovich, N. A. Igorevich, P. V. Borisovna, and N. O. Igorevna, U.S. Patent Application No. 13/528,373, 2012.
[57] S. K. Saha, A. K. Nag, and D. Dasgupta, Human-Cognition-Based CAPTCHAs, IT Professional, 17(5), 42-48, 2015.
[58] R. Gonzalez, Digital Image Processing, Prentice Hall, 2008.
[59] M. Osadchy, J. Hernandez-Castro, et al.,