Generating Natural Language Adversarial Examples
Moustafa Alzantot∗, Yash Sharma∗, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, Kai-Wei Chang
Department of Computer Science, University of California, Los Angeles (UCLA)
{malzantot, bojhang, mbs, kwchang}@ucla.edu
Cooper Union
[email protected]
Computer Science Department, University of Maryland
[email protected]
∗ Moustafa Alzantot and Yash Sharma contribute equally to this work.
Abstract
Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations to correctly classified examples which can cause the model to misclassify. In the image domain, these perturbations are often virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we use a black-box population-based optimization algorithm to generate semantically and syntactically similar adversarial examples that fool well-trained sentiment analysis and textual entailment models with success rates of 97% and 70%, respectively. We additionally demonstrate that 92.3% of the successful sentiment analysis adversarial examples are classified to their original label by 20 human annotators, and that the examples are perceptibly quite similar. Finally, we discuss an attempt to use adversarial training as a defense, but fail to yield improvement, demonstrating the strength and diversity of our adversarial examples. We hope our findings encourage researchers to pursue improving the robustness of DNNs in the natural language domain.
Recent research has found that deep neural networks (DNNs) are vulnerable to adversarial examples (Goodfellow et al., 2014; Szegedy et al., 2013). The existence of adversarial examples has been shown in image classification (Szegedy et al., 2013) and speech recognition (Carlini and Wagner, 2018). In this work, we demonstrate that adversarial examples can be constructed in the context of natural language. Using a black-box population-based optimization algorithm, we successfully generate both semantically and syntactically similar adversarial examples against models trained on both the IMDB (Maas et al., 2011) sentiment analysis task and the Stanford Natural Language Inference (SNLI) (Bowman et al., 2015) textual entailment task. In addition, we validate that the examples are both correctly classified by human evaluators and similar to the original via a human study. Finally, we attempt to defend against said adversarial attack using adversarial training, but fail to yield any robustness, demonstrating the strength and diversity of the generated adversarial examples.

Our results show that by minimizing the semantic and syntactic dissimilarity, an attacker can perturb examples such that humans correctly classify, but high-performing models misclassify. We are open-sourcing our attack (https://github.com/nesl/nlp_adversarial_examples) to encourage research in training DNNs robust to adversarial attacks in the natural language domain.

Adversarial examples have been explored primarily in the image recognition domain. Examples have been generated through solving an optimization problem, attempting to induce misclassification while minimizing the perceptual distortion (Szegedy et al., 2013; Carlini and Wagner, 2017; Chen et al., 2017b; Sharma and Chen, 2017). Due to the computational cost of such approaches, fast methods were introduced which, either in one step or iteratively, shift all pixels simultaneously until a distortion constraint is reached (Goodfellow et al., 2014; Kurakin et al., 2016; Madry et al., 2017). Nearly all popular methods are gradient-based.

Such methods, however, rely on the fact that adding small perturbations to many pixels in the image will not have a noticeable effect on a human viewer. This approach obviously does not transfer to the natural language domain, as all changes are perceptible. Furthermore, unlike continuous image pixel values, words in a sentence are discrete tokens. Therefore, it is not possible to compute the gradient of the network loss function with respect to the input words. A straightforward workaround is to project input sentences into a continuous space (e.g. word embeddings) and consider this as the model input. However, this approach also fails because it still assumes that replacing every word with words nearby in the embedding space will not be noticeable. Replacing words without accounting for syntactic coherence will certainly lead to improperly constructed sentences which will look odd to the reader.

Relative to the image domain, little work has been pursued for generating natural language adversarial examples. Given the difficulty in generating semantics-preserving perturbations, distracting sentences have been added to the input document in order to induce misclassification (Jia and Liang, 2017). In our work, we attempt to generate semantically and syntactically similar adversarial examples, via word replacements, resolving the aforementioned issues.
Minimizing the number of word replacements necessary to induce misclassification has been studied in previous work (Papernot et al., 2016b), however without consideration given to semantics or syntactics, yielding incoherent generated examples. In recent work, there have been a few attempts at generating adversarial examples for language tasks by using back-translation (Iyyer et al., 2018), exploiting machine-generated rules (Ribeiro et al., 2018), and searching in the underlying semantic space (Zhao et al., 2018). In addition, while preparing our submission, we became aware of recent work which targets a similar contribution (Kuleshov et al., 2018; Ebrahimi et al., 2018). We treat these contributions as parallel work.

We assume the attacker has black-box access to the target model; the attacker is not aware of the model architecture, parameters, or training data, and is only capable of querying the target model with supplied inputs and obtaining the output predictions and their confidence scores. This setting has been extensively studied in the image domain (Papernot et al., 2016a; Chen et al., 2017a; Alzantot et al., 2018), but has yet to be explored in the context of natural language.
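To make the assumed access pattern concrete, the sketch below (ours, not part of the released code) wraps a trained Keras classifier behind the only interface the attacker is allowed to use: submit a tokenized sentence, read back the output scores. The helper name make_black_box and its arguments are illustrative assumptions.

```python
from typing import Callable, Sequence
import numpy as np

# Black-box interface assumed by the attack: the attacker can only submit a
# tokenized sentence and read back the model's output scores.
VictimModel = Callable[[Sequence[str]], np.ndarray]

def make_black_box(keras_model, word_to_id, max_len: int) -> VictimModel:
    """Wrap a trained classifier so the attacker sees only query access.
    `keras_model`, `word_to_id`, and `max_len` are placeholders for illustration."""
    def query(sentence: Sequence[str]) -> np.ndarray:
        ids = [word_to_id.get(w, 0) for w in sentence][:max_len]
        ids = ids + [0] * (max_len - len(ids))           # pad to a fixed length
        probs = keras_model.predict(np.array([ids]), verbose=0)[0]
        return probs                                      # output scores; shape depends on the model head
    return query
```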
To avoid the limitations of gradient-based attack methods, we design an algorithm for constructing adversarial examples with the following goals in mind. We aim to minimize the number of modified words between the original and adversarial examples, but only perform modifications which retain semantic similarity with the original and syntactic coherence. To achieve these goals, instead of relying on gradient-based optimization, we developed an attack algorithm that exploits population-based gradient-free optimization via genetic algorithms. An added benefit of using gradient-free optimization is enabling use in the black-box case; gradient-reliant algorithms are inapplicable in this case, as they are dependent on the model being differentiable and the internals being accessible (Papernot et al., 2016b; Ebrahimi et al., 2018).

Genetic algorithms are inspired by the process of natural selection, iteratively evolving a population of candidate solutions towards better solutions. The population of each iteration is called a generation. In each generation, the quality of population members is evaluated using a fitness function. "Fitter" solutions are more likely to be selected for breeding the next generation. The next generation is generated through a combination of crossover and mutation. Crossover is the process of taking more than one parent solution and producing a child solution from them; it is analogous to reproduction and biological crossover. Mutation is done in order to increase the diversity of population members and provide better exploration of the search space. Genetic algorithms are known to perform well in solving combinatorial optimization problems (Anderson and Ferris, 1994; Mühlenbein, 1989), and due to employing a population of candidate solutions, these algorithms can find successful adversarial examples with fewer modifications.
Perturb Subroutine: In order to explain our algorithm, we first introduce the subroutine Perturb. This subroutine accepts an input sentence x_cur which can be either a modified sentence or the same as x_orig. It randomly selects a word w in the sentence x_cur and then selects a suitable replacement word that has similar semantic meaning, fits within the surrounding context, and increases the target label prediction score. In order to select the best replacement word, Perturb applies the following steps (a minimal code sketch follows the list):

• Computes the N nearest neighbors of the selected word according to the distance in the GloVe embedding space (Pennington et al., 2014). We used Euclidean distance, as we did not see noticeable improvement using cosine. We filter out candidates with distance to the selected word greater than δ. We use the counter-fitting method presented in (Mrkšić et al., 2016) to post-process the adversary's GloVe vectors to ensure that the nearest neighbors are synonyms. The resulting embedding is independent of the embeddings used by victim models.

• Second, we use the Google 1 billion words language model (Chelba et al., 2013) to filter out words that do not fit within the context surrounding the word w in x_cur. We do so by ranking the candidate words based on their language model scores when fit within the replacement context, and keeping only the top K words with the highest scores.

• From the remaining set of words, we pick the one that will maximize the target label prediction probability when it replaces the word w in x_cur.

• Finally, the selected word is inserted in place of w, and Perturb returns the resulting sentence.

The selection of which word to replace in the input sentence is done by random sampling with probabilities proportional to the number of neighbors each word has within Euclidean distance δ in the counter-fitted embedding space, encouraging the solution set to be large enough for the algorithm to make appropriate modifications. We exclude common articles and prepositions (e.g. a, to) from being selected for replacement.
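The following is a minimal sketch of how the Perturb subroutine described above could be implemented. The neighbor table, language-model scorer, and victim-query function are passed in as arguments; names such as neighbors_within_delta and lm_score are our own placeholders rather than functions from the released code.

```python
import random
from typing import Callable, Dict, List, Sequence
import numpy as np

def perturb(
    x_cur: List[str],
    target: int,
    victim: Callable[[Sequence[str]], np.ndarray],     # black-box query to the victim model
    neighbors_within_delta: Dict[str, List[str]],       # counter-fitted GloVe neighbors within distance delta
    lm_score: Callable[[List[str], int, str], float],   # hypothetical LM scorer: candidate word in context
    n_neighbors: int = 8,
    top_k: int = 4,
    stop_words: frozenset = frozenset({"a", "an", "the", "to", "of", "in"}),
) -> List[str]:
    """Replace one word of x_cur with a synonym that raises the target-class score."""
    # Sample the position to change, weighted by how many candidate synonyms each word has.
    weights = [0 if w in stop_words else len(neighbors_within_delta.get(w, []))
               for w in x_cur]
    if sum(weights) == 0:
        return list(x_cur)
    pos = random.choices(range(len(x_cur)), weights=weights, k=1)[0]
    word = x_cur[pos]

    # Step 1: nearest neighbors in the counter-fitted embedding space.
    candidates = neighbors_within_delta[word][:n_neighbors]

    # Step 2: keep the top-K candidates that best fit the surrounding context.
    candidates = sorted(candidates, key=lambda c: lm_score(x_cur, pos, c), reverse=True)[:top_k]

    # Step 3: pick the candidate that maximizes the target-label probability.
    best_sentence, best_score = list(x_cur), -1.0
    for cand in candidates:
        trial = list(x_cur)
        trial[pos] = cand
        score = float(victim(trial)[target])
        if score > best_score:
            best_sentence, best_score = trial, score

    # Step 4: return the sentence with the selected replacement in place.
    return best_sentence
```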
Optimization Procedure: The optimization algorithm can be seen in Algorithm 1. The algorithm starts by creating the initial generation P^0 of size S by calling the Perturb subroutine S times to create a set of distinct modifications to the original sentence. Then, the fitness of each population member in the current generation is computed as the target label prediction probability, found by querying the victim model function f. If a population member's predicted label is equal to the target label, the optimization is complete. Otherwise, pairs of population members from the current generation are randomly sampled with probability proportional to their fitness values. A new child sentence is then synthesized from a pair of parent sentences by independently sampling from the two using a uniform distribution. Finally, the Perturb subroutine is applied to the resulting children.

Algorithm 1: Finding adversarial examples

    for i = 1, ..., S in population do
        P^0_i ← Perturb(x_orig, target)
    for g = 1, ..., G generations do
        for i = 1, ..., S in population do
            F^{g-1}_i = f(P^{g-1}_i)_target
        x_adv = P^{g-1}_j where j = argmax_j F^{g-1}_j
        if argmax_c f(x_adv)_c == t then
            return x_adv    ⊲ Found successful attack
        else
            P^g_1 = {x_adv}
            p = Normalize(F^{g-1})
            for i = 2, ..., S in population do
                Sample parent1 from P^{g-1} with probs p
                Sample parent2 from P^{g-1} with probs p
                child = Crossover(parent1, parent2)
                child_mut = Perturb(child, target)
                P^g_i = {child_mut}
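Putting the pieces together, a compact sketch of the search loop in Algorithm 1 might look as follows; it reuses the hypothetical perturb and victim-query helpers sketched earlier (with their extra arguments already bound), and the uniform word-by-word crossover follows the description in the text.

```python
import random
from typing import Callable, List, Optional, Sequence
import numpy as np

def crossover(parent1: List[str], parent2: List[str]) -> List[str]:
    """Build a child by picking each word uniformly from one of the two parents."""
    return [random.choice(pair) for pair in zip(parent1, parent2)]

def genetic_attack(
    x_orig: List[str],
    target: int,
    victim: Callable[[Sequence[str]], np.ndarray],
    perturb: Callable[[List[str], int], List[str]],   # e.g. the sketch above, with its extras bound
    pop_size: int = 60,
    generations: int = 20,
) -> Optional[List[str]]:
    """Population-based search for a sentence the victim assigns to `target`."""
    population = [perturb(x_orig, target) for _ in range(pop_size)]
    for _ in range(generations):
        fitness = np.array([float(victim(member)[target]) for member in population])
        best = population[int(np.argmax(fitness))]
        if int(np.argmax(victim(best))) == target:
            return best                       # found a successful adversarial example
        probs = fitness / fitness.sum()
        next_gen = [best]                     # keep the fittest member unchanged
        for _ in range(pop_size - 1):
            p1, p2 = random.choices(population, weights=probs, k=2)
            next_gen.append(perturb(crossover(p1, p2), target))
        population = next_gen
    return None                               # attack failed within the generation budget
```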
To evaluate our attack method, we trained models for the sentiment analysis and textual entailment classification tasks. For both models, each word in the input sentence is first projected into a fixed 300-dimensional vector space using GloVe (Pennington et al., 2014). Each of the models used is based on a popular open-source benchmark, and implementations can be found in the following repositories: https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py (sentiment analysis) and https://github.com/Smerity/keras_snli/blob/master/snli_rnn.py (textual entailment). Model descriptions are given below.

Sentiment Analysis:
We trained a sentiment analysis model using the IMDB dataset of movie reviews (Maas et al., 2011). The IMDB dataset consists of 25,000 training examples and 25,000 test examples. The LSTM model is composed of 128 units, and the outputs across all time steps are averaged and fed to the output layer. The test accuracy of the model is 90%, which is relatively close to the state-of-the-art results on this dataset.

Original Text Prediction = Negative. (Confidence = 78.0%)
This movie had terrible acting, terrible plot, and terrible choice of actors. (Leslie Nielsen ...come on!!!) the one part I considered slightly funny was the battling FBI/CIA agents, but because the audience was mainly kids they didn't understand that theme.

Adversarial Text Prediction = Positive. (Confidence = 59.8%)
This movie had horrific acting, horrific plot, and horrifying choice of actors. (Leslie Nielsen ...come on!!!) the one part I regarded slightly funny was the battling FBI/CIA agents, but because the audience was mainly youngsters they didn't understand that theme.

Table 1: Example of attack results for the sentiment analysis task. Modified words are highlighted in green and red for the original and adversarial texts, respectively.

Original Text Prediction: Entailment (Confidence = 86%)
Premise: A runner wearing purple strives for the finish line.
Hypothesis: A runner wants to head for the finish line.

Adversarial Text Prediction: Contradiction (Confidence = 43%)
Premise: A runner wearing purple strives for the finish line.
Hypothesis: A racer wants to head for the finish line.

Table 2: Example of attack results for the textual entailment task. Modified words are highlighted in green and red for the original and adversarial texts, respectively.

                    Sentiment Analysis        Textual Entailment
                    % success   % modified    % success   % modified
Perturb baseline    52%         19%           –           –
Genetic attack      97%         14.7%         70%         23%

Table 3: Comparison between the attack success rate and mean percentage of modifications required by the genetic attack and the Perturb baseline for the two tasks.
Textual Entailment:
We trained a textual entailment model using the Stanford Natural Language Inference (SNLI) corpus (Bowman et al., 2015). The model passes the input through a ReLU "translation" layer (Bowman et al., 2015), which encodes the premise and hypothesis sentences by performing a summation over the word embeddings, concatenates the two sentence embeddings, and finally passes the output through three 600-dimensional ReLU layers before feeding it to a 3-way softmax. The model predicts whether the premise sentence entails, contradicts or is neutral to the hypothesis sentence. The test accuracy of the model is 83%, which is also relatively close to the state-of-the-art (Chen et al., 2017c).
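For concreteness, a minimal Keras sketch of the two victim architectures as described above is given below; vocabulary size, sequence length, and all training details are placeholder assumptions rather than values taken from the paper or the linked repositories.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 50_000, 300, 100   # placeholder sizes, not from the paper

def build_sentiment_model() -> models.Model:
    """IMDB classifier: embeddings -> LSTM(128) -> average over all time steps -> output layer."""
    tokens = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)          # would be initialized with GloVe
    x = layers.LSTM(128, return_sequences=True)(x)
    x = layers.GlobalAveragePooling1D()(x)                       # average outputs across time steps
    out = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(tokens, out)

def build_entailment_model() -> models.Model:
    """SNLI classifier: shared ReLU 'translation' layer, summed word vectors per sentence,
    concatenation, three 600-dimensional ReLU layers, 3-way softmax."""
    embed = layers.Embedding(VOCAB_SIZE, EMBED_DIM)
    translate = layers.TimeDistributed(layers.Dense(300, activation="relu"))
    sum_words = layers.Lambda(lambda t: tf.reduce_sum(t, axis=1))

    premise = layers.Input(shape=(MAX_LEN,), dtype="int32")
    hypothesis = layers.Input(shape=(MAX_LEN,), dtype="int32")
    encoded = [sum_words(translate(embed(x))) for x in (premise, hypothesis)]
    merged = layers.Concatenate()(encoded)
    for _ in range(3):
        merged = layers.Dense(600, activation="relu")(merged)
    out = layers.Dense(3, activation="softmax")(merged)
    return models.Model([premise, hypothesis], out)
```

The sentiment model would typically be trained with a binary cross-entropy loss and the entailment model with a categorical cross-entropy loss, as is standard for these output heads.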
We randomly sampled 1000 and 500 correctly classified examples from the test sets of the two tasks to evaluate our algorithm. Correctly classified examples were chosen to prevent the accuracy levels of the victim models from confounding our results. For the sentiment analysis task, the attacker aims to divert the prediction result from positive to negative, and vice versa. For the textual entailment task, the attacker is only allowed to modify the hypothesis, and aims to divert the prediction result from 'entailment' to 'contradiction', and vice versa. We limit the attacker to a maximum of G = 20 iterations, and fix the hyperparameter values to S = 60, N = 8, K = 4, and δ = 0.5. We also fixed the maximum percentage of allowed changes to the document to be 20% and 25% for the two tasks, respectively. If increased, the success rate would increase but the mean quality would decrease. If the attack does not succeed within the iteration limit or exceeds the specified threshold, it is counted as a failure.

Sample outputs produced by our attack are shown in Tables 1 and 2. Additional outputs can be found in the supplementary material (Tables 4 and 5). Table 3 shows the attack success rate and mean percentage of modified words on each task. We compare to the Perturb baseline, which greedily applies the Perturb subroutine, to validate the use of population-based optimization. As can be seen from our results, we are able to achieve a high success rate with a limited number of modifications on both tasks. In addition, the genetic algorithm significantly outperformed the Perturb baseline in both success rate and percentage of words modified, demonstrating the additional benefit yielded by using population-based optimization. Testing using a single TitanX GPU, for sentiment analysis and textual entailment, we measured average runtimes on success to be 43.5 and 5 seconds per example, respectively. The high success rate and reasonable runtimes demonstrate the practicality of our approach, even when scaling to long sentences, such as those found in the IMDB dataset.

Our success rate on textual entailment is lower due to the large disparity in sentence length. On average, hypothesis sentences in the SNLI corpus are 9 words long, which is very short compared to IMDB (229 words on average, limited to 100 for our experiments). With sentences that short, applying successful perturbations becomes much harder; however, we were still able to achieve a success rate of 70%. For the same reason, we did not apply the Perturb baseline on the textual entailment task, as it fails to achieve any success under the limit of the maximum allowed changes constraint.
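As a usage illustration only, the settings above map onto the earlier attack sketch roughly as follows; the keys mirror the hypothetical argument names introduced in the algorithm section and are not part of the released code.

```python
# Hyperparameters reported in the text, expressed for the hypothetical
# genetic_attack / perturb sketches from the algorithm section.
ATTACK_CONFIG = {
    "generations": 20,     # G
    "pop_size": 60,        # S
    "n_neighbors": 8,      # N
    "top_k": 4,            # K
    "delta": 0.5,          # embedding-distance cutoff for candidate synonyms
    "max_change": {"sentiment": 0.20, "entailment": 0.25},  # allowed fraction of modified words
}
```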
We performed a user study on the sentiment analysis task with 20 volunteers to evaluate how perceptible our adversarial perturbations are. Note that the number of participating volunteers is significantly larger than used in previous studies (Jia and Liang, 2017; Ebrahimi et al., 2018). The user study was composed of two parts. First, we presented 100 adversarial examples to the participants and asked them to label the sentiment of the text (i.e., positive or negative). 92.3% of the responses matched the original text sentiment, indicating that our modifications did not significantly affect human judgment on the text sentiment. Second, we prepared 100 questions, each of which includes an original example and the corresponding adversarial example as a pair. Participants were asked to judge the similarity of each pair on a scale from 1 (very similar) to 4 (very different). The average rating is . ± . , which shows the perceived difference is also small.

The attack results above raise the following question: how can we defend against these attacks? We performed a preliminary experiment to see if adversarial training (Madry et al., 2017), the only effective defense in the image domain, can be used to lower the attack success rate. We generated 1000 adversarial examples on the cleanly trained sentiment analysis model using the IMDB training set, appended them to the existing training set, and used the updated dataset to adversarially train a model from scratch. We found that adversarial training provided no additional robustness benefit in our experiments using the test set, despite the fact that the model achieves near 100% accuracy classifying adversarial examples included in the training set. These results demonstrate the diversity of the perturbations generated by our attack algorithm and illustrate the difficulty of defending against adversarial attacks. We hope these results inspire further work in increasing the robustness of natural language models.
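A minimal sketch of this adversarial training experiment, again in terms of the hypothetical attack helpers from earlier sections; data loading, tokenization, and retraining are elided.

```python
def adversarially_augment(train_sentences, train_labels, victim, attack_fn, n_adv=1000):
    """Generate adversarial examples on the clean model and append them to the training set.
    `victim` is the black-box query wrapper and `attack_fn` the genetic attack sketched earlier
    (with its extra arguments already bound)."""
    adv_sentences, adv_labels = [], []
    for sentence, label in zip(train_sentences, train_labels):
        if len(adv_sentences) >= n_adv:
            break
        target = 1 - label                        # flip the binary sentiment label
        adversarial = attack_fn(sentence, target, victim)
        if adversarial is not None:               # attack succeeded on this example
            adv_sentences.append(adversarial)
            adv_labels.append(label)              # keep the original (correct) label
    return list(train_sentences) + adv_sentences, list(train_labels) + adv_labels
```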
We demonstrate that despite the difficulties in generating imperceptible adversarial examples in the natural language domain, semantically and syntactically similar adversarial examples can be crafted using a black-box population-based optimization algorithm, yielding success on both the sentiment analysis and textual entailment tasks. Our human study validated that the generated examples were indeed adversarial and perceptibly quite similar. We hope our work encourages researchers to pursue improving the robustness of DNNs in the natural language domain.
Acknowledgement
This research was supported in part by the U.S. Army Research Laboratory and the UK Ministry of Defence under Agreement Number W911NF-16-3-0001, and the National Science Foundation.

References
M. Alzantot, Y. Sharma, S. Chakraborty, and M. Srivastava. 2018. GenAttack: Practical black-box attacks with gradient-free optimization. arXiv preprint arXiv:1805.11090.

Edward J. Anderson and Michael C. Ferris. 1994. Genetic algorithms for combinatorial optimization: the assemble line balancing problem. ORSA Journal on Computing, 6(2):161–173.

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326.

N. Carlini and D. Wagner. 2017. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644.

Nicholas Carlini and David Wagner. 2018. Audio adversarial examples: Targeted attacks on speech-to-text. arXiv preprint arXiv:1801.01944.

Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. 2013. One billion word benchmark for measuring progress in statistical language modeling. arXiv preprint arXiv:1312.3005.

P. Chen, H. Zhang, Y. Sharma, J. Yi, and C. Hsieh. 2017a. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. arXiv preprint arXiv:1708.03999.

P. Y. Chen, Y. Sharma, H. Zhang, J. Yi, and C. Hsieh. 2017b. EAD: Elastic-net attacks to deep neural networks via adversarial examples. arXiv preprint arXiv:1709.0414.

Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017c. Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1657–1668.

J. Ebrahimi, A. Rao, D. Lowd, and D. Dou. 2018. HotFlip: White-box adversarial examples for text classification. ACL'18; arXiv preprint arXiv:1712.06751.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial example generation with syntactically controlled paraphrase networks. In Proceedings of NAACL.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. arXiv preprint arXiv:1707.07328.

V. Kuleshov, S. Thakoor, T. Lau, and S. Ermon. 2018. Adversarial examples for natural language classification problems. OpenReview submission OpenReview:r1QZ3zbAZ.

A. Kurakin, I. Goodfellow, and S. Bengio. 2016. Adversarial machine learning at scale. ICLR'17; arXiv preprint arXiv:1611.01236.

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA. Association for Computational Linguistics.

A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.

Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Lina Rojas-Barahona, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2016. Counter-fitting word vectors to linguistic constraints. arXiv preprint arXiv:1603.00892.

Heinz Mühlenbein. 1989. Parallel genetic algorithms, population genetics and combinatorial optimization. In Workshop on Parallel Processing: Logic, Organization, and Technology, pages 398–406. Springer.

N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. Celik, and A. Swami. 2016a. Practical black-box attacks against machine learning. arXiv preprint arXiv:1602.02697.

N. Papernot, P. McDaniel, A. Swami, and R. Harang. 2016b. Crafting adversarial input sequences for recurrent neural networks. arXiv preprint arXiv:1604.08275.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.

Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Semantically equivalent adversarial rules for debugging NLP models. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 856–865.

Y. Sharma and P. Y. Chen. 2017. Attacking the Madry defense model with L1-based adversarial examples. arXiv preprint arXiv:1710.10733.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

Zhengli Zhao, Dheeru Dua, and Sameer Singh. 2018. Generating natural adversarial examples. arXiv preprint arXiv:1710.11342.

Supplemental Materials: Generating Natural Language Adversarial Examples

Additional Sentiment Analysis Results
Table 4 shows an additional set of attack results against the sentiment analysis model described in our paper.

Original Text Prediction = Positive. (Confidence = 78%)
The promise of Martin Donovan playing Jesus was, quite honestly, enough to get me to see the film. Definitely worthwhile; clever and funny without overdoing it. The low quality filming was probably an appropriate effect but ended up being a little too jarring, and the ending sounded more like a PBS program than Hartley. Still, too many memorable lines and great moments for me to judge it harshly.

Adversarial Text Prediction = Negative. (Confidence = 59.9%)
The promise of Martin Donovan playing Jesus was, utterly frankly, enough to get me to see the film. Definitely worthwhile; clever and funny without overdoing it. The low quality filming was presumably an appropriate effect but ended up being a little too jarring, and the ending sounded more like a PBS program than Hartley. Still, too many memorable lines and great moments for me to judge it harshly.

Original Text Prediction = Negative. (Confidence = 74.30%)
Some sort of accolade must be given to 'Hellraiser: Bloodline'. It's actually out full-mooned Full Moon. It bears all the marks of, say, your 'demonic toys' or 'puppet master' series, without their dopey, uh, charm? Full Moon can get away with silly product because they know it's silly. These Hellraiser things, man, do they ever take themselves seriously. This increasingly stupid franchise (though not nearly as stupid as I am for having watched it) once made up for its low budgets by being stylish. Now it's just ish.

Adversarial Text Prediction = Positive. (Confidence = 51.03%)
Some kind of accolade must be given to 'Hellraiser: Bloodline'. It's truly out full-mooned Full Moon. It bears all the marks of, say, your 'demonic toys' or 'puppet master' series, without their silly, uh, charm? Full Moon can get away with daft product because they know it's silly. These Hellraiser things, man, do they ever take themselves seriously. This steadily daft franchise (whilst not nearly as daft as I am for having witnessed it) once made up for its low budgets by being stylish. Now it's just ish.

Original Text Prediction = Negative. (Confidence = 50.53%)
Thinly-cloaked retelling of the garden-of-eden story – nothing new, nothing shocking, although I feel that is what the filmmakers were going for. The idea is trite. Strong performance from Daisy Eagan, that's about it. I believed she was 13, and I was interested in her character, the rest left me cold.

Adversarial Text Prediction = Positive. (Confidence = 63.04%)
Thinly-cloaked retelling of the garden-of-eden story – nothing new, nothing shocking, although I feel that is what the filmmakers were going for. The idea is petty. Strong performance from Daisy Eagan, that's about it. I believed she was 13, and I was interested in her character, the rest left me cold.

Table 4: Example of attack results against the sentiment analysis model. Modified words are highlighted in green and red for the original and adversarial texts, respectively.

Additional Textual Entailment Results
Table 5 shows an additional set of attack results against the textual entailment model described in our paper.

Original Text Prediction: Contradiction (Confidence = 91%)
Premise:
A man and a woman stand in front of a Christmas tree contemplating a single thought.
Hypothesis:
Two people talk loudly in front of a cactus.

Adversarial Text Prediction: Entailment (Confidence = 51%)
Premise:
A man and a woman stand in front of a Christmas tree contemplating a single thought.
Hypothesis:
Two humans chitchat loudly in front of a cactus.

Original Text Prediction: Contradiction (Confidence = 94%)
Premise:
A young girl wearing yellow shorts and a white tank top using a cane pole to fish at a small pond.
Hypothesis:
A girl wearing a dress looks off a cliff.

Adversarial Text Prediction: Entailment (Confidence = 40%)
Premise:
A young girl wearing yellow shorts and a white tank top using a cane pole to fish at a small pond.
Hypothesis:
A girl wearing a skirt looks off a ravine.

Original Text Prediction: Entailment (Confidence = 86%)
Premise:
A large group of protesters are walking down the street with signs.
Hypothesis:
Some people are holding up signs of protest in the street.

Adversarial Text Prediction: Contradiction (Confidence = 43%)
Premise:
A large group of protesters are walking down the street with signs.
Hypothesis:
Some people are holding up signals of protest in the street.