Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge
Pasquale Minervini
University College London [email protected]
Sebastian Riedel
University College London [email protected]
Abstract
Adversarial examples are inputs to machine learning models designed to cause the model to make a mistake. They are useful for understanding the shortcomings of machine learning models, interpreting their results, and for regularisation. In NLP, however, most example generation strategies produce input text by using known, pre-specified semantic transformations, requiring significant manual effort and in-depth understanding of the problem and domain. In this paper, we investigate the problem of automatically generating adversarial examples that violate a set of given First-Order Logic constraints in Natural Language Inference (NLI). We reduce the problem of identifying such adversarial examples to a combinatorial optimisation problem, by maximising a quantity measuring the degree of violation of such constraints and by using a language model for generating linguistically-plausible examples. Furthermore, we propose a method for adversarially regularising neural NLI models for incorporating background knowledge. Our results show that, while the proposed method does not always improve results on the SNLI and MultiNLI datasets, it significantly and consistently increases the predictive accuracy on adversarially-crafted datasets – up to a 79.6% relative improvement – while drastically reducing the number of background knowledge violations. Furthermore, we show that adversarial examples transfer among model architectures, and that the proposed adversarial training procedure improves the robustness of NLI models to adversarial examples.
An open problem in Artificial Intelligence is quantifying the extent to which algorithms exhibit intelligent behaviour (Levesque, 2014). In Machine Learning, a standard procedure consists in estimating the generalisation error, i.e. the prediction error over an independent test sample (Hastie et al., 2001). However, machine learning models can succeed simply by recognising patterns that happen to be predictive on instances in the test sample, while ignoring deeper phenomena (Rimell and Clark, 2009; Paperno et al., 2016).

Adversarial examples are inputs to machine learning models designed to cause the model to make a mistake (Szegedy et al., 2014; Goodfellow et al., 2014). In Natural Language Processing (NLP) and Machine Reading, generating adversarial examples can be very useful for understanding the shortcomings of NLP models (Jia and Liang, 2017; Kannan and Vinyals, 2017) and for regularisation (Minervini et al., 2017).

In this paper, we focus on the problem of generating adversarial examples for Natural Language Inference (NLI) models, in order to gain insights about the inner workings of such systems and to regularise them. NLI, also referred to as Recognising Textual Entailment (Fyodorov et al., 2000; Condoravdi et al., 2003; Dagan et al., 2005), is a central problem in language understanding (Katz, 1972; Bos and Markert, 2005; van Benthem, 2008; MacCartney and Manning, 2009), and thus it is especially well suited to serve as a benchmark task for research in machine reading. In NLI, a model is presented with two sentences, a premise p and a hypothesis h, and the goal is to determine whether p semantically entails h.

The problem of acquiring large amounts of labelled data for NLI was addressed with the creation of the SNLI (Bowman et al., 2015) and MultiNLI (Williams et al., 2017) datasets. In these processes, annotators were presented with a premise p drawn from a corpus, and were required to generate three new sentences (hypotheses) based on p, according to the following criteria: a) Entailment – h is definitely true given p (p entails h); b) Contradiction – h is definitely not true given p (p contradicts h); and c) Neutral – h might be true given p. Given a premise-hypothesis sentence pair (p, h), a NLI model is asked to classify the relationship between p and h – i.e. either entailment, contradiction, or neutral. Solving NLI requires fully capturing the sentence meaning, by handling complex linguistic phenomena like lexical entailment, quantification, co-reference, tense, belief, modality, and lexical and syntactic ambiguities (Williams et al., 2017).

In this work, we use adversarial examples for: a) identifying cases where models violate existing background knowledge, expressed in the form of logic rules, and b) training models that are robust to such violations.

The underlying idea in our work is that NLI models should adhere to a set of structural constraints that are intrinsic to the human reasoning process. For instance, contradiction is inherently symmetric: if a sentence p contradicts a sentence h, then h contradicts p as well. Similarly, entailment is both reflexive and transitive. It is reflexive since a sentence a is always entailed by (i.e. is true given) a. It is also transitive, since if a is entailed by b, and b is entailed by c, then a is entailed by c as well.

Example 1 (Inconsistency).
Consider three sentences a, b and c, each describing a situation, such as: a) "The girl plays", b) "The girl plays with a ball", and c) "The girl plays with a red ball". Note that if a is entailed by b, and b is entailed by c, then a is also entailed by c. If a NLI model detects that b entails a, c entails b, but c does not entail a, we know that it is making an error (since its results are inconsistent), even though we may not be aware of the sentences a, b, and c and the true semantic relationships holding between them.

Our adversarial examples are different from those used in other fields such as computer vision, where they typically consist in small, semantically invariant perturbations that result in drastic changes in the model predictions. In this paper, we propose a method for generating adversarial examples that cause a model to violate pre-existing background knowledge (Section 4), based on reducing the generation problem to a combinatorial optimisation problem. Furthermore, we outline a method for incorporating such background knowledge into models by means of an adversarial training procedure (Section 5).

Our results (Section 8) show that, even though the proposed adversarial training procedure does not sensibly improve accuracy on SNLI and MultiNLI, it yields significant relative improvements in accuracy (up to 79.6%) on adversarial datasets. Furthermore, we show that adversarial examples transfer across models, and that the proposed method allows training significantly more robust NLI models.
Neural NLI Models.
In NLI, in particular on the Stanford Natural Language Inference (SNLI) (Bowman et al., 2015) and MultiNLI (Williams et al., 2017) datasets, neural NLI models – end-to-end differentiable models that can be trained via gradient-based optimisation – proved to be very successful, achieving state-of-the-art results (Rocktäschel et al., 2016; Parikh et al., 2016; Chen et al., 2017).

Let S denote the set of all possible sentences, and let a = (a_1, ..., a_{ℓ_a}) ∈ S and b = (b_1, ..., b_{ℓ_b}) ∈ S denote two input sentences – representing the premise and the hypothesis – of length ℓ_a and ℓ_b, respectively. In neural NLI models, all words a_i and b_j are typically represented by k-dimensional embedding vectors a_i, b_j ∈ ℝ^k. As such, the sentences a and b can be encoded by the sentence embedding matrices a ∈ ℝ^{k × ℓ_a} and b ∈ ℝ^{k × ℓ_b}, where the columns a_i and b_j respectively denote the embeddings of words a_i and b_j.

Given two sentences a, b ∈ S, the goal of a NLI model is to identify the semantic relation between a and b, which can be either entailment, contradiction, or neutral. For this reason, given an instance, neural NLI models compute the following conditional probability distribution over all three classes:

    p_Θ(· | a, b) = softmax(score_Θ(a, b))    (1)

where score_Θ : ℝ^{k × ℓ_a} × ℝ^{k × ℓ_b} → ℝ^3 is a model-dependent scoring function with parameters Θ, and softmax(x)_i = exp{x_i} / Σ_j exp{x_j} denotes the softmax function.

Several scoring functions have been proposed in the literature, such as the conditional Bidirectional LSTM (cBiLSTM) (Rocktäschel et al., 2016), the Decomposable Attention Model (DAM) (Parikh et al., 2016), and the Enhanced LSTM model (ESIM) (Chen et al., 2017). One desirable quality of the scoring function score_Θ is that it should be differentiable with respect to the model parameters Θ, which allows the neural NLI model to be trained from data via back-propagation.
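As a concrete illustration of Eq. (1), the following minimal sketch computes the class distribution from an arbitrary scoring function; the names softmax, predict, and score_fn are our own, and not part of any of the cited models:

```python
import numpy as np

def softmax(x):
    # Shift by the maximum for numerical stability; Eq. (1) is invariant to it.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def predict(score_fn, a_emb, b_emb):
    # `score_fn` maps the two sentence embedding matrices (k x l_a and k x l_b)
    # to three unnormalised scores, one per class; Eq. (1) normalises them.
    scores = score_fn(a_emb, b_emb)  # shape: (3,)
    probs = softmax(scores)
    return dict(zip(["entailment", "contradiction", "neutral"], probs))
```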
Model Training.

Let D = {(x_1, y_1), ..., (x_m, y_m)} represent a NLI dataset, where x_i denotes the i-th premise-hypothesis sentence pair, and y_i ∈ {1, ..., K} their relationship, where K ∈ ℕ is the number of possible relationships – in the case of NLI, K = 3. The model is trained by minimising a cross-entropy loss J_D on D:

    J_D(D, Θ) = − Σ_{i=1}^{m} Σ_{k=1}^{K} 1{y_i = k} log(ŷ_{i,k})    (2)

where ŷ_{i,k} = p_Θ(y_i = k | x_i) denotes the probability of class k on the instance x_i inferred by the neural NLI model, as in Eq. (1).

In the following, we analyse the behaviour of neural NLI models by means of adversarial examples – inputs to machine learning models designed to cause the model to commit mistakes. In computer vision models, adversarial examples are created by adding a very small amount of noise to the input (Szegedy et al., 2014; Goodfellow et al., 2014): these perturbations do not change the semantics of the images, but they can drastically change the predictions of computer vision models. In our setting, we define an adversary whose goal is finding sets of NLI instances where the model fails to be consistent with available background knowledge, encoded in the form of First-Order Logic (FOL) rules. In the following sections, we define the corresponding optimisation problem, and propose an efficient solution.

For analysing the behaviour of NLI models, we verify whether they agree with the provided background knowledge, encoded by a set of FOL rules. Note that the three NLI classes – entailment, contradiction, and neutrality – can be seen as binary logic predicates, and we can define FOL formulas for describing the formal relationships that hold between them.

In the following, we denote the predicates associated with entailment, contradiction, and neutrality as ent, con, and neu, respectively. By doing so, we can represent semantic relationships between sentences via logic atoms. For instance, given three sentences s_1, s_2, s_3 ∈ S, we can represent the fact that s_1 entails s_2 and s_2 contradicts s_3 by using the logic atoms ent(s_1, s_2) and con(s_2, s_3).

    R1:  ⊤ ⇒ ent(X_1, X_1)
    R2:  con(X_1, X_2) ⇒ con(X_2, X_1)
    R3:  ent(X_1, X_2) ⇒ ¬con(X_1, X_2)
    R4:  neu(X_1, X_2) ⇒ ¬con(X_1, X_2)
    R5:  ent(X_1, X_2) ∧ ent(X_2, X_3) ⇒ ent(X_1, X_3)

Table 1: First-Order Logic rules defining desired properties of NLI models: the X_i are universally quantified variables, and the operators ∧, ¬, and ⊤ denote logic conjunction, negation, and tautology.

Let X_1, ..., X_n be a set of universally quantified variables. We define our background knowledge as a set of FOL rules, each having the following body ⇒ head form:

    body(X_1, ..., X_n) ⇒ head(X_1, ..., X_n),    (3)

where body and head represent the premise and the conclusion of the rule – if body holds, head holds as well. In the following, we consider the rules R1, ..., R5 outlined in Table 1. Rule R1 enforces the constraint that entailment is reflexive; rule R2 that contradiction should always be symmetric (if s_1 contradicts s_2, then s_2 contradicts s_1 as well); rule R5 that entailment is transitive; while rules R3 and R4 describe the formal relationships between the entailment, neutral, and contradiction relations.
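The rules in Table 1 are simple enough to encode as data. Below is a minimal sketch of such an encoding for a representative subset of the rules (R1, whose body is the tautology ⊤, would need a small special case); the Atom and Rule containers are hypothetical names of ours, not part of the paper's implementation:

```python
from collections import namedtuple

# An atom applies a (possibly negated) predicate in {ent, con, neu} to a pair
# of universally quantified variables; a rule is a conjunctive body plus one head.
Atom = namedtuple("Atom", ["pred", "args", "negated"])
Rule = namedtuple("Rule", ["body", "head"])

RULES = {
    # R2: con(X1, X2) => con(X2, X1)
    "R2": Rule(body=[Atom("con", ("X1", "X2"), False)],
               head=Atom("con", ("X2", "X1"), False)),
    # R3: ent(X1, X2) => not con(X1, X2)
    "R3": Rule(body=[Atom("ent", ("X1", "X2"), False)],
               head=Atom("con", ("X1", "X2"), True)),
    # R5: ent(X1, X2) and ent(X2, X3) => ent(X1, X3)
    "R5": Rule(body=[Atom("ent", ("X1", "X2"), False),
                     Atom("ent", ("X2", "X3"), False)],
               head=Atom("ent", ("X1", "X3"), False)),
}
```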
In Section 4 we propose a method to automatically generate sets of sentences that violate the rules outlined in Table 1 – effectively generating adversarial examples. Then, in Section 5 we show how we can leverage such adversarial examples by generating them on-the-fly during training and using them for regularising the model parameters, in an adversarial training regime.

In this section, we propose a method for efficiently generating adversarial examples for NLI models – i.e. examples that make the model violate the background knowledge outlined in Section 3.
We cast the problem of generating adversarial examples as an optimisation problem. In particular, we propose a continuous inconsistency loss that measures the degree to which a set of sentences causes a model to violate a rule.
Example 2 (Inconsistency Loss). Consider the rule R2 in Table 1, i.e. con(X_1, X_2) ⇒ con(X_2, X_1). Let s_1, s_2 ∈ S be two sentences: this rule is violated if, according to the model, a sentence s_1 contradicts s_2, but s_2 does not contradict s_1. However, if we just use the final decision made by the neural NLI model, we can only check whether the rule is violated by two given sentences, without any information on the degree of such a violation.

Intuitively, for the rule to be maximally violated, the conditional probability associated to con(s_1, s_2) should be very high (≈ 1), while the one associated to con(s_2, s_1) should be very low (≈ 0). We can measure the extent to which the rule is violated – which we refer to as the inconsistency loss J_I – by checking whether the probability of the body of the rule is higher than the probability of its head:

    J_I(S = {X_1 ↦ s_1, X_2 ↦ s_2}) = [p_Θ(con | s_1, s_2) − p_Θ(con | s_2, s_1)]_+

where S is a substitution set that maps the variables X_1 and X_2 in R2 to the sentences s_1 and s_2, [x]_+ = max(0, x), and p_Θ(con | s_i, s_j) is the (conditional) probability that s_i contradicts s_j according to the neural NLI model. Note that, in accordance with the logic implication, the inconsistency loss reaches its global minimum when the probability of the body is close to zero – i.e. the premise is false – and when the probabilities of both the body and the head are close to one – i.e. the premise and the conclusion are both true.

We now generalise the intuition in Ex. 2 to any FOL rule. Let r = (body ⇒ head) denote an arbitrary FOL rule in the form described in Eq. (3), and let vars(r) = {X_1, ..., X_n} denote the set of universally quantified variables in the rule r. Furthermore, let S = {X_1 ↦ s_1, ..., X_n ↦ s_n} denote a substitution set, i.e. a mapping from the variables in vars(r) to sentences s_1, ..., s_n ∈ S. The inconsistency loss associated with the rule r on the substitution set S can be defined as:

    J_I(S) = [p(S; body) − p(S; head)]_+    (4)

where p(S; body) and p(S; head) denote the probabilities of the body and head of the rule, after replacing the variables in r with the corresponding sentences in S. The motivation for the loss in Eq. (4) is that logic implications can be understood as "whenever the body is true, the head has to be true as well". In terms of NLI models, this translates as "the probability of the head should be at least as large as the probability of the body".

For calculating the inconsistency loss in Eq. (4), we need to specify how to calculate the probabilities of head and body. The probability of a single ground atom is given by querying the neural NLI model, as in Eq. (1). The head contains a single atom, while the body can be a conjunction of multiple atoms. Similarly to Minervini et al. (2017), we use the Gödel t-norm, a continuous generalisation of the conjunction operator in logic (Gupta and Qi, 1991), for computing the probability of the body of a clause:

    p_Θ(a_1 ∧ a_2) = min{p_Θ(a_1), p_Θ(a_2)}

where a_1 and a_2 are two clause atoms.

In this work, we cast the problem of generating adversarial examples as an optimisation problem: we search for the substitution set S = {X_1 ↦ s_1, ..., X_n ↦ s_n} that maximises the inconsistency loss in Eq. (4), thus (maximally) violating the available background knowledge.
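To make Eq. (4) concrete, here is a minimal sketch of the inconsistency loss, reusing the hypothetical Atom/Rule encoding above; nli_prob(pred, s1, s2) is an assumed callable returning p_Θ(pred | s1, s2) as in Eq. (1):

```python
def atom_prob(nli_prob, atom, subst):
    # Ground the atom by mapping its variables to sentences via the
    # substitution set, then query the NLI model for its probability.
    s1, s2 = subst[atom.args[0]], subst[atom.args[1]]
    p = nli_prob(atom.pred, s1, s2)
    return 1.0 - p if atom.negated else p

def inconsistency_loss(nli_prob, rule, subst):
    # Goedel t-norm: the probability of a conjunctive body is the minimum
    # of the probabilities of its atoms.
    p_body = min(atom_prob(nli_prob, a, subst) for a in rule.body)
    p_head = atom_prob(nli_prob, rule.head, subst)
    # Eq. (4): [p(S; body) - p(S; head)]_+
    return max(0.0, p_body - p_head)
```

For rule R2 and the substitution set {X_1 ↦ s_1, X_2 ↦ s_2}, this reduces exactly to the expression in Example 2.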
Maximising the inconsistency loss in Eq. (4) may not be sufficient for generating meaningful adversarial examples: they can lead neural NLI models to violate the available background knowledge, but they may not be well-formed and meaningful. For such a reason, in addition to maximising the inconsistency loss, we also constrain the perplexity of the generated sentences by using a neural language model (Bengio et al., 2000). In this work, we use a LSTM (Hochreiter and Schmidhuber, 1997) neural language model p_L(w_1, ..., w_t) for generating low-perplexity adversarial examples.

As mentioned earlier in this section, we cast the problem of automatically generating adversarial examples – i.e. examples that cause NLI models to violate the available background knowledge – as an optimisation problem. Specifically, we look for substitution sets S = {X_1 ↦ s_1, ..., X_n ↦ s_n} that jointly: a) maximise the inconsistency loss described in Eq. (4), and b) are composed of sentences with a low perplexity, as defined by the neural language model in Section 4.2. The search objective can be formalised by the following optimisation problem:

    maximise_S J_I(S)  subject to  log p_L(S) ≤ τ    (5)

where log p_L(S) denotes the log-probability of the sentences in the substitution set S, and τ is a threshold on the perplexity of the generated sentences.

For generating low-perplexity adversarial examples, we take inspiration from Guu et al. (2017) and generate the sentences by editing prototypes extracted from a corpus. Specifically, for searching substitution sets whose sentences jointly have a high probability and are highly adversarial, as measured by the inconsistency loss in Eq. (4), we use the following procedure: a) we first sample sentences close to the data manifold (i.e. with a low perplexity), by either sampling from the training set or from the language model; b) we then make small variations to the sentences – analogous to adversarial images, which consist in small perturbations of training examples – so as to optimise the objective in Eq. (5).

When editing prototypes, we consider the following perturbations: a) change one word in one of the input sentences; b) remove one parse sub-tree from one of the input sentences; c) insert one parse sub-tree from one sentence in the corpus into the parse tree of one of the input sentences.

Note that the generation process can easily lead to ungrammatical or implausible sentences; however, these will be likely to have a high perplexity according to the language model (Section 4.2), and thus they will be ruled out by the search algorithm.
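A minimal sketch of this prototype-editing search follows; sample_prototypes, perturb, log_p_lm, and j_incons are assumed interfaces for the corpus sampler, the three edit operations, the language model, and the inconsistency loss of Eq. (4), and the candidate budget is an arbitrary choice of ours:

```python
def search_adversarial(rule, sample_prototypes, perturb, log_p_lm, j_incons,
                       tau, n_candidates=128):
    best_subst, best_loss = None, 0.0
    for _ in range(n_candidates):
        # a) start from low-perplexity prototypes close to the data manifold
        subst = sample_prototypes(rule)
        # b) apply one small edit: word change, sub-tree removal or insertion
        subst = perturb(subst)
        # Enforce the constraint log p_L(S) <= tau from Eq. (5), ruling out
        # ungrammatical or implausible candidates.
        if log_p_lm(subst) > tau:
            continue
        loss = j_incons(rule, subst)
        if loss > best_loss:
            best_subst, best_loss = subst, loss
    return best_subst, best_loss
```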
We now show how one can use the adversarial examples to regularise the training process. We propose training NLI models by jointly: a) minimising the data loss (Eq. (2)), and b) minimising the inconsistency loss (Eq. (4)) on a set of generated adversarial examples (substitution sets). More formally, for training, we jointly minimise the cross-entropy loss J_D defined on the data and the inconsistency loss max_S J_I(S; Θ) on a set of generated adversarial examples, resulting in the following optimisation problem:

    minimise_Θ  J_D(D, Θ) + λ max_S J_I(S; Θ)  subject to  log p_L(S) ≤ τ    (6)

Algorithm 1: Solving the optimisation problem in Eq. (6) via Mini-Batch Gradient Descent

    Require: Dataset D, weight λ ∈ ℝ+
    Require: No. of epochs τ ∈ ℕ+
    Require: No. of adv. substitution sets n_a ∈ ℕ+
     1: // Initialise the model parameters Θ̂
     2: Θ̂ ← initialise()
     3: for i ∈ {1, ..., τ} do
     4:   for D_j ∈ batches(D) do
     5:     // Generate the adv. substitution sets S_i
     6:     {S_1, ..., S_{n_a}} ← generate(D_j)
     7:     // Compute the gradient of Eq. (6)
     8:     L ← J_D(D_j, Θ̂) + λ Σ_{k=1}^{n_a} J_I(S_k; Θ̂)
     9:     g ← ∇_Θ L
    10:     // Update the model parameters
    11:     Θ̂ ← Θ̂ − η g
    12:   end for
    13: end for
    14: return Θ̂

where λ ∈ ℝ+ is a hyperparameter specifying the trade-off between the data loss J_D (Eq. (2)) and the inconsistency loss J_I (Eq. (4)), measured on the generated substitution sets S.

In Eq. (6), the regularisation term max_S J_I(S; Θ) has the task of generating the adversarial substitution sets by maximising the inconsistency loss. Furthermore, the constraint log p_L(S) ≤ τ ensures that the perplexity of the generated sentences is lower than a threshold τ. For this work, we used the max aggregation function. However, other functions can be used as well, such as the sum or mean of multiple inconsistency losses.

For minimising the regularised loss in Eq. (6), we alternate between two optimisation processes – generating the adversarial examples (Eq. (5)) and minimising the regularised loss (Eq. (6)). The procedure is outlined in Algorithm 1. At each iteration, after generating a set of adversarial examples S, it computes the gradient of the regularised loss in Eq. (6), and updates the model parameters via a gradient descent step. On line 6, the algorithm generates a set of adversarial examples, each in the form of a substitution set S. On line 9, the algorithm computes the gradient of the adversarially regularised loss – a weighted combination of the data loss in Eq. (2) and the inconsistency loss in Eq. (4). The model parameters are finally updated on line 11 via a gradient descent step.
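As an illustration, a PyTorch-style rendering of Algorithm 1 might look as follows; the optimiser choice, the generate, j_data, and j_incons interfaces, and all hyperparameter values are assumptions of ours rather than details from the paper:

```python
import torch

def adversarially_regularised_training(model, batches, generate, j_data, j_incons,
                                       lam=0.1, epochs=10, lr=1e-3):
    optimiser = torch.optim.SGD(model.parameters(), lr=lr)  # plain gradient descent step
    for _ in range(epochs):
        for batch in batches():
            subst_sets = generate(batch)     # line 6: adversarial substitution sets
            loss = j_data(model, batch)      # data loss, Eq. (2)
            for s in subst_sets:
                loss = loss + lam * j_incons(model, s)  # inconsistency loss, Eq. (4)
            optimiser.zero_grad()
            loss.backward()                  # line 9: gradient of the loss in Eq. (6)
            optimiser.step()                 # line 11: parameter update
    return model
```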
    Premise     A man in a suit walks through a train station.
    Hypothesis  Two boys ride skateboard.
    Type        Contradiction

    Premise     Two boys ride skateboard.
    Hypothesis  A man in a suit walks through a train station.
    Type        Contradiction

    Premise     Two people are surfing in the ocean.
    Hypothesis  There are people outside.
    Type        Entailment

    Premise     There are people outside.
    Hypothesis  Two people are surfing in the ocean.
    Type        Neutral

Table 2: Sample sentences from an Adversarial NLI Dataset generated using the DAM model, by maximising the inconsistency loss J_I.

We crafted a series of datasets for assessing the robustness of the proposed regularisation method to adversarial examples. Starting from the SNLI test set, we proceeded as follows. We selected the k instances in the SNLI test set that maximise the inconsistency loss in Eq. (4) with respect to the rules R1, R2, R3, and R4 in Table 1. We refer to the generated datasets as A^k_m, where m identifies the model used for selecting the sentence pairs, and k denotes the number of examples in the dataset.

For generating each of the A^k_m datasets, we proceeded as follows. Let D = {(x_1, y_1), ..., (x_n, y_n)} be a NLI dataset (such as SNLI), where each instance x_i = (p_i, h_i) is a premise-hypothesis sentence pair, and y_i denotes the relationship holding between p_i and h_i. For each instance x_i = (p_i, h_i), we consider two substitution sets: S_i = {X_1 ↦ p_i, X_2 ↦ h_i} and S'_i = {X_1 ↦ h_i, X_2 ↦ p_i}, each corresponding to a mapping from variables to sentences.

We compute the inconsistency score associated with each instance x_i in the dataset D as J_I(S_i) + J_I(S'_i). Note that the inconsistency score only depends on the premise p_i and hypothesis h_i in each instance x_i, and it does not depend on its label y_i. After computing the inconsistency scores for all sentence pairs in D using a model m, we select the k instances with the highest inconsistency score, we create two instances x_i = (p_i, h_i) and x̂_i = (h_i, p_i), and add both (x_i, y_i) and (x̂_i, ŷ_i) to the dataset A^k_m. Note that, while y_i is already known from the dataset D, ŷ_i is unknown. For this reason, we find ŷ_i by manual annotation.
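A minimal sketch of this selection procedure, under the assumption of a helper j_incons_pair(p, h) returning J_I({X1 ↦ p, X2 ↦ h}) for the rules above:

```python
def build_adversarial_dataset(data, j_incons_pair, k):
    scored = []
    for (p, h), y in data:
        # Score both variable-to-sentence mappings, S_i and S'_i; the label y
        # plays no role in the score.
        score = j_incons_pair(p, h) + j_incons_pair(h, p)
        scored.append((score, (p, h), y))
    # Keep the k most inconsistent instances.
    scored.sort(key=lambda t: t[0], reverse=True)
    dataset = []
    for _, (p, h), y in scored[:k]:
        dataset.append(((p, h), y))     # label known from the original dataset
        dataset.append(((h, p), None))  # reversed pair: label to be annotated manually
    return dataset
```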
                 Original            Regularised
    Model        Valid.    Test      Valid.    Test
    MultiNLI
    cBiLSTM      61.52     63.95     –         –
    DAM          72.78     73.28     –         –
    ESIM         73.66     75.22     –         –
    SNLI
    cBiLSTM      81.41     80.99     –         –
    DAM          86.96     86.29     –         –
    ESIM         87.83     87.25     –         –

Table 3: Accuracy on the SNLI and MultiNLI datasets with different neural NLI models, before (left) and after (right) adversarial regularisation.
Adversarial examples are receiving considerable attention in NLP; their usage, however, is considerably limited by the fact that semantically invariant input perturbations in NLP are difficult to identify (Buck et al., 2017). Jia and Liang (2017) analyse the robustness of extractive question answering models on examples obtained by adding adversarially generated distracting text to SQuAD (Rajpurkar et al., 2016) dataset instances. Belinkov and Bisk (2017) also notice that character-level Machine Translation models are overly sensitive to random character manipulations, such as typos. Hosseini et al. (2017) show that simple character-level modifications can drastically change the toxicity score of a text. Iyyer et al. (2018) propose using paraphrasing for generating adversarial examples. Our model is fundamentally different in two ways: a) it does not need labelled data for generating adversarial examples – the inconsistency loss can be maximised by just making an NLI model produce inconsistent results – and b) it incorporates adversarial examples during the training process, with the aim of training more robust NLI models.

Adversarial examples are also used for assessing the robustness of computer vision models (Szegedy et al., 2014; Goodfellow et al., 2014; Nguyen et al., 2015), where they are created by adding a small amount of noise to the inputs that does not change the semantics of the images, but drastically changes the model predictions.

We trained DAM, ESIM and cBiLSTM on the SNLI corpus using the hyperparameters provided in the respective papers. The results obtained by such models on the SNLI and MultiNLI validation and test sets are provided in Table 3. In the case of MultiNLI, the validation set was obtained by removing 10,000 instances from the training set (originally composed of 392,702 instances), and the test set consists in the matched validation set.

Table 4: Violations (%) of rules R1, R2, R3, and R4 from Table 1 on the SNLI training set, yielded by cBiLSTM, DAM, and ESIM. For each model and rule, the table reports |B|, the number of instances where the body of the rule holds according to the model, |B ∧ ¬H|, the number of instances where the body holds but the head does not, and the resulting percentage of violations.

Background Knowledge Violations.
As a first experiment, we count how often the models violate rules R1, R2, R3, and R4 in Table 1. In Table 4 we report the number of sentence pairs in the SNLI training set where DAM, ESIM and cBiLSTM violate R1, R2, R3, and R4. In the |B| column we report the number of times the body of the rule holds, according to the model. In the |B ∧ ¬H| column we report the number of times where the body of the rule holds, but the head does not – which is clearly a violation of the available rules.

We can see that, in the case of rule R1 (reflexivity of entailment), DAM and ESIM make a relatively low number of violations – namely 0.09% and 1.00%, respectively. However, in the case of cBiLSTM, we can see that, for each sentence s ∈ S in the SNLI training set, with a 23.76% chance s does not entail itself – which violates our background knowledge.

With respect to R2 (symmetry of contradiction), we see that none of the models is completely consistent with the available background knowledge. Given a sentence pair s_1, s_2 ∈ S from the SNLI training set, if – according to the model – s_1 contradicts s_2, a significant number of times (between 9.84% and 46.17%) the same model also infers that s_2 does not contradict s_1. This phenomenon happens 16.70% of times with DAM, 9.84% of times with ESIM, and 46.17% with cBiLSTM: this indicates that all considered models are prone to violating R2 in their predictions, with ESIM being the most robust.
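As an illustration, counting the violations of rule R2 on a set of sentence pairs could be done as follows; nli_predict is an assumed function returning the class probabilities of Eq. (1), and taking the argmax class mirrors the |B| and |B ∧ ¬H| columns of Table 4:

```python
def predicted_class(probs):
    # The class with the highest probability under the model.
    return max(probs, key=probs.get)

def count_r2_violations(pairs, nli_predict):
    body, violations = 0, 0
    for s1, s2 in pairs:
        # Body of R2 holds: the model predicts that s1 contradicts s2.
        if predicted_class(nli_predict(s1, s2)) == "contradiction":
            body += 1
            # Head fails: the model does not predict that s2 contradicts s1.
            if predicted_class(nli_predict(s2, s1)) != "contradiction":
                violations += 1
    rate = 100.0 * violations / max(body, 1)
    return body, violations, rate
```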
Figure 1: Number of violations (%) of the rules in Table 1 made by ESIM on the SNLI test set, for varying values of the regularisation parameter λ.

In Appendix A.2 we report several examples of such violations in the SNLI training set. We select those that maximise the inconsistency loss described in Eq. (4), violating rules R1 and R2. We can notice that the presence of inconsistencies is often correlated with the length of the sentences: the model tends to detect entailment relationships between longer (i.e., possibly more specific) and shorter (i.e., possibly more general) sentences.

In the following, we analyse the automatic generation of sets of adversarial examples that make the model violate the existing background knowledge. We search in the space of sentences by applying perturbations to sampled sentence pairs, using a language model for guiding the search process. The generation procedure is described in Section 4.

The procedure was especially effective in generating adversarial examples – a sample is shown in Table 6. We can notice that, even though DAM and ESIM achieve results close to human-level performance on SNLI, they are likely to fail when faced with linguistic phenomena such as negation, hyponymy, and antonymy. Gururangan et al. (2018) recently showed that NLI datasets tend to suffer from annotation artefacts and limited linguistic variations: this allows NLI models to achieve nearly-human performance by capturing repetitive patterns and idiosyncrasies in a dataset, without being able to effectively capture textual entailment. This is visible, for instance, in example 5 of Table 6, where the model fails to capture the hyponymy relation between "male" and "man", incorrectly predicting an entailment in place of a neutral relationship. Furthermore, it is clear that the models lack commonsense knowledge, such as the relation between "pushing" and "carrying" (example 1), and between being outside and swimming (example 2). Generating such adversarial examples provides us with useful insights on the inner workings of neural NLI models, which can be leveraged for improving the robustness of state-of-the-art models.

    Model        A^100_DAM  A^500_DAM  A^1000_DAM  A^100_ESIM  A^500_ESIM  A^1000_ESIM  A^100_cBiLSTM  A^500_cBiLSTM  A^1000_cBiLSTM
    DAM          47.40      47.93      51.66       55.73       60.94       60.88        81.50          77.37          75.28
    DAM_AR       –          –          –           –           –           –            –              –              –
    ESIM         72.40      74.59      76.92       52.08       58.65       60.78        87.00          84.34          82.05
    ESIM_AR      –          –          –           –           –           –            –              –              –
    cBiLSTM      56.25      59.96      61.75       47.92       53.23       53.73        51.50          52.83          53.24
    cBiLSTM_AR   –          –          –           –           –           –            –              –              –

Table 5: Accuracy of the unregularised neural NLI models DAM, cBiLSTM, and ESIM, and of their adversarially regularised versions DAM_AR, cBiLSTM_AR, and ESIM_AR, on the datasets A^k_m.

    1  s1: A man in uniform is pushing a medical bed.
       s2: a man is [pushing ⇒ carrying] something.
    2  s1: A dog swims in the water
       s2: A dog is swimming outside.
    3  s1: A young man is sledding down a snow covered hill on a green sled.
       s2: A man is sledding down to meet his daughter.
    4  s1: A woman sleeps on the ground. A boy and girl play in a pool.
       s2: Two kids are happily playing in a swimming pool.
    5  s1: A boy is drinking out of a water fountain shaped like a woman.
       s2: A male is getting a drink of water.
       s3: A male man is getting a drink of water.

Table 6: Inconsistent results produced by DAM on automatically generated adversarial examples. The notation [segment one ⇒ segment two] denotes that the corruption process removes "segment one" and introduces "segment two" in the sentence, and s_i →_p s_j indicates that DAM classifies the relation between s_i and s_j as contradiction, with probability p. Examples 1, 2, 3, and 4 violate the rule R2, while example 5 violates the rule R5. An annotation such as 0.00 ⇝ 0.99 indicates that the corruption process increases the inconsistency loss from 0.00 to 0.99.

We evaluated whether our approach for integrating logical background knowledge via adversarial training (Section 5) is effective at reducing the number of background knowledge violations, without reducing the predictive accuracy of the model. We started with pre-trained DAM, ESIM, and cBiLSTM models, trained using the hyperparameters published in their respective papers. After training, each model was then fine-tuned for 10 epochs, by minimising the adversarially regularised loss function introduced in Eq. (6). Table 3 shows results on the SNLI and MultiNLI development and test sets, while Fig. 1 shows the number of violations for different values of λ: regularised models are much more likely to make predictions that are consistent with the available background knowledge.

We can see that, despite the drastic reduction of background knowledge violations, the improvement in accuracy may not be significant, supporting the idea that models achieving close-to-human performance on SNLI and MultiNLI may be capturing annotation artefacts and idiosyncrasies in such datasets (Gururangan et al., 2018).

Evaluation on Adversarial Datasets.
We evaluated the proposed approach on nine adversarial datasets A^k_m, with k ∈ {100, 500, 1000}, generated following the procedure described in Section 6 – results are summarised in Table 5. We can see that the proposed adversarial training method significantly increases the accuracy on the adversarial test sets. For instance, consider A^100_DAM: prior to regularising (λ = 0), DAM achieves a very low accuracy on this dataset (47.40%). By increasing the regularisation parameter λ, we noticed sensible accuracy increases, yielding relative accuracy improvements of up to 79.6% in the case of DAM, and a substantial relative improvement in the case of cBiLSTM.

From Table 5 we can notice that adversarial examples transfer across different models: an unregularised model is likely to perform poorly also on adversarial datasets generated by using different models, with ESIM being the model most robust to adversarially generated examples. Furthermore, we can see that regularised models are generally more robust to adversarial examples, even when those were generated using different model architectures. For instance we can see that, while cBiLSTM is vulnerable also to adversarial examples generated using DAM and ESIM, its adversarially regularised version cBiLSTM_AR is generally more robust to any sort of adversarial examples.

In this paper, we investigated the problem of automatically generating adversarial examples that violate a set of given First-Order Logic constraints in NLI. We reduced the problem of identifying such adversarial examples to an optimisation problem, by maximising a continuous relaxation of the violation of such constraints, and by using a language model for generating linguistically-plausible examples. Furthermore, we proposed a method for adversarially regularising neural NLI models for incorporating background knowledge.

Our results showed that the proposed method consistently yields significant increases in the predictive accuracy on adversarially-crafted datasets – up to a 79.6% relative improvement – while drastically reducing the number of background knowledge violations. Furthermore, we showed that adversarial examples transfer across model architectures, and that the proposed adversarial training procedure produces generally more robust models. The source code and data for reproducing our results are available online, at https://github.com/uclmr/adversarial-nli/.
Acknowledgements
We are immensely grateful to Jeff Mitchell, Johannes Welbl, Sameer Singh, and the whole UCL Machine Reading group for all useful discussions, inputs, and ideas. This work has been supported by an Allen Distinguished Investigator Award.
References
Yonatan Belinkov and Yonatan Bisk. 2017. Synthetic and natural noise both break neural machine translation. CoRR, abs/1711.02173.

Yoshua Bengio, Réjean Ducharme, and Pascal Vincent. 2000. A neural probabilistic language model. In Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS) 2000, pages 932–938. MIT Press.

Johan van Benthem. 2008. A brief history of natural logic. In M. Chakraborty, B. Löwe, M. Nath Mitra, and S. Sarukki, editors, Logic, Navya-Nyaya and Applications: Homage to Bimal Matilal. College Publications.

Johan Bos and Katja Markert. 2005. Recognising textual entailment with logical inference. In HLT/EMNLP 2005, Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pages 628–635. The Association for Computational Linguistics.

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pages 632–642. The Association for Computational Linguistics.

Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Andrea Gesmundo, Neil Houlsby, Wojciech Gajewski, and Wei Wang. 2017. Ask the right questions: Active question reformulation with reinforcement learning. CoRR, abs/1705.07830.

Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017. Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, pages 1657–1668. Association for Computational Linguistics.

Cleo Condoravdi, Dick Crouch, Valeria de Paiva, Reinhard Stolle, and Daniel G. Bobrow. 2003. Entailment, intensionality and text understanding. In Proceedings of the HLT-NAACL 2003 Workshop on Text Meaning, pages 38–45.

Ido Dagan, Oren Glickman, and Bernardo Magnini. 2005. The PASCAL recognising textual entailment challenge. In Machine Learning Challenges, Evaluating Predictive Uncertainty, Visual Object Classification and Recognizing Textual Entailment, First PASCAL Machine Learning Challenges Workshop, MLCW 2005, volume 3944 of LNCS, pages 177–190. Springer.

Yaroslav Fyodorov, Yoad Winter, and Nissim Francez. 2000. A natural logic inference system. In Proceedings of the 2nd Workshop on Inference in Computational Semantics.

Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572.

M. M. Gupta and J. Qi. 1991. Theory of t-norms and fuzzy inference methods. Fuzzy Sets and Systems, 40(3):431–450.

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, and Noah A. Smith. 2018. Annotation artifacts in natural language inference data. CoRR, abs/1803.02324.

Kelvin Guu, Tatsunori B. Hashimoto, Yonatan Oren, and Percy Liang. 2017. Generating sentences by editing prototypes. CoRR, abs/1709.08878.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman. 2001. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Hossein Hosseini, Baicen Xiao, and Radha Poovendran. 2017. Deceiving Google's Cloud Video Intelligence API built for summarizing videos. Pages 1305–1309. IEEE Computer Society.

Mohit Iyyer, John Wieting, Kevin Gimpel, and Luke Zettlemoyer. 2018. Adversarial example generation with syntactically controlled paraphrase networks. CoRR, abs/1804.06059.

Robin Jia and Percy Liang. 2017. Adversarial examples for evaluating reading comprehension systems. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, pages 2011–2021. Association for Computational Linguistics.

Anjuli Kannan and Oriol Vinyals. 2017. Adversarial evaluation of dialogue models. CoRR, abs/1701.08198.

J. J. Katz. 1972. Semantic Theory. Studies in Language. Harper & Row.

Hector J. Levesque. 2014. On our best behaviour. Artificial Intelligence, 212:27–35.

Bill MacCartney and Christopher D. Manning. 2009. An extended model of natural logic. In Proceedings of the Eighth International Conference on Computational Semantics, Tilburg, Netherlands.

Pasquale Minervini, Thomas Demeester, Tim Rocktäschel, and Sebastian Riedel. 2017. Adversarial sets for regularising neural link predictors. In Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence, UAI 2017. AUAI Press.

Anh Mai Nguyen, Jason Yosinski, and Jeff Clune. 2015. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, pages 427–436. IEEE Computer Society.

Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, and Raquel Fernández. 2016. The LAMBADA dataset: Word prediction requiring a broad discourse context. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016. The Association for Computer Linguistics.

Ankur P. Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. 2016. A decomposable attention model for natural language inference. In (Su et al., 2016), pages 2249–2255.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. In (Su et al., 2016), pages 2383–2392.

Laura Rimell and Stephen Clark. 2009. Porting a lexicalized-grammar parser to the biomedical domain. Journal of Biomedical Informatics, 42(5):852–865.

Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomas Kocisky, and Phil Blunsom. 2016. Reasoning about entailment with neural attention. In International Conference on Learning Representations (ICLR).

Jian Su et al., editors. 2016. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016. The Association for Computational Linguistics.

Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2014. Intriguing properties of neural networks. In International Conference on Learning Representations.

Adina Williams, Nikita Nangia, and Samuel R. Bowman. 2017. A broad-coverage challenge corpus for sentence understanding through inference. CoRR, abs/1704.05426.
Supplementary Material
A.1 Accuracy on Adversarial Datasets
In the following, we report the accuracy of DAM on several adversarial datasets A^k_m, with k = 100 and m ∈ {DAM, ESIM, cBiLSTM}.

[Plots: accuracy of DAM on the A^k_DAM, A^k_ESIM, and A^k_cBiLSTM datasets, as a function of the regularisation parameter λ.]

In the following, we report the accuracy of ESIM on several adversarial datasets A^k_m.

[Plots: accuracy of ESIM on the A^k_DAM, A^k_ESIM, and A^k_cBiLSTM datasets, as a function of the regularisation parameter λ.]

In the following, we report the accuracy of cBiLSTM on several adversarial datasets A^k_m.

[Plots: accuracy of cBiLSTM on the A^k_DAM, A^k_ESIM, and A^k_cBiLSTM datasets, as a function of the regularisation parameter λ.]
A.2 Adversarial examples
In Table 7 we report inconsistent results produced by DAM on the SNLI training set, which violate rules from Table 1. In Table 8, we report inconsistent results yielded by DAM on examples generated using the procedure described in Section 4.3.

    1  s1: A young girl is holding a long thin yellow balloon.
       s2: There is a girl watching a balloon
    2  s1: A woman dressed in green is rollerskating outside at an event.
       s2: A woman dressed in green is not rollerskating
    3  s1: A young adult male, wearing black pants, a white shirt and a red belt, is practicing martial arts.
       s2: A guy playing a video game on his flat screen television.
    4  s1: Man sitting at a computer.
       s2: The man is not outside running.
    5  s1: Two young women wearing bikini tops and denim shorts walk along side an orange VW Beetle.
       s2: Two young women are not wearing coats and jeans
    6  s1: A woman in a hat sits reading and drinking a coffee.
       s2: martial arts demonstration

Table 7: Inconsistent results yielded by DAM on the SNLI training set. The notation s_i →_p s_j indicates that DAM classifies the relation between s_i and s_j as contradiction with probability p. The examples violate rules from Table 1.

    1  s1: Two adults, one female in white, with shades and one male, gray clothes, walking across a street, away from a eatery with a blurred image of a dark colored red shirted person in the foreground.
       s2: Two [adults ⇒ dogs] walk across a street.
    2  s1: A person on skis on a rail at night.
       s2: They are fantastic sleeping skiiers
    3  s1: The school is having a special event in order to show the american culture on how other cultures are dealt with in parties.
       s2: A school dog is hosting an event.
    4  s1: A woman is walking across the street eating a banana, while a man is following with his briefcase.
       s2: A person that is hungry holding.
    5  s1: A man and a woman cross the street in front of a pizza and gyro restaurant.
       s2: Near a couple of restaurants picture, two people walk across the street.
    6  s1: Woman in white in foreground and a man slightly behind walking with a sign for John's Pizza and Gyro in the background.
       s2: The man with the sign is caucasian near.
    7  s1: A boy is drinking out of a water fountain shaped like a woman.
       s2: A male is getting a drink of water.
       s3: A male man is getting a drink of water.
    8  s1: A middle-aged oriental woman in a green headscarf and blue shirt is flashing a giant smile.
       s2: A middle aged oriental woman in a green headscarf and blue shirt is flashing a giant smile
       s3: A middle aged oriental young woman in a green headscarf and blue shirt is flashing a giant smile
    9  s1: Bicyclists waiting at an intersection.
       s2: The bicycles are on a road.
       s3: The riding bicycles are on a road.

Table 8: Inconsistent results produced by DAM on adversarial examples generated using the discrete search procedure described in Section 4.3 – the pattern [segment one ⇒ segment two] denotes that the corruption process replaced "segment one" with "segment two". The examples consisting of sentence pairs violate pairwise rules from Table 1, while the examples consisting of three sentences violate the transitivity rule R5.
A.3 Background Knowledge Violations

In the following we report the number of violations (%) of the rules in Table 1 made by DAM, ESIM, and cBiLSTM on the SNLI test set.

[Plots: number of violations (%) of the rules in Table 1 made by DAM, ESIM, and cBiLSTM on the SNLI test set, as a function of the regularisation parameter λ.]

A.4 Optimisation algorithms
In Algorithm 2 we describe our algorithm for generating adversarial examples by perturbing sentences in a dataset, and using a language model for constraining the generation process. In Algorithm 3 we describe our adversarial training algorithm: it solves a minimax problem, where first a set of adversarial examples is generated by maximising the inconsistency loss J_I. Then, the model is trained by jointly minimising the data loss J_D and the inconsistency loss on the generated adversarial examples.
Algorithm 2: Generation of Adversarial Sentences via Stochastic Perturbation Re-ranking

    Require: Perplexity threshold τ ∈ ℝ+
    1: // Sample seed sentences from the dataset
    2: S ← sample(D)
    3: // Generate a set of candidates, excluding the ones with a perplexity higher than τ
    4: P ← { S̃ ∈ perturb(S) | log p_L(S̃) ≤ τ }
    5: // Return the perturbations that maximise the inconsistency loss J_I
    6: return argmax_{S̃ ∈ P} J_I(S̃)

Algorithm 3: Solving the optimisation problem in Eq. (6) via Mini-Batch Gradient Descent

    Require: Dataset D, weight λ ∈ ℝ+
    Require: No. of epochs τ ∈ ℕ+
    Require: No. of adv. substitution sets n_a ∈ ℕ+
     1: // Initialise the model parameters Θ̂
     2: Θ̂ ← initialise()
     3: for i ∈ {1, ..., τ} do
     4:   for D_j ∈ batches(D) do
     5:     // Generate the adv. substitution sets S_i
     6:     {S_1, ..., S_{n_a}} ← generate(D_j)
     7:     // Compute the gradient of Eq. (6)
     8:     L ← J_D(D_j, Θ̂) + λ Σ_{k=1}^{n_a} J_I(S_k; Θ̂)
     9:     g ← ∇_Θ L
    10:     // Update the model parameters
    11:     Θ̂ ← Θ̂ − η g
    12:   end for
    13: end for
    14: return Θ̂