Learning Credible Deep Neural Networks with Rationale Regularization
Mengnan Du, Ninghao Liu, Fan Yang, Xia Hu
Department of Computer Science and Engineering, Texas A&M University
{dumengnan, nhliu43, nacoyang, xiahu}@tamu.edu

Abstract—Recent explainability-related studies have shown that state-of-the-art DNNs do not always adopt correct evidence to make decisions. This not only hampers their generalization but also makes them less likely to be trusted by end-users. In pursuit of developing more credible DNNs, in this paper we propose CREX, which encourages DNN models to focus more on evidence that actually matters for the task at hand, and to avoid overfitting to data-dependent bias and artifacts. Specifically, CREX regularizes the training process of DNNs with rationales, i.e., subsets of features highlighted by domain experts as justifications for predictions, to enforce that DNNs generate local explanations which conform with expert rationales. Even when rationales are not available, CREX can still be useful by requiring the generated explanations to be sparse. Experimental results on two text classification datasets demonstrate the increased credibility of DNNs trained with CREX. Comprehensive analysis further shows that while CREX does not always improve prediction accuracy on the held-out test set, it significantly increases DNN accuracy on new and previously unseen data beyond the test set, highlighting the advantage of the increased credibility.
Index Terms—Deep neural network; Explainability; Credibility; Expert rationales
I. INTRODUCTION
There has been an increasing interest recently in developing explainable deep neural networks (DNNs) [1]–[4]. To this end, a DNN model should be able to provide intuitive explanations for its predictions. Explainability can shed light on the decision making process of DNNs and thus increase their acceptance by end-users. However, explainability alone is insufficient for DNNs to be credible [5], unless the provided explanations conform with well-established domain knowledge. That is to say, correct evidence should be adopted by the networks to make predictions. This credibility issue has been observed in various DNN systems. For instance, in question answering (QA) tasks, DNNs rely more on function words than on the task-specific verbs, nouns and adjectives when making decisions [6], [7]. Similarly, in image classification, CNNs may make decisions solely according to the background within images, rather than attending to evidence relevant to the objects of interest [8].

In this work, we define credible DNNs as models that can provide explanations for their predictions while, at the same time, the explanations are consistent with well-established domain knowledge. Since correct evidence is employed in the decision making process, it is easier for credible DNNs to build trust among practitioners and end-users. In addition, credible DNNs can have better generalization capability than untrustworthy ones: since credible DNNs have truly grasped useful knowledge instead of memorizing unreliable dataset-specific biases and artifacts, they can maintain high prediction accuracy on unseen data instances beyond the training dataset.

It is possible to enhance the credibility and generalization of DNNs from two perspectives: datasets and model training. The former category tackles the problem by constructing datasets of larger quantity and higher quality. Any training data may contain some biases, either intrinsic noise or additional signals inadvertently introduced by human annotators [9]. DNNs not only rely on these biases to make decisions, but can also amplify them [10], which partly leads to the low credibility and low generalization problem. Some work has developed debiased datasets, either by filtering out biased data or by constructing new datasets in an adversarial manner [11]. Nevertheless, this scheme cannot fully eliminate bias, which can still affect model performance. The second category aims at regulating the training of DNNs using domain knowledge established by humans. This is motivated by the observation that purely data-driven learning can lead to counter-intuitive results [12]. It is thus desirable to combine DNNs with the domain knowledge that humans use to understand the world, which has proven beneficial in many learning problems [12]–[14]. We therefore follow the second strategy, using domain knowledge to enhance the credibility of DNNs.

Nevertheless, regulating the training of DNNs with domain knowledge to promote model credibility is still a technically challenging problem. First, one difficulty lies in how to accurately obtain and effectively utilize a DNN's attention towards input features. Although DNN local explanations can identify the contribution of each input feature towards a specific model prediction [8], it is still challenging to incorporate explanation into the end-to-end back-propagation procedure so as to influence model parameter updates.
The second challenge is how to use domain knowledge to regularize the model's attention and force the model to focus on correct evidence. Previous work has demonstrated that domain knowledge is beneficial for promoting the prediction accuracy of DNNs. For instance, structured knowledge in the form of logical rules can be transferred to the weights of DNNs through an iterative distillation process [12]. However, it is still unclear how to utilize knowledge to guide the attention of a DNN.

Fig. 1: Two examples of expert rationales (words marked with purple color in the original figure), for a movie review and a product review respectively.
Task: movie review. Label: negative. "The movie is so badly put together that even the most casual viewer may notice the miserable pacing and stray plot threads."
Task: beer appearance. Label: positive. "A beautiful beer, coal black with a thin brown head. Extremely powerful flavors, but everything is muted by the intense alcohol."
To overcome the above challenges, we propose to explore whether a specific kind of domain knowledge, called a rationale, would be useful for enhancing DNN credibility. A rationale is a subset of features highlighted by annotators and regarded as more important for predicting an instance [15], [16], with illustrative examples shown in Fig. 1. The rationales are used to direct the model's attention, enabling it to tease apart useful evidence from noise and pushing it to pay more attention to relevant features. Rationales have been applied to the training process of SVMs [15], [17] to enhance predictive performance. Another benefit of rationales is that they require little effort to obtain [18], so they can be widely applied in different applications.

In this work, we propose CREX (CRedible EXplanation), an approach that regularizes DNNs to rely on correct evidence to make decisions, in order to promote their credibility and generalization capability. The intuition behind CREX is to use external knowledge to regulate the DNN training process. For those training instances coupled with expert rationales, we require the DNN model to generate local explanations that conform with the rationales. Even when expert rationales are not available, CREX can still promote model performance by requiring the generated explanations to be sparse. Through experiments on text classification tasks, we demonstrate that our trained DNNs generally rely on correct evidence to make predictions. Besides, our trained DNNs generalize much better on new and previously unseen inputs beyond the test set. The major contributions of this paper are summarized as follows:
• We propose a method to regularize the training of DNNs, called CREX, which aims to enable trained DNNs to focus on correct evidence when making decisions.
• CREX is widely applicable to different variants of DNNs. We demonstrate its applicability on three standard architectures, including CNN, LSTM and the self-attention model.
• Experimental results on two text classification datasets validate that our trained DNNs generate explanations aligning well with expert rationales and show good generalization on data beyond the test set.

II. RELATED WORK
In this section, we briefly review several research areas closely relevant to our work.
A. DNN Interpretability
DNNs are often regarded as black boxes and criticized for their lack of interpretability. To this end, there is a wide range of work aiming to derive explanations and shed insight into the decision making process of DNNs [2], [19]. This work can be grouped into two main categories, global and local explanation, depending on whether the goal is to understand how a DNN works globally or how it makes a specific prediction [1]. Most current work focuses on augmenting DNNs with interpretability [8], [20], [21], while employing explanation to enhance the performance of DNN models has seldom been explored. In this work, we take advantage of DNN local explanations to promote the generalization performance of DNN classifiers.
B. Model Credibility and Generalization
Despite the high performance of DNN models on test sets, recent work shows that these models heavily rely on dataset bias instead of true evidence to make decisions [22]. For instance, a DNN local explanation approach analyzed three question answering models and showed that they often ignore important parts of the questions, e.g., verbs in questions carry little influence on the DNN decisions, while irrelevant words drive the decisions [6]. Similarly, for a binary husky vs. wolf classification task, a CNN simply made decisions according to whether there was snow in the image, rather than attending to evidence relevant to the animals [8]. This makes DNN models unreliable and hampers their generalization. In addition, it makes these models fragile and easily broken by adversarial samples.
C. Unwanted Dataset Bias
Datasets may contain a lot of unwanted bias and artifacts, either explicit, e.g., gender and ethnic biases, or implicit. DNNs not only rely on these biases to make decisions, but can also amplify them [10], which partly leads to the low credibility and low generalization of DNNs on unseen data. To alleviate the influence of unwanted dataset bias on model performance, one line of work regulates the training of models [23], [24], while another constructs more challenging datasets by eliminating biases and annotation artifacts [11], [25].
D. Combining Human Knowledge with DNNs
Some work enhances DNN models with human-like common sense to make them more credible and robust. For instance, the attention of an RNN has been regularized with human attention values derived from eye-tracking corpora [26]. Structured knowledge such as logical rules can be transferred to the weights of DNNs through an iterative distillation process [12]. Besides, rationales have been used to augment the training of CNN models [14], linear classification models [5], and SVMs [17]. These works indicate that human knowledge has indeed promoted the credibility of models to some extent. The work most similar to ours uses human rationales to improve neural predictions [27]; however, it is exclusively designed to regularize an intrinsically interpretable model, i.e., an attention model. In contrast, our method is widely applicable to different network architectures, including both interpretable models and black-box models such as CNNs and LSTMs.

III. PROBLEM STATEMENT
In this section, we first introduce the basic notation used in this paper. Then we present the problem of learning credible deep neural network models.
Notations: Consider a typical multi-class text classification task. We are given a training dataset consisting of N instances: D = {(x_1, y_1), ..., (x_N, y_N)}. Each input text x_n is composed of a sequence of T words: x_n = {x_n^{(1)}, ..., x_n^{(T)}}, where x_n^{(t)} ∈ R^d denotes the embedding representation of the t-th word. Each label y_n ∈ {1, 2, ..., C} belongs to one of the C output classes. Part of the training data, with a number of N_r instances, contains not only the input-label pair (x_n, y_n) but also a rationale r_n from a domain expert, with two illustrative examples shown in Fig. 1. Each entry of the expert rationale satisfies r_n^{(t)} ∈ {0, 1}, where 1 indicates that word x_n^{(t)} is actually responsible for the prediction, and vice versa.

Learning Credible DNNs: The goal is to learn a DNN-based classification model which maps a text input x_n to the probability output f(x_n). We expect a trained DNN to rely on correct evidence to make decisions and to pay more attention to words within the rationales. That is, for a trained DNN, the generated local explanation for each test instance should align well with the expert rationale.
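For concreteness, the following minimal sketch shows how a training instance with its rationale can be represented in PyTorch (the library used by the implementation in Sec. V); the tensor shapes follow the notation above, while the specific dimensions and values are illustrative only.

```python
import torch

T, d, C = 50, 300, 2   # sequence length, embedding size, number of classes (illustrative)

# One training instance: a sequence of T word embeddings x_n,
# a class label y_n, and a binary expert rationale r_n over the T words.
x_n = torch.randn(T, d)    # x_n^(t) in R^d for t = 1, ..., T
y_n = torch.tensor(1)      # y_n, 0-indexed in code
r_n = torch.zeros(T)       # r_n^(t) in {0, 1}
r_n[3:8] = 1.0             # e.g., words 4-8 are marked as the rationale
```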
IV. PROPOSED CREX FRAMEWORK
In this section, we introduce the CREX framework, which regularizes the local explanation when training a DNN for the task of interest, so as to promote its credibility and generalization. Besides feeding labels as supervised signals, we also enforce that the explanations of the DNN predictions conform with expert rationales, and encourage the explanations to be sparse if rationales are absent. In this way, the trained network makes predictions based on the correct evidence that we expect it to focus on.

Fig. 2: Schematic of CREX. Black solid lines denote the forward pass. Dashed lines with arrows on both ends denote losses. Dashed lines with arrows on one side denote the flow of gradients. The three vectors from left to right are the input, explanation and rationale, respectively. CREX is DNN architecture agnostic, end-to-end trainable, and simple to implement.
A. Augmenting Local Explanation
The general idea of DNN local explanation is to attribute the prediction of a DNN to its input, producing a heatmap indicating the contribution of each input feature to the prediction. There are several key desiderata for the local explanation method augmented in this work:
• Faithful: The provided explanations should be of high fidelity with respect to the predictions of the original model.
• Differentiable: We expect the explanation method to be end-to-end differentiable, amenable to training with back-propagation and updating DNN parameters.
• Model-agnostic: It is desirable that the explanation method be agnostic to network architectures, and thus generally applicable to different networks, e.g., CNNs and LSTMs.

The explanation of prediction f(x_n) for input x_n is a matrix s_n ∈ R^{T×C}, where s_n^{(t,c)} denotes the contribution of word x_n^{(t)} towards the prediction f_c(x_n) for output class c. We utilize an omission-based method [28] to measure the contribution of x_n^{(t)}:

s_n^{(t,c)} = f_c(x_n) - f_c(x_n^{(\setminus t)}),   (1)

which quantifies the deviation of the prediction between the original input x_n and the partial input x_n^{(\setminus t)} = x_n^{(1:t-1)} \oplus x_n^{(t+1:T)} with x_n^{(t)} omitted. The motivation is that more important features, once changed, cause more significant variation in the prediction score. It is worth noting that the omission operation may lead to invalid inputs, which could trigger the adversarial side of DNNs. To reflect model behavior under normal conditions, phrase omission is conducted instead of individual word omission. Formally, we compute the contribution of x_n^{(t)} by averaging the prediction changes of deleting the different length-m phrases that contain x_n^{(t)}:

s_n^{(t,c)} = \frac{1}{m} \sum_{j=1}^{m} [ f_c(x_n) - f_c(x_n^{(1:t-1-m+j)} \oplus x_n^{(t+j:T)}) ].   (2)

For long text classification, such as documents, we segment each original text into sentences and sequentially perform omission for each sentence. In this scenario, sentence-level contribution scores are obtained as the explanation, rather than word-level scores. Both phrase omission and sentence omission increase the faithfulness of explanations compared with directly removing individual words [29].

B. Aligning Explanations with Rationales
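The phrase-omission score of Eq. (2) can be sketched as follows. This is a minimal PyTorch implementation in which omission is approximated by zeroing the embeddings of the deleted phrase (the exact omission mechanism is an implementation choice not fixed above), and the model is assumed to map a batch of embedded inputs to probability outputs.

```python
import torch

def explanation_scores(model, x, m=3):
    # Sketch of Eq. (2): the contribution s[t, c] of word t to class c is the
    # average prediction change over deleting the m length-m phrases containing
    # word t (phrases are clamped at the text boundaries).
    # x: [T, d] embedded input; model: [1, T, d] -> [1, C] probabilities.
    T = x.size(0)
    full = model(x.unsqueeze(0)).squeeze(0)            # f(x_n): [C]
    s = torch.zeros(T, full.size(0))
    for t in range(T):
        for j in range(m):
            lo = max(0, t - m + 1 + j)                 # phrase start
            hi = min(T, t + 1 + j)                     # phrase end (exclusive)
            x_part = x.clone()
            x_part[lo:hi] = 0.0                        # omit the phrase
            part = model(x_part.unsqueeze(0)).squeeze(0)
            s[t] += (full - part) / m                  # average over the m phrases
    return s                                           # [T, C]
```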
The key idea of CREX is that DNNs should rely on reasonable evidence, rather than bias or artifacts, to make decisions. We encourage the explanation to align well with expert rationales when they are available, by considering two complementary conditions. First, for the original input, we encourage the generated explanation to be confident and to focus on the relevant features indicated by the rationale. Second, for the negative input, where the important features are suppressed, the explanation should be uncertain and have relatively uniform contributions across classes.
1) Confident Explanation:
We first feed the original input x_n to the DNN and get the model output f(x_n) and explanation s_n. The rationale r_n points out which subset of features is important; the rest are irrelevant. Intuitively, we achieve credibility by encouraging dense contribution scores on the known important features and sparse contribution scores on the remaining irrelevant features. We define a confident explanation loss (g_conf) which encourages the explanation to concentrate on the rationale:

g_{conf}(x_n) = \frac{1}{C} \sum_{c=1}^{C} \| (1 - r_n) \odot s_n^{(:,c)} \|_2.   (3)

The loss shrinks the contribution scores of irrelevant features, in order to discourage the model from capturing training-data-specific biases. An implicit effect of this loss is to encourage f to give dense explanation scores to the relevant features, thus making f pay more attention to them. As a result, the final explanation scores tend to align well with the rationales. In addition, we observe that summing over all categories {1, ..., C} yields better results than only using the label y_n when imposing the confident explanation regularization on instance x_n.
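A minimal sketch of the confident explanation loss g_conf is given below; s and r follow the shapes defined in Sec. III, and the choice of the l2 norm is our assumption, since the norm order in Eq. (3) is not explicit.

```python
import torch

def g_conf(s, r):
    # Sketch of Eq. (3): penalize contribution mass outside the rationale,
    # averaged over all C classes. The l2 norm order is an assumption.
    # s: [T, C] explanation scores; r: [T] binary rationale.
    masked = (1.0 - r).unsqueeze(1) * s       # (1 - r_n) ⊙ s_n^(:,c), all c at once
    return masked.norm(p=2, dim=0).mean()     # average over the C classes
```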
2) Uncertain Explanation:
When the subset of important features, as indicated in r_n, is deleted from the original input x_n, we expect the DNN model to become uncertain about which category to output. These inputs, named negative inputs, are generated as the Hadamard product between the original input x_n and the reversed rationale vector (1 - r_n):

x'_n = x_n \odot (1 - r_n).   (4)

For instance, the negative input corresponding to the first input in Fig. 1 is "The movie is that even the most casual viewer may notice the". The intuition is that after feeding the negative input x'_n to the DNN, its probability output for the ground-truth label y_n should be much smaller than for the original input x_n, since x'_n lacks the evidence supporting the prediction. At the same time, the contributions of the different words/sentences should be distributed uniformly; the implicit effect is to encourage the DNN to give lower explanation scores to the features not belonging to the rationale. We first calculate the absolute value of the explanation for x'_n as \hat{s}_n^{(:,y_n)} = | s_n^{(:,y_n)} |, and then normalize it:

e_n^{(t,y_n)} = \hat{s}_n^{(t,y_n)} / \sum_{k=1}^{T} \hat{s}_n^{(k,y_n)}.   (5)

The resulting e_n^{(:,y_n)} can be seen as the soft attention assigned by the DNN to x'_n. We then define an uncertain explanation loss (g_unc):

g_{unc}(x'_n) = - | f_{y_n}(x_n) - f_{y_n}(x'_n) | - \alpha \cos(e_n^{(:,y_n)}, q),   (6)

where q is the discrete uniform distribution U(1, T), and α balances the probability-output term and the explanation-distribution term. The cosine similarity encourages the explanation scores to be distributed uniformly.

We linearly combine the two losses and average over all training instances with rationales to obtain the explanation rationale loss:

L_{rationale} = \frac{1}{N_r} \sum_{n=1}^{N_r} [ g_{conf}(x_n) + \beta g_{unc}(x'_n) ].   (7)

Parameter β balances the confident explanation and the uncertain explanation. By encouraging confident explanations that conform with rationales on the original input x_n, and suppressing the probability output as well as the explanation values on the negative input x'_n, L_rationale regulates a DNN to learn useful input representations from the features belonging to rationales and to ignore information in the irrelevant feature subset.
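A minimal sketch of the negative input of Eq. (4) and the uncertain explanation loss of Eqs. (5)-(6), as used inside Eq. (7), is shown below; it reuses the explanation_scores sketch above, and the α value is illustrative.

```python
import torch
import torch.nn.functional as F

def g_unc(model, x, r, y, alpha=0.1):
    # Sketch of Eqs. (4)-(6). x: [T, d] embeddings; r: [T] binary rationale;
    # y: ground-truth class index; alpha is an illustrative value.
    x_neg = x * (1.0 - r).unsqueeze(1)               # Eq. (4): suppress rationale words
    e = explanation_scores(model, x_neg)[:, y].abs()
    e = e / e.sum().clamp_min(1e-8)                  # Eq. (5): soft attention over words
    q = torch.full_like(e, 1.0 / e.numel())          # discrete uniform distribution U(1, T)
    f_full = model(x.unsqueeze(0)).squeeze(0)[y]
    f_neg = model(x_neg.unsqueeze(0)).squeeze(0)[y]
    # Eq. (6): push f_y(x') away from f_y(x), and push e toward uniform.
    return -(f_full - f_neg).abs() - alpha * F.cosine_similarity(e, q, dim=0)
```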
Algorithm 1: Learning credible DNNs.
Input: Training data D = {(x_n, y_n)}_{n=1}^{N}, validation data D_v = {(x_n, y_n)}_{n=1}^{N_v}, and rationales {r_n}_{n=1}^{N_r}. Set hyperparameters α, β, λ_1, λ_2, learning rate η, iteration number max_iter = 10, and epoch index t = 0; initialize DNN parameters W.
while t ≤ max_iter do
    compute L_supv over all N instances (Eq. (9));
    compute L_rationale = \frac{1}{N_r} \sum_{n=1}^{N_r} [ g_{conf}(x_n) + \beta g_{unc}(x'_n) ] (Eq. (7));
    compute L_sparse over the N - N_r instances without rationales (Eq. (8));
    L(θ, x, y, r) = L_supv + λ_1 L_rationale + λ_2 L_sparse (Eq. (10));
    W_{t+1} = Adam(L(θ, x, y, r), η);
    get DNN accuracy on validation set D_v;
    t = t + 1;
end while
Output: DNN f with the best accuracy on the validation set.
C. Self-Guidance When Rationales Are Not Available
In the last section, given expert rationales, we render the local explanation of each instance to conform with its rationale. However, expert rationales may not always be available; in practice, experts may annotate only a small fraction of the training data, either when annotating a new corpus or when adding rationales post hoc to an existing corpus. To guide the DNN model to focus on correct evidence in this scenario, we enforce the generated local explanation vector to be sparse for training instances without rationales. Simpler explanations are more credible; otherwise, dense dependencies could make it hard to disentangle the patterns in the input that actually trigger a prediction [30]–[32]. To achieve this, we propose the sparse explanation loss for those instances without rationales:

L_{sparse} = \frac{1}{(N - N_r) \cdot C} \sum_{n=N_r+1}^{N} \sum_{c=1}^{C} \| s_n^{(:,c)} \|_1,   (8)

where the l1 norm helps produce sparse contribution vectors. Note that this summation is performed over the (N - N_r) instances which have no rationales.

D. CREX Training

Besides regularizing the local explanations of DNN predictions, we also expect the DNN model to learn from the ground-truth labels, via the supervised cross-entropy loss:

L_{supv} = \frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} -\mathbb{1}(y_n = c) \cdot \log(f_c(x_n)).   (9)

Our final model is learned by balancing the supervised approximation to the labels and the conformance to expert rationales. We propose the training objective of jointly minimizing the losses below:

L(\theta, x, y, r) = L_{supv} + \lambda_1 L_{rationale} + \lambda_2 L_{sparse}.   (10)

Parameters λ_1 and λ_2 balance the supervised loss, rationale loss and sparse loss. For the N_r inputs coupled with expert rationales we impose the rationale loss, while for the remaining N - N_r inputs we regularize with the sparse loss. The overall idea of CREX is illustrated in Fig. 2, and the learning algorithm is presented in Algorithm 1. Our framework is designed to train DNN models which make highly accurate predictions (the first term) and make decisions by relying on the correct evidence (the last two terms). In addition, CREX training can be treated as a knowledge distillation process that transfers expert knowledge from rationales into DNN parameters in order to yield more credible models. CREX is also general, and can be added to any DNN model, e.g., CNNs and LSTMs, to enhance the model's credibility.
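Putting the pieces together, one evaluation of the objective in Eq. (10) can be sketched as follows, following Algorithm 1 and reusing the explanation_scores, g_conf and g_unc sketches above. The batch-of-triples format is our convention, and the hyperparameter values are illustrative (λ_1 = 5e-2 is the reported optimum for the CNN on MR).

```python
import torch

def crex_loss(model, batch, lambda1=5e-2, lambda2=1e-5, beta=1.0):
    # Sketch of Eq. (10) following Algorithm 1. `batch` is a list of
    # (x, y, r) triples, with r = None for instances without rationales.
    l_supv, l_rat, l_sparse = 0.0, 0.0, 0.0
    n_rat = sum(1 for _, _, r in batch if r is not None)
    for x, y, r in batch:
        probs = model(x.unsqueeze(0)).squeeze(0)
        l_supv = l_supv - torch.log(probs[y])                    # Eq. (9)
        s = explanation_scores(model, x)
        if r is not None:                                        # Eq. (7)
            l_rat = l_rat + g_conf(s, r) + beta * g_unc(model, x, r, y)
        else:                                                    # Eq. (8)
            l_sparse = l_sparse + s.abs().sum() / s.size(1)      # sum_c ||s^(:,c)||_1 / C
    l_supv = l_supv / len(batch)
    l_rat = l_rat / max(n_rat, 1)
    l_sparse = l_sparse / max(len(batch) - n_rat, 1)
    return l_supv + lambda1 * l_rat + lambda2 * l_sparse         # Eq. (10)

# Usage: loss = crex_loss(model, batch); loss.backward(); optimizer.step()
```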
V. EXPERIMENTS

In this section, we evaluate the proposed CREX framework on several real-world datasets and present experimental results in order to answer the following four research questions.
• RQ1 - Does CREX enhance the credibility of DNNs by regularizing the local explanation using expert rationales in the training process?
• RQ2 - Does CREX promote the generalization of DNNs when processing unseen instances, especially data beyond the test set?
• RQ3 - How do CREX components and hyperparameters affect DNN performance?
• RQ4 - How do the quantity and quality of expert rationales influence the performance of DNNs trained by CREX?
A. Experimental Setup
In this section, we introduce the overall setup of the experiments, including: 1) DNN architectures, 2) datasets and rationales, 3) baseline methods, and 4) implementation details.
1) DNN Architectures:
We consider three representative DNN architectures for text classification: CNN [33], LSTM [34], and the self-attention model [35].
CNN: This is a 2-D convolutional network. The convolution operation is performed on the embedding input {x_n^{(1)}, ..., x_n^{(T)}} using three kernel sizes: [2, 3, 4]. We use ReLU activation after the convolution operation and then apply max pooling for every channel. Finally, the resulting tensors are concatenated as the final input representation.

TABLE I: Dataset statistics of the MR and PR datasets, including the sizes of the training, development and test sets, as well as the average text length.

Dataset              Train   Dev   Test   Text length
Movie Review (MR)    1,500   100   200    794
Product Review (PR)  4,000   473   1,700  113
LSTM: After feeding the input x_n = {x_n^{(1)}, ..., x_n^{(T)}} to the LSTM model, T hidden state vectors {h_n^{(1)}, ..., h_n^{(T)}} are obtained. The dimension of each hidden state vector is 150. Max pooling is performed over all T hidden vectors to obtain the final input representation.

Self-attention: A bidirectional LSTM is first utilized to learn input representations with a hidden size of 300. The self-attention mechanism is then applied on top of the LSTM representations to produce a matrix embedding of the input sentence. This matrix contains 10 embeddings, where every embedding encodes the input sentence while attending to a specific part of it. These embeddings are concatenated as the final input representation.

For all three networks, after transforming variable-length sentences into fixed-size representations, fully connected layers are added to obtain logits [36] for the multiple output classes. Finally, a softmax layer converts the logits to probability outputs.
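As a concrete reference, the CNN variant described above can be sketched in PyTorch as follows. The number of channels per kernel size (100) is our assumption, and the convolution is written as a 1-D convolution over the word sequence, which is equivalent to the 2-D formulation with kernel width equal to the embedding dimension. Note that the model returns probability outputs, matching the assumption made in the earlier explanation sketches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    # Minimal sketch of the CNN described above: kernel sizes [2, 3, 4],
    # ReLU, max pooling per channel, concatenation, then a fully connected
    # layer and a softmax over the output classes.
    def __init__(self, d=300, n_classes=2, n_channels=100):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(d, n_channels, k) for k in (2, 3, 4)])
        self.fc = nn.Linear(3 * n_channels, n_classes)

    def forward(self, x):                    # x: [B, T, d] word embeddings
        h = x.transpose(1, 2)                # -> [B, d, T] for Conv1d
        pooled = [F.relu(conv(h)).max(dim=2).values for conv in self.convs]
        logits = self.fc(torch.cat(pooled, dim=1))
        return F.softmax(logits, dim=1)      # probability outputs
```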
2) Datasets and Rationales:
We consider two benchmark text classification datasets. Both datasets are randomly split into training, development and test sets; the statistics are reported in Tab. I.
Movie Review Dataset (MR): This is a binary sentiment classification dataset with movie reviews from IMDB [37]. Originally, the dataset was obtained by crawling movie reviews from the Internet Movie Database, consisting of 1,000 positive and 1,000 negative reviews [37]. Zaidan et al. [15] supplemented this dataset with rationales for 1,800 documents (available at ∼ozaidan/rationales/). The rationales are sub-sentential snippets with higher relevance for the prediction task, with an illustrative example shown in Fig. 1. The average rationale length per input text is 125 words, while the average text length is 794 words; compared with the whole text, the rationales are sparse. For the rationale collection process, the agreement among different annotators, and the time cost of rationale annotation, we refer interested readers to the work by Zaidan et al. [15].

Product Review Dataset (PR): This is a multi-aspect beer review dataset [38] with data derived from BeerAdvocate. The dataset contains reviews for three aspects of beer: appearance, aroma and palate, of which we only consider appearance. Originally the reviews carry ratings in the range [0, 1]. Similar to [27], we cast this as a binary classification task by thresholding the ratings into negative and positive classes.
3) Baseline Methods:
We evaluate the effectiveness of CREX by comparing it with three baseline approaches.
• Vanilla DNN: This is the most typical way to train a DNN for text classification. Models are trained with only the standard cross-entropy loss, optimizing parameters to minimize Eq. (9).
• Data Augmentation: Back translation is an effective data augmentation method for boosting model performance, e.g., in machine translation [39], [40]. The original text is first translated to an intermediate language (we use German) and then translated back to English via the Google Translate API (https://pypi.org/project/googletrans); see the sketch after this list. The motivation is to use synonym replacement and sentence paraphrasing to avoid overfitting to functional words.
• Rationale Augmentation: Expert rationales are extracted from the original text as additional training instances. These data are combined with the original training data, resulting in a final training set of double the original size. The intuition is to explicitly push DNNs to focus on rationales when making decisions.
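A minimal sketch of the back-translation procedure used by the Data Augmentation baseline is shown below; it assumes the unofficial googletrans client referenced above, whose interface may differ across versions.

```python
from googletrans import Translator  # unofficial client; API may change across versions

def back_translate(text, pivot="de"):
    # English -> German -> English, as described for the Data Augmentation baseline.
    translator = Translator()
    intermediate = translator.translate(text, src="en", dest=pivot).text
    return translator.translate(intermediate, src=pivot, dest="en").text

# e.g., back_translate("The movie is so badly put together.")
```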
4) Implementation Details:
We use the pre-trained 300-dimensional word2vec word embeddings [41] (https://code.google.com/archive/p/word2vec/) to initialize the embedding layer for all three architectures. For words that do not exist in word2vec, the embedding vectors are randomly initialized. We tune the learning rate over the range {1e-4, 1e-3, 1e-2, 1e-1} and use the Adam optimizer [42] to optimize the models. For each model, all hyperparameters are tuned on the development set, according to accuracy and credibility performance. The optimal values of α and λ_1 for the different models are listed in Tab. II, while β and λ_2 are fixed at 1 and 1e-5 respectively for all models. To avoid overfitting, we apply dropout to the fully connected layers of all DNN models [43]. We implement all DNN models using the PyTorch library. Each model is trained for ten epochs and the one with the best performance on the development set is selected as the final model. In our experiments, all DNN models converge within 10 epochs, and increasing this number may lead to overfitting. Besides, since all models use random initialization, performance varies across runs; we therefore report average values over three runs for all DNNs in the following experiments.

TABLE III: Accuracy comparisons (in percent) of CREX and baseline methods for three DNN architectures on the MR and PR test sets.

                         MR                      PR
Models             CNN   LSTM  Atten       CNN   LSTM  Atten
Vanilla DNN        93.7  93.2  -           -     -     -
Data Augment       -     -     -           -     -     -
Rationale Augment  -     -     -           -     -     -
CREX               -     -     -           -     -     -
B. Credibility and Accuracy on Test Set
In this section, we evaluate the performance of all trained DNNs on the test set. Two metrics are employed for evaluation: credibility and prediction accuracy. Credibility here is defined as the extent of agreement between the generated DNN local explanations and the expert rationales.
1) Quantitative Evaluation of Credibility:
To measure credibility, we calculate the matching degree between the local explanation of a DNN prediction and the rationale. Specifically, we use the symmetric KL divergence between the normalized absolute value of the explanation s_n and the normalized rationale r_n:

symKL(s'_n, r'_n) = \frac{1}{2} [ KL(s'_n \| r'_n) + KL(r'_n \| s'_n) ],   (11)

where lower divergence means higher credibility [5]. We compare the credibility scores of CREX with the three baseline methods on three DNN architectures over the MR and PR datasets.

TABLE II: Credibility scores (symmetric KL divergence, lower is better) of CREX and baseline methods for three DNN architectures on the MR and PR test sets, together with the optimal values of λ_1 and α for each model.

                         MR                      PR
Models             CNN   LSTM  Atten       CNN   LSTM  Atten
Vanilla DNN        2.86  2.67  2.40        3.96  3.77  3.73
Data Augment       2.75  3.20  2.29        3.85  3.70  4.16
Rationale Augment  2.52  2.45  2.25        3.65  3.61  3.59
CREX               2.24  -     -           -     -     -
Parameter λ_1      5e-2  -     -           -     -     -
Parameter α        -     -     -           -     -     -

The credibility results are presented in Tab. II. Compared with Vanilla, the relative improvement of CREX is encouraging, with KL divergence drops ranging from 0.29 to 0.62 for DNNs on MR, and from 0.23 to 0.58 on PR. This ascertains the effectiveness of CREX in boosting the credibility of DNNs by pushing them to employ correct evidence to make decisions. The increased credibility of Rationale Augmentation compared with Vanilla DNN also validates the value of expert knowledge, which succeeds in pushing models to focus more on the evidence in the rationales. In contrast, Data Augmentation via back translation does not always enhance model credibility.
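The credibility metric of Eq. (11) can be sketched as follows; the small smoothing constant is our addition to keep the divergence finite when the rationale vector contains zeros.

```python
import torch

def sym_kl(s, r, eps=1e-8):
    # Sketch of Eq. (11): symmetric KL divergence between the normalized
    # absolute explanation s (length T) and the normalized rationale r (length T).
    p = s.abs() + eps
    p = p / p.sum()
    q = r.float() + eps
    q = q / q.sum()
    kl_pq = (p * (p / q).log()).sum()
    kl_qp = (q * (q / p).log()).sum()
    return 0.5 * (kl_pq + kl_qp)   # lower means higher credibility
```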
2) Quantitative Evaluation of Accuracy:
DNNs trained via CREX have predictive accuracy comparable with the three baselines on the MR and PR test sets, as shown in Tab. III. Besides, the results of the three comparison methods, including Vanilla training, Rationale Augmentation and CREX, are not substantially different. This means that the increased credibility does not sacrifice model performance on the test set.

Fig. 3: Sentence-level explanation heatmap comparison between CREX and Vanilla DNN. Ground truth is annotated with underline. (a) Beer appearance review, positive label: "Nice looking lacing and head. Sweet initial taste with a smokey aftertaste." (b) Movie review, positive sentiment label: "Ladies and gentleman, 1997's Independence Day is here! Its title: Starship Troopers. And surprisingly, it is more entertaining than ID4." Here ID4 denotes the movie Independence Day.
3) Qualitative Evaluation of Credibility:
We provide case studies to qualitatively show the effect of the increased credibility, as shown in Fig. 3. We show the sentence-level explanation scores, where deeper color means a higher contribution to the prediction. In both cases, the predictions are made by self-attention models trained via the Vanilla method and the CREX method respectively.

For the product review (PR) case shown in Fig. 3 (a), both DNNs predict positive for this testing instance, with 99.9% and 99.7% confidence respectively. We can observe that the Vanilla DNN pays nearly equal attention to the second sentence as to the first one, even though the second sentence talks about the beer palate ("sweet", "taste", "aftertaste") and has nothing to do with beer appearance. This indicates that the DNN classifier may have overfitted to bias in the training set. In contrast, CREX pushes the DNN to rely on correct evidence relevant to beer appearance, i.e., "good looking", to make decisions. This explanation is consistent with human cognition, and thus CREX is more likely to earn trust from end-users.

Similarly, for the movie review case in Fig. 3 (b), although both self-attention models give correct predictions, they use distinct evidence to make decisions. The Vanilla DNN pays nearly equal attention to the first and third sentences, while only the third sentence contains generalizable features. One possible explanation is that the DNN has memorized movie-unique terms to make decisions, which would perform poorly on movie reviews beyond the training and test data. In contrast, CREX focuses mostly on the third sentence with the task-relevant adjective, i.e., "entertaining", to make a positive sentiment prediction. This finding demonstrates that CREX is able to disentangle useful knowledge from dataset-specific biases. In the next section, we demonstrate the benefit of the increased credibility of CREX on unseen testing data not drawn from the test set.

TABLE IV: Generalization accuracy (in percent) of DNNs trained using the MR dataset on two alternative datasets: Kaggle and Polarity. (The CREX row follows from the per-network improvements over Vanilla reported in Sec. V-C1.)

                         Kaggle                  Polarity
Models             CNN   LSTM  Atten       CNN   LSTM  Atten
Vanilla DNN        74.3  73.6  74.7        60.7  62.6  64.8
Data Augment       75.7  70.3  75.0        62.5  58.1  65.4
Rationale Augment  76.5  73.9  -           -     -     -
CREX               78.4  75.7  75.2        63.2  63.8  65.7

TABLE V: Generalization accuracy (in percent) of DNNs trained using the PR dataset on an adversarial dataset.

Models             CNN   LSTM  Atten
Vanilla DNN        92.1  91.5  91.0
Data Augment       92.4  92.1  90.1
Rationale Augment  92.5  91.9  90.9
CREX               -     -     -
C. Generalization Accuracy beyond Test Set
Currently, the generalization performance of DNNs is usually measured by prediction accuracy on the held-out test set. This is problematic due to the independent and identically distributed (i.i.d.) training-test split of the data, especially in the presence of strong priors [22]. A DNN model can succeed by simply recognizing patterns that happen to be predictive on instances of the test set [44]. As evidenced by the example in Sec. V-B3, a DNN may rely on aroma and palate as evidence to support an appearance prediction, and would therefore perform poorly on beer reviews outside of the training and test data. Consequently, the test set fails to adequately measure how well DNN systems perform on new and previously unseen inputs. To assess the true generalization ability of DNN models, and to demonstrate the benefit of the increased credibility of CREX, we also evaluate model performance using data beyond the test set.
1) Generalization for DNNs Trained on MR:
For DNNs trained on MR, we use two alternative datasets:
• Kaggle movie reviews dataset (Kaggle): a binary sentiment classification benchmark with movie reviews from IMDB, consisting of 50,000 reviews.
• Sentence polarity dataset (Polarity) [45]: another binary sentiment classification dataset with data from IMDB, consisting of 10,662 reviews.
Note that none of the data from these two datasets is used to train the DNN models or tune hyperparameters; they serve only for testing. The generalization accuracy statistics are shown in Tab. IV. There are several key observations. Firstly, compared with the accuracy in Tab. III, there is a significant generalization gap between predictive accuracy on the MR test set and on Kaggle (or Polarity), for all three architectures: almost all accuracy scores are above 90% on the corresponding test set, while all accuracy scores are below 80% for Kaggle and below 70% for Polarity. Secondly, CREX reduces this generalization gap compared with the baseline methods. In Tab. IV, CREX DNNs achieve substantial accuracy enhancements over Vanilla DNNs, with improvements of 4.1%, 2.1% and 0.5% for the three networks on Kaggle, and 2.5%, 1.2% and 0.9% on Polarity. These enhancements validate the benefit of the increased credibility of our trained DNNs. Thirdly, an interesting observation is that there exists a positive correlation between the degree of credibility and the generalization accuracy on data not present in the test set: Rationale Augmentation yields consistent accuracy improvements over Vanilla, while Data Augmentation via back translation does not, as shown in Tab. IV. This conforms well with the credibility performance in Tab. II.
2) Generalization for DNNs Trained on PR:
To test the generalization performance of DNNs trained on PR, we create an adversarial dataset by removing sentences relevant to beer aroma and palate. This is achieved by detecting sentences containing the words "taste", "smell", "aroma", "flavor" or "drinking" in the original PR test set. Note that we only classify beer appearance, so descriptions of beer aroma and palate are considered training-set-specific bias. The corresponding accuracy is shown in Tab. V, where CREX consistently outperforms the baseline methods. In particular, CREX DNNs improve accuracy by 0.2% to 0.8% over Vanilla DNNs. This demonstrates that our trained DNNs rely more on correct evidence relevant to beer appearance, rather than aroma and palate, to make decisions, and thus achieve better generalization accuracy.
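The construction of the adversarial PR test inputs can be sketched as follows; splitting on periods is a simplification of whatever sentence tokenizer was actually used.

```python
def strip_aroma_palate(review):
    # Drop sentences mentioning aroma/palate cue words, keeping only
    # appearance-relevant content, as described above.
    cue_words = ("taste", "smell", "aroma", "flavor", "drinking")
    sentences = review.split(".")
    kept = [s for s in sentences
            if not any(w in s.lower() for w in cue_words)]
    return ".".join(kept).strip()

# e.g., strip_aroma_palate("Coal black with a thin head. The taste is sweet.")
# returns "Coal black with a thin head."
```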
D. Ablation Study and Hyperparameter Analysis
In this section, we utilize the CNN trained on the MR dataset to conduct ablation and hyperparameter analyses, studying the impacts and contributions of the different components of CREX.
1) Ablation Study:
We compare CREX with its ablations to identify the contributions of different components. The ablations include (I) CREX-conf, using only the confident explanation loss in Eq. (3), and (II) CREX-unc, using only the uncertain explanation loss in Eq. (6). The comparison results between CREX and its ablations are listed in Tab. VI. We observe that CREX outperforms the two ablations in terms of credibility as well as generalization accuracy on the Kaggle and Polarity datasets. It indicates that the two components are complementary in general, and both are crucial in promoting model performance.

TABLE VI: Ablation analysis of the CNN trained on the MR dataset. The first column is the credibility score on the MR test set; the last two columns denote generalization accuracy (in percent) on the two alternative datasets.

Models      Credibility  Kaggle  Polarity
CREX-conf   2.27         76.7    63.0
CREX-unc    2.37         77.6    62.2
CREX        2.24         78.4    63.2

Fig. 4: CNN performance under different values of parameter λ_1: (a) credibility on the MR test set; (b) generalization accuracy (in percent) on the two alternative datasets.

Fig. 5: CNN performance under different numbers of rationales: (a) credibility on the MR test set; (b) generalization accuracy (in percent) on the two alternative datasets.
2) Hyperparameter Analysis:
We evaluate the effect of different degrees of rationale loss regularization on model performance by altering the value of the hyperparameter λ_1. As shown in Tab. II, the optimal λ_1 for the CNN trained on the MR dataset is 5e-2. We are interested in how the model performance changes as we keep increasing λ_1. The credibility and generalization accuracy are shown in Fig. 4. As the value of λ_1 increases, the CNN credibility begins to drop, i.e., the KL divergence increases, and the generalization accuracy on Kaggle and Polarity also decreases. In particular, we observe a dramatic change in credibility and accuracy when λ_1 is larger than 0.25. This indicates that the model has overfitted to the rationales, which can also sacrifice generalization performance.

E. Rationale Quantity and Quality Analysis
When incorporating human knowledge into DNN models, the quantity and quality of the knowledge can have a significant influence. In this section, we employ the CNN trained on the MR dataset to analyze how network performance is affected by different rationale conditions.

Fig. 6: Rationale quality analysis using CNN generalization accuracy (in percent): (a) rationales containing different ratios of mistakes; (b) rationales missing different ratios of relevant words.
1) Rationale Number Analysis:
We study the effect of expert knowledge by altering the number of rationales N_r in the training set, and examine the credibility and accuracy change of the trained CNN. For those instances without rationales, we impose the sparse regularization of Eq. (8). The results are illustrated in Fig. 5. There are two interesting observations. Firstly, even when the rationale number N_r = 0, our CNN achieves improved performance compared with the Vanilla CNN: the divergence drops from 2.86 to 2.58 relative to Tab. II, and the Kaggle and Polarity accuracy increases by 1.4% and 0.7% respectively relative to Tab. IV, showing the effectiveness of the sparse explanation loss in Eq. (8). Secondly, when the rationale number is 500, our CNN already has accuracy comparable with N_r = 1,500, indicating that a small ratio of rationales is sufficient to promote network performance. Considering the annotation effort of expert rationales, this advantage of requiring a small number of rationales is significant.
2) Rationale Quality Analysis:
In this experiment, we analyze the effect of low-quality rationales on DNN model performance. We consider two types of low quality: (I) containing mistakes (expert annotations can sometimes be wrong, with irrelevant features highlighted by the experts); (II) missing another set of important rationale words. To simulate the first case, we inject different levels of noise into the current rationales and test model performance. Similarly, to test the second case, we delete different ratios of important features from the current rationales to make the knowledge incomplete. We report CNN generalization accuracy over Kaggle and Polarity in Fig. 6. There are several key findings. Firstly, model performance is highly sensitive to rationale noise (see Fig. 6 (a)): a small ratio of mistakes, e.g., 10%, significantly decreases generalization accuracy. Secondly, model performance is relatively robust to missing rationales (see Fig. 6 (b)). The reason is that the remaining rationale still contains important features; by capturing sparse connections between the input text and the output, the model can still make reasonable predictions. Thirdly, since missing rationale words are more common than crucial mistakes in real-world rationale annotation, CREX is relatively robust to the kinds of low-quality knowledge encountered in practice.
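The two low-quality conditions can be simulated as follows; the exact corruption procedure is our assumption, since only the mistake and missing ratios are described above.

```python
import torch

def corrupt_rationale(r, mistake_ratio=0.0, missing_ratio=0.0):
    # Simulate (I) mistakes: mark random irrelevant words as rationale words,
    # and (II) missing: drop a fraction of the true rationale words.
    r = r.clone()
    zeros = (r == 0).nonzero().squeeze(1)     # irrelevant word positions
    ones = (r == 1).nonzero().squeeze(1)      # rationale word positions
    n_flip = int(mistake_ratio * len(ones))
    if n_flip > 0 and len(zeros) > 0:
        idx = zeros[torch.randperm(len(zeros))[:n_flip]]
        r[idx] = 1.0                          # spurious rationale words
    n_drop = int(missing_ratio * len(ones))
    if n_drop > 0:
        idx = ones[torch.randperm(len(ones))[:n_drop]]
        r[idx] = 0.0                          # removed rationale words
    return r
```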
TABLE VII: Running time comparison of Vanilla and CREX CNN. For training time, we report the average value over three runs. Test time is the average over the test set.

Models       Training time   Test time per input
Vanilla CNN  2.5 min         8e-3 seconds
CREX CNN     18.1 min        8e-3 seconds
3) Running Efficiency Analysis:
Due to the calculation of local explanations and the regularization using rationales, the training of CREX is slower than Vanilla DNN training. As shown in Tab. VII, on average it takes 18.1 minutes to train the CNN on the Movie Review dataset when using all 1,500 rationales in the training process (with our unoptimized code and the PyTorch GPU version). Even though CREX requires fewer training epochs to converge than Vanilla, each epoch takes longer than a Vanilla training epoch. To improve the training scalability of CREX, i.e., when CREX is trained on a dataset with much more training data than MR and PR, we can reduce the ratio of rationales to speed up each epoch and keep the total training time manageable. On the other hand, during the test stage, DNNs trained by CREX need the same time (on average 8e-3 seconds) as Vanilla to yield a prediction for an input, meaning that the increased credibility of CREX does not sacrifice inference speed.
VI. CONCLUSION AND FUTURE WORK
There has been increasing interest recently in developing more trustworthy DNNs. In pursuit of this objective, we propose CREX, aiming to train credible DNNs which employ correct evidence to make decisions. We employ a specific kind of domain knowledge, called rationales, to guide the learning algorithm towards providing credible explanations, by pushing the explanation vectors to conform with the rationales. CREX is DNN architecture agnostic, end-to-end trainable, and simple to implement. Experimental results show that the resulting DNN models have a higher probability of looking at correct evidence, rather than training-dataset-specific bias, to make predictions. Although DNNs trained using CREX do not always improve prediction accuracy on the held-out test set, they generalize much better on data beyond the test set that are representative of the underlying real-world tasks, highlighting the advantages of the increased credibility. High credibility and robustness are essential for a DNN to earn the trust of end-users in its predictions, and we believe the enhanced credibility and generalization will pave the way for wider adoption of DNNs in the real world.

On the other hand, it is not guaranteed that incorporating human knowledge into DNN models will always improve performance, unless the knowledge is of sufficiently high quality. So far, we have explored enhancing DNNs with relatively high-quality rationales. The low-quality knowledge issue is a challenging topic and will be explored in our future research.
REFERENCES
[1] G. Montavon, W. Samek, and K.-R. Müller, "Methods for interpreting and understanding deep neural networks," Digital Signal Processing (DSP), 2018.
[2] M. Du, N. Liu, and X. Hu, "Techniques for interpretable machine learning," Communications of the ACM (CACM), 2019.
[3] M. Du, N. Liu, Q. Song, and X. Hu, "Towards explanation of DNN-based prediction with guided feature inversion," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2018.
[4] M. Du, N. Liu, F. Yang, S. Ji, and X. Hu, "On attribution of recurrent neural network predictions via additive decomposition," in The World Wide Web Conference (WWW), 2019.
[5] J. Wang, J. Oh, H. Wang, and J. Wiens, "Learning credible models," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2018.
[6] P. K. Mudrakarta, A. Taly, M. Sundararajan, and K. Dhamdhere, "Did the model understand the question?" in Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
[7] B. Rychalska, D. Basaj, P. Biecek, and A. Wroblewska, "Does it care what you asked? Understanding importance of verbs in deep learning QA system," in EMNLP Workshop, 2018.
[8] M. T. Ribeiro, S. Singh, and C. Guestrin, "'Why should I trust you?': Explaining the predictions of any classifier," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2016.
[9] S. Gururangan, S. Swayamdipta, O. Levy, R. Schwartz, S. R. Bowman, and N. A. Smith, "Annotation artifacts in natural language inference data," in North American Chapter of the Association for Computational Linguistics (NAACL), 2018.
[10] T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai, "Man is to computer programmer as woman is to homemaker? Debiasing word embeddings," in Thirtieth Conference on Neural Information Processing Systems (NIPS), 2016.
[11] R. Zellers, Y. Bisk, R. Schwartz, and Y. Choi, "SWAG: A large-scale adversarial dataset for grounded commonsense inference," in Empirical Methods in Natural Language Processing (EMNLP), 2018.
[12] Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing, "Harnessing deep neural networks with logic rules," in Annual Meeting of the Association for Computational Linguistics (ACL), 2016.
[13] T. Mihaylov and A. Frank, "Knowledgeable reader: Enhancing cloze-style reading comprehension with external commonsense knowledge," in Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
[14] Y. Zhang, I. Marshall, and B. C. Wallace, "Rationale-augmented convolutional neural networks for text classification," in Empirical Methods in Natural Language Processing (EMNLP), 2016.
[15] O. Zaidan, J. Eisner, and C. Piatko, "Using annotator rationales to improve machine learning for text categorization," in North American Chapter of the Association for Computational Linguistics (NAACL), 2007.
[16] T. Lei, R. Barzilay, and T. Jaakkola, "Rationalizing neural predictions," in Empirical Methods in Natural Language Processing (EMNLP), 2016.
[17] J. Donahue and K. Grauman, "Annotator rationales for visual recognition," in International Conference on Computer Vision (ICCV), 2011.
[18] T. McDonnell, M. Lease, M. Kutlu, and T. Elsayed, "Why is that relevant? Collecting annotator rationales for relevance judgments," in Fourth AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2016.
[19] F. Doshi-Velez and B. Kim, "Towards a rigorous science of interpretable machine learning," arXiv preprint arXiv:1702.08608, 2017.
[20] F. Yang, M. Du, and X. Hu, "Evaluating explanation without ground truth in interpretable machine learning," arXiv preprint arXiv:1907.06831, 2019.
[21] N. Liu, M. Du, and X. Hu, "Representation interpretation with spatial encoding and multimodal analytics," in Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM), 2019.
[22] A. Agrawal, D. Batra, D. Parikh, and A. Kembhavi, "Don't just assume; look and answer: Overcoming priors for visual question answering," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[23] L. A. Hendricks, K. Burns, K. Saenko, T. Darrell, and A. Rohrbach, "Women also snowboard: Overcoming bias in captioning models," in European Conference on Computer Vision (ECCV), 2018.
[24] S. Ramakrishnan, A. Agrawal, and S. Lee, "Overcoming language priors in visual question answering with adversarial regularization," in Advances in Neural Information Processing Systems (NeurIPS), 2018.
[25] P. Rajpurkar, R. Jia, and P. Liang, "Know what you don't know: Unanswerable questions for SQuAD," in Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
[26] M. Barrett, J. Bingel, N. Hollenstein, M. Rei, and A. Søgaard, "Sequence classification with human attention," in Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL), 2018, pp. 302–312.
[27] Y. Bao, S. Chang, M. Yu, and R. Barzilay, "Deriving machine attention from human rationales," in Empirical Methods in Natural Language Processing (EMNLP), 2018.
[28] J. Li, W. Monroe, and D. Jurafsky, "Understanding neural networks through representation erasure," arXiv preprint arXiv:1612.08220, 2016.
[29] A. Kádár, G. Chrupała, and A. Alishahi, "Representation of linguistic form and function in recurrent neural networks," Computational Linguistics, pp. 761–780, 2017.
[30] B. Peters, V. Niculae, and A. F. Martins, "Interpretable structure induction via sparse attention," in EMNLP Workshop, 2018.
[31] C. Malaviya, P. Ferreira, and A. F. Martins, "Sparse and constrained attention for neural machine translation," in Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
[32] Z. C. Lipton, "The mythos of model interpretability," arXiv preprint arXiv:1606.03490, 2016.
[33] Y. Kim, "Convolutional neural networks for sentence classification," in Empirical Methods in Natural Language Processing (EMNLP), 2014.
[34] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, 1997.
[35] Z. Lin, M. Feng, C. N. d. Santos, M. Yu, B. Xiang, B. Zhou, and Y. Bengio, "A structured self-attentive sentence embedding," in International Conference on Learning Representations (ICLR), 2017.
[36] G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.
[37] B. Pang and L. Lee, "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," in Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), 2004.
[38] J. McAuley, J. Leskovec, and D. Jurafsky, "Learning attitudes and attributes from multi-aspect reviews," in International Conference on Data Mining (ICDM). IEEE, 2012.
[39] R. Sennrich, B. Haddow, and A. Birch, "Improving neural machine translation models with monolingual data," in Annual Meeting of the Association for Computational Linguistics (ACL), 2016.
[40] A. Poncelas, D. Shterionov, A. Way, G. M. d. B. Wenniger, and P. Passban, "Investigating backtranslation in neural machine translation," arXiv preprint arXiv:1804.06189, 2018.
[41] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Conference on Neural Information Processing Systems (NIPS), 2013.
[42] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[43] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, 2014.
[44] P. Minervini and S. Riedel, "Adversarially regularising neural NLI models to integrate logical background knowledge," in The SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2018.
[45] B. Pang and L. Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," in Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), 2005.