Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News
Reuben Tan
Boston University [email protected]
Bryan A. Plummer
Boston University [email protected]
Kate Saenko
Boston University [email protected]
Abstract
Large-scale dissemination of disinformation online intended to mislead or deceive the general population is a major societal problem. Rapid progression in image, video, and natural language generative models has only exacerbated this situation and intensified our need for an effective defense mechanism. While existing approaches have been proposed to defend against neural fake news, they are generally constrained to the very limited setting where articles only have text and metadata such as the title and authors. In this paper, we introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions. To identify the possible weaknesses that adversaries can exploit, we create the NeuralNews dataset, composed of 4 different types of generated articles, and conduct a series of human user study experiments based on this dataset. In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which will serve as an effective first line of defense and a useful reference for future work in defending against machine-generated disinformation.
1 Introduction

The rapid progression of generative models in both computer vision (Goodfellow et al., 2014; Zhang et al., 2017, 2018; Choi et al., 2018) and natural language processing (Jozefowicz et al., 2016; Radford et al., 2018, 2019) has led to the increasing likelihood of realistic-looking news articles generated by Artificial Intelligence (AI). The malicious use of such technology could present a major societal problem. Zellers et al. (2019) report that humans are easily deceived by AI-generated propaganda. By manipulating such technology, adversaries would be able to disseminate large amounts of online disinformation rapidly.
[Figure 1 shows an example article, "What's Next for Britons after Brexit?" (August 28, 2019 - Anne Smith), paired with a photo captioned "Parliament was scheduled to reconvene on Oct 9, but Mr. Johnson said he planned to extend its break." (nytimes.com). The article excerpt names Prime Minister Theresa May, while the photo caption refers to Mr. Johnson, above the prompt "Human or machine-generated?"]
Figure 1:
We propose an approach for detecting news articles generated by machines. Prior work uses only the article content and metadata, including title, date, domain, and authors. However, news articles often contain photos and captions as well. We propose to leverage possible visual-semantic inconsistency between the article text, images, and captions, such as missing or inconsistent named entities (underlined in red).

While it is promising that the pretrained generative models themselves are our best defense (Zellers et al., 2019), it is often challenging to know beforehand which models adversaries are using. More importantly, this view ignores the fact that news articles are often accompanied by images with captions (Lee et al., 2018; Ji et al., 2019; Huang et al., 2019).

In this paper, we present a first line of defense against neural fake news with images and captions. To the best of our knowledge, we are the first to address this challenging and realistic problem. Premised on the assumption that the adversarial text generator is unknown beforehand, we propose to evaluate articles based on the semantic consistency between their linguistic and visual components. While state-of-the-art approaches in bidirectional image-sentence retrieval (Lee et al., 2018; Yang et al., 2019b) have leveraged visual-semantic consistency to great success on standard datasets such as MSCOCO (Lin et al., 2014) and Flickr30K (Plummer et al., 2015), we show in Appendix D that they are not able to reason effectively about objects in an image and named entities present in the caption or article body. This is due to discrepancies in the distributions of these datasets: captions in the standard datasets usually contain general terms such as woman or dog, as opposed to named entities such as Mrs Betram and a
Golden Retriever, which are commonly found in news article captions. Moreover, images are often not directly related to the articles they are associated with. For example, in Figure 1, the article contains mentions of the British Prime Minister, yet it only contains an image of the United Kingdom flag.

To circumvent this problem, we present DIDAN, a simple yet surprisingly effective approach which exploits possible semantic inconsistencies between the text and images/captions to detect machine-generated articles. For example, notice that the article and caption in Figure 1 actually mention different Prime Ministers. Besides evaluating the semantic relevance of images and captions to the article, DIDAN also exploits the co-occurrence of named entities in the article and captions to determine an authenticity score, which can be thought of as the probability that an article is human-generated. We adopt a learning paradigm commonly used in image-sentence retrieval, where models are trained to reason about dissimilarities between images and non-matching captions. In this instance, negative samples constitute articles and non-corresponding image-caption pairs. Not only is this a reasonable approach when the adversarial generative model is unknown, we show empirically that it is crucial to detecting machine-generated articles with high confidence even with access to machine-generated samples during training. More importantly, this means that DIDAN is easily trained on the abundance of online news articles without additional costly annotations.

To study this threat, we construct the NeuralNews dataset, which contains both human and machine-generated articles. These articles contain a title and the main body as well as images and captions. The human-generated articles are sourced from the GoodNews (Biten et al., 2019) dataset. Using the same titles and main article bodies as context, we use GROVER (Zellers et al., 2019) to generate articles. Instead of using GAN-generated images, which are easy to detect even without exposure to them during training time (Wang et al., 2019), we consider the much harder setting where the articles are completed with the original images. We include both real captions and captions generated with the SOTA entity-aware image captioning model (Biten et al., 2019). We present results and findings from a series of empirical as well as user study experiments. In the user study experiments, we use 4 types of articles, including real and generated news, to determine what humans are most susceptible to. The insights derived from these findings help identify the possible weaknesses that adversaries can exploit to produce neural fake news and serve as a valuable reference for defending against this threat. Last but not least, our experimental results provide a competitive baseline for future research in this area.

In summary, our contributions are multi-fold:

1. We introduce the novel and challenging task of defending against full news articles containing image-caption pairs. To the best of our knowledge, this is the first paper to address both the visual and linguistic aspects of defending against neural fake news.
2. We introduce the NeuralNews dataset, which contains both human and machine-generated articles with images and captions.
3. We present valuable insights from our empirical and user study experiments that identify exploitable weaknesses.
4. We propose DIDAN, an effective named entity-based model that serves as a good baseline for defending against neural fake news. Most importantly, we empirically demonstrate the importance of training with articles and non-matching images and captions even when the adversarial generative models are known.
2 Related Work

GROVER (Zellers et al., 2019) draws on recent improvements in neural text generation (Jozefowicz et al., 2016; Radford et al., 2015, 2018, 2019) to generate realistic-looking articles complete with metadata such as title and publication date, but without images. Interestingly, it also serves as the best form of defense against its own generated propaganda. Adelani et al. (2020) show that the GPT-2 model can be manipulated to generate fake reviews to deceive online shoppers. Corroborating the findings of Zellers et al. (2019), they also report that pretrained language models such as GROVER and GPT-2 are unable to accurately detect fake reviews. To combat the dissemination of neural disinformation effectively, Tay et al. (2020) propose a promising direction of reverse engineering the configurations of neural language models to identify detectable tokens. Last but not least, Biten et al. (2019) introduce an approach to generate image captions based on contextual information derived from news articles. Such progress points towards the inevitability of large-scale dissemination of generated propaganda and the significance of this task.
In recent years, the introduction of Generative Adversarial Networks (Zhang et al., 2017, 2018; Choi et al., 2018) has led to unprecedented progress in image and video generation. While most of these works have focused on generating images from text as well as video translation, such models can easily be exploited to generate disinformation that is devastating to privacy and national security (Mirsky and Lee, 2020; Chesney and Citron, 2019a,b). In response to this growing threat, Agarwal et al. (2019) propose a forensic approach to identify fake videos by modeling people's facial expressions and speaking movements. In a similar vein to Tay et al. (2020), several works (Matern et al., 2019; Yang et al., 2019a; Wang et al., 2019, 2020) seek to exploit visual artifacts to detect face manipulations and deepfakes. Encouragingly, Wang et al. (2019) show that neural networks can easily learn to detect generated images even without exposure to training samples from those generators.
3 NeuralNews Dataset

To facilitate our endeavor of studying this threat, we introduce the NeuralNews dataset, which consists of human and machine-generated articles with images and captions. It provides a valuable testbed for the kind of AI-enabled disinformation that adversaries can exploit presently and yet is the hardest to detect. The human-generated articles are sourced from the GoodNews (Biten et al., 2019) dataset, which consists of New York Times news articles spanning 2010 to 2018. Each news article contains a title and the main article body as well as image-caption pairs. Note that we source original images from real articles, since machine-generated images are relatively easy to detect (Wang et al., 2019). In our dataset, we restrict the number of image-caption pairs to at most 3 per article. The entire dataset used in the empirical and user study experiments contains the following 4 types of articles (see examples in Appendix E):
A) Real Articles and Real Captions
B) Real Articles and Generated Captions
C) Generated Articles and Real Captions
D) Generated Articles and Generated Captions

In total, we collect about 32K samples of each article type (about 128K in total). To collect machine-generated news articles, we use GROVER (Zellers et al., 2019) to generate fake articles using original titles and articles from the GoodNews dataset as context. Type C articles are completed by incorporating the original image-caption pairs. In Type B and D articles, we use the entity-aware image captioning model (Biten et al., 2019) to generate fake image captions based on the articles. We believe that this dataset presents a realistic and challenging setting for defending against neural fake news.

Table 1: NeuralNews dataset statistics across its 128K articles. Note that images are aggregated for both types of articles, since generated articles use the same images (but different articles and/or captions) as their corresponding real articles.
Dataset Statistics.
Table 1 provides statistics on the length of articles and the number of images in our NeuralNews dataset. Most articles contain at most 40 sentences in their main body. In addition, even though most articles contain a single image and caption, a sizeable 18.2% have 3 images. We believe that this setting will provide a challenging testbed for future work investigating methods that use varying numbers of images and captions.
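To make the composition of NeuralNews concrete, the following is a minimal sketch of how the four article types could be assembled from a single GoodNews-style record. The helper functions grover_generate and generate_caption are hypothetical stand-ins for GROVER (Zellers et al., 2019) and the entity-aware captioning model (Biten et al., 2019), not released tooling:

```python
# Sketch only: `grover_generate` and `generate_caption` are hypothetical
# wrappers around GROVER and the entity-aware captioner, respectively.
def build_article_types(record, grover_generate, generate_caption):
    """record: dict with 'title', 'body' and up to 3 (image, caption) pairs."""
    real_body = record["body"]
    # GROVER is conditioned on the original title and body (Section 3).
    fake_body = grover_generate(record["title"], real_body)
    pairs = record["image_caption_pairs"][:3]   # at most 3 pairs per article
    images = [img for img, _ in pairs]
    real_caps = [cap for _, cap in pairs]
    # Generated captions are conditioned on the article text and the image.
    fake_caps = [generate_caption(real_body, img) for img in images]
    return {
        "A": (real_body, images, real_caps),    # real article, real captions
        "B": (real_body, images, fake_caps),    # real article, generated captions
        "C": (fake_body, images, real_caps),    # generated article, real captions
        "D": (fake_body, images, fake_caps),    # generated article, generated captions
    }
```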
4 User Study

We endeavor to determine the susceptibility of humans to different types of neural fake news through several user studies. To this end, we conduct a series of user study experiments based on the NeuralNews dataset. The user study results provide vital information to help us identify salient points which adversaries can take advantage of. Qualified Amazon Mechanical Turk workers with the required English proficiency are used in all of our experiments. We briefly describe the experimental setups below. See Appendix A for a template of our prompts and response options for each setting.

Article Type  Article    Caption    Article Only   Naive Participants   Trained Participants
                                    Accuracy       Accuracy             Accuracy
no imgs       generated  -          68.8%          -                    -
no imgs       real       -          49.2%          -                    -
A             real       real       -              64.0%                70.7%
B             real       generated  -              34.0%                78.7%
C             generated  real       -              42.7%                56.7%
D             generated  generated  -              44.0%                55.3%
Average       -          -          59.0%          46.2%                67.8%
Table 2: User prediction results. We report the percentages of participants who are able to classify articles as human-generated or machine-generated accurately, given different kinds of information and/or training (see Section 4 for additional details). A more in-depth breakdown of results can be found in Appendix B.
Figure 2: Trustworthiness results. Human evaluation of the 4 article types in the trustworthiness experiment. Workers are asked to evaluate each article based on its style, content, consistency and overall trustworthiness. We observe that people generally have a hard time deciding on the overall trustworthiness of articles regardless of their type. The prompt and the response options can be found in Appendix A.
Trustworthiness: How well are humans able to rate the trustworthiness of news articles?
This experiment extends the study from Zellers et al. (2019) to also use images and captions. The goal is to understand the qualitative factors that humans consider when deciding on the authenticity of articles. We ask participants to evaluate articles based on style, content, consistency between text and images, and overall trustworthiness, using a four-point scale where higher scores indicate more trust.
Article Only User Predictions: Given articles with titles that do not contain images and captions, can humans detect if they are generated or not?
We ask participants to predict whether an article is fake when it contains only a title and the main article body. In this variant, participants are provided with hints to pay attention to possible inconsistencies between the text and the title. This is done to understand, via comparison with the following experiments, the importance of the visual-semantic cues that image-caption pairs provide in this task.
Naive User Predictions: Can humans discern if an article is real or generated without prior exposure to generated articles?
In this experiment, participants are tasked to decide, based on their own judgement, whether the articles are human or machine-generated after reading them. The intuition behind this experiment is to determine humans' capability to identify fake news without prior exposure.
Trained User Predictions: Are humans able to detect generated articles if they are told what aspects to pay attention to beforehand?
We provide limited training to participants by showing them examples of human and machine-generated articles that specifically highlight semantic inconsistencies between articles and image-caption pairs. Afterwards, we ask our trained participants to determine whether an article is human or machine-generated, as done for the naive user predictions.
Figure 2 reports the results from our trustworthiness experiment, where participants evaluated the overall trustworthiness of the article but were not asked to determine whether it was real or machine-generated. These results show that humans generally have trouble agreeing on the semantic relevance between images and the text (article body and captions), as evident from the large variance in their responses. We hypothesize that the often loose connection between an article and an image (as in Figure 1) is a possible factor. Adversaries could easily exploit this to disseminate realistic-looking neural fake news. Consequently, exploring the visual-semantic consistency between the article text and images could prove to be an important area for research in defending against generated disinformation. While it is reassuring that the overall trustworthiness for the human-generated articles is the highest among the different article classes, these results also highlight the susceptibility of humans to being deceived by generated neural disinformation. The difference between the overall trustworthiness ratings across the different article classes is marginal.

Table 2 reports the aggregated percentages of participants who are able to detect human and machine-generated articles accurately from the rest of the user study experiments. See Appendix B for a complete breakdown of results. Trained participants are deemed to have classified a Type B article correctly if they select any of the 4 responses that indicate visual-semantic inconsistencies between images and the article or captions. The significant difference in the detection accuracy of Type B articles between the naive and trained users suggests that humans do not typically pay much attention to image captions in online news. However, it is also reassuring that 14% more participants are able to detect them after prior exposure.

We predict that Type C articles will be the most likely type of neural disinformation that adversaries would exploit for their purposes, given the current state of SOTA neural language models and image captioning models. While recent neural language models are able to produce realistic-looking text, SOTA image captioning models are generally not able to generate captions of comparable quality. Oftentimes, the generated captions contain repeated instances of named entities without any stop words.

In summary, it is worrying that humans are particularly susceptible to being deceived by Type C and D articles in Table 2. However, we believe that there are fewer repercussions from the spread of Type B articles with real article content and generated captions. Since the generated captions only make up a very small component of the entire article, the information conveyed is less likely to mislead people. In contrast, Type C articles have the potential to be exploited by adversaries to disseminate large amounts of misleading disinformation due to their generated article contents. Consequently, our proposed approach is geared towards addressing this particular type of generated article.

5 DIDAN: Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News

In our task, the goal is to detect machine-generated articles that also include images and captions. The example in Figure 1 points towards an inherent challenge: identifying indirect relationships between the image and the text.
Due to the common need to measure visual-semantic similarity, an intuitive first step would be to base one's approach on image-sentence similarity reasoning models, which are commonly used in SOTA bidirectional image-sentence retrieval. We hypothesize, from their dismal performance (Table 9), that image-sentence retrieval models are not adept at relating named entities in the articles to objects in the images. This suggests that contextual information about named entities from the article body is essential.

As a first line of defense, we present our named entity-based approach DIDAN. Besides integrating contextual information from the text, DIDAN factors in the co-occurrence of named entities in the article body and caption to detect possible visual-semantic inconsistencies. This is based on a simple observation that captions, more often than not, contain mentions of named entities that are present in the main body too. DIDAN is trained on real and generated articles. To train our model to detect visual-semantic inconsistencies between images and text, we also adopt the learning paradigm from image-sentence similarity models. In this case, the negative samples are real, but the article and its image-caption pairs are mismatched.

An illustrative overview of DIDAN is shown in Figure 3. An article $A$ consists of a set of sentences $S = \{S_1, \cdots, S_{n_s}\}$, where each sentence $S_i$ contains a sequence of words $\{w_1, \cdots, w_{n_i}\}$. The article is also accompanied by a set of image-caption pairs, where each image $I$ is represented by a set of regional object features $\{o_1, \cdots, o_{n_o}\}$ and each caption $C$ contains a sequence of words $C = \{w_1, \cdots, w_{n_c}\}$. spaCy's named entity recognition model (Honnibal and Montani, 2017) is used to detect named entities in both articles and image captions. We use $d_T$, $d_I$ and $d_{vse}$ to denote the initial dimensions of the text and image representations as well as the hidden dimension, respectively. Each sentence is tokenized and encoded with a BERT model (Devlin et al., 2018) that is pretrained on BooksCorpus (Zhu et al., 2015) and English Wikipedia.

To extract relevant semantic context from the article, we begin by computing sentence representations. For each sentence $S_i$ in article $A$, the word representations are first projected into the article subspace as follows:

$$S_i = W_{art} V_i \quad (1)$$

where $V_i$ represents all word embeddings in $S_i$. For a given sentence $S_i$, its representation $S_{i_f}$ is computed as the average of all its word representations, where the subscript $f$ denotes the corresponding representation. In turn, the article representation $A_f$ for an article $A$ is computed as the average of all its sentence representations.
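As a rough illustration of this encoding step, and of the named entity indicator introduced below, the sketch uses a pretrained BERT-Base model and spaCy as the paper does; the projection dimension and pooling details are our assumptions rather than the authors' released code:

```python
import spacy
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
W_art = nn.Linear(768, 512, bias=False)  # Eq. (1): d_T = 768 into the article subspace
nlp = spacy.load("en_core_web_sm")       # spaCy NER (Honnibal and Montani, 2017)

def article_representation(sentences):
    """Mean of projected word embeddings per sentence, then mean over sentences."""
    sent_reps = []
    for s in sentences:
        tokens = tokenizer(s, return_tensors="pt", truncation=True)
        with torch.no_grad():
            V_i = bert(**tokens).last_hidden_state[0]  # (num_words, 768)
        S_i = W_art(V_i)                               # Eq. (1): project word embeddings
        sent_reps.append(S_i.mean(dim=0))              # sentence representation S_i_f
    return torch.stack(sent_reps).mean(dim=0)          # article representation A_f

def named_entity_indicator(article_text, caption_text):
    """b_c = 1 if the caption mentions any named entity from the article body."""
    article_ents = {e.text.lower() for e in nlp(article_text).ents}
    caption_ents = {e.text.lower() for e in nlp(caption_text).ents}
    return float(bool(article_ents & caption_ents))
```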
Figure 3:
An overview of our proposed DIDAN model. To reason about relationships between named entities present in the article and entities in an image, DIDAN integrates article context into the visual-semantic representation learned from fine-grained object-by-word interactions. This visual-semantic representation is used to infer an authenticity score for the entire news article.

Our approach leverages word-specific image representations learned from images and captions to determine their relevance to an article. A caption is represented by a feature matrix $V^{cap}_f \in \mathbb{R}^{n_c \times d_T}$ and an image is represented by a matrix of object features $V^{vis}_f \in \mathbb{R}^{n_o \times d_I}$. As in the previous section, the word embeddings of a caption and the image object features are projected into a common visual-semantic subspace using:

$$C^{cap}_f = W_{cap} V^{cap}_f \quad (2)$$
$$I^{vis}_f = W_{vis} V^{vis}_f \quad (3)$$

A key property of these visual-semantic representations is that they are built on fine-grained interactions between words in the caption and objects in the image. To begin, a semantic similarity score is computed for every possible pair of projected word and object features $w_l$ and $v_k$:

$$s_{kl} = \frac{v_k^T w_l}{\|v_k\| \, \|w_l\|}, \quad \text{where } k \in [1, n_o] \text{ and } l \in [1, n_c], \quad (4)$$

and $n_c$ and $n_o$ indicate the number of words and objects in a caption and image, respectively. These similarity scores are normalized over the objects to determine the salience of each object with respect to a word in the caption:

$$a_{kl} = \frac{\exp(s_{kl})}{\sum_{i=1}^{n_o} \exp(s_{il})}. \quad (5)$$

The word-specific image representations are computed as a weighted sum of the object features based on the normalized attention weights:

$$w^I_l = a_l^T I^{vis}_f \quad (6)$$

A key contribution of our approach is the utilization of a binary indicator feature, which indicates whether the caption contains a reference to a named entity present in the main article body. The article representation and the average of the word-specific image representations are concatenated with this feature to create caption-specific article representations, which are passed into the discriminator:

$$A^c_f = \text{concat}\left\{A_f, \; \frac{1}{n_c} \sum_{l=1}^{n_c} w^I_l, \; b_c\right\} \quad (7)$$

where $\text{concat}\{\cdots\}$ denotes the concatenation operation and $b_c$ is the binary indicator feature. The key insight is that article context is integrated into the caption-specific article representations. Our discriminator (Figure 3) is a simple neural network comprised of a series of Fully-Connected (FC), Rectified Linear Unit (ReLU), Batch Normalization (BN) and sigmoid layers. It outputs an authenticity score for every image-caption pair.

Recall that in our problem formulation news articles can contain varying numbers of images and captions. The final authenticity score of an article is determined across those of its images and captions and can be thought of as the probability that the article is human-generated. It is computed across the set of images and captions in an article as follows:

$$p_A = 1 - \prod_{I \in \text{images}} (1 - p^I_A) \quad (8)$$

where $p^I_A$ is the authenticity score of image-caption pair $I$ with respect to article $A$. Intuitively, if an image-caption pair is deemed to be relevant to the article body (score close to 1), then the final authenticity score will be close to 1 as well.
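The following PyTorch sketch strings Eqs. (2)-(8) together for a single article. The dimensions follow the implementation details in Section 6; the module layout is an illustrative reconstruction, not the official implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DIDANScorer(nn.Module):
    """Illustrative reconstruction of Eqs. (2)-(8); not the released code."""
    def __init__(self, d_T=768, d_I=2048, d_vse=512):
        super().__init__()
        self.W_cap = nn.Linear(d_T, d_vse, bias=False)   # Eq. (2): caption words
        self.W_vis = nn.Linear(d_I, d_vse, bias=False)   # Eq. (3): object features
        # FC -> BN -> ReLU -> FC -> sigmoid head (Figure 3). Note that BatchNorm
        # requires more than one image-caption pair per forward pass in training mode.
        self.discriminator = nn.Sequential(
            nn.Linear(2 * d_vse + 1, d_vse), nn.BatchNorm1d(d_vse),
            nn.ReLU(), nn.Linear(d_vse, 1), nn.Sigmoid())

    def pair_features(self, A_f, cap_words, obj_feats, b_c):
        w = self.W_cap(cap_words)                               # (n_c, d_vse)
        v = self.W_vis(obj_feats)                               # (n_o, d_vse)
        s = F.normalize(v, dim=-1) @ F.normalize(w, dim=-1).T   # Eq. (4): (n_o, n_c)
        a = F.softmax(s, dim=0)                                 # Eq. (5): attend over objects
        w_img = a.T @ v                                         # Eq. (6): word-specific image reps
        return torch.cat([A_f, w_img.mean(dim=0), b_c.view(1)])  # Eq. (7)

    def forward(self, A_f, pairs):
        feats = torch.stack([self.pair_features(A_f, *p) for p in pairs])
        p_pair = self.discriminator(feats).squeeze(-1)          # authenticity per pair
        return 1 - (1 - p_pair).prod()                          # Eq. (8): noisy-or over pairs
```

Under this noisy-or aggregation, a single image-caption pair judged consistent with the article is enough to push the article-level authenticity score toward 1, mirroring Eq. (8).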
The entire model is optimized end-to-end with a binary cross-entropy loss:

$$\mathcal{L} = - \sum_{(A^+, I^+)} \sum_{I^-} \big[\, y \log(p_A) + (1 - y) \log(1 - p_A) \,\big] \quad (9)$$

where $I^-$ denotes non-matching sets of images and captions with respect to an article and $y$ is the ground-truth label of an article. Negative images and their captions are sampled from other articles within the same minibatch.

6 Experiments

Given a news article from our NeuralNews dataset, our goal is to automatically predict whether it is human or machine-generated. We compare DIDAN to several baselines, evaluating performance based on how often an article is correctly labeled. Note that in our experiments, only Type A and C articles are used. This is because generated captions often contain repeated instances of named entities without any stop words, which is not challenging for humans to detect (see Table 2). To understand the importance of each component of DIDAN and each part of the news article, we supplement our analysis with ablation experiments.

Our model is implemented using PyTorch. In our implementation, the dimensions of the BERT-Base and object region features, $d_T$ and $d_I$, are set to 768 and 2048 respectively. We also set the dimension of the visual-semantic embedding space $d_{vse}$ to 512. The image region representations are extracted with the bottom-up attention (Anderson et al., 2018) model that is pretrained on Visual Genome (Krishna et al., 2017). The language representations are extracted from a pretrained BERT-Base model (Devlin et al., 2018). We train our models end-to-end using the ADAM optimizer.

In addition to ablations of our model, we also compare to a baseline using Canonical Correlation Analysis (CCA), which learns a shared semantic space between two sets of paired features, as well as the GROVER Discriminator. In our CCA implementation, an image is represented as the average of its object region features and a caption is represented by the average of its word features. We apply CCA between the article features (Section 5.1) and the concatenation of the image and caption features. The projection matrices in CCA are learned from positive samples constituting articles and their corresponding images and captions. The GROVER Discriminator is a simple neural network used in (Zellers et al., 2019) to detect its own generated articles based on the article text and metadata. We train the GROVER Discriminator without mismatched data and without images or captions.
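Before turning to results, here is a minimal sketch of the training objective in Eq. (9) with in-batch mismatched negatives. `model` is a scorer like the one sketched above; the sampling details are assumptions rather than the exact released recipe:

```python
import torch
import torch.nn.functional as F

def didan_loss(model, batch):
    """batch: list of (A_f, pairs) tuples for real articles (label 1).
    Each article is also paired with another article's images and captions
    to form a mismatched negative (label 0); generated articles, when
    available, would enter as additional label-0 samples."""
    scores, labels = [], []
    for i, (A_f, pairs) in enumerate(batch):
        scores.append(model(A_f, pairs))        # matching pairs: authentic
        labels.append(1.0)
        j = (i + 1) % len(batch)                # in-batch mismatched negative
        scores.append(model(A_f, batch[j][1]))  # same article, wrong images/captions
        labels.append(0.0)
    return F.binary_cross_entropy(torch.stack(scores), torch.tensor(labels))  # Eq. (9)
```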
Training on Real News Only. The top of Table 3 shows that our approach significantly improves detection accuracy when trained without any generated examples (i.e., with mismatched real news as negatives) compared to CCA. Our named entity indicator (NEI) features provide a large improvement in this most difficult setting.
Training with Generated Samples.
We consider the realistic setting where generated articles may be available but the generator is not. We report the performance achieved by variants of DIDAN when trained on GROVER-Mega generated articles in the second-to-last column of Table 3. Note that the result achieved by the GROVER Discriminator, akin to our text-only variant, is substantially worse than the result reported in (Zellers et al., 2019). This is because we train it with BERT representations as opposed to leveraging GROVER's learned representations to detect its own generated articles. Based on the consistent trend of the results, training on generated articles from the same generator that appears in the test data improves the capability of a neural network to detect them. The binary NEI features also prove to be very beneficial to increasing the detection accuracy of DIDAN. Interestingly, even when we have access to generated articles during training, the large improvement in detection accuracy, going from 68.8% to 85.6% when also training on mismatched real images and captions, suggests that visual-semantic consistency has an important role to play in defending against neural fake news.

Approach               Trained With   Named Entity   Generated Articles   GROVER-Mega    GROVER-Large
                       Mismatch       Indicator      in Training (%)      Accuracy (%)   Accuracy (%)
CCA                    -              -              None                 52.1           -
DIDAN                  ✓              -              None                 54.5           -
DIDAN                  ✓              ✓              None                                -
Grover Discriminator   -              -              50                   56.0           -
DIDAN                  -              -              25                   51.2           49.9
DIDAN                  -              -              50                   56.4           53.7
DIDAN                  -              ✓              25                   64.9           64.6
DIDAN                  -              ✓              50                   68.8           66.3
DIDAN                  ✓              -              25                   61.0           65.0
DIDAN                  ✓              -              50                   70.3           57.4
DIDAN                  ✓              ✓              25                   80.9           69.8
DIDAN                  ✓              ✓              50                   85.6           77.6
Table 3: Results of machine-generated (with GROVER-Mega) vs. real news detection on our NeuralNews dataset. We show the performance of DIDAN variants trained on generated (with GROVER-Large or GROVER-Mega) articles and image-caption pairs when the number of generated articles is limited during training time. Mismatch indicates real data but with images and captions that do not correspond to the article body. The percentages of real and machine-generated articles do not change across variants that are trained with or without mismatch data.

Articles   Images   Captions   DIDAN Accuracy (%)   CCA Accuracy (%)
✓          ✓        ✓
✓          -        ✓
✓          ✓        -          56.9                 52.1

Table 4: Ablation results of CCA's and DIDAN's detection accuracy w.r.t. the contribution of each component in the news article. Experiments are performed on NeuralNews, and both the training and testing articles are generated by GROVER-Mega.
Unseen Generator.
To evaluate DIDAN's capability to generalize to articles created by generators unseen during training, we train on GROVER-Large generated articles and evaluate on GROVER-Mega articles (last column of Table 3). While overall accuracy drops, we observe the same trend where our proposed training with mismatched real data helps increase the detection accuracy from 66.3% to 77.6%, and removing NEI lowers accuracy.
Images vs Captions.
Table 4 shows more ablation results of our model and CCA on NeuralNews. We observe an improvement of 2% in accuracy achieved by CCA variants with images. This suggests that visual cues from images can provide contextual information vital to detecting generated articles. This is also corroborated by the ablation results obtained by DIDAN, where we observe that both images and captions are integral to detecting machine-generated articles. While the contribution of the captions is the most significant, the visual cues provided by images are integral to achieving the best detection accuracy.
In Figures 4 and 5 we present our model's predictions on sample articles (additional examples can be found in Appendix F). In Figure 4, we observe that DIDAN classifies a machine-generated article correctly. One plausible reason is that the main subject in the caption does not match the person mentioned in the article body, and DIDAN is able to pick up on this relatively easily. However, the example in Figure 5 presents an especially challenging setting for DIDAN. In this case, the caption is only loosely related to the article, and the image may or may not portray the situation described in the article. Successfully determining the relevance of such relationships requires more abstract reasoning, which may be a good direction for future work.
7 Conclusion

While this is not entirely representative of all the future challenges presented by neural fake news, we believe that this comprehensive study will provide an effective initial defense mechanism against articles with images and captions. Based on the findings from the user evaluation, humans may be easily deceived by articles generated by SOTA models if they are not attuned to noticing possible visual-semantic inconsistencies between the article text and images.

Man Who Jumped From Ambulance Says It's New York City's Fault.
Jon Vernick, the man who plunged from an ambulance parked on Manhattan’s Upper East Side on Wednesday, says it was the city’s fault for allowing him to get that close to the patient. Writing in the New York Post, Vernick — who miraculously survived the fall and was listed in stable condition with a broken collarbone — said that he was waiting for the doctor to arrive when he jumped out of the ambulance. The EMS workers did not have the ability to do anything to stop me.” A spokesman for City Council Health Committee chair Ydanis Rodriguez said the committee was in the process of conducting an investigation. Vernick says he went for an interview with NBC New York and tried to tell his story so the city would not continue to put people in precarious positions. Read the full story at New York Post. Related Man falls out of ambulance on the Upper East Side and breaks his collarbone. Man jumped from ambulance because he did not get his job, police say. On June 11, 2016, Yaugeni Kralkin, who was drunk, exited an ambulance en route to Staten Island University Hospital South Campus.
Figure 4: A machine-generated article that was classified correctly as such by DIDAN.
In the Taliban's Heartland, U.S. and Afghan Forces Dig In

American soldiers on patrol last month in Kandahar, Afghanistan, found and blew up a Taliban bunker. An influx of troops has begun to change the area.

An American soldier and two other Marines were killed and four others wounded Thursday night when an Afghan Afghan Army vehicle they were riding in was struck by a suicide car bomber, the U.S. military said. It was the second such attack by the Taliban in the province of Helmand. U.S. forces are working with Afghan troops to beat back the insurgency as the United States prepares to withdraw most of its combat troops from Afghanistan in the coming months. According to officials in the province, the attacker in the car had disguised himself as a policeman before he detonated the bomb. But Col. Abdul Marouf, the police commander in Khanashin district, said the Taliban had previously targeted a checkpoint in the same area. "It was near the checkpoint that they killed an Afghan security force official," he said. "Now they are targeting us." "I haven't heard of any friendly casualties on our side," said a police commander in Sangin district. "We rely on our U.S. partners."
Figure 5: A machine-generated article that was classified incorrectly by DIDAN.

Adversaries can easily exploit this fact to create misleading disinformation by generating fake articles and combining them with manually sourced images and captions. Encouragingly, our experimental results suggest that visual-semantic consistency is an important and promising research area in our defense against neural fake news.

We hope future work will address any potential limitations of this work, such as expanding the dataset to evaluate generalization across different news sources and a larger variety of neural generators. Other interesting avenues for future research are to understand the importance of metadata in this multimodal setting and to investigate counter-attacks to improved generators that incorporate image-text consistency. Last but not least, DIDAN and NeuralNews may be leveraged to supplement fact verification in detecting human-written misinformation in general by evaluating visual-semantic consistency.
Acknowledgements:
This work is supported in part by NSF awards IIS-1724237, CNS-1629700 and CCF-1723379. We would also like to thank Professor Derry Wijaya for the valuable discussions and feedback.

References
David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, and Isao Echizen. 2020. Generating sentiment-preserving fake online reviews using neural language models and their human- and machine-based detection. In International Conference on Advanced Information Networking and Applications, pages 1341–1354. Springer.

Shruti Agarwal, Hany Farid, Yuming Gu, Mingming He, Koki Nagano, and Hao Li. 2019. Protecting world leaders against deep fakes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 38–45.

Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. 2018. Bottom-up and top-down attention for image captioning and visual question answering. In CVPR.

Ali Furkan Biten, Lluis Gomez, Marçal Rusinol, and Dimosthenis Karatzas. 2019. Good news, everyone! Context driven entity-aware captioning for news images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 12466–12475.

Bobby Chesney and Danielle Citron. 2019a. Deep fakes: A looming challenge for privacy, democracy, and national security. Calif. L. Rev., 107:1753.

Robert Chesney and Danielle Citron. 2019b. Deepfakes and the new disinformation war: The coming age of post-truth geopolitics. Foreign Aff., 98:147.

Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. 2018. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680.

Matthew Honnibal and Ines Montani. 2017. spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. To appear, 7(1).

Yan Huang, Yang Long, and Liang Wang. 2019. Few-shot image and sentence matching via gated visual-semantic embedding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 8489–8496.

Zhong Ji, Haoran Wang, Jungong Han, and Yanwei Pang. 2019. Saliency-guided attention network for image-sentence matching. In Proceedings of the IEEE International Conference on Computer Vision, pages 5754–5763.

Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. 2016. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410.

Amrith Krishna, Pavan Kumar Satuluri, and Pawan Goyal. 2017. A dataset for Sanskrit word segmentation. In Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 105–114, Vancouver, Canada. Association for Computational Linguistics.

Kuang-Huei Lee, Xi Chen, Gang Hua, Houdong Hu, and Xiaodong He. 2018. Stacked cross attention for image-text matching. In Proceedings of the European Conference on Computer Vision (ECCV), pages 201–216.

Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. CoRR, abs/1405.0312.

Falko Matern, Christian Riess, and Marc Stamminger. 2019. Exploiting visual artifacts to expose deepfakes and face manipulations. Pages 83–92. IEEE.

Yisroel Mirsky and Wenke Lee. 2020. The creation and detection of deepfakes: A survey. arXiv preprint arXiv:2004.11138.

Bryan A. Plummer, Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. 2015. Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE International Conference on Computer Vision, pages 2641–2649.

Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. OpenAI technical report.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.

Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, and Andrew Tomkins. 2020. Reverse engineering configurations of neural text generation models. arXiv preprint arXiv:2004.06201.

Sheng-Yu Wang, Oliver Wang, Andrew Owens, Richard Zhang, and Alexei A. Efros. 2019. Detecting photoshopped faces by scripting Photoshop. In International Conference on Computer Vision (ICCV).

Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A. Efros. 2020. CNN-generated images are surprisingly easy to spot... for now. In Computer Vision and Pattern Recognition (CVPR).

Xin Yang, Yuezun Li, and Siwei Lyu. 2019a. Exposing deep fakes using inconsistent head poses. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8261–8265. IEEE.

Zhengyuan Yang, Boqing Gong, Liwei Wang, Wenbing Huang, Dong Yu, and Jiebo Luo. 2019b. A fast and accurate one-stage approach to visual grounding. In Proceedings of the IEEE International Conference on Computer Vision, pages 4683–4693.

Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, and Yejin Choi. 2019. Defending against neural fake news. In Advances in Neural Information Processing Systems, pages 9051–9062.

Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. 2018. Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318.

Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N. Metaxas. 2017. StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 5907–5915.

Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision, pages 19–27.
A User Study Templates
A.1 Trustworthiness Study Template
This experiment requires users to evaluate the quality of news articles based on the following 4 criteria. The response options are displayed next to their corresponding score ratings. For ease of comparison, we adopt the same metrics and scoring system as (Zellers et al., 2019).

(a) (Style) Is the style of the article consistent?
1) No, it reads like it's written by a madman.
2) Sort of, but there are certain sentences that are awkward or strange.
3) Yes, this sounds like an article I would find at an online news source.

(b) (Content) Does the content of this article make sense?
1) No, I have no (or almost no) idea what the author is trying to say.
2) Sort of, but I don't understand what the author means in certain places.
3) Yes, this article reads coherently.

(c) (Consistency) Does the text match the images?
1) No, the images do not match the text and captions.
2) Sort of, the images match the captions but do not match the text.
3) Sort of, the images match the text but do not match their captions.
4) Yes, the images match the text and captions.

(d) (Trustworthiness) Does the article read like it comes from a trustworthy source?
1) No, this seems like it comes from an unreliable source.
2) Sort of, but something seems a bit fishy.
3) Yes, I feel that this article could come from a news source I would trust.
A.2 Naive User Predictions Study Template
In the second user study experiment, users are asked to decide based on their own judgements whether the articles are human or machine-generated after reading them. These articles contain images and captions.

(a) Do you think this article is human or machine-generated?
1) Human-generated.
2) Machine-generated.
A.3 Trained User Predictions Study Template
In this variant, users are given hints to pay more attention to specific components of news articles through the provided response options. The response options provide users with cues to look at possible semantic inconsistencies between the articles and image-caption pairs.

(a) Choose the rating that you think is the most suitable for the given news article.
1) Human-generated.
2) Machine-generated because 1 or more images are not very relevant to the article body.
3) Machine-generated because 1 or more captions are not very relevant to the article body.
4) Machine-generated because 1 or more images are not very relevant to the caption.
5) Machine-generated because the image/caption pairs are not relevant to the article body.
6) Machine-generated because something about the article appears off.
7) Machine-generated because the article title is not really relevant to the article.

A.4 Article-Only User Predictions Study Template
The fourth user study experiment is similar to the third experiment, except that the articles do not contain images or captions. Instead, users are told to focus on possible mismatches between the title and article body.

(a) Choose the rating that you think is the most suitable for the given news article.
1) Human-generated.
2) Machine-generated because something about the article appears off.
3) Machine-generated because the article title is not really relevant to the article.
B User Study Results
B.1 Naive User Study Results
Article Type   Accuracy
A              64.0%
B              34.0%
C              42.7%
D              44.0%
Total          46.2%
Table 5: Results of the naive user predictions experiment. In this study, workers rely on their own judgement to decide if articles are human or machine-generated after reading them. The results present a worrying trend where a majority of the workers are misled by generated neural disinformation. The prompt and the response options can be found in Appendix A.2.
The findings in Figure 2 are corroborated by the results from the naive user prediction study in Table 5. The lower-than-random classification accuracy of 46.2% suggests that discriminating between human and machine-generated articles is a very challenging task in general. In particular, it is worrying that only 42.7% of users are able to accurately identify the Type C articles comprised of generated article bodies and real image-caption pairs.
B.2 Trained User Study Results
Article Type   1      2      3      4      5      6      7      Accuracy
A              70.7   7.3    6.0    4.0    4.7    2.7    2.7    70.7
B              11.3   52.0   8.0    14.0   4.7    8.7    1.3    78.7
C              43.3   13.3   11.3   10.7   8.0    1.3    12.0   56.7
D              44.7   12.7   14.0   13.3   8.7    5.3    1.3    55.3
Average        -      -      -      -      -      -      -      67.8
Table 6: Results of the trained user predictions experiment. In this study, workers are prompted to pay more attention to specific aspects of the articles by the response options before selecting the most appropriate response. The values in the columns with numerical headings indicate the percentage of users who select the corresponding response for each class of article. Generally, rating 1 indicates that the article is human-generated and the rest indicate otherwise due to possible semantic irrelevance between the articles, images and captions. The prompt and the exact rating descriptions can be found in Appendix A.3.
In the trained user prediction study, users are provided with hints to focus on possible visual-semantic inconsistencies between the article text (main body and image captions) and images via the provided response options. Table 6 reports the percentage of users who selected each response for the different classes of articles. The numerical headings in Table 6 indicate their corresponding responses as shown in Appendix A.3. We observe a recurring theme where a large percentage of users are deceived by Type D articles. Only 55.3% of users identified the aforementioned article class as generated. It is also notable that workers who are told to focus on possible visual-semantic inconsistencies are significantly more accurate in detecting generated articles.
B.3 Article-Only User Study Results
Article Type        1      2      3      Accuracy
Human-Generated     49.2   36.4   14.4   49.2
Machine-Generated   31.2   61.6   7.2    68.8
Average             -      -      -      59.0
Table 7: Results of the article-only user predictions experiment. This study is similar to the trained user prediction study. However, in this experiment, the sample articles do not contain any image-caption pairs. Instead, each article sample only contains a title and the main body. The values in the columns with numerical headings indicate the percentage of users who select the corresponding response for each class of article. The prompt and the response options can be found in Appendix A.4.
The results from the article-only user study are reported in Table 7. In this experiment, workers are provided with hints to focus on possible semantic inconsistencies between the title and main body. The articles do not contain image-caption pairs. It is observed that, by focusing on possible semantic inconsistency between the title and article body, a large majority of workers are able to identify generated articles.
C Importance of Metadata in GROVER
Article   Authors   Date   Domain   Title   Bert-Large Accuracy   Pretrained Grover Accuracy
✓         -         -      -        -       73.0                  81.6
✓         ✓         ✓      ✓        -       76.8                  90.0
✓         ✓         ✓      -        ✓
✓         ✓         -      ✓        ✓
✓         -         ✓      ✓        ✓
✓         ✓         ✓      ✓        ✓

Table 8: Ablation results of our model and the pretrained Grover model on the Grover (Zellers et al., 2019) discrimination dataset.
We present results from a series of ablation experiments on the metadata, which include the authors, date, domain and title. The ablation experiments are performed on the Grover discrimination dataset. Table 8 reports results achieved by a BERT-Large model and Grover on this dataset. While using metadata generally leads to increased accuracy in detecting generated articles across both models, the resulting improvement is more pronounced for the Grover model. Despite the fact that leveraging metadata significantly improves the performance of Grover, the accuracy does not appear to vary much with the exclusion of different types of metadata. In contrast, we make the surprising observation that leveraging all metadata causes the detection accuracy of our model to decrease. In addition, the inclusion of the title results in a 6% decrease in detection accuracy. Without knowledge of the adversary's generative language model, it is essential to understand the contribution of such metadata in defending against general neural disinformation.
D Bidirectional Image-Sentence Retrieval Results
Variants                        Directions      R@1   R@5    R@10   Average
SCAN                            Image to Text   0.1   0.6    1.1    0.6
                                Text to Image   0.1   0.5    1.0    0.5
SCAN + NER + Face Recognition   Image to Text   1.0   16.5   23.0   13.5
                                Text to Image   2.2   6.5    9.2    6.0
Table 9: Bidirectional image-sentence retrieval results obtained on images and captions from the GoodNews dataset.
We observe that standard image-sentence retrieval models perform very poorly on images and captions extracted from the GoodNews (Biten et al., 2019) dataset. We hypothesize that image-sentence retrieval models are designed to measure visual-semantic similarity between images and phrases that contain general terms such as man or dog. In contrast, they are less capable of reasoning about relationships between images and named entities often found in news captions.

E Examples of Article Types
We provide samples of the different types of articles below. Each article sample contains a title, text, an image and a caption. The image and caption are located below the article text.
E.1 Type A Article
Playing Composer, of Course, to Impress
At a time when opportunities for gifted emerging opera composers blazing all manner of new stylistic trails appear to be proliferating, there's something to be said for a company willing to go to bat for fresh pieces by veteran creators working in conventional modes. Not long ago, that company likely would have been the Dicapo Opera, which performed an estimable service in championing composers like Thomas Pasatieri, Tobias Picker and Conrad Susa. But with Dicapo in a state of limbo, it falls to other institutions to fill the void. Kudos, then, to the Bronx Opera Company, which opened its 47th season on Saturday night with "The Rivals," a 2002 comic opera by Kirke Mechem, in the Lovinger Theater at Lehman College. Mr. Mechem, born in Wichita, Kan., and based in San Francisco at 88, is a skillful composer especially admired for his vocal music. "Tartuffe," his first opera, has played more than 350 times since its 1980 San Francisco Opera premiere. Mr. Mechem fashioned his own libretto for "The Rivals," his third opera, relocating an 18th-century Sheridan comedy from Bath, England, to Newport, R.I., around 1900. The tale centers on Jack Absolute, a British naval captain who has concocted a fictitious alter ego – Waverley, an impoverished opera composer – to woo Lydia Larkspur, an American heiress who dreams of living in "charming poverty" in a Parisian garret. The couple are surrounded with a small cadre of friends, lovers, servants and, yes, rivals. Naturally, confusion ensues. Deftly juggling nine substantial roles, Mr. Mechem sets their entanglements awhirl with his buoyant melodies, supple harmonies and perky rhythms. In spirit, "The Rivals" harks to Rossini and Donizetti; in sound, it weds Puccini's generous lyricism to the dancing meters of Bernstein's "Candide."

Figure 6: "The Rivals." From left, Julie-Anne Hamula, Caroline Tye and Mario Diaz-Moresco in the Bronx Opera Company's production of Kirke Mechem's opera at the Lovinger Theater.
William Eggleston Set To Release First Album
William Eggleston's photographs have adorned album covers for years: He has lent his singular eye to projects by Big Star, Joanna Newsom and Spoon. But on Oct. 20, Mr. Eggleston, now 78, will release an album of his own. The album, titled "Musik," will be released on Secretly Canadian and feature 13 tracks of Mr. Eggleston playing a Korg synthesizer. He recorded improvisations onto floppy disks and used a four-track sequencer to overlay parts and create fuller symphonic compositions. In addition to his own music, the album includes standards by Gilbert and Sullivan and Lerner and Loewe. Tom Lunt, co-founder of the record label Numero Group, produced the album. One song, "Untitled Improvisation FD 1.10," was released on Thursday.
Figure 7: William Eggleston is famous for his photography, but music has long been part of his artistic identity.

E.2 Type B Article
LANI KAI
You can go crazy with a loco moco, pig out on kalua pig, stuff yourself with a guava malasada. But one thing that is astonishingly hard to do in Hawaii is to get a decent drink in a coconut shell. This just isn't right. The state ought to be to tropical cocktails what New Jersey is to the Jagerbomb. Julie Reiner has set out to correct this cosmic injustice, even if she has to start in Manhattan. Ms. Reiner revived Deco-era cocktails in her first bar, the Flatiron Lounge, then peered into the crystal punchbowls of the Gilded Age with the Clover Club. With Lani Kai, she brings state-of-the-art urban bartending techniques to the flavors of her home state, Hawaii. Needless to say, there is no mai-tai mix in sight. Instead, there are two kinds of house-made orgeat syrup. One, derived from toasted almonds, washes up in the Hotel California, along with apricot-infused gin. The other, macadamia-based, sweetens a distant relative of the Sazerac called the Tree House. Both cocktails ($13) are unmistakably tropical in flavor. But taste again, note the underpinning of citrus and the foundation of bitters. These are not shaggy assemblages that shamble across the sand in board shorts and sandals. They are extremely well put together, buttoned down and zippered up in the best Manhattan style. This goes for the bar snacks, too, which raise the pu-pu genre to heights Trader Vic never scaled. The crab wontons, erupting with molten mascarpone, seem to contain actual shellfish, and the pork-belly sliders pay homage to David Chang. Where a little more New York sensibility might have helped is in the decoration. One can respect Ms. Reiner's decision to avoid the usual outriggers, macaws and puffer fish, and still think that she might have done more than hammer a few shelves to the wall and line them with pots of orchids. You expect a place called Lani Kai to transport you. At Lani Kai, the entire journey is in the drink. But that's not a bad place to start.
Figure 8: The Clover Club.
Digital Chief At Vice Loses Job After Inquiry
Vice Media announced Tuesday that its chief digital officer, Mike Germano, would not return to the company after the public disclosure of sexual harassment allegations against him prompted an internal investigation into his behavior. Mr. Germano was placed on leave after a New York Times investigation last month detailed the treatment of women at the company. The article included allegations made by two women against Mr. Germano, including that he told a former employee at a holiday party in 2012 that he had not wanted to hire her because he wanted to have sex with her and that, in 2014, he had pulled an employee onto his lap. Mr. Germano declined to comment. In an earlier statement, he said he did “not believe that these allegations reflect the company’s culture.” Mr. Germano was a co-founder of Carrot Creative, the digital ad agency that Vice acquired in 2013. In an email to the staff on Tuesday, Sarah Broderick, Vice’s chief operating officer, said that Vice’s creative ad agency was completing “the long planned integration of Carrot Creative” and that more details regarding the group’s leadership would be announced in the weeks ahead.
Figure 9: media.
E.3 Type C Article
Finding Drama in Brutality and Beauty
Pina Bausch created this dance work a few years ago, originally at Tanztheater Wuppertal and now running again at John Malkovich Theatre. The presentation here in Atlanta is a return engagement; a tour through English-speaking North America has started in Lincoln Center. It has been said before that these works, vast, powerful, and outspoken, are less about keeping time than with the resourcefulness of the human body, and time clearly is not what they are after. The action unfolds in what looks like an enormous steel studio, with a window in the middle. Its thrust stage resembles a rooftop. At one point, the viewer may be looking through one of the doors in the flat roof at a falling bucket of water, but this is hardly a threat of death. That bucket is one of the recognizable tools used by Bausch’s company of 16 dancers and a psychologist (Ricardo Moyo), as they enact a psychic activity that is more about sustained unease and despair than about an exhibition of total clarity. There are nearly 30 variations on the theme of internalization. Pins, black masks, and goggles come out and are dropped. The dancers take turns getting on, off, dancing alone, in unison, with or without their masks and hats. Sometimes, they rub their faces in lumps of plaster, as if trying to figure out a riddle. Most times, they walk about in doodling puddles of blue, red, and black that drift away from the floor like fresh acrid water. If the works of Bausch and her husband, the choreographer Arvo Pärt, frequently present themselves as exercises in processing and survival, Mr. Pärt’s music falls into the category of soothing. It enables the dancers to linger awhile in moments of perceived calm, even blindness. It is interesting to hear variations on the theme of failure in music. On certain occasions, Mr. Pärt’s powerful structural choices tie into the images that come across onscreen, as when, in an impressive demonstration of strength and resolve, a barefoot dancer balances a full-size walker on his head and shoulders for an extended time. Also memorable is the video-production device that takes place by day, involving a screen in which a dancer can disappear under water, feeding her brain waves to the monitor. And the body part that might be the most isolated is the head, which may look unmoving to the spectator. “On the Mountain” alludes to Plato’s phrase, “There is a difference between aimlessness and misdirected aimlessness,” and perhaps that is what Mr. Pärt is trying to capture in his music. As might be the case with Mr. Pärt, no spectacle is too big, or too expensive. The opening figures are all made from cobalt, and they dangle by wires. Then we are offered a shopping cart, which is wheeled around and pushed. Could it be that, in addition to representing purchasing power, it may have a subconscious meaning in an era of online shopping? The stage looks like a dock for a boat with Mr. Pärt’s familiar chords in the background, as the red-clad dancer is tossed inside it. He comes out again in an equivalent figure, waving his hands and toes. Finally, one sees Ms. Bausch’s face, appearing every now and then on the screen behind the dancers, holding a dartboard with the numbers “10” and “14” scrawled across it. Then an accompanying ad appears on the screen, with an unusually large 9 on it, directed at the public. “On the Mountain” stays close to the past, and everything but the information around it, but it is still inviting.
Figure 10: A scene from “On the Mountain a Cry Was Heard.”
Young Cardinals Slugger Keeps Hammering Mets
Paul DeJong and Harrison Bader both homered in the seventh inning, sending the St. Louis Cardinals to a 7-5 win over the New York Mets at Busch Stadium on Friday night. Bader added a two-run single in the eighth to cap the St. Louis offensive outburst, which lifted the Cardinals to a 3-1 start against New York. DeJong went 3-for-4 with a pair of solo home runs and eight RBIs in the series, helping send the Mets to a fourth straight loss. Hansel Robles (1-1) took the loss for the Mets, who stranded 11 base runners in the loss. Paul Sewald (1-0) earned the win in relief, throwing two shutout innings, while Steve Cishek and Greg Holland each tossed a scoreless inning. The Mets jumped out to a 4-2 lead in the fifth after three consecutive singles with one out. Todd Frazier’s sacrifice fly accounted for the first run before Jose Bautista drove in the next two with a line drive RBI single to right, and a bases-loaded single by Todd Frazier also scored a run. However, DeJong and Bader homered off Bobby Wahl to begin the Cardinals’ comeback. Bader’s first home run, a solo shot, tied the game at 4-4 before DeJong’s second blast, a three-run shot, put St. Louis ahead, 6-4. A leadoff single by Matt Carpenter in the eighth started the Cardinals’ comeback. With Yadier Molina and Greg Garcia on base, Bader then drove in the final two runs of the inning, the first on a squeeze bunt and the second on a single. Molina also drove in two runs with a bases-loaded single in the sixth inning that tied the game at 3-3. Molina’s single extended his hitting streak to 15 games. Wilmer Flores homered in the eighth for the Mets. Carpenter had two hits for the Cardinals, who had won three in a row. Trevor Rosenthal recorded the final two outs for his third save. Curtis Granderson collected three hits for the Mets, who had won two in a row. First baseman Wilmer Flores started the scoring for the Mets with a third-inning solo home run. Cardinals right-hander Luke Weaver, who starts on Saturday, is 8-0 with a 1.77 ERA and 13 strikeouts in 10 career starts against the Mets. On the other hand, Mets right-hander Jacob deGrom is 3-0 with a 0.60 ERA and 38 strikeouts in three career starts against the Cardinals. The Cardinals announced on Friday afternoon that first baseman Jose Martinez will miss two to three weeks because of a strained right hamstring. St. Louis recalled outfielder Jose Martinez and infielder Dillon Maples from Triple-A Memphis. Before the game, Mets manager Mickey Callaway said right-hander Matt Harvey is still feeling elbow discomfort after being put on the disabled list on April 11. The team decided it would be better to have Harvey rested. Harvey is scheduled to make his first rehab start for Triple-A Las Vegas on Saturday, pitching six innings. Callaway said Harvey will not appear in a rehab game for St. Louis. Harvey is on the DL because of an inflammation of the ulnar collateral ligament in his right elbow. Harvey missed last season after undergoing Tommy John surgery. The Cardinals’ lineup included rookies at four positions — left field (Bryan Reynolds), center field (Colby Rasmus), right field (Rasmus) and first base (Rasmus) — for the first time since 1958. But by Friday night, the rookies had spent a combined 34 hours on the field. DeJong, a 22-year-old rookie from New Braunfels, Texas, has three home runs and a .308 batting average this season. He also has reached base safely in 16 of the 19 games he has started. The Cardinals’ first home game was on May 10, 1958. With the game in their favor, the Cardinals could make their fourth straight trip to the playoffs history for the franchise, starting on Saturday afternoon in St. Louis.
Figure 11: Paul DeJong, right, after his home run in the second inning of the Cardinals’ win over the Mets on Sunday.
E.4 Type D Article
Bonus Pay On Wall St. Likely to Fall
American financiers are expected to take home a pay cut this year thanks to lower investment bank performance fees. Not only are banks likely to reduce pay in the face of weak quarterly earnings, it also looks like Wall Street employees will take a hit to their bonuses, according to a report from New York City’s comptroller on Thursday. Of the 42 financial companies that submitted their bonus information, 43 percent of the firms said they will pay out less money than last year, according to the report. “Investment banking fundamentals remain challenging due to low interest rates, subdued corporate M&A activity, and decelerating economic growth,” Michael DiBiase, the comptroller’s chief investment officer, said in a statement. “Once-strong markets are challenging to justify strong performance fees.” Last year, when Wall Street bonuses were already down significantly from the previous year, the number of bank employees receiving bonuses was 22 percent lower. This year, almost 25 percent of financial firms expect to pay out less, the report said. Wall Street bonuses have been under pressure in recent years as low interest rates and decreasing merger activity has held back bonuses. The recent news that Wells Fargo, the scandal-ridden bank that was first accused of opening fake accounts, was paying Wall Street executives in bonuses in 2018 despite numerous conflicts of interest raised even more questions about the banking industry’s compensation system. Read the full story at The New York Times. Related For Wells Fargo employees making $1,000 a month, Wells Fargo offers a cash bonus Citigroup to pay bonuses on top of annual pay of 105,000 employees Treasury Department seeks to encourage more women to work at Goldman Sachs.
Figure 12: Goldman Sachs
Jets Bench Smith in Loss That Doesn’t Sit Too Well
Photo The Jets came into the night with speculation swirling that a trade involving the team’s first-round draft pick was likely. And with Mark Sanchez all but gone, Geno Smith was apparently the prime candidate to go elsewhere. But so the rumors of a trade went through the night. The Jets had plans for the night and they had Smith’s picture in the media room, leading many observers to believe that there would be an alteration in the game plan. Except that nothing that happened Thursday had Smith’s name in the mix. Instead, the team opted to start rookie Sam Darnold against the Colts. The former USC quarterback led the Jets to a 17-0 lead in the first quarter, then — because a trade failed to materialize — he finished the game. Quarterback was the deciding factor for the Jets, at least after it became clear the trade was not going to happen. Either the Browns were going to pick Darnold at No. 4 or Cleveland could attempt to make a play for Baker Mayfield, the Oklahoma quarterback selected No. 1 by the Browns. Thus, the Jets took a leap at No. 3 and would have had to come back with whatever that three-point stand entailed. Even if the Browns had picked Darnold and attempted to make a trade, it seemed unlikely that any team would surrender a second-round pick for Mayfield. That would have constituted a risky move for the Browns, one they never would have taken unless they had planned on moving back in the draft. Hence, the tight windows that a quarterback-needy team often faces.
Figure 13
F Visualizations of DIDAN’s predictions
We present examples of DIDAN’s predictions on both human-generated and machine-generated articles in this section.
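To make the figures that follow easier to interpret, the sketch below shows one plausible way a DIDAN-style detector could produce the authenticity and machine-generated scores displayed alongside each article. This is a minimal illustration under stated assumptions, not the released implementation: the Article container, the score_article stand-in (a toy named-entity-overlap heuristic in place of the learned visual-semantic consistency model over article text, caption, and image), and the 0.5 decision threshold are all introduced here for exposition.

```python
# Minimal sketch of turning a DIDAN-style consistency score into a label.
# Hypothetical interface; the real model is a trained neural network.
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    body: str
    caption: str  # caption of the accompanying image


def score_article(article: Article) -> float:
    """Toy stand-in for a trained DIDAN model.

    Returns an authenticity score in [0, 1]: the estimated probability
    that the article is human-generated. Here we approximate the
    visual-semantic consistency cue with capitalized-token overlap
    between the caption and the article body (a crude proxy for
    named-entity consistency).
    """
    caption_tokens = {t for t in article.caption.split() if t.istitle()}
    body_tokens = set(article.body.split())
    if not caption_tokens:
        return 0.5  # no named-entity evidence either way
    return len(caption_tokens & body_tokens) / len(caption_tokens)


def classify(article: Article, threshold: float = 0.5) -> str:
    """Label an article from its authenticity score (threshold assumed)."""
    authenticity = score_article(article)
    machine_generated = 1.0 - authenticity  # the two scores shown per figure
    print(f"Authenticity: {authenticity:.3f}  "
          f"Machine-generated: {machine_generated:.3f}")
    return "human-generated" if authenticity >= threshold else "machine-generated"
```

Under this sketch, a caption whose named entities never appear in the article body yields a low authenticity score, mirroring the inconsistency cue that the examples below are meant to highlight.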
At Times, the History Is in the Margins
It demonstrates the roiling mind of an American original who used the collection to help frame the thoughts that shaped a young nation. Adams didn't just read books. He furiously scribbled marginal annotations that suggest he considered the books the manifestation of thinkers he wanted to talk to, wrestle with and maybe even knock out. For instance, in a book by Jacob Bryant that detailed an elaborate religious ceremony, Adams wrote, "Is this Religion? Good God!" Although Adams was familiar with the romantic notions of politics and government held by Enlightenment idealists, he was convinced that only structure and rules could stop mankind's tyrannical tendencies. That is why, though he was initially impressed with Thomas Paine's rousing "Common Sense" (some thought Adams had written it himself), he later cooled on the pamphlet, writing that Paine had "a better hand at pulling down than building." The Sachem Public Library is one of 20 in the United States hosting a traveling exhibition on John Adams's books.
Figure 14: A human-generated article that was classified correctly as such by DIDAN.
Murray L. Weidenbaum, Reagan Economist, Dies at 87
Murray L. Weidenbaum, who as President Ronald Reagan's first chief economic adviser elevated government regulation of business to the forefront of public policy debate, but resigned unhappy about the administration's budget-making, died on Thursday in St. Louis. He was 87. Mr. Weidenbaum, a Bronx-born economist, was fond of saying, "Don't just stand there, undo something." And he did, beginning in 1981, when the newly inaugurated Mr. Reagan appointed him chairman of the Council of Economic Advisers. Reducing the size of government and loosening its regulatory hold on the private sector became a large theme of the Reagan presidency, which began with inflation still running in double digits and the economy heading into recession. The banking, broadcasting and food and drug industries were a particular focus. Mr. Weidenbaum worked to reduce regulation on businesses.
Figure 15: A human-generated article that was classified correctly as such by DIDAN.
Novelist’s Prime Nesting Place in Nashville
Before this house was the fulfillment of our dreams, it was the fulfillment of other people's dreams, a couple with extremely good taste who knew exactly where and how they wanted to live. They bought a bungalow on Whitland, tore it down, and in 1993 built themselves a solid home in pink washed brick. Somehow they made it look as if it had been here all along. They threw their hearts into the tiniest details, collecting glass doorknobs at flea markets, commissioning ironwork for the banister and placing inlaid cherry-wood star shapes onto the walnut floors. When, after years of planning and hard work, it was finally perfect and they moved in, they were divorced. I can think of nothing better than to live in the dreams of these two people who moved away. Their vision of what a home could be far exceeds anything I ever could have imagined on my own. Trees shade Ann Patchett’s pink brick home, a place designed by others that fulfills her own dreams.
Figure 16: A human-generated article that was incorrectly classified as machine-generated by DIDAN.
New Plan to Treat Schizophrenia Is Worth Added Cost, Study Says
A new approach to treating early schizophrenia, which includes family counseling, results in improvements in quality of life that make it worth the added expense, researchers reported on Monday. The study, published by the journal Schizophrenia Bulletin, is the first rigorous cost analysis of a federally backed treatment program that more than a dozen states have begun trying. In contrast to traditional outpatient care, which generally provides only services covered by insurance, like drugs and some psychotherapy, the new program offers other forms of support, such as help with jobs and school, as well as family counseling. The program also tries to include the patients -- people struggling with a first psychotic "break" from reality, most of them in their late teens and 20s -- as equals in decisions about care, including drug dosage. A brain scan of a patient with schizophrenia.
Figure 17: A human-generated article that was incorrectly classified as machine-generated by DIDAN.
Lumet’s ‘Dog Day Afternoon’: Hot Crime, Summer in the City
Also visit this page to view previous Compendium articles and screenshots Luxury View: Tribeca’s Isabella Rossellini Annex Hotel Plenty of smart people are getting honeymoon--budget married in Tribeca. The Isabella Rossellini Annex Hotel, a hyper-luxe outpost in a former warehouse designed by Michael Maltzan, is one example. The lobby, with its translucent copper frames, keeps the clean-lined modern minimalism going. But there’s a smart, domestic curation inside. “We have the most eclectic design,” says co-owner and designer Eric Font. “We’re also influenced by art and fashion.” Another amazing detail: Framing the screen, lights from Alexander McQueen. Nessa Austin Luxury View: The Expected and the Unexpected Strolling through the Tribeca neighborhood feels like walking in someone else’s world. Al Pacino as the would-be bank robber Sonny in Sydney Lumet’s “Dog Day Afternoon” (1975).
Figure 18: A machine-generated article that was classified correctly as such by DIDAN.
Driven to Accumulate and Dancing Till Nothing’s Left
While those at court leaned over at the stalls, snoozing like other patrons, here at last was part of that magic I expected from a fairground. The rhythmic scramble of people shuffling and jitterbugging, pretending not to notice each other, obscured the petty bickering that had been for weeks consuming their lives. There were not a few smiling faces, but the laughter that drowned out the voices of anyone who tried to hold the audience’s attention behind barriers was total. Even the Head Monster looked refreshed as he bellowed and danced. In this nowhere town, half a world away from Heidelberg, the judges of the circus were at work. My screams of thrill and satisfaction were promptly drowned out by the sharp, high, loudly grinding noises of the Mongolian horses. The show had started more than an hour earlier. People were in their seats, awaiting to dance, when the head clock came on. Danse: A French-American Festival of Performance and Ideas Ashley Chen, above, dresses and undresses in a dance at the Club at La MaMa.
Figure 19: A machine-generated article that was classified correctly as such by DIDAN.
United and Continental Said to Agree to Merge
The global airline industry is about to get a little bigger — United and Continental announced they are planning to merge, creating a company with more than $100 billion in annual revenue. The new company will be called United Continental Holdings Inc. The news was reported by CNBC, which cited anonymous sources. In a statement, both airlines said they were in the process of finalizing a deal, but declined to provide further details. Both companies confirmed the talks on their Twitter accounts. Rumors about a merger first surfaced in March, when United Continental announced it would lay off around 100 people who were working on its revenue management system. United spokesman Charlie Hobart told The Associated Press that the airline was implementing new IT systems for its key ticketing and revenue departments, and that the layoffs were not a result of a merger. Kiosks for Continental Airlines next to a United Airlines check-in area at O’Hare International Airport. The airlines announced an all-stock merger on Monday.
Figure 20: A machine-generated article that was incorrectly classified as human-generated by DIDAN.
Executive at Monsanto Wins Global Food Honor
In 2015, the global executive dean of the ILR School of Management, Emeritus Professor Jack I. Eskenazi, made a declaration. In a speech at the Expert Economic Summit in Nice, he said: “In the world of politics, NGOs, labor and community, ultimately, we look to business for leadership and partnership.” On Thursday, April 18, the ILR School of Management at New York University, at Rockefeller University, announced that Prof. Eskenazi, who served as its executive dean from 1997 to 2006, has been awarded the Wolf Foundation for World Food Policy’s Global Food Prize. At the luncheon, Eskenazi talked with Aviva Aronow Friedman, the dean of the Wolf Foundation. How would you respond to someone who would argue that you live in an academic vacuum in order to look to business for leadership? I think one shouldn’t take myself too seriously. All of us live and work in very different settings. Robert Fraley, who is Monsanto’s chief technology officer.
Machine-Generated Score: 0.231