Who Blames Whom in a Crisis? Detecting Blame Ties from News Articles Using Neural Networks
Shuailong Liang, Olivia Nicol, Yue Zhang
Singapore University of Technology and Design
8 Somapah Road, Singapore 487372
shuailong [email protected], {olivia nicol, yue zhang}@sutd.edu.sg

Abstract
Blame games tend to follow major disruptions, be they financial crises, natural disasters or terrorist attacks. Studying how the blame game evolves and shapes the dominant crisis narratives is of great significance, as sense-making processes can affect regulatory outcomes, social hierarchies, and cultural norms. However, it takes tremendous time and effort for social scientists to manually examine each relevant news article and extract the blame ties (A blames B). In this study, we define a new task, Blame Tie Extraction, and construct a new dataset related to the United States financial crisis (2007-2010) from The New York Times, The Wall Street Journal and USA Today. We build a Bi-directional Long Short-Term Memory (BiLSTM) network over the contexts in which the entities appear, and it learns to automatically extract such blame ties at the document level. Leveraging large unsupervised models such as GloVe and ELMo, our best model achieves an F1 score of 70% on the test set for blame tie extraction, making it a useful tool for social scientists to extract blame ties more efficiently.
Introduction
Blame is an issue that has been receiving increasing attention in the social sciences in recent years (Alicke 2000; Hobolt and Tilley 2014; Hood 2010; Shaver 2012). In particular, more attention has been placed on blame dynamics following major disruptions, such as natural disasters (Malhotra and Kuo 2008), financial crises (Nicol 2016; Tourish and Hargie 2012), and terrorist attacks (Olmeda 2008). Studying blame is of great significance as sense-making processes inform what and whom a society values, and ultimately shape lawmaking. For instance, the intense blame targeting Wall Street during the financial crisis (2007-2010) helped lawmakers pass the July 2010 Dodd-Frank Wall Street Reform and Consumer Protection Act.

Although the problem is important, it takes tremendous time and effort for social scientists to manually examine each relevant news article and extract the blame ties (A blames B). In Figure 1, for example, the tuple (John B. Taylor, Fed) is extracted as a blame tie. Recently, deep neural networks have proved very powerful at solving many social science problems (Li and Hovy 2014; Rule, Cointet, and Bearman 2015; Bail 2016). Based on a dataset annotated from several news media excerpts on blame, we investigate automatic ways to extract blame ties from news articles.
Figure 1: An example sentence from our dataset containing a blame tie. The red/bold words are entities involved in a blame tie, and the blue/italic words are supporting evidence that the blame tie exists.

There are three main challenges. First, some patterns only have blame meanings in specific contexts. For instance, in "It was lenders that made the lenient loans, it was home buyers who sought out easy mortgages, and it was Wall Street underwriters that turned them into securities." (The Wall Street Journal, Aug 2007), only with the background of the financial crisis can we identify that the blamed targets are lenders, home buyers and Wall Street. Second, there are many ways to attribute blame and the structure of the sentences can be quite complex. Third, it is common for journalists to use metaphors and ironies to designate actors.

We design several neural models to address the problem. First, we leverage a neural network to learn prior information about entities for blame tie extraction. In particular, a neural network is used to learn dense vector representations of entities so that similar entities can be visualized close to each other in the embedding space, and the likelihood of one entity blaming another can be inferred automatically without further knowledge. Second, we build a BiLSTM neural network to represent contexts, which can be used to predict blame ties between entities mentioned in the news articles using linguistic clues. Finally, a model that integrates entity knowledge and linguistic knowledge is constructed by integrating the two respective networks.

We conduct a case study on blame games for the U.S. financial crisis (2007-2010), the most important event since the Great Depression, which led to at least $6 trillion in losses (Luttrell et al. 2013). Results show that our model can effectively learn both entity knowledge and linguistic clues for blame ties. For example, it can successfully extract entity relations with regard to blame ties from the crisis, such as the fact that both Wall Street and McCain tend to blame the same targets (Obama and Bernanke). In addition, the model can generalize to new cases of extracting blame patterns automatically. Our implementation and trained models are released at https://github.com/Shuailong/BlamePipeline.

... Ordinarily, [Americans] welcome lower interest rates. But many feel differently this time. Some think the economy is fine and inflation is the main danger. But a moral element is also at work: Many think a rate cut would reward foolish speculation and [Wall Street] greed at the expense of the thrifty (1).
"The [Federal Reserve] needs to stand its ground and not bail out hedge funds – they should have known better to begin with!" [Suzanne Mitchell], an administrative assistant at a Houston real-estate company, says in an email (2). In an interview, she adds: "I'm very sorry that [people] took out $450,000 mortgages with no money down ... [people] ought to be responsible for the loans they take out." (3) ...
But Mr. [Brason] contrasts that with the far greater reliance on borrowed money that is typical nowadays. Some "of us ... weren't buying up five or 10 properties without any money down," he says. "[People] took the risks and should pay the price. (4) A lot of others at the higher end of the food chain, the [investment bankers] and [hedge-fund managers], were making oodles of fee-income money and frankly, there's a lot of public opinion that it was excessive. (5)" ...

Blame Source             Blame Target               Causality Link
Americans (e1)           Wall Street (e2)           (1)
Suzanne Mitchell (e3)    Federal Reserve (e4)       (2)
Suzanne Mitchell (e3)    people (e5)                (3)
Brason (e6)              people (e5)                (4)
Brason (e6)              investment bankers (e7)    (5)
Brason (e6)              hedge-fund managers (e8)   (5)

Table 1: An article titled Rate Cut Has Foes on Main Street (The Wall Street Journal, September 2007). Top: paragraphs of the article containing several blame patterns. The entities are in brackets. Bottom: blame ties extracted from the article.

source \ target            e1  e2  e3  e4  e5  e6  e7  e8
e1 (Americans)              -   1   0   0   0   0   0   0
e2 (Wall Street)            0   -   0   0   0   0   0   0
e3 (Suzanne Mitchell)       0   0   -   1   1   0   0   0
e4 (Federal Reserve)        0   0   0   -   0   0   0   0
e5 (people)                 0   0   0   0   -   0   0   0
e6 (Brason)                 0   0   0   0   1   -   1   1
e7 (investment bankers)     0   0   0   0   0   0   -   0
e8 (hedge-fund managers)    0   0   0   0   0   0   0   -

Table 2: Matrix representation of the blame ties in Table 1.
Related Work
NLP has become increasingly popular in the social science area. O'Connor et al. [2010] aligned sentiment measured from Twitter with public opinion measured from polls, and found that the two correlate well. Bamman and Smith [2015] used text data to estimate the political ideologies of individuals. Mohammad et al. [2016] created SemEval 2016 Task 6, the Stance Detection Task, which detects a Twitter user's stance towards a target of interest. Preoţiuc-Pietro et al. [2017] also predicted the political ideologies of Twitter users, in a more fine-grained form. Social scientists have used NLP along with network analysis to analyze social media texts (Rule, Cointet, and Bearman 2015) and the State of the Union addresses in the United States (Bail 2016).

The Blame Tie Extraction task can be regarded as a special case of relation extraction (Miwa and Bansal 2016). Relation extraction solves the task of classifying a pair of entities into one of several pre-defined categories, such as Cause-Effect and Component-Whole (Hendrickx et al. 2009), while the Blame Tie Extraction task requires extracting all the blame ties among the entities of interest in an article. Our work differs from existing work on relation extraction in two main aspects. First, our work is at the document level and the data is sparse, while most existing work on relation extraction focuses on the sentence level (Nguyen and Grishman 2015). Second, our work explicitly uses entity prior information on blame patterns, which does not make sense in general-domain relation extraction; most existing work mixes entity and content information in modeling relations.

In blame game research, social scientists care more about a few key players instead of all the entities, and most entities in a passage are irrelevant for studying the blame game (Nicol 2016). Therefore, in this paper, we assume that the entities of interest are already given, and we only need to extract blame ties among these entities.

To the best of our knowledge, our work is the first to study the Blame Tie Extraction task using NLP techniques.
Dataset
We manually create a dataset on the U.S. financial crisis. The dataset is drawn from three newspapers in the U.S.: The New York Times, The Wall Street Journal and USA Today, chosen for three main reasons. First, they are the most widely circulated newspapers in the United States. Second, they cover the social spectrum from elite to mass. Third, they also cover the political spectrum: from the quite liberal New York Times, to the centrist USA Today, to the conservative Wall Street Journal (Gentzkow and Shapiro 2010; Groseclose and Milyo 2005). The time period studied here spans from August 2007, after the first warning signs for the crisis appeared, to June 2010, right before the signature of the largest set of financial regulations since the Great Depression.

              USA   NYT   WSJ
days          310   736   648
articles      132   429   438
blame ties    353   787   754

Table 3: Dataset size for the three newspapers. USA: USA Today. NYT: The New York Times. WSJ: The Wall Street Journal.

We use a set of keywords to filter the articles gathered from Factiva and LexisNexis, getting articles containing blame patterns for the crisis. The keywords we use are blame-related (attack, accuse, misconduct, ...) or event-related (financial crisis, global recession, housing bubble, ...). There are in total 70 blame-related keywords and 13 event-related keywords (stem form). The two classes of keywords are combined to filter the articles. The full list of keywords is released along with the code.

Blame incidences are manually coded for each article. The identification of a blame pattern requires the presence of 1) a blame source, 2) a blame target, and 3) a causality link. An example of a sentence containing a blame instance would be as follows: "Sen. Richard Shelby, R-Ala. [Blame Source] ... said the Fed [Blame Target] 'kept interest rates too low for too long, encouraging a housing bubble and excessive risk taking' [Causality Link]." (USA Today, December 2009). The blame source and target form a blame tie. Table 1 gives an example article and its annotations, and Table 3 shows the statistics of the dataset with the number of blame ties.

To ensure the reliability of the dataset, we ask two more annotators to annotate a subset of the dataset. Specifically, we sample 100 articles, including 13 articles from USA Today, 43 articles from The New York Times, and 44 articles from The Wall Street Journal, in proportion to the respective number of articles of the three newspapers in the whole dataset. Then we run evaluations using the two annotators' results against the gold data. The average F1 score of the two annotators is 94.425%, and the Fleiss's kappa is 0.8744, which illustrates the strong inter-annotator agreement of the dataset.

In the training process, the annotated blame ties serve as positive samples. The negative samples are generated by removing the positive entity pairs from all possible permutations of the entities of interest; a minimal sketch of this procedure is given below. The sample statistics for the whole dataset are shown in Table 4. Theoretically, the number of negative samples increases quadratically with the number of entities in the article. In our dataset, the average number of entities we consider per article is 3, therefore the dataset is rather balanced.

number of articles                   998
number of samples                    8562
number of entities/article           2.97
average neg/pos ratio per article    2.19
total neg/pos ratio                  3.61

Table 4: Sample statistics.
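The following is a minimal sketch of how the per-article samples described above can be constructed, assuming each article provides its annotated entities of interest and gold blame ties; the function and variable names are ours and not taken from the released code.

from itertools import permutations

def build_samples(entities, blame_ties):
    """Build labeled (source, target) pairs for one article.
    `entities` is the list of annotated entities of interest and `blame_ties`
    is the set of gold (source, target) tuples; every other directed pair
    becomes a negative sample."""
    samples = []
    for src, tgt in permutations(entities, 2):  # |e| * (|e| - 1) directed pairs
        label = 1 if (src, tgt) in blame_ties else 0
        samples.append((src, tgt, label))
    return samples

# Toy usage with three of the entities from Table 1:
entities = ["Suzanne Mitchell", "Federal Reserve", "people"]
ties = {("Suzanne Mitchell", "Federal Reserve"), ("Suzanne Mitchell", "people")}
print(build_samples(entities, ties))  # 2 positive and 4 negative samples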
Task
We formulate the Blame Tie Extraction task as follows. Given a news article d and a set of entities e, we have |e| · (|e| − 1) possible directed links among them. We assign label 1 to a pair (s, t) when entity s blames entity t based on article d; otherwise we assign label 0 to the pair. We can use a matrix for a more intuitive illustration: for the example in Table 1, the constructed matrix is shown in Table 2. For a given entity pair (s, t) with label l, we would like to maximize the likelihood

L = P(l | s, t, d), l ∈ {0, 1}.

In order to predict whether a blame tie exists between two entities based on the article, we have two sources of information to utilize. One is the entities themselves. For instance, we know that Democrats tend to blame Republicans so as to weaken their political opponents, and tend to blame Wall Street so as to gain popular support to impose a stringent set of financial regulations. The other is the contexts in which the entities are mentioned. We rely on linguistic patterns of sentences to extract the blame ties. For instance, in the sentence "Who is to blame? Hedge funds, for one, he says." (The Wall Street Journal, Sept 2007), the linguistic structure indicates that the entity appearing after the question is the blame target entity, and the narrator is the blame source entity.
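One common way to instantiate this objective is to treat each directed pair as an independent binary classification and maximize the log-likelihood, i.e., minimize a binary cross-entropy loss over the pair scores produced by the models below. The paper does not spell out the exact loss, so the following PyTorch fragment is only an assumed, illustrative reading of the objective.

import torch
import torch.nn.functional as F

def pair_loss(f_score: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood of P(l | s, t, d) under a Bernoulli model,
    where `f_score` is the unnormalized score a model assigns to the
    (source, target) pair and `label` is 1 if the blame tie exists, else 0."""
    return F.binary_cross_entropy_with_logits(f_score, label.float())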
Models
We first introduce a simple rule-based model. Next, we introduce three neural models. The Entity Prior Model directly extracts prior information about entities (i.e., which entities tend to blame which entities); it can learn entity information such as political standing from blame patterns, and it can generalize to new events involving the same entities. The Context Model instead masks out the entity mentions from the text, relying purely on the contexts surrounding the mentions of the entities; it can generalize to new entities and across historical periods. Finally, a model combining the two is built to investigate their interactions.
Rule-based Model
As a baseline model, we use a simple rule to decide if a blame tie exists between two entities: if the minimal sentence distance between the two entities is less than or equal to d, AND a blame-related keyword appears in any of the sentences mentioning either of the two entities, we determine that a blame tie exists between the two entities. According to the distribution of the minimal distance between two entities in a blame tie in our dataset, over 90% of the blame entity pairs have a minimal sentence distance less than or equal to 3. Therefore, we set d = 3.

To determine the direction (A blames B or B blames A), we define the aggressiveness of an entity, which is the percentage of the blame ties related to this entity in the training data in which the entity is the blame source. The entity with higher aggressiveness is the blame source entity, and the other one is the blame target. When there is a tie, we use a random guess. When an entity is unknown, we use 0.5 as its aggressiveness score. A sketch of this rule is given below.
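A minimal sketch of this baseline, assuming the sentence distance, keyword flag, and aggressiveness statistics have been computed elsewhere; all function and variable names are ours.

import random

def rule_based_predict(ent_a, ent_b, sent_dist, has_blame_keyword,
                       aggressiveness, d=3):
    """Predict a blame tie between ent_a and ent_b.
    sent_dist: minimal sentence distance between mentions of the two entities
    has_blame_keyword: True if a blame-related keyword occurs in any sentence
                       mentioning either entity
    aggressiveness: dict mapping entity -> fraction of its training blame ties
                    in which it is the source (unseen entities default to 0.5)
    Returns (source, target) if a tie is predicted, otherwise None."""
    if sent_dist > d or not has_blame_keyword:
        return None
    a = aggressiveness.get(ent_a, 0.5)
    b = aggressiveness.get(ent_b, 0.5)
    if a > b:
        return (ent_a, ent_b)
    if b > a:
        return (ent_b, ent_a)
    # Equal aggressiveness: fall back to a random guess for the direction.
    return (ent_a, ent_b) if random.random() < 0.5 else (ent_b, ent_a)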
Entity Prior Model

We use a fully connected feedforward neural network (FCN) to collect entity prior knowledge, namely who is likely to blame whom without additional information. To elaborate, we represent entities by their embeddings, concatenate the embeddings of the blame source and target entity, and then stack a fully connected layer to learn the interactions between the entities. The FCN outputs the probability of a blame tie between the source entity and the target entity:

f_score = [E_e(e_s); E_e(e_t)] · W_e + b_e

where e_s and e_t represent the source and target entity index, respectively, E_e is the embedding matrix for entities, ';' is the concatenation operator, and W_e and b_e are parameters. Specifically, E_e ∈ R^{n×m}, where n = 707 is the number of entities in the training set and m is the entity embedding dimension, which is tuned as a hyperparameter; W_e ∈ R^{2m×1} and b_e ∈ R. E_e is initialized from the standard normal distribution. At test time, we use a special ⟨UNK ENT⟩ embedding to represent entities unseen in the training set.

Like word embeddings, we hope to learn meaningful representations of the entities, i.e., entities sharing similar blame behavior patterns will stay close in the embedding space.
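A minimal PyTorch sketch of this scoring function; the class name and the exact layer wiring are our assumptions, not the released implementation.

import torch
import torch.nn as nn

class EntityPriorModel(nn.Module):
    """Score a (source, target) pair from entity embeddings alone."""
    def __init__(self, n_entities, emb_dim):
        super().__init__()
        # E_e: one row per known entity (plus an <UNK ENT> row for test time),
        # initialized from a standard normal distribution.
        self.embed = nn.Embedding(n_entities, emb_dim)
        nn.init.normal_(self.embed.weight)
        # W_e, b_e: linear layer over the concatenated pair embedding.
        self.score = nn.Linear(2 * emb_dim, 1)

    def forward(self, src_idx, tgt_idx):
        pair = torch.cat([self.embed(src_idx), self.embed(tgt_idx)], dim=-1)
        return self.score(pair).squeeze(-1)  # f_score for each pair in the batch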
Context Model
The Entity Prior Model tells how likely a particular entity is to blame or be blamed without further information. In contrast, the Context Model relies only on linguistic clues from news articles, finding blame patterns explicitly or implicitly mentioned. The Context Model is thus useful across different political settings where the entities are different.

To model context information about an entity, we first locate the positions of all occurrences of the entity in the article. A position is represented by a tuple (i, j), where i and j denote the sentence number and word number; the positions of the blame source and blame target are denoted as pos_s and pos_t, respectively. Second, we replace each entity mention by a special ⟨ENT⟩ token, so that we do not have any information regarding the entity itself. Third, we run a bidirectional LSTM on the sentences containing the entity and use the LSTM output to represent the context of each word:

h_{ij} = LSTM(E_w(w_{ij}))

where w_{ij} is the j-th word of the i-th sentence, E_w is the embedding matrix for words, and h_{ij} is the concatenation of the LSTM outputs of the last layers from both directions. If (i, j) ∈ pos_s or (i, j) ∈ pos_t, h_{ij} is used to represent the context of entity s or t.

Since an entity may appear at multiple positions in one article, we may have multiple representations of the entity context. Pooling is used to reduce these representations into one single vector. The pooling results of the context representations of the source and target entity are denoted as V_s and V_t, respectively:

V_e = pool_{(i,j) ∈ pos_e}(h_{ij}), e ∈ {s, t}

where pool denotes the pooling function, for which we try random selection, mean, max and attention. For attention pooling, we use a two-layer feedforward neural network to score each vector representation, use the softmax function to normalize the scores into weights, and use the weighted sum of the representations as the entity representation. Once we obtain representations of both the source and target entity, we concatenate them and use a fully connected layer to learn a score for this entity pair:

f_score = [V_s; V_t] · W_v + b_v

where W_v ∈ R^{2H×1} and b_v ∈ R are both parameters, and H is the dimension of the BiLSTM outputs. The model architecture is depicted in Figure 2.

Figure 2: Context Model. An LSTM is used to encode the sentences, and the hidden vectors at the positions of the entity are pooled into a single vector representing the context of the entity. The source entity context vector and the target entity context vector are concatenated and sent to the prediction layer, which predicts 1 if the source-to-target blame tie exists and 0 otherwise.
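A minimal PyTorch sketch of the Context Model with max pooling (the configuration reported as best in the development experiments below); the dimensions, masking details, and names are our assumptions. `src_mask` and `tgt_mask` are boolean masks marking the ⟨ENT⟩ positions of the source and target entity in the flattened word sequence.

import torch
import torch.nn as nn

class ContextModel(nn.Module):
    """Score a (source, target) pair from the contexts of its mentions only."""
    def __init__(self, vocab_size, word_dim=100, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)          # E_w
        self.lstm = nn.LSTM(word_dim, hidden_dim, bidirectional=True,
                            batch_first=True)
        self.score = nn.Linear(4 * hidden_dim, 1)                # W_v over [V_s; V_t]

    @staticmethod
    def max_pool(h, mask):
        # Max-pool the BiLSTM states at the entity's mention positions;
        # assumes every example has at least one marked position.
        return h.masked_fill(~mask.unsqueeze(-1), float("-inf")).max(dim=1).values

    def forward(self, words, src_mask, tgt_mask):
        h, _ = self.lstm(self.embed(words))      # h_ij for every word position
        v_s = self.max_pool(h, src_mask)         # V_s
        v_t = self.max_pool(h, tgt_mask)         # V_t
        return self.score(torch.cat([v_s, v_t], dim=-1)).squeeze(-1)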
Combined Model

To incorporate the prior information about entities, we concatenate two additional vectors on top of the Context Model representation: the embeddings of the blame source entity E_e(e_s) and the blame target entity E_e(e_t) from the Entity Prior Model:

f_score = [E_e(e_s); V_s; E_e(e_t); V_t] · W_c + b_c

where W_c ∈ R^{(2H+2m)×1} and b_c ∈ R are both parameters. We expect the model to simultaneously learn how to extract blame ties from context, and also to learn representations of the entities themselves as a byproduct.
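A minimal sketch of the combined scoring step, reusing the pieces sketched above; the wiring and dimensions are our assumptions.

import torch
import torch.nn as nn

class CombinedModel(nn.Module):
    """Score a pair from both entity embeddings and mention contexts."""
    def __init__(self, n_entities, ent_dim, vocab_size, word_dim=100, hidden_dim=100):
        super().__init__()
        self.ent_embed = nn.Embedding(n_entities, ent_dim)       # E_e
        self.word_embed = nn.Embedding(vocab_size, word_dim)     # E_w
        self.lstm = nn.LSTM(word_dim, hidden_dim, bidirectional=True,
                            batch_first=True)
        self.score = nn.Linear(2 * ent_dim + 4 * hidden_dim, 1)  # W_c, b_c

    def forward(self, words, src_mask, tgt_mask, src_idx, tgt_idx):
        h, _ = self.lstm(self.word_embed(words))
        v_s = h.masked_fill(~src_mask.unsqueeze(-1), float("-inf")).max(dim=1).values
        v_t = h.masked_fill(~tgt_mask.unsqueeze(-1), float("-inf")).max(dim=1).values
        pair = torch.cat([self.ent_embed(src_idx), v_s,
                          self.ent_embed(tgt_idx), v_t], dim=-1)
        return self.score(pair).squeeze(-1)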
Experiments

We conduct experiments on our dataset using the Entity Prior Model, the Context Model, and the Combined Model, and compare the performance of the different models. To find the best model architecture, we also conduct development experiments to investigate the effects of different pooling functions and pretrained word embeddings.
Experimental Settings
Stanford CoreNLP (Manning et al. 2014) is used to tokenize articles into sentences and words. The sentence and word statistics of the dataset articles are shown in Table 5. There is no further preprocessing except that words are converted to lower case. The dataset is split into train, dev, and test sets at the document level with a ratio of 8:1:1. We use F1 on the positive class to measure the performance of the models.

                     min    avg     max
sentences per doc      4     45     384
words per sentence     1     26     159
words per doc        133  1,209   9,064

Table 5: Sentence and word statistics.

To investigate whether the learned entity representations are helpful for blame tie extraction, we conduct another round of evaluations on the known entities of the dev and test sets, as shown in Table 8. KNOWN denotes entity pairs in which both entities appear in the training data, while ALL denotes all entity pairs. By comparing the model performance on KNOWN with that on ALL, we can evaluate the usefulness of the entity representations. Conversely, we can also evaluate the robustness of the models against unknown entities.

Leveraging large unsupervised data has proved to be helpful for many NLP tasks, especially for tasks with small datasets. Word2Vec (Mikolov et al. 2013) or GloVe (Pennington, Socher, and Manning 2014) can be used to pretrain word embeddings on larger external datasets. ELMo (Peters et al. 2018) word vectors are internal states of a deep bi-directional language model, and can effectively capture the syntax and semantics of words. We conduct experiments with GloVe and ELMo, and investigate the effects of these pretrained word embeddings.

Our models are implemented in PyTorch (http://pytorch.org). During training, Adam (Kingma and Ba 2014) is used as the optimizer, and the default learning rate is adopted. We use a mini-batch size of 50 for all three models. Dropout (Hinton et al. 2012) is applied to word embeddings and RNN outputs to prevent overfitting. We set gradient clipping to 3 to stabilize the training process. The maximum number of epochs is 30 and we use early stopping with a patience of 10 epochs. For the model hyperparameters, the word embedding size is 100, while the LSTM hidden size for each direction and the entity embedding size are tuned as hyperparameters.
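A sketch of the training loop implied by these settings; the loss function, the batch iterators, and the `evaluate_f1` helper (positive-class F1 on the dev set) are assumptions of ours rather than details given in the paper.

import copy
import torch
import torch.nn.functional as F

def train(model, train_batches, dev_batches, max_epochs=30, patience=10, clip=3.0):
    """Adam with its default learning rate, gradient clipping at 3, and early
    stopping on dev F1 with a patience of 10 epochs (at most 30 epochs)."""
    optimizer = torch.optim.Adam(model.parameters())
    best_f1, best_state, bad_epochs = 0.0, None, 0
    for _ in range(max_epochs):
        model.train()
        for inputs, labels in train_batches:          # mini-batches of size 50
            optimizer.zero_grad()
            scores = model(*inputs)
            loss = F.binary_cross_entropy_with_logits(scores, labels.float())
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
            optimizer.step()
        f1 = evaluate_f1(model, dev_batches)          # assumed helper, defined elsewhere
        if f1 > best_f1:
            best_f1, best_state, bad_epochs = f1, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return best_f1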
Development Experiments

Before turning to the BiLSTM model, we use a small window before and after the entity to extract the context information. Formally, we assume that an entity appears only once in a sentence. Given an entity e appearing in sentence s, if the index of e is i and the window size is w, we use the concatenation of the embeddings of words s_{i−w} ... s_{i−1} and s_{i+1} ... s_{i+w} as the context of e. We call this model Bag of Embeddings (BoE). We try window sizes of 3 and 6 and report the higher dev result. If an entity appears multiple times in an article, we randomly select one representation (BoERand).

As stated in the Context Model section, we compare the different pooling methods used to aggregate multiple context representations, and a bi-directional LSTM against a single forward LSTM. The results on the ALL data are shown in Table 6. BiLSTM works better than LSTM for every pooling method since it can take advantage of information from both directions of the sentence. Random pooling has the worst result, mainly because a lot of important information is lost. Max pooling has the highest score in most cases, therefore we use BiLSTM + max pooling in the following experiments. In comparison to the BoE model, the forward-only LSTM models are worse; the reason may be that the BoE model considers contexts from both sides. BiLSTM models perform better than the BoE model since they can model more context and preserve word order information.

Model       dev F1    Model          dev F1
BoERand      54.60
LSTMRand     50.51    BiLSTMRand      56.43
LSTMMean     50.15    BiLSTMMean      61.95
LSTMMax      51.49    BiLSTMMax       62.26
LSTMAttn     50.17    BiLSTMAttn      61.92

Table 6: Experiment results of the Context Model using different pooling functions.

To investigate the effect of pretrained word embeddings, we initialize the word embedding parameters with random initialization, GloVe pretrained word embeddings, and the ELMo model. Since the official release of the pretrained ELMo model has an output dimension of 1024, we apply a linear transformation to reduce the output dimension to 100; the transformation matrix is part of the model parameters and is tuned during training. For GloVe, we use fixed and tuned versions of the embeddings. For the ELMo model, since it slows down training significantly, we do not tune its parameters. The results using BiLSTM + max pooling are shown in Table 7. Fixed GloVe vectors improve the F1 score on the dev and test sets by 0.83% and 4.99%, respectively. The tuned version of GloVe does not improve as much, due to the fact that the dataset is small and too many parameters cause overfitting. The pretrained ELMo model improves the F1 on the dev and test sets by 10.90% and 10.24%, respectively, compared with random initialization, proving the power of the ELMo model.

Model         dev F1   test F1
random         62.26     56.11
GloVe fixed    63.09     62.10
GloVe tuned    61.37     57.75
ELMo           73.16     66.35

Table 7: Experiment results of the Context Model using different pretrained word vectors.
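The ELMo dimensionality reduction described above can be sketched as a single trainable projection on top of the frozen ELMo vectors; whether a bias term is used is not specified, so this is only an assumption.

import torch.nn as nn

# Frozen 1024-d ELMo vectors are projected to the 100-d inputs expected by the
# BiLSTM; only this projection is updated during training.
elmo_projection = nn.Linear(1024, 100, bias=False)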
Results
Table 8 details the final results of the Entity Model, Context Model, and Combined Model. For comparison, we also include a random guessing baseline and the rule-based model. For the Context Model, we use BiLSTM + max pooling and use the ELMo model to obtain word representations.

Our rule-based model outperforms random guessing by a large margin. Surprisingly, it achieves a higher result than the Entity Model on the ALL data. The reason is that the Entity Model cannot generalize to new entities and therefore performs badly on the ALL data, which contains many unknown entities. On the KNOWN data, however, the Entity Model performs better than the rule-based model.
For the Entity Model, the performance on KNOWN entities is better than that on ALL entities. This result illustrates that the model learns prior information about the entities in the training set. From a visualization of the entity embeddings using t-SNE (Maaten and Hinton 2008), we find that entities with similar political backgrounds tend to be close to each other in the embedding space. For example, Wall Street and McCain are close to each other, which is intuitive since they were both blamed by Obama, and they both blamed Obama and Bernanke, according to the training data.
Model          KNOWN dev F1  KNOWN test F1  ALL dev F1  ALL test F1
random guess       38.81         38.04        37.39       32.96
rule-based         69.14         58.97        70.45       61.54
entity             73.97           -            -           -
context              -           63.11          -         66.35
combined             -             -            -           -

Table 8: Experiment results of the baseline models and the three proposed models on KNOWN data and ALL data.
              CTX Wrong   CTX Correct
CMB Wrong
CMB Correct

Table 9: Test samples broken down by whether the Context Model (CTX) and the Combined Model (CMB) predict them correctly.
For the Context Model, we can see that it can effectively extract blame ties from news articles without prior entity knowledge. The F1 on the KNOWN test set is 63.11%; on the ALL test set, the F1 does not decrease as it does for the Entity Prior Model, and even increases to 66.35%. Unlike knowledge about entities, linguistic knowledge generalizes robustly to unseen test data, where most entities do not exist in the training data. This implies that our model can be used to extract blame ties on other occasions where the political settings are highly different. For example, when the President of the United States changes, we can still use our model to predict how the new President will play the blame game.
For the KNOWN entities, the Combined Model does not perform better than the Entity Model. This shows that when the entities are known, entity information alone may be more useful for extracting blame ties than contexts. However, such information cannot be used when the entities are new, as in our ALL data. On the ALL data, the Combined Model achieves the best results on the dev and test sets, showing that the model can integrate context information as well as entity prior information to make better predictions.

Therefore, the Context Model is the most robust one, applicable to extracting blame ties among new entities. Entity prior information is helpful: if available, it can be used to boost model performance.
Analysis
To verify our hypothesis about why the Combined Model is better than the Context Model, we take several examples on which the Combined Model succeeds while the Context Model fails, and vice versa. To evaluate the practicality and generalization of the model, we use our trained model to extract blame ties from several recent news articles.

Trump Accuses Russia of Helping North Korea Evade Sanctions (1)

President Donald Trump accused Russia in unusually harsh terms of helping North Korea evade United Nations sanctions intended to press the country to give up its nuclear and ballistic missile programs. (2) "Russia is not helping us at all with North Korea," Trump said in an interview with Reuters on Wednesday (3). "What China is helping us with, Russia is denting. In other words, Russia is making up for some of what China is doing."

Trump has leaned on China to curb its support for North Korean leader Kim Jong Un's regime, and in exchange has so far laid off the punishing trade measures he promised against the U.S.'s largest creditor during his campaign. North Korea's weapons programs are Trump's most urgent foreign crisis (4). He has vowed not to allow the country to develop a missile capable of carrying a nuclear warhead to the U.S. mainland, threatening war to prevent it if necessary. But Kim has plunged ahead, and his government made rapid advances with both its missile and nuclear technology after Trump took office.

Trump's criticism of Russia (5) is striking because members of Congress have said in the past that he was too reluctant to criticize Russia's foreign policy and too eager to establish good relations with President Vladimir Putin.

Major Entities Mentioned: Trump (e1), Russia (e2), North Korea (e3)

Table 10: An article from Bloomberg published on January 18, 2018. Top: paragraphs of the article containing blame patterns. The blame entities are in bold face. Bottom: major entities appearing in the article.

source \ target    e1 (Trump)   e2 (Russia)   e3 (North Korea)
e1 (Trump)              -           0.82            0.91
e2 (Russia)
e3 (North Korea)

Table 11: Blame tie probabilities predicted by the Context Model for the major entities in Table 10.

Case Study
To analyze the fine-grained differences between the behaviors of the Context Model and the Combined Model, we evaluate the two models on samples of the test data, and divide the samples into four classes, as shown in Table 9.

Class I samples are those on which the Context Model fails while the Combined Model works correctly. These samples usually involve entities that appear frequently in the training set. For instance, we want to figure out whether Obama blames Republicans based on the article titled Obama Issues Sharp Call for Reforms on Wall Street (The New York Times, April 2010). From sentences such as "The president and his allies have eagerly portrayed Republicans as handmaidens of Wall Street... Obama avoided incendiary language attacking Republicans..." we can see that the blame tie (Obama, Republicans) holds. The Context Model predicts that the blame tie exists with a confidence score of 0.45, while the Combined Model gives a score of 0.97. The reason may be that the Combined Model learns prior information about Obama and Republicans: Obama blamed Republicans before in the training data, which makes sense since he is a Democrat.

Class II samples are those on which the Combined Model fails while the Context Model works correctly. In the USA Today article Obama tells Wall Street to join in (April 2010), the context is "...The president said the financial crisis, which has cost more than 8 million jobs so far, was 'born of a failure of responsibility from Wall Street all the way to Washington.'..." The Context Model predicts that Washington does not blame Wall Street, at a confidence level of 0.62, which is true. However, the Combined Model mistakenly predicts a probability of 0.88 that the blame tie exists. This is because the Combined Model overly relies on entity information. Nevertheless, as Table 9 shows, this class of samples is much smaller than class I.
Generalization to New Cases
To validate the generalization of our model, we conduct further analysis on news articles beyond the time frame of the financial crisis. Since most entities in this new data do not appear in our financial crisis dataset, we use the Context Model for the generalizability test. In particular, we manually annotate 13 recent articles containing 14 blame ties from Google News, mostly from January 2018, and use our pretrained Context Model to extract blame ties from the articles. The F1 on individual blame ties is 72.00% on this new test data, and 8 out of 13 articles are labeled correctly. The result is consistent with the result on our financial crisis test set. This further demonstrates that the linguistic patterns for blame generalize to new scenarios. The articles on which the model fails mainly contain blame patterns that have not been seen in our financial crisis dataset.

Table 10 shows one of these new articles, from Bloomberg Politics. In this news article, the United States President accused Russia of helping North Korea evade United Nations sanctions. The blame tie is reflected in sentences (1), (2), (3) and (5). From (4) we can infer that Trump has a negative attitude towards North Korea because of its nuclear weapons. The results of the Context Model are shown in Table 11. The model successfully identifies the blame ties from Trump to Russia and from Trump to North Korea, with high confidence scores, while the probabilities for the other entity pairs are low.
Conclusion
We investigated blame analysis, for which previous research looked at the evolution of blame frames without systematically connecting the frames to the actors who produce them. Experiments show that neural networks are effective for identifying blame ties from news articles. Our approach can enable researchers to quantify the importance of a frame and to understand how and why it became prevalent. It can also enable researchers to study actors' position alignments over time. To facilitate such research, we release our code and model for automatic blame tie extraction.
Acknowledgments
We thank Zhiyang Teng and Jie Yang for many helpful discussions, Yan Zhang for helping with the inter-annotator agreement evaluation, and Yu Yuan, Gokayaz Gulten and our other colleagues for helping with the proofreading. Yue Zhang is the corresponding author.
References

[Alicke 2000] Alicke, M. D. 2000. Culpable control and the psychology of blame. Psychological Bulletin.
[Bail 2016] Bail, C. 2016. Proceedings of the National Academy of Sciences.
[Bamman and Smith 2015] Bamman, D., and Smith, N. A. 2015. In EMNLP, 76–85.
[Gentzkow and Shapiro 2010] Gentzkow, M., and Shapiro, J. M. 2010. What drives media slant? Evidence from U.S. daily newspapers. Econometrica.
[Groseclose and Milyo 2005] Groseclose, T., and Milyo, J. 2005. A measure of media bias. The Quarterly Journal of Economics.
[Hendrickx et al. 2009] Hendrickx, I.; Kim, S. N.; Kozareva, Z.; et al. 2009. SemEval-2010 Task 8: Multi-way classification of semantic relations between pairs of nominals. In Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions, 94–99. Association for Computational Linguistics.
[Hinton et al. 2012] Hinton, G. E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. R. 2012. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
[Hobolt and Tilley 2014] Hobolt, S. B., and Tilley, J. 2014. Blaming Europe?: Responsibility without Accountability in the European Union. Oxford University Press.
[Hood 2010] Hood, C. 2010. The Blame Game: Spin, Bureaucracy, and Self-Preservation in Government. Princeton University Press.
[Kingma and Ba 2014] Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[Li and Hovy 2014] Li, J., and Hovy, E. 2014. Sentiment analysis on the People's Daily. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 467–476. Doha, Qatar: Association for Computational Linguistics.
[Luttrell et al. 2013] Luttrell, D.; Atkinson, T.; Rosenblum, H.; et al. 2013. Assessing the costs and consequences of the 2007–09 financial crisis and its aftermath. Economic Letter.
[Maaten and Hinton 2008] Maaten, L. v. d., and Hinton, G. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research.
[Malhotra and Kuo 2008] Malhotra, N., and Kuo, A. G. 2008. Attributing blame: The public's response to Hurricane Katrina. The Journal of Politics.
[Manning et al. 2014] Manning, C. D.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S.; and McClosky, D. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, 55–60.
[Mikolov et al. 2013] Mikolov, T.; Chen, K.; Corrado, G.; and Dean, J. 2013. Efficient estimation of word representations in vector space. CoRR abs/1301.3781.
[Miwa and Bansal 2016] Miwa, M., and Bansal, M. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. arXiv preprint arXiv:1601.00770.
[Mohammad et al. 2016] Mohammad, S.; Kiritchenko, S.; Sobhani, P.; Zhu, X.; and Cherry, C. 2016. SemEval-2016 Task 6: Detecting stance in tweets. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 31–41. San Diego, California: Association for Computational Linguistics.
[Nguyen and Grishman 2015] Nguyen, T. H., and Grishman, R. 2015. Relation extraction: Perspective from convolutional neural networks. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, 39–48.
[Nicol 2016] Nicol, O. 2016. No body to kick, no soul to damn: Responsibility and accountability for the financial crisis (2007–2010). Journal of Business Ethics.
[O'Connor et al. 2010] O'Connor, B.; Balasubramanyan, R.; Routledge, B. R.; and Smith, N. A. 2010. From tweets to polls: Linking text sentiment to public opinion time series. In ICWSM.
[Olmeda 2008] Olmeda, J. A. 2008. In Governing after Crisis, ed. A. Boin, A. McConnell, and P. 't Hart. Cambridge: Cambridge University Press.
[Pennington, Socher, and Manning 2014] Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.
[Peters et al. 2018] Peters, M. E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; and Zettlemoyer, L. 2018. Deep contextualized word representations. In Proc. of NAACL.
[Preoţiuc-Pietro et al. 2017] Preoţiuc-Pietro, D.; Liu, Y.; Hopkins, D.; and Ungar, L. 2017. Beyond binary labels: Political ideology prediction of Twitter users. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 729–740.
[Rule, Cointet, and Bearman 2015] Rule, A.; Cointet, J.-P.; and Bearman, P. S. 2015. Lexical shifts, substantive changes, and continuity in State of the Union discourse, 1790–2014. Proceedings of the National Academy of Sciences.
[Shaver 2012] Shaver, K. G. 2012. The Attribution of Blame: Causality, Responsibility, and Blameworthiness. Springer Science & Business Media.
[Tourish and Hargie 2012] Tourish, D., and Hargie, O. 2012. Metaphors of failure and the failures of metaphor: A critical study of root metaphors used by bankers in explaining the banking crisis.