Macroeconomic forecasting through news, emotions and narrative
Sonja Tilly (a,∗), Markus Ebner (b) and Giacomo Livan (a,c)
(a) UCL, Computer Science Dep, 66-72 Gower St, Bloomsbury, WC1E 6EA London, UK
(b) Quoniam Asset Management, Westhafen Tower, Westhafenplatz 1, 60327 Frankfurt am Main, Germany
(c) Systemic Risk Centre, London School of Economics and Political Science, London, WC2A 2AE, UK
ARTICLE INFO
Keywords: news sentiment, time series forecasting, big data, NLP
ABSTRACT
This study forecasts industrial production and consumer prices leveraging narrative and sentiment from global newspapers. Existing research includes positive and negative tone only to improve macroeconomic forecasts, focusing predominantly on large economies such as the US. These works use mainly anglophone sources of narrative, thus not capturing the entire complexity of the multitude of emotions contained in global news articles. This study expands the existing body of research by incorporating a wide array of emotions from newspapers around the world, extracted from the Global Database of Events, Language and Tone (GDELT) [32], into macroeconomic forecasts. We present a thematic data filtering methodology based on a bi-directional long short-term memory neural network (Bi-LSTM) for extracting emotion scores from GDELT and demonstrate its effectiveness by comparing results for filtered and unfiltered data. We model industrial production and consumer prices across a diverse range of economies using an autoregressive framework, and find that including emotions from global newspapers significantly improves forecasts compared to an autoregressive benchmark model. We complement our forecasts with an interpretability analysis on distinct groups of emotions and find that emotions associated with surprise and happiness have the strongest predictive power for the variables we predict.
1. Introduction
Recent developments in automated language analysis have made it possible to quantify the elusive yet intuitive notion of narrative, and to measure its predictive power in relation to changes in social systems.

Research in psychology and cognitive sciences has examined the role emotions and narrative play in decision making and judgement [8, 9, 14]. These studies show that emotions can help individuals make decisions in complex scenarios with uncertain outcomes. Keynes uses the term “animal spirits” to describe the dispositions and emotions that drive human actions, with the results of this behaviour measurable in terms of economic indices such as consumer confidence [29]. Shiller [46] finds that unsettling narrative led to events such as the Great Depression in the 1920s and the Global Financial Crisis in 2008/9, arguing that narrative is a means of predicting the economy. A recent theoretical development, known as Conviction Narrative Theory (CNT), draws on the concept that to be sufficiently confident to act, agents create narratives supporting their expectations of the outcome of their actions [40]. For instance, a study on CNT tracks changes in narrative and shows that they precede changes in economic growth [52].

Media is an established, multi-functional tool for governments, corporations and individuals to disseminate information, connect and interact. As such, it is a major conduit for news narrative. Nowadays, most forms of media have an online presence and produce huge volumes of data. This data contains information in the form of opinions and sentiment about financial markets and the economy, which may not yet be reflected in macroeconomic variables.

∗ Corresponding author: S. Tilly
Over recent years, researchers have explored sentiment from different types of media and its usefulness for the prediction of the economy. Studies examine how to process large amounts of unstructured data from a variety of sources in order to extract signals [10, 19]. Other works outline approaches to incorporate such signals into a predictive model, for instance to improve the monitoring of the economy and financial forecasting [35, 47].

Media sentiment prediction has a wide range of application domains that Rousidis et al. [43] group into finance, marketing and sociopolitical. Within the finance domain, studies explore media sentiment prediction either for specific assets or markets (micro level) [1] or for different aspects of the economy (macro level) [2].

Existing research incorporates mainly positive and negative tone to improve macroeconomic forecasts, thus not capturing the entire complexity of the multitude of emotions contained in global news articles. Most works use anglophone sources of narrative, focusing predominantly on large economies such as the US.

This study advances the existing body of research by incorporating a wide array of emotions from newspapers around the world into macroeconomic forecasts using data from the Global Database of Events, Language and Tone (GDELT) [32]. GDELT is a research collaboration that analyses global news articles and extracts items such as themes, emotions, locations, and many more. We employ a filtering methodology based on machine learning to identify articles that are relevant to the macroeconomic indices in question, and provide a proof of concept demonstrating that emotions expressed in those news items add value to forecasts of industrial production and consumer prices across a diverse range of economies, both in terms of geographic location and size. We complement this with dimensionality reduction and correlation analysis in order to group the more than 600 emotion scores available into a smaller number of interpretable factors. We find emotions associated with “surprise” and “happiness” to yield the highest predictive power across the variables we forecast. To the best of our knowledge, emotions from GDELT Global Knowledge Graph’s Content Analysis Measure Systems have not yet been used to forecast macroeconomic variables.

Sonja Tilly et al.: Preprint submitted to Elsevier (arXiv, cs.CY)
2. Literature review
This section addresses a selection of existing literature on macroeconomic forecasting with media sentiment.

A rapidly evolving body of literature examines the use of media sentiment and big data for economic forecasting [11, 28, 48]. The majority of studies forecast economic variables with regression frameworks combining traditional data with positive and negative sentiment classifications based on word count (as opposed to a wider spectrum of emotions).

Research suggests that positive and negative sentiment from newspaper narrative is an effective tool for monitoring the economic cycle [46, 52]. Similarly, newspaper narrative is found to precede a change in economic variables, with low-frequency shifts correlating well with financial market events. Hence, newspaper narrative can be regarded as a risk management tool [40].

While most studies focus on a single economy, Baker et al. [3, 4] develop indices of economic uncertainty for a wide range of countries. They use an autoregressive framework including variables derived from news as well as macroeconomic variables to gauge whether uncertainty shocks foreshadow weaker macroeconomic performance. Findings suggest that policy uncertainty raises stock price volatility and lowers investment rates and employment growth. Political bias does not significantly impact the uncertainty indices.
Thorsrud [51] decomposes unstructured newspaper text into daily news topics and uses them to forecast quarterly GDP growth, producing significantly better predictions compared to central bank forecasts. A study by Pekar and Binner [41] demonstrates that adding information on intended purchases from tweets alongside lagged consumer index values often yields statistically significant improvements over the baseline model that is trained with lag variables alone.

Newspaper archives and Twitter are commonly used sources for raw textual data; however, there is a growing body of research using preprocessed sentiment scores. Ortiz [12] combines official statistics with themes from GDELT to track Chinese economic vulnerability in real time, showing that the index provides valuable insights for policymakers and investors. Elshendy et al. [20] combine data from GDELT with a set of traditional macroeconomic variables and use social network analysis to generate predictors for macroeconomic indices such as consumer confidence, business confidence and GDP for the 10 largest EU economies. Results show that data extracted from GDELT is valuable for predicting macroeconomic variables. Chen [13] examines the effect of the negative narrative in relation to international trade from US presidential candidates in 2016 using average tone from GDELT. The study concludes that narrative can impact the economy by influencing market participants’ expectations. Glaeser et al. [21] use reviews from Yelp to forecast the local economy. Results from a regression analysis suggest that the data set is a useful complement for predicting contemporaneous changes in the local economy.
It also provides an up-to-date snapshot of economic change at local level, delivering the best results for populous areas and the hospitality industry, given the high number of reviews.

While most publications argue in favour of using media sentiment for macroeconomic forecasting, Schaer et al. [45] take a more critical view, highlighting the need for thorough statistical testing and a careful choice of error metrics and benchmarks. They acknowledge some of the challenges when using sentiment data, such as data complexity, sampling instability and keyword selection.

The majority of literature only incorporates positive and negative tone to improve macroeconomic predictions, with just a handful of studies featuring a wider range of emotions in their analyses. This paper expands the existing body of research by incorporating nuanced sentiment from newspapers around the world into macroeconomic forecasts of industrial production and consumer prices for 10 diverse economies. This paper goes beyond mere prediction and also focuses on the interpretability of results, illustrating which emotions have the strongest predictive power.
3. Data and methods
This section introduces GDELT as data source, outlines the filtering methodology that is used and provides information about the nature of the sentiment scores.

The GDELT Project is a research collaboration of Google Ideas, Google Cloud, Google and Google News, the Yahoo! Fellowship at Georgetown University, BBC Monitoring, the National Academies Keck Futures Program, Reed Elsevier’s LexisNexis Group, JSTOR, DTIC, and the Internet Archive. The project monitors world media from a multitude of perspectives, identifying and extracting items such as themes, emotions, locations and events. GDELT version two incorporates real-time translation from 65 languages and measures over 2,300 emotions and themes from every news article, updated every 15 minutes [32]. It is a public data set available on the Google Cloud Platform.

The Global Knowledge Graph (GKG), one of the tables within GDELT, contains fields such as sentiment scores and themes extracted from global newspaper articles. It comprises around 11 terabytes of data, starting in February 2015, with new data being added constantly. To date, it has analysed over one billion news items.
This study models industrial production (IP) and consumer price indices (CPI) for the US, UK, Germany, Norway, Poland, Turkey, Japan, South Korea, Brazil and Mexico. IP is a monthly measure of economic activity. It is defined as the output of industrial establishments, covers a broad range of sectors and tracks the change in the volume of production output. The consumer price index (CPI) is selected as monthly inflation index and describes the change in the prices of a basket of goods and services that are typically purchased by households.
A filtering methodology is applied to extract sentiment scores from GDELT’s GKG relevant to economic growth and inflation, respectively. It comprises three steps:
• Step 1: Keyword filter
• Step 2: Classification with neural network
• Step 3: Aggregation

Step one consists of a top-level thematic filter based on keywords (economic growth, inflation) to select relevant articles based on themes. Step two uses a neural network to further filter news items using GDELT themes. Step three aggregates the sentiment scores to the frequency of the macroeconomic variables. In addition, this step applies country filters to GDELT locations.

To filter out non-relevant information, a simple keyword filter is applied to GKG themes. The GDELT algorithm extracts themes from every news article it analyses [34]. The GKG contains over 12,000 unique themes.

An analysis of a random set of 100 original news articles is conducted to evaluate the keyword filter’s ability to eliminate non-relevant news items, showing that the GDELT algorithm has a tendency to recognise themes where there are none. This creates the need to further filter observations using the themes column. For each news article scanned, the themes are a sequence of labels, with each label representing a theme. Given the nature of the data, this is a text sequence classification problem. A set of 1,000 random articles is manually classified according to relevance. This is done by looking up the original news articles using the DocumentIdentifier column, which corresponds to the article’s URL. In cases where the URL is no longer available, the news item is disregarded. The themes for every article are label-encoded so that the themes are given numbers between zero and $N - 1$, where $N$ is the total number of themes.
For out-of-vocabulary words, an “unknown” token is assigned.

A range of classifiers are trained on 800 observations and tested out of sample on 200 observations, where the encoded themes are the predictor variables and the classifications into relevant/non-relevant articles are the predicted variables. Performance is assessed using the area under the curve (AUC) score. He and Ma [25] suggest that the AUC is a more appropriate metric for the assessment of a classifier than basic accuracy, especially in the case of imbalanced data, as the latter is too biased towards the dominant class. Table 1 shows the performance of the different algorithms that were evaluated on a data set filtered for economic growth.

Classifier                AUC
Gaussian Naïve Bayes      0.65
Random Forest             0.67
Support Vector Machine    0.75
XGBoost                   0.88
Unidirectional NN         0.82
Bi-LSTM                   0.95
Table 1: Classifier performance

After evaluating a range of algorithms, a Bi-LSTM neural network is selected as it delivers the strongest performance. While recurrent neural networks have difficulties learning long-term dependencies, long short-term memory (LSTM) networks are able to preserve information from inputs that have already passed through their hidden state [26]. A Bi-LSTM architecture is well suited to tasks where context is important, such as sequence classification. The algorithm runs inputs simultaneously in two directions, from the past to the future and from the future to the past, thus preserving information from both past and future at every step [23].

The filtered data from step 2 is aggregated according to time period, and location filters are applied to GKG locations according to each country’s economic links. The location column contains a list of all locations found in each news item, extracted through the algorithm designed by Leetaru [31].

In order to gain insights into the economic interconnectedness of each of the 10 countries, the import and export volumes by trading partner are examined [16].
Six of the economies have diversified trade links with countries around the world. Poland, Norway and Turkey trade predominantly with Western European economies. For South Korea, over half of the country’s imports and exports are linked to China. Due to the trade links between economies, information relating to one country may also be relevant for another one [42]. Based on this idea of interconnection, a global data set incorporating information on all 10 countries is generated for the six global economies (US, UK, Germany, Japan, Brazil, Mexico). For Poland, Norway and Turkey, a data set containing information on Western European economies is generated. For South Korea, a data set including information on China is built.

Within GDELT’s GKG, the Tone and the Global Content Analysis Measures (GCAM) columns contain over 2,300 sentiment scores.

The Tone field comprises a comma-delimited list of six emotional dimensions, each recorded as a floating point number. From this field, the average tone of the document is used. This score typically ranges from -10 (very negative) to +10 (very positive), with zero being neutral [33]. The tone score is based on sentiment mining. This approach counts words according to positive and negative pre-compiled dictionaries. The net sentiment represents the overall tone [27].
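As an illustration of how the average tone might be read from the comma-delimited Tone field, the sketch below splits one field value into named floats. The ordering of the six dimensions, the dimension names and the example string are assumptions made for this sketch and should be checked against the GKG codebook [33].

```python
def parse_tone(tone_field):
    """Split a comma-delimited Tone string into named float values."""
    # Assumed order and names of the six dimensions (not taken from the paper).
    names = [
        "average_tone", "positive_score", "negative_score",
        "polarity", "activity_density", "self_group_density",
    ]
    values = [float(x) for x in tone_field.split(",")[:len(names)]]
    return dict(zip(names, values))


# Hypothetical Tone value for a single news article.
example = "-2.5,1.5,4.0,5.5,20.0,0.5"
tone = parse_tone(example)

average_tone = tone["average_tone"]
# Net sentiment as positive minus negative share, mirroring the dictionary-based approach.
net_sentiment = tone["positive_score"] - tone["negative_score"]
```

In this made-up example the net sentiment equals the average tone, which is the relationship the dictionary-based description above suggests.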
The GCAM system runs 24 content analysis systems over each news article and returns the resulting scores as a comma-delimited list in the GCAM column [33]. The majority of GCAM scores are based on word count; some are based on more sophisticated methods. GCAM also includes the overall word count for each news item analysed.

There is some overlap between the GCAM scores generated by the different analysis systems. Scores of the following four analysis systems are chosen as they minimise duplication of sentiment scores while incorporating a broad range of emotions:
• WordNet-Affect was developed by Strapparava and Valitutti [50] based on WordNet Domains [38]. WordNet Domains maps synsets, i.e. groupings of synonymous words expressing the same concept, to domain labels such as Economics or Health. WordNet-Affect extends this structure by assigning affective domain labels to the synsets. WordNet-Affect scores are word count-based and account for 280 sentiment dimensions in the GCAM column.
• The Loughran and McDonald Financial Sentiment Dictionary uses negative word lists specific to a financial context to produce scores based on word count. The authors find that word lists for other disciplines often misclassify words in financial documents [37]. The system generates six scores.
• The Hedonometer scores provide a measurement of overall societal happiness for English and a range of non-English languages. In order to provide an overall score, over 10,000 unique words are rated by humans on a scale from one to nine. For each of these words, an average happiness score is derived, with five being neutral. The system returns 12 scores.
• ML-Senticon represents a multi-layered synset-level lexicon and calculates positivity and negativity scores covering English and Spanish. This system first uses a number of algorithms to estimate the polarity of individual synsets [15]. Subsequently, an average happiness score for a news item is calculated [17]. The system generates 32 scores.

The extracted data is aggregated by month. The mean and standard deviation of the tone score are calculated. Where the GCAM sentiment scores are based on word count, the mean and standard deviation are calculated after normalizing the scores to account for variation in word count, as done by Baker et al. [4]. For calculated sentiment scores, the mean score and standard deviation over the period are computed. In addition, the number of news items and the total word count per period are generated.
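The monthly aggregation of a word-count-based score can be sketched as follows. The normalisation shown (raw count divided by the article's word count) is an assumption about what "normalized to account for variation of word count" means in practice; the field names and the three example articles are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def aggregate_monthly(articles):
    """Aggregate one word-count-based score to monthly mean and std."""
    by_month = defaultdict(list)
    for article in articles:
        if article["word_count"] > 0:  # skip empty articles to avoid dividing by zero
            by_month[article["month"]].append(article)

    summary = {}
    for month in sorted(by_month):
        items = by_month[month]
        # Assumed normalisation: raw count divided by the article's word count.
        shares = [a["score_count"] / a["word_count"] for a in items]
        summary[month] = {
            "mean": mean(shares),
            "std": stdev(shares) if len(shares) > 1 else 0.0,
            "n_articles": len(items),
            "total_words": sum(a["word_count"] for a in items),
        }
    return summary


# Hypothetical articles: month, raw count for one GCAM dimension, word count.
articles = [
    {"month": "2015-03", "score_count": 4, "word_count": 200},
    {"month": "2015-03", "score_count": 6, "word_count": 100},
    {"month": "2015-04", "score_count": 3, "word_count": 150},
]
monthly = aggregate_monthly(articles)
```

Scores that are already calculated per article (rather than counted) would skip the division and be averaged directly.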
This section sets out how the data sets used in this study are built.

The filtering methodology is applied to build two data sets, filtered for articles relevant to economic growth and inflation, respectively. In step one, country filters for the US, UK, Germany, Norway, Poland, Turkey, Japan, South Korea, Brazil and Mexico are applied. The countries are selected to represent a diverse mix of economies, both in size and geography. The data is aggregated to monthly frequency, from the beginning of March 2015 to the end of June 2020.

Model predictions incorporate true positives, false positives, true negatives and false negatives. The filtered data sets comprise true positive and false positive predictions only. They include approximately 5.4% and 3.9% of noise for the economic growth and the inflation filter, respectively.

An unfiltered data sample of aggregated GCAM scores is created for comparison purposes. Around five million random observations (one million for each calendar year) are selected and aggregated to monthly frequency. The unfiltered data set contains over 60% noise, i.e. news items not relevant to economic growth or inflation, respectively.

In order to account for macroeconomic effects, the Baltic Dry Index and the crude oil price are incorporated when modelling IP. The Baltic Dry Index is a leading indicator for economic activity, reflecting levels of global trade [6]. A study by Van Eyden et al. [53] suggests that there is a significant relationship between oil price fluctuations and economic growth in OECD countries. For models forecasting CPI, the countries’ respective terms of trade indices as well as the crude oil price are included. Mihailov et al. [39] find that the anticipated relative change in the terms of trade is a more important determinant of inflation than the contemporaneous domestic output gap. A study by Salisu et al. [44] establishes a significant long-term positive relationship between oil price and inflation.
In this section the data preparation methods are summarized.

For each of the 10 aforementioned economies, the respective values for IP and CPI are used as predicted variables. Both index values represent the monthly percentage change.

The augmented Dickey-Fuller unit root test is applied to 20 years of monthly data, and the null hypothesis of a unit root is rejected at the 5% significance level for the variables described above.

Where a sentiment score contains zeros only, it is assumed that the relevant GCAM system did not return any scores, and the score is dropped from the respective data set. The scores affected are mainly based on the Hedonometer and ML-Senticon GCAM systems. The GDELT data sets contain 664 raw scores, and this step reduces the number of features to 630 and 628 for the data sets filtered for economic growth and inflation, respectively. The unfiltered data set retains 632 features. The monthly change in sentiment scores is calculated. The augmented Dickey-Fuller unit root test is applied, and stationarity is not rejected at the 5% level for any of the scores. The sentiment scores are standardized by removing the mean and scaling to unit variance.

The most recent 20% of the data are set aside as “unseen” data for testing purposes.
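The preparation steps for a single sentiment score can be sketched as below. The sketch assumes that "monthly change" means the first difference and that the standardization uses the population standard deviation; the text does not state whether the scaling statistics come from the full sample or from the training portion only, so the full series is used here purely for illustration. All numbers are made up.

```python
from statistics import mean, pstdev


def monthly_change(series):
    """First difference between consecutive monthly values (assumed definition)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def standardize(values):
    """Remove the mean and scale to unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("constant series cannot be scaled to unit variance")
    return [(v - mu) / sigma for v in values]


def chronological_split(rows, test_share=0.2):
    """Keep the most recent share of observations as the unseen test set."""
    n_test = max(1, round(len(rows) * test_share))
    return rows[:-n_test], rows[-n_test:]


# Hypothetical monthly values of one aggregated sentiment score.
scores = [1.0, 3.0, 2.0, 6.0, 5.0, 9.0, 8.0, 12.0, 11.0, 15.0, 14.0]
changes = monthly_change(scores)          # 10 monthly changes
z = standardize(changes)                  # zero mean, unit variance
train, test = chronological_split(z)      # last 20% held out
```

The split is chronological rather than random so that the test observations always lie after the training observations in time.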
4. Analysis
This section outlines the analysis that is performed to gauge if the sentiment scores from GDELT have predictive power.
The Granger causality between the GDELT sentiment scores and the predicted variables is assessed to evaluate whether there are relationships between those variables. While Granger causality can provide useful insights into the relation between variables, it does not test true causality; instead, the test looks to establish whether changes in one variable occur before changes in the other one [22]. This means that Granger causality may be found even when there is no causal link [30].

The null hypothesis for the Granger causality test states that lagged sentiment scores do not Granger-cause a variable, while the alternative hypothesis stipulates that lagged sentiment scores Granger-cause an index. The test is evaluated at a significance level of 5%.

The Granger causality for lags up to a maximum of three months is evaluated. Since multiple tests are run for each data set, the resulting p-values are adjusted according to the Benjamini-Hochberg (BH) procedure to control for multiple hypothesis testing [5].

Principal component analysis (PCA) is then applied to the GDELT sentiment scores to discount redundancies between these features due to correlations between them, which could lead to overfitting during modelling. PCA is widely used in the literature to reduce high-dimensional data. Stock and Watson [49] show that a small number of principal components extracted from a large data set can be used to predict macroeconomic indices. Similarly, Hansen and McMahon apply principal component analysis to features extracted from FOMC communications and demonstrate that the principal components have an effect on economic variables [24].

PCA is applied to the data sets filtered for IP and CPI, respectively. The first eight principal components are used, explaining between 45% and 48% of the variance.
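The Benjamini-Hochberg adjustment applied to the Granger causality p-values can be sketched in a few lines. This is a generic implementation of the standard step-up procedure, not the authors' code, and the four p-values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four Granger causality tests.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
n_significant = sum(p < 0.05 for p in adjusted)
```

Counting the adjusted values below 0.05, as in the last line, corresponds to the kind of tally reported per country in the results tables.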
Figure 1: Variance explained by the first eight principal components (global data sets)
Figure 1 illustrates the percentage of variance explained by each principal component extracted from the global data sets.
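The share of variance explained by the leading principal components can be computed from the singular values of the centred data matrix. The sketch below uses NumPy on synthetic data; the matrix dimensions and the latent-factor construction are assumptions chosen only to produce a runnable example, not a reproduction of the GDELT data.

```python
import numpy as np


def explained_variance_ratio(X, n_components):
    """Share of total variance captured by each of the first components."""
    Xc = X - X.mean(axis=0)                    # centre each feature
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    variances = s ** 2
    return variances[:n_components] / variances.sum()


# Synthetic stand-in: 60 months x 20 scores driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(60, 3))
X = factors @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(60, 20))

ratios = explained_variance_ratio(X, 8)
cumulative = float(ratios.sum())
```

Summing the first eight ratios gives the cumulative figure that the text reports as lying between 45% and 48% for the actual data sets.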
As a further step in the analysis of the sentiment scores, an autoregressive framework is used for forecasting macroeconomic variables.

This framework allows modelling a $T \times K$ multivariate time series $Y$, where $T$ denotes the number of observations and $K$ the number of variables. The framework is defined as

$$Y_t = v + A_1 Y_{t-1} + \cdots + A_p Y_{t-p} + u_t \qquad (1)$$

where $A_i$ is a $K \times K$ coefficient matrix, $v$ is a constant and $u_t$ is white noise. The model is then calibrated for each macroeconomic variable and each country.

The optimal lag length is selected based on the Akaike (AIC) and the Bayesian (BIC) information criteria. These measures are based on the idea that the inclusion of a further term may improve the model; however, the model should also be penalised for increasing the number of parameters to be estimated. When the improvement in goodness-of-fit outweighs the penalty term, the statistic associated with the information criterion decreases. Thus, the lag which minimises the information criterion is selected [7].

For each country, the respective macroeconomic variable, the respective two explanatory variables and the eight principal components derived from the GDELT sentiment scores are included in the model.

The benchmark is an autoregressive model including the predicted indices and the explanatory variables as described above but excluding the GDELT sentiment factors, with the same lag as the above autoregressive framework.

The model is trained on 80% of the data and is tested out of sample using the most recent 20% of the data that have been set aside.

Performance is assessed using the root mean squared error (RMSE) on the test set predictions.
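The modelling loop described above (estimate Eq. (1), choose the lag by an information criterion, score the hold-out forecasts by RMSE) can be sketched with ordinary least squares in NumPy. This is a simplified illustration under stated assumptions: the series is simulated, one-step-ahead forecasts using realised lags are assumed, and the criteria are computed on each lag's own effective sample rather than a common one.

```python
import numpy as np


def fit_var(Y, p):
    """OLS estimate of a VAR(p) with intercept; returns coefficients and residuals."""
    T = Y.shape[0]
    lags = [Y[p - i:T - i] for i in range(1, p + 1)]   # Y_{t-1}, ..., Y_{t-p}
    X = np.hstack([np.ones((T - p, 1))] + lags)
    B, *_ = np.linalg.lstsq(X, Y[p:], rcond=None)
    return B, Y[p:] - X @ B


def information_criteria(Y, p):
    """AIC and BIC based on the log-determinant of the residual covariance."""
    B, resid = fit_var(Y, p)
    n = resid.shape[0]
    logdet = np.linalg.slogdet(resid.T @ resid / n)[1]
    return logdet + 2 * B.size / n, logdet + np.log(n) * B.size / n


def one_step_forecasts(Y_train, Y_test, p):
    """Forecast each test observation from the p preceding realised values."""
    B, _ = fit_var(Y_train, p)
    history = np.vstack([Y_train, Y_test])
    preds = []
    for t in range(len(Y_train), len(history)):
        x = np.concatenate([[1.0]] + [history[t - i] for i in range(1, p + 1)])
        preds.append(x @ B)
    return np.array(preds)


def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Simulated two-variable VAR(1) standing in for the real monthly data.
rng = np.random.default_rng(1)
A = np.array([[0.5, 0.1], [0.0, 0.3]])
Y = np.zeros((120, 2))
for t in range(1, 120):
    Y[t] = A @ Y[t - 1] + rng.normal(scale=0.5, size=2)

split = int(len(Y) * 0.8)                       # 80% train, most recent 20% test
Y_train, Y_test = Y[:split], Y[split:]
best_p = min(range(1, 4), key=lambda p: information_criteria(Y_train, p)[1])  # BIC
preds = one_step_forecasts(Y_train, Y_test, best_p)
score = rmse(Y_test[:, 0], preds[:, 0])         # RMSE for the first variable
```

Running the same loop with and without the sentiment factors as columns of `Y`, at the same lag, would give the model-versus-benchmark comparison described in the text. A dedicated time series library would normally be preferred over hand-written OLS.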
5. Research findings
This section presents the findings from the analysis set out in the previous section.
The sentiment scores from GDELT and the macroeconomic indices for 10 countries are tested for Granger causality, with a maximum lag of three months. Tables 2 and 3 display the number of BH-adjusted p-values that exhibit significance at 5% for each country’s macroeconomic variable. The “Filtered” column refers to results from models including GDELT sentiment scores filtered for economic growth and inflation respectively, while the “Unfiltered” column shows the results for the models incorporating the unfiltered GDELT sentiment.

Country        Filtered data set   Unfiltered data set
US             3                   0
UK             28                  0
Germany        8                   7
Norway         30                  0
Poland         5                   0
Turkey         4                   1
Japan          5                   0
South Korea    6                   0
Brazil         35                  64
Mexico         12                  0
Table 2: IP: Number of significant BH-adjusted p-values

Country        Filtered data set   Unfiltered data set
US             14                  0
UK             30                  8
Germany        13                  3
Norway         9                   0
Poland         16                  3
Turkey         57                  1
Japan          18                  0
South Korea    17                  0
Brazil         41                  0
Mexico         35                  15
Table 3: CPI: Number of significant BH-adjusted p-values

Notwithstanding the limitations of the Granger causality test [30], the results show a pattern. For both macroeconomic variables, the filtered data sets exhibit consistent Granger causality across countries. The exception is Brazil in the case of IP, where the unfiltered data set shows more Granger causality than the filtered one.

The analysis suggests that the filtering methodology introduced in section 3.2 adds value and is able to generate sentiment scores that have a relationship with economic indices.

The respective filtered sentiment data sets are condensed into eight principal components using PCA. They are then used to predict IP and CPI for each of the 10 countries with the model in Eq. (1). For comparison, the respective variables are forecast with unfiltered data.
Tables 4 and 5 provide a high-level summary of the results from predictions on the most recent 20% of the data that has been set aside for testing.

The “Filtered” column refers to results from models including filtered GDELT data, while the “Unfiltered” column refers to results for the models using unfiltered GDELT data. Blue (red) cells denote cases in which the models outperform (underperform) the benchmark. Numbers in parentheses correspond to the number of significant coefficients associated with GDELT factors in the model in Eq. (1), with the asterisks denoting the level of their statistical significance.
IP for         Data set: Filtered | Unfiltered
US             **(1), *(1) | ***(1), *(1)
UK
Germany
Norway         *(3)
Poland         *(1) | ***(1), **(2)
Turkey         **(1) | ***(1), **(2), *(4)
Japan          ***(1), **(1), *(2)
South Korea    **(3), *(2)
Brazil
Mexico         **(1) | **(1), *(1)
Table 4: Results of the model in Eq. (1) applied to IP. Blue (red) cells denote cases in which the model outperforms (underperforms) the benchmark. Numbers in parentheses correspond to the number of significant coefficients associated with GDELT factors in the model in Eq. (1) (*** denotes at least one GDELT sentiment factor with p-value < 0.01, ** < …, * < …).
CPI for        Data set: Filtered | Unfiltered
US             **(1), *(2)
UK             ***(1), **(2)
Germany        **(1), *(1) | ***(1), **(1)
Norway         **(1), *(2) | *(1)
Poland         ***(2) | **(1), *(1)
Turkey         *(1) | ***(3), **(3), *(1)
Japan          ***(1), **(4), *(2) | **(1)
South Korea    **(1), *(1) | ***(2)
Brazil         **(1)
Mexico
Table 5: Results of the model in Eq. (1) applied to CPI. Blue (red) cells denote cases in which the model outperforms (underperforms) the benchmark. Numbers in parentheses correspond to the number of significant coefficients associated with GDELT factors in the model in Eq. (1) (*** denotes at least one GDELT sentiment factor with p-value < 0.01, ** < …, * < …).

In order to gain insights into the relationship between sentiment scores and the principal components derived from the filtered GDELT data, the loadings corresponding to each component are examined.

Loadings correspond to the strength of the relationship between the original sentiment scores and the principal components, quantifying the relevance of the underlying sentiment scores in each of the components. They are derived by multiplying eigenvectors by the square root of the eigenvalues.

All sentiment scores from GDELT represent a specific emotion such as “cheerfulness”, “euphoria” or “joy” and are manually mapped to seven universal emotions as set out by Ekman and Cordaro [18]. This framework defines emotions as discrete, automatic reactions to events and stipulates that emotions such as happiness or anger describe groups of related states with distinct common traits. According to these seven groups, the above-mentioned emotions are assigned to “happiness”.

For each principal component, the squared loadings are summed according to these seven distinct emotions. Mapping sentiment scores onto emotions provides some interpretability to our analysis, by allowing us to investigate which emotions are associated with each principal component.
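The loading calculation and the grouping of squared loadings by emotion can be sketched as follows. The data, the six score columns and their emotion labels are hypothetical, and dividing by the column total to obtain shares is an added convenience for plotting rather than a step stated in the text.

```python
import numpy as np


def pca_loadings(X, n_components):
    """Loadings = eigenvectors scaled by the square root of their eigenvalues."""
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]     # keep the largest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))


def emotion_shares(loadings, emotion_of_score):
    """Sum squared loadings per emotion for each component, as shares of the total."""
    squared = loadings ** 2
    total = squared.sum(axis=0)
    labels = np.array(emotion_of_score)
    return {
        emotion: squared[labels == emotion].sum(axis=0) / total
        for emotion in sorted(set(emotion_of_score))
    }


# Hypothetical data: 50 months x 6 sentiment scores, each mapped to one emotion.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 6))
emotion_of_score = ["happiness", "happiness", "surprise", "surprise", "anger", "fear"]

loadings = pca_loadings(X, n_components=2)
shares = emotion_shares(loadings, emotion_of_score)
```

Each entry of `shares` holds one value per principal component, which is the kind of quantity a radar chart per component would display.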
Figure 2: IP: Significant principal components explained by emotions (US)
Figure 3:
CPI: Significant principal components explained byemotions (US)
As an example, Figs. 2 and 3 show radar charts of the emotions associated with the loadings of the statistically significant principal components used to forecast IP and CPI in the US. As can be seen from Tables 4 and 5, the corresponding models outperform the benchmarks we considered and are associated with substantial statistical significance. The principal components shown in Figs. 2 and 3 explain 8.3% and 8.9% of the variance, respectively. Further charts can be provided upon request.
Sonja Tilly et al.:
Preprint submitted to Elsevier
The findings from this example show that the factors we use to predict IP and CPI can be associated with well-defined emotions. Therefore, movements in such emotions – as expressed in news articles published by global newspapers – help explain movements in major macroeconomic indices. Of the seven distinct emotions, “surprise” and “happiness” have the strongest predictive power. This is the case across all principal components.
From a Keynesian point of view, these emotions can be considered “animal spirits” that guide human behaviour and are ultimately reflected in economic variables [29]. Likewise, Loewenstein argues that intense emotions – so-called visceral factors – affect the economy because they lead to individual decisions and actions, and should therefore be included in economic models [36]. Our findings agree with the school of thought that human emotions have an impact on the economy.
6. Discussion
This study attempts to forecast macroeconomic indices using sentiment scores derived from GDELT. It introduces a filtering methodology to extract and aggregate large volumes of data. The methodology is applied to build data sets filtered for economic growth and inflation. The country-specific macroeconomic indices are forecast using data sets for IP and CPI, respectively, that take into account each country’s trade links when applying location filters. The filtered data exhibits consistent Granger causality across the two macroeconomic variables, except for Brazil in the case of IP. Autoregressive models including the filtered data outperform their benchmark for most predicted variables and overall produce better results than those using unfiltered data. Mapping the GDELT sentiment scores onto distinct emotions helps to understand how these emotions relate to each principal component and thus to interpret our analysis, suggesting that “surprise” and “happiness” are their main drivers. The findings from this study agree with the school of thought that emotions drive human behaviour and eventually impact the economy.
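The comparison between an autoregressive benchmark and an autoregressive model augmented with a sentiment factor, as summarised above, can be illustrated with a small sketch. The specification of Eq. (1) is not reproduced here, so the single lag, the single factor, the fixed train/test split, the use of RMSE as the error measure and the simulated data are all assumptions for illustration only; the function names are hypothetical.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients for y ~ X (X already contains a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def compare_forecasts(y, factor, n_test):
    """Out-of-sample RMSE of an AR(1) model with and without a lagged factor."""
    target = y[1:]
    const = np.ones(len(target))
    designs = {
        "benchmark": np.column_stack([const, y[:-1]]),
        "augmented": np.column_stack([const, y[:-1], factor[:-1]]),
    }
    split = len(target) - n_test
    rmse = {}
    for name, X in designs.items():
        # Estimate on the training window, evaluate on the hold-out window.
        coef = ols_fit(X[:split], target[:split])
        err = target[split:] - X[split:] @ coef
        rmse[name] = float(np.sqrt(np.mean(err ** 2)))
    return rmse


# Simulated data (assumption): the lagged factor genuinely drives the target.
rng = np.random.default_rng(1)
n = 200
factor = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * factor[t - 1] + 0.3 * rng.normal()

rmse = compare_forecasts(y, factor, n_test=50)
print(rmse)
```

In this simulated setting the augmented model should show the lower hold-out error by construction; with real data, whether the factor adds value is an empirical question.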
This study examines linear relationships between GDELT variables and macroeconomic indices. Investigating non-linear interactions between these variables could generate further insights and would be a natural extension of this experiment.
All GDELT sentiment scores are incorporated into the forecasting framework. An alternative approach would be to choose a subset of sentiment scores based on a feature selection criterion.
GDELT starts at the end of February 2015 and thus has a short track record. Particularly when modelling monthly data, the small number of observations is likely to affect the significance of the results.
7. Conclusions
Short-term forecasting using narrative and sentiment from media has emerged in recent years. The majority of research extracts data from anglophone sources, utilising means such as simple word counts of positive and negative keywords; few studies use big data [43]. As a result, existing works do not capture the entire complexity of global news articles. This study expands the existing body of research by creating a data set that incorporates a wide array of emotions from newspapers around the world. To the best of our knowledge, the GCAM sentiment scores from the GDELT GKG have not yet been used to forecast macroeconomic variables; hence the experiment introduces a new data source.
The study represents a proof of concept showing that the filtering methodology presented captures relevant signals and that the data extracted from GDELT adds value when forecasting macroeconomic variables. The findings demonstrate that the sentiment factors derived from GDELT that we use to predict IP and CPI can be linked to distinct emotions. Therefore, fluctuations in such emotions – as expressed in news articles published by global newspapers – help explain changes in major macroeconomic indices.
Acknowledgments
GL acknowledges support from an EPSRC Early Career Fellowship (Grant No. EP/N006062/1). ST acknowledges support from Quoniam Asset Management.
References
[1] Allen, D.E., McAleer, M., Singh, A.K., 2019. Daily market news sentiment and stock prices. Applied Economics 51, 3212–3235.
[2] Ardia, D., Bluteau, K., Boudt, K., 2019. Questioning the news about economic growth: Sparse forecasting using thousands of news-based sentiment values. International Journal of Forecasting 35, 1370–1386.
[3] Baker, S., Bloom, N., Davis, S., Terry, S., 2020. Covid-induced economic uncertainty and its consequences. VoxEU. org 13.
[4] Baker, S.R., Bloom, N., Davis, S.J., 2016. Measuring economic policy uncertainty. The quarterly journal of economics 131, 1593–1636.
[5] Benjamini, Y., Yekutieli, D., 2005. False discovery rate–adjusted multiple confidence intervals for selected parameters. Journal of the American Statistical Association 100, 71–81.
[6] Bildirici, M.E., Kayıkçı, F., Onat, I.Ş., 2015. Baltic dry index as a major economic policy indicator: the relationship with economic growth. Procedia-Social and Behavioral Sciences 210, 416–424.
[7] Brooks, C., Tsolacos, S., 2010. Real Estate Modelling and Forecasting. doi: .
[8] Brosch, T., Scherer, K.R., Grandjean, D.M., Sander, D., 2013. The impact of emotion on perception, attention, memory, and decision-making. Swiss medical weekly 143, w13786.
[9] Bruner, J.S., 1990. Acts of meaning. volume 3. Harvard university press.
[10] Buono, D., Kapetanios, G., Marcellino, M., Mazzi, G.L., Papailias, F., . Evaluation of nowcasting/flash estimation based on a big set of indicators. Paper prepared for the 16th Conference of IAOS .
[11] Buono, D., Mazzi, G.L., Kapetanios, G., Marcellino, M., Papailias, F., 2017. Big data types for macroeconomic nowcasting. Eurostat Review on National Accounts and Macroeconomic Indicators 1, 93–145.
[12] Casanova, C., Ortiz, A., Rodrigo, T., Xia, L., Iglesias, J., et al., 2017. Tracking chinese vulnerability in real time using Big Data. Technical Report.
[13] Chen, H.Y., Lo, T.C., 2019. Online search activities and investor attention on financial markets. Asia Pacific Management Review 24, 21–26.
[14] Clore, G.L., Palmer, J., 2009. Affective guidance of intelligent agents: How emotion controls cognition. Cognitive systems research 10, 21–30.
[15] Cruz, F.L., Troyano, J.A., Pontes, B., Ortega, F.J., 2014. Building layered, multilingual sentiment lexicons at synset and lemma levels. Expert Systems with Applications 41, 5984–5994.
[16] Datawheel, Simoes, A., Hidalgo, C.A., . The observatory of economic complexity. URL: https://oec.world/ .
[17] Dodds, P.S., Harris, K.D., Kloumann, I.M., Bliss, C.A., Danforth, C.M., 2011. Temporal patterns of happiness and information in a global social network: Hedonometrics and twitter. PloS one 6, e26752.
[18] Ekman, P., Cordaro, D., 2011. What is meant by calling emotions basic. Emotion review 3, 364–370.
[19] Elshendy, M., Colladon, A.F., Battistoni, E., Gloor, P.A., 2018. Using four different online media sources to forecast the crude oil price. Journal of Information Science 44, 408–421.
[20] Elshendy, M., Fronzetti Colladon, A., 2017. Big data analysis of economic news: Hints to forecast macroeconomic indicators. International Journal of Engineering Business Management 9, 1847979017720040.
[21] Glaeser, E.L., Kim, H., Luca, M., 2017. Nowcasting the local economy: Using yelp data to measure economic activity. Technical Report. National Bureau of Economic Research.
[22] Granger, C.W., 1969. Investigating causal relations by econometric models and cross-spectral methods. Econometrica: journal of the Econometric Society , 424–438.
[23] Graves, A., Schmidhuber, J., 2005. Framewise phoneme classification with bidirectional lstm networks, in: Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., IEEE. pp. 2047–2052.
[24] Hansen, S., McMahon, M., 2016. Shocking language: Understanding the macroeconomic effects of central bank communication. Journal of International Economics 99, S114–S133.
[25] He, H., Ma, Y., 2013. Imbalanced learning: foundations, algorithms, and applications. John Wiley & Sons.
[26] Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural computation 9, 1735–1780.
[27] Hu, M., Liu, B., 2004. Mining and summarizing customer reviews, in: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 168–177.
[28] Kapetanios, G., Papailias, F., et al., 2018. Big data & macroeconomic nowcasting: Methodological review. Economic Statistics Centre of Excellence, National Institute of Economic and Social Research .
[29] Keynes, J.M., 2018. The general theory of employment, interest, and money. Springer.
[30] Leamer, E.E., 1985. Self-interpretation. Economics and Philosophy 1, 295–302. doi: .
[31] Leetaru, K., 2016. Can we forecast conflict? A framework for forecasting global human societal behavior using latent narrative indicators. Ph.D. thesis. University of Illinois at Urbana-Champaign.
[32] Leetaru, K.H., a. The gdelt project. URL: .
[33] Leetaru, K.H., b. THE GDELT GLOBAL KNOWLEDGE GRAPH (GKG) DATA FORMAT CODEBOOK v2.1. URL: http://gdeltproject.org/ .
[34] Leetaru, K.H., 2015. Mining libraries: Lessons learned from 20 years of massive computing on the world’s information. Information Services & Use 35, 31–50.
[35] Levenberg, A., Pulman, S., Moilanen, K., Simpson, E., Roberts, S., 2014. Predicting economic indicators from web text using sentiment composition. International Journal of Computer and Communication Engineering 3, 109–115.
[36] Loewenstein, G., 2000. Emotions in economic theory and economic behavior. American economic review 90, 426–432.
[37] Loughran, T., McDonald, B., 2011. When is a liability not a liability? textual analysis, dictionaries, and 10-ks. The Journal of Finance 66, 35–65.
[38] Magnini, B., Cavaglia, G., 2000. Integrating subject field codes into wordnet., in: LREC, pp. 1413–1418.
[39] Mihailov, A., Rumler, F., Scharler, J., 2011. The small open-economy new keynesian phillips curve: empirical evidence and implied inflation dynamics. Open Economies Review 22, 317–337.
[40] Nyman, R., Kapadia, S., Tuckett, D., Gregory, D., Ormerod, P., Smith, R., 2018. News and narratives in financial systems: exploiting big data for systemic risk assessment .
[41] Pekar, V., Binner, J., 2017. Forecasting consumer spending from purchase intentions expressed on social media, Association for Computational Linguistics.
[42] Piccardi, C., Tajoli, L., 2018. Complexity, centralization, and fragility in economic networks. PloS one 13, e0208265.
[43] Rousidis, D., Koukaras, P., Tjortjis, C., 2020. Social media prediction: a literature review. Multimedia Tools and Applications 79, 6279–6311.
[44] Salisu, A.A., Isah, K.O., Oyewole, O.J., Akanni, L.O., 2017. Modelling oil price-inflation nexus: The role of asymmetries. Energy 125, 97–106.
[45] Schaer, O., Kourentzes, N., Fildes, R., 2019. Demand forecasting with user-generated online information. International Journal of Forecasting 35, 197–212.
[46] Shiller, R.J., 2017. Narrative economics. American Economic Review 107, 967–1004.
[47] Slaper, T., Bianco, A., Lenz, P., 2018. Digital vapor trails: Using website behavior to nowcast entrepreneurial activity, in: 2nd International Conference on Advanced Research Methods and Analytics (CARMA 2018), Editorial Universitat Politècnica de València. pp. 107–113.
[48] Stern, S., Livan, G., Smith, R.E., 2020. A network perspective on intermedia agenda-setting. arXiv preprint arXiv:2002.05971 .
[49] Stock, J.H., Watson, M.W., 2002. Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics 20, 147–162.
[50] Strapparava, C., Valitutti, A., et al., 2004. Wordnet affect: an affective extension of wordnet., in: Lrec, Citeseer. p. 40.
[51] Thorsrud, L.A., 2016. Nowcasting using news topics. big data versus big bank .
[52] Tuckett, D., Ormerod, P., Smith, R., Nyman, R., 2014. Bringing social-psychological variables into economic modelling: Uncertainty, animal spirits and the recovery from the great recession. Animal Spirits and the Recovery from the Great Recession (January 12, 2014) .
[53] Van Eyden, R., Difeto, M., Gupta, R., Wohar, M.E., 2019. Oil price volatility and economic growth: Evidence from advanced economies using more than a century’s data. Applied Energy 233, 612–621.