ESG2Risk: A Deep Learning Framework from ESG News to Stock Volatility Prediction
Tian Guo, Nicolas Jamet, Valentin Betrix, Louis-Alexandre Piquet, Emmanuel Hauptmann
Systematic Equity Research, RAM Active Investments
Abstract
Incorporating environmental, social, and governance (ESG) considerations into systematic investments has drawn considerable attention recently. In this paper, we focus on ESG events in the financial news flow and explore the predictive power of ESG-related financial news on stock volatility. In particular, we develop a pipeline of ESG news extraction, news representation, and Bayesian inference of deep learning models. Experimental evaluation on real data from different markets demonstrates superior predicting performance, as well as the relation of high volatility predictions to stocks with potentially high risk and low return. It also shows the promise of the proposed pipeline as a flexible predicting framework for various textual data and target variables.
The widely adopted perspective for judging the sustainability of equity investments is along three pillars, E, S, and G, which stand for Environmental, Social, and Governance. In this paper, we propose a novel approach to volatility forecasting based on ESG newsflow, an original integration of ESG into the investment process. Recently, integrating sustainability into investment strategies has been receiving rapidly increasing attention in finance [3, 15]. Environmental metrics cover all aspects of a firm's interaction with the environment, such as its CO2 emissions, its approach to the climate change transition, or its broad strategy in the use of natural resources. The Social dimension encompasses all standards set by companies as they build relationships with employees, suppliers, and the communities in which they operate (labor conditions, equality, fairness to suppliers), while Governance covers leadership elements such as executive compensation, diversity of the board, and controversies. These ESG inputs are vital to assess the sustainability and the relevant risks of an investment position [16]. Conventionally, ESG-related factors are formatted as structured data to facilitate the integration of ESG aspects into quantitative models and the building of expertise in systematic ESG investing.

In our research, we study the predictive power of ESG news on volatility. We develop a pipeline for ESG news extraction and a state-of-the-art Transformer-based language model to predict stock volatility [18, 4, 9]. This pipeline can be generalized to other predicting targets with ease. As a measure of price fluctuations and market risk, volatility plays an important role in trading strategies, investment decisions, and position scaling. In this paper, we focus on predicting equity realized volatility, which is empirically calculated from the variance of observed returns of an asset.
Volatility predictions often rely on predictive models based purely on price/return time series, from standard statistical models of the GARCH family up to more recent deep-learning-based predictions [13].

The input to our models is an alternative source of ESG information: textual financial newsflow. Compared to structured ESG data provided by analysts or data vendors, ESG information from newsflow reflects more timely company events and offers an alternative channel for capturing the relation of ESG events to market dynamics in a timely manner. Numerous studies have demonstrated that financial news is closely related to the market and is becoming a gold mine for analyzing market participants' behaviour [12, 7]. An intuitive example is illustrated in Fig. 1.

(∗ Correspondence to [email protected])
T-Mobile, Sprint agree on new merger terms
Figure 1: An intuitive example of financial news and stock price movement from T-Mobile.

Though bearing rich information, ESG news is challenging to process for predicting models. Raw textual data is categorical and symbolically represented, which is a hindrance for quantitative models. Financial news is sparse in the sense that it moves in parallel with real-world events at irregular timings. This is in contrast to the structured and well-formatted market and factor data typically used in conventional quantitative models. Although a variety of works study predicting market behaviours with different data sources [5, 19, 17, 2], how to exploit the predictive power of ESG news on volatility is rarely researched.

To this end, we resort to natural language processing (NLP) and deep learning techniques to explore the predictive power of ESG news. In particular, one key NLP technique, which helps overcome the challenges above, is language representation (i.e. text embedding) [18, 4, 10]. Deep neural networks, e.g. recurrent neural networks and Transformers, trained on large-scale text corpora exhibit remarkable success in a variety of NLP applications, such as sentiment analysis, text matching, dialogue systems, and so on. This technique transforms text symbols into numerical high-dimensional dense vectors, while importantly still preserving context and semantic relations.
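As a toy sketch of this idea, a text can be mapped to a dense vector by pooling token vectors and compared by cosine similarity; the tiny random "vocabulary" below merely stands in for a pre-trained language model, so the words and dimensions are purely illustrative:

```python
# Toy illustration of text embedding: each token maps to a dense vector, a
# sentence is mean-pooled into one vector, and semantically close texts end
# up close in the vector space. Random vectors replace a real trained model.
import numpy as np

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=8) for w in
         ["t-mobile", "sprint", "agree", "on", "new", "merger", "terms", "deal"]}

def embed(text):
    """Mean-pool the token vectors of a sentence into one dense vector."""
    vecs = [vocab[w] for w in text.lower().split() if w in vocab]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = embed("T-Mobile Sprint agree on new merger terms")
v2 = embed("new merger deal")
print(cosine(v1, v2))  # similarity of the two texts in the embedding space
```

With a genuinely pre-trained model, the pooled vectors would preserve semantics, so paraphrases score higher than unrelated sentences.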
Contributions.
Specifically, the contributions of this paper are as follows:
• We propose an NLP and deep learning based pipeline for ESG news integration.
• We exploit a Transformer-based language model to transform textual news into numerical representations, including sentiments and semantic-preserving text embeddings.
• The predicting model is based on Bayesian inference, in order to enable stable and robust predicting.
• Evaluation on real data from different markets demonstrates superior predicting performance as well as the efficacy of the volatility prediction in stock selection.
In this part, we give an overview of the proposed ESG2Risk framework and define the notations used throughout the paper.
Pipeline.
Fig. 2 illustrates the pipeline, which mainly consists of four components: ESG news extraction, transformation, deep learning models, and the strategy. The news flow represents streaming pieces of news, which can be obtained by a variety of tools, e.g. online news feeds, finance data vendors, web crawlers, and so on.

Figure 2: Pipeline of the ESG2Risk Framework.

We first outline the main components of ESG2Risk as follows, and then describe the components of ESG news extraction, transformation, and deep learning models in the next subsection.

• ESG news extraction: generic financial news flow rarely provides labelled ESG news, and thus we developed an in-house extraction process. Streaming financial news first goes through a filter, which makes use of an ESG vocabulary defined by our domain experts. The output is ESG-related news and the corresponding companies mentioned in each piece of news. The mentioned companies are commonly provided as accompanying attributes by data vendors and can also be derived by linking entities in news to a stock dictionary. For instance, Fig. 3 shows the top ESG topics in news within a one-week period.

Figure 3: Top-15 ESG topics in the news from an example time period, April 06, 2020 to April 12, 2020: Fraud, Appeal, Lawsuit, Customer safety, Intellectual property, Compliance, Executive compensation, Board compensation, Customer health, Patent, Public health, Antitrust, Employee health, Workforce issues, Supply chain.

• Transformation: news is represented by symbolic textual data, which is unstructured and infeasible for quantitative models to process.
As a result, this step aims to transform news into quantitative representations by a Transformer-based language model and sentiment analysis [18, 1, 4]. Language modelling, or text embedding, is a powerful technique in natural language processing (NLP), which is able to transform symbolically represented text into numerical dense vectors preserving the semantic relatedness of the text in the numerical space. More details will be given in the following.

• Models: in the model training phase, we collect a dataset including pairs of numerical news representations and the volatility of the corresponding companies for supervised training of the predicting model. In the inference phase of the production system, the newly arriving numerical news representations are fed into the trained models to predict the future volatility of the companies mentioned in the news. For both training and inference phases, we take advantage of Bayesian inference to enable stable learning and robust predicting [9, 14]. Details will be presented in the next subsection.

• Strategy: using the volatility predictions from the deep learning models, either stand-alone or combined with other signals, we design a stock selection strategy.
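The extraction step described in the pipeline above can be sketched as a simple vocabulary filter; the vocabulary terms and record fields below are illustrative assumptions, not the in-house versions:

```python
# Minimal sketch of the ESG news filter: keep a news item if its text hits
# the ESG vocabulary, and record the companies it mentions. The vocabulary
# here is a tiny illustrative stand-in for the expert-defined one.
ESG_VOCAB = {"emissions", "lawsuit", "compensation", "antitrust", "supply chain"}

def extract_esg_news(news_flow):
    """Yield (text, companies) for news whose text matches the ESG vocabulary."""
    for item in news_flow:
        text = item["text"].lower()
        if any(term in text for term in ESG_VOCAB):
            yield item["text"], item.get("companies", [])

flow = [
    {"text": "Regulator opens antitrust probe into ACME", "companies": ["ACME"]},
    {"text": "ACME releases new phone", "companies": ["ACME"]},
]
print(list(extract_esg_news(flow)))  # only the antitrust item passes
```

In production, the company attribution would come from the data vendor or an entity-linking step against a stock dictionary, as described above.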
Problem Definition.
We formally define the problem of ESG news based volatility predicting as follows.

Market risk exists because of price changes [13]. The volatility of a stock i is used to characterize the risk and the return fluctuation over time. Let p_{i,t} be the price of stock i at the end of a trading period t, with closing returns r_{i,t} given by

r_{i,t} = p_{i,t} / p_{i,t-1} - 1.    (1)

In this paper, we focus on the realized volatility, which is defined as:

v_{i,t} = \sqrt{ \sum_j r_{i,j}^2 / K_t },    (2)

where K_t is the number of return samples.

Define the universe of stocks as I. At time t, a set of stocks I_t ⊆ I is identified with ESG news in a sliding window w.r.t. t. This means that a stock i ∈ I_t has ESG news mentions during the time period [t - w, t], where w is the window length. Then, the news mentioning i at time t is represented by N_{i,t} = {n_m}_{m=1,...,M_{i,t}}, where M_{i,t} is the number of ESG news items corresponding to stock i in the time window [t - w, t], and n_m denotes the text of one piece of news.

In the inference phase, given observed N_{i,t}, we predict the forward volatility v_{i,t+Δ} by a quantitative model f(·), namely \hat{v}_{i,t+Δ} = f(Trm(N_{i,t})), where Trm(·) is the operation of transforming unstructured news text into model-friendly numerical data and will be described in the next subsection, and Δ is the forecasting horizon. Then, based on the set of predictions {\hat{v}_{i,t+Δ}}_{i ∈ I_t}, we can further develop stock selection strategies.

In the learning phase, we collect paired news representations and ground-truth forward volatility, denoted by a dataset D = {(N_{i,t}, v_{i,t+Δ})}_{i ∈ I, t ∈ T}, to train the predicting model f(·) in the supervised way.

In this part, we describe how to obtain numerical representations of news. It is mainly based on pre-trained language models [18, 4]. Conceptually, the language model is designed to learn the usage of various words and phrases, and how the language is written in general.
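As a concrete reference for the prediction targets defined in Eqs. (1) and (2), the closing returns and realized volatility can be computed as follows; the price series is synthetic and only the mechanics are illustrated:

```python
# Hedged sketch of Eq. (1)-(2): closing returns from a price series, and the
# realized volatility over a window of K_t return samples.
import numpy as np

def returns(prices):
    """r_t = p_t / p_{t-1} - 1 for consecutive trading periods."""
    return prices[1:] / prices[:-1] - 1.0

def realized_vol(r):
    """v = sqrt(sum_j r_j^2 / K), with K the number of return samples."""
    return float(np.sqrt(np.mean(r ** 2)))

p = np.array([100.0, 102.0, 101.0, 103.0])  # synthetic closing prices
r = returns(p)
print(realized_vol(r))  # realized volatility over the 3 return samples
```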
Technically, the typical building blocks in contemporary language models are recurrent neural networks, convolutional neural networks, or the more recently proposed Transformer architecture. A variety of language models are developed by stacking certain types of building blocks with specialized training procedures.

During the training phase of a language model, each token of a text segment is initialized as a numerical dense vector. These numerical vectors are fed into (stacked) building blocks to capture contextual relationships among words. Subsequently, the output vectors of the building blocks are used in some language predicting tasks as the training objective, for instance, next word prediction, masked word prediction, next sentence prediction, etc.

A language model is typically pre-trained on a large corpus of text, for instance, the complete Wikipedia dump (2,500 million words), the Book Corpus (800 million words), etc. The training process aims to maximize the accuracy of these language predicting tasks, so as to learn vectors representing the semantic relations in a large amount of text. This pre-trained language model serves as a Swiss Army knife for downstream NLP tasks. For instance, sentiment classification is proven to benefit from text embedding by pre-trained language models [1]. Refer to Fig. 4 for an illustration of text embedding.

Figure 4: An illustrative example of embedding text into a numerical semantic space. Raw news text is transformed into dense vectors in the numerical space. The different colours represent the correspondence between text and vectors (i.e. points) in the space.
Texts with semantic closeness are mapped to points close in the semantic space.

In our case, given news N_{i,t} = {n_m}, the transformation is defined as follows:

Trm(N_{i,t}) := ( pool_s({s_m}), pool_e({e_m}) ),    (3)

where s_m and e_m represent the sentiment information and text embedding of one piece of news. In particular, we use a sentiment analyzer trained on text embeddings to extract sentiment scores over segments of news content, which form the fine-grained sentiment vector s_m for each piece of news. By feeding ESG news text into a pre-trained Transformer-based language model (e.g. BERT, RoBERTa, etc.) [4, 10], the derived vectors {e_m} are quantitative-model-friendly and conveniently used in the subsequent learning of the volatility predicting model.

Note that different stocks have different numbers of ESG news items over time. We apply pooling operations to {s_m} and {e_m}, in order to format them as structured data across stocks and time instants. The derived dataset of uniform shape is then ready for feeding the volatility model. We choose simple average pooling for pool_s(·) and pool_e(·), i.e. pool({x_k}) := Σ_k x_k / |{x_k}|, while it is flexible to use different ones.

In this part, we present the architecture of the volatility model and the Bayesian learning and inference. Since we have two types of data, sentiment and text embedding from Trm(N_{i,t}), we design an encoder and information fusion architecture, as shown in Fig. 5. The idea is to capture data-modality-specific information by individual encoders and then to perform prediction using fused information from the encoders. Different information fusion strategies, e.g. attention mechanisms, mixtures, etc., can be applied [6].
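Before detailing the encoders, the average pooling of Eq. (3), which turns a variable number of news items per stock into fixed-shape features, can be sketched as follows; the sentiment and embedding dimensions are illustrative placeholders:

```python
# Sketch of the average pooling in Eq. (3): stocks carry varying numbers of
# news items, so per-news sentiment vectors and text embeddings are averaged
# into fixed-shape features Trm(N_{i,t}).
import numpy as np

def pool(xs):
    """pool({x_k}) := sum_k x_k / |{x_k}| (simple average pooling)."""
    return np.mean(np.stack(xs), axis=0)

# Three pieces of news for one stock: a 2-dim sentiment vector and a
# 4-dim embedding per news item (placeholder sizes).
sentiments = [np.array([0.9, 0.1]), np.array([0.2, 0.8]), np.array([0.4, 0.6])]
embeddings = [np.ones(4), 3 * np.ones(4), 2 * np.ones(4)]

features = (pool(sentiments), pool(embeddings))  # plays the role of Trm(N_{i,t})
print(features[0], features[1])  # [0.5 0.5] and [2. 2. 2. 2.]
```

Any other permutation-invariant pooling (max, attention-weighted) could be swapped in without changing the downstream model interface.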
For the encoders, we choose stacked dense layers (with residual connections) [8], though alternative encoders are free to choose.

Figure 5: Architecture of the Volatility Model Parameterized by Θ.

Instead of using the conventional squared error loss as the learning objective, we aim to learn the posterior of the model parameters in the Bayesian setting, as defined below:

p(Θ | D) ∝ Π_{i,t} p_Θ(v_{i,t+Δ} | Trm(N_{i,t})) · p(Θ),    (4)

where the product over (i, t) is the likelihood, p(Θ) is the prior, and the set of model parameters to learn from data is denoted by the random variables Θ.

In our task, the target variable, volatility, is inherently noisy and volatile. More challengingly, the sentiment and embedding vectors are high-dimensional; for instance, the embedding derived by BERT is a high-dimensional vector (768 dimensions for the base model). These present a great challenge for conventional stochastic gradient descent based training algorithms to learn stable patterns in the data.

On the contrary, Bayesian-style learning is probabilistic and theoretically designed to handle the stochasticity inherent in the data [9, 14]. Meanwhile, learning the posterior leads to ensemble inference, which aggregates predictions from a set of model realizations. It has been shown in many applications that ensemble inference gives rise to more robust and accurate predicting performance [11, 9, 14]. From the perspective of the maximum a posteriori (MAP) estimate, the prior in Eq. 4 also functions as a regularization term to ensure the generalization ability of the learned model.

Figure 6: Illustration of Bayesian Model Ensemble. The model in each block represents a realization of the model in Fig. 5 by the parameter sample Θ.

Concretely, we use the modern stochastic gradient based Markov Chain Monte Carlo (SG-MCMC) method to obtain the approximate posterior of p(Θ | D) in the training phase [11, 9].
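As an illustration of SG-MCMC training, below is a minimal stochastic-gradient Langevin dynamics sketch on a toy one-parameter Gaussian model; the step size, prior, and data are all assumptions for illustration, and averaging the later samples mirrors the ensemble prediction used at inference time:

```python
# Toy sketch of stochastic-gradient Langevin dynamics (one SG-MCMC variant):
# the parameter follows the gradient of the log-posterior estimated on a
# minibatch, plus injected Gaussian noise, yielding approximate posterior
# samples rather than a single point estimate.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=1000)  # synthetic observations

def grad_log_post(theta, batch):
    # N(0, 10^2) prior on theta + Gaussian likelihood; the minibatch
    # gradient is rescaled to the full dataset size.
    n = len(data)
    return -theta / 100.0 + n * np.mean(batch - theta)

eps = 1e-4  # step size (assumption)
theta, samples = 0.0, []
for step in range(2000):
    batch = rng.choice(data, size=100)
    theta += 0.5 * eps * grad_log_post(theta, batch) + rng.normal(0.0, np.sqrt(eps))
    samples.append(theta)

print(np.mean(samples[1000:]))  # posterior-mean estimate, near the data mean
```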
Then, in the inference phase, given a testing set of news for stock j, the predictive probability density of the volatility is derived as:

p(v_{j,t+Δ} | Trm(N_{j,t}), D) = ∫_Θ p_Θ(v_{j,t+Δ} | Trm(N_{j,t})) · p(Θ | D) dΘ.    (5)

Empirically, we use the Monte Carlo method to sample parameters Θ_c from the posterior, thereby obtaining a set of model realizations, as shown in Fig. 6. Each model realization provides the predictive mean of the volatility, and the overall prediction is derived as:

\hat{v}_{j,t+Δ} = (1/C) Σ_{c=1}^{C} E[ v_{j,t+Δ} | Trm(N_{j,t}), Θ_c ],  Θ_c ∼ p(Θ | D).    (6)

In this part, we report evaluation results to demonstrate the efficacy of our predicting pipeline.
Data. From a collection of financial news in the time period from 2003 to 2019, we extract around 50 thousand ESG-related news items. We link ESG news to companies from two different markets, i.e. MSCI US and All Cap EU. Then, for each market, we build the training and validation data using the time period from 2003 to 2014, while the rest is used as the out-of-sample testing data. We evaluate the predicting performance of each market independently. The results reported below are from the testing data.
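The chronological split described above can be sketched as follows, assuming a simple record layout with a date field (the field names and cut-off are illustrative):

```python
# Sketch of a chronological train/test split: samples dated before the
# cut-off go to training/validation, the remainder is held out for
# out-of-sample testing. ISO date strings compare correctly as strings.
samples = [
    {"date": "2010-05-01", "features": [0.1], "vol": 0.4},
    {"date": "2016-07-01", "features": [0.3], "vol": 0.6},
    {"date": "2018-02-01", "features": [0.2], "vol": 0.5},
]

CUTOFF = "2015-01-01"
train_val = [s for s in samples if s["date"] < CUTOFF]
test = [s for s in samples if s["date"] >= CUTOFF]
print(len(train_val), len(test))  # 1 2
```

A time-based split (rather than a random one) is what makes the reported test performance genuinely out-of-sample.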
Performance. Tables 1 and 2 exhibit the errors for two different predicting horizons, 1 week and 2 weeks. We report the root mean squared error (RMSE) and mean absolute error (MAE). For comparison, the performance of the model using solely sentiments, i.e. Senti in the tables, and our ESG2Risk, using both sentiment and text embedding, are reported. It is observed that on both markets, ESG2Risk significantly outperforms the Senti method.

Table 1: One Week Forward Volatility Predicting Errors. Bold values indicate better performance.
Market | RMSE (Senti / ESG2Risk) | MAE (Senti / ESG2Risk)
MSCI-US | 0.663 / … | …
AC-EU | 0.630 / … | …

Table 2: Two Week Forward Volatility Predicting Errors. Bold values indicate better performance.
Market | RMSE (Senti / ESG2Risk) | MAE (Senti / ESG2Risk)
MSCI-US | 0.669 / … | …
AC-EU | 0.624 / … | …
To assess the possibility of using our ESG2Risk predictions to build portfolios with more attractive risk-adjusted returns, we split the stocks based on quintiles of the volatility predictions and calculate the average return of each quintile with 1-week and 2-week holding periods. The average realized standard deviation of returns and the returns of these quintile portfolios are shown in Figs. 7 and 8 and Figs. 9 and 10, respectively.

Figure 7: MSCI US Quintile Portfolio Return Standard Deviation based on 1 and 2-week forward volatility predictions (bars Qu_0 to Qu_4; y-axis: standard deviation of return over 1 week / 2 weeks).

Figure 8: All Cap EU Quintile Portfolio Return Standard Deviation based on 1 and 2-week forward volatility predictions (bars Qu_0 to Qu_4; y-axis: standard deviation of return over 1 week / 2 weeks).

Quintile portfolios of stocks with high predicted volatility have significantly higher volatility than low-predicted-volatility portfolios in our out-of-sample test. The portfolio built with the highest-predicted-risk names (the Qu_4 portfolio) also exhibits significantly lower returns than the other quintile portfolios, both in the MSCI US and All Cap Europe investment universes, as shown in Figs. 9 and 10. Extending the findings of existing work on structured ESG rating data [3], integration of ESG news flow data can positively contribute to equity portfolio returns.

Figure 9: MSCI US Quintile Portfolio Return based on 1 and 2-week forward volatility predictions (bars Qu_0 to Qu_4; y-axis: average return over 1 week / 2 weeks).

Figure 10: All Cap EU Quintile Portfolio Return based on 1 and 2-week forward volatility predictions (bars Qu_0 to Qu_4; y-axis: average return over 1 week / 2 weeks).
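The quintile construction used in this analysis can be sketched with pandas; the predictions and forward returns below are synthetic, so only the mechanics are illustrated:

```python
# Sketch of the quintile analysis: rank stocks by predicted volatility,
# bucket them into quintiles (Qu_0 lowest ... Qu_4 highest), and compare
# the realized return and return dispersion per bucket. Synthetic data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "pred_vol": rng.uniform(0.1, 0.9, size=100),    # predicted volatility
    "fwd_return": rng.normal(0.0, 0.02, size=100),  # realized forward return
})
df["quintile"] = pd.qcut(df["pred_vol"], q=5,
                         labels=[f"Qu_{i}" for i in range(5)])

summary = df.groupby("quintile", observed=True)["fwd_return"].agg(["mean", "std"])
print(summary)  # average return and return std per predicted-vol quintile
```

On real data, the finding above corresponds to the mean return of Qu_4 being markedly lower, and its return std markedly higher, than the other buckets.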
In this paper, we implement a novel deep learning framework, ESG2Risk, to predict the future volatility of stock prices. We show that a Transformer-based language model successfully manages to extract information from ESG newsflow to predict the future volatility of stock returns. Our model's volatility predictions are most accurate when identifying the stocks with the highest volatility risk in the market, hence the worst potential risk contributors to an equity selection. Our research gives evidence that ESG newsflow does significantly impact the future return and risk of companies and is a relevant factor for investors to consider when investing. Our findings in different geographies confirm that ESG newsflow integration can contribute to building profitable investment strategies, on top of improving the ESG profile of an equity selection.
References
[1] Dogu Araci. FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063, 2019.
[2] Johannes Beck, Roberta Huang, David Lindner, Tian Guo, Zhang Ce, Dirk Helbing, and Nino Antulov-Fantulin. Sensing social media signals for cryptocurrency news. In Companion Proceedings of The 2019 World Wide Web Conference, pages 1051–1054, 2019.
[3] Indrani De and Michelle R. Clayman. The benefits of socially responsible investing: An active manager's perspective. The Journal of Investing, 24(4):49–72, 2015.
[4] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.
[5] Tian Guo, Albert Bifet, and Nino Antulov-Fantulin. Bitcoin volatility forecasting with a glimpse into buy and sell orders. Pages 989–994. IEEE, 2018.
[6] Tian Guo, Tao Lin, and Nino Antulov-Fantulin. Exploring interpretable LSTM neural networks over multi-variable data. In International Conference on Machine Learning (ICML), pages 2494–2504, 2019.
[7] Ziniu Hu, Weiqing Liu, Jiang Bian, Xuanzhe Liu, and Tie-Yan Liu. Listening to chaotic whispers: A deep learning framework for news-oriented stock trend prediction. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pages 261–269, 2018.
[8] Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daumé III. Deep unordered composition rivals syntactic methods for text classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1681–1691, 2015.
[9] Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. Simple and scalable predictive uncertainty estimation using deep ensembles. In Advances in Neural Information Processing Systems, pages 6402–6413, 2017.
[10] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A lite BERT for self-supervised learning of language representations. In International Conference on Learning Representations, 2019.
[11] Chunyuan Li, Changyou Chen, David Carlson, and Lawrence Carin. Preconditioned stochastic gradient Langevin dynamics for deep neural networks. In Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[12] Qikai Liu, Xiang Cheng, Sen Su, and Shuguang Zhu. Hierarchical complementary attention network for predicting stock price movements with news. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1603–1606, 2018.
[13] Yang Liu. Novel volatility forecasting using deep learning–long short term memory recurrent neural networks. Expert Systems with Applications, 132:99–109, 2019.
[14] Wesley J. Maddox, Pavel Izmailov, Timur Garipov, Dmitry P. Vetrov, and Andrew Gordon Wilson. A simple baseline for Bayesian uncertainty in deep learning. In Advances in Neural Information Processing Systems, pages 13132–13143, 2019.
[15] Zoltán Nagy, Altaf Kassam, and Linda-Eling Lee. Can ESG add alpha? An analysis of ESG tilt and momentum strategies. The Journal of Investing, 25(2):113–124, 2016.
[16] Remmer Sassen, Anne-Kathrin Hinze, and Inga Hardeck. Impact of ESG factors on firm risk in Europe. Journal of Business Economics, 86(8):867–904, 2016.
[17] Robert P. Schumaker and Hsinchun Chen. Textual analysis of stock market prediction using breaking financial news: The AZFinText system. ACM Transactions on Information Systems (TOIS), 27(2):1–19, 2009.
[18] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
[19] Bin Weng, Lin Lu, Xing Wang, Fadel M. Megahed, and Waldyn Martinez. Predicting short-term stock prices using ensemble methods and online data sources.