Machine Learning Panel Data Regressions with an Application to Nowcasting Price Earnings Ratios
Andrii Babii ∗ Ryan T. Ball † Eric Ghysels ‡ Jonas Striaukas § August 11, 2020
Abstract
This paper introduces structured machine learning regressions for prediction and nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the empirical problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization. This type of regularization can take advantage of mixed frequency time series panel data structures, and we find that it empirically outperforms unstructured machine learning methods. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data exhibit heavier than Gaussian tails. To that end, we leverage a novel Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed τ-mixing processes, which may be of independent interest in other high-dimensional panel data settings.

Keywords: corporate earnings, nowcasting, high-dimensional panels, mixed frequency data, text data, sparse-group LASSO, heavy-tailed τ-mixing processes, Fuk-Nagaev inequality.

∗ University of North Carolina at Chapel Hill - Gardner Hall, CB 3305 Chapel Hill, NC 27599-3305. Email: [email protected]
† Stephen M. Ross School of Business, University of Michigan, 701 Tappan Street, Ann Arbor, MI 48109. Email: [email protected]
‡ Department of Economics and Kenan-Flagler Business School, University of North Carolina–Chapel Hill. Email: [email protected]
§ LIDAM UC Louvain and FRS-FNRS Research Fellow. Email: [email protected]

1 Introduction
The fundamental value of equity shares is determined by the discounted value of future payoffs. Every quarter, investors get a glimpse of a firm's potential payoffs with the release of corporate earnings reports. In a data-rich environment, stock analysts have many indicators regarding future cash flows that are available much more frequently. Ball and Ghysels (2018) took a first stab at automating the process using MIDAS regressions. Since their original work, much progress has been made on machine learning regularized mixed frequency regression models. In the current paper, we significantly expand the tools of nowcasting in a data-rich environment by exploiting panel data structures. Panel data regression models are well suited for firm-level data analysis as both the time series and cross-section dimensions can be properly modeled. In such models, time-invariant firm-specific effects are typically modeled in a flexible way which allows capturing heterogeneity in the data. At the same time, machine learning methods are becoming increasingly popular in economics and finance as a flexible way to model relationships between the response and covariates.

In the present paper, we analyze panel data regressions in a high-dimensional setting where the number of time-varying covariates can be very large and potentially exceed the sample size. This may happen when the number of firm-specific characteristics, such as textual analysis news data or firm-level stock returns, is large, and/or the number of aggregates, such as market returns, macro data, etc., is large. In our theoretical treatment, we obtain oracle inequalities for pooled and fixed effects LASSO-type panel data estimators allowing for heavy-tailed τ-mixing data. To recognize time series data structures, we rely on the more general sparse-group LASSO (sg-LASSO) regularization with dictionaries, which typically improves upon the unstructured LASSO estimator for time series data in small samples.
Importantly, our theory covers the LASSO and group LASSO estimators as special cases. To recognize that economic and financial data often have heavier than Gaussian tails, our theoretical treatment relies on a new Fuk-Nagaev panel data concentration inequality. This allows us to characterize the dependence of the performance of LASSO-type estimators on N (cross-section) and T (time series), which is especially relevant for modern panel data applications, where both N and T can be large; see Fernández-Val and Weidner (2018) for a recent literature review focusing on the low-dimensional panel data case.

Our paper is related to the recent work of Fosten and Greenaway-McGrevy (2019). (See Babii, Ghysels, and Striaukas (2020b) for an application of the sg-LASSO to the ADL-MIDAS model and US GDP nowcasting.)

An empirical application to nowcasting firm-specific price/earnings ratios (henceforth P/E ratios) is provided. We focus on current quarter nowcasts, hence evaluating model-based within-quarter predictions for very short horizons. It is widely acknowledged that P/E ratios are a good indicator of the future performance of a particular company and are therefore used by analysts and investment professionals to base their decisions on which stocks to pick for their investment portfolios. A typical value investor relies on consensus forecasts of earnings made by a pool of analysts. Hence, we naturally benchmark our proposed machine learning methods against such predictions. Besides, we compare our methods with the forecast combination approach used by Ball and Ghysels (2018) and a simple random walk (RW).

Our high-frequency regressors include traditional macro and financial series as well as non-standard series generated by textual analysis. We consider structured pooled and fixed effects sg-LASSO panel data regressions with mixed frequency data (sg-LASSO-MIDAS).
The fixed effects estimator yields sparser models compared to the pooled regressions, with revenue growth and the first lag of the dependent variable selected throughout the out-of-sample period. The BAA less AAA bond yield spread, firm-level volatility, and the news textual analysis Aggregate Event Sentiment index are also selected very frequently. Our results show the superior performance of sg-LASSO-MIDAS over analysts' predictions, the forecast combination method, and firm-specific time series regression models. Besides, the sg-LASSO-MIDAS regressions perform better than unstructured panel data regressions with the elastic net regularization.

Regarding the textual news data, it is worth emphasizing that the time series of news data is sparse, since many days are without firm-specific news and we impute zero values. A nice property of our mixed frequency data treatment with dictionaries is that imputing zeros also implies that the non-zero entries get weights with a decaying pattern for distant past values in comparison to the most recent daily news data. As a result, our ML approach is particularly useful to model news data, which is sparse in nature. (Panel data regressions with the LASSO penalty have been used in microeconometrics since Koenker (2004); see also Lamarche (2010), Kock (2013), Belloni, Chernozhukov, Hansen, and Kozbur (2016), Lu and Su (2016), Kock (2016), Harding and Lamarche (2019), Chiang, Rodrigue, and Sasaki (2019), and Chernozhukov, Hausman, and Newey (2019), among others. The group LASSO is considered in Su, Shi, and Phillips (2016), Lu and Su (2016), and Farrell (2015), among others.)

The paper is organized as follows. Section 2 introduces the models and estimators. Oracle inequalities for sparse-group LASSO panel data regressions appear in Section 3. Section 4 covers Fuk-Nagaev inequalities for panel data. Results of our empirical application analyzing price earnings ratios for a panel of individual firms are reported in Section 5. Technical material appears in the Appendix and conclusions appear in Section 6.
In this section we describe briefly the methodological approach, while in Section 3 we provide more details and the supporting theoretical results. We focus on the pooled and the fixed effects panel regressions with the sparse-group LASSO (sg-LASSO) regularization. The best linear predictor for firm $i = 1,\dots,N$ in the panel data setting is

$$\alpha_i + x_{it}^\top\beta,$$

where $\alpha_i$, $i = 1,\dots,N$, are fixed intercepts. We consider predictive regressions with homogeneous and heterogeneous entity-specific intercepts.

2.1 Pooled sg-LASSO

In the pooled regressions, we ignore the cross-sectional heterogeneity, assuming that $\alpha_i = \alpha$ for all $i = 1,\dots,N$, and the pooled sg-LASSO estimator is a solution to

$$\min_{(a,b)\in\mathbb{R}\times\mathbb{R}^p} \|y - a\iota - Xb\|_{NT}^2 + 2\lambda\,\Omega(b),$$

where

$$\Omega(b) = \gamma|b|_1 + (1-\gamma)\|b\|_{2,1}$$

is a penalty function and $\gamma\in[0,1]$ is the relative weight of the LASSO and the group LASSO penalties.

Intuitively, the low- or high-frequency lags of a single covariate define a group, which might be a dense signal provided that this covariate is relevant for prediction. Dense signals are not well captured by the unstructured LASSO estimator of Tibshirani (1996). Indeed, the lags of a single covariate are temporally related; hence, taking this group structure into account might improve upon the predictive performance of the unstructured LASSO estimator in small samples; see Section 2.3 for more details on how the dense time series signal is mapped into the sparse-group structure with dictionaries.

2.2 Fixed effects sg-LASSO
In contrast to the pooled regressions, in the fixed effects regressions we estimate the heterogeneous intercepts $\alpha_i$, $i = 1,\dots,N$, and use them subsequently to construct the best linear predictors. The fixed effects sg-LASSO estimator is a solution to

$$\min_{(a,b)\in\mathbb{R}^{N+p}} \|y - Ba - Xb\|_{NT}^2 + 2\lambda\,\Omega(b),$$

where $B = I_N\otimes\iota$ and Ω is the sparse-group LASSO penalty. Note that we consider the fixed effects as a dense signal and leave them unpenalized. The sparse-group structure is defined by low- and high-frequency lags similarly to the pooled regressions, as explained in the following subsection.
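As a concrete illustration (not from the paper), the fixed-effects design matrix $B = I_N\otimes\iota$ and the sg-LASSO penalty Ω can be sketched in a few lines of numpy; the dimensions and group structure below are hypothetical:

```python
import numpy as np

def sg_lasso_penalty(b, groups, gamma):
    """Sparse-group LASSO penalty: gamma*|b|_1 + (1-gamma)*sum_G |b_G|_2."""
    lasso = np.sum(np.abs(b))
    group = sum(np.linalg.norm(b[g]) for g in groups)
    return gamma * lasso + (1.0 - gamma) * group

# Fixed-effects design: B = I_N kron iota_T stacks one intercept per firm
N, T, p = 3, 4, 6                          # hypothetical dimensions
B = np.kron(np.eye(N), np.ones((T, 1)))    # shape (N*T, N)

# Each group collects the (high-frequency) lags of one covariate
groups = [np.arange(0, 3), np.arange(3, 6)]   # two covariates, 3 lags each
b = np.array([0.5, -0.2, 0.0, 0.0, 0.0, 0.0])
print(sg_lasso_penalty(b, groups, gamma=0.5))
```

With `gamma=1` the penalty reduces to the LASSO, with `gamma=0` to the group LASSO, matching the interpolation described above; the fixed effects (columns of B) are deliberately left out of the penalized block.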
2.3 MIDAS regressions and dictionaries

Motivated by our empirical application, we allow the high-dimensional set of covariates to be sampled at a higher frequency than the outcome variable. Let K be the total number of time-varying covariates $\{x_{i,t-j/m,k}:\ i\in[N],\ t\in[T],\ j\in[m],\ k\in[K]\}$, possibly measured at some higher frequency with m observations for every low-frequency period t, and consider the following MIDAS panel data regression

$$y_{it} = \alpha_i + \sum_{k=1}^K \psi(L^{1/m};\beta_k)x_{it,k} + u_{it},$$

where $\psi(L^{1/m};\beta_k)x_{it,k} = \frac{1}{m}\sum_{j=1}^m \beta_{j,k}x_{i,t-j/m,k}$ is the high-frequency lag polynomial. For m = 1, we retain the standard panel data regression model, while for m > 1 high-frequency lags of $x_{i,t,k}$ are also included. For large m, there is a proliferation of the total number of estimated parameters which reduces the finite-sample predictive performance.

The MIDAS literature offers various parametrizations of the weights; see Ghysels, Santa-Clara, and Valkanov (2006) and Ghysels, Sinko, and Valkanov (2006). More recently, Babii, Ghysels, and Striaukas (2020b) proposed a new approach based on dictionaries linear in parameters to approximate the MIDAS weight function, which is particularly useful for high-dimensional MIDAS regression models. The sparse-group LASSO allows for a data-driven approximation to the MIDAS weight functions from the dictionary, promoting sparsity between groups (covariate selection) and within groups (MIDAS weight approximation). (The pooled and the fixed effects estimators can be efficiently computed using a variant of the block coordinate descent algorithm proposed by Simon, Friedman, Hastie, and Tibshirani (2013).) The weight function is approximated with a dictionary $w_L(s) = (w_1(s),\dots,w_L(s))^\top$ as $\omega_L(s;\beta_k) = (w_L(s))^\top\beta_k$.
Instead of estimating a large number of parameters pertaining to the high-frequency lag polynomial $\psi(L^{1/m};\beta_k)x_{it,k} = \frac{1}{m}\sum_{j=1}^m\beta_{j,k}x_{i,t-j/m,k}$, we estimate a lower-dimensional parameter $\beta_k$ in

$$\frac{1}{m}\sum_{j=1}^m \omega_L(j/m;\beta_k)x_{i,t-j/m,k}.$$

An attractive feature of the sparse-group LASSO estimator is that it can learn the MIDAS weight function from the dictionary in a nonlinear data-driven way and, at the same time, select covariates defined as groups of time series lags. In practice, dictionaries lead to an appropriately structured design matrix X; see Babii, Ghysels, and Striaukas (2020b) for more details on how to construct the design matrix. Note that such weights depend linearly on the parameter $\beta_k$, which allows for the efficient estimation of the high-dimensional MIDAS panel regression model, cf. Khalaf, Kichian, Saunders, and Voia (2020) for the low-dimensional non-linear case. A suitable dictionary for our purposes are Legendre polynomials, which are in the class of orthogonal polynomials and have very good approximating properties. In practice, orthogonal polynomials typically outperform non-orthogonal counterparts, e.g. Almon polynomials, or unrestricted lags in small samples; see Babii, Ghysels, and Striaukas (2020b) for further details and a Monte Carlo simulation study supporting this choice.
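A minimal sketch of this dictionary construction, using numpy's Legendre basis shifted to the lag positions $j/m\in(0,1]$; the function name `midas_design` and all dimensions are hypothetical:

```python
import numpy as np
from numpy.polynomial import legendre

def midas_design(x_hf, L):
    """Aggregate m high-frequency lags into L dictionary regressors.

    x_hf : (n_obs, m) array; row t holds x_{t-1/m}, ..., x_{t-m/m}.
    Returns an (n_obs, L) array whose columns multiply beta_k directly.
    """
    n, m = x_hf.shape
    s = np.arange(1, m + 1) / m               # lag positions j/m in (0, 1]
    # Legendre polynomials live on [-1, 1]; map the lag positions there
    W = legendre.legvander(2 * s - 1, L - 1)  # (m, L) dictionary w_L(j/m)
    return x_hf @ W / m                       # (1/m) sum_j w_L(j/m) x_{t-j/m}

# Hypothetical example: 5 low-frequency obs, m = 12 high-frequency lags, L = 3
rng = np.random.default_rng(0)
x_hf = rng.standard_normal((5, 12))
Z = midas_design(x_hf, L=3)
print(Z.shape)
```

Each covariate thus contributes one group of L columns to the design matrix, so group selection by the sg-LASSO amounts to covariate selection, while within-group sparsity shapes the estimated weight function.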
(More precisely, we can approximate any continuous weight function in the $L^\infty[0,1]$ norm and, more generally, any square-integrable function in the $L^2[0,1]$ norm; hence, discontinuous MIDAS weights are not ruled out.)

2.4 Tuning parameter selection

We consider several approaches to select the tuning parameter λ. First, we adapt k-fold cross-validation to the panel data setting. To that end, we resample the data by blocks respecting the time-series dimension and creating folds based on individual firms instead of the pooled sample. We use 5-fold cross-validation as the sample size of the dataset we consider in our empirical application is relatively small. We also consider the following three information criteria: BIC, AIC, and corrected AIC (AICc). Assuming that $y_{it}\,|\,x_{it}$ are i.i.d. draws from $N(\alpha_i + x_{it}^\top\beta,\sigma^2)$, the (quasi) log-likelihood is

$$L(\alpha,\beta,\sigma^2) \propto -\frac{1}{2\sigma^2}\sum_{i=1}^N\sum_{t=1}^T\big(y_{it}-\alpha_i-x_{it}^\top\beta\big)^2.$$

Then, for the pooled model, the BIC criterion is

$$\mathrm{BIC} = \frac{\|y-\hat\alpha\iota-X\hat\beta\|_{NT}^2}{\hat\sigma^2} + \frac{\log(NT)}{NT}\times\widehat{\mathrm{df}},$$

where df denotes the degrees of freedom. The degrees of freedom are estimated as $\widehat{\mathrm{df}} = |\hat\beta|_0 + 1$ for the pooled regression and $\widehat{\mathrm{df}} = |\hat\beta|_0 + N$ for the fixed effects regression, where $|\cdot|_0$ is the $\ell_0$-norm defined as the number of non-zero coefficients; see Zou, Hastie, and Tibshirani (2007) for more details. The AIC is computed as

$$\mathrm{AIC} = \frac{\|y-\hat\alpha\iota-X\hat\beta\|_{NT}^2}{\hat\sigma^2} + \frac{2}{NT}\times\widehat{\mathrm{df}}.$$

Lastly, the corrected Akaike information criterion is

$$\mathrm{AICc} = \frac{\|y-\hat\alpha\iota-X\hat\beta\|_{NT}^2}{\hat\sigma^2} + \frac{2\widehat{\mathrm{df}}}{NT-\widehat{\mathrm{df}}-1}.$$

The AICc might be a better choice when p is large compared to the sample size. For the fixed effects regressions, we replace $\hat\alpha\iota$ with $B\hat\alpha$ everywhere and adjust the degrees of freedom as described above. We report results for each of these four choices of the tuning parameters.

3 Oracle inequalities

In this section, we provide the theoretical analysis of the predictive performance of pooled and fixed effects panel data regressions with the sg-LASSO regularization, including the standard LASSO and the group LASSO regularizations. It is worth stressing that our analysis is not tied to the mixed-frequency data setting and applies to generic high-dimensional panel data regularized with the sg-LASSO penalty function. Importantly, we focus on panels consisting of τ-mixing time series with polynomial (Pareto-type) tails.

3.1 Pooled regression

The pooled linear projection model is

$$y_{it} = \alpha + x_{it}^\top\beta + u_{it},\qquad \mathbb{E}[u_{it}z_{it}] = 0,\qquad i\in[N],\ t\in[T],$$

where $\alpha\in\mathbb{R}$ and $\beta\in\mathbb{R}^p$ are unknown projection coefficients, $z_{it} = (1, x_{it}^\top)^\top$, and we use [J] to denote the set $\{1,2,\dots,J\}$ for an arbitrary positive integer J.
The vector of covariates $x_{it}\in\mathbb{R}^p$ may include time-varying covariates common to all entities (macroeconomic factors) as well as lags of $y_{it}$ and lags of some baseline covariates. It is worth stressing that pooled regressions can also potentially accommodate heterogeneity provided that the data are clustered in a relatively small number of clusters of similar entities.

Put $y_i = (y_{i1},\dots,y_{iT})^\top$, $x_i = (x_{i1},\dots,x_{iT})^\top$, $u_i = (u_{i1},\dots,u_{iT})^\top$, and let $\iota\in\mathbb{R}^T$ be a vector of ones. Then the regression equation after stacking time series observations is

$$y_i = \alpha\iota + x_i\beta + u_i,\qquad i\in[N].$$

Define further $y = (y_1^\top,\dots,y_N^\top)^\top$, $X = (x_1^\top,\dots,x_N^\top)^\top$, and $u = (u_1^\top,\dots,u_N^\top)^\top$. Then the regression equation after stacking all cross-sectional observations is

$$y = \alpha\iota + X\beta + u,$$

where $\iota\in\mathbb{R}^{NT}$ is a vector of ones.

The pooled sg-LASSO estimator $\hat\beta$ solves

$$\min_{(a,b)\in\mathbb{R}\times\mathbb{R}^p} \|y - a\iota - Xb\|_{NT}^2 + 2\lambda\Omega(b),\tag{1}$$

where $\|z\|_{NT}^2 = z^\top z/NT$ for $z\in\mathbb{R}^{NT}$, and

$$\Omega(b) = \gamma|b|_1 + (1-\gamma)\|b\|_{2,1}$$

is the sg-LASSO penalty function. The penalty function Ω interpolates between the LASSO penalty $|b|_1 = \sum_{j=1}^p|b_j|$ and the group LASSO penalty $\|b\|_{2,1} = \sum_{G\in\mathcal{G}}|b_G|_2$, where $\mathcal{G}$ is a partition of $[p] = \{1,2,\dots,p\}$ and $|b_G|_2 = \big(\sum_{j\in G}|b_j|^2\big)^{1/2}$ is the $\ell_2$ norm. The parameter $\gamma\in[0,1]$ determines the relative weights of the LASSO and the group LASSO penalization, while the amount of regularization is controlled by the regularization parameter $\lambda > 0$. Note that the group structure $\mathcal{G}$ has to be specified by the econometrician, which in our setting is defined by the high-frequency lags of different covariates. Throughout the paper we assume that groups have fixed size, which is well-justified in our empirical application.

For a random variable ξ, let $\|\xi\|_q = (\mathbb{E}|\xi|^q)^{1/q}$ denote its $L_q$ norm, $q\ge1$. We work with τ-mixing processes, measuring the temporal dependence with τ-mixing coefficients. The τ-mixing processes can be placed somewhere between the α-mixing processes and mixingales: they are less restrictive than α-mixing, yet at the same time are amenable to coupling similarly to α-mixing processes, which is not the case for mixingales; see Dedecker and Doukhan (2003), Dedecker and Prieur (2004), and Dedecker and Prieur (2005) for more details. This allows us to obtain sharp concentration inequalities in Section 4.

For a σ-algebra $\mathcal{M}$ and a random vector $\xi\in\mathbb{R}^l$, the coupling τ coefficient is defined as

$$\tau(\mathcal{M},\xi) = \sup_{f\in\mathrm{Lip}}\int_{\mathbb{R}}\big\|F_{f(\xi)|\mathcal{M}}(x) - F_{f(\xi)}(x)\big\|_1\,\mathrm{d}x,$$

where Lip is the set of 1-Lipschitz functions from $\mathbb{R}^l$ to $\mathbb{R}$, $F_\zeta$ is the CDF of $\zeta = f(\xi)$, and $F_{\zeta|\mathcal{M}}$ is the CDF of ζ conditionally on $\mathcal{M}$. For a stochastic process $(\xi_t)_{t\in\mathbb{Z}}$ with the natural filtration $\mathcal{M}_t = \sigma(\xi_t,\xi_{t-1},\dots)$ generated by its past, the τ-mixing coefficients are defined as

$$\tau_k = \sup_{j\ge1}\max_{l\in[j]}\frac{1}{l}\sup_{t+k\le t_1<\cdots<t_l}\tau\big(\mathcal{M}_t,(\xi_{t_1},\dots,\xi_{t_l})\big).$$
Theorem 3.1. Suppose that Assumptions 3.1, 3.2, and 3.3 are satisfied. Then with probability at least $1-\delta-O\big(s^{\tilde\kappa}p(NT)^{1-\tilde\kappa} + pe^{-cNT/s}\big)$,

$$\|Z(\hat\rho-\rho)\|_{NT}^2 \lesssim s\lambda^2 + \|m - Z\rho\|_{NT}^2$$

and

$$|\hat\alpha-\alpha| + |\hat\beta-\beta|_1 \lesssim s\lambda + \lambda^{-1}\|m - Z\beta\|_{NT}^2 + s^{1/2}\|m - Z\beta\|_{NT},$$

for some $c > 0$ and $\tilde\kappa = \frac{(\tilde a+1)\tilde q-1}{\tilde a+\tilde q-1}$.

The proof of this result can be found in the Appendix. Theorem 3.1 applies to panel data, unlike the result of Babii, Ghysels, and Striaukas (2020b). It provides oracle inequalities describing the prediction and estimation accuracy in an environment where the number of regressors p is allowed to scale with the effective sample size NT. Importantly, the result is stated under the weak tail and mixing conditions in Assumption 3.1. The parameters κ and $\tilde\kappa$ are the mixing-tails exponents of the stochastic processes driving the regression score and the covariance matrix, respectively.

To describe convergence rates, the following condition considers a simplified setting, where the effective sparsity s is constant, the approximation error vanishes sufficiently fast, and the total number of regressors scales appropriately with the effective sample size NT.

Assumption 3.4.
Suppose that (i) $s = O(1)$; (ii) $\|m - Z\beta\|_{NT} = O_P(\lambda)$; (iii) $p = o\big((NT)^{\tilde\kappa-1}\big)$.

In particular, Assumption 3.4 allows for 1) $N\to\infty$ while T is fixed; 2) $T\to\infty$ while N is fixed; and 3) both $N\to\infty$ and $T\to\infty$ without restricting the relative growth of the two. The following result describes the prediction and estimation convergence rates in the asymptotic environment outlined in Assumption 3.4 and is an immediate consequence of Theorem 3.1.

Corollary 3.1.
Suppose that Assumptions 3.1, 3.2, 3.3, and 3.4 are satisfied. Then

$$\|Z(\hat\beta-\beta)\|_{NT}^2 = O_P\left(p^{2/\kappa}(NT)^{2/\kappa-2} \vee \frac{\log p}{NT}\right)\qquad\text{and}\qquad |\hat\beta-\beta|_1 = O_P\left(p^{1/\kappa}(NT)^{1/\kappa-1} \vee \sqrt{\frac{\log p}{NT}}\right).$$

Note that for large a, the mixing-tails exponent is $\kappa\approx q$. Therefore, for data that are close to independent, the prediction accuracy is approximately of order $O_P\big(p^{2/q}(NT)^{2/q-2}\vee\frac{\log p}{NT}\big)$, which is the rate one would obtain for i.i.d. data applying directly Fuk and Nagaev (1971), Corollary 4, so in this sense our result is sharp. If the data are sub-Gaussian, then moments of all orders q exist and, for a given NT, the first term can be made arbitrarily small relative to the second by taking q large enough. In this case we recover the $O_P\big(\frac{\log p}{NT}\big)$ rate typically obtained for sub-Gaussian data. Therefore, the Fuk-Nagaev inequality provides a more accurate description of the performance of LASSO-type estimators.

If the polynomial tail dominates, then we need $p = o\big((NT)^{\kappa-1}\big)$ for the prediction and estimation consistency, provided that $\tilde\kappa\ge\kappa-1$. The pooled sg-LASSO estimator is expected to work well whenever the number of regressors p is small relative to $(NT)^{\kappa-1}$. This is a significantly weaker requirement compared to the $p = o(T^{\kappa-1})$ needed for time series regressions in Babii, Ghysels, and Striaukas (2020b). In particular, since $\kappa > 2$, the condition $p = o\big((NT)^{\kappa-1}\big)$ can be significantly weaker than the $p = o(NT)$ condition needed in the QMLE/GMM framework without regularization. How much the sg-LASSO improves upon the (unregularized) QMLE depends on the heaviness of the tails and the persistence of the underlying stochastic processes, as measured by the mixing-tails exponent κ. In particular, for light tails and weakly persistent series, the mixing-tails exponent κ is large, offsetting the dependence on p.

Lastly, it is worth mentioning that the oracle inequality is driven by the time series with the heaviest tail and that it might be possible to obtain sharper results allowing for heterogeneous tails at the cost of introducing heavier notation.

3.2 Fixed effects regressions

Pooled regressions are attractive since the effective sample size
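To illustrate the role of heavy tails behind these rates, here is a toy simulation (not from the paper) comparing the maximal coordinate of a sample mean under Gaussian draws and variance-matched Student-t(3) draws, the latter having polynomial tails of the kind the theory allows; all dimensions and the seed are arbitrary:

```python
import numpy as np

# Toy check: with polynomial tails, the maximum over p coordinates of the
# sample mean tends to concentrate more slowly than under Gaussian data,
# in line with the polynomial term of the Fuk-Nagaev bound.
rng = np.random.default_rng(42)
n, p = 2000, 100                       # hypothetical effective sample size and dimension

gauss = rng.standard_normal((n, p))
heavy = rng.standard_t(df=3, size=(n, p)) / np.sqrt(3.0)  # unit-variance t(3)

max_gauss = np.abs(gauss.mean(axis=0)).max()
max_heavy = np.abs(heavy.mean(axis=0)).max()
print(max_gauss, max_heavy)
```

Repeating the experiment over many replications would show a heavier right tail for the t(3) statistic, mirroring the extra polynomial term $pNTu^{-\kappa}$ in the panel Fuk-Nagaev inequality of Section 4.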
NT can be huge, yet the heterogeneity of individual time series may be lost. If the underlying series have substantial heterogeneity over $i\in[N]$, then taking this into account might reduce the projection error and improve the predictive accuracy. At the other extreme, the cross-sectional structure can be completely ignored and individual time series regressions can be used for prediction. (Recall that the Fuk-Nagaev inequality provides a sharper description of concentration compared to the simple Markov bound in conjunction with Rosenthal's moment inequality.) The fixed effects panel data regression is

$$y_{it} = \alpha_i + x_{it}^\top\beta + u_{it},\qquad \mathbb{E}[u_{it}z_{it}] = 0,\qquad i\in[N],\ t\in[T],$$

where $z_{it} = (1, x_{it}^\top)^\top$. Note that the entity-specific intercepts $\alpha_i$ are deterministic constants and the projection model is always well-defined. The fixed effects have to be estimated to construct the best linear predictor $\alpha_i + x_{it}^\top\beta$.

The fixed effects sg-LASSO estimators $\hat\alpha$ and $\hat\beta$ solve

$$\min_{(a,b)\in\mathbb{R}^{N+p}} \|y - Ba - Xb\|_{NT}^2 + 2\lambda\Omega(b),$$

where $B = I_N\otimes\iota$, $I_N$ is the $N\times N$ identity matrix, $\iota\in\mathbb{R}^T$ is the vector with all coordinates equal to one, and Ω is the sg-LASSO penalty. It is worth stressing that the design matrix X does not include the intercept and that we do not penalize the fixed effects. This is done because sparsity of the fixed effects does not hold even in the special case where all the intercepts are equal. By Fermat's rule, the first-order conditions are

$$\hat\alpha = (B^\top B)^{-1}B^\top(y - X\hat\beta),\qquad 0 = X^\top M_B(X\hat\beta - y)/NT + \lambda z^*$$

for some $z^*\in\partial\Omega(\hat\beta)$, where $b\mapsto\partial\Omega(b)$ is the subdifferential of Ω and $M_B = I - B(B^\top B)^{-1}B^\top$ is the orthogonal projection matrix.
It is easy to see from the first-order conditions that the estimator $\hat\beta$ is equivalent to: 1) the penalized GLS estimator for the first-differenced regression; 2) the penalized OLS estimator for the regression written in deviations from time means; and 3) the penalized OLS estimator where the fixed effects are partialled out. Thus, the equivalence between the three approaches is not affected by the penalization, cf. Arellano (2003) for low-dimensional panels.

For the fixed effects regression, we define

$$\hat\Sigma = \begin{pmatrix} \frac{1}{T}B^\top B & \frac{1}{\sqrt{N}T}B^\top X\\ \frac{1}{\sqrt{N}T}X^\top B & \frac{1}{NT}X^\top X \end{pmatrix}\qquad\text{and}\qquad \Sigma = \begin{pmatrix} I_N & \frac{1}{\sqrt{N}T}\mathbb{E}[B^\top X]\\ \frac{1}{\sqrt{N}T}\mathbb{E}[X^\top B] & \mathbb{E}[x_{it}x_{it}^\top] \end{pmatrix}.\tag{2}$$

We will assume that the smallest eigenvalue of Σ is uniformly bounded away from zero by some constant. Note that if $x_{it}\sim N(0, I_p)$, then Σ is approximately equal to the identity matrix for large N.

The order of the regularization parameter is governed by the Fuk-Nagaev inequality for long panels in Theorem 4.1, with the only difference that it has to take into account the fact that the fixed effects parameters are estimated.

Assumption 3.5 (Regularization). The regularization parameter satisfies

$$\lambda \sim \left(\frac{p\vee N^{\kappa/2}}{\delta(NT)^{\kappa-1}}\right)^{1/\kappa} \vee \sqrt{\frac{\log(p\vee N/\delta)}{NT}}$$

for some $\delta\in(0,1)$ and $\kappa = \frac{(a+1)q-1}{a+q-1}$, where a, q are as in Assumption 3.1.

Similarly to the pooled regressions, we state the oracle inequality allowing for the approximation error. For the fixed effects regressions we redefine $Z = (B, X)$ and $\rho = (\alpha^\top,\beta^\top)^\top$. Put also

$$r_{N,T,p} = p(s\vee N)^{\tilde\kappa}T^{-\tilde\kappa}\big(N^{-\tilde\kappa/2} + pN^{-\tilde\kappa}\big) + p(p\vee N)e^{-cNT/(s\vee N)}$$

with $\tilde\kappa = \frac{(\tilde a+1)\tilde q-1}{\tilde a+\tilde q-1}$ and some $c > 0$. Recall also that Σ in Assumption 3.2 is redefined according to Eq. (2), so that Σ is non-singular uniformly over p, N, T.

Theorem 3.2.
Suppose that Assumptions 3.1, 3.2, and 3.5 are satisfied. Then with probability at least $1-\delta-O(r_{N,T,p})$,

$$\|Z(\hat\rho-\rho)\|_{NT}^2 \lesssim (s\vee N)\lambda^2 + \|m - Z\rho\|_{NT}^2.$$

Theorem 3.2 states the oracle inequality for the prediction error in the fixed effects panel data regressions estimated with the sg-LASSO. To see clearly how the prediction accuracy scales with the sample size, we make the following assumption.
Assumption 3.6.
Suppose that (i) $s = O(1)$; (ii) $\|m - Z\beta\|_{NT}^2 = O_P(N\lambda^2)$; (iii) $(p + N^{\tilde\kappa/2})pN/T^{\tilde\kappa-1} = o(1)$ and $p(p\vee N)e^{-cT/N} = o(1)$. The following corollary is an immediate consequence of Theorem 3.2.
Corollary 3.2.
Suppose that Assumptions 3.1, 3.2, 3.5, and 3.6 are satisfied. Then

$$\|Z(\hat\rho-\rho)\|_{NT}^2 = O_P\left((p^{2/\kappa}\vee N)\,N^{2/\kappa-1}T^{2/\kappa-2} \vee \frac{\log(p\vee N)}{T}\right).$$

Note that this result allows for $p, N, T\to\infty$ at appropriate rates and that we pay an additional price for estimating the N fixed effects, which play a role similar to the effective dimension of the covariates. Therefore, in order to achieve accurate prediction, the panel has to be sufficiently long to offset the estimation error of the individual fixed effects.

4 Fuk-Nagaev inequality for panel data
In this section we obtain a new Fuk-Nagaev concentration inequality for panel data reflecting the concentration jointly over N and T. It is worth stressing that the inequality does not follow directly from the Fuk-Nagaev inequality of Babii, Ghysels, and Striaukas (2020a) and is of independent interest for high-dimensional panel data.

Theorem 4.1.
Let $\{\xi_{it}: i\ge1, t\in\mathbb{Z}\}$ be an array of centered random vectors in $\mathbb{R}^p$ such that $\{\xi_{i1},\dots,\xi_{iT}: i\ge1\}$ are i.i.d. and, for each $i\ge1$, $(\xi_{it})_{t\in\mathbb{Z}}$ is a stationary stochastic process such that (i) $\max_{j\in[p]}\|\xi_{it,j}\|_q = O(1)$ for some $q > 2$; (ii) for every $j\in[p]$, the τ-mixing coefficients of $(\xi_{it,j})_{t\in\mathbb{Z}}$ satisfy $\tau_k^{(j)}\le ck^{-a}$ for all $k\ge1$ and some universal constants $c > 0$ and $a > \frac{q-1}{q-2}$. Then for every $u > 0$,

$$\Pr\left(\left|\sum_{i=1}^N\sum_{t=1}^T\xi_{it}\right|_\infty > u\right) \le c_1\,pNTu^{-\kappa} + 4pe^{-c_2u^2/NT}$$

for some $c_1, c_2 > 0$ and $\kappa = \frac{(a+1)q-1}{a+q-1}$.

It follows from Theorem 4.1 that there exists
$C > 0$ such that for every $\delta\in(0,1)$,

$$\Pr\left(\left|\frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N\xi_{it}\right|_\infty \le C\left\{\left(\frac{p}{\delta(NT)^{\kappa-1}}\right)^{1/\kappa} \vee \sqrt{\frac{\log(8p/\delta)}{NT}}\right\}\right) \ge 1-\delta.$$

Note that the inequality reflects the concentration jointly over N and T and that tails and persistence play an important role through the mixing-tails exponent κ. The inequality is a key technical tool that allows us to handle panel data with heavier than Gaussian tails and non-negligible T and N. The proof of this result can be found in the Appendix and is based on the blocking technique, cf. Bosq (1993), combined with the τ-coupling lemma of Dedecker and Prieur (2004). For short panels with small T, the following inequality might be a better choice.

Theorem 4.2.
Let $\{\xi_{it}: i\ge1, t\in\mathbb{Z}\}$ be an array of centered random vectors in $\mathbb{R}^p$ such that $\{\xi_{i1},\dots,\xi_{iT}: i\ge1\}$ are i.i.d. and, for each $i\ge1$, $(\xi_{it})_{t\in\mathbb{Z}}$ is a stationary stochastic process such that (i) $\max_{j\in[p]}\|\xi_{it,j}\|_q = O(1)$ for some $q > 2$; (ii) for every $j\in[p]$, the τ-mixing coefficients of $(\xi_{it,j})_{t\in\mathbb{Z}}$ satisfy $\tau_k^{(j)}\le ck^{-a}$ for all $k\ge1$ and some universal constants $c > 0$ and $a > \frac{q-1}{q-2}$. Then for every $u > 0$,

$$\Pr\left(\left|\sum_{i=1}^N\sum_{t=1}^T\xi_{it}\right|_\infty > u\right) \le c_1\,pNu^{-q} + 4pe^{-c_2u^2/NT}$$

for some $c_1, c_2 > 0$. (The direct application of the time series Fuk-Nagaev inequality of Babii, Ghysels, and Striaukas (2020a) leads to inferior concentration results for panel data.)

It follows from Theorem 4.2 that there exists
$C > 0$ such that for every $\delta\in(0,1)$,

$$\Pr\left(\left|\frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N\xi_{it}\right|_\infty \le C\left\{\left(\frac{p}{\delta N^{q-1}}\right)^{1/q} \vee \sqrt{\frac{\log(8p/\delta)}{NT}}\right\}\right) \ge 1-\delta.$$

The proof of this result can be found in the Appendix and, in contrast to Theorem 4.1, is a straightforward application of the Fuk-Nagaev inequality for i.i.d. data and Rosenthal's moment inequality. This inequality does not capture the concentration over T and may be a suboptimal choice for long panels, which is the case in our empirical application.

5 Empirical application

In our empirical application, we consider nowcasting the P/E ratios of 210 US firms using a set of predictors that are sampled at mixed frequencies. We use 24 predictors, including traditional macro and financial series as well as non-standard series generated by textual analysis. We apply pooled and fixed effects sg-LASSO-MIDAS panel data models and compare them with several benchmarks such as the random walk (RW), analysts' consensus forecasts, and the unstructured elastic net. We also compute predictions using individual-firm high-dimensional time series regressions and provide results for several choices of the tuning parameter. Lastly, we provide results for low-dimensional single-firm MIDAS regressions using the forecast combination techniques of Andreou, Ghysels, and Kourtellos (2013) and Ball and Ghysels (2018). The latter is particularly relevant for the analysis in the current paper as it also deals with nowcasting price earnings ratios. The forecast combination methods consist of estimating ADL-MIDAS regressions with each of the high-frequency covariates separately. In our case this leads to 24 predictions, corresponding to the number of predictors. Then a combination scheme, typically of the discounted mean squared error type, produces a single nowcast. One could call this a pre-machine learning large dimensional approach.
It will, therefore, be interesting to assess how this approach compares to the regularized MIDAS panel regression machine learning approach introduced in the current paper. We start with a short review of the data, with more detailed descriptions and tables appearing in Appendix Section D, followed by a summary of the methods used and the empirical results obtained.
The full sample consists of observations between the 1st of January, 2000 and the 30th of June, 2017. Due to the lagged dependent variables in the models, our effective sample starts in the third fiscal quarter of 2000. We use the first 25 observations for the initial estimation sample and the remaining 42 observations for evaluating the out-of-sample forecasts, which we obtain using an expanding window forecasting scheme. We collect data from CRSP and I/B/E/S to compute quarterly P/E ratios and firm-specific financial covariates; RavenPack is used to compute daily firm-level textual-analysis-based data; real-time monthly macroeconomic series are obtained from the FRED-MD dataset, see McCracken and Ng (2016) for more details; FRED is used to compute daily financial markets data; and, lastly, monthly news attention series extracted from Wall Street Journal articles are retrieved from Bybee, Kelly, Manela, and Xiu (2019). Appendix Section D provides a detailed description of the data sources. In particular, firm-level variables, including P/E ratios, are described in Appendix Table A.3, and the other predictor variables in Appendix Table A.4. The list of all firms we consider in our analysis appears in Appendix Table A.5.
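The expanding-window evaluation described above can be sketched as follows; only the 25/42 sample split mirrors the text, while the AR(1) placeholder model and the simulated series are our illustrative stand-ins for the paper's estimators and data:

```python
import numpy as np

# Expanding-window out-of-sample scheme: at each step, re-fit on all data
# up to time t and predict t+1; the first n_init observations form the
# initial estimation sample.
def expanding_window_forecasts(y, n_init=25):
    preds = []
    for t in range(n_init, len(y)):
        past = y[:t]
        x, z = past[:-1], past[1:]              # AR(1) placeholder model
        x1 = np.column_stack([np.ones(len(x)), x])
        coef, *_ = np.linalg.lstsq(x1, z, rcond=None)
        preds.append(coef[0] + coef[1] * y[t - 1])
    return np.array(preds)

rng = np.random.default_rng(1)
y = np.empty(67)                                # 25 initial + 42 evaluation
y[0] = 0.0
for t in range(1, 67):
    y[t] = 0.8 * y[t - 1] + rng.normal()
fc = expanding_window_forecasts(y)
print(len(fc))  # 42 out-of-sample forecasts
```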
P/E ratio and analysts’ forecasts sample construction.
Our target variable is the P/E ratio for each individual firm. To compute it, we use CRSP stock price data and I/B/E/S earnings data. Earnings data are subject to release delays of 1 to 2 months, depending on the firm and quarter. Therefore, to reflect the real-time information flow, we separately compute the dependent variable, the analysts' consensus forecasts, and the target variable using stock prices that were available in real time. We also take into account that different firms have different fiscal quarters, which also affects the real-time information flow. For example, suppose for a particular firm the fiscal quarters end in the third month of each calendar quarter, i.e., at the end of March, June, September, and December. Our dependent variable used in the regression models is computed by taking the end-of-quarter prices and dividing them by the respective earnings value. Suppose the consensus forecast for this firm is recorded on the 25th of April. In this case, we record the stock price for this particular firm on the 25th of April and divide it by the realized earnings value. To compute forecasts, we estimate several regression models. First, we estimate firm-specific sg-LASSO-MIDAS regressions, which in Table 1 we refer to as
Individual. The model is written as
\[
y_i = \iota\alpha_i + x_i\beta_i + u_i, \quad i = 1, \dots, N,
\]
and the firm-specific predictions are computed as $\hat y_{i,T+1} = \hat\alpha_i + x_{i,T+1}^\top\hat\beta_i$. As noted in Section 2, $x_i$ contains lags of the low-frequency target variable and MIDAS weights for each of the high-frequency covariates. We then estimate the following pooled and fixed effects sg-LASSO-MIDAS panel data models
\[
y = \alpha\iota + X\beta + u \quad \text{(Pooled)}
\]
\[
y = B\alpha + X\beta + u \quad \text{(Fixed Effects)}
\]
and compute predictions as
\[
\hat y_{i,T+1} = \hat\alpha + x_{i,T+1}^\top\hat\beta \quad \text{(Pooled)}, \qquad
\hat y_{i,T+1} = \hat\alpha_i + x_{i,T+1}^\top\hat\beta \quad \text{(Fixed Effects)}.
\]
We benchmark the firm-specific and panel data regression-based nowcasts against two simple alternatives. First, we compute forecasts for the RW model as
\[
\hat y_{i,T+1} = y_{i,T}.
\]
Second, we consider predictions of the P/E ratio implied by analysts' earnings nowcasts using the information up to time $T+1$, i.e.,
\[
\hat y_{i,T+1} = \bar y_{i,T+1},
\]
where $\bar y$ indicates that the forecasted P/E ratio is based on consensus earnings forecasts made at the end of quarter $T+1$, and the stock price is also taken at the end of $T+1$. To measure the forecasting performance, we compute the mean squared forecast errors (MSE) for each method. Let $\bar{\mathbf{y}}_i = (y_{i,T_{is}+1}, \dots, y_{i,T_{os}})^\top$ represent the out-of-sample realized P/E ratio values, where $T_{is}$ and $T_{os}$ denote the last initial in-sample observation and the last out-of-sample observation, respectively, and let $\hat{\mathbf{y}}_i = (\hat y_{i,T_{is}+1}, \dots, \hat y_{i,T_{os}})^\top$ collect the out-of-sample forecasts from a specific method. Then the mean squared forecast errors are computed as
\[
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^N \frac{1}{T_{os}-T_{is}} (\bar{\mathbf{y}}_i - \hat{\mathbf{y}}_i)^\top(\bar{\mathbf{y}}_i - \hat{\mathbf{y}}_i).
\]

             RW      An.-mean   An.-median
MSE         2.331     2.339       2.088

sg-LASSO   γ = 0     0.2     0.4     0.6     0.8      1
Panel A. Cross-validation
Individual  1.545   1.551   1.567   1.594   1.614   1.606
Pooled      1.459   1.456     …

Table 1:
Prediction results – The table reports MSEs of out-of-sample predictions, averaged over firms. The nowcasting horizon is the current month, i.e., we predict the P/E ratio using information up to the end of the current fiscal quarter. Each Panel A-D block represents a different way of calculating the tuning parameter λ. Bold entries are the best results in a block.

The main results are reported in Table 1, while additional results for unstructured LASSO estimators and the forecast combination approach appear in Appendix Tables A.1-A.2. First, we document that analyst-based predictions have much larger mean squared forecast errors (MSEs) compared to model-based predictions. The sharp increase in quality of model- versus analyst-based predictions indicates the usefulness of machine learning methods for nowcasting P/E ratios, see Tables 1 and A.1. Better performance is achieved for almost all machine learning methods - single-firm or panel data regressions - and all tuning parameter choices. Unstructured panel data methods and the forecast combination approach also yield more accurate forecasts, see Appendix Tables A.1-A.2. The latter confirms the findings of Ball and Ghysels (2018). Turning to the comparison of model-based predictions, we see from the results in Table 1 that sg-LASSO-MIDAS panel data models improve the quality of predictions over individual sg-LASSO-MIDAS models, irrespective of the γ weight or the tuning parameter choice. This indicates that panel data structures are relevant for nowcasting P/E ratios. We report similar findings for unstructured estimators. Within the panel data framework, we observe that fixed effects improve over pooled regressions in most cases, except when cross-validation is used; compare Panel A of Tables 1 and A.2 with Panels B-D. The pooled model tuned by cross-validation seems to yield the best overall performance.
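The MSE criterion defined above averages firm-level squared forecast errors over the evaluation window and then over firms; a minimal sketch (array names and toy values are ours):

```python
import numpy as np

# Average-over-firms mean squared forecast error: inputs are (N, H) arrays
# of realized out-of-sample values and forecasts, where H is the number of
# evaluation quarters.
def panel_mse(actual, forecast):
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    per_firm = ((actual - forecast) ** 2).mean(axis=1)  # firm-level MSEs
    return per_firm.mean()                              # average over firms

a = np.array([[1.0, 2.0], [3.0, 5.0]])
f = np.array([[1.0, 1.0], [2.0, 3.0]])
print(panel_mse(a, f))  # mean of firm MSEs [0.5, 2.5] -> 1.5
```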
In general, one can expect cross-validation to improve prediction performance over other tuning methods, as it is directly linked to empirical risk minimization. In the case of fixed effects, however, we may lose the predictive gain due to the smaller samples within each fold used to estimate the model. Lastly, the best results per tuning parameter block seem to be achieved when γ ∉ {0, 1}, indicating that both sparsity within groups and sparsity at the group level matter for prediction performance.

In Appendix Figure A.1, we plot the sparsity pattern of the selected covariates for the two best-performing methods: a) pooled sg-LASSO regressions tuned by cross-validation with γ = 0.4, and b) the fixed effects sg-LASSO model with BIC-tuned parameter and the same γ. We also plot the forecast combination weights, averaged over firms. The plots in Figure A.1 reveal that the fixed effects estimator yields sparser models than pooled regressions, and the sparsity pattern is clearer. In the fixed effects case, Revenue growth and the first lag of the dependent variable are selected throughout the out-of-sample period. The BAA less AAA bond yield spread, firm-level volatility, and the Aggregate Event Sentiment index are also selected very frequently. These variables are similarly selected in the pooled regression, but the pattern is less apparent. The forecast combination weights yield a similar, yet more blurred, pattern. In this case, the Revenue growth and firm-level stock return covariates obtain relatively larger weights than the rest of the covariates, particularly in the first part of the out-of-sample period. Therefore, the gain of machine learning methods - both single-firm and panel data - can be … (Forecast combination weights start in 2009 Q1 because the first eight quarters are used as a pre-sample to estimate the weights, see Ball and Ghysels (2018) for further details. The forecast combination weights figure also does not contain autoregressive lags; all four lags are always included in all forecasting regressions.)
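The role of γ can be made concrete through the proximal operator of the sg-LASSO penalty Ω(b) = γ|b|₁ + (1−γ)Σ_G‖b_G‖₂, which factors into coordinate-wise followed by group-wise soft-thresholding (Simon, Friedman, Hastie, and Tibshirani, 2013). The sketch below is a generic implementation of that operator, not the paper's estimation code:

```python
import numpy as np

# Proximal operator of the sparse-group LASSO penalty: soft-threshold each
# coordinate (the LASSO part, scaled by gamma), then shrink each group's
# norm (the group-LASSO part, scaled by 1 - gamma).
def prox_sg_lasso(b, lam, gamma, groups):
    soft = np.sign(b) * np.maximum(np.abs(b) - lam * gamma, 0.0)
    out = np.zeros_like(soft)
    for g in groups:  # groups: lists of coordinate indices
        norm = np.linalg.norm(soft[g])
        if norm > 0:
            out[g] = soft[g] * max(1.0 - lam * (1.0 - gamma) / norm, 0.0)
    return out

b = np.array([3.0, -0.5, 0.2, 0.1])
shrunk = prox_sg_lasso(b, lam=1.0, gamma=0.5, groups=[[0, 1], [2, 3]])
print(shrunk)  # the second group is zeroed out entirely (group sparsity)
```

With γ = 1 the group step is inactive (pure LASSO); with γ = 0 only whole groups are shrunk (group LASSO); intermediate γ yields sparsity both within and across groups, consistent with the empirical finding above.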
To test for superior forecast performance, we use the Diebold and Mariano (1995) test for the pool of P/E ratio nowcasts. We compare the mean and median consensus forecasts against the panel data machine learning regressions with the smallest forecast error per tuning parameter block in Table 1. We report the forecast accuracy test results in Table 2. When testing the full sample of pooled nowcasts, the gain in prediction accuracy is not significant, even though the MSEs are much lower for the panel data sg-LASSO regressions relative to the consensus forecasts. This result may not be surprising, however, as some firms have a large number of outlier observations, and the Diebold and Mariano (1995) test statistic is affected by the inevitably heavy-tailed forecast errors for such firms. However, when we split the pooled sample of nowcasts equally into firms with high versus low variance P/E ratios, the gain in forecast accuracy is significant for all panel data machine learning regressions for low variance P/E firms, but not for high variance firms.
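A minimal version of the Diebold and Mariano (1995) statistic for squared-error loss can be sketched as follows; the pooling across firms and other implementation details of the paper's test are not reproduced here:

```python
import numpy as np

# Diebold-Mariano test for equal predictive accuracy under squared-error
# loss: t-statistic on the mean loss differential, with a rectangular-kernel
# long-run variance for h-step forecasts.
def diebold_mariano(e1, e2, h=1):
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2  # loss differential
    n = len(d)
    dbar = d.mean()
    gamma0 = ((d - dbar) ** 2).mean()
    var = gamma0
    for k in range(1, h):  # autocovariances up to lag h-1
        var += 2 * ((d[k:] - dbar) * (d[:-k] - dbar)).mean()
    return dbar / np.sqrt(var / n)

rng = np.random.default_rng(2)
e_a = rng.normal(size=200) * 1.5  # forecast errors of a noisier method
e_b = rng.normal(size=200)
print(diebold_mariano(e_a, e_b))  # positive values favor the second method
```

Heavy-tailed forecast errors inflate the variance estimate in the denominator, which illustrates why outlier-prone firms can mask otherwise large MSE gains, as noted above.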
This paper introduces a new class of high-dimensional panel data regression models with dictionaries and sparse-group LASSO regularization. This type of regularization is an especially attractive choice for predictive panel data regressions, where the low- and/or high-frequency lags define a clear group structure and dictionaries are used to aggregate time series lags. The estimator nests the LASSO and the group LASSO estimators as special cases, as discussed in our theoretical analysis.

                                         Full sample   Large variance   Low variance
Pooled (Cross-validation) vs An.-mean       0.852          0.567           2.300
Pooled (Cross-validation) vs An.-median     0.694          0.386           2.190
Fixed-effects (BIC) vs An.-mean             0.793          0.508           2.312
Fixed-effects (BIC) vs An.-median           0.628          0.319           2.202
Fixed-effects (AIC) vs An.-mean             0.825          0.540           2.312
Fixed-effects (AIC) vs An.-median           0.663          0.355           2.202
Fixed-effects (AICc) vs An.-mean            0.825          0.540           2.312
Fixed-effects (AICc) vs An.-median          0.663          0.355           2.202
Table 2:
Forecasting performance significance – The table reports the Diebold and Mariano (1995) test statistics for pooled nowcasts, comparing machine learning panel data regressions with analysts' implied consensus forecasts, where An.-mean and An.-median denote the mean and median consensus forecasts, respectively. We compare the panel models that have the smallest forecast error per tuning parameter block in Table 1.
Our theoretical treatment allows for the heavy-tailed data frequently encountered in time series and financial econometrics. To that end, we obtain a new panel data concentration inequality of the Fuk-Nagaev type for τ-mixing processes.

Our empirical analysis sheds light on the advantages of regularized panel data regressions for nowcasting corporate earnings. We focus on nowcasting the P/E ratios of 210 US firms and find that the regularized panel data regressions outperform several benchmarks, including the analysts' predictions. Furthermore, we find that the regularized machine learning regressions outperform the forecast combinations and that the panel data approach improves upon the predictive time series regressions for individual firms. While nowcasting earnings is a leading example of applying panel data MIDAS machine learning regressions, one can think of many other applications of interest in finance. Beyond earnings, analysts are also interested in sales, dividends, etc. Our analysis can also be useful for other areas of interest, such as regional and international panel data settings.

References

Andreou, E., E. Ghysels, and A. Kourtellos (2013): "Should macroeconomic forecasters use daily financial data and how?," Journal of Business and Economic Statistics, 31(2), 240–251.

Arellano, M. (2003): Panel Data Econometrics. Oxford University Press.

Babii, A. (2020): "High-dimensional mixed-frequency IV regression," arXiv preprint arXiv:2003.13478.

Babii, A., E. Ghysels, and J. Striaukas (2020a): "Inference for high-dimensional regressions with heteroskedasticity and autocorrelation," arXiv preprint arXiv:1912.06307.

Babii, A., E. Ghysels, and J. Striaukas (2020b): "Machine learning time series regressions with an application to nowcasting," arXiv preprint arXiv:2005.14057.

Ball, R. T., and E. Ghysels (2018): "Automated earnings forecasts: beat analysts or combine and conquer?," Management Science, 64(10), 4936–4952.

Belloni, A., V. Chernozhukov, C. Hansen, and D. Kozbur (2016): "Inference in high-dimensional panel models with an application to gun control," Journal of Business and Economic Statistics, 34(4), 590–605.

Bosq, D. (1993): "Bernstein-type large deviations inequalities for partial sums of strong mixing processes," Statistics, 24(1), 59–70.

Bybee, L., B. T. Kelly, A. Manela, and D. Xiu (2019): "The structure of economic news," Available at SSRN 3446225.

Chernozhukov, V., J. A. Hausman, and W. K. Newey (2019): "Demand analysis with many prices," National Bureau of Economic Research Discussion Paper 26424.

Chetverikov, D., Z. Liao, and V. Chernozhukov (2020): "On cross-validated Lasso," Annals of Statistics (forthcoming).

Chiang, H. D., J. Rodrigue, and Y. Sasaki (2019): "Post-selection inference in three-dimensional panel data," arXiv preprint arXiv:1904.00211.

Dedecker, J., and P. Doukhan (2003): "A new covariance inequality and applications," Stochastic Processes and their Applications, 106(1), 63–80.

Dedecker, J., and C. Prieur (2004): "Coupling for τ-dependent sequences and applications," Journal of Theoretical Probability, 17(4), 861–885.

Dedecker, J., and C. Prieur (2005): "New dependence coefficients. Examples and applications to statistics," Probability Theory and Related Fields, 132(2), 203–236.

Diebold, F. X., and R. S. Mariano (1995): "Comparing predictive accuracy," Journal of Business and Economic Statistics, 13(3), 253–263.

Farrell, M. H. (2015): "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, 189(1), 1–23.

Fernández-Val, I., and M. Weidner (2018): "Fixed effects estimation of large-T panel data models," Annual Review of Economics, 10, 109–138.

Fosten, J., and R. Greenaway-McGrevy (2019): "Panel data nowcasting," Available at SSRN 3435691.

Fuk, D. K., and S. V. Nagaev (1971): "Probability inequalities for sums of independent random variables," Theory of Probability and Its Applications, 16(4), 643–660.

Ghysels, E., P. Santa-Clara, and R. Valkanov (2006): "Predicting volatility: getting the most out of return data sampled at different frequencies," Journal of Econometrics, 131(1–2), 59–95.

Ghysels, E., A. Sinko, and R. Valkanov (2006): "MIDAS regressions: Further results and new directions," Econometric Reviews, 26(1), 53–90.

Harding, M., and C. Lamarche (2019): "A panel quantile approach to attrition bias in Big Data: Evidence from a randomized experiment," Journal of Econometrics, 211(1), 61–82.

Khalaf, L., M. Kichian, C. J. Saunders, and M. Voia (2020): "Dynamic panels with MIDAS covariates: Nonlinearity, estimation and fit," Journal of Econometrics (forthcoming).

Kock, A. B. (2013): "Oracle efficient variable selection in random and fixed effects panel data models," Econometric Theory, 29(1), 115–152.

Kock, A. B. (2016): "Oracle inequalities, variable selection and uniform inference in high-dimensional correlated random effects panel data models," Journal of Econometrics, 195(1), 71–85.

Koenker, R. (2004): "Quantile regression for longitudinal data," Journal of Multivariate Analysis, 91(1), 74–89.

Kolanovic, M., and R. Krishnamachari (2017): "Big data and AI strategies: Machine learning and alternative data approach to investing," JP Morgan Global Quantitative & Derivatives Strategy Report.

Lamarche, C. (2010): "Robust penalized quantile regression estimation for panel data," Journal of Econometrics, 157(2), 396–408.

Lu, X., and L. Su (2016): "Shrinkage estimation of dynamic panel data models with interactive fixed effects," Journal of Econometrics, 190(1), 148–175.

McCracken, M. W., and S. Ng (2016): "FRED-MD: A monthly database for macroeconomic research," Journal of Business and Economic Statistics, 34(4), 574–589.

Simon, N., J. Friedman, T. Hastie, and R. Tibshirani (2013): "A sparse-group LASSO," Journal of Computational and Graphical Statistics, 22(2), 231–245.

Su, L., Z. Shi, and P. C. Phillips (2016): "Identifying latent structures in panel data," Econometrica, 84(6), 2215–2264.

Tibshirani, R. (1996): "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), 58(1), 267–288.

Zou, H., T. Hastie, and R. Tibshirani (2007): "On the degrees of freedom of the lasso," Annals of Statistics, 35(5), 2173–2192.
APPENDIX
A Proofs of oracle inequalities
Proof of Theorem 3.1.
The proof is similar to the proof of Theorem 3.1 in Babii, Ghysels, and Striaukas (2020b) and is omitted. The main difference is that, instead of applying the Fuk-Nagaev inequality from Babii, Ghysels, and Striaukas (2020a), Theorem 3.1, we apply the concentration inequality from Theorem 4.1 to
\[
\left|\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T u_{it} z_{it}\right|_\infty \quad\text{and}\quad \max_{j,k\in[p]}\left|\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T z_{it,j} z_{it,k} - \Sigma_{j,k}\right|
\]
under Assumptions 3.1, 3.2, and 3.3.

Proof of Theorem 3.2.
Put $r = (a^\top, b^\top)^\top$. Then we solve
\[
\min_{r\in\mathbb{R}^{N+p}} \|y - Zr\|_{NT}^2 + 2\lambda\Omega(b).
\]
By Fermat's rule, the solution to this problem satisfies
\[
Z^\top(Z\hat\rho - y)/NT + \lambda z^* = 0_{N+p}
\]
for some $z^* = (0_N^\top, z_b^{*\top})^\top$, where $0_N$ is an $N$-dimensional vector of zeros, $z_b^* \in \partial\Omega(\hat\beta)$, $\hat\rho = (\hat\alpha^\top, \hat\beta^\top)^\top$, and $\partial\Omega(\hat\beta)$ is the sub-differential of $b \mapsto \Omega(b)$ at $\hat\beta$. Taking the inner product with $\rho - \hat\rho$,
\[
\langle Z^\top(y - Z\hat\rho), \rho - \hat\rho\rangle_{NT} = \lambda\langle z^*, \rho - \hat\rho\rangle = \lambda\langle z_b^*, \beta - \hat\beta\rangle \le \lambda\{\Omega(\beta) - \Omega(\hat\beta)\},
\]
where the last inequality follows from the definition of the sub-differential. Rearranging this inequality and using $y = m + u$,
\[
\begin{aligned}
\|Z(\hat\rho - \rho)\|_{NT}^2 - \lambda\{\Omega(\beta) - \Omega(\hat\beta)\}
&\le \langle Z^\top u, \hat\rho - \rho\rangle_{NT} + \langle Z(m - Z\rho), \hat\rho - \rho\rangle_{NT}\\
&= \langle B^\top u, \hat\alpha - \alpha\rangle_{NT} + \langle X^\top u, \hat\beta - \beta\rangle_{NT} + \langle Z(m - Z\rho), \hat\rho - \rho\rangle_{NT}\\
&\le |B^\top u/NT|_\infty|\hat\alpha - \alpha|_1 + \Omega^*(X^\top u/NT)\,\Omega(\hat\beta - \beta) + \|m - Z\rho\|_{NT}\|Z(\hat\rho - \rho)\|_{NT}\\
&\le \left\{|B^\top u/\sqrt{N}T|_\infty \vee \Omega^*(X^\top u/NT)\right\}\left\{|\hat\alpha - \alpha|_1/\sqrt{N} + \Omega(\hat\beta - \beta)\right\} + \|m - Z\rho\|_{NT}\|Z(\hat\rho - \rho)\|_{NT},
\end{aligned}
\tag{A.1}
\]
where the second inequality follows by the dual norm and Cauchy-Schwarz inequalities, and $\Omega^*$ is the dual norm of $\Omega$. By Babii, Ghysels, and Striaukas (2020b), Lemma A.2.1,
\[
|B^\top u/\sqrt{N}T|_\infty \vee \Omega^*(X^\top u/NT) \le C\max\left\{\left|\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T u_{it}x_{it}\right|_\infty,\ \max_{i\in[N]}\left|\frac{1}{\sqrt{N}T}\sum_{t=1}^T u_{it}\right|\right\}
\]
for some $C > 0$, where the inequality follows since $\max_{G\in\mathcal{G}}|G| \lesssim 1$. Under Assumption 3.1, by Theorem 4.1 and Babii, Ghysels, and Striaukas (2020a), Theorem 3.1 and Lemma A.1.1, for every $u > 0$,
\[
\begin{aligned}
\Pr\left(|B^\top u/NT|_\infty \vee \Omega^*(X^\top u/NT) > u\right)
&\le \Pr\left(\left|\frac{C}{NT}\sum_{i=1}^N\sum_{t=1}^T u_{it}x_{it}\right|_\infty > u\right) + \Pr\left(\max_{i\in[N]}\left|\frac{C}{\sqrt{N}T}\sum_{t=1}^T u_{it}\right| > u\right)\\
&\lesssim p(NT)^{1-\kappa}u^{-\kappa} + pe^{-c_1u^2NT} + N^{1-\kappa/2}T^{1-\kappa}u^{-\kappa} + 4Ne^{-c_2u^2NT}\\
&\lesssim (pN^{1-\kappa}\vee N^{1-\kappa/2})T^{1-\kappa}u^{-\kappa} + (p\vee N)e^{-c_3u^2NT}
\end{aligned}
\]
for some $c_1, c_2, c_3 > 0$. Therefore, under Assumption 3.5, with probability at least $1-\delta$,
\[
|B^\top u/NT|_\infty \vee \Omega^*(X^\top u/NT) \lesssim \left(\frac{pN^{1-\kappa}\vee N^{1-\kappa/2}}{\delta T^{\kappa-1}}\right)^{1/\kappa} \vee \sqrt{\frac{\log((p\vee N)/\delta)}{NT}} \lesssim \lambda.
\]
In conjunction with the inequality in Eq. A.1, this gives
\[
\begin{aligned}
\|Z\Delta\|_{NT}^2 &\le c^{-1}\lambda\left\{|\hat\alpha - \alpha|_1/\sqrt{N} + \Omega(\hat\beta - \beta)\right\} + \|m - Z\rho\|_{NT}\|Z\Delta\|_{NT} + \lambda\{\Omega(\beta) - \Omega(\hat\beta)\}\\
&\le (c^{-1} + 1)\lambda\left\{|\hat\alpha - \alpha|_1/\sqrt{N} + \Omega(\hat\beta - \beta)\right\} + \|m - Z\rho\|_{NT}\|Z\Delta\|_{NT}
\end{aligned}
\tag{A.2}
\]
for some $c > 1$, where $\Delta = \hat\rho - \rho$ and the second line follows by the triangle inequality. Note that the sg-LASSO penalty function can be decomposed as a sum of two semi-norms, $\Omega(b) = \Omega_{S_0}(b) + \Omega_{S_0^c}(b)$ for all $b\in\mathbb{R}^p$, and that $\Omega_{S_0^c}(\beta) = 0$ and $\Omega_{S_0^c}(\hat\beta) = \Omega_{S_0^c}(\hat\beta - \beta)$. Then
\[
\Omega(\beta) - \Omega(\hat\beta) = \Omega_{S_0}(\beta) - \Omega_{S_0}(\hat\beta) - \Omega_{S_0^c}(\hat\beta) \le \Omega_{S_0}(\hat\beta - \beta) - \Omega_{S_0^c}(\hat\beta - \beta).
\tag{A.3}
\]
Suppose that $\|m - Z\rho\|_{NT} \le \|Z\Delta\|_{NT}$. Then it follows from the first inequality in Eq. A.2 and Eq. A.3 that
\[
\|Z\Delta\|_{NT}^2 \le c^{-1}\lambda\left\{|\hat\alpha-\alpha|_1/\sqrt{N} + \Omega(\hat\beta-\beta)\right\} + \lambda\left\{\Omega_{S_0}(\hat\beta-\beta) - \Omega_{S_0^c}(\hat\beta-\beta)\right\} + \|Z\Delta\|_{NT}^2.
\]
Since the quadratic terms cancel and $\lambda > 0$, this shows that
\[
(1 - c^{-1})\Omega_{S_0^c}(\hat\beta - \beta) \le (1 + c^{-1})\Omega_{S_0}(\hat\beta - \beta) + c^{-1}|\hat\alpha - \alpha|_1/\sqrt{N},
\]
or equivalently
\[
\Omega_{S_0^c}(\hat\beta - \beta) \le \frac{c+1}{c-1}\Omega_{S_0}(\hat\beta - \beta) + (c-1)^{-1}|\hat\alpha - \alpha|_1/\sqrt{N}.
\tag{A.4}
\]
Put $\Delta_N = ((\hat\alpha - \alpha)^\top/\sqrt{N}, (\hat\beta - \beta)^\top)^\top$. Then, under Assumption 3.2,
\[
\begin{aligned}
|\Delta_N|_1 &\lesssim \Omega(\hat\beta - \beta) + |\hat\alpha - \alpha|_1/\sqrt{N} \lesssim \Omega_{S_0}(\hat\beta - \beta) + |\hat\alpha - \alpha|_1/\sqrt{N}\\
&\lesssim |\hat\alpha - \alpha|_1/\sqrt{N} + \sqrt{s}\,|\hat\beta - \beta|_2 \le \sqrt{s\vee N}\,|\Delta_N|_2 \lesssim \sqrt{s\vee N}\,|\Sigma^{1/2}\Delta_N|_2\\
&= \sqrt{(s\vee N)\left\{\|Z\Delta\|_{NT}^2 + \Delta_N^\top(\Sigma - \hat\Sigma)\Delta_N\right\}} \le \sqrt{(s\vee N)\left\{\|Z\Delta\|_{NT}^2 + |\Delta_N|_1^2\,|\mathrm{vech}(\hat\Sigma - \Sigma)|_\infty\right\}}\\
&\lesssim \sqrt{(s\vee N)\left\{\lambda|\Delta_N|_1 + |\Delta_N|_1^2\,|\mathrm{vech}(\hat\Sigma - \Sigma)|_\infty\right\}}.
\end{aligned}
\]
Consider the event $\mathcal{E} = \left\{|\mathrm{vech}(\hat\Sigma - \Sigma)|_\infty < 1/(2(s\vee N))\right\}$. Under Assumption 3.1, by Theorem 4.1 and Babii, Ghysels, and Striaukas (2020a), Theorem 3.1,
\[
\begin{aligned}
\Pr(\mathcal{E}^c) &\le \Pr\left(\max_{i\in[N],\,j\in[p]}\left|\frac{1}{\sqrt{N}T}\sum_{t=1}^T\{x_{it,j} - \mathbb{E}[x_{it,j}]\}\right| \ge \frac{1}{4(s\vee N)}\right) + \Pr\left(\max_{1\le j\le k\le p}\left|\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T x_{it,j}x_{it,k} - \Sigma_{j,k}\right| \ge \frac{1}{4(s\vee N)}\right)\\
&\lesssim p(s\vee N)^{\tilde\kappa}T^{1-\tilde\kappa}\left(N^{1-\tilde\kappa/2} + pN^{1-\tilde\kappa}\right) + p(p\vee N)e^{-cNT/(s\vee N)^2}.
\end{aligned}
\]
Therefore, on the event $\mathcal{E}$,
\[
|\hat\alpha - \alpha|_1/\sqrt{N} + |\hat\beta - \beta|_1 = |\Delta_N|_1 \lesssim (s\vee N)\lambda,
\]
whence from Eq. A.2 we obtain
\[
\|Z\Delta\|_{NT}^2 \lesssim \lambda\left\{|\hat\alpha - \alpha|_1/\sqrt{N} + \Omega(\hat\beta - \beta)\right\} \lesssim \lambda|\Delta_N|_1 \lesssim (s\vee N)\lambda^2.
\]
Suppose now that $\|m - Z\rho\|_{NT} > \|Z\Delta\|_{NT}$. Then, obviously,
\[
\|Z(\hat\rho - \rho)\|_{NT} \le \|m - Z\rho\|_{NT}.
\]
Therefore, on the event $\mathcal{E}$, we always have
\[
\|Z(\hat\rho - \rho)\|_{NT}^2 \lesssim (s\vee N)\lambda^2 + 4\|m - Z\rho\|_{NT}^2,
\]
which proves the statement of the theorem.

B Proofs of Fuk-Nagaev inequalities
Proof of Theorem 4.1.
Suppose first that $p = 1$. For $a\in\mathbb{R}$, with some abuse of notation, let $[a]$ denote its integer part. For each $i = 1, \dots, N$, split the partial sums into blocks with at most $J\in\mathbb{N}$ summands:
\[
V_{i,k} = \xi_{i,(k-1)J+1} + \cdots + \xi_{i,kJ}, \quad k = 1, 2, \dots, [T/J],
\]
\[
V_{i,[T/J]+1} = \xi_{i,[T/J]J+1} + \cdots + \xi_{i,T},
\]
where we set $V_{i,[T/J]+1} = 0$ if $[T/J]J = T$. Let $\{U_{i,t} : i, t \ge 1\}$ be i.i.d. random variables uniformly distributed on $[0,1]$ and independent of $\{\xi_{i,t} : i, t \ge 1\}$. Put $M_{i,t} = \sigma(V_{i,1}, \dots, V_{i,t-2})$ for $t \ge 3$. For $t = 1, 2$, set $V^*_{i,t} = V_{i,t}$, while for $t \ge 3$, by Dedecker and Prieur (2004), Lemma 5, there exist random variables $V^*_{i,t} =_d V_{i,t}$ such that

1. $V^*_{i,t}$ is $M_{i,t} \vee \sigma(V_{i,t}) \vee \sigma(U_{i,t})$-measurable;
2. $V^*_{i,t}$ is independent of $M_{i,t}$;
3. $\|V_{i,t} - V^*_{i,t}\|_1 = \tau(M_{i,t}, V_{i,t})$.

By 1., there exists a measurable function $f_i$ such that $V^*_{i,t} = f_i(V_{i,t}, V_{i,t-2}, \dots, V_{i,1}, U_{i,t})$. Therefore, by 2., $(V^*_{i,2t})_{t\ge1}$ and $(V^*_{i,2t-1})_{t\ge1}$ are sequences of independent random variables for every $i = 1, \dots, N$. Moreover, $\{V^*_{i,2t} : i = 1, \dots, N,\ t \ge 1\}$ and $\{V^*_{i,2t-1} : i = 1, \dots, N,\ t \ge 1\}$ are sequences of independent random variables, since $\{\xi_{i,t} : t = 1, \dots, T\}$ are independent over $i = 1, \dots, N$. Decompose
\[
\left|\sum_{i=1}^N\sum_{t=1}^T \xi_{i,t}\right| \le \left|\sum_{i=1}^N\sum_{t\ge1} V^*_{i,2t}\right| + \left|\sum_{i=1}^N\sum_{t\ge1} V^*_{i,2t-1}\right| + \sum_{i=1}^N\sum_{t=3}^{[T/J]+1}\left|V_{i,t} - V^*_{i,t}\right| \triangleq I + II + III.
\]
By Fuk and Nagaev (1971), Corollary 4, there exist constants $c_1, c_2 > 0$ such that
\[
\Pr(I > u/3) \le c_1 u^{-q} N\sum_{t\ge1}\mathbb{E}|V^*_{i,2t}|^q + 2\exp\left(-\frac{c_2u^2}{N\sum_{t\ge1}\mathrm{Var}(V^*_{i,2t})}\right) \le c_1 u^{-q} N\sum_{t\ge1}\mathbb{E}|V_{i,2t}|^q + 2\exp\left(-\frac{c_2u^2}{NT}\right),
\]
where we use $V^*_{i,t} =_d V_{i,t}$ and
\[
\sum_{t\ge1}\mathrm{Var}(V_{i,2t}) \le \sum_{t\ge1}\mathrm{Var}(V_{i,t}) = O(T),
\]
which follows by Babii, Ghysels, and Striaukas (2020a), Lemma A.1.2. Similarly,
\[
\Pr(II > u/3) \le c_3 u^{-q} N\sum_{t\ge1}\mathbb{E}|V_{i,2t-1}|^q + 2\exp\left(-\frac{c_4u^2}{NT}\right)
\]
for some constants $c_3, c_4 > 0$. Lastly, since $M_{i,t}$ and $V_{i,t}$ are separated by $J+1$ lags of $\xi_{i,t}$, we have $\tau(M_{i,t}, V_{i,t}) \le J\tau_{J+1}$. By Markov's inequality and property 3., this gives
\[
\Pr(III > u/3) \le \frac{3N}{u}\sum_{t=3}^{[T/J]+1}\|V_{i,t} - V^*_{i,t}\|_1 \le \frac{3NT}{u}\tau_{J+1}.
\]
Combining all the estimates together,
\[
\begin{aligned}
\Pr\left(\left|\sum_{i=1}^N\sum_{t=1}^T\xi_{i,t}\right| > u\right) &\le \Pr(I > u/3) + \Pr(II > u/3) + \Pr(III > u/3)\\
&\le c_5 u^{-q} N\sum_{t\ge1}\|V_{i,t}\|_q^q + 4e^{-c_6u^2/NT} + \frac{3NT}{u}\tau_{J+1}\\
&\le c_5 u^{-q} J^{q-1} N T\|\xi_{i,t}\|_q^q + \frac{3cNT}{u}(J+1)^{-a} + 4e^{-c_6u^2/NT}
\end{aligned}
\]
for some constants $c_5, c_6 > 0$. To balance the first two terms, we choose the block length $J \sim u^{(q-1)/(q+a-1)}$, in which case we get
\[
\Pr\left(\left|\sum_{i=1}^N\sum_{t=1}^T\xi_{i,t}\right| > u\right) \le c_7 N T u^{-\kappa} + 4e^{-c_8u^2/NT}
\]
for some $c_7, c_8 > 0$. For $p > 1$, the result follows by the union bound.
Proof of Theorem 4.2.
Put
\[
M_{q,N,T} \triangleq \max_{j\in[p]}\max_{i\in[N]}\mathbb{E}\left|\sum_{t=1}^T\xi_{i,t,j}\right|^q \quad\text{and}\quad B_{N,T} \triangleq \max_{j\in[p]}\sum_{i=1}^N\mathrm{Var}\left(\sum_{t=1}^T\xi_{i,t,j}\right).
\]
By Jensen's inequality, under the stationarity and i.i.d. hypotheses,
\[
M_{q,N,T} \le \max_{j\in[p]} T^q\,\mathbb{E}|\xi_{i,t,j}|^q \lesssim T^q,
\]
where the last inequality follows under assumption (i). Similarly,
\[
B_{N,T} \le N\max_{j\in[p]}\sum_{t=1}^T\sum_{k=1}^T\left|\mathrm{Cov}(\xi_{1,t,j}, \xi_{1,k,j})\right| \lesssim NT,
\]
where the last inequality follows from the computations in Theorem 4.1 under assumptions (i)-(ii). Using these estimates, by the union bound and Fuk and Nagaev (1971), Corollary 4, for every $u > 0$,
\[
\Pr\left(\left|\sum_{t=1}^T\sum_{i=1}^N\xi_{i,t}\right|_\infty > u\right) \le c_1 M_{q,N,T}\,pN u^{-q} + 2p\exp\left(-\frac{c_2u^2}{B_{N,T}}\right) \le c_3 pN T^q u^{-q} + 2pe^{-c_4u^2/NT}
\]
for some constants $c_j > 0$, $j\in[4]$. Therefore, there exists $C > 0$ such that
\[
\Pr\left(\left|\sum_{t=1}^T\sum_{i=1}^N\xi_{i,t}\right|_\infty \le C\left\{(pNT^q/\delta)^{1/q} \vee \sqrt{NT\log(4p/\delta)}\right\}\right) \ge 1 - \delta.
\]

C Additional empirical results
Figure A.1: Sparsity patterns and forecast combination weights. (a) Pooled sg-LASSO, γ = 0.4, cross-validation. (b) Fixed effects sg-LASSO, γ = 0.4, BIC. (c) Average forecast combination weights. [Each panel plots the out-of-sample quarters on the horizontal axis against the variables on the vertical axis: lag-1 to lag-4, ESS, AES, AEV, CSS, NIP, firm-return, firm-vola, VXO, Oil price, 10Y minus 3-month, TEDRATE, 3-month T-Bill, AAA minus 10Y, BAA minus 10Y, BAA minus AAA, SNP500, Earnings, Earnings forecasts, Earnings losses, Recession, Revenue growth, Revised estimate, Indust. Production, CPI Inflation.]
             RW      An.-mean   An.-median
MSE         2.331     2.339       2.088

                      sg-LASSO   elnet-U   elnet
Panel A. Cross-validation
Individual              1.545     1.610    1.609
Pooled                    …
Table A.1:
Prediction results – The table reports MSEs of out-of-sample predictions, averaged over firms. The nowcasting horizon is the current month, i.e., we predict the P/E ratio using information up to the end of the current fiscal quarter. Each Panel A-D block represents a different way of calculating the tuning parameter λ. Bold entries are the best results in a block. We report elastic net MSEs averaged over the LASSO/ridge weights [0, …, 1].

D Data description
D.1 Firm-level data
The full list of firm-level data is provided in Table A.3. We also add two daily firm-specific stock market predictor variables: stock returns and a realized variance measure, defined as the rolling sample variance over the previous 60 days (i.e., 60-day historical volatility).
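The 60-day rolling variance measure can be sketched as follows; conventions such as the degrees-of-freedom correction and window alignment are our assumptions, not details from the paper:

```python
import numpy as np

# Rolling sample variance over the previous `window` trading days: entry t
# holds the variance of returns over days t-window+1, ..., t, and the first
# window-1 entries are undefined (NaN).
def rolling_variance(returns, window=60):
    r = np.asarray(returns, dtype=float)
    out = np.full(len(r), np.nan)
    for t in range(window, len(r) + 1):
        out[t - 1] = r[t - window:t].var(ddof=1)
    return out

rv = rolling_variance(np.arange(100.0))
print(rv[59])  # variance of 60 consecutive integers: 60*61/12 = 305.0
```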
D.1.1 Firm sample selection
We select a sample of firms based on data availability. First, we remove all firms from I/B/E/S which have missing values in their earnings time series. Next, we retain firms that we are able to match with the CRSP dataset. Finally, we keep firms that we can match with the RavenPack dataset.
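The three selection steps above amount to intersecting the set of firms with complete earnings histories with the CRSP and RavenPack identifier sets; the toy identifiers and data containers below are hypothetical:

```python
# Sample selection sketch: keep firms whose I/B/E/S earnings series are
# complete, then intersect with the CRSP- and RavenPack-matched firms.
def select_firms(ibes_earnings, crsp_ids, ravenpack_ids):
    complete = {f for f, series in ibes_earnings.items()
                if all(v is not None for v in series)}
    return complete & crsp_ids & ravenpack_ids

ibes = {"A": [1.0, 1.1], "B": [1.0, None], "C": [2.0, 2.1]}
print(sorted(select_firms(ibes, {"A", "B"}, {"A", "C"})))  # ['A']
```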
             RW      An.-mean   An.-median   F.Comb
MSE         2.794     2.836       2.539       2.405

sg-LASSO   γ = 0     0.2     0.4     0.6     0.8      1
Panel A. Cross-validation
Individual  1.808   1.817   1.836   1.864   1.889   1.884
Pooled      1.692   1.689     …

Table A.2:
Prediction results – The table reports MSEs of out-of-sample predictions, averaged over firms, for the same models as in Table 1 - discarding the first 8 quarters used to compute the forecast combination weights - together with the prediction errors of the forecast combination approach of Ball and Ghysels (2018), denoted as F.Comb. Hence, the out-of-sample quarters start in 2009 Q1. The nowcasting horizon is the current month, i.e., we predict the P/E ratio using information up to the end of the current fiscal quarter. Each Panel A-D block represents a different way of calculating the tuning parameter λ. Bold entries are the best results in a block.

D.1.2 Firm-specific text data
We create a link table of RavenPack ID and PERMNO identifiers, which enables us to merge I/B/E/S and CRSP data with firm-specific textual-analysis data from RavenPack. The latter is a rich dataset that contains intra-daily news information about firms. There are several editions of the dataset; in our analysis, we use the Dow Jones (DJ) and Press Release (PR) editions. The former contains relevant information from Dow Jones Newswires, regional editions of the Wall Street Journal, Barron's, and MarketWatch. The PR edition contains news data, obtained from various press releases and regulatory disclosures, on a daily basis from a variety of newswires and press release distribution networks, including exclusive content from PRNewswire, Canadian News Wire, Regulatory News Service, and others. The DJ edition sample starts on January 1, 2000, and the PR edition data starts on January 17, 2004.

We construct our news-based firm-level covariates by retaining only highly relevant news stories. More precisely, for each firm and each day, we keep only news with a Relevance Score (REL) greater than or equal to 75, as suggested by the RavenPack News Analytics guide and used by practitioners; see, for example, Kolanovic and Krishnamachari (2017). REL is a score between 0 and 100 which indicates how strongly a news story is linked with a particular firm. A score of zero means that the entity is only vaguely mentioned in the news story, while 100 means the opposite. A score of 75 is regarded as marking a significantly relevant news story. After applying the REL filter, we apply a news-novelty filter based on the Event Novelty Score (ENS); we keep data entries that have a score of 100. Like REL, ENS is a score between 0 and 100. It indicates the novelty of a news story within a 24-hour time window. A score of 100 means that a news story was not already covered by earlier announced news; subsequently published stories on a related event are discounted and therefore receive scores below 100. With this filter, we thus consider only novel news stories. We focus on five sentiment indices that are available in both the DJ and PR editions. They are:
Event Sentiment Score (ESS), for a given firm, represents the strength of the news, measured using surveys of financial expert ratings for firm-specific events. The score ranges between 0 and 100; values above (below) 50 classify the news as positive (negative), with 50 being neutral.
Aggregate Event Sentiment (AES) represents the ratio of positive events reported on a firm to the total count of events, measured over a rolling 91-day window in a particular news edition (DJ or PR). An event with ESS > 50 is counted as a positive entry, while one with ESS < 50 is counted as negative. Neutral news (ESS = 50) and news that does not receive an ESS score do not enter the AES computation. As with ESS, the score values are between 0 and 100.
Aggregate Event Volume (AEV) represents the count of events for a firm over the last 91 days within a given edition. As in the AES case, only news that receives a non-neutral ESS score is counted; the measure therefore accumulates both positive and negative news.
Composite Sentiment Score (CSS) represents the news sentiment of a given news story, obtained by combining various sentiment analysis techniques. The direction of the score is determined by looking at emotionally charged words and phrases and by matching stories typically rated by experts as having a short-term positive or negative share price impact. The strength of the score is determined by intra-day price reactions modeled empirically using tick data from approximately 100 large-cap stocks. As for ESS and AES, the score takes values between 0 and 100, with 50 being neutral.
News Impact Projections (NIP) represents the degree of impact a news flash has on the market over the following two-hour period. The algorithm produces scores to predict the relative volatility (defined as the stock's volatility scaled by the average volatility of the large-cap firms used in the test set) of each stock price measured within two hours following the news. Tick data is used to train the algorithm and produce the scores, which take values between 0 and 100, with 50 representing zero-impact news.

For each firm and each day with firm-specific news, we compute the average value of each sentiment score. In this way, we aggregate across editions and groups, where a group is defined as a collection of related news. We then map the indices that take values between 0 and 100 onto [−1, 1]. Let x_i ∈ {ESS, AES, CSS, NIP} be the average score value for a particular day and firm. We map x_i ↦ x̄_i ∈ [−1, 1] by computing x̄_i = (x_i − 50)/50.

id | Series | Frequency | Source | T-code
Panel A.
- | Price/Earnings ratio | quarterly | CRSP & I/B/E/S | 1
- | Price/Earnings ratio consensus forecasts | quarterly | CRSP & I/B/E/S | 1
Panel B.
1 | Stock returns | daily | CRSP | 1
2 | Realized variance measure | daily | CRSP/computations | 1
Panel C.
1 | Event Sentiment Score (ESS) | daily | RavenPack | 1
2 | Aggregate Event Sentiment (AES) | daily | RavenPack | 1
3 | Aggregate Event Volume (AEV) | daily | RavenPack | 1
4 | Composite Sentiment Score (CSS) | daily | RavenPack | 1
5 | News Impact Projections (NIP) | daily | RavenPack | 1
Table A.3:
Firm-level data description table – The id column gives mnemonics according to the data source, which is given in the second column, Source. The Frequency column states the sampling frequency of the variable. The T-code column denotes the data transformation applied to a time series: (1) not transformed, (2) ∆x_t, (3) ∆²x_t, (4) log(x_t), (5) ∆log(x_t), (6) ∆²log(x_t). Panel A. describes earnings data, panel B. quarterly firm-level accounting data, panel C. daily firm-level stock market data, and panel D. daily firm-level sentiment data series.

id | Series | Frequency | Source | T-code
Panel A.
1 | Industrial Production Index | monthly | FRED-MD | 5
2 | CPI Inflation | monthly | FRED-MD | 6
Panel B.
1 | Crude Oil Prices | daily | FRED | 6
2 | S&P 500 | daily | CRSP | 5
3 | VXO Volatility Index | daily | FRED | 1
4 | Moodys Aaa - 10-Year Treasury | daily | FRED | 1
5 | Moodys Baa - 10-Year Treasury | daily | FRED | 1
6 | Moodys Baa - Aaa Corporate Bond | daily | FRED | 1
7 | 10-Year Treasury - 3-Month Treasury | daily | FRED | 1
8 | 3-Month Treasury - Effective Federal funds rate | daily | FRED | 1
9 | TED rate | daily | FRED | 1
Panel C.
1 | Earnings | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1
2 | Earnings forecasts | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1
3 | Earnings losses | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1
4 | Recession | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1
5 | Revenue growth | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1
6 | Revised estimate | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1

Table A.4:
Other predictor variables description table – The id column gives mnemonics according to the data source, which is given in the second column, Source. The Frequency column states the sampling frequency of the variable. The T-code column denotes the data transformation applied to a time series: (1) not transformed, (2) ∆x_t, (3) ∆²x_t, (4) log(x_t), (5) ∆log(x_t), (6) ∆²log(x_t). Panel A. describes real-time monthly macro series, panel B. daily financial markets data, and panel C. monthly news attention series.

Ticker | Firm name | PERMNO | RavenPack ID
1 | MMM | 3M | 22592 | 03B8CF
2 | ABT | Abbott labs | 20482 | 520632
3 | AUD | Automatic data processing | 44644 | 66ECFD
4 | ADTN | Adtran | 80791 | 9E98F2
5 | AEIS | Advanced energy industries | 82547 | 1D943E
6 | AMG | Affiliated managers group | 85593 | 30E01D
7 | AKST | A K steel holding | 80303 | 41588B
8 | ATI | Allegheny technologies | 43123 | D1173F
9 | AB | AllianceBernstein holding l.p. | 75278 | CB138D
10 | ALL | Allstate corp. | 79323 | E1C16B
11 | AMZN | Amazon.com | 84788 | 0157B1
12 | AMD | Advanced micro devices | 61241 | 69345C
13 | DOX | Amdocs ltd. | 86144 | 45D153
14 | AMKR | Amkor technology | 86047 | 5C8D61
15 | APH | Amphenol corp. | 84769 | BB07E4
16 | AAPL | Apple | 14593 | D8442A
17 | ADM | Archer daniels midland | 10516 | 2B7A40
18 | ARNC | Arconic | 24643 | EC821B
19 | ATTA | AT&T | 66093 | 251988
20 | AVY | Avery dennison corp. | 44601 | 662682
21 | BHI | Baker hughes | 75034 | 940C3D
22 | BAC | Bank of america corp. | 59408 | 990AD0
23 | BAX | Baxter international inc. | 27887 | 1FAF22
24 | BBT | BB&T corp. | 71563 | 1A3E1B
25 | BDX | Becton dickinson & co. | 39642 | 873DB9
26 | BBBY | Bed bath & beyond inc. | 77659 | 9B71A7
27 | BHE | Benchmark electronics inc. | 76224 | 6CF43C
28 | BA | Boeing co. | 19561 | 55438C
29 | BK | Bank of new york mellon corp. | 49656 | EF5BED
30 | BWA | BorgWarner inc. | 79545 | 1791E7
31 | BP | BP plc | 29890 | 2D469F
32 | EAT | Brinker international inc. | 23297 | 732449
33 | BMY | Bristol-Myers squibb co. | 19393 | 94637C
34 | BRKS | Brooks automation inc. | 81241 | FC01C0
35 | CA | CA technologies inc. | 25778 | 76DE40
36 | COG | Cabot oil & gas corp. | 76082 | 388E00
37 | CDN | Cadence design systems inc. | 11403 | CC6FF5
38 | COF | Capital one financial corp. | 81055 | 055018
39 | CRR | Carbo ceramics inc. | 83366 | 8B66CE
40 | CSL | Carlisle cos. | 27334 | 9548BB
41 | CCL | Carnival corporation & plc | 75154 | 067779
42 | CERN | Cerner corp. | 10909 | 9743E5
43 | CHRW | C.H. robinson worldwide inc.
| 85459 | C659EB
44 | SCHW | Charles schwab corp. | 75186 | D33D8C
45 | CHKP | Check point software technologies ltd. | 83639 | 531EF1
46 | CHV | Chevron corp. | 14541 | D54E62
47 | CI | CIGNA corp. | 64186 | 86A1B9
48 | CTAS | Cintas corp. | 23660 | BFAEB4
49 | CLX | Clorox co. | 46578 | 719477
50 | KO | Coca-Cola co. | 11308 | EEA6B3
51 | CGNX | Cognex corp. | 75654 | 709AED
52 | COLM | Columbia sportswear co. | 85863 | 5D0337
53 | CMA | Comerica inc. | 25081 | 8CF6DD
54 | CRK | Comstock resources inc. | 11644 | 4D72C8
55 | CAG | ConAgra foods inc. | 56274 | FA40E2
56 | STZ | Constellation brands inc. | 69796 | 1D1B07
57 | CVG | Convergys corp. | 86305 | 914819
116 | MTB | M&T bank corp. | 35554 | D1AE3B
117 | MANH | Manhattan associates inc. | 85992 | 031025
118 | MAN | ManpowerGroup inc. | 75285 | C0200F
119 | MAR | Marriott international inc. | 85913 | 385DD4
120 | MMC | Marsh & mcLennan cos. | 45751 | 9B5968
121 | MCD | McDonald's corp. | 43449 | 954E30
122 | MCK | McKesson corp. | 81061 | 4A5C8D
123 | MDU | MDU resources group inc. | 23835 | 135B09
124 | MRK | Merck & co. inc. | 22752 | 1EBF8D
125 | MTOR | Meritor inc | 85349 | 00326E
126 | MTG | MGIC investment corp. | 76804 | E28F22
127 | MGM | MGM resorts international | 11891 | 8E8E6E
128 | MCHP | Microchip technology inc. | 78987 | CDFCC9
129 | MU | Micron technology inc. | 53613 | 49BBBC
130 | MSFT | Microsoft corp. | 10107 | 228D42
131 | MOT | Motorola solutions inc. | 22779 | E49AA3
132 | MSM | MSC industrial direct co. | 82777 | 74E288
133 | MUR | Murphy oil corp. | 28345 | 949625
134 | NBR | Nabors industries ltd. | 29102 | E4E3B7
135 | NOI | National oilwell varco inc. | 84032 | 5D02B7
136 | NYT | New york times co. | 47466 | 875F41
137 | NFX | Newfield exploration co. | 79915 | 9C1A1F
138 | NEM | Newmont mining corp. | 21207 | 911AB8
139 | NKE | NIKE inc. | 57665 | D64C6D
140 | NBL | Noble energy inc. | 61815 | 704DAE
141 | NOK | Nokia corp. | 87128 | C12ED9
142 | NOC | Northrop grumman corp. | 24766 | FC1B7B
143 | NTRS | Northern trust corp. | 58246 | 3CCC90
144 | NUE | NuCor corp. | 34817 | 986AF6
145 | ODEP | Office depot inc. | 75573 | B66928
146 | ONB | Old national bancorp | 12068 | D8760C
147 | OMC | Omnicom group inc. | 30681 | C8257F
148 | OTEX | Open text corp. | 82833 | 34E891
149 | ORCL | Oracle corp. | 10104 | D6489C
150 | ORBK | Orbotech ltd. | 78527 | 290820
151 | PCAR | Paccar inc. | 60506 | ACF77B
152 | PRXL | Parexel international corp. | 82607 | EF8072
153 | PH | Parker hannifin corp. | 41355 | 6B5379
154 | PTEN | Patterson-uti energy inc. | 79857 | 57356F
155 | PBCT | People's united financial inc. | 12073 | 449A26
156 | PEP | PepsiCo inc. | 13856 | 013528
157 | PFE | Pfizer inc. | 21936 | 267718
158 | PIR | Pier 1 imports inc. | 51692 | 170A6F
159 | PXD | Pioneer natural resources co. | 75241 | 2920D5
160 | PNCF | PNC financial services group inc. | 60442 | 61B81B
161 | POT | Potash corporation of saskatchewan inc. | 75844 | FFBF74
162 | PPG | PPG industries inc. | 22509 | 39FB23
163 | PX | Praxair inc. | 77768 | 285175
164 | PG | Procter & gamble co. | 18163 | 2E61CC
165 | PTC | PTC inc. | 75912 | D437C3
166 | PHM | PulteGroup inc. | 54148 | 7D5FD6
167 | QCOM | Qualcomm inc. | 77178 | CFF15D
168 | DGX | Quest diagnostics inc. | 84373 | 5F9CE3
169 | RL | Ralph lauren corp. | 85072 | D69D42
170 | RTN | Raytheon co. | 24942 | 1981BF
171 | RF | Regions financial corp. | 35044 | 73C521
172 | RCII | Rent-a-center inc. | 81222 | C4FBDC
173 | RMD | ResMed inc. | 81736 | 434F38
174 | RHI | Robert half international inc. | 52230 | A4D173
175 | RDC | Rowan cos. inc. | 45495 | 3FFA00
176 | RCL | Royal caribbean cruises ltd. | 79145 | 751A74
177 | RPM | RPM international inc. | 65307 | F5D059
178 | RRD | R.R. donnelley & sons co. | 38682 | 0BE0AE
179 | SLB | Schlumberger ltd. n.v. | 14277 | 164D72
180 | SCTT | Scotts miracle-gro co. | 77300 | F3FCC3
181 | SM | St. mary land & exploration co. | 78170 | 6A3C35
182 | SONC | Sonic corp. | 76568 | 80D368
183 | SO | Southern co. | 18411 | 147C38
184 | LUV | Southwest airlines co. | 58683 | E866D2
185 | SWK | Stanley black & decker inc. | 43350 | CE1002
186 | STT | State street corp. | 72726 | 5BC2F4
187 | TGNA | TEGNA inc. | 47941 | D6EAA3
188 | TXN | Texas instruments inc. | 15579 | 39BFF6
189 | TMK | Torchmark corp. | 62308 | E90C84
190 | TRV | The travelers companies inc. | 59459 | E206B0
191 | TBI | TrueBlue inc. | 83671 | 9D5D35
192 | TUP | Tupperware brands corp. | 83462 | 2B0AF4
193 | TYC | Tyco international plc | 45356 | 99333F
194 | TSN | Tyson foods inc. | 77730 | AD1ACF
195 | X | United states Steel corp. | 76644 | 4E2D94
196 | UNH | UnitedHealth group inc. | 92655 | 205AD5
197 | VIAV | Viavi solutions inc. | 79879 | E592F0
198 | GWW | W.W. grainger inc. | 52695 | 6EB9DA
199 | WDR | Waddell & reed financial inc. | 85931 | 2F24A5
200 | WBA | Walgreens boots alliance inc. | 19502 | FACF19
201 | DIS | Walt disney co. | 26403 | A18D3C
202 | WAT | Waters corp. | 82651 | 1F9D90
203 | WBS | Webster financial corp. | 10932 | B5766D
204 | WFC | Wells fargo & co. | 38703 | E8846E
205 | WERN | Werner enterprises inc. | 10397 | D78BF1
206 | WABC | Westamerica bancorp | 82107 | 622037
207 | WDC | Western digital corp. | 66384 | CE96E7
208 | WHR | Whirlpool corp. | 25419 | BDD12C
209 | WFM | Whole foods market inc. | 77281 | 319E7D
210 | XLNX | Xilinx inc. | 76201 | 373E85
Table A.5: