Machine Learning Panel Data Regressions with an Application to Nowcasting Price Earnings Ratios
Andrii Babii ∗ Ryan T. Ball † Eric Ghysels ‡ Jonas Striaukas § August 11, 2020
Abstract
This paper introduces structured machine learning regressions for prediction and nowcasting with panel data consisting of series sampled at different frequencies. Motivated by the empirical problem of predicting corporate earnings for a large cross-section of firms with macroeconomic, financial, and news time series sampled at different frequencies, we focus on the sparse-group LASSO regularization. This type of regularization can take advantage of mixed frequency time series panel data structures, and we find that it empirically outperforms unstructured machine learning methods. We obtain oracle inequalities for the pooled and fixed effects sparse-group LASSO panel data estimators, recognizing that financial and economic data exhibit heavier than Gaussian tails. To that end, we leverage a novel Fuk-Nagaev concentration inequality for panel data consisting of heavy-tailed τ-mixing processes, which may be of independent interest in other high-dimensional panel data settings.

Keywords: corporate earnings, nowcasting, high-dimensional panels, mixed frequency data, text data, sparse-group LASSO, heavy-tailed τ-mixing processes, Fuk-Nagaev inequality.

∗ University of North Carolina at Chapel Hill - Gardner Hall, CB 3305 Chapel Hill, NC 27599-3305. Email: [email protected]
† Stephen M. Ross School of Business, University of Michigan, 701 Tappan Street, Ann Arbor, MI 48109. Email: [email protected]
‡ Department of Economics and Kenan-Flagler Business School, University of North Carolina–Chapel Hill. Email: [email protected]
§ LIDAM UC Louvain and FRS-FNRS Research Fellow. Email: [email protected]

1 Introduction
The fundamental value of equity shares is determined by the discounted value of future payoffs. Every quarter, investors get a glimpse of a firm's potential payoffs with the release of corporate earnings reports. In a data-rich environment, stock analysts have many indicators regarding future cash flows that are available much more frequently. Ball and Ghysels (2018) took a first stab at automating the process using MIDAS regressions. Since their original work, much progress has been made on machine learning regularized mixed frequency regression models. In the current paper, we significantly expand the tools of nowcasting in a data-rich environment by exploiting panel data structures. Panel data regression models are well suited for firm-level data analysis as both the time series and cross-section dimensions can be properly modeled. In such models, time-invariant firm-specific effects are typically modeled in a flexible way which allows capturing heterogeneity in the data. At the same time, machine learning methods are becoming increasingly popular in economics and finance as a flexible way to model relationships between the response and covariates.

In the present paper, we analyze panel data regressions in a high-dimensional setting where the number of time-varying covariates can be very large and potentially exceed the sample size. This may happen when the number of firm-specific characteristics, such as textual analysis news data or firm-level stock returns, is large, and/or the number of aggregates, such as market returns, macro data, etc., is large. In our theoretical treatment, we obtain oracle inequalities for pooled and fixed effects LASSO-type panel data estimators allowing for heavy-tailed τ-mixing data. To recognize time series data structures, we rely on the more general sparse-group LASSO (sg-LASSO) regularization with dictionaries, which typically improves upon the unstructured LASSO estimator for time series data in small samples.
Importantly, our theory covers the LASSO and group LASSO estimators as special cases. To recognize that economic and financial data often have heavier than Gaussian tails, our theoretical treatment relies on a new Fuk-Nagaev panel data concentration inequality. This allows us to characterize the dependence of the performance of LASSO-type estimators on N (cross-section) and T (time series), which is especially relevant for modern panel data applications, where both N and T can be large; see Fernández-Val and Weidner (2018) for a recent literature review focusing on the low-dimensional panel data case.

Our paper is related to the recent work of Fosten and Greenaway-McGrevy (2019). (See Babii, Ghysels, and Striaukas (2020b) for an application of the sg-LASSO to the ADL-MIDAS model and US GDP nowcasting.)

An empirical application to nowcasting firm-specific price/earnings ratios (henceforth P/E ratios) is provided. We focus on current quarter nowcasts, hence evaluating model-based within-quarter predictions for very short horizons. It is widely acknowledged that P/E ratios are a good indicator of the future performance of a particular company and are therefore used by analysts and investment professionals to base their decisions on which stocks to pick for their investment portfolios. A typical value investor relies on consensus forecasts of earnings made by a pool of analysts. Hence, we naturally benchmark our proposed machine learning methods against such predictions. Besides, we compare our methods with the forecast combination approach used by Ball and Ghysels (2018) and a simple random walk (RW).

Our high-frequency regressors include traditional macro and financial series as well as non-standard series generated by textual analysis. We consider structured pooled and fixed effects sg-LASSO panel data regressions with mixed frequency data (sg-LASSO-MIDAS).
The fixed effects estimator yields sparser models compared to the pooled regressions, with revenue growth and the first lag of the dependent variable selected throughout the out-of-sample period. The BAA less AAA bond yield spread, firm-level volatility, and the news textual analysis Aggregate Event Sentiment index are also selected very frequently. Our results show the superior performance of sg-LASSO-MIDAS over analysts' predictions, the forecast combination method, and firm-specific time series regression models. Besides, the sg-LASSO-MIDAS regressions perform better than unstructured panel data regressions with the elastic net regularization.

Regarding the textual news data, it is worth emphasizing that the time series of news data is sparse, since many days are without firm-specific news and we impute zero values. A nice property of our mixed frequency data treatment with dictionaries is that imputing zeros also implies that the non-zero entries get weights with a decaying pattern for distant past values in comparison to the most recent daily news data. As a result, our ML approach is particularly useful to model news data, which is sparse in nature. (Panel data regressions with the LASSO penalty have been used in microeconometrics since Koenker (2004); see also Lamarche (2010), Kock (2013), Belloni, Chernozhukov, Hansen, and Kozbur (2016), Lu and Su (2016), Kock (2016), Harding and Lamarche (2019), Chiang, Rodrigue, and Sasaki (2019), and Chernozhukov, Hausman, and Newey (2019), among others. The group LASSO is considered in Su, Shi, and Phillips (2016), Lu and Su (2016), and Farrell (2015), among others.)

The paper is organized as follows. Section 2 introduces the models and estimators. Oracle inequalities for sparse-group LASSO panel data regressions appear in Section 3. Section 4 covers Fuk-Nagaev inequalities for panel data. Results of our empirical application analyzing price earnings ratios for a panel of individual firms are reported in Section 5. Technical material appears in the Appendix and conclusions appear in Section 6.
In this section we describe briefly the methodological approach, while in Section 3 we provide more details and the supporting theoretical results. We focus on the pooled and the fixed effects panel regressions with the sparse-group LASSO (sg-LASSO) regularization. The best linear predictor for firm $i = 1,\dots,N$ in the panel data setting is

$$\alpha_i + x_{it}^\top\beta,$$

where $\alpha_i$, $i = 1,\dots,N$, are fixed intercepts. We consider predictive regressions with homogeneous and heterogeneous entity-specific intercepts.

2.1 Pooled sg-LASSO

In the pooled regressions, we ignore the cross-sectional heterogeneity, assuming that $\alpha_i = \alpha$ for all $i = 1,\dots,N$, and the pooled sg-LASSO estimator is a solution to

$$\min_{(a,b)\in\mathbb{R}\times\mathbb{R}^p} \|y - a\iota - Xb\|_{NT}^2 + 2\lambda\,\Omega(b),$$

where

$$\Omega(b) = \gamma|b|_1 + (1-\gamma)\|b\|_{2,1}$$

is a penalty function and $\gamma\in[0,1]$ is the relative weight of the LASSO and the group LASSO penalties.

Intuitively, the low- or high-frequency lags of a single covariate define a group, which might be a dense signal provided that this covariate is relevant for prediction. Dense signals are not well captured by the unstructured LASSO estimator of Tibshirani (1996). Indeed, the lags of a single covariate are temporally related; hence, taking this group structure into account might improve upon the predictive performance of the unstructured LASSO estimator in small samples; see Section 2.3 for more details on how the dense time series signal is mapped into the sparse-group structure with dictionaries.

2.2 Fixed effects sg-LASSO
In contrast to the pooled regressions, in the fixed effects regressions we estimate the heterogeneous intercepts $\alpha_i$, $i = 1,\dots,N$, and use them subsequently to construct the best linear predictors. The fixed effects sg-LASSO estimator is a solution to

$$\min_{(a,b)\in\mathbb{R}^{N+p}} \|y - Ba - Xb\|_{NT}^2 + 2\lambda\,\Omega(b),$$

where $B = I_N\otimes\iota$ and Ω is the sparse-group LASSO penalty. Note that we consider the fixed effects as a dense signal and leave them unpenalized. The sparse-group structure is defined by low- and high-frequency lags similarly to the pooled regressions, as explained in the following subsection.
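As a concrete illustration (not from the paper), the fixed-effects design matrix $B = I_N\otimes\iota$ and the sg-LASSO penalty Ω can be sketched in a few lines of numpy; the dimensions and group structure below are hypothetical:

```python
import numpy as np

def sg_lasso_penalty(b, groups, gamma):
    """Sparse-group LASSO penalty: gamma*|b|_1 + (1-gamma)*sum_G |b_G|_2."""
    lasso = np.sum(np.abs(b))
    group = sum(np.linalg.norm(b[g]) for g in groups)
    return gamma * lasso + (1.0 - gamma) * group

# Fixed-effects design: B = I_N kron iota_T stacks one intercept per firm
N, T, p = 3, 4, 6                          # hypothetical dimensions
B = np.kron(np.eye(N), np.ones((T, 1)))    # shape (N*T, N)

# Each group collects the (high-frequency) lags of one covariate
groups = [np.arange(0, 3), np.arange(3, 6)]   # two covariates, 3 lags each
b = np.array([0.5, -0.2, 0.0, 0.0, 0.0, 0.0])
print(sg_lasso_penalty(b, groups, gamma=0.5))
```

With `gamma=1` the penalty reduces to the LASSO, with `gamma=0` to the group LASSO, matching the interpolation described above; the fixed effects (columns of B) are deliberately left out of the penalized block.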
2.3 MIDAS regressions and dictionaries

Motivated by our empirical application, we allow the high-dimensional set of covariates to be sampled at a higher frequency than the outcome variable. Let K be the total number of time-varying covariates $\{x_{i,t-j/m,k}:\ i\in[N],\ t\in[T],\ j\in[m],\ k\in[K]\}$, possibly measured at some higher frequency with m observations for every low-frequency period t, and consider the following MIDAS panel data regression

$$y_{it} = \alpha_i + \sum_{k=1}^K \psi(L^{1/m};\beta_k)x_{it,k} + u_{it},$$

where $\psi(L^{1/m};\beta_k)x_{it,k} = \frac{1}{m}\sum_{j=1}^m \beta_{j,k}x_{i,t-j/m,k}$ is the high-frequency lag polynomial. For m = 1, we retain the standard panel data regression model, while for m > 1 high-frequency lags of $x_{i,t,k}$ are also included. For large m, there is a proliferation of the total number of estimated parameters which reduces the finite-sample predictive performance.

The MIDAS literature offers various parametrizations of the weights; see Ghysels, Santa-Clara, and Valkanov (2006) and Ghysels, Sinko, and Valkanov (2006). More recently, Babii, Ghysels, and Striaukas (2020b) proposed a new approach based on dictionaries linear in parameters to approximate the MIDAS weight function, which is particularly useful for high-dimensional MIDAS regression models. The sparse-group LASSO allows for a data-driven approximation to the MIDAS weight functions from the dictionary, promoting sparsity between groups (covariate selection) and within groups (MIDAS weight approximation). (The pooled and the fixed effects estimators can be efficiently computed using a variant of the block coordinate descent algorithm proposed by Simon, Friedman, Hastie, and Tibshirani (2013).) The weight function is approximated with a dictionary $w_L(s) = (w_1(s),\dots,w_L(s))^\top$ as $\omega_L(s;\beta_k) = (w_L(s))^\top\beta_k$.
Instead of estimating a large number of parameters pertaining to the high-frequency lag polynomial $\psi(L^{1/m};\beta_k)x_{it,k} = \frac{1}{m}\sum_{j=1}^m\beta_{j,k}x_{i,t-j/m,k}$, we estimate a lower-dimensional parameter $\beta_k$ in

$$\frac{1}{m}\sum_{j=1}^m \omega_L(j/m;\beta_k)x_{i,t-j/m,k}.$$

An attractive feature of the sparse-group LASSO estimator is that it can learn the MIDAS weight function from the dictionary in a nonlinear data-driven way and, at the same time, select covariates defined as groups of time series lags. In practice, dictionaries lead to an appropriately structured design matrix X; see Babii, Ghysels, and Striaukas (2020b) for more details on how to construct the design matrix. Note that such weights depend linearly on the parameter $\beta_k$, which allows for the efficient estimation of the high-dimensional MIDAS panel regression model, cf. Khalaf, Kichian, Saunders, and Voia (2020) for the low-dimensional non-linear case. A suitable dictionary for our purposes are Legendre polynomials, which are in the class of orthogonal polynomials and have very good approximating properties. In practice, orthogonal polynomials typically outperform non-orthogonal counterparts, e.g. Almon polynomials, or unrestricted lags in small samples; see Babii, Ghysels, and Striaukas (2020b) for further details and a Monte Carlo simulation study supporting this choice.
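A minimal sketch of this dictionary construction, using numpy's Legendre basis shifted to the lag positions $j/m\in(0,1]$; the function name `midas_design` and all dimensions are hypothetical:

```python
import numpy as np
from numpy.polynomial import legendre

def midas_design(x_hf, L):
    """Aggregate m high-frequency lags into L dictionary regressors.

    x_hf : (n_obs, m) array; row t holds x_{t-1/m}, ..., x_{t-m/m}.
    Returns an (n_obs, L) array whose columns multiply beta_k directly.
    """
    n, m = x_hf.shape
    s = np.arange(1, m + 1) / m               # lag positions j/m in (0, 1]
    # Legendre polynomials live on [-1, 1]; map the lag positions there
    W = legendre.legvander(2 * s - 1, L - 1)  # (m, L) dictionary w_L(j/m)
    return x_hf @ W / m                       # (1/m) sum_j w_L(j/m) x_{t-j/m}

# Hypothetical example: 5 low-frequency obs, m = 12 high-frequency lags, L = 3
rng = np.random.default_rng(0)
x_hf = rng.standard_normal((5, 12))
Z = midas_design(x_hf, L=3)
print(Z.shape)
```

Each covariate thus contributes one group of L columns to the design matrix, so group selection by the sg-LASSO amounts to covariate selection, while within-group sparsity shapes the estimated weight function.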
(More precisely, we can approximate any continuous weight function in the $L^\infty[0,1]$ norm and, more generally, any square-integrable function in the $L^2[0,1]$ norm; hence, discontinuous MIDAS weights are not ruled out.)

2.4 Tuning parameter selection

We consider several approaches to select the tuning parameter λ. First, we adapt k-fold cross-validation to the panel data setting. To that end, we resample the data by blocks respecting the time-series dimension and creating folds based on individual firms instead of the pooled sample. We use 5-fold cross-validation as the sample size of the dataset we consider in our empirical application is relatively small. We also consider the following three information criteria: BIC, AIC, and corrected AIC (AICc). Assuming that $y_{it}\,|\,x_{it}$ are i.i.d. draws from $N(\alpha_i + x_{it}^\top\beta,\sigma^2)$, the (quasi) log-likelihood is

$$L(\alpha,\beta,\sigma^2) \propto -\frac{1}{2\sigma^2}\sum_{i=1}^N\sum_{t=1}^T\big(y_{it}-\alpha_i-x_{it}^\top\beta\big)^2.$$

Then, for the pooled model, the BIC criterion is

$$\mathrm{BIC} = \frac{\|y-\hat\alpha\iota-X\hat\beta\|_{NT}^2}{\hat\sigma^2} + \frac{\log(NT)}{NT}\times\widehat{\mathrm{df}},$$

where df denotes the degrees of freedom. The degrees of freedom are estimated as $\widehat{\mathrm{df}} = |\hat\beta|_0 + 1$ for the pooled regression and $\widehat{\mathrm{df}} = |\hat\beta|_0 + N$ for the fixed effects regression, where $|\cdot|_0$ is the $\ell_0$-norm defined as the number of non-zero coefficients; see Zou, Hastie, and Tibshirani (2007) for more details. The AIC is computed as

$$\mathrm{AIC} = \frac{\|y-\hat\alpha\iota-X\hat\beta\|_{NT}^2}{\hat\sigma^2} + \frac{2}{NT}\times\widehat{\mathrm{df}}.$$

Lastly, the corrected Akaike information criterion is

$$\mathrm{AICc} = \frac{\|y-\hat\alpha\iota-X\hat\beta\|_{NT}^2}{\hat\sigma^2} + \frac{2\widehat{\mathrm{df}}}{NT-\widehat{\mathrm{df}}-1}.$$

The AICc might be a better choice when p is large compared to the sample size. For the fixed effects regressions, we replace $\hat\alpha\iota$ with $B\hat\alpha$ everywhere and adjust the degrees of freedom as described above. We report results for each of these four choices of the tuning parameters.

3 Oracle inequalities

In this section, we provide the theoretical analysis of the predictive performance of pooled and fixed effects panel data regressions with the sg-LASSO regularization, including the standard LASSO and the group LASSO regularizations. It is worth stressing that our analysis is not tied to the mixed-frequency data setting and applies to generic high-dimensional panel data regularized with the sg-LASSO penalty function. Importantly, we focus on panels consisting of τ-mixing time series with polynomial (Pareto-type) tails.

3.1 Pooled regression

The pooled linear projection model is

$$y_{it} = \alpha + x_{it}^\top\beta + u_{it},\qquad \mathbb{E}[u_{it}z_{it}] = 0,\qquad i\in[N],\ t\in[T],$$

where $\alpha\in\mathbb{R}$ and $\beta\in\mathbb{R}^p$ are unknown projection coefficients, $z_{it} = (1, x_{it}^\top)^\top$, and we use [J] to denote the set $\{1,2,\dots,J\}$ for an arbitrary positive integer J.
The vector of covariates $x_{it}\in\mathbb{R}^p$ may include time-varying covariates common to all entities (macroeconomic factors) as well as lags of $y_{it}$ and lags of some baseline covariates. It is worth stressing that pooled regressions can also potentially accommodate heterogeneity provided that the data are clustered in a relatively small number of clusters of similar entities.

Put $y_i = (y_{i1},\dots,y_{iT})^\top$, $x_i = (x_{i1},\dots,x_{iT})^\top$, $u_i = (u_{i1},\dots,u_{iT})^\top$, and let $\iota\in\mathbb{R}^T$ be a vector of ones. Then the regression equation after stacking time series observations is

$$y_i = \alpha\iota + x_i\beta + u_i,\qquad i\in[N].$$

Define further $y = (y_1^\top,\dots,y_N^\top)^\top$, $X = (x_1^\top,\dots,x_N^\top)^\top$, and $u = (u_1^\top,\dots,u_N^\top)^\top$. Then the regression equation after stacking all cross-sectional observations is

$$y = \alpha\iota + X\beta + u,$$

where $\iota\in\mathbb{R}^{NT}$ is a vector of ones.

The pooled sg-LASSO estimator $\hat\beta$ solves

$$\min_{(a,b)\in\mathbb{R}\times\mathbb{R}^p} \|y - a\iota - Xb\|_{NT}^2 + 2\lambda\Omega(b),\tag{1}$$

where $\|z\|_{NT}^2 = z^\top z/NT$ for $z\in\mathbb{R}^{NT}$, and

$$\Omega(b) = \gamma|b|_1 + (1-\gamma)\|b\|_{2,1}$$

is the sg-LASSO penalty function. The penalty function Ω interpolates between the LASSO penalty $|b|_1 = \sum_{j=1}^p|b_j|$ and the group LASSO penalty $\|b\|_{2,1} = \sum_{G\in\mathcal{G}}|b_G|_2$, where $\mathcal{G}$ is a partition of $[p] = \{1,2,\dots,p\}$ and $|b_G|_2 = \big(\sum_{j\in G}|b_j|^2\big)^{1/2}$ is the $\ell_2$ norm. The parameter $\gamma\in[0,1]$ determines the relative weights of the LASSO and the group LASSO penalization, while the amount of regularization is controlled by the regularization parameter $\lambda > 0$. Note that the group structure $\mathcal{G}$ has to be specified by the econometrician, which in our setting is defined by the high-frequency lags of different covariates. Throughout the paper we assume that groups have fixed size, which is well-justified in our empirical application.

For a random variable ξ, let $\|\xi\|_q = (\mathbb{E}|\xi|^q)^{1/q}$ denote its $L_q$ norm, $q\ge1$. We work with τ-mixing processes, measuring the temporal dependence with τ-mixing coefficients. The τ-mixing processes can be placed somewhere between the α-mixing processes and mixingales: they are less restrictive than α-mixing, yet at the same time are amenable to coupling similarly to α-mixing processes, which is not the case for mixingales; see Dedecker and Doukhan (2003), Dedecker and Prieur (2004), and Dedecker and Prieur (2005) for more details. This allows us to obtain sharp concentration inequalities in Section 4.

For a σ-algebra $\mathcal{M}$ and a random vector $\xi\in\mathbb{R}^l$, the coupling τ coefficient is defined as

$$\tau(\mathcal{M},\xi) = \sup_{f\in\mathrm{Lip}}\int_{\mathbb{R}}\big\|F_{f(\xi)|\mathcal{M}}(x) - F_{f(\xi)}(x)\big\|_1\,\mathrm{d}x,$$

where Lip is the set of 1-Lipschitz functions from $\mathbb{R}^l$ to $\mathbb{R}$, $F_\zeta$ is the CDF of $\zeta = f(\xi)$, and $F_{\zeta|\mathcal{M}}$ is the CDF of ζ conditionally on $\mathcal{M}$. For a stochastic process $(\xi_t)_{t\in\mathbb{Z}}$ with the natural filtration $\mathcal{M}_t = \sigma(\xi_t,\xi_{t-1},\dots)$ generated by its past, the τ-mixing coefficients are defined as

$$\tau_k = \sup_{j\ge1}\max_{l\in[j]}\frac{1}{l}\sup_{t+k\le t_1<\cdots<t_l}\tau\big(\mathcal{M}_t,(\xi_{t_1},\dots,\xi_{t_l})\big).$$
Theorem 3.1. Suppose that Assumptions 3.1, 3.2, and 3.3 are satisfied. Then with probability at least $1-\delta-O\big(s^{\tilde\kappa}p(NT)^{1-\tilde\kappa} + pe^{-cNT/s}\big)$,

$$\|Z(\hat\rho-\rho)\|_{NT}^2 \lesssim s\lambda^2 + \|m - Z\rho\|_{NT}^2$$

and

$$|\hat\alpha-\alpha| + |\hat\beta-\beta|_1 \lesssim s\lambda + \lambda^{-1}\|m - Z\beta\|_{NT}^2 + s^{1/2}\|m - Z\beta\|_{NT},$$

for some $c > 0$ and $\tilde\kappa = \frac{(\tilde a+1)\tilde q-1}{\tilde a+\tilde q-1}$.

The proof of this result can be found in the Appendix. Theorem 3.1 applies to panel data, unlike the result of Babii, Ghysels, and Striaukas (2020b). It provides oracle inequalities describing the prediction and estimation accuracy in an environment where the number of regressors p is allowed to scale with the effective sample size NT. Importantly, the result is stated under the weak tail and mixing conditions in Assumption 3.1. The parameters κ and $\tilde\kappa$ are the mixing-tails exponents of the stochastic processes driving the regression score and the covariance matrix, respectively.

To describe convergence rates, the following condition considers a simplified setting, where the effective sparsity s is constant, the approximation error vanishes sufficiently fast, and the total number of regressors scales appropriately with the effective sample size NT.

Assumption 3.4.
Suppose that (i) $s = O(1)$; (ii) $\|m - Z\beta\|_{NT} = O_P(\lambda)$; (iii) $p = o\big((NT)^{\tilde\kappa-1}\big)$.

In particular, Assumption 3.4 allows for 1) $N\to\infty$ while T is fixed; 2) $T\to\infty$ while N is fixed; and 3) both $N\to\infty$ and $T\to\infty$ without restricting the relative growth of the two. The following result describes the prediction and estimation convergence rates in the asymptotic environment outlined in Assumption 3.4 and is an immediate consequence of Theorem 3.1.

Corollary 3.1.
Suppose that Assumptions 3.1, 3.2, 3.3, and 3.4 are satisfied. Then

$$\|Z(\hat\beta-\beta)\|_{NT}^2 = O_P\left(p^{2/\kappa}(NT)^{2/\kappa-2} \vee \frac{\log p}{NT}\right)\qquad\text{and}\qquad |\hat\beta-\beta|_1 = O_P\left(p^{1/\kappa}(NT)^{1/\kappa-1} \vee \sqrt{\frac{\log p}{NT}}\right).$$

Note that for large a, the mixing-tails exponent is $\kappa\approx q$. Therefore, for data that are close to independent, the prediction accuracy is approximately of order $O_P\big(p^{2/q}(NT)^{2/q-2}\vee\frac{\log p}{NT}\big)$, which is the rate one would obtain for i.i.d. data applying directly Fuk and Nagaev (1971), Corollary 4, so in this sense our result is sharp. If the data are sub-Gaussian, then moments of all orders q exist and, for a given NT, the first term can be made arbitrarily small relative to the second by taking q large enough. In this case we recover the $O_P\big(\frac{\log p}{NT}\big)$ rate typically obtained for sub-Gaussian data. Therefore, the Fuk-Nagaev inequality provides a more accurate description of the performance of LASSO-type estimators.

If the polynomial tail dominates, then we need $p = o\big((NT)^{\kappa-1}\big)$ for the prediction and estimation consistency, provided that $\tilde\kappa\ge\kappa-1$. The pooled sg-LASSO estimator is expected to work well whenever the number of regressors p is small relative to $(NT)^{\kappa-1}$. This is a significantly weaker requirement compared to the $p = o(T^{\kappa-1})$ needed for time series regressions in Babii, Ghysels, and Striaukas (2020b). In particular, since $\kappa > 2$, the condition $p = o\big((NT)^{\kappa-1}\big)$ can be significantly weaker than the $p = o(NT)$ condition needed in the QMLE/GMM framework without regularization. How much the sg-LASSO improves upon the (unregularized) QMLE depends on the heaviness of the tails and the persistence of the underlying stochastic processes, as measured by the mixing-tails exponent κ. In particular, for light tails and weakly persistent series, the mixing-tails exponent κ is large, offsetting the dependence on p.

Lastly, it is worth mentioning that the oracle inequality is driven by the time series with the heaviest tail and that it might be possible to obtain sharper results allowing for heterogeneous tails at the cost of introducing heavier notation.

3.2 Fixed effects regressions

Pooled regressions are attractive since the effective sample size
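To illustrate the role of heavy tails behind these rates, here is a toy simulation (not from the paper) comparing the maximal coordinate of a sample mean under Gaussian draws and variance-matched Student-t(3) draws, the latter having polynomial tails of the kind the theory allows; all dimensions and the seed are arbitrary:

```python
import numpy as np

# Toy check: with polynomial tails, the maximum over p coordinates of the
# sample mean tends to concentrate more slowly than under Gaussian data,
# in line with the polynomial term of the Fuk-Nagaev bound.
rng = np.random.default_rng(42)
n, p = 2000, 100                       # hypothetical effective sample size and dimension

gauss = rng.standard_normal((n, p))
heavy = rng.standard_t(df=3, size=(n, p)) / np.sqrt(3.0)  # unit-variance t(3)

max_gauss = np.abs(gauss.mean(axis=0)).max()
max_heavy = np.abs(heavy.mean(axis=0)).max()
print(max_gauss, max_heavy)
```

Repeating the experiment over many replications would show a heavier right tail for the t(3) statistic, mirroring the extra polynomial term $pNTu^{-\kappa}$ in the panel Fuk-Nagaev inequality of Section 4.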
NT can be huge, yet the heterogeneity of individual time series may be lost. If the underlying series have substantial heterogeneity over $i\in[N]$, then taking this into account might reduce the projection error and improve the predictive accuracy. At the other extreme, the cross-sectional structure can be completely ignored and individual time series regressions can be used for prediction. (Recall that the Fuk-Nagaev inequality provides a sharper description of concentration compared to the simple Markov bound in conjunction with Rosenthal's moment inequality.) The fixed effects panel data regression is

$$y_{it} = \alpha_i + x_{it}^\top\beta + u_{it},\qquad \mathbb{E}[u_{it}z_{it}] = 0,\qquad i\in[N],\ t\in[T],$$

where $z_{it} = (1, x_{it}^\top)^\top$. Note that the entity-specific intercepts $\alpha_i$ are deterministic constants and the projection model is always well-defined. The fixed effects have to be estimated to construct the best linear predictor $\alpha_i + x_{it}^\top\beta$.

The fixed effects sg-LASSO estimators $\hat\alpha$ and $\hat\beta$ solve

$$\min_{(a,b)\in\mathbb{R}^{N+p}} \|y - Ba - Xb\|_{NT}^2 + 2\lambda\Omega(b),$$

where $B = I_N\otimes\iota$, $I_N$ is the $N\times N$ identity matrix, $\iota\in\mathbb{R}^T$ is the vector with all coordinates equal to one, and Ω is the sg-LASSO penalty. It is worth stressing that the design matrix X does not include the intercept and that we do not penalize the fixed effects. This is done because sparsity of the fixed effects does not hold even in the special case where all the intercepts are equal. By Fermat's rule, the first-order conditions are

$$\hat\alpha = (B^\top B)^{-1}B^\top(y - X\hat\beta),\qquad 0 = X^\top M_B(X\hat\beta - y)/NT + \lambda z^*$$

for some $z^*\in\partial\Omega(\hat\beta)$, where $b\mapsto\partial\Omega(b)$ is the subdifferential of Ω and $M_B = I - B(B^\top B)^{-1}B^\top$ is the orthogonal projection matrix.
It is easy to see from the first-order conditions that the estimator $\hat\beta$ is equivalent to: 1) the penalized GLS estimator for the first-differenced regression; 2) the penalized OLS estimator for the regression written in deviations from time means; and 3) the penalized OLS estimator where the fixed effects are partialled out. Thus, the equivalence between the three approaches is not affected by the penalization, cf. Arellano (2003) for low-dimensional panels.

For the fixed effects regression, we define

$$\hat\Sigma = \begin{pmatrix} \frac{1}{T}B^\top B & \frac{1}{\sqrt{N}T}B^\top X\\ \frac{1}{\sqrt{N}T}X^\top B & \frac{1}{NT}X^\top X \end{pmatrix}\qquad\text{and}\qquad \Sigma = \begin{pmatrix} I_N & \frac{1}{\sqrt{N}T}\mathbb{E}[B^\top X]\\ \frac{1}{\sqrt{N}T}\mathbb{E}[X^\top B] & \mathbb{E}[x_{it}x_{it}^\top] \end{pmatrix}.\tag{2}$$

We will assume that the smallest eigenvalue of Σ is uniformly bounded away from zero by some constant. Note that if $x_{it}\sim N(0, I_p)$, then Σ is approximately equal to the identity matrix for large N.

The order of the regularization parameter is governed by the Fuk-Nagaev inequality for long panels in Theorem 4.1, with the only difference that it has to take into account the fact that the fixed effects parameters are estimated.

Assumption 3.5 (Regularization). The regularization parameter satisfies

$$\lambda \sim \left(\frac{p\vee N^{\kappa/2}}{\delta(NT)^{\kappa-1}}\right)^{1/\kappa} \vee \sqrt{\frac{\log(p\vee N/\delta)}{NT}}$$

for some $\delta\in(0,1)$ and $\kappa = \frac{(a+1)q-1}{a+q-1}$, where a, q are as in Assumption 3.1.

Similarly to the pooled regressions, we state the oracle inequality allowing for the approximation error. For the fixed effects regressions we redefine $Z = (B, X)$ and $\rho = (\alpha^\top,\beta^\top)^\top$. Put also

$$r_{N,T,p} = p(s\vee N)^{\tilde\kappa}T^{-\tilde\kappa}\big(N^{-\tilde\kappa/2} + pN^{-\tilde\kappa}\big) + p(p\vee N)e^{-cNT/(s\vee N)}$$

with $\tilde\kappa = \frac{(\tilde a+1)\tilde q-1}{\tilde a+\tilde q-1}$ and some $c > 0$. Recall also that Σ in Assumption 3.2 is redefined according to Eq. (2), so that Σ is non-singular uniformly over p, N, T.

Theorem 3.2.
Suppose that Assumptions 3.1, 3.2, and 3.5 are satisfied. Then with probability at least $1-\delta-O(r_{N,T,p})$,

$$\|Z(\hat\rho-\rho)\|_{NT}^2 \lesssim (s\vee N)\lambda^2 + \|m - Z\rho\|_{NT}^2.$$

Theorem 3.2 states the oracle inequality for the prediction error in the fixed effects panel data regressions estimated with the sg-LASSO. To see clearly how the prediction accuracy scales with the sample size, we make the following assumption.
Assumption 3.6.
Suppose that (i) $s = O(1)$; (ii) $\|m - Z\beta\|_{NT}^2 = O_P(N\lambda^2)$; (iii) $(p + N^{\tilde\kappa/2})pN/T^{\tilde\kappa-1} = o(1)$ and $p(p\vee N)e^{-cT/N} = o(1)$. The following corollary is an immediate consequence of Theorem 3.2.
Corollary 3.2.
Suppose that Assumptions 3.1, 3.2, 3.5, and 3.6 are satisfied. Then

$$\|Z(\hat\rho-\rho)\|_{NT}^2 = O_P\left((p^{2/\kappa}\vee N)\,N^{2/\kappa-1}T^{2/\kappa-2} \vee \frac{\log(p\vee N)}{T}\right).$$

Note that this result allows for $p, N, T\to\infty$ at appropriate rates and that we pay an additional price for estimating the N fixed effects, which play a role similar to the effective dimension of the covariates. Therefore, in order to achieve accurate prediction, the panel has to be sufficiently long to offset the estimation error of the individual fixed effects.

4 Fuk-Nagaev inequality for panel data
In this section we obtain a new Fuk-Nagaev concentration inequality for panel data reflecting the concentration jointly over N and T. It is worth stressing that the inequality does not follow directly from the Fuk-Nagaev inequality of Babii, Ghysels, and Striaukas (2020a) and is of independent interest for high-dimensional panel data.

Theorem 4.1.
Let $\{\xi_{it}: i\ge1, t\in\mathbb{Z}\}$ be an array of centered random vectors in $\mathbb{R}^p$ such that $\{\xi_{i1},\dots,\xi_{iT}: i\ge1\}$ are i.i.d. and, for each $i\ge1$, $(\xi_{it})_{t\in\mathbb{Z}}$ is a stationary stochastic process such that (i) $\max_{j\in[p]}\|\xi_{it,j}\|_q = O(1)$ for some $q > 2$; (ii) for every $j\in[p]$, the τ-mixing coefficients of $(\xi_{it,j})_{t\in\mathbb{Z}}$ satisfy $\tau_k^{(j)}\le ck^{-a}$ for all $k\ge1$ and some universal constants $c > 0$ and $a > \frac{q-1}{q-2}$. Then for every $u > 0$,

$$\Pr\left(\left|\sum_{i=1}^N\sum_{t=1}^T\xi_{it}\right|_\infty > u\right) \le c_1\,pNTu^{-\kappa} + 4pe^{-c_2u^2/NT}$$

for some $c_1, c_2 > 0$ and $\kappa = \frac{(a+1)q-1}{a+q-1}$.

It follows from Theorem 4.1 that there exists
$C > 0$ such that for every $\delta\in(0,1)$,

$$\Pr\left(\left|\frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N\xi_{it}\right|_\infty \le C\left\{\left(\frac{p}{\delta(NT)^{\kappa-1}}\right)^{1/\kappa} \vee \sqrt{\frac{\log(8p/\delta)}{NT}}\right\}\right) \ge 1-\delta.$$

Note that the inequality reflects the concentration jointly over N and T and that tails and persistence play an important role through the mixing-tails exponent κ. The inequality is a key technical tool that allows us to handle panel data with heavier than Gaussian tails and non-negligible T and N. The proof of this result can be found in the Appendix and is based on the blocking technique, cf. Bosq (1993), combined with the τ-coupling lemma of Dedecker and Prieur (2004). For short panels with small T, the following inequality might be a better choice.

Theorem 4.2.
Let $\{\xi_{it}: i\ge1, t\in\mathbb{Z}\}$ be an array of centered random vectors in $\mathbb{R}^p$ such that $\{\xi_{i1},\dots,\xi_{iT}: i\ge1\}$ are i.i.d. and, for each $i\ge1$, $(\xi_{it})_{t\in\mathbb{Z}}$ is a stationary stochastic process such that (i) $\max_{j\in[p]}\|\xi_{it,j}\|_q = O(1)$ for some $q > 2$; (ii) for every $j\in[p]$, the τ-mixing coefficients of $(\xi_{it,j})_{t\in\mathbb{Z}}$ satisfy $\tau_k^{(j)}\le ck^{-a}$ for all $k\ge1$ and some universal constants $c > 0$ and $a > \frac{q-1}{q-2}$. Then for every $u > 0$,

$$\Pr\left(\left|\sum_{i=1}^N\sum_{t=1}^T\xi_{it}\right|_\infty > u\right) \le c_1\,pNu^{-q} + 4pe^{-c_2u^2/NT}$$

for some $c_1, c_2 > 0$. (The direct application of the time series Fuk-Nagaev inequality of Babii, Ghysels, and Striaukas (2020a) leads to inferior concentration results for panel data.)

It follows from Theorem 4.2 that there exists
$C > 0$ such that for every $\delta\in(0,1)$,

$$\Pr\left(\left|\frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N\xi_{it}\right|_\infty \le C\left\{\left(\frac{p}{\delta N^{q-1}}\right)^{1/q} \vee \sqrt{\frac{\log(8p/\delta)}{NT}}\right\}\right) \ge 1-\delta.$$

The proof of this result can be found in the Appendix and, in contrast to Theorem 4.1, is a straightforward application of the Fuk-Nagaev inequality for i.i.d. data and Rosenthal's moment inequality. This inequality does not capture the concentration over T and may be a suboptimal choice for long panels, which is the case in our empirical application.

5 Empirical application

In our empirical application, we consider nowcasting the P/E ratios of 210 US firms using a set of predictors that are sampled at mixed frequencies. We use 24 predictors, including traditional macro and financial series as well as non-standard series generated by textual analysis. We apply pooled and fixed effects sg-LASSO-MIDAS panel data models and compare them with several benchmarks such as the random walk (RW), analysts' consensus forecasts, and the unstructured elastic net. We also compute predictions using individual-firm high-dimensional time series regressions and provide results for several choices of the tuning parameter. Lastly, we provide results for low-dimensional single-firm MIDAS regressions using the forecast combination techniques of Andreou, Ghysels, and Kourtellos (2013) and Ball and Ghysels (2018). The latter is particularly relevant for the analysis in the current paper as it also deals with nowcasting price earnings ratios. The forecast combination methods consist of estimating ADL-MIDAS regressions with each of the high-frequency covariates separately. In our case this leads to 24 predictions, corresponding to the number of predictors. Then a combination scheme, typically of the discounted mean squared error type, produces a single nowcast. One could call this a pre-machine learning large dimensional approach.
It will, therefore, be interesting to assess how this approach compares to the regularized MIDAS panel regression machine learning approach introduced in the current paper. We start with a short review of the data, with more detailed descriptions and tables appearing in Appendix Section D, followed by a summary of the methods used and the empirical results obtained.
The full sample consists of observations between the 1st of January, 2000 and the 30th of June, 2017. Due to the lagged dependent variables in the models, our effective sample starts in the third fiscal quarter of 2000. We use the first 25 observations for the initial estimation sample and the remaining 42 observations for evaluating the out-of-sample forecasts, which we obtain using an expanding window forecasting scheme. We collect data from CRSP and I/B/E/S to compute quarterly P/E ratios and firm-specific financial covariates; RavenPack is used to compute daily firm-level textual-analysis-based data; real-time monthly macroeconomic series are obtained from the FRED-MD dataset, see McCracken and Ng (2016) for more details; FRED is used to compute daily financial markets data; and, lastly, monthly news attention series extracted from Wall Street Journal articles are retrieved from Bybee, Kelly, Manela, and Xiu (2019). Appendix Section D provides a detailed description of the data sources. In particular, firm-level variables, including P/E ratios, are described in Appendix Table A.3, and the other predictor variables in Appendix Table A.4. The list of all firms we consider in our analysis appears in Appendix Table A.5.
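The expanding-window evaluation described above can be sketched as follows; only the 25/42 sample split mirrors the text, while the AR(1) placeholder model and the simulated series are our illustrative stand-ins for the paper's estimators and data:

```python
import numpy as np

# Expanding-window out-of-sample scheme: at each step, re-fit on all data
# up to time t and predict t+1; the first n_init observations form the
# initial estimation sample.
def expanding_window_forecasts(y, n_init=25):
    preds = []
    for t in range(n_init, len(y)):
        past = y[:t]
        x, z = past[:-1], past[1:]              # AR(1) placeholder model
        x1 = np.column_stack([np.ones(len(x)), x])
        coef, *_ = np.linalg.lstsq(x1, z, rcond=None)
        preds.append(coef[0] + coef[1] * y[t - 1])
    return np.array(preds)

rng = np.random.default_rng(1)
y = np.empty(67)                                # 25 initial + 42 evaluation
y[0] = 0.0
for t in range(1, 67):
    y[t] = 0.8 * y[t - 1] + rng.normal()
fc = expanding_window_forecasts(y)
print(len(fc))  # 42 out-of-sample forecasts
```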
P/E ratio and analysts’ forecasts sample construction.
Our target variable is the P/E ratio for each individual firm. To compute it, we use CRSP stock price data and I/B/E/S earnings data. Earnings data are subject to release delays of 1 to 2 months, depending on the firm and quarter. Therefore, to reflect the real-time information flow, we separately compute the dependent variable, the analysts' consensus forecasts, and the target variable using stock prices that were available in real time. We also take into account that different firms have different fiscal quarters, which also affects the real-time information flow. For example, suppose for a particular firm the fiscal quarters end in the third month of each calendar quarter, i.e., at the end of March, June, September, and December. Our dependent variable used in the regression models is computed by taking the end-of-quarter prices and dividing them by the respective earnings value. Suppose the consensus forecast for this firm is recorded on the 25th of April. In this case, we record the stock price for this particular firm on the 25th of April and divide it by the realized earnings value. To compute forecasts, we estimate several regression models. First, we estimate firm-specific sg-LASSO-MIDAS regressions, which in Table 1 we refer to as
Individual. The model is written as
\[
y_i = \iota\alpha_i + x_i\beta_i + u_i, \quad i = 1, \dots, N,
\]
and the firm-specific predictions are computed as $\hat y_{i,T+1} = \hat\alpha_i + x_{i,T+1}^\top\hat\beta_i$. As noted in Section 2, $x_i$ contains lags of the low-frequency target variable and MIDAS weights for each of the high-frequency covariates. We then estimate the following pooled and fixed effects sg-LASSO-MIDAS panel data models
\[
y = \alpha\iota + X\beta + u \quad \text{(Pooled)}
\]
\[
y = B\alpha + X\beta + u \quad \text{(Fixed Effects)}
\]
and compute predictions as
\[
\hat y_{i,T+1} = \hat\alpha + x_{i,T+1}^\top\hat\beta \quad \text{(Pooled)}, \qquad
\hat y_{i,T+1} = \hat\alpha_i + x_{i,T+1}^\top\hat\beta \quad \text{(Fixed Effects)}.
\]
We benchmark the firm-specific and panel data regression-based nowcasts against two simple alternatives. First, we compute forecasts for the RW model as
\[
\hat y_{i,T+1} = y_{i,T}.
\]
Second, we consider predictions of the P/E ratio implied by analysts' earnings nowcasts using the information up to time $T+1$, i.e.,
\[
\hat y_{i,T+1} = \bar y_{i,T+1},
\]
where $\bar y$ indicates that the forecasted P/E ratio is based on consensus earnings forecasts made at the end of quarter $T+1$, and the stock price is also taken at the end of $T+1$. To measure the forecasting performance, we compute the mean squared forecast errors (MSE) for each method. Let $\bar{\mathbf{y}}_i = (y_{i,T_{is}+1}, \dots, y_{i,T_{os}})^\top$ represent the out-of-sample realized P/E ratio values, where $T_{is}$ and $T_{os}$ denote the last initial in-sample observation and the last out-of-sample observation, respectively, and let $\hat{\mathbf{y}}_i = (\hat y_{i,T_{is}+1}, \dots, \hat y_{i,T_{os}})^\top$ collect the out-of-sample forecasts from a specific method. Then the mean squared forecast errors are computed as
\[
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^N \frac{1}{T_{os}-T_{is}} (\bar{\mathbf{y}}_i - \hat{\mathbf{y}}_i)^\top(\bar{\mathbf{y}}_i - \hat{\mathbf{y}}_i).
\]

             RW      An.-mean   An.-median
MSE         2.331     2.339       2.088

sg-LASSO   γ = 0     0.2     0.4     0.6     0.8      1
Panel A. Cross-validation
Individual  1.545   1.551   1.567   1.594   1.614   1.606
Pooled      1.459   1.456     …

Table 1:
Prediction results – The table reports MSEs of out-of-sample predictions, averaged over firms. The nowcasting horizon is the current month, i.e., we predict the P/E ratio using information up to the end of the current fiscal quarter. Each Panel A-D block represents a different way of calculating the tuning parameter λ. Bold entries are the best results in a block.

The main results are reported in Table 1, while additional results for unstructured LASSO estimators and the forecast combination approach appear in Appendix Tables A.1-A.2. First, we document that analyst-based predictions have much larger mean squared forecast errors (MSEs) compared to model-based predictions. The sharp increase in quality of model- versus analyst-based predictions indicates the usefulness of machine learning methods for nowcasting P/E ratios, see Tables 1 and A.1. Better performance is achieved for almost all machine learning methods - single-firm or panel data regressions - and all tuning parameter choices. Unstructured panel data methods and the forecast combination approach also yield more accurate forecasts, see Appendix Tables A.1-A.2. The latter confirms the findings of Ball and Ghysels (2018). Turning to the comparison of model-based predictions, we see from the results in Table 1 that sg-LASSO-MIDAS panel data models improve the quality of predictions over individual sg-LASSO-MIDAS models, irrespective of the γ weight or the tuning parameter choice. This indicates that panel data structures are relevant for nowcasting P/E ratios. We report similar findings for unstructured estimators. Within the panel data framework, we observe that fixed effects improve over pooled regressions in most cases, except when cross-validation is used; compare Panel A of Tables 1 and A.2 with Panels B-D. The pooled model tuned by cross-validation seems to yield the best overall performance.
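The MSE criterion defined above averages firm-level squared forecast errors over the evaluation window and then over firms; a minimal sketch (array names and toy values are ours):

```python
import numpy as np

# Average-over-firms mean squared forecast error: inputs are (N, H) arrays
# of realized out-of-sample values and forecasts, where H is the number of
# evaluation quarters.
def panel_mse(actual, forecast):
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    per_firm = ((actual - forecast) ** 2).mean(axis=1)  # firm-level MSEs
    return per_firm.mean()                              # average over firms

a = np.array([[1.0, 2.0], [3.0, 5.0]])
f = np.array([[1.0, 1.0], [2.0, 3.0]])
print(panel_mse(a, f))  # mean of firm MSEs [0.5, 2.5] -> 1.5
```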
In general, one can expect cross-validation to improve prediction performance over other tuning methods, as it is directly linked to empirical risk minimization. In the case of fixed effects, however, we may lose the predictive gain due to the smaller samples within each fold used to estimate the model. Lastly, the best results per tuning parameter block seem to be achieved when γ ∉ {0, 1}, indicating that both sparsity within groups and sparsity at the group level matter for prediction performance.

In Appendix Figure A.1, we plot the sparsity pattern of the selected covariates for the two best-performing methods: a) pooled sg-LASSO regressions tuned by cross-validation with γ = 0.4, and b) the fixed effects sg-LASSO model with BIC-tuned parameter and the same γ. We also plot the forecast combination weights, averaged over firms. The plots in Figure A.1 reveal that the fixed effects estimator yields sparser models than pooled regressions, and the sparsity pattern is clearer. In the fixed effects case, Revenue growth and the first lag of the dependent variable are selected throughout the out-of-sample period. The BAA less AAA bond yield spread, firm-level volatility, and the Aggregate Event Sentiment index are also selected very frequently. These variables are similarly selected in the pooled regression, but the pattern is less apparent. The forecast combination weights yield a similar, yet more blurred, pattern. In this case, the Revenue growth and firm-level stock return covariates obtain relatively larger weights than the rest of the covariates, particularly in the first part of the out-of-sample period. Therefore, the gain of machine learning methods - both single-firm and panel data - can be … (Forecast combination weights start in 2009 Q1 because the first eight quarters are used as a pre-sample to estimate the weights, see Ball and Ghysels (2018) for further details. The forecast combination weights figure also does not contain autoregressive lags; all four lags are always included in all forecasting regressions.)
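The role of γ can be made concrete through the proximal operator of the sg-LASSO penalty Ω(b) = γ|b|₁ + (1−γ)Σ_G‖b_G‖₂, which factors into coordinate-wise followed by group-wise soft-thresholding (Simon, Friedman, Hastie, and Tibshirani, 2013). The sketch below is a generic implementation of that operator, not the paper's estimation code:

```python
import numpy as np

# Proximal operator of the sparse-group LASSO penalty: soft-threshold each
# coordinate (the LASSO part, scaled by gamma), then shrink each group's
# norm (the group-LASSO part, scaled by 1 - gamma).
def prox_sg_lasso(b, lam, gamma, groups):
    soft = np.sign(b) * np.maximum(np.abs(b) - lam * gamma, 0.0)
    out = np.zeros_like(soft)
    for g in groups:  # groups: lists of coordinate indices
        norm = np.linalg.norm(soft[g])
        if norm > 0:
            out[g] = soft[g] * max(1.0 - lam * (1.0 - gamma) / norm, 0.0)
    return out

b = np.array([3.0, -0.5, 0.2, 0.1])
shrunk = prox_sg_lasso(b, lam=1.0, gamma=0.5, groups=[[0, 1], [2, 3]])
print(shrunk)  # the second group is zeroed out entirely (group sparsity)
```

With γ = 1 the group step is inactive (pure LASSO); with γ = 0 only whole groups are shrunk (group LASSO); intermediate γ yields sparsity both within and across groups, consistent with the empirical finding above.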
To test for superior forecast performance, we use the Diebold and Mariano (1995) test for the pool of P/E ratio nowcasts. We compare the mean and median consensus forecasts against the panel data machine learning regressions with the smallest forecast error per tuning parameter block in Table 1. We report the forecast accuracy test results in Table 2. When testing the full sample of pooled nowcasts, the gain in prediction accuracy is not significant, even though the MSEs are much lower for the panel data sg-LASSO regressions relative to the consensus forecasts. This result may not be surprising, however, as some firms have a large number of outlier observations, and the Diebold and Mariano (1995) test statistic is affected by the inevitably heavy-tailed forecast errors for such firms. However, when we split the pooled sample of nowcasts equally into firms with high versus low variance P/E ratios, the gain in forecast accuracy is significant for all panel data machine learning regressions for low variance P/E firms, but not for high variance firms.
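A minimal version of the Diebold and Mariano (1995) statistic for squared-error loss can be sketched as follows; the pooling across firms and other implementation details of the paper's test are not reproduced here:

```python
import numpy as np

# Diebold-Mariano test for equal predictive accuracy under squared-error
# loss: t-statistic on the mean loss differential, with a rectangular-kernel
# long-run variance for h-step forecasts.
def diebold_mariano(e1, e2, h=1):
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2  # loss differential
    n = len(d)
    dbar = d.mean()
    gamma0 = ((d - dbar) ** 2).mean()
    var = gamma0
    for k in range(1, h):  # autocovariances up to lag h-1
        var += 2 * ((d[k:] - dbar) * (d[:-k] - dbar)).mean()
    return dbar / np.sqrt(var / n)

rng = np.random.default_rng(2)
e_a = rng.normal(size=200) * 1.5  # forecast errors of a noisier method
e_b = rng.normal(size=200)
print(diebold_mariano(e_a, e_b))  # positive values favor the second method
```

Heavy-tailed forecast errors inflate the variance estimate in the denominator, which illustrates why outlier-prone firms can mask otherwise large MSE gains, as noted above.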
This paper introduces a new class of high-dimensional panel data regression models with dictionaries and sparse-group LASSO regularization. This type of regularization is an especially attractive choice for predictive panel data regressions, where the low- and/or high-frequency lags define a clear group structure and dictionaries are used to aggregate time series lags. The estimator nests the LASSO and the group LASSO estimators as special cases, as discussed in our theoretical analysis.

                                         Full sample   Large variance   Low variance
Pooled (Cross-validation) vs An.-mean       0.852          0.567           2.300
Pooled (Cross-validation) vs An.-median     0.694          0.386           2.190
Fixed-effects (BIC) vs An.-mean             0.793          0.508           2.312
Fixed-effects (BIC) vs An.-median           0.628          0.319           2.202
Fixed-effects (AIC) vs An.-mean             0.825          0.540           2.312
Fixed-effects (AIC) vs An.-median           0.663          0.355           2.202
Fixed-effects (AICc) vs An.-mean            0.825          0.540           2.312
Fixed-effects (AICc) vs An.-median          0.663          0.355           2.202
Table 2:
Forecasting performance significance – The table reports the Diebold and Mariano (1995) test statistics for pooled nowcasts, comparing machine learning panel data regressions with analysts' implied consensus forecasts, where An.-mean and An.-median denote the mean and median consensus forecasts, respectively. We compare the panel models that have the smallest forecast error per tuning parameter block in Table 1.
Our theoretical treatment allows for the heavy-tailed data frequently encountered in time series and financial econometrics. To that end, we obtain a new panel data concentration inequality of the Fuk-Nagaev type for τ-mixing processes.

Our empirical analysis sheds light on the advantages of regularized panel data regressions for nowcasting corporate earnings. We focus on nowcasting the P/E ratios of 210 US firms and find that the regularized panel data regressions outperform several benchmarks, including the analysts' predictions. Furthermore, we find that the regularized machine learning regressions outperform the forecast combinations and that the panel data approach improves upon the predictive time series regressions for individual firms. While nowcasting earnings is a leading example of applying panel data MIDAS machine learning regressions, one can think of many other applications of interest in finance. Beyond earnings, analysts are also interested in sales, dividends, etc. Our analysis can also be useful for other areas of interest, such as regional and international panel data settings.

References

Andreou, E., E. Ghysels, and A. Kourtellos (2013): "Should macroeconomic forecasters use daily financial data and how?," Journal of Business and Economic Statistics, 31(2), 240–251.

Arellano, M. (2003): Panel Data Econometrics. Oxford University Press.

Babii, A. (2020): "High-dimensional mixed-frequency IV regression," arXiv preprint arXiv:2003.13478.

Babii, A., E. Ghysels, and J. Striaukas (2020a): "Inference for high-dimensional regressions with heteroskedasticity and autocorrelation," arXiv preprint arXiv:1912.06307.

Babii, A., E. Ghysels, and J. Striaukas (2020b): "Machine learning time series regressions with an application to nowcasting," arXiv preprint arXiv:2005.14057.

Ball, R. T., and E. Ghysels (2018): "Automated earnings forecasts: beat analysts or combine and conquer?," Management Science, 64(10), 4936–4952.

Belloni, A., V. Chernozhukov, C. Hansen, and D. Kozbur (2016): "Inference in high-dimensional panel models with an application to gun control," Journal of Business and Economic Statistics, 34(4), 590–605.

Bosq, D. (1993): "Bernstein-type large deviations inequalities for partial sums of strong mixing processes," Statistics, 24(1), 59–70.

Bybee, L., B. T. Kelly, A. Manela, and D. Xiu (2019): "The structure of economic news," Available at SSRN 3446225.

Chernozhukov, V., J. A. Hausman, and W. K. Newey (2019): "Demand analysis with many prices," National Bureau of Economic Research Discussion Paper 26424.

Chetverikov, D., Z. Liao, and V. Chernozhukov (2020): "On cross-validated Lasso," Annals of Statistics (forthcoming).

Chiang, H. D., J. Rodrigue, and Y. Sasaki (2019): "Post-selection inference in three-dimensional panel data," arXiv preprint arXiv:1904.00211.

Dedecker, J., and P. Doukhan (2003): "A new covariance inequality and applications," Stochastic Processes and their Applications, 106(1), 63–80.

Dedecker, J., and C. Prieur (2004): "Coupling for τ-dependent sequences and applications," Journal of Theoretical Probability, 17(4), 861–885.

Dedecker, J., and C. Prieur (2005): "New dependence coefficients. Examples and applications to statistics," Probability Theory and Related Fields, 132(2), 203–236.

Diebold, F. X., and R. S. Mariano (1995): "Comparing predictive accuracy," Journal of Business and Economic Statistics, 13(3), 253–263.

Farrell, M. H. (2015): "Robust inference on average treatment effects with possibly more covariates than observations," Journal of Econometrics, 189(1), 1–23.

Fernández-Val, I., and M. Weidner (2018): "Fixed effects estimation of large-T panel data models," Annual Review of Economics, 10, 109–138.

Fosten, J., and R. Greenaway-McGrevy (2019): "Panel data nowcasting," Available at SSRN 3435691.

Fuk, D. K., and S. V. Nagaev (1971): "Probability inequalities for sums of independent random variables," Theory of Probability and Its Applications, 16(4), 643–660.

Ghysels, E., P. Santa-Clara, and R. Valkanov (2006): "Predicting volatility: getting the most out of return data sampled at different frequencies," Journal of Econometrics, 131(1–2), 59–95.

Ghysels, E., A. Sinko, and R. Valkanov (2006): "MIDAS regressions: Further results and new directions," Econometric Reviews, 26(1), 53–90.

Harding, M., and C. Lamarche (2019): "A panel quantile approach to attrition bias in Big Data: Evidence from a randomized experiment," Journal of Econometrics, 211(1), 61–82.

Khalaf, L., M. Kichian, C. J. Saunders, and M. Voia (2020): "Dynamic panels with MIDAS covariates: Nonlinearity, estimation and fit," Journal of Econometrics (forthcoming).

Kock, A. B. (2013): "Oracle efficient variable selection in random and fixed effects panel data models," Econometric Theory, 29(1), 115–152.

Kock, A. B. (2016): "Oracle inequalities, variable selection and uniform inference in high-dimensional correlated random effects panel data models," Journal of Econometrics, 195(1), 71–85.

Koenker, R. (2004): "Quantile regression for longitudinal data," Journal of Multivariate Analysis, 91(1), 74–89.

Kolanovic, M., and R. Krishnamachari (2017): "Big data and AI strategies: Machine learning and alternative data approach to investing," JP Morgan Global Quantitative & Derivatives Strategy Report.

Lamarche, C. (2010): "Robust penalized quantile regression estimation for panel data," Journal of Econometrics, 157(2), 396–408.

Lu, X., and L. Su (2016): "Shrinkage estimation of dynamic panel data models with interactive fixed effects," Journal of Econometrics, 190(1), 148–175.

McCracken, M. W., and S. Ng (2016): "FRED-MD: A monthly database for macroeconomic research," Journal of Business and Economic Statistics, 34(4), 574–589.

Simon, N., J. Friedman, T. Hastie, and R. Tibshirani (2013): "A sparse-group LASSO," Journal of Computational and Graphical Statistics, 22(2), 231–245.

Su, L., Z. Shi, and P. C. Phillips (2016): "Identifying latent structures in panel data," Econometrica, 84(6), 2215–2264.

Tibshirani, R. (1996): "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), 58(1), 267–288.

Zou, H., T. Hastie, and R. Tibshirani (2007): "On the degrees of freedom of the lasso," Annals of Statistics, 35(5), 2173–2192.
APPENDIX
A Proofs of oracle inequalities
Proof of Theorem 3.1.
The proof is similar to the proof of Theorem 3.1 in Babii, Ghysels, and Striaukas (2020b) and is omitted. The main difference is that, instead of applying the Fuk-Nagaev inequality from Babii, Ghysels, and Striaukas (2020a), Theorem 3.1, we apply the concentration inequality from Theorem 4.1 to
\[
\left|\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T u_{it} z_{it}\right|_\infty \quad\text{and}\quad \max_{j,k\in[p]}\left|\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T z_{it,j} z_{it,k} - \Sigma_{j,k}\right|
\]
under Assumptions 3.1, 3.2, and 3.3.

Proof of Theorem 3.2.
Put $r = (a^\top, b^\top)^\top$. Then we solve
\[
\min_{r\in\mathbb{R}^{N+p}} \|y - Zr\|_{NT}^2 + 2\lambda\Omega(b).
\]
By Fermat's rule, the solution to this problem satisfies
\[
Z^\top(Z\hat\rho - y)/NT + \lambda z^* = 0_{N+p}
\]
for some $z^* = (0_N^\top, z_b^{*\top})^\top$, where $0_N$ is an $N$-dimensional vector of zeros, $z_b^* \in \partial\Omega(\hat\beta)$, $\hat\rho = (\hat\alpha^\top, \hat\beta^\top)^\top$, and $\partial\Omega(\hat\beta)$ is the sub-differential of $b \mapsto \Omega(b)$ at $\hat\beta$. Taking the inner product with $\rho - \hat\rho$,
\[
\langle Z^\top(y - Z\hat\rho), \rho - \hat\rho\rangle_{NT} = \lambda\langle z^*, \rho - \hat\rho\rangle = \lambda\langle z_b^*, \beta - \hat\beta\rangle \le \lambda\{\Omega(\beta) - \Omega(\hat\beta)\},
\]
where the last inequality follows from the definition of the sub-differential. Rearranging this inequality and using $y = m + u$,
\[
\begin{aligned}
\|Z(\hat\rho - \rho)\|_{NT}^2 - \lambda\{\Omega(\beta) - \Omega(\hat\beta)\}
&\le \langle Z^\top u, \hat\rho - \rho\rangle_{NT} + \langle Z(m - Z\rho), \hat\rho - \rho\rangle_{NT}\\
&= \langle B^\top u, \hat\alpha - \alpha\rangle_{NT} + \langle X^\top u, \hat\beta - \beta\rangle_{NT} + \langle Z(m - Z\rho), \hat\rho - \rho\rangle_{NT}\\
&\le |B^\top u/NT|_\infty|\hat\alpha - \alpha|_1 + \Omega^*(X^\top u/NT)\,\Omega(\hat\beta - \beta) + \|m - Z\rho\|_{NT}\|Z(\hat\rho - \rho)\|_{NT}\\
&\le \left\{|B^\top u/\sqrt{N}T|_\infty \vee \Omega^*(X^\top u/NT)\right\}\left\{|\hat\alpha - \alpha|_1/\sqrt{N} + \Omega(\hat\beta - \beta)\right\} + \|m - Z\rho\|_{NT}\|Z(\hat\rho - \rho)\|_{NT},
\end{aligned}
\tag{A.1}
\]
where the second inequality follows by the dual norm and Cauchy-Schwarz inequalities, and $\Omega^*$ is the dual norm of $\Omega$. By Babii, Ghysels, and Striaukas (2020b), Lemma A.2.1,
\[
|B^\top u/\sqrt{N}T|_\infty \vee \Omega^*(X^\top u/NT) \le C\max\left\{\left|\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T u_{it}x_{it}\right|_\infty,\ \max_{i\in[N]}\left|\frac{1}{\sqrt{N}T}\sum_{t=1}^T u_{it}\right|\right\}
\]
for some $C > 0$, where the inequality follows since $\max_{G\in\mathcal{G}}|G| \lesssim 1$. Under Assumption 3.1, by Theorem 4.1 and Babii, Ghysels, and Striaukas (2020a), Theorem 3.1 and Lemma A.1.1, for every $u > 0$,
\[
\begin{aligned}
\Pr\left(|B^\top u/NT|_\infty \vee \Omega^*(X^\top u/NT) > u\right)
&\le \Pr\left(\left|\frac{C}{NT}\sum_{i=1}^N\sum_{t=1}^T u_{it}x_{it}\right|_\infty > u\right) + \Pr\left(\max_{i\in[N]}\left|\frac{C}{\sqrt{N}T}\sum_{t=1}^T u_{it}\right| > u\right)\\
&\lesssim p(NT)^{1-\kappa}u^{-\kappa} + pe^{-c_1u^2NT} + N^{1-\kappa/2}T^{1-\kappa}u^{-\kappa} + 4Ne^{-c_2u^2NT}\\
&\lesssim (pN^{1-\kappa}\vee N^{1-\kappa/2})T^{1-\kappa}u^{-\kappa} + (p\vee N)e^{-c_3u^2NT}
\end{aligned}
\]
for some $c_1, c_2, c_3 > 0$. Therefore, under Assumption 3.5, with probability at least $1-\delta$,
\[
|B^\top u/NT|_\infty \vee \Omega^*(X^\top u/NT) \lesssim \left(\frac{pN^{1-\kappa}\vee N^{1-\kappa/2}}{\delta T^{\kappa-1}}\right)^{1/\kappa} \vee \sqrt{\frac{\log((p\vee N)/\delta)}{NT}} \lesssim \lambda.
\]
In conjunction with the inequality in Eq. A.1, this gives
\[
\begin{aligned}
\|Z\Delta\|_{NT}^2 &\le c^{-1}\lambda\left\{|\hat\alpha - \alpha|_1/\sqrt{N} + \Omega(\hat\beta - \beta)\right\} + \|m - Z\rho\|_{NT}\|Z\Delta\|_{NT} + \lambda\{\Omega(\beta) - \Omega(\hat\beta)\}\\
&\le (c^{-1} + 1)\lambda\left\{|\hat\alpha - \alpha|_1/\sqrt{N} + \Omega(\hat\beta - \beta)\right\} + \|m - Z\rho\|_{NT}\|Z\Delta\|_{NT}
\end{aligned}
\tag{A.2}
\]
for some $c > 1$, where $\Delta = \hat\rho - \rho$ and the second line follows by the triangle inequality. Note that the sg-LASSO penalty function can be decomposed as a sum of two semi-norms, $\Omega(b) = \Omega_{S_0}(b) + \Omega_{S_0^c}(b)$ for all $b\in\mathbb{R}^p$, and that $\Omega_{S_0^c}(\beta) = 0$ and $\Omega_{S_0^c}(\hat\beta) = \Omega_{S_0^c}(\hat\beta - \beta)$. Then
\[
\Omega(\beta) - \Omega(\hat\beta) = \Omega_{S_0}(\beta) - \Omega_{S_0}(\hat\beta) - \Omega_{S_0^c}(\hat\beta) \le \Omega_{S_0}(\hat\beta - \beta) - \Omega_{S_0^c}(\hat\beta - \beta).
\tag{A.3}
\]
Suppose that $\|m - Z\rho\|_{NT} \le \|Z\Delta\|_{NT}$. Then it follows from the first inequality in Eq. A.2 and Eq. A.3 that
\[
\|Z\Delta\|_{NT}^2 \le c^{-1}\lambda\left\{|\hat\alpha-\alpha|_1/\sqrt{N} + \Omega(\hat\beta-\beta)\right\} + \lambda\left\{\Omega_{S_0}(\hat\beta-\beta) - \Omega_{S_0^c}(\hat\beta-\beta)\right\} + \|Z\Delta\|_{NT}^2.
\]
Since the quadratic terms cancel and $\lambda > 0$, this shows that
\[
(1 - c^{-1})\Omega_{S_0^c}(\hat\beta - \beta) \le (1 + c^{-1})\Omega_{S_0}(\hat\beta - \beta) + c^{-1}|\hat\alpha - \alpha|_1/\sqrt{N},
\]
or equivalently
\[
\Omega_{S_0^c}(\hat\beta - \beta) \le \frac{c+1}{c-1}\Omega_{S_0}(\hat\beta - \beta) + (c-1)^{-1}|\hat\alpha - \alpha|_1/\sqrt{N}.
\tag{A.4}
\]
Put $\Delta_N = ((\hat\alpha - \alpha)^\top/\sqrt{N}, (\hat\beta - \beta)^\top)^\top$. Then, under Assumption 3.2,
\[
\begin{aligned}
|\Delta_N|_1 &\lesssim \Omega(\hat\beta - \beta) + |\hat\alpha - \alpha|_1/\sqrt{N} \lesssim \Omega_{S_0}(\hat\beta - \beta) + |\hat\alpha - \alpha|_1/\sqrt{N}\\
&\lesssim |\hat\alpha - \alpha|_1/\sqrt{N} + \sqrt{s}\,|\hat\beta - \beta|_2 \le \sqrt{s\vee N}\,|\Delta_N|_2 \lesssim \sqrt{s\vee N}\,|\Sigma^{1/2}\Delta_N|_2\\
&= \sqrt{(s\vee N)\left\{\|Z\Delta\|_{NT}^2 + \Delta_N^\top(\Sigma - \hat\Sigma)\Delta_N\right\}} \le \sqrt{(s\vee N)\left\{\|Z\Delta\|_{NT}^2 + |\Delta_N|_1^2\,|\mathrm{vech}(\hat\Sigma - \Sigma)|_\infty\right\}}\\
&\lesssim \sqrt{(s\vee N)\left\{\lambda|\Delta_N|_1 + |\Delta_N|_1^2\,|\mathrm{vech}(\hat\Sigma - \Sigma)|_\infty\right\}}.
\end{aligned}
\]
Consider the event $\mathcal{E} = \left\{|\mathrm{vech}(\hat\Sigma - \Sigma)|_\infty < 1/(2(s\vee N))\right\}$. Under Assumption 3.1, by Theorem 4.1 and Babii, Ghysels, and Striaukas (2020a), Theorem 3.1,
\[
\begin{aligned}
\Pr(\mathcal{E}^c) &\le \Pr\left(\max_{i\in[N],\,j\in[p]}\left|\frac{1}{\sqrt{N}T}\sum_{t=1}^T\{x_{it,j} - \mathbb{E}[x_{it,j}]\}\right| \ge \frac{1}{4(s\vee N)}\right) + \Pr\left(\max_{1\le j\le k\le p}\left|\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T x_{it,j}x_{it,k} - \Sigma_{j,k}\right| \ge \frac{1}{4(s\vee N)}\right)\\
&\lesssim p(s\vee N)^{\tilde\kappa}T^{1-\tilde\kappa}\left(N^{1-\tilde\kappa/2} + pN^{1-\tilde\kappa}\right) + p(p\vee N)e^{-cNT/(s\vee N)^2}.
\end{aligned}
\]
Therefore, on the event $\mathcal{E}$,
\[
|\hat\alpha - \alpha|_1/\sqrt{N} + |\hat\beta - \beta|_1 = |\Delta_N|_1 \lesssim (s\vee N)\lambda,
\]
whence from Eq. A.2 we obtain
\[
\|Z\Delta\|_{NT}^2 \lesssim \lambda\left\{|\hat\alpha - \alpha|_1/\sqrt{N} + \Omega(\hat\beta - \beta)\right\} \lesssim \lambda|\Delta_N|_1 \lesssim (s\vee N)\lambda^2.
\]
Suppose now that $\|m - Z\rho\|_{NT} > \|Z\Delta\|_{NT}$. Then, obviously,
\[
\|Z(\hat\rho - \rho)\|_{NT} \le \|m - Z\rho\|_{NT}.
\]
Therefore, on the event $\mathcal{E}$, we always have
\[
\|Z(\hat\rho - \rho)\|_{NT}^2 \lesssim (s\vee N)\lambda^2 + 4\|m - Z\rho\|_{NT}^2,
\]
which proves the statement of the theorem.

B Proofs of Fuk-Nagaev inequalities
Proof of Theorem 4.1.
Suppose first that $p = 1$. For $a\in\mathbb{R}$, with some abuse of notation, let $[a]$ denote its integer part. For each $i = 1, \dots, N$, split the partial sums into blocks with at most $J\in\mathbb{N}$ summands:
\[
V_{i,k} = \xi_{i,(k-1)J+1} + \cdots + \xi_{i,kJ}, \quad k = 1, 2, \dots, [T/J],
\]
\[
V_{i,[T/J]+1} = \xi_{i,[T/J]J+1} + \cdots + \xi_{i,T},
\]
where we set $V_{i,[T/J]+1} = 0$ if $[T/J]J = T$. Let $\{U_{i,t} : i, t \ge 1\}$ be i.i.d. random variables uniformly distributed on $[0,1]$ and independent of $\{\xi_{i,t} : i, t \ge 1\}$. Put $M_{i,t} = \sigma(V_{i,1}, \dots, V_{i,t-2})$ for $t \ge 3$. For $t = 1, 2$, set $V^*_{i,t} = V_{i,t}$, while for $t \ge 3$, by Dedecker and Prieur (2004), Lemma 5, there exist random variables $V^*_{i,t} =_d V_{i,t}$ such that

1. $V^*_{i,t}$ is $M_{i,t} \vee \sigma(V_{i,t}) \vee \sigma(U_{i,t})$-measurable;
2. $V^*_{i,t}$ is independent of $M_{i,t}$;
3. $\|V_{i,t} - V^*_{i,t}\|_1 = \tau(M_{i,t}, V_{i,t})$.

By 1., there exists a measurable function $f_i$ such that $V^*_{i,t} = f_i(V_{i,t}, V_{i,t-2}, \dots, V_{i,1}, U_{i,t})$. Therefore, by 2., $(V^*_{i,2t})_{t\ge1}$ and $(V^*_{i,2t-1})_{t\ge1}$ are sequences of independent random variables for every $i = 1, \dots, N$. Moreover, $\{V^*_{i,2t} : i = 1, \dots, N,\ t \ge 1\}$ and $\{V^*_{i,2t-1} : i = 1, \dots, N,\ t \ge 1\}$ are sequences of independent random variables, since $\{\xi_{i,t} : t = 1, \dots, T\}$ are independent over $i = 1, \dots, N$. Decompose
\[
\left|\sum_{i=1}^N\sum_{t=1}^T \xi_{i,t}\right| \le \left|\sum_{i=1}^N\sum_{t\ge1} V^*_{i,2t}\right| + \left|\sum_{i=1}^N\sum_{t\ge1} V^*_{i,2t-1}\right| + \sum_{i=1}^N\sum_{t=3}^{[T/J]+1}\left|V_{i,t} - V^*_{i,t}\right| \triangleq I + II + III.
\]
By Fuk and Nagaev (1971), Corollary 4, there exist constants $c_1, c_2 > 0$ such that
\[
\Pr(I > u/3) \le c_1 u^{-q} N\sum_{t\ge1}\mathbb{E}|V^*_{i,2t}|^q + 2\exp\left(-\frac{c_2u^2}{N\sum_{t\ge1}\mathrm{Var}(V^*_{i,2t})}\right) \le c_1 u^{-q} N\sum_{t\ge1}\mathbb{E}|V_{i,2t}|^q + 2\exp\left(-\frac{c_2u^2}{NT}\right),
\]
where we use $V^*_{i,t} =_d V_{i,t}$ and
\[
\sum_{t\ge1}\mathrm{Var}(V_{i,2t}) \le \sum_{t\ge1}\mathrm{Var}(V_{i,t}) = O(T),
\]
which follows by Babii, Ghysels, and Striaukas (2020a), Lemma A.1.2. Similarly,
\[
\Pr(II > u/3) \le c_3 u^{-q} N\sum_{t\ge1}\mathbb{E}|V_{i,2t-1}|^q + 2\exp\left(-\frac{c_4u^2}{NT}\right)
\]
for some constants $c_3, c_4 > 0$. Lastly, since $M_{i,t}$ and $V_{i,t}$ are separated by $J+1$ lags of $\xi_{i,t}$, we have $\tau(M_{i,t}, V_{i,t}) \le J\tau_{J+1}$. By Markov's inequality and property 3., this gives
\[
\Pr(III > u/3) \le \frac{3N}{u}\sum_{t=3}^{[T/J]+1}\|V_{i,t} - V^*_{i,t}\|_1 \le \frac{3NT}{u}\tau_{J+1}.
\]
Combining all the estimates together,
\[
\begin{aligned}
\Pr\left(\left|\sum_{i=1}^N\sum_{t=1}^T\xi_{i,t}\right| > u\right) &\le \Pr(I > u/3) + \Pr(II > u/3) + \Pr(III > u/3)\\
&\le c_5 u^{-q} N\sum_{t\ge1}\|V_{i,t}\|_q^q + 4e^{-c_6u^2/NT} + \frac{3NT}{u}\tau_{J+1}\\
&\le c_5 u^{-q} J^{q-1} N T\|\xi_{i,t}\|_q^q + \frac{3cNT}{u}(J+1)^{-a} + 4e^{-c_6u^2/NT}
\end{aligned}
\]
for some constants $c_5, c_6 > 0$. To balance the first two terms, we choose the block length $J \sim u^{(q-1)/(q+a-1)}$, in which case we get
\[
\Pr\left(\left|\sum_{i=1}^N\sum_{t=1}^T\xi_{i,t}\right| > u\right) \le c_7 N T u^{-\kappa} + 4e^{-c_8u^2/NT}
\]
for some $c_7, c_8 > 0$. For $p > 1$, the result follows by the union bound.
Proof of Theorem 4.2.
Put
\[
M_{q,N,T} \triangleq \max_{j\in[p]}\max_{i\in[N]}\mathbb{E}\left|\sum_{t=1}^T\xi_{i,t,j}\right|^q \quad\text{and}\quad B_{N,T} \triangleq \max_{j\in[p]}\sum_{i=1}^N\mathrm{Var}\left(\sum_{t=1}^T\xi_{i,t,j}\right).
\]
By Jensen's inequality, under the stationarity and i.i.d. hypotheses,
\[
M_{q,N,T} \le \max_{j\in[p]} T^q\,\mathbb{E}|\xi_{i,t,j}|^q \lesssim T^q,
\]
where the last inequality follows under assumption (i). Similarly,
\[
B_{N,T} \le N\max_{j\in[p]}\sum_{t=1}^T\sum_{k=1}^T\left|\mathrm{Cov}(\xi_{1,t,j}, \xi_{1,k,j})\right| \lesssim NT,
\]
where the last inequality follows from the computations in Theorem 4.1 under assumptions (i)-(ii). Using these estimates, by the union bound and Fuk and Nagaev (1971), Corollary 4, for every $u > 0$,
\[
\Pr\left(\left|\sum_{t=1}^T\sum_{i=1}^N\xi_{i,t}\right|_\infty > u\right) \le c_1 M_{q,N,T}\,pN u^{-q} + 2p\exp\left(-\frac{c_2u^2}{B_{N,T}}\right) \le c_3 pN T^q u^{-q} + 2pe^{-c_4u^2/NT}
\]
for some constants $c_j > 0$, $j\in[4]$. Therefore, there exists $C > 0$ such that
\[
\Pr\left(\left|\sum_{t=1}^T\sum_{i=1}^N\xi_{i,t}\right|_\infty \le C\left\{(pNT^q/\delta)^{1/q} \vee \sqrt{NT\log(4p/\delta)}\right\}\right) \ge 1 - \delta.
\]

C Additional empirical results
Figure A.1: Sparsity patterns and forecast combination weights. (a) Pooled sg-LASSO, γ = 0.4, cross-validation. (b) Fixed effects sg-LASSO, γ = 0.4, BIC. (c) Average forecast combination weights. [Each panel plots the out-of-sample quarters on the horizontal axis against the variables on the vertical axis: lag-1 to lag-4, ESS, AES, AEV, CSS, NIP, firm-return, firm-vola, VXO, Oil price, 10Y minus 3-month, TEDRATE, 3-month T-Bill, AAA minus 10Y, BAA minus 10Y, BAA minus AAA, SNP500, Earnings, Earnings forecasts, Earnings losses, Recession, Revenue growth, Revised estimate, Indust. Production, CPI Inflation.]
             RW      An.-mean   An.-median
MSE         2.331     2.339       2.088

                      sg-LASSO   elnet-U   elnet
Panel A. Cross-validation
Individual              1.545     1.610    1.609
Pooled                    …
Table A.1:
Prediction results – The table reports MSEs of out-of-sample predictions, averaged over firms. The nowcasting horizon is the current month, i.e., we predict the P/E ratio using information up to the end of the current fiscal quarter. Each Panel A-D block represents a different way of calculating the tuning parameter λ. Bold entries are the best results in a block. We report elastic net MSEs averaged over the LASSO/ridge weights [0, …, 1].

D Data description
D.1 Firm-level data
The full list of firm-level data is provided in Table A.3. We also add two daily firm-specific stock market predictor variables: stock returns and a realized variance measure, defined as the rolling sample variance over the previous 60 days (i.e., 60-day historical volatility).
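The 60-day rolling variance measure can be sketched as follows; conventions such as the degrees-of-freedom correction and window alignment are our assumptions, not details from the paper:

```python
import numpy as np

# Rolling sample variance over the previous `window` trading days: entry t
# holds the variance of returns over days t-window+1, ..., t, and the first
# window-1 entries are undefined (NaN).
def rolling_variance(returns, window=60):
    r = np.asarray(returns, dtype=float)
    out = np.full(len(r), np.nan)
    for t in range(window, len(r) + 1):
        out[t - 1] = r[t - window:t].var(ddof=1)
    return out

rv = rolling_variance(np.arange(100.0))
print(rv[59])  # variance of 60 consecutive integers: 60*61/12 = 305.0
```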
D.1.1 Firm sample selection
We select a sample of firms based on data availability. First, we remove all firms from I/B/E/S which have missing values in their earnings time series. Next, we retain firms that we are able to match with the CRSP dataset. Finally, we keep firms that we can match with the RavenPack dataset.
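The three selection steps above amount to intersecting the set of firms with complete earnings histories with the CRSP and RavenPack identifier sets; the toy identifiers and data containers below are hypothetical:

```python
# Sample selection sketch: keep firms whose I/B/E/S earnings series are
# complete, then intersect with the CRSP- and RavenPack-matched firms.
def select_firms(ibes_earnings, crsp_ids, ravenpack_ids):
    complete = {f for f, series in ibes_earnings.items()
                if all(v is not None for v in series)}
    return complete & crsp_ids & ravenpack_ids

ibes = {"A": [1.0, 1.1], "B": [1.0, None], "C": [2.0, 2.1]}
print(sorted(select_firms(ibes, {"A", "B"}, {"A", "C"})))  # ['A']
```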
             RW      An.-mean   An.-median   F.Comb
MSE         2.794     2.836       2.539       2.405

sg-LASSO   γ = 0     0.2     0.4     0.6     0.8      1
Panel A. Cross-validation
Individual  1.808   1.817   1.836   1.864   1.889   1.884
Pooled      1.692   1.689     …

Table A.2:
Prediction results – The table reports MSEs of out-of-sample predictions, averaged over firms, for the same models as in Table 1 - discarding the first 8 quarters used to compute the forecast combination weights - together with the prediction errors of the forecast combination approach of Ball and Ghysels (2018), denoted as F.Comb. Hence, the out-of-sample quarters start in 2009 Q1. The nowcasting horizon is the current month, i.e., we predict the P/E ratio using information up to the end of the current fiscal quarter. Each Panel A-D block represents a different way of calculating the tuning parameter λ. Bold entries are the best results in a block.

D.1.2 Firm-specific text data
We create a link table of RavenPack ID and PERMNO identifiers, which enables us to merge I/B/E/S and CRSP data with firm-specific textual-analysis data from RavenPack. The latter is a rich dataset that contains intra-daily news information about firms. There are several editions of the dataset; in our analysis, we use the Dow Jones (DJ) and Press Release (PR) editions. The former contains relevant information from Dow Jones Newswires, regional editions of the Wall Street Journal, Barron's, and MarketWatch. The PR edition contains news data, obtained from various press releases and regulatory disclosures, on a daily basis from a variety of newswires and press release distribution networks, including exclusive content from PRNewswire, Canadian News Wire, Regulatory News Service, and others. The DJ edition sample starts on January 1, 2000, and the PR edition data starts on January 17, 2004.

We construct our news-based firm-level covariates by retaining only highly relevant news stories. More precisely, for each firm and each day, we keep only news with a Relevance Score (REL) greater than or equal to 75, as suggested by the RavenPack News Analytics guide and used by practitioners; see, for example, Kolanovic and Krishnamachari (2017). REL is a score between 0 and 100 which indicates how strongly a news story is linked with a particular firm. A score of zero means that the entity is only vaguely mentioned in the news story, while 100 means the opposite. A score of 75 is regarded as marking a significantly relevant news story. After applying the REL filter, we apply a news-novelty filter based on the Event Novelty Score (ENS); we keep data entries that have a score of 100. Like REL, ENS is a score between 0 and 100. It indicates the novelty of a news story within a 24-hour time window. A score of 100 means that a news story was not already covered by earlier announced news; subsequently published stories on a related event are discounted and therefore receive scores below 100. With this filter, we thus consider only novel news stories. We focus on five sentiment indices that are available in both the DJ and PR editions. They are:
Event Sentiment Score (ESS), for a given firm, represents the strength of the news, measured using surveys of financial expert ratings for firm-specific events. The score ranges between 0 and 100; values above (below) 50 classify the news as positive (negative), with 50 being neutral.
Aggregate Event Sentiment (AES) represents the ratio of positive events reported on a firm to the total count of events, measured over a rolling 91-day window in a particular news edition (DJ or PR). An event with ESS > 50 is counted as a positive entry, while one with ESS < 50 is counted as negative. Neutral news (ESS = 50) and news that does not receive an ESS score do not enter the AES computation. As with ESS, the score values are between 0 and 100.
Aggregate Event Volume (AEV) represents the count of events for a firm over the last 91 days within a given edition. As in the AES case, only news that receives a non-neutral ESS score is counted; the measure therefore accumulates both positive and negative news.
Composite Sentiment Score (CSS) represents the news sentiment of a given news story, obtained by combining various sentiment analysis techniques. The direction of the score is determined by looking at emotionally charged words and phrases and by matching stories typically rated by experts as having a short-term positive or negative share price impact. The strength of the score is determined by intra-day price reactions modeled empirically using tick data from approximately 100 large-cap stocks. As for ESS and AES, the score takes values between 0 and 100, with 50 being neutral.
News Impact Projections (NIP) represents the degree of impact a news flash has on the market over the following two-hour period. The algorithm produces scores to predict the relative volatility (defined as the stock's volatility scaled by the average volatility of the large-cap firms used in the test set) of each stock price measured within two hours following the news. Tick data is used to train the algorithm and produce the scores, which take values between 0 and 100, with 50 representing zero-impact news.

For each firm and each day with firm-specific news, we compute the average value of each sentiment score. In this way, we aggregate across editions and groups, where a group is defined as a collection of related news. We then map the indices that take values between 0 and 100 onto [−1, 1]. Let x_i ∈ {ESS, AES, CSS, NIP} be the average score value for a particular day and firm. We map x_i ↦ x̄_i ∈ [−1, 1] by computing x̄_i = (x_i − 50)/50.

id | Series | Frequency | Source | T-code
Panel A.
- | Price/Earnings ratio | quarterly | CRSP & I/B/E/S | 1
- | Price/Earnings ratio consensus forecasts | quarterly | CRSP & I/B/E/S | 1
Panel B.
1 | Stock returns | daily | CRSP | 1
2 | Realized variance measure | daily | CRSP/computations | 1
Panel C.
1 | Event Sentiment Score (ESS) | daily | RavenPack | 1
2 | Aggregate Event Sentiment (AES) | daily | RavenPack | 1
3 | Aggregate Event Volume (AEV) | daily | RavenPack | 1
4 | Composite Sentiment Score (CSS) | daily | RavenPack | 1
5 | News Impact Projections (NIP) | daily | RavenPack | 1
Table A.3:
Firm-level data description table – The id column gives mnemonics according to the data source, which is given in the second column, Source. The Frequency column states the sampling frequency of the variable. The T-code column denotes the data transformation applied to a time series: (1) not transformed, (2) ∆x_t, (3) ∆²x_t, (4) log(x_t), (5) ∆log(x_t), (6) ∆²log(x_t). Panel A. describes earnings data, panel B. quarterly firm-level accounting data, panel C. daily firm-level stock market data, and panel D. daily firm-level sentiment data series.

id | Series | Frequency | Source | T-code
Panel A.
1 | Industrial Production Index | monthly | FRED-MD | 5
2 | CPI Inflation | monthly | FRED-MD | 6
Panel B.
1 | Crude Oil Prices | daily | FRED | 6
2 | S&P 500 | daily | CRSP | 5
3 | VXO Volatility Index | daily | FRED | 1
4 | Moodys Aaa - 10-Year Treasury | daily | FRED | 1
5 | Moodys Baa - 10-Year Treasury | daily | FRED | 1
6 | Moodys Baa - Aaa Corporate Bond | daily | FRED | 1
7 | 10-Year Treasury - 3-Month Treasury | daily | FRED | 1
8 | 3-Month Treasury - Effective Federal funds rate | daily | FRED | 1
9 | TED rate | daily | FRED | 1
Panel C.
1 | Earnings | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1
2 | Earnings forecasts | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1
3 | Earnings losses | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1
4 | Recession | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1
5 | Revenue growth | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1
6 | Revised estimate | monthly | Bybee, Kelly, Manela, and Xiu (2019) | 1

Table A.4:
Other predictor variables description table – The id column gives mnemonics according to the data source, which is given in the second column, Source. The Frequency column states the sampling frequency of the variable. The T-code column denotes the data transformation applied to a time series: (1) not transformed, (2) ∆x_t, (3) ∆²x_t, (4) log(x_t), (5) ∆log(x_t), (6) ∆²log(x_t). Panel A. describes real-time monthly macro series, panel B. daily financial markets data, and panel C. monthly news attention series.

Ticker | Firm name | PERMNO | RavenPack ID
1 | MMM | 3M | 22592 | 03B8CF
2 | ABT | Abbott labs | 20482 | 520632
3 | AUD | Automatic data processing | 44644 | 66ECFD
4 | ADTN | Adtran | 80791 | 9E98F2
5 | AEIS | Advanced energy industries | 82547 | 1D943E
6 | AMG | Affiliated managers group | 85593 | 30E01D
7 | AKST | A K steel holding | 80303 | 41588B
8 | ATI | Allegheny technologies | 43123 | D1173F
9 | AB | AllianceBernstein holding l.p. | 75278 | CB138D
10 | ALL | Allstate corp. | 79323 | E1C16B
11 | AMZN | Amazon.com | 84788 | 0157B1
12 | AMD | Advanced micro devices | 61241 | 69345C
13 | DOX | Amdocs ltd. | 86144 | 45D153
14 | AMKR | Amkor technology | 86047 | 5C8D61
15 | APH | Amphenol corp. | 84769 | BB07E4
16 | AAPL | Apple | 14593 | D8442A
17 | ADM | Archer daniels midland | 10516 | 2B7A40
18 | ARNC | Arconic | 24643 | EC821B
19 | ATTA | AT&T | 66093 | 251988
20 | AVY | Avery dennison corp. | 44601 | 662682
21 | BHI | Baker hughes | 75034 | 940C3D
22 | BAC | Bank of america corp. | 59408 | 990AD0
23 | BAX | Baxter international inc. | 27887 | 1FAF22
24 | BBT | BB&T corp. | 71563 | 1A3E1B
25 | BDX | Becton dickinson & co. | 39642 | 873DB9
26 | BBBY | Bed bath & beyond inc. | 77659 | 9B71A7
27 | BHE | Benchmark electronics inc. | 76224 | 6CF43C
28 | BA | Boeing co. | 19561 | 55438C
29 | BK | Bank of new york mellon corp. | 49656 | EF5BED
30 | BWA | BorgWarner inc. | 79545 | 1791E7
31 | BP | BP plc | 29890 | 2D469F
32 | EAT | Brinker international inc. | 23297 | 732449
33 | BMY | Bristol-Myers squibb co. | 19393 | 94637C
34 | BRKS | Brooks automation inc. | 81241 | FC01C0
35 | CA | CA technologies inc. | 25778 | 76DE40
36 | COG | Cabot oil & gas corp. | 76082 | 388E00
37 | CDN | Cadence design systems inc. | 11403 | CC6FF5
38 | COF | Capital one financial corp. | 81055 | 055018
39 | CRR | Carbo ceramics inc. | 83366 | 8B66CE
40 | CSL | Carlisle cos. | 27334 | 9548BB
41 | CCL | Carnival corporation & plc | 75154 | 067779
42 | CERN | Cerner corp. | 10909 | 9743E5
43 | CHRW | C.H. robinson worldwide inc.
| 85459 | C659EB
44 | SCHW | Charles schwab corp. | 75186 | D33D8C
45 | CHKP | Check point software technologies ltd. | 83639 | 531EF1
46 | CHV | Chevron corp. | 14541 | D54E62
47 | CI | CIGNA corp. | 64186 | 86A1B9
48 | CTAS | Cintas corp. | 23660 | BFAEB4
49 | CLX | Clorox co. | 46578 | 719477
50 | KO | Coca-Cola co. | 11308 | EEA6B3
51 | CGNX | Cognex corp. | 75654 | 709AED
52 | COLM | Columbia sportswear co. | 85863 | 5D0337
53 | CMA | Comerica inc. | 25081 | 8CF6DD
54 | CRK | Comstock resources inc. | 11644 | 4D72C8
55 | CAG | ConAgra foods inc. | 56274 | FA40E2
56 | STZ | Constellation brands inc. | 69796 | 1D1B07
57 | CVG | Convergys corp. | 86305 | 914819
116 | MTB | M&T bank corp. | 35554 | D1AE3B
117 | MANH | Manhattan associates inc. | 85992 | 031025
118 | MAN | ManpowerGroup inc. | 75285 | C0200F
119 | MAR | Marriott international inc. | 85913 | 385DD4
120 | MMC | Marsh & mcLennan cos. | 45751 | 9B5968
121 | MCD | McDonald's corp. | 43449 | 954E30
122 | MCK | McKesson corp. | 81061 | 4A5C8D
123 | MDU | MDU resources group inc. | 23835 | 135B09
124 | MRK | Merck & co. inc. | 22752 | 1EBF8D
125 | MTOR | Meritor inc | 85349 | 00326E
126 | MTG | MGIC investment corp. | 76804 | E28F22
127 | MGM | MGM resorts international | 11891 | 8E8E6E
128 | MCHP | Microchip technology inc. | 78987 | CDFCC9
129 | MU | Micron technology inc. | 53613 | 49BBBC
130 | MSFT | Microsoft corp. | 10107 | 228D42
131 | MOT | Motorola solutions inc. | 22779 | E49AA3
132 | MSM | MSC industrial direct co. | 82777 | 74E288
133 | MUR | Murphy oil corp. | 28345 | 949625
134 | NBR | Nabors industries ltd. | 29102 | E4E3B7
135 | NOI | National oilwell varco inc. | 84032 | 5D02B7
136 | NYT | New york times co. | 47466 | 875F41
137 | NFX | Newfield exploration co. | 79915 | 9C1A1F
138 | NEM | Newmont mining corp. | 21207 | 911AB8
139 | NKE | NIKE inc. | 57665 | D64C6D
140 | NBL | Noble energy inc. | 61815 | 704DAE
141 | NOK | Nokia corp. | 87128 | C12ED9
142 | NOC | Northrop grumman corp. | 24766 | FC1B7B
143 | NTRS | Northern trust corp. | 58246 | 3CCC90
144 | NUE | NuCor corp. | 34817 | 986AF6
145 | ODEP | Office depot inc. | 75573 | B66928
146 | ONB | Old national bancorp | 12068 | D8760C
147 | OMC | Omnicom group inc. | 30681 | C8257F
148 | OTEX | Open text corp. | 82833 | 34E891
149 | ORCL | Oracle corp. | 10104 | D6489C
150 | ORBK | Orbotech ltd. | 78527 | 290820
151 | PCAR | Paccar inc. | 60506 | ACF77B
152 | PRXL | Parexel international corp. | 82607 | EF8072
153 | PH | Parker hannifin corp. | 41355 | 6B5379
154 | PTEN | Patterson-uti energy inc. | 79857 | 57356F
155 | PBCT | People's united financial inc. | 12073 | 449A26
156 | PEP | PepsiCo inc. | 13856 | 013528
157 | PFE | Pfizer inc. | 21936 | 267718
158 | PIR | Pier 1 imports inc. | 51692 | 170A6F
159 | PXD | Pioneer natural resources co. | 75241 | 2920D5
160 | PNCF | PNC financial services group inc. | 60442 | 61B81B
161 | POT | Potash corporation of saskatchewan inc. | 75844 | FFBF74
162 | PPG | PPG industries inc. | 22509 | 39FB23
163 | PX | Praxair inc. | 77768 | 285175
164 | PG | Procter & gamble co. | 18163 | 2E61CC
165 | PTC | PTC inc. | 75912 | D437C3
166 | PHM | PulteGroup inc. | 54148 | 7D5FD6
167 | QCOM | Qualcomm inc. | 77178 | CFF15D
168 | DGX | Quest diagnostics inc. | 84373 | 5F9CE3
169 | RL | Ralph lauren corp. | 85072 | D69D42
170 | RTN | Raytheon co. | 24942 | 1981BF
171 | RF | Regions financial corp. | 35044 | 73C521
172 | RCII | Rent-a-center inc. | 81222 | C4FBDC
173 | RMD | ResMed inc. | 81736 | 434F38
174 | RHI | Robert half international inc. | 52230 | A4D173
175 | RDC | Rowan cos. inc. | 45495 | 3FFA00
176 | RCL | Royal caribbean cruises ltd. | 79145 | 751A74
177 | RPM | RPM international inc. | 65307 | F5D059
178 | RRD | R.R. donnelley & sons co. | 38682 | 0BE0AE
179 | SLB | Schlumberger ltd. n.v. | 14277 | 164D72
180 | SCTT | Scotts miracle-gro co. | 77300 | F3FCC3
181 | SM | St. mary land & exploration co. | 78170 | 6A3C35
182 | SONC | Sonic corp. | 76568 | 80D368
183 | SO | Southern co. | 18411 | 147C38
184 | LUV | Southwest airlines co. | 58683 | E866D2
185 | SWK | Stanley black & decker inc. | 43350 | CE1002
186 | STT | State street corp. | 72726 | 5BC2F4
187 | TGNA | TEGNA inc. | 47941 | D6EAA3
188 | TXN | Texas instruments inc. | 15579 | 39BFF6
189 | TMK | Torchmark corp. | 62308 | E90C84
190 | TRV | The travelers companies inc. | 59459 | E206B0
191 | TBI | TrueBlue inc. | 83671 | 9D5D35
192 | TUP | Tupperware brands corp. | 83462 | 2B0AF4
193 | TYC | Tyco international plc | 45356 | 99333F
194 | TSN | Tyson foods inc. | 77730 | AD1ACF
195 | X | United states Steel corp. | 76644 | 4E2D94
196 | UNH | UnitedHealth group inc. | 92655 | 205AD5
197 | VIAV | Viavi solutions inc. | 79879 | E592F0
198 | GWW | W.W. grainger inc. | 52695 | 6EB9DA
199 | WDR | Waddell & reed financial inc. | 85931 | 2F24A5
200 | WBA | Walgreens boots alliance inc. | 19502 | FACF19
201 | DIS | Walt disney co. | 26403 | A18D3C
202 | WAT | Waters corp. | 82651 | 1F9D90
203 | WBS | Webster financial corp. | 10932 | B5766D
204 | WFC | Wells fargo & co. | 38703 | E8846E
205 | WERN | Werner enterprises inc. | 10397 | D78BF1
206 | WABC | Westamerica bancorp | 82107 | 622037
207 | WDC | Western digital corp. | 66384 | CE96E7
208 | WHR | Whirlpool corp. | 25419 | BDD12C
209 | WFM | Whole foods market inc. | 77281 | 319E7D
210 | XLNX | Xilinx inc. | 76201 | 373E85
Table A.5: