A nowcasting approach to generate timely estimates of Mexican economic activity: An application to the period of COVID-19

Abstract

In this paper, we present a new approach based on dynamic factor models (DFMs) to perform nowcasts for the percentage annual variation of the Mexican Global Economic Activity Indicator (IGAE in Spanish). The procedure consists of the following steps: i) build a timely and correlated database by using economic and financial time series and real-time variables such as social mobility and significant topics extracted by Google Trends; ii) estimate the common factors using the two-step methodology of Doz et al. (2011); iii) use the common factors in univariate time-series models for test data; and iv) according to the best results obtained in the previous step, combine the statistically equal better nowcasts (Diebold-Mariano test) to generate the current nowcasts. We obtain timely and accurate nowcasts for the IGAE, including those for the current phase of drastic drops in the economy related to COVID-19 sanitary measures. Additionally, the approach allows us to disentangle the key variables in the DFM by estimating the confidence interval for both the factor loadings and the factor estimates. This approach can be used in official statistics to obtain preliminary estimates for IGAE up to 50 days before the official results.


Francisco Corona*, Graciela González-Farías and Jesús López-Pérez
Instituto Nacional de Estadística y Geografía
Centro de Investigación en Matemáticas A.C.
This version: November 6, 2020


Keywords: Dynamic Factor Models, Global Mexican Economic Activity Indicator, Google Trends, LASSO regression, Nowcasts.

* Corresponding author: [email protected]. If you wish to quote this work in progress, please request permission from the corresponding author. The views expressed here are those of the authors and do not reflect those of INEGI.

Introduction

Currently, the large amount of economic and financial time series collected over several years by official statistical agencies allows researchers to implement statistical and econometric methodologies to generate accurate models to understand any macroeconomic phenomenon. One of the most important events to anticipate is the movement of the gross domestic product (GDP) because doing so allows policy to be carried out with more certainty, according to the expected scenario. For instance, if an economic contraction is foreseeable, businesses can adjust their investment or expansion plans, governments can apply countercyclical policy, and consumers can adjust their spending patterns.

As new economic and financial information is released, the forecasts for a given period are constantly updated; thus, different GDP estimations arise. In this sense, a new, unexpected event can drastically affect predictions in the short term; consequently, it might be necessary to use not only economic and financial information but also nontraditional and high-frequency indicators, such as news, search topics extracted from the Internet, social networks, etc. The seminal work of Varian (2014) is an obligatory reference for the inclusion of high-frequency information by economists, and Buono et al. (2018) is also an important reference that characterizes the types of nontraditional data and reviews the econometric methods usually employed to extract information from these data.

Thus, the term “nowcast”, or real-time estimation, is relevant because we can use a rich variety of information to model, from a multivariate point of view, macroeconomic and financial events, plus specific incidents that can affect the dynamics of GDP in the short run. Econometrically and statistically, these facts are related to the literature on large dynamic factor models (DFMs) because a large number of time series is useful to estimate underlying common factors. First introduced in economics by Geweke (1977) and Sargent and Sims (1977), DFMs have recently become very attractive in practice given the current need to deal with large datasets of time series using high-dimensional DFMs; see, for example, Breitung and Eickmeier (2006), Bai and Ng (2008), Stock and Watson (2011), Breitung and Choi (2013) and Bai and Wang (2016) for reviews of the existing literature.

An open question in the literature on large DFMs is whether a large number of series is adequate for a particular forecasting objective. In that sense, preselecting variables has proven to reduce the prediction error with respect to using the complete dataset (Boivin and Ng, 2006); that is, a large set of variables does not always yield factor estimates closer to the true factors than a smaller set, especially in finite samples (Poncela and Ruiz, 2016). Even when the number of time series is moderate, approximately 15, we can accurately estimate the simulated common factors, as shown by Corona et al. (2020) in a Monte Carlo analysis. The latter also corroborates that the Doz et al. (2011) two-step (2SM) factor extraction method performs better than other approaches available in the literature, above all when the data are nonstationary.

DFM methodology has already been used to nowcast or predict the Mexican economy. Corona et al. (2017a), one of the first works in this line, estimated common trends in a large and nonstationary DFM to predict the Global Economic Activity Indicator (IGAE in Spanish) two steps ahead and concluded that the prediction error was reduced with respect to some benchmark univariate and multivariate time-series models.
Caruso (2018) focuses on international indicators, mainly for the US economy, to show that its nowcasts of quarterly GDP outperform the predictions obtained by professional forecasters. Recently, Gálvez-Soriano (2020) concluded that bridge equations perform better than DFMs and static principal components (PCs) when nowcasting quarterly GDP. An important work related to timely GDP estimation is Guerrero et al. (2013), who, based on vector autoregression (VAR) models, generate rapid GDP estimates (for the total and its three major economic activities) with a delay of up to 15 days from the end of the reference quarter, while the official GDP is released around 52 days after the quarter closes. This work is the main reference for INEGI's “Estimación Oportuna del PIB Trimestral.”

Although prior studies are empirically relevant for the case of Mexico, our analysis goes further by including nontraditional information to capture more drastic frictions that occur in the very short run, one or two months. We identify that previous works focus on traditional information, which limits their capacity to predict the recent historical declines attributed to COVID-19 and the associated economic closures since March 2020. Our approach combines the structural explanation provided by the already relevant macroeconomic and financial time series with the timeliness of other high-frequency variables commonly used in big data analysis.

In this tradition, this work estimates a flexible and trained DFM and verifies the assumptions that guarantee the consistency of the component estimation from a statistical point of view. That is, we use previous knowledge and attempt to fill in the identified gaps by focusing on the Mexican case in the following ways: i) build a timely and correlated database by using traditional economic and financial time series and real-time nontraditional information, determining the relevant variables among the latter with least absolute shrinkage and selection operator (LASSO) regression, a method of variable selection; ii) estimate the common factors using the two-step methodology of Doz et al. (2011); iii) train univariate time-series models with the DFM's common factors to select the best nowcasts; iv) determine the confidence intervals for both the factor loadings and the factors themselves to analyze the importance of each variable and the uncertainty attributed to the estimation; and v) combine the best nowcasts that are statistically equal in accuracy to generate the current estimates.

In practice, we consider the benefits of this paper to be timeliness and openness.
First, given the timely availability of the information that our approach uses, we can generate nowcasts of the IGAE up to 50 days before the official data release; thus, our approach becomes an alternative for obtaining preliminary estimates of the IGAE, which are very important in official statistics. Second, this paper illustrates to practitioners, step by step, the empirical strategy used to generate IGAE nowcasts, so any user can replicate the results for other time series. Third, and very importantly, the nowcasting approach allows us to know which variables are the most relevant in the nowcasts; consequently, we emphasize the structural explanation of our results.

The remainder of this paper is structured as follows. The next section, 2, summarizes the evolution of the Mexican economy in the era of COVID-19. Section 3 presents the methodology.

The first six months of the COVID-19 pandemic (until September 2020) have had severe impacts on the Mexican economy. The first case of coronavirus in Mexico was documented on February 27, 2020. Despite government efforts to cope with the effects of the obligatory halt of economic activity, GDP in the second quarter plummeted with a historic 18.7% yearly contraction. Moreover, the pandemic accelerated economic stagnation that had begun to show signs of amelioration, following three quarters of negative growth of 0.5, 0.8 and 2.1% since the third quarter of 2019. However, starting in 2020, the actual values were not foreseen by national and international institutions such as private and central banks. For example, the November 2019 Organisation for Economic Co-operation and Development Economic Outlook estimated the real GDP variation for 2020 at 1.29%, while the June 2020 report updated it to -8.6%, a difference of 9.8 percentage points in absolute terms. Moreover, even though the Mexican Central Bank expected barely zero economic growth for 2020, placing its November 2019 outlook between -0.2% and 0.2%, it did not anticipate a contraction such as the one seen so far this year.

Between January 2019 and February 2020, before the COVID-19 outbreak started in Mexico, the annual growth of the IGAE already showed signs of slowing and fluctuated between -1.75 and 0.76%, and since May 2019, the economy exhibited nine consecutive months of negative growth. Broken down by sector and using the IGAE, the economy suffered devastating consequences in the secondary and tertiary sectors. Overall, the pandemic brought about -19.7, -21.6 and -14.5% contractions in total economic activity for April, May and June of 2020, respectively.

The industrial sector registered the deepest contractions, reducing its activity in April and May by -30.1 and -29.6%, respectively, in annual terms, mainly driven by the closure of manufacturing and construction operations, which were considered nonessential businesses. A slight recovery followed in June, -17.5%, when an important number of activities, including automobile manufacturing, resumed but remained at low activity levels. The services sector also suffered from lockdown measures, falling by -15.9, -19 and -13.6% in the three months of the second quarter, respectively, especially in transportation, retail, lodging and food preparation, mainly due to the decrease in tourist activity, although restaurants and airports were not closed. The primary sector showed signs of resilience and even grew in April and May 2020, by 1.4 and 2.7%, and only shrank in June by -1.5% on an annual basis.

The great confinement in Mexico, which officially lasted from March 23 to May 31 (named “Jornada Nacional de Sana Distancia”), had severe consequences for the components of aggregate demand: consumption, investment and foreign trade.
Consumption had been on a deteriorating path since September 2019, and in May 2020, the last month for which data are available, it exhibited a -23.5% plunge compared to the same period of the previous year.

Footnote: The IGAE is the monthly proxy variable for Mexican GDP, which covers approximately 95% of the total economy. Its publication is available two months after the reference month.

... Covid Economics from the Center for Economic and Policy Research and numerous working papers from the National Bureau of Economic Research. For the case of the Mexican economy, the works of Campos-Vazquez et al. (2020), who analyze online job advertisements in times of COVID-19, and Lustig et al. (2020), who conduct simulations to project poverty figures across different population sectors by using survey microdata, stand out. Along the same lines, the journal EconomíaUNAM devoted the whole of issue 51 (volume 17) to studying the impacts of the pandemic in Mexico, covering a wide range of issues related mainly to health economics (Vanegas, 2020; Kershenobich, 2020), labor economics (Samaniego, 2020), inequality (Alberro, 2020), poverty (Fernández, 2020) and public policy (Sánchez, 2020; Moreno-Brid, 2020). None of these relates to short-term forecasting of economic activity.

The closest paper to ours is Meza (2020), who projects the economic impact of COVID-19 for twelve variables, including the IGAE, based on a Susceptible-Infectious-Recovered epidemic model and a novel method to handle a sequence of extreme observations when estimating a VAR model (Lenza and Primiceri, 2020). To make the forecasts, Meza (2020) first estimates the shocks that have hit the economy since March 2020 and then produces four forecasts: one that does not consider a path for the pandemic and three that consider alternative pandemic scenarios. In contrast to our work, Meza's forecast horizon is the medium term, June 2020 to February 2023, whereas ours is the short term, one or two months ahead.

Methodology

This section describes how we employ DFMs to generate the nowcasts of the IGAE. First, we describe how LASSO regression is used as a variable selection method to select among various Google Trends topics. Then, we report how the stationary DFM shrinks the complete dataset with the 2SM strategy to obtain the estimated factor loadings and common factors, and how the Onatski (2010) procedure detects the number of common factors. Finally, we describe the nowcasting approach.

LASSO regression was introduced by Tibshirani (1996) as a new method of estimation in linear models by minimizing the residual sum of squares (RSS) subject to the sum of the absolute values of the coefficients being less than a constant. In this sense, LASSO regression is related to ridge regression, but the former focuses on determining the tuning parameter, $\lambda$, that controls the regularization effect; consequently, we can have better predictions than ordinary least squares (OLS) in a variety of scenarios, depending on its choice.

Let $W_t = (w_{1t}, \dots, w_{Kt})'$ be a $K \times 1$ vector of regressors. Consider the constrained problem

$$RSS = (y - W\beta)'(y - W\beta) \quad \text{s.t.} \quad f(\beta) \le c, \tag{1}$$

where $y = (y_1, \dots, y_T)'$ is a $T \times 1$ vector, $\beta = (\beta_1, \dots, \beta_K)'$ is a $K \times 1$ vector, $W = (W_1, \dots, W_T)'$ is a $T \times K$ matrix and $c \ge 0$ is a constant. When $f(\beta) = \sum_{j=1}^{K} \beta_j^2$, the ridge solution is $\hat{\beta}^{Ridge}_{\lambda} = (W'W + \lambda I_K)^{-1} W'y$. In practice, this solution never sets coefficients to exactly zero; therefore, ridge regression cannot perform as a variable selection method in linear models, although its prediction ability can be better than that of OLS.

Tibshirani (1996) considers the penalty function $f(\beta) = \sum_{j=1}^{K} |\beta_j| \le c$; in this case, the solution of (1) has no closed form, and it is obtained by convex optimization techniques. The LASSO solution has the following implications: i) when $\lambda \to 0$, we obtain solutions similar to OLS, and ii) when $\lambda \to \infty$, $\hat{\beta}^{LASSO}_{\lambda} \to 0$. Therefore, LASSO regression can perform as a variable selection method in linear models. Consequently, if $\lambda$ is large, more coefficients are set to zero, and the variables that remain are those that minimize the prediction error.

In macroeconomic applications, Aprigliano and Bencivelli (2013) use LASSO regression to select the relevant economic and financial variables in a large data set with the goal of estimating a new Italian coincident indicator.

We consider a stationary DFM where the observations, $X_t$, are generated by the following process:

$$X_t = P F_t + \varepsilon_t, \tag{2}$$
$$\Phi(L) F_t = \eta_t, \tag{3}$$
$$\Gamma(L) \varepsilon_t = a_t, \tag{4}$$

where $X_t = (x_{1t}, \dots, x_{Nt})'$ and $\varepsilon_t = (\varepsilon_{1t}, \dots, \varepsilon_{Nt})'$ are $N \times 1$ vectors of the variables and idiosyncratic noises observed at time $t$. The common factors, $F_t = (F_{1t}, \dots, F_{rt})'$, and the factor disturbances, $\eta_t = (\eta_{1t}, \dots, \eta_{rt})'$, are $r \times 1$ vectors, with $r$ ($r < N$) being the number of static common factors, which is assumed to be known. The $N \times 1$ vector of idiosyncratic disturbances, $a_t$, is distributed independently of the factor disturbances, $\eta_t$, for all leads and lags, denoted by $L$, where $L X_t = X_{t-1}$. Furthermore, $\eta_t$ and $a_t$ are assumed to be Gaussian white noises with positive definite covariance matrices $\Sigma_\eta = \mathrm{diag}(\sigma^2_{\eta_1}, \dots, \sigma^2_{\eta_r})$ and $\Sigma_a$, respectively. $P = (p_1, \dots, p_N)'$ is the $N \times r$ matrix of factor loadings, where $p_i = (p_{i1}, \dots, p_{ir})'$ is an $r \times 1$ vector. Finally, $\Phi(L) = I - \sum_{i=1}^{k} \Phi_i L^i$ and $\Gamma(L) = I - \sum_{j=1}^{s} \Gamma_j L^j$, where $\Phi_i$ and $\Gamma_j$ are $r \times r$ and $N \times N$ matrices containing the VAR parameters of the factors and idiosyncratic components with orders $k$ and $s$, respectively. For simplicity, we assume that the number of dynamic factors, $r_1$, is equal to $r$.

Alternative representations in the stationary case are given by Doz et al. (2011, 2012), who assume that $r$ can be different from $r_1$. Additionally, when $r = r_1$, Bai and Ng (2004), Choi (2017), and Corona et al. (2020) also assume possible nonstationarity in the idiosyncratic noises. Barigozzi et al. (2016, 2017) assume possible nonstationarity in $F_t$ and $\varepsilon_t$, and $r \neq r_1$.

The DFM in equations (2) to (4) is not identified. As we noted in the Introduction, the factor extraction used in this work is the 2SM; consequently, in the first step, we estimate the common factors by using PCs. To solve the identification problem and uniquely define the factors, we impose the restrictions $P'P/N = I_r$ and $F'F$ being diagonal, where $F = (F_1, \dots, F_T)'$ is $T \times r$. For a review of restrictions in the context of PC factor extraction, see Bai and Ng (2013).

Giannone et al. (2008) popularized the usage of 2SM factor extraction to estimate the common factors by using monthly information with the goal of generating the nowcasts of quarterly GDP. Subsequently, Doz et al. (2011) proved the statistical consistency of the common factors estimated using 2SM. In the first step, PC factor extraction consistently estimates the static common factors without assuming any particular distribution, allowing weak serial and cross-sectional correlation in the idiosyncratic noises; see, for example, Bai (2003). In the second step, we model the dynamics of the common factors via the Kalman smoother, allowing idiosyncratic heteroskedasticity, a situation that occurs frequently in practice. In a finite sample study, Corona et al. (2020) show that with the 2SM of Doz et al. (2011) based on PC and Kalman smoothing, we can obtain closer estimates of the common factors under several data generating processes that can occur in empirical analysis, such as heteroskedasticity and serial and cross-sectional correlation in idiosyncratic noises. Additionally, following Giannone et al. (2008), this method is useful when the objective is nowcasting given its flexibility to estimate common factors when not all variables are updated at the same time.

The 2SM procedure is implemented according to the following steps:

1. Set $\hat{P}$ as $\sqrt{N}$ times the eigenvectors associated with the $r$ largest eigenvalues of $X'X$, where $X = (X_1, \dots, X_T)'$ is a $T \times N$ matrix. By regressing $X$ on $\hat{P}$ and using the identifiability restrictions, obtain $\hat{F} = X\hat{P}/N$ and $\hat{\varepsilon} = X - \hat{F}\hat{P}'$. Compute the asymptotic confidence intervals for both factor loadings and common factors as proposed by Bai (2003).

2. Set the estimated covariance matrix of the idiosyncratic errors as $\hat{\Psi} = \mathrm{diag}(\hat{\Sigma}_\varepsilon)$, where the diagonal of $\hat{\Psi}$ contains the idiosyncratic variance of each variable of $X$; hence, $\hat{\sigma}^2_i$ for $i = 1, \dots, N$.

3. Estimate a VAR($k$) model by OLS for the estimated common factors, $\hat{F}$, and compute the estimated autoregressive coefficients expressed as a VAR(1) model, denoted by $\hat{\Phi}$. Assuming that $f_0 \sim N(0, \Sigma_f)$, the unconditional covariance matrix of the factors can be estimated as $\mathrm{vec}(\hat{\Sigma}_f) = (I_{r^2} - \hat{\Phi} \otimes \hat{\Phi})^{-1} \mathrm{vec}(\hat{\Sigma}_\eta)$, where $\hat{\Sigma}_\eta = \hat{\eta}'\hat{\eta}/T$.

4. Write the DFM in equations (2) to (4) in state-space form, and with the system matrices substituted by $\hat{P}$, $\hat{\Psi}$, $\hat{\Phi}$, $\hat{\Sigma}_\eta$ and $\hat{\Sigma}_f$, use the Kalman smoother to obtain an updated estimation of the factors, denoted by $\tilde{F}$.

In practice, $X_t$ is not updated for all $t$; in these cases, we apply the Kalman smoother, $E(\hat{F}_t \mid \Omega_T)$, where $\Omega_T$ is all the available information in the sample, and we take into account the following two cases:

$$\hat{\Psi}_i = \begin{cases} \hat{\sigma}^2_i & \text{if } x_{it} \text{ is available}, \\ \infty & \text{if } x_{it} \text{ is not available}. \end{cases}$$

Empirically, when specific data on $X_t$ are not available, Harvey and Phillips (1979) suggest using a large diffuse value (a power of 10); we instead use the power of 10 adopted in the nowcast package for R; see de Valk et al. (2019).

To detect the estimated number of common factors, $\hat{r}$, Onatski (2010) proposes a procedure for the case in which the proportion of the observed variance attributed to the factors is small relative to that attributed to the idiosyncratic term. This method determines a sharp threshold, $\delta$, which consistently separates the bounded and diverging eigenvalues of the sample covariance matrix. The author proposes the following algorithm to estimate $\delta$ and determine the number of factors:

1. Obtain and sort in descending order the $N$ eigenvalues of the covariance matrix of observations, $\hat{\Sigma}_X$. Set $j = r_{\max} + 1$.

2. Obtain $\hat{\gamma}$ as the OLS estimator of the slope of a simple linear regression, with a constant, of $\{\hat{\lambda}_j, \dots, \hat{\lambda}_{j+4}\}$ on $\{(j-1)^{2/3}, \dots, (j+3)^{2/3}\}$, and set $\delta = 2|\hat{\gamma}|$.

3. Let $r^{(N)}_{\max}$ be any slowly increasing sequence (in the sense that it is $o(N)$). If $\hat{\lambda}_k - \hat{\lambda}_{k+1} < \delta$ for all $k \le r^{(N)}_{\max}$, set $\hat{r} = 0$; otherwise, set $\hat{r} = \max\{k \le r^{(N)}_{\max} \mid \hat{\lambda}_k - \hat{\lambda}_{k+1} \ge \delta\}$.

4. With $j = \hat{r} + 1$, repeat steps 2 and 3 until convergence.

This algorithm is known as edge distribution, and Onatski (2010) proves the consistency of $\hat{r}$ for any fixed $\delta > 0$. Corona et al. (2017b) show that this method works reasonably well in small samples. Two important features of this method are that the number of factors can be estimated without previously estimating the common components and that the common factors may be integrated.
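The edge distribution steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `edge_distribution` and `n_factors_from_data`, the `max_iter` safeguard and the choice of NumPy are hypothetical, and the sample covariance is computed from demeaned data divided by $T$.

```python
import numpy as np


def edge_distribution(eigenvalues, r_max, max_iter=50):
    """Estimate the number of factors from covariance eigenvalues.

    Hypothetical helper following the four steps of the edge
    distribution algorithm; `max_iter` is an added safeguard.
    """
    # Step 1: sort the eigenvalues in descending order, start at j = r_max + 1.
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    if lam.size < r_max + 5:
        raise ValueError("need at least r_max + 5 eigenvalues")
    j = r_max + 1
    r_hat = None
    for _ in range(max_iter):
        # Step 2: regress lambda_j..lambda_{j+4} on (j-1)^(2/3)..(j+3)^(2/3)
        # with a constant; delta is twice the absolute slope.
        y = lam[j - 1:j + 4]
        x = np.arange(j - 1, j + 4, dtype=float) ** (2.0 / 3.0)
        slope = np.polyfit(x, y, 1)[0]
        delta = 2.0 * abs(slope)
        # Step 3: largest k <= r_max whose eigenvalue gap reaches delta
        # (0 if no gap does).
        gaps = lam[:r_max] - lam[1:r_max + 1]
        hits = np.flatnonzero(gaps >= delta)
        r_new = int(hits[-1]) + 1 if hits.size else 0
        # Step 4: iterate with j = r_hat + 1 until the estimate is stable.
        if r_new == r_hat:
            break
        r_hat, j = r_new, r_new + 1
    return r_hat


def n_factors_from_data(X, r_max):
    """Apply the sketch to a T x N data matrix X (assumed layout)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / X.shape[0]
    return edge_distribution(np.linalg.eigvalsh(cov), r_max)
```

With two eigenvalues far above a slowly decaying bulk, for instance `[50, 30, 2.0, 1.9, ..., 0.3]` and `r_max = 5`, the sketch settles on two factors after one update of `j`.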

In this subsection, we describe the nowcasting approach to estimate the annual percentage variation of the IGAE, denoted by $y^* = (y_1, \dots, y_{T^*})$, where $T^* = T - 2$; hence, we focus on generating the nowcasts two steps ahead.
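As a rough illustration of this setup, the sketch below computes an annual percentage variation from monthly levels and produces two-step-ahead values for a target observed up to $T^*$ from a predictor observed up to $T = T^* + 2$. It is only a sketch under stated assumptions: the function names are hypothetical, the predictor is a single generic series, and plain OLS is used as a stand-in rather than the authors' model.

```python
import numpy as np


def annual_pct_variation(levels, period=12):
    """Annual percentage variation of a monthly series of levels."""
    levels = np.asarray(levels, dtype=float)
    return (levels[period:] / levels[:-period] - 1.0) * 100.0


def two_step_nowcast(y_star, predictor):
    """Nowcast T*+1 and T*+2 by OLS of y* on a predictor (illustrative).

    y_star has T* observations; predictor has T = T* + 2 observations,
    so its last two values have no observed target yet.
    """
    y_star = np.asarray(y_star, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    t_star = y_star.size
    if predictor.size != t_star + 2:
        raise ValueError("predictor must have exactly two more observations")
    # Fit intercept and slope on the overlapping sample t = 1, ..., T*.
    Z = np.column_stack([np.ones(t_star), predictor[:t_star]])
    coef, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
    a, b = coef
    # Evaluate the fitted line at the two most recent predictor values.
    return a + b * predictor[t_star:]
```

The point of the sketch is the timing: the target lags the predictors by two months, so every update produces exactly two nowcasts.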

Currently, Google Trends topics, an up-to-date source of information that provides an index of Internet searches or queries by category and geography, are frequently used to predict economic phenomena. See, for instance, Stephens-Davidowitz and Varian (2014) for a full review of this tool and other analytical tools from Google applied to social sciences. Other recent examples are Ali et al. (2020), who analyze online job postings in the US childcare market under stay-at-home orders; Goldsmith-Pinkham and Sojourner (2020), who nowcast the number of workers filing unemployment insurance claims in the US based on the intensity of searches for the term “file for unemployment”; and Caperna et al. (2020), who develop random forest models to nowcast country-level unemployment figures for the 27 European Union countries based on the queries that best predict the unemployment rate, creating a daily indicator of unemployment-related searches.

In this way, for a sample of $K$ topics on Google Trends, the relevant topics $l = 0, \dots, \zeta$, with $\zeta \ge 0$, are selected as follows:

1. Split the data for $t = 1, \dots, T^* - H_g$.

2. For $h = 1$ and for the sample of size $T^* - H_g + h$, estimate $\hat{\beta}^{LASSO}_{\lambda,h}$. Compute the following vector of indicator variables, evaluated for the coefficient of each topic $j$:

$$\hat{\beta}_{j,h} = \begin{cases} 1 & \text{if } \hat{\beta}^{LASSO}_{\lambda,h} \neq 0, \\ 0 & \text{if } \hat{\beta}^{LASSO}_{\lambda,h} = 0. \end{cases}$$

3. Repeat step 2 until $H_g$.

4. Define the $H_g \times K$ matrix $\hat{\beta} = (\hat{\beta}_1, \dots, \hat{\beta}_K)$, where $\hat{\beta}_j = (\hat{\beta}_{j,1}, \dots, \hat{\beta}_{j,H_g})'$ is an $H_g \times 1$ vector.

5. Select the $l$ significant variables that satisfy the condition $\hat{\beta}_l = (\hat{\beta}_{l \in j} \mid \mathbf{1}\hat{\beta} > \varphi)$, where $\varphi$ is the $1 - \alpha$ sample quantile of $\mathbf{1}\hat{\beta}$, with $\mathbf{1}$ being a $1 \times H_g$ vector of ones.

With this procedure, we select the topics that frequently reduce the in-sample prediction error for the IGAE estimates during the last $H_g$ months. We estimate the optimum $\lambda$ by using the glmnet package for R.

Transformations

In our case, to predict $y^*$, the time series $X_i = (x_{i1}, \dots, x_{iT^*})$ are transformed such that they satisfy the following condition:

$$X^*_i = \left( f(X_i) \mid \max \mathrm{corr}(f(X_i), y^*) \right). \tag{5}$$

Hence, we select the $f(X_i)$ that maximizes its correlation with $y^*$. Consider $f(\cdot)$ as follows:

1. None (n)
2. Monthly percentage variation (m): $\left( \frac{X_t}{X_{t-1}} \times 100 \right) - 100$
3. Annual percentage variation (a): $\left( \frac{X_t}{X_{t-12}} \times 100 \right) - 100$
4. Lag: $X_{t-1}$

Note that these transformations do not have the goal of achieving stationarity, although they are intrinsically stationary transformations regardless of whether $y^*$ is stationary; in fact, the transformations m and a tend to be stationary when the time series are $I(1)$, which is frequent in economics; see Corona et al. (2017b). Otherwise, it is necessary that $(f(X_i), y^*)$ be cointegrated. The implications of equations (2) to (4) are very important because it is necessary to stationarize the system: theoretically, although some common factor, $F_t$, can be nonstationary, the estimates remain consistent as long as the idiosyncratic errors are stationary; see Bai (2004). In this way, we use the PANIC test (Bai and Ng, 2004) to verify this assumption. Additionally, an alternative to estimate nonstationary common factors by using 2SM when the time series are $I(1)$ is given by Corona et al. (2020).

Having estimated the common factors as described in subsection 3.2.1 by using $X^*_t$ for $t = 1, \dots, T$, we estimate a linear regression model with autoregressive moving average (ARMA) errors to generate the nowcasts:

$$y^*_t = a + b\tilde{F}_t + u_t, \quad t = 1, \dots, T - 2, \tag{6}$$

where $u_t = \phi(L)u_t + \gamma(L)v_t$ with $\phi(L) = \sum_{i=1}^{p} \phi_i L^i$ and $\gamma(L) = 1 + \sum_{j=1}^{q} \gamma_j L^j$. The parameters are estimated by maximum likelihood. Consequently, the nowcasts are obtained by the following expression:

$$\hat{y}_{T^*+h} = \hat{a} + \hat{b}\tilde{F}_{T^*+h} + \hat{u}_{T^*+h} \quad \text{for } h = 1, 2. \tag{7}$$

Note that Giannone et al. (2008) propose using the model with $p = q = 0$; hence, the nowcasts are obtained by using expression (7). In our case, we estimate different models with orders $p = 0, \dots, p_{\max}$ and $q = 0, \dots, q_{\max}$; thus, the case of Giannone et al. (2008) is a particular case of this expression. Now, our interest is in selecting models with similar performance for training data. In this way, we carry out the following procedure:

1. Start with $p = 0$ and $q = 0$.

2. Estimate the nowcasts for $T^* + 1$ and $T^* + 2$, namely, $\hat{y}_{0,0} = (\hat{y}_{T^*+1}, \hat{y}_{T^*+2})'$.

3. Split the data for $t = 1, \dots, T^* - H_t$.

4. For $h = 1$ and for the sample of size $T^* - H_t + h$, estimate equation (6), generate the nowcasts with expression (7) one step ahead, and calculate the error and absolute error (AE) as follows:

$$e_{0,0,1} = y_{T^*-H_t+1} - \hat{y}_{T^*-H_t+1}, \qquad AE_{0,0,1} = |e_{0,0,1}|.$$

5. Repeat steps 3 and 4 until $H_t$. Hence, obtain $e_{0,0} = (e_{0,0,1}, \dots, e_{0,0,H_t})'$ and $AE_{0,0} = (AE_{0,0,1}, \dots, AE_{0,0,H_t})$. Additionally, we define the weighted AE (WAE) as $WAE_{0,0} = AE_{0,0}\Upsilon$, where $\Upsilon$ is an $H_t \times 1$ vector of weights such that $\mathbf{1}'\Upsilon = 1$.

6. Repeat the previous steps for all combinations of $p$ and $q$ until $p_{\max}$ and $q_{\max}$. Generate the following elements:

$$\hat{y}(p, q) = (\hat{y}_{0,0}, \hat{y}_{0,1}, \dots, \hat{y}_{p_{\max},q_{\max}}),$$
$$e(p, q) = (e_{0,0}, e_{0,1}, \dots, e_{p_{\max},q_{\max}}),$$
$$WAE(p, q) = (WAE_{0,0}, WAE_{0,1}, \dots, WAE_{p_{\max},q_{\max}})',$$

where $\hat{y}$ is a $2 \times (p_{\max}+1)(q_{\max}+1)$ matrix of nowcasts, $e$ is an $H_t \times (p_{\max}+1)(q_{\max}+1)$ matrix that contains the nowcast errors in the training data, and $WAE$ is the vector that collects the WAE of each model.

7. Select the nowcasts of the model with the orders $p$ and $q$ that minimize the WAE, denoted by $\hat{y}(p^*, q^*)$, where $p^*, q^*$ are obtained as follows:

$$p^*, q^* = \underset{0 \le p, q \le p_{\max}, q_{\max}}{\mathrm{argmin}} \; WAE(p, q).$$

8. To use models with similar performance, we combine the nowcasts of $\hat{y}(p^*, q^*)$ with those of the models with equal forecast errors according to Diebold and Mariano (1995) tests, by using $e(p, q)$ and carrying out pairwise tests between the model with minimum $WAE(p, q)$ and the others. Consequently, from the models with statistically equal performance, we select the median of the nowcasts, namely, $\hat{y}$.

This nowcasting approach allows the generation of nowcasts based on a trained process, taking advantage of the information of similar models. It is clear that $\hat{b}$ must be significant to exploit the relationship between the IGAE and the information summarized by the DFM. Note that $\Upsilon$ is a vector of weights that penalizes the nowcast errors. The most common form is $\Upsilon = (1/H_t, \dots, 1/H_t)'$, an $H_t \times 1$ vector of equal weights, which yields the mean absolute error (MAE). Alternatively, one can take $AE(p, q)$ and estimate the median or some specific quantile for each vector of this matrix.

Note that although root mean squared errors (RMSEs) are often used in the forecast literature, we prefer a weighted function of AEs, although in this work we use equal weights, i.e., the MAE. The MAE has two main advantages over the RMSE: i) it is easy to interpret since it represents the average deviation without considering direction, while the RMSE averages the squared errors and then takes the root, which tends to inflate larger errors; and ii) the RMSE does not necessarily increase with the variance of the errors. In any case, the two criteria lie in the interval $[0, \infty)$ and are indifferent to the sign of the errors.

The variables used to estimate the DFM are selected by using the criteria of timeliness and contemporaneous correlation with respect to $y^*$. In this sense, the model differs from the traditional literature on large DFMs, which uses a large amount of economic and financial variables; see, for example, Corona et al. (2017a), who use 211 time series to estimate the DFM for the Mexican case with the goal of generating forecasts for the levels of the IGAE. On the other hand, Gálvez-Soriano (2020) uses approximately 30 selected time series to generate nowcasts of Mexican quarterly GDP. Thus, our number of variables is intermediate between these two cases. However, as noted by Boivin and Ng (2006), in the context of DFMs, we can reduce the forecast prediction error by estimating the common components with selected variables. Additionally, Poncela and Ruiz (2016) and Corona et al. (2020) show that with a relatively small number of series, for example, $N = 12$, we can accurately estimate a rotation of the common factors.

Consequently, given the timeliness and possibly contemporaneous correlation with respect to $y^*$, the features of the variables considered in this work are described in Annex 1. Hence, we start with 68 time series divided into three blocks.
The first block is timely traditional information such as the Industrial Production Index values for Mexico and the United States, business confidence, and exports and imports, among many others. In this block, all variables are monthly. In the second block, we have the high-frequency traditional variables such as the Mexican stock market index, the nominal exchange rate, the interest rate and the Standard & Poor's 500. These variables can be obtained daily, but we use their averages to obtain monthly time series. Finally, for the high-frequency nontraditional variables, we have daily variables such as the social media mobility index obtained from Twitter and the topics extracted from Google Trends. These topics are manually selected according to several phenomena that occur in Mexican society, such as politicians' names, natural disasters, economic themes and topics related to COVID-19, such as coronavirus, quarantine, or facemask. The Google Trends variable [...] $H_g = 36$ and $\alpha = 0.10$; consequently, we select the topics that are relevant in 90% of cases in the training data. In this way, the significant topics are quarantine and facemask.

(Footnote: All variables are seasonally adjusted in one of two ways: i) they are downloaded already adjusted from their source, or ii) they are adjusted by applying X-13ARIMA-SEATS.)

Once $X$ is defined, we apply the transformations suggested by equation (5) to define $X^*$. Figure 1 shows each $X^*_i$ ordered according to its correlation with $y^*$.

Figure 1: Blue indicates the specific $X^*_i$, and red indicates $y^*$. Numbers in parentheses indicate the linear correlation and those in brackets the transformation.

We can see the behavior of each variable: industrial production is the series most correlated with the IGAE, followed by imports and industrial production in the United States. Note that nontraditional time series such as facemask, quarantine and the social mobility index are also correlated with $y^*$. Finally, the variables least related to the IGAE are those related to business confidence and the monetary aggregate M4.

To summarize whether the time series capture prominent macroeconomic events such as the 2009 financial crisis and the economic deceleration in effect since 2019, Figure 2 shows the heat map by time series plotted in Figure 1.

Figure 2: Heat map plot of the variables. The time series inversely related to the IGAE are converted to have a positive relationship with it. We estimate the empirical quantiles $\phi(\cdot)$ according to their historical values. The first quartile ($\phi(X^*_i) < 0.25$) is in red, the second quartile ($0.25 < \phi(X^*_i) < 0.50$) is in orange, the third quartile ($0.50 < \phi(X^*_i) < 0.75$) is in yellow, and finally, the fourth quartile ($0.75 < \phi(X^*_i)$) is in green. Gray indicates that information is not available.

We can see that during the 2009 financial crisis, the variables are mainly red, including the Google Trends variables, which is reasonable because the AH1N1 pandemic also occurred during March and April of 2009. Additionally, during 2016, some variables related to the international market were red, for example, the US industrial production index, the exchange rate and the S&P 500. Note that since 2019, all variables are orange or red, denoting the weakening of the economy. Consequently, it is unsurprising that the estimated common factor summarizes these dynamics. Note that this graph has only a descriptive objective. It cannot be employed to generate recommendations for policy making because some variables may be nonstationary.

The nowcasts depend on the dates of the information released. Depending on the day of the current month, we can obtain nowcasts with a larger or smaller percentage of updated variables. For example, the high-frequency variables are available in real time, but the traditional monthly time series, which are timely with respect to the IGAE, are available on different dates according to the official release dates. Figure 3 shows the approximate day when the information is released for $T^* + 2$ after the current month $T^*$.

Figure 3: Percentage of updated information to carry out the nowcasts $T^* + 2$ once the current month $T^*$ is closed.

We can see that traditional and nontraditional high-frequency variables, business confidence and fuel demand can be obtained on the day after month $T^*$ is closed. This indicates that on the first day of month $T^* + 1$, we can generate the nowcasts for $T^* + 2$ with approximately 50% of the updated information and 81% for the current month, $T^* + 1$. Note that on day 12, the IMSS variable is updated, and on day 16, the IPI USA is updated.
These variables are highly correlated with $\hat{y}$, with linear correlations of 0.77 and 0.80, respectively. Consequently, in official statistics, we recommend conducting the nowcasts on the first day of $T^* + 1$ and updating them 16 days later with these two timely, traditional and important time series, so that the estimates remain timely while the relevant variables are updated. In this work, the database was last updated on August 13, 2020; consequently, we generate the [...] $T^* + 1$ and $T^* + 2$, respectively.

(Footnote: Note that the IPI represents around 34% of the monthly GDP and more than 97% of the second grand economic activity. Given that the IPI is updated around 10 days after the end of the reference month, this information is very valuable for carrying out the $T^* + 1$ nowcasts.)

By applying the Onatski (2010) procedure to the covariance matrix of $X^*$, we conclude that $\hat{r} = 1$ is adequate to define the number of common factors. Hence, the estimated static common factor obtained by PCs using the set of variables $X^*$, its confidence intervals at 95%, and the dynamic factor estimates obtained by applying the 2SM procedure with $k = 1$ lags are presented in Figure 4.

Figure 4: Factor estimates. The blue line is the static common factor, the red lines are its confidence intervals, and the green line is the smoothed or dynamic common factor.

We observe that the common factors summarize the previous elements, representing the decline in the economy in 2009 and 2020. Note that in the last period, the dynamic common factor shows a slight recovery of the economy because this common factor supplies more timely information than the static common factor. Thus, the static common factor has information until May 2020, while the dynamic factor has information until July 2020. Note that the confidence intervals are narrow around the static common factor, which implies that the uncertainty attributed to the estimation is well modeled.
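As a rough illustration of the static step mentioned above, the following sketch extracts one principal-component factor from a standardized panel under the normalization $P'P/N = I$, so that the factor is recovered as $F = XP/N$. The simulated panel, the function name `pc_factor` and all parameter values are assumptions made for illustration; the Onatski (2010) test and the Kalman-smoothing step of the 2SM procedure are not reproduced here.

```python
import numpy as np

def pc_factor(X, r=1):
    """PC estimate of r static factors from a T x N panel, with P'P/N = I_r."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each series
    T, N = Z.shape
    eigval, eigvec = np.linalg.eigh(Z.T @ Z / T)  # N x N sample covariance
    order = np.argsort(eigval)[::-1][:r]          # indices of the r largest eigenvalues
    P = np.sqrt(N) * eigvec[:, order]             # loadings, normalized so P'P/N = I_r
    F = Z @ P / N                                 # factor estimate, F = X P / N
    return F, P

# Simulated panel (hypothetical data): one AR(1) factor driving N = 12 series.
rng = np.random.default_rng(0)
T, N = 200, 12
f_true = np.zeros(T)
for t in range(1, T):
    f_true[t] = 0.7 * f_true[t - 1] + rng.normal()
loadings = rng.uniform(0.5, 1.5, size=N)
X = np.outer(f_true, loadings) + rng.normal(scale=0.5, size=(T, N))

F_hat, P_hat = pc_factor(X)
# The factor is identified only up to sign, so compare the absolute correlation.
corr = abs(np.corrcoef(F_hat[:, 0], f_true)[0, 1])
print(f"|corr(F_hat, f_true)| = {corr:.3f}")
```

In a setting like the one described in the text, `X` would hold the transformed series $X^*$, and the resulting factor would then be passed to a state-space smoother.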
It is important to analyze the contemporaneous correlation with respect to the IGAE. Thus, Figure 5 shows the correlation coefficient of $\tilde{F}_t$ with $y^*$ since 2008.

Figure 5: The blue line is $Corr(\tilde{F}_t, y^*)$ from January 2008 to May 2020. Red lines represent the confidence interval at 95%.

We see that the correlation is approximately 0.86 prior to the financial crisis of 2009, increasing from that year to 0.98, showing a slight decrease since 2011, dropping in 2016 to 0.95 and reaching levels of 0.96 since 2020. The confidence intervals are between 0.75 and 0.97 throughout the sample, with the smallest value in the first years of the sample and the largest at the end of the period. Consequently, we can exploit the contemporaneous relationship between the dynamic factor and the IGAE to generate nowcasts for the two following months, for which the common factor is already estimated but the IGAE is not yet available.

Having estimated the dynamic factor by the 2SM approach, we show the results of the loading weight estimates that capture the specific contribution of the common factor to each variable; in other words, given the PC restrictions, they can be seen as $N$ times the contribution of each variable to the common factor. We compute the confidence interval at 95%, denoted by $CI_{\hat{P}}$. Once the dynamic factor is estimated using the Kalman smoother, it is necessary to reestimate the factor loadings to have $\hat{P} = f(\tilde{F})$, such that $\tilde{F} = g(\tilde{P})$. To do so, we use Monte Carlo estimation, iterating over 1,000 samples, and select the replication that best satisfies the following condition:

$$\tilde{F} \approx X\tilde{P}/N \quad \text{s.t.} \quad \tilde{P} \in CI_{\hat{P}}.$$

The results of the estimated factor loadings are shown in Figure 6. The loadings are ordered from the most positive contribution to the most negative.

Figure 6: Factor loadings. The blue point is each $\hat{P}_i$ with its respective 95% confidence interval. Red curves are the $\tilde{P}_i$.

We observe several similarities with respect to Figure 1.
Note that the most important variables in the factor estimates are the industrial production of Mexico and the U.S., exports and imports, along with Google Trends topics such as quarantine and facemask, which makes sense in the COVID-19 period. Clearly, when these variables are updated, it becomes more important to update the nowcasts. In this regard, note that Google Trends data are available in real time. Other timely variables, such as IMO, CONF MANUF, GAS, S&P 500, MOBILITY and E, are also very relevant. However, note that all variables are significant in all cases, since no confidence interval contains zero. The least important variables are M4, the business confidence of the construction sector and remittances. Also, note that the most relevant variables are very timely with respect to the IGAE: the industrial production indexes of Mexico and the U.S. are updated around days 10 and 16 for $T^* + 1$ and $T^* + 2$, respectively, once the current month is closed; furthermore, exports and imports are updated for $T^* + 2$ by the 25th day, while IMO and IMSS are updated on the first and 12th day, respectively, for $T^* + 2$. Consequently, this allows us to have more accurate and correlated estimates from the first day of the current month for both $T^* + 1$ and $T^* + 2$.

As we have previously noted, to obtain a consistent estimation of $\tilde{F}$ and $\hat{P}$ it is necessary that $\hat{\varepsilon}$ be stationary. We check this point with the PANIC test of Bai and Ng (2004), concluding that we achieved stationarity in the idiosyncratic component, obtaining a statistic of 6.6 with a p-value of 0.00; hence, $\hat{\varepsilon}$ does not have a unit root. Additionally, we verify with the augmented Dickey-Fuller test that $\tilde{F}$ is stationary, with a p-value of 0.026; consequently, we also achieved stationarity in $X^*$.

We apply the procedure described in subsection 3.3.3 using $\Upsilon = (1/H_t, 1/H_t, \ldots, 1/H_t)'$; that is, we assume that each AE has equal weight over time in step 5. Additionally, we fix $p_{max} = q_{max} = 4$.
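The selection logic of that procedure can be sketched as follows. This is only a minimal illustration, assuming the training errors of each candidate $(p, q)$ model are already available: the function names, the toy error vectors and nowcast values, the 5% level, and the simple one-step Diebold-Mariano statistic with absolute loss and a normal approximation are all assumptions for illustration, not the authors' implementation.

```python
import math
import statistics

def weighted_abs_error(errors, weights=None):
    """Weighted absolute error; equal weights 1/H_t reduce it to the MAE."""
    h = len(errors)
    if weights is None:
        weights = [1.0 / h] * h
    return sum(w * abs(e) for w, e in zip(weights, errors))

def dm_pvalue(e1, e2):
    """Two-sided p-value of a one-step Diebold-Mariano test with absolute loss."""
    d = [abs(a) - abs(b) for a, b in zip(e1, e2)]  # loss differential
    sd = statistics.stdev(d)
    if sd == 0.0:
        return 1.0  # identical losses: no evidence of a difference
    stat = statistics.mean(d) / (sd / math.sqrt(len(d)))
    # Normal approximation: p = 2 * (1 - Phi(|stat|)) = erfc(|stat| / sqrt(2))
    return math.erfc(abs(stat) / math.sqrt(2.0))

def combine_nowcasts(errors, nowcasts, alpha=0.05):
    """Pick the minimum-WAE model, keep models not rejected against it, take the median."""
    wae = {k: weighted_abs_error(e) for k, e in errors.items()}
    best = min(wae, key=wae.get)
    keep = [k for k in errors
            if k == best or dm_pvalue(errors[best], errors[k]) > alpha]
    return best, keep, statistics.median(nowcasts[k] for k in keep)

# Hypothetical training errors and current nowcasts, keyed by (p, q).
errors = {
    (0, 0): [1.0, -1.2, 0.9, -1.1, 1.3, -0.8],
    (1, 1): [0.3, -0.2, 0.4, -0.1, 0.2, -0.3],
    (4, 4): [0.2, -0.3, 0.1, -0.2, 0.3, -0.1],
}
nowcasts = {(0, 0): -12.0, (1, 1): -15.0, (4, 4): -14.0}

best, keep, combined = combine_nowcasts(errors, nowcasts)
print(best, keep, combined)
```

With these toy numbers, the clearly worse $(0, 0)$ model is rejected against the best one, while $(1, 1)$ is retained and enters the median.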
The results indicate that the optimal $p^*$ and $q^*$ are both equal to 4. Consequently, the best model is the following:

$$
\begin{aligned}
y^*_t ={}& \underset{(0.\cdots)}{1.\cdots} + \underset{(0.\cdots)}{1.\cdots}\,\tilde{F}_t + \underset{(0.\cdots)}{0.\cdots}\,\hat{u}_{t-1} + \underset{(0.\cdots)}{1.\cdots}\,\hat{u}_{t-2} + \underset{(0.\cdots)}{0.\cdots}\,\hat{u}_{t-3} - \underset{(0.\cdots)}{\cdots}\,\hat{u}_{t-4} \\
&+ \underset{(0.\cdots)}{0.\cdots}\,\hat{v}_{t-1} - \underset{(0.\cdots)}{\cdots}\,\hat{v}_{t-2} - \underset{(0.\cdots)}{\cdots}\,\hat{v}_{t-3} + \underset{(0.\cdots)}{0.\cdots}\,\hat{v}_{t-4} + \hat{v}_t, \qquad \hat{\sigma} = 0.\cdots \qquad (8)
\end{aligned}
$$

Note that all coefficients are significant and the contribution of the factor to the IGAE is positive. Additionally, the Ljung-Box test indicates that the residuals are serially uncorrelated. This model generates the historical one-step-ahead nowcasts for the $H_t = 36$ months presented in Figure 7.

Figure 7: Nowcasts of the training model. Asterisks are the observed values, the red line depicts the nowcasts, and the green lines are the confidence intervals.

We can see that the nowcast model performs well given that in 92% of cases, the observed values are within the 95% confidence interval. The MAE (equal weights in $\Upsilon$) is 0.65, and the mean absolute annual growth of the IGAE is 2.55%. Regarding the median of the AEs, the estimated value is 0.36. These statistics are very competitive with respect to the model estimated by Statistics Netherlands; see Kuiper and Pijpers (2020). They also estimate common factors to generate nowcasts of the annual variation of quarterly Netherlands GDP. According to Table 7.2 in their work, the root mean squared forecast error is between 0.91 and 0.67 during 2011 and 2017. Additionally, their confidence interval captures approximately 70% of the observed values. Therefore, our nowcast approach generates good results even when considering a monthly variable and COVID-19.

In addition, we compare our results to Corona et al. (2017a), which forecasts IGAE levels two steps ahead.
To make the results of that study comparable with ours, we take the median of the root squared errors obtained by the former for the first step ahead only, which is between 0.4 and 0.5, while the current work generates a median AE of 0.397 for the last $H_t = 36$ months, including the COVID-19 period. Therefore, our approach is slightly more accurate when nowcasting the IGAE levels. Note that the number of variables is drastically smaller: 211 there versus 68 here. Another nowcasting model to compare with is INEGI's "Early Monthly Estimation of Mexico's Manufacturing Production Level," whose target variable is manufacturing activity, generating one-step-ahead nowcasts using a timely electricity indicator. The average MAE for the annual percentage variation of manufacturing activity in [...]

• Naive model: We assume that all variables have equal weights in the factor; consequently, we standardize the variables used in the DFM, $X^*_t$, and by averaging their rows, we obtain $F^*_t$. Then, we use this naive factor in a linear regression to obtain the nowcasts for the last $H_t = 36$ months.

• DFM without nontraditional information: We estimate a traditional DFM similar to Corona et al. (2017a) or Gálvez-Soriano (2020), but using only economic and financial time series, i.e., without considering the social mobility index and the relevant topics extracted from Google Trends. Hence, we carry out the last $H_t = 36$ nowcasts.

Figure 8 shows the cumulative MAEs for the training period for the previous two models and for the model in equation (8).

Figure 8: Cumulative MAEs for models in training data. Blue is the nowcasting approach suggested in this work, red is the naive model, and green is the traditional DFM (without nontraditional information). The vertical line indicates the COVID-19 onset period.

We can see that, in the training data, the naive model is the one with the weakest performance, followed by the traditional DFM.
Specifically, the MAE is 1.02 for the naive model, 0.74 when using the DFM without nontraditional information and, as noted, 0.65 for the proposed model, which includes this type of information. Note that the use of nontraditional information does not affect the behavior of the MAEs prior to the COVID-19 pandemic and reduces the error during this period. Consequently, the performance of the suggested approach is highly competitive when compared with i) similar models for nowcasting GDP, ii) models that estimate the levels of the target variable and iii) alternative models that can be used in practice.

Having verified in the previous section that our approach is highly competitive in capturing the observed values, the final nowcasts for the IGAE annual percentage variation for June and July 2020 are shown in Figure 9. These are obtained by combining the models that are statistically equal to the best model, following the approach previously described, with the traditional nowcasting model of Giannone et al. (2008), weighting both nowcasts according to their MAEs.

(Footnote: Note that the model of Giannone et al. (2008) uses only the estimated dynamic factors as regressors, i.e., linear regression models. Our approach also considers the possibility of modeling the errors with ARMA models. To consider nowcasts associated specifically with the dynamics of the common factors, we take into account the Giannone et al. (2008) model, although its contribution to the final nowcasts is small given that, frequently, during the test period, its nowcast errors are greater than those of the regression models with ARMA errors.)
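The text states that the two nowcasts are weighted according to their MAEs but does not give the exact formula. One common choice, assumed here purely for illustration, is inverse-MAE weighting, in which the model with the smaller training error receives the larger weight. The function names and the numbers below (other than the MAE of 0.65 reported above) are hypothetical.

```python
def inverse_mae_weights(maes):
    """Weights proportional to 1/MAE, normalized to sum to one (assumed scheme)."""
    inverse = [1.0 / m for m in maes]
    total = sum(inverse)
    return [v / total for v in inverse]

def combine(nowcasts, maes):
    """Weighted average of nowcasts using inverse-MAE weights."""
    weights = inverse_mae_weights(maes)
    return sum(w * y for w, y in zip(weights, nowcasts))

# Hypothetical example: the ARMA-error regression (MAE 0.65) and a
# factor-only regression with an assumed MAE of 1.30.
weights = inverse_mae_weights([0.65, 1.30])
combined = combine([-15.0, -18.0], [0.65, 1.30])
print(weights, combined)
```

Under this scheme a model with twice the MAE gets half the weight, which is consistent with the remark that the contribution of the factor-only model to the final nowcasts is small.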

The procedure described in the previous subsection allows us to generate nowcasts using databases with different cut-off dates. Accordingly, we carry out the procedure updating the databases twice a month during the COVID-19 period. Table 1 summarizes the nowcast results, comparing them with the observed values.

Table 1: Nowcasts with different updates in COVID-19 times: annual percentage variation of IGAE

                   Date of nowcasts
Date     Observed  04/06/2020  18/06/2020  07/07/2020  16/07/2020  06/08/2020  12/08/2020
2020/04  -19.7     -18.3       -18.0
2020/05  -21.6     -20.4       -21.0       -21.8       -20.4
2020/06  -14.5                             -16.6       -16.4       -15.5       -15.2
2020/07                                                            -13.9       -13.2

We can see that on June 4, 2020, the nowcasts were very accurate, capturing the drastic drop that occurred in April (the previous month's value was -2.5%) and May, with absolute discrepancies of 1.4 and 1.2 percentage points, respectively. The update of June 18, 2020 shows a slight accuracy improvement. The following two nowcasts also generate close estimates with respect to the observed value for May, the most accurate being the update carried out on July 7, 2020. Note that the last updates generate nowcasts for June between -16.6% and -15.2%, the most accurate being the last nowcast described in this work, with an absolute error of 0.7 percentage points. Considering these results, our approach anticipates the drop attributed to COVID-19 and foresees a slight recovery since June, although it is also weak. According to Gálvez-Soriano (2020), accurate and timely estimates of the IGAE can drastically improve the nowcasts of quarterly GDP; consequently, the benefits of our approach also extend to quarterly time series nowcast models.
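The absolute discrepancies quoted above can be checked directly from the values in Table 1. The short sketch below does this arithmetic, assuming the alignment of nowcast values to update dates that the discussion implies (for instance, that -18.3 and -20.4 are the June 4 nowcasts for April and May); the dictionary layout and variable names are illustrative only.

```python
# Observed annual percentage variation of the IGAE (Table 1).
observed = {"2020/04": -19.7, "2020/05": -21.6, "2020/06": -14.5}

# Nowcasts by target month and update date (alignment assumed from the text).
nowcasts = {
    "2020/04": {"04/06/2020": -18.3, "18/06/2020": -18.0},
    "2020/05": {"04/06/2020": -20.4, "18/06/2020": -21.0,
                "07/07/2020": -21.8, "16/07/2020": -20.4},
    "2020/06": {"07/07/2020": -16.6, "16/07/2020": -16.4,
                "06/08/2020": -15.5, "12/08/2020": -15.2},
}

# Absolute error of each nowcast, in percentage points, rounded to one decimal.
abs_err = {
    month: {date: round(abs(value - observed[month]), 1)
            for date, value in row.items()}
    for month, row in nowcasts.items()
}

for month, row in abs_err.items():
    print(month, row)
```

The June 4 errors for April and May, the July 7 error for May, and the August 12 error for June reproduce the 1.4, 1.2, 0.2 and 0.7 percentage points mentioned in the discussion.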

In this paper, we contribute to the nowcasting literature by focusing on the two-step-ahead nowcasts of the annual percentage variation of the IGAE, the equivalent of the Mexican monthly GDP, during COVID-19 times. For this purpose, we use statistical and econometric tools to obtain accurate and timely estimates, even around 50 days before the official data. The suggested approach consists of using LASSO regression to select the relevant topics that affect the IGAE in the short term, building a correlated and timely database to exploit the correlation among the variables and the IGAE, estimating a dynamic factor using the 2SM approach, and training a linear regression with ARMA errors to select the best models and generate current nowcasts.

We highlight the following key results. Our approach is highly competitive compared with other models such as naive regressions or the traditional DFM. Our procedure frequently captures the observed value, both in test data and in real time, obtaining absolute errors between 0.2% and 1.4% during the COVID-19 period. Another contribution of this paper is statistical, given that we compute the confidence intervals of the factor loadings and the factor estimates, verifying the significance of the factor for each variable and the uncertainty attributed to the factor estimates. Additionally, we consider some econometric issues to guarantee the consistency of the estimates, such as stationarity of the idiosyncratic noises and uncorrelated errors in the nowcasting models. Additionally, it is of interest to assess whether the in-sample nowcast error increases when using monthly versus quarterly data.

Several future research topics emerged from this work. One is the implementation of an algorithm that allows the estimation of nonstationary common factors and makes the selection of the number of factors flexible, such as the one developed in Corona et al. (2020), to minimize a measure of nowcasting errors. Another interesting research line is to incorporate machine learning techniques to automatically select the possibly relevant topics from Google Trends. Also, it would be interesting to incorporate IPI information as restrictions on the nowcasts, by exploring techniques to incorporate nowcast restrictions when official accounting information is available. Finally, it is worth delving into the effects of timely monthly variables versus quarterly time series in nowcasting models; this can be achieved by Monte Carlo analysis with different data generating processes that can occur in practice, to compare the increase in estimation error when distinct frequencies of time series are used.

Acknowledgements

The authors gratefully acknowledge the comments and suggestions of the authorities of INEGI, Julio Santaella, Sergio Carrera and Gerardo Leyva. The seminars and meetings organized by them were very useful in improving this research. We also thank Elio Villaseñor, who provided the Twitter social mobility index, and Manuel Lecuanda for the discussion about the Google Trends topics to be considered. Partial financial support from CONACYT CB-2015-25996 is gratefully acknowledged by Francisco Corona and Graciela González-Farías.

References

Alberro, J. (2020). La pandemia que perjudica a casi todos, pero no por igual/The pandemic that harms almost everyone, but not equally. EconomíaUNAM, 17(51):59–73.

Ali, U., Herbst, C. M., and Makridis, C. (2020). The impact of COVID-19 on the US child care market: Evidence from stay-at-home orders. Technical report, Available at SSRN 3600532.

Aprigliano, V. and Bencivelli, L. (2013). Ita-coin: a new coincident indicator for the Italian economy. Banca D'Italia. Working papers, 935.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171.

Bai, J. (2004). Estimating cross-section common stochastic trends in nonstationary panel data. Journal of Econometrics, 122(1):137–183.

Bai, J. and Ng, S. (2004). A PANIC attack on unit roots and cointegration. Econometrica, 72(4):1127–1177.

Bai, J. and Ng, S. (2008). Large dimensional factor analysis. Foundations and Trends in Econometrics, 3(2):89–163.

Bai, J. and Ng, S. (2013). Principal components estimation and identification of static factors. Journal of Econometrics, 176(1):18–29.

Bai, J. and Wang, P. (2016). Econometric analysis of large factor models. Annual Review of Economics, 8:53–80.

Barigozzi, M., Lippi, M., and Luciani, M. (2016). Non-Stationary Dynamic Factor Models for Large Datasets. Finance and Economics Discussion Series Divisions of Research & Statistics and Monetary Affairs Federal Reserve Board, Washington, D.C., 024.

Barigozzi, M., Lippi, M., and Luciani, M. (2017). Dynamic factor models, cointegration, and error correction mechanisms. Working Paper.

Boivin, J. and Ng, S. (2006). Are more data always better for factor analysis? Journal of Econometrics, 132(1):169–194.

Breitung, J. and Choi, I. (2013). Factor models, in Hashimzade, N. and Thornton, M.A. (eds.). Handbook of Research Methods and Applications in Empirical Macroeconomics, United Kingdom: Edward Elgar Publishing.

Breitung, J. and Eickmeier, S. (2006). Dynamic factor models, in Hübler, O. and J. Frohn (eds.). Modern Econometric Analysis, Berlin: Springer.

Buono, D., Kapetanios, G., Marcellino, M., Mazzi, G., and Papailias, F. (2018). Big data econometrics: Now casting and early estimates. Technical report, BAFFI CAREFIN, Centre for Applied Research on International Markets, Banking, Finance and Regulation, Università Bocconi, Milano, Italy.

Campos-Vazquez, R. M., Esquivel, G., and Badillo, R. Y. (2020). How Has Labor Demand Been Affected by the COVID-19 Pandemic? Evidence from Job Ads in Mexico. Covid Economics, CEPR, 1(46):94–122.

Caperna, G., Colagrossi, M., Geraci, A., and Mazzarella, G. (2020). Googling unemployment during the pandemic: Inference and nowcast using search data. Technical report, Available at SSRN 3627754.

Caruso, A. (2018). Nowcasting with the help of foreign indicators: The case of Mexico. Economic Modelling, 69(C):160–168.

Choi, I. (2017). Efficient estimation of nonstationary factor models. Journal of Statistical Planning and Inference, 183:18–43.

Corona, F., González-Farías, G., and Orraca, P. (2017a). A dynamic factor model for the Mexican economy: are common trends useful when predicting economic activity? Latin American Economic Review, 27(1).

Corona, F., Poncela, P., and Ruiz, E. (2017b). Determining the number of factors after stationary univariate transformations. Empirical Economics, 53(1):351–372.

Corona, F., Poncela, P., and Ruiz, E. (2020). Estimating Non-stationary Common Factors: Implications for Risk Sharing. Computational Economics, 55(1):37–60.

de Valk, S., de Mattos, D., and Ferreira, P. (2019). Nowcasting: An R Package for Predicting Economic Variables Using Dynamic Factor Models. The R Journal, 11(1).

Diebold, F. X. and Mariano, R. (1995). Comparing predictive accuracy. Journal of Business and Economic Statistics, 13:253–263.

Doz, C., Giannone, D., and Reichlin, L. (2011). A two-step estimator for large approximate dynamic factor models based on Kalman filtering. Journal of Econometrics, 164(1):188–205.

Doz, C., Giannone, D., and Reichlin, L. (2012). A quasi maximum likelihood approach for large, approximate dynamic factor models. The Review of Economics and Statistics, 94(4):1014–1024.

Fernández, C. L. (2020). La pandemia del Covid-19: los sistemas y la seguridad alimentaria en América Latina/Covid-19 pandemic: systems and food security in Latin America. EconomíaUNAM, 17(51):168–179.

Gálvez-Soriano, O. (2020). Nowcasting Mexico's quarterly GDP using factor models and bridge equations. Estudios Económicos, 70(35):213–265.

Geweke, J. (1977). The dynamic factor analysis of economic time series, in Aigner, D.J. and Goldberger, A.S. (eds.). Latent Variables in Socio-Economic Models, Amsterdam: North-Holland:365–382.

Giannone, D., Reichlin, L., and Small, D. (2008). Nowcasting: The real-time informational content of macroeconomic data. Journal of Monetary Economics, 55:665–676.

Goldsmith-Pinkham, P. and Sojourner, A. (2020). Predicting Initial Unemployment Insurance Claims Using Google Trends. Technical report, Working Paper (preprint), Posted April 3. https://paulgp.github.io/GoogleTrendsUINowcast/google_trends_UI.html.

Guerrero, V. M., García, A. C., and Sainz, E. (2013). Rapid Estimates of Mexico's Quarterly GDP. Journal of Official Statistics, 29(3):397–423.

Harvey, A. and Phillips, G. (1979). Maximum Likelihood Estimation of Regression Models With Autoregressive-Moving Averages Disturbances. Biometrika, 152:49–58.

Kershenobich, D. (2020). Fortalezas, deficiencias y respuestas del sistema nacional de salud frente a la Pandemia del Covid-19/Strengths, weaknesses and responses of the national health system to the Covid-19 Pandemic. EconomíaUNAM, 17(51):53–58.

Kuiper, M. and Pijpers, F. (2020). Nowcasting GDP growth rate: a potential substitute for the current flash estimate. Technical report, Statistics Netherlands: Discussion Paper.

Lenza, M. and Primiceri, G. E. (2020). How to Estimate a VAR after March 2020. Technical report, National Bureau of Economic Research.

Lustig, N., Pabon, V. M., Sanz, F., Younger, S. D., et al. (2020). The Impact of COVID-19 Lockdowns and Expanded Social Assistance on Inequality, Poverty and Mobility in Argentina, Brazil, Colombia and Mexico. Covid Economics, CEPR, 1(46):32–67.

Meza, F. (2020). Forecasting the impact of the COVID-19 shock on the Mexican economy. Covid Economics, CEPR, 1(48):210–225.

Moreno-Brid, J. C. (2020). Pandemia, política pública y panorama de la economía mexicana en 2020/Pandemic, public policy and the outlook for the Mexican economy in 2020. EconomíaUNAM, 17(51):335–348.

Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics, 92(4):1004–1016.

Poncela, P. and Ruiz, E. (2016). Small versus big data factor extraction in Dynamic Factor Models: An empirical assessment in dynamic factor models, in Hillebrand, E. and Koopman, S.J. (eds.). Advances in Econometrics, 35:401–434.

Samaniego, N. (2020). El Covid-19 y el desplome del empleo en México/The Covid-19 and the Collapse of Employment in Mexico. EconomíaUNAM, 17(51):306–314.

Sánchez, E. C. (2020). México en la pandemia: atrapado en la disyuntiva salud vs economía/Mexico in the pandemic: caught in the disjunctive health vs economy. EconomíaUNAM, 17(51):282–295.

Sargent, T. J. and Sims, C. A. (1977). Business cycle modeling without pretending to have too much a priori economic theory, in Sims, C.A. (ed.). New Methods in Business Cycle Research, Minneapolis: Federal Reserve Bank of Minneapolis.

Stephens-Davidowitz, S. and Varian, H. (2014). A hands-on guide to Google data. Technical report, Google Inc.

Stock, J. H. and Watson, M. W. (2011). Dynamic factor models, in Clements, M.P. and Hendry, D.F. (eds.). Oxford Handbook of Economic Forecasting, Oxford: Oxford University Press.

Tibshirani, R. (1996). Regression shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267–288.

Vanegas, L. L. (2020). Los desafíos del sistema de salud en México/The health system challenges in Mexico. EconomíaUNAM, 17(51):16–27.

Varian, H. R. (2014). Big data: New tricks for econometrics. Journal of Economic Perspectives, 28(2):3–28.

Annexes

Annex 1: Database

Traditional and timely information

Short      Variable                                                Source               Time Span
ANTAD      Total sales of departmental stores                      ANTAD                2004/01-2020/06
AUTO       Automobiles production                                  INEGI                2004/01-2020/07
CONF COM   Right time to invest (Commerce)                         INEGI                2011/06-2020/07
CONF CONS  Right time to invest (Construction)                     INEGI                2011/06-2020/07
CONF MANU  Right time to invest (Manufacturing)                    INEGI                2004/01-2020/07
CONF SERV  Right time to invest (Services)                         INEGI                2017/01-2020/07
GAS        Fuel demand                                             SENER                2004/01-2020/07
HOTEL      Hotel occupancy                                         Tourism secretariat  2004/01-2020/06
IMO        Index of manufacturing orders                           INEGI                2004/01-2020/07
IMSS       Permanent and eventual insureds of the Social Security  IMSS                 2004/01-2020/07
IPI        Industrial Production Index                             INEGI                2004/01-2020/06
IPI USA    Industrial Production Index (USA)                       BEA                  2004/01-2020/07
IRGS       Income of retail goods and services                     INEGI                2008/01-2020/05
L MANUF    Trend of labor in manufacturing                         INEGI                2007/01-2020/05
M          Total imports                                           INEGI                2004/01-2020/06
M4         Monetary aggregate M4                                   Banxico              2004/01-2020/06
REM        Total remittances                                       Banxico              2004/01-2020/06
U          Unemployment rate                                       INEGI                2005/01-2020/06
X          Total exports                                           INEGI                2004/01-2020/06

High frequency traditional variables

Short      Variable                                                Source               Time Span
E          Nominal exchange rate                                   Banxico              2004/01-2020/07
IR 28      Interest rate (28 days)                                 Banxico              2004/01-2020/07
MSM        Mexican stock market index                              Banxico              2004/01-2020/07
SP 500     Standard & Poor's 500                                   Yahoo! finance       2004/01-2020/07

High frequency nontraditional variables

Short      Variable                                                Source               Time Span