Quasi-maximum likelihood estimation of break point in high-dimensional factor models

Abstract

This paper estimates the break point for large-dimensional factor models with a single structural break in factor loadings at a common unknown date. First, we propose a quasi-maximum likelihood (QML) estimator of the change point based on the second moments of factors, which are estimated by principal component analysis. We show that the QML estimator performs consistently when the covariance matrix of the pre- or post-break factor loading, or both, is singular. When the loading matrix undergoes a rotational type of change while the number of factors remains constant over time, the QML estimator incurs a stochastically bounded estimation error. In this case, we establish an asymptotic distribution of the QML estimator. The simulation results validate the feasibility of this estimator when used in finite samples. In addition, we demonstrate empirical applications of the proposed method by applying it to estimate the break points in a U.S. macroeconomic dataset and a stock return dataset.


arXiv [econ.EM]

Jiangtao Duan, Jushan Bai, Xu Han
Northeast Normal University, Columbia University, and City University of Hong Kong


Key words and phrases:

Structural break, High-dimensional factor models, Factor loadings

1. Introduction

Large factor models assume that a few factors can capture the common driving forces of a large number of economic variables. Although factor models are useful, practitioners have to be cautious about potential structural changes. For example, either the number of factors or the factor loadings may change over time. This concern is empirically relevant because parameter instability is pervasive in large-scale panel data.

So far, many methods have been developed to test for structural breaks in factor models (e.g., Stock and Watson (2008), Breitung and Eickmeier (2011), and Chen et al. (2014)). The rejection of the null hypothesis of no structural change leads to the subsequent issues of how to estimate the change point, determine the numbers of pre- and post-break factors, and estimate the factor space. Chen (2015) considers a least-squares estimator of the break point and proves the consistency of the estimated break fraction (i.e., the break date $k$ divided by the length of the time series $T$, $k/T$). Cheng et al. (2016) propose a shrinkage method to obtain a consistent estimator of the break fraction. Baltagi et al. (2017) develop a least-squares estimator of the change point based on the second moments of the estimated pseudo-factors and show that the estimation error of the proposed estimator is $O_p(1)$, which indicates the consistency of the estimated break fraction. A few recent studies also explore the consistent estimation of the break points themselves, which is technically more challenging. Ma and Su (2018) develop an adaptive fused group Lasso method to consistently estimate all break points under a multibreak setup. Barigozzi et al. (2018) propose a method based on wavelet transformations to consistently estimate the number and locations of break points in the common and idiosyncratic components. Bai et al. (2020) establish the consistency of the least-squares estimator of the break point in large factor models when factor loadings are subjected to a structural break and the size of the break is shrinking as the sample size increases. Although the estimators proposed in these studies are consistent under certain assumptions, the simulation results show that they perform poorly when (1) the number of factors changes after the break or (2) the loading matrix undergoes a rotational type of change.

According to the factor model literature, a factor model with a break in factor loadings is observationally equivalent to one with constant loadings and possibly more pseudo-factors (e.g., Han and Inoue (2015) and Bai and Han (2016)). Thus, the estimation of the change point of factor loadings can be converted into that of the change point of the second moment of the pseudo-factors. We propose a quasi-maximum likelihood (QML) method to estimate the break point based on the second moment of the estimated pseudo-factors; therefore, the number of original factors is not required to be known for computing our estimator. First, we estimate the number of pseudo-factors in an equivalent representation that ignores the break, and then estimate the pre- and post-break second moment matrices of the estimated pseudo-factors for all possible sample splits. The structural break date is estimated by minimizing the QML function among all possible split points.

This paper makes the following contributions to the literature. First, we establish the consistency of the QML break point estimator if the break leads to more pseudo-factors than the original pre- or post-break factors. This occurs when the break augments the factor space or in the presence of disappearing or emerging factors. Under these circumstances, the covariance matrix of loadings on the pre- or post-break pseudo-factors is singular, which is the key condition to establish the consistency of our QML estimator. To the best of our knowledge, this is the first study that links the consistency of the break point estimator to the singularity of covariance matrices of loadings on pre- and post-break pseudo-factors. In addition, we prove that the difference between the estimated and true change points is stochastically bounded when both pre- and post-break loadings on the pseudo-factors have nonsingular covariance matrices. In this case, the loading matrix only undergoes a rotational change, and both the numbers of pre- and post-break original factors are equal to the number of pseudo-factors.

The aforementioned singularity leads to a technical challenge in analyzing the asymptotic properties. The singular population covariance matrix of the pre(post)-break loadings has a zero determinant, whose logarithm is not well defined. To resolve this issue, we show that the estimated covariance matrices have nonzero determinants and a well-defined inverse for any given sample size, by obtaining the convergence rate of the lower bound of their smallest eigenvalues. This ensures that the objective function based on the estimated covariance is well defined in any finite sample.

Our second major contribution is that the QML method allows a change in the number of factors. Namely, it allows for disappearing or emerging factors after the break. This is an advantage over the methods developed by Ma and Su (2018) and Bai et al. (2020), who assume that the number of factors remains constant after the break. Our simulation result indicates that the estimator proposed by Bai et al. (2020) is inconsistent when some factors disappear and the remaining factors have time-invariant loadings. Baltagi et al. (2017) allow a change in the number of factors; however, their estimation error is only stochastically bounded. In contrast, our QML estimator remains consistent under a varying number of factors.

Finally, the QML method has a substantial computational advantage over the estimators that iteratively implement high-dimensional principal component analysis (PCA). For example, the estimator proposed by Bai et al. (2020) runs PCA for pre- and post-split sample covariance matrices for all possible split points. In comparison, our QML method runs PCA for the entire sample only once, and thus is computationally more efficient, especially in large samples.

The rest of this paper is organized as follows. Section 2 introduces the factor model with a single break in the factor loading matrix and describes the QML estimator for the break date. Section 3 presents the assumptions made for this model. Section 4 presents the consistency and asymptotic distribution of the QML estimator for the break date. Section 5 investigates the finite-sample properties of the QML estimator through simulations. Section 6 implements the proposed method to estimate the break points in a monthly macroeconomic dataset of the United States and a dataset of weekly stock returns of Nasdaq 100 components. Section 7 concludes the study.

2. Model and estimator

Let us consider the following factor model with a common break at $k_0$ in the factor loadings for $i = 1, \cdots, N$:

$$
x_{it} =
\begin{cases}
\lambda_{1i}' f_t + e_{it} & \text{for } t = 1, 2, \cdots, k_0(T), \\
\lambda_{2i}' f_t + e_{it} & \text{for } t = k_0(T)+1, \cdots, T,
\end{cases}
\tag{1}
$$

where $f_t$ is an $r$-dimensional vector of unobserved common factors; $r$ is the number of pseudo-factors; $k_0(T)$ is the unknown break date; $\lambda_{1i}$ and $\lambda_{2i}$ are the pre- and post-break factor loadings, respectively; and $e_{it}$ is the error term, which is allowed to have serial and cross-sectional dependence as well as heteroskedasticity. $\tau_0 \in (0, 1)$ is a fixed constant and $[x]$ represents the integer part of $x$. For notational simplicity, hereinafter, we suppress the dependence of $k_0$ on $T$. Note that we formulate the model using pseudo-factors instead of the original underlying factors. This simplifies the representation of various breaks in a unified framework, which will be clarified in the examples below.

In vector form, model (1) can be expressed as

$$
x_t =
\begin{cases}
\Lambda_1 f_t + e_t & \text{for } t = 1, 2, \cdots, k_0, \\
\Lambda_2 f_t + e_t & \text{for } t = k_0+1, \cdots, T,
\end{cases}
\tag{2}
$$

where $x_t = [x_{1t}, \cdots, x_{Nt}]'$, $e_t = [e_{1t}, \cdots, e_{Nt}]'$, $\Lambda_1 = [\lambda_{11}, \cdots, \lambda_{1N}]'$, and $\Lambda_2 = [\lambda_{21}, \cdots, \lambda_{2N}]'$.

For any $k = 1, \cdots, T-1$, we define

$$
X^{(1)}_k = [x_1, \cdots, x_k]', \quad X^{(2)}_k = [x_{k+1}, \cdots, x_T]', \quad
F^{(1)}_k = [f_1, \cdots, f_k]', \quad F^{(2)}_k = [f_{k+1}, \cdots, f_T]',
$$

$$
e^{(1)}_k = [e_1, \cdots, e_k]', \quad e^{(2)}_k = [e_{k+1}, \cdots, e_T]',
$$

where the subscript $k$ denotes the date at which the sample is to be split, and the superscripts (1) and (2) denote the pre- and post-$k$ data, respectively. We rewrite (2) using the following matrix representation:

$$
\begin{aligned}
\begin{bmatrix} X^{(1)}_{k_0} \\ X^{(2)}_{k_0} \end{bmatrix}
&= \begin{bmatrix} F^{(1)}_{k_0} \Lambda_1' \\ F^{(2)}_{k_0} \Lambda_2' \end{bmatrix}
 + \begin{bmatrix} e^{(1)}_{k_0} \\ e^{(2)}_{k_0} \end{bmatrix}
 = \begin{bmatrix} F^{(1)}_{k_0} (\Lambda B)' \\ F^{(2)}_{k_0} (\Lambda C)' \end{bmatrix}
 + \begin{bmatrix} e^{(1)}_{k_0} \\ e^{(2)}_{k_0} \end{bmatrix} \\
&= \begin{bmatrix} F^{(1)}_{k_0} B' \\ F^{(2)}_{k_0} C' \end{bmatrix} \Lambda'
 + \begin{bmatrix} e^{(1)}_{k_0} \\ e^{(2)}_{k_0} \end{bmatrix}
 = G \Lambda' + E,
\end{aligned}
\tag{3}
$$

where $\Lambda$ is an $N \times r$ matrix with full column rank. The pre- and post-break loadings are modeled as $\Lambda_1 = \Lambda B$ and $\Lambda_2 = \Lambda C$, respectively, where $B$ and $C$ are some $r \times r$ matrices. In this model, $r_1 = \mathrm{rank}(B) \le r$ and $r_2 = \mathrm{rank}(C) \le r$ denote the numbers of original factors before and after the break, respectively. To distinguish them from the original factors, we refer to $G$ as the pseudo-factors in (3), and $\mathrm{rank}(G) = r$. Hence, the last line of (3) provides an observationally equivalent representation with constant loadings $\Lambda$ and $r$ pseudo-factors $G$. It is well known that the break can augment the factor space; thus, $r_1 \le r$ and $r_2 \le r$. $F^{(1)}_{k_0}$ and $F^{(2)}_{k_0}$ have dimensions $k_0 \times r$ and $(T - k_0) \times r$, respectively, and $\Lambda_1$ and $\Lambda_2$ have dimension $N \times r$. Our representation in (3) allows for changes in the factor loadings and the number of factors. Below, several examples are provided to illustrate that the pseudo-factor representation in (3) is general enough to cover three types of breaks.

Type 1. Both $B$ and $C$ are singular. In this case, the number of original factors is strictly less than that of the pseudo-factors both before and after the break (i.e., $r_1 < r$ and $r_2 < r$). This means that the structural break in the factor loadings augments the dimension of the factor space.
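The observational equivalence in (3) can be checked numerically. The following is a minimal sketch, assuming illustrative dimensions and diagonal choices of $B$ and $C$ that are not taken from the paper; all variable names (`Lam`, `F1`, `F2`, and so on) are hypothetical. It verifies that the common component built from the break-specific loadings $\Lambda B$ and $\Lambda C$ coincides with $G\Lambda'$, and reports $r_1$, $r_2$, and $\mathrm{rank}(G)$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k0, r = 20, 12, 5, 3              # illustrative sizes (assumed, not from the paper)

Lam = rng.standard_normal((N, r))       # constant loadings Lambda, full column rank
B = np.diag([1.0, 1.0, 0.0])            # singular: third pseudo-factor inactive before the break
C = np.diag([0.0, 1.0, 1.0])            # singular: first pseudo-factor inactive after the break
F1 = rng.standard_normal((k0, r))       # plays the role of F^{(1)}_{k0}
F2 = rng.standard_normal((T - k0, r))   # plays the role of F^{(2)}_{k0}

# Common component written with break-specific loadings Lambda_1 = Lambda B, Lambda_2 = Lambda C
Lam1, Lam2 = Lam @ B, Lam @ C
common_break = np.vstack([F1 @ Lam1.T, F2 @ Lam2.T])

# Same common component written with pseudo-factors G and constant loadings Lambda
G = np.vstack([F1 @ B.T, F2 @ C.T])
common_pseudo = G @ Lam.T

r1 = np.linalg.matrix_rank(B)           # number of original pre-break factors
r2 = np.linalg.matrix_rank(C)           # number of original post-break factors
rank_G = np.linalg.matrix_rank(G)       # number of pseudo-factors

print(np.allclose(common_break, common_pseudo), r1, r2, rank_G)
```

With these particular choices both $B$ and $C$ are singular while $[B, C]$ has full row rank, so the setting corresponds to a Type 1 break.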
Let us consider the following example.

Example (1): Let $\mathcal{F}^{(1)}_{k_0}$ ($k_0 \times r_1$) and $\mathcal{F}^{(2)}_{k_0}$ ($(T - k_0) \times r_2$) denote the original factors before and after the break, respectively, and $\Theta_1$ and $\Theta_2$ denote the pre- and post-break loadings on these factors. Thus, this model can be represented and transformed as

$$
\begin{aligned}
\begin{bmatrix} X^{(1)}_{k_0} \\ X^{(2)}_{k_0} \end{bmatrix}
&= \begin{bmatrix} \mathcal{F}^{(1)}_{k_0} \Theta_1' \\ \mathcal{F}^{(2)}_{k_0} \Theta_2' \end{bmatrix} + e
 = \begin{bmatrix} \mathcal{F}^{(1)}_{k_0} & 0 \\ 0 & \mathcal{F}^{(2)}_{k_0} \end{bmatrix}
   \begin{bmatrix} \Theta_1' \\ \Theta_2' \end{bmatrix} + e \\
&= \begin{bmatrix} [\mathcal{F}^{(1)}_{k_0} \,\vdots\, *]\, B' \\ [* \,\vdots\, \mathcal{F}^{(2)}_{k_0}]\, C' \end{bmatrix} \Lambda' + e
 = \underbrace{\begin{bmatrix} F^{(1)}_{k_0} B' \\ F^{(2)}_{k_0} C' \end{bmatrix}}_{G} \Lambda' + e,
\end{aligned}
\tag{4}
$$

where $\Lambda = [\Theta_1, \Theta_2]$, $B = \mathrm{diag}(I_{r_1}, 0_{r_2 \times r_2})$, $C = \mathrm{diag}(0_{r_1 \times r_1}, I_{r_2})$, $F^{(1)}_{k_0} = [\mathcal{F}^{(1)}_{k_0} \,\vdots\, *]$, $F^{(2)}_{k_0} = [* \,\vdots\, \mathcal{F}^{(2)}_{k_0}]$, and the asterisk denotes some unidentified numbers such that all rows in $F^{(1)}_{k_0}$ and $F^{(2)}_{k_0}$ have the same variance (to satisfy Assumption 1 in Section 3). In the special case of $r_1 = r_2$, $\Lambda$ is of full rank $2 r_1$ (i.e., the dimension of the pseudo-factor space is twice that of the original factor space) if the shift in the loading matrix $\Theta_2 - \Theta_1$ is linearly independent of $\Theta_1$. We refer to this special case as the shift type of change, because the augmentation of the factor space is induced by a linearly independent shift in the loading matrix. Hence, Type 1 covers the shift type of change.

Type 2. Only $B$ or $C$ is singular. In this case, emerging or disappearing factors are present in the model. Let us consider the following example of disappearing factors.

Example (2): Without loss of generality, let us assume that $r_2 < r_1$ and $\Theta_2$ is equal to the first $r_2$ columns of $\Theta_1$; thus, the last $r_1 - r_2$ factors disappear after the break. Therefore, we can obtain the pseudo-factors by using the following transformation from the original factors $\mathcal{F}$:

$$
\begin{aligned}
\begin{bmatrix} X^{(1)}_{k_0} \\ X^{(2)}_{k_0} \end{bmatrix}
&= \begin{bmatrix} \mathcal{F}^{(1)}_{k_0} \Theta_1' \\ \mathcal{F}^{(2)}_{k_0} \Theta_2' \end{bmatrix} + e
 = \begin{bmatrix} \mathcal{F}^{(1)}_{k_0} \Theta_1' \\ [\mathcal{F}^{(2)}_{k_0} \,\vdots\, *]\, C' \Theta_1' \end{bmatrix} + e \\
&= \begin{bmatrix} F^{(1)}_{k_0} \\ F^{(2)}_{k_0} C' \end{bmatrix} \Theta_1' + e
 = \underbrace{\begin{bmatrix} F^{(1)}_{k_0} \\ F^{(2)}_{k_0} C' \end{bmatrix}}_{G} \Lambda' + e,
\end{aligned}
\tag{5}
$$

where $F^{(1)}_{k_0} = \mathcal{F}^{(1)}_{k_0}$, $F^{(2)}_{k_0} = [\mathcal{F}^{(2)}_{k_0} \,\vdots\, *]$, $C = \mathrm{diag}(I_{r_2}, 0_{(r_1 - r_2) \times (r_1 - r_2)})$, $\Lambda = \Theta_1$, and the asterisk is defined in a similar manner to that in (4). In this example, $B = I_r$, $r_1 = r$, and $r_2 = \mathrm{rank}(C) < r$. Symmetrically, if $B$ is singular and $C = I_r$, then $r_2 = r$ and $r_1 = \mathrm{rank}(B) < r$, which means that certain factors emerge after the break point. Type 2 changes are important in empirical analysis. Please refer to McAlinn et al. (2018) for empirical evidence regarding the varying number of factors in the U.S. macroeconomic dataset. For Types 1 and 2, we obtain the strong result that $P(\hat{k} - k_0 = 0) \to 1$ as $N, T \to \infty$.

Type 3. Both $B$ and $C$ are nonsingular. In this case, the loadings on the original factors undergo a rotational change, and the dimension of the original factors is the same as that of the pseudo-factors.

Example (3): Let us assume that $r_1 = r_2$ and $\Theta_2 = \Theta_1 C$ for a nonsingular matrix $C$. The model with the original factors $\mathcal{F}$ can be transformed into the following pseudo-factor representation:

$$
\begin{bmatrix} X^{(1)}_{k_0} \\ X^{(2)}_{k_0} \end{bmatrix}
= \begin{bmatrix} \mathcal{F}^{(1)}_{k_0} \Theta_1' \\ \mathcal{F}^{(2)}_{k_0} \Theta_2' \end{bmatrix} + e
= \begin{bmatrix} \mathcal{F}^{(1)}_{k_0} \Theta_1' \\ \mathcal{F}^{(2)}_{k_0} C' \Theta_1' \end{bmatrix} + e
= \begin{bmatrix} F^{(1)}_{k_0} \\ F^{(2)}_{k_0} C' \end{bmatrix} \Theta_1' + e
= G \Lambda' + e,
\tag{6}
$$

where $F^{(1)}_{k_0} = \mathcal{F}^{(1)}_{k_0}$, $F^{(2)}_{k_0} = \mathcal{F}^{(2)}_{k_0}$, and $\Lambda = \Theta_1$. In this example, $B = I_r$ and $r_1 = r_2 = r$, and the factor dimension remains constant. In the observationally equivalent pseudo-factor representation, the loading is time-invariant and the original post-break factors $\mathcal{F}^{(2)}_{k_0}$ are rotated by $C$. We refer to this as the rotation type of change.

The above examples show that a factor model with any of these three types of change can be unified and reformulated by the representation in (3) with pseudo-factors. This representation controls the break type by varying the settings for $B$ and $C$, and thus is convenient for our theoretical analysis.

Bai et al. (2020) rule out the rotation type of change because the break date is not identifiable by minimizing the sum of squared residuals. Baltagi et al.
(2017) allow changes in the number of factors and the rotation type of change; however, the difference between their estimator and the true break point is only stochastically bounded (i.e., their estimator is not consistent). Ma and Su's (2018) setup requires $r_1 = r_2$; thus, Type 2 is ruled out under their assumptions. Our simulation result shows that Ma and Su's estimator does not perform well under rotational changes (Type 3), whereas our QML method can handle changes of all three types discussed above. (Technically, Types 1 and 2 can be combined into one type that involves singularity, which renders our QML estimator consistent. We consider Type 2 separately to emphasize the case of emerging and disappearing factors.) We obtain the result that $\hat{k} - k_0 = O_p(1)$ if both $B$ and $C$ are of full rank (i.e., Type 3) and $\hat{k} - k_0 = o_p(1)$ if $B$ or $C$, or both, is singular (i.e., Type 1 and Type 2).

In this paper, we consider the QML estimator of the break date for model (3):

$$
\hat{k} = \arg\min_{[\tau_1 T] \le k \le [\tau_2 T]} U_{NT}(k),
\tag{7}
$$

where $[\tau_1 T]$ and $[\tau_2 T]$ denote the prior lower and upper bounds for the real break point $k_0$ with $\tau_1, \tau_2 \in (0, 1)$ and $\tau_1 \le \tau_0 \le \tau_2$. The QML objective function $U_{NT}(k)$ is equal to

$$
U_{NT}(k) = k \log(\det(\hat{\Sigma}_1)) + (T - k) \log(\det(\hat{\Sigma}_2)),
\tag{8}
$$

where $\hat{\Sigma}_1$ and $\hat{\Sigma}_2$ are defined as

$$
\hat{\Sigma}_1 = \frac{1}{k} \sum_{t=1}^{k} \hat{g}_t \hat{g}_t', \qquad
\hat{\Sigma}_2 = \frac{1}{T - k} \sum_{t=k+1}^{T} \hat{g}_t \hat{g}_t',
\tag{9}
$$

and $\hat{g}_t$ is the PCA estimator of $g_t$ (i.e., the transpose of the $t$-th row of $G$). We define $\Sigma_{G,1} = E(g_t g_t')$ for $t \le k_0$, $\Sigma_{G,2} = E(g_t g_t')$ for $t > k_0$, and $\Sigma_G = \tau_0 \Sigma_{G,1} + (1 - \tau_0) \Sigma_{G,2}$; $\Sigma_\Lambda$ denotes the covariance matrix of $\Lambda$. The PCA estimator $\hat{g}_t$ is asymptotically close to $H' g_t$ for a rotation matrix $H$, and $H \xrightarrow{p} H_0 = \Sigma_\Lambda^{1/2} \Phi V^{-1/2}$ as $(N, T) \to \infty$, where $V$ and $\Phi$ are the eigenvalue and eigenvector matrices of $\Sigma_\Lambda^{1/2} \Sigma_G \Sigma_\Lambda^{1/2}$, respectively. Evidently, the second moment of $H_0' g_t$ shares the same change point as that of $g_t$. Therefore, we proceed to estimate the pre- and post-break second moments of $g_t$ by using the estimated factors $\hat{g}_t$, and then use (7) to obtain the QML break point estimator $\hat{k}_{QML}$. Similar QML objective functions have been used for multivariate time series with observed data (e.g., Bai (2000)).
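The estimator in (7)-(9) can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the trimming values `tau1 = 0.1` and `tau2 = 0.9`, the normalization $\hat{G}'\hat{G}/T = I_r$, and the simulated disappearing-factor data are all hypothetical choices. PCA is run once on the full sample, and $U_{NT}(k)$ is then evaluated at each candidate split.

```python
import numpy as np

def pca_factors(X, r):
    """PCA estimate of the T x r pseudo-factors from a T x N panel X.
    Normalization G'G/T = I_r is an assumption; the argmin of U_NT(k) is
    unchanged by any nonsingular rescaling of the estimated factors."""
    T = X.shape[0]
    eigval, eigvec = np.linalg.eigh(X @ X.T)     # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:r]           # indices of the r largest
    return np.sqrt(T) * eigvec[:, top]

def qml_objective(G_hat, k):
    """U_NT(k) = k log det(Sigma1_hat) + (T - k) log det(Sigma2_hat), as in (8)-(9)."""
    T = G_hat.shape[0]
    S1 = G_hat[:k].T @ G_hat[:k] / k
    S2 = G_hat[k:].T @ G_hat[k:] / (T - k)
    return k * np.linalg.slogdet(S1)[1] + (T - k) * np.linalg.slogdet(S2)[1]

def qml_break_point(X, r, tau1=0.1, tau2=0.9):
    """Minimize U_NT(k) over [tau1*T] <= k <= [tau2*T], as in (7)."""
    T = X.shape[0]
    G_hat = pca_factors(X, r)                    # PCA is run only once
    candidates = range(int(tau1 * T), int(tau2 * T) + 1)
    return min(candidates, key=lambda k: qml_objective(G_hat, k))

# Illustrative data with a disappearing factor (Type 2): C = diag(1, 0)
rng = np.random.default_rng(0)
N, T, k0 = 100, 200, 100
F = rng.standard_normal((T, 2))
Lam = rng.standard_normal((N, 2))
C = np.diag([1.0, 0.0])
G = np.vstack([F[:k0], F[k0:] @ C.T])            # pseudo-factors, with B = I
X = G @ Lam.T + rng.standard_normal((T, N))

k_hat = qml_break_point(X, r=2)
print(k_hat)
```

Using `slogdet` rather than `log(det(...))` avoids numerical underflow when one of the estimated second-moment matrices is close to singular, which is exactly the situation that arises for Type 1 and Type 2 breaks.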

3. Assumptions

In this section, we state the assumptions made for validating the consistency and asymptotic distribution of the QML estimator.

Assumption 1. (i) $E\|f_t\|^4 < M < \infty$, $E(f_t f_t') = \Sigma_F$, where $\Sigma_F$ is positive definite, and $\frac{1}{k_0} \sum_{t=1}^{k_0} f_t f_t' \xrightarrow{p} \Sigma_F$, $\frac{1}{T - k_0} \sum_{t=k_0+1}^{T} f_t f_t' \xrightarrow{p} \Sigma_F$.
(ii) There exists $d > 0$ such that $\|\Delta\| \ge d > 0$, where $\Delta = B \Sigma_F B' - C \Sigma_F C'$ and $B$, $C$ are $r \times r$ matrices.

Assumption 2. $\|\lambda_{\ell i}\| \le \bar{\lambda} < \infty$ for $\ell = 1, 2$, $i = 1, \cdots, N$, and $\left\| \frac{1}{N} \Lambda' \Lambda - \Sigma_\Lambda \right\| \to 0$ for some $r \times r$ positive definite matrix $\Sigma_\Lambda$.

Assumption 3. There exists a positive constant $M < \infty$ such that
(i) $E(e_{it}) = 0$ and $E|e_{it}|^8 \le M$ for all $i = 1, \cdots, N$ and $t = 1, \cdots, T$;
(ii) $E\left(\frac{e_s' e_t}{N}\right) = E\left(N^{-1} \sum_{i=1}^{N} e_{is} e_{it}\right) = \gamma_N(s, t)$ and $\sum_{s=1}^{T} |\gamma_N(s, t)| \le M$ for every $t \le T$;
(iii) $E(e_{it} e_{jt}) = \tau_{ij,t}$ with $|\tau_{ij,t}| < \tau_{ij}$ for some $\tau_{ij}$ and for all $t = 1, \cdots, T$, and $\sum_{j=1}^{N} |\tau_{ij}| \le M$ for every $i \le N$;
(iv) $E(e_{it} e_{js}) = \tau_{ij,ts}$ and $\frac{1}{NT} \sum_{i,j,t,s=1} |\tau_{ij,ts}| \le M$;
(v) For every $(s, t)$, $E\left| N^{-1/2} \sum_{i=1}^{N} (e_{is} e_{it} - E[e_{is} e_{it}]) \right|^4 \le M$.

Assumption 4. There exists a positive constant $M < \infty$ such that

$$
E\left( \frac{1}{N} \sum_{i=1}^{N} \left\| \frac{1}{\sqrt{k_0}} \sum_{t=1}^{k_0} f_t e_{it} \right\|^2 \right) \le M, \qquad
E\left( \frac{1}{N} \sum_{i=1}^{N} \left\| \frac{1}{\sqrt{T - k_0}} \sum_{t=k_0+1}^{T} f_t e_{it} \right\|^2 \right) \le M.
$$

Assumption 5. The eigenvalues of $\Sigma_G \Sigma_\Lambda$ are distinct.

Assumption 6. Let us define $\epsilon_t = f_t f_t' - \Sigma_F$. According to the data-generating process (DGP) of factors, the Hájek-Rényi inequality applies to the processes $\{\epsilon_t, t = 1, \cdots, k_0\}$, $\{\epsilon_t, t = k_0, \cdots, 1\}$, $\{\epsilon_t, t = k_0 + 1, \cdots, T\}$, and $\{\epsilon_t, t = T, \cdots, k_0 + 1\}$.

Remark 1.

Using the Hájek-Rényi inequality on $\epsilon_t$, we can ensure that max k

Assumption 7. There exists an $M < \infty$ such that
(i) For each $s = 1, \cdots, T$,

$$
E\left( \max_{k < k_0} \frac{1}{k_0 - k} \sum_{t=k+1}^{k_0} \left| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} [e_{is} e_{it} - E(e_{is} e_{it})] \right|^2 \right) \le M, \qquad
E\left( \max_{k \ge k_0} \frac{1}{T - k} \sum_{t=k+1}^{T} \left| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} [e_{is} e_{it} - E(e_{is} e_{it})] \right|^2 \right) \le M.
$$

(ii)

$$
E\left( \max_{k < k_0} \frac{1}{k_0 - k} \sum_{t=k+1}^{k_0} \left\| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \lambda_i e_{it} \right\|^2 \right) \le M, \qquad
E\left( \max_{k \ge k_0} \frac{1}{T - k} \sum_{t=k+1}^{T} \left\| \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \lambda_i e_{it} \right\|^2 \right) \le M.
$$

Assumption 8. There exists an $M < \infty$ such that for all values of $N$ and $T$,
(i) for each $t$, E max [ τ T ] ≤ k

4. Asymptotic properties of the QML estimator

In this section, we derive the asymptotic properties of the QML estimator for various breaks. In the literature on structural breaks for a fixed-dimensional time series, conventional break point estimators, such as the least-squares (LS) estimator of Bai (1997) or the QML estimator of Qu and Perron (2007), are usually inconsistent. The estimation error of these conventional estimators is $O_p(1)$ when the break size is fixed. To reach consistency, the cross-sectional dimension of the time series must be large (e.g., Bai (2010) and Kim (2011)).

Recall that the observationally equivalent representation in (3) has time-invariant loadings and varying pseudo-factors. Hence, our problem reduces to estimating the break point in the $r$-dimensional time series $g_t$, where $r$ is fixed. Theorems 1 and 2 below show that, for rotational breaks (Type 3), the convergence rate and limiting distribution are similar to those available in the literature. However, for Type 1 and 2 breaks, Theorem 3 derives a much stronger result than that available in the literature, according to which our QML estimator is consistent even if our $g_t$ has only a fixed cross-sectional dimension $r$.

Theorem 1.

Under Assumptions 1–8, when both $B$ and $C$ are of full rank, $\hat{k} - k_0 = O_p(1)$.

This theorem implies that the difference between the QML estimator and the true change point is stochastically bounded in model (6). Although the estimation errors of the Baltagi et al. (2017) (BKW) and QML methods are both bounded, the QML estimator has much better finite sample properties. To illustrate this, we conduct a simulation in which the factor loadings undergo a rotational change (see DGP 1.B in Section 5). Table 2 presents the MAEs and RMSEs of different estimators. The simulation result shows that the QML estimator has much smaller MAEs and RMSEs than the other methods. In addition, $\hat{k}$ does not collapse to $k_0$, leading to a nondegenerate distribution. We will state the limiting distribution in Theorem 2. Nevertheless, this theorem shows that the break point can be appropriately estimated because $\hat{\tau} = \hat{k}/T$ is still consistent for $\tau_0$.

Remark 2.

Recall that $\Delta = B \Sigma_F B' - C \Sigma_F C'$; thus, $\|\Delta\|$ represents the magnitude of the break. Note that the Baltagi et al. (2017) estimator has a stochastically bounded estimation error, and is not consistent even if the magnitude of the break is large. In contrast, the QML estimator becomes consistent as $\|\Delta\|$ increases. In fact, the proof indicates that $\frac{U(k) - U(k_0)}{|k - k_0|} \to \infty$ for $k \ne k_0$ as $\|\Delta\| \to \infty$; thus, the consistency of the QML estimator can be obtained. As it is not common to consider a diverging break size in empirical applications, we do not analyze this case in the present paper.

To make an inference regarding the change point when both $B$ and $C$ are of full rank, we derive the limiting distribution of $\hat{k}$. Let us define

$$
\xi_t = H_0' g_t g_t' H_0 - \Sigma_1 \ \text{ for } t \le k_0, \qquad
\xi_t = H_0' g_t g_t' H_0 - \Sigma_2 \ \text{ for } t > k_0,
$$

where $\Sigma_1 = H_0' \Sigma_{G,1} H_0$ and $\Sigma_2 = H_0' \Sigma_{G,2} H_0$ are the pre- and post-break values of $H_0' E(g_t g_t') H_0$. The limiting distribution of $\hat{k}$ is given by the following theorem:

Theorem 2.

Under Assumptions 1–8, when both $B$ and $C$ are of full rank, $\hat{k} - k_0 \xrightarrow{d} \arg\min_{\ell} W(\ell)$, where

$$
W(\ell) =
\begin{cases}
\displaystyle \sum_{t=k_0+\ell}^{k_0-1} \mathrm{tr}\big((\Sigma_2^{-1} - \Sigma_1^{-1}) \xi_t\big) - \big(\mathrm{tr}(\Sigma_1 \Sigma_2^{-1}) - r - \log|\Sigma_1 \Sigma_2^{-1}|\big)\, \ell & \text{for } \ell = -1, -2, \cdots, \\[2ex]
0 & \text{for } \ell = 0, \\[1ex]
\displaystyle \sum_{t=k_0+1}^{k_0+\ell} \mathrm{tr}\big((\Sigma_1^{-1} - \Sigma_2^{-1}) \xi_t\big) + \big(\mathrm{tr}(\Sigma_1^{-1} \Sigma_2) - r - \log|\Sigma_1^{-1} \Sigma_2|\big)\, \ell & \text{for } \ell = 1, 2, \cdots.
\end{cases}
$$

This result shows that the limiting distribution depends on $\xi_t$. If $\xi_t$ is independent over time, then $W(\ell)$ is a two-sided random walk. If $f_t$ is stationary, then $\xi_t$ is stationary in each regime. Here, the limiting distribution of the estimated break date depends on the generating process of the unobserved factors, and thus cannot be directly used to construct a confidence interval for the true break point. Bai et al. (2020) propose a bootstrap method to construct a confidence interval for $k_0$ when the change in the factor loading matrix shrinks as $N \to \infty$. However, their bootstrap procedure is not robust to cross-sectional correlation in the error terms. In the current setup, the break magnitude $\|\Sigma_2 - \Sigma_1\|$ is fixed, and we leave the case of a shrinking break magnitude as a future topic.

Next, we establish a much stronger result than that available in the literature, which states that the QML estimator remains consistent when $B$ or $C$, or both, is singular. We make the following additional assumptions.

Assumption 9.

With probability approaching one (w.p.a.1), the following inequalities hold:

$$
0 < \underline{c} \le \min_{[\tau_1 T] \le k \le k_0} \rho_r\!\left( \frac{1}{Nk} \sum_{t=1}^{k} \Lambda' e_t e_t' \Lambda \right) \le \bar{c} < +\infty,
$$

$$
0 < \underline{c} \le \min_{k_0 \le k \le [\tau_2 T]} \rho_r\!\left( \frac{1}{N(T - k)} \sum_{t=k+1}^{T} \Lambda' e_t e_t' \Lambda \right) \le \bar{c} < +\infty
$$

as $N, T \to \infty$, where $\underline{c}$ and $\bar{c}$ are some constants.

Assumption 10.

$$
\max_{[\tau_1 T] \le k \le k_0} \left\| \frac{1}{\sqrt{Nk}} \sum_{t=1}^{k} \sum_{i=1}^{N} f_t e_{it} \lambda_i' \right\| = O_p(1), \qquad
\max_{k_0 \le k \le [\tau_2 T]} \left\| \frac{1}{\sqrt{N(T - k)}} \sum_{t=k+1}^{T} \sum_{i=1}^{N} f_t e_{it} \lambda_i' \right\| = O_p(1).
$$

Assumption 9 is useful to derive the lower bound of the smallest eigenvalue of $\hat{\Sigma}_1$ (or $\hat{\Sigma}_2$) if $B$ (or $C$) is a singular matrix. Assumption 10 is similar to Assumption F2 of Bai (2003). As the log determinant of the matrix is involved in the QML function, a natural problem is that the log determinant of a singular matrix is not finite when $B$ or $C$, or both, is singular. Note that $\sum_{t=1}^{k}$ in Assumptions 9-10 involves a positive fraction of observations over time, since the lower bound of $k$ is $\tau_1 T$, where $\tau_1 \in (0, 1)$. Although $\Sigma_1$ and $\Sigma_2$ may be singular matrices, the determinants of $\hat{\Sigma}_1 = \frac{1}{k} \sum_{t=1}^{k} \hat{g}_t \hat{g}_t'$ and $\hat{\Sigma}_2 = \frac{1}{T - k} \sum_{t=k+1}^{T} \hat{g}_t \hat{g}_t'$ are small but not equal to zero in finite samples. The following proposition develops a lower bound for the smallest eigenvalues of $\hat{\Sigma}_1$ and $\hat{\Sigma}_2$.

Proposition 1.

Under Assumptions 1–10, for $k \ge k_0$ and $k \le [\tau_2 T]$, if $C$ is singular and $\sqrt{N}/T \to 0$ as $N, T \to \infty$, then there exist constants $c_U \ge c_L > 0$ such that

$$
P\left( \min_{k \in [k_0, [\tau_2 T]]} \rho_j(\hat{\Sigma}_2(k)) \ge \frac{c_L}{N} \right) \to 1, \qquad
P\left( \max_{k \in [k_0, [\tau_2 T]]} \rho_j(\hat{\Sigma}_2(k)) \le \frac{c_U}{N} \right) \to 1,
$$

for $j = r_2 + 1, \ldots, r$.

In Proposition 1, the lower bound of the smallest eigenvalues of the estimated sample covariance matrices $\hat{\Sigma}_1$ and $\hat{\Sigma}_2$ is $c_L/N$ for a constant $c_L > 0$.

Assumption 11. (i) $[B, C]$ has full row rank.
(ii) $C^{*} B f_{k_0} \ne 0$ when $r - r_2 = 1$ and $B^{*} C f_{k_0+1} \ne 0$ when $r - r_1 = 1$, where $A^{*}$ denotes the adjoint matrix of the singular matrix $A$.
(iii) $\|B f_{k_0} - \mathrm{Proj}(B f_{k_0} \mid C)\| \ge d > 0$ when $r - r_2 \ge 2$ or $r_2 = 0$, and $\|C f_{k_0+1} - \mathrm{Proj}(C f_{k_0+1} \mid B)\| \ge d > 0$ when $r - r_1 \ge 2$ or $r_1 = 0$, where $\mathrm{Proj}(A \mid Z)$ denotes the projection of $A$ onto the columns of $Z$ and $d$ is a constant.

Since

$$
\mathrm{rank}(\Sigma_G)
= \mathrm{rank}\left( \big[\sqrt{\tau_0} B, \sqrt{1 - \tau_0}\, C\big]
\begin{bmatrix} \Sigma_F & 0 \\ 0 & \Sigma_F \end{bmatrix}
\begin{bmatrix} \sqrt{\tau_0} B' \\ \sqrt{1 - \tau_0}\, C' \end{bmatrix} \right)
= \mathrm{rank}\left( \big[\sqrt{\tau_0} B, \sqrt{1 - \tau_0}\, C\big]
\begin{bmatrix} \Sigma_F^{1/2} & 0 \\ 0 & \Sigma_F^{1/2} \end{bmatrix} \right)
= \mathrm{rank}\left( \big[\sqrt{\tau_0} B, \sqrt{1 - \tau_0}\, C\big] \right)
$$

and $0 < \tau_0 < 1$, the full row rank of $[B, C]$ ensures that $\Sigma_G$ is a positive definite matrix.

Assumption 11(i) implies that $\Sigma_G$ is positive definite, and that $B^{*} C \ne 0$ when $r - r_1 = 1$ and $C^{*} B \ne 0$ when $r - r_2 = 1$. Assumption 11(ii) excludes the possibility that $f_{k_0}$ and $f_{k_0+1}$ lie in the null spaces of $C^{*} B$ and $B^{*} C$, respectively; then a specific lower bound of $|\hat{\Sigma}_2(k)|$ with respect to $k_0 - k$ can be obtained when $k < k_0$ and $k_0 - k$ is bounded as $N, T \to \infty$. Similarly, Assumption 11(iii) excludes the possibility that $B f_{k_0}$ lies in the column space of $C$ when $r - r_2 \ge 2$ or $r_2 = 0$, and that $C f_{k_0+1}$ lies in the column space of $B$ when $r - r_1 \ge 2$ or $r_1 = 0$; then a specific lower bound of $|\hat{\Sigma}_2(k)|$ with respect to $k_0 - k$ can be obtained when $k < k_0$ and $k_0 - k$ is divergent as $N, T \to \infty$. Assumption 11 is used to establish Lemma 8, which is useful for validating the consistency result that $\mathrm{Prob}(\hat{k} - k_0 = 0) \to 1$. If $f_{k_0}$ and $f_{k_0+1}$ have continuous probability distribution functions, then Assumption 11(ii)-(iii) only excludes a zero-probability event, since $C^{*} B$ and $B^{*} C$ are not equal to 0. From another perspective, Assumption 11(ii)-(iii) allows $f_t$ to follow various data-generating processes.

Theorem 3.

Under Assumptions 1–11 and $N/T \to \kappa$ as $N, T \to \infty$ for $0 < \kappa < \infty$, when $B$ or $C$, or both, is singular, $\mathrm{Prob}(\hat{k} - k_0 = 0) \to 1$.

Theorem 3 shows that the estimated change point converges to the true change point w.p.a.1 when $B$ or $C$, or both, is singular (Types 1 and 2 in Section 2). This result is much stronger than that obtained by Baltagi et al. (2017), who show that the distance between the estimated and true break dates is bounded for Types 1–3. Note that the case in which only $B$ (or $C$) is singular corresponds to Type 2 with emerging (or disappearing) factors. Our QML estimator is consistent under this type of change, whereas Bai et al. (2020) and Ma and Su (2018) rule out this type by assumption. In empirical applications, the conditions of Theorem 3 are rather flexible and likely to hold, and the consistency of the break date estimator is expected in most economic data used for factor analysis.

Remark 3.

An important contribution of Theorem 3 is to link the consistency of the QML estimator with the singularity of the covariance matrices of the pre- or post-break factor loadings. The conditions that $B$ or $C$, or both, is singular and $N/T \to \kappa \in (0, \infty)$ are likely to hold in many economic datasets used for factor analysis. If both $B$ and $C$ are singular, the break occurs such that the number of pseudo-factors in the entire factor model is larger than that of the factors in the pre- and post-break subsamples. For example, if all factors undergo large breaks in their loadings, the number of factors tends to be doubled (see Breitung and Eickmeier (2011)). If $B$ is of full rank and $C$ is singular, some factors become useless, and thus the loading coefficients attached to these disappearing factors become zero. For example, in the momentum portfolio, some risks are not part of the firm's long-run structure because the sorting is based only on recent returns; the reward is high but disappears within less than a year. If $B$ is singular and $C$ is of full rank, some factors emerge after the break date, which increases the dimension of the post-break factor space. For example, changes in technology or policy may produce certain new factors.

Remark 4.

Theorem 3 indicates that $U_{NT}(k)$ can be minimized to consistently estimate $k_0$. The intuition for this is that $U_{NT}(k) - U_{NT}(k_0)$ is strictly positive w.p.a.1 even if $k$ deviates only slightly from the true break point $k_0$, so that $\hat{k}$ must be equal to $k_0$ to minimize $U_{NT}(k) - U_{NT}(k_0)$. For example, in Type 1, when both $B$ and $C$ are singular, for $k < k_0$ we can decompose $\hat{\Sigma}_2$ as

$$
\hat{\Sigma}_2 = \frac{1}{T - k} \sum_{t=k+1}^{k_0} \hat{g}_t \hat{g}_t' + \frac{1}{T - k} \sum_{t=k_0+1}^{T} \hat{g}_t \hat{g}_t',
$$

and the term $\frac{1}{T - k} \sum_{t=k+1}^{k_0} \hat{g}_t \hat{g}_t'$ enlarges the determinant of $\hat{\Sigma}_2$. By symmetry, we obtain a similar result for $k > k_0$. Thus, $U_{NT}(k) - U_{NT}(k_0) > 0$ w.p.a.1 as $N, T \to \infty$ if $k \ne k_0$.

Remark 5. With the QML estimator, we do not need to know the numbers of original factors $r_1$ and $r_2$ before and after the break point, but only the number of pseudo-factors in the entire sample. Bai et al. (2020) and Ma and Su (2018) require knowledge of the number of original factors, which is much more difficult to estimate due to the augmented factor space resulting from the break. In practice, the number of pseudo-factors is much easier to estimate by using one of a number of estimators, such as the information criteria developed by Bai and Ng (2002).
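As an illustration of Remark 5, the number of pseudo-factors can be estimated from the full sample while ignoring the break. The sketch below uses the $IC_{p2}$ criterion of Bai and Ng (2002), $IC_{p2}(k) = \ln V(k) + k \frac{N+T}{NT} \ln(\min\{N, T\})$, where $V(k)$ is the mean squared residual after extracting $k$ principal components. The function name, the choice `kmax = 8`, the absence of demeaning or standardization, and the simulated data are assumptions made for this example only.

```python
import numpy as np

def bai_ng_icp2(X, kmax=8):
    """Estimate the number of factors in a T x N panel X by minimizing IC_p2.
    V(k) equals the sum of the eigenvalues of XX'/(NT) beyond the k largest."""
    T, N = X.shape
    eigval = np.linalg.eigvalsh(X @ X.T / (N * T))[::-1]   # descending order
    total = eigval.sum()
    penalty = (N + T) / (N * T) * np.log(min(N, T))
    ic = []
    for k in range(kmax + 1):
        V = total - eigval[:k].sum()                       # mean squared residual with k factors
        ic.append(np.log(V) + k * penalty)
    return int(np.argmin(ic))

# Illustrative panel with one disappearing factor: two pseudo-factors in total
rng = np.random.default_rng(0)
N, T, k0 = 100, 200, 100
F = rng.standard_normal((T, 2))
Lam = rng.standard_normal((N, 2))
G = np.vstack([F[:k0], F[k0:] @ np.diag([1.0, 0.0])])      # second factor vanishes after k0
X = G @ Lam.T + rng.standard_normal((T, N))

r_hat = bai_ng_icp2(X)
print(r_hat)
```

The resulting estimate can then be passed as the number of pseudo-factors when computing the QML break point estimator in (7).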

5. Simulation

In this section, we consider DGPs corresponding to Types 1–3 to evaluate the finite sample performance of the QML estimator. We compare the QML estimator with three other estimators: $\hat k_{BKW}$ is the estimator proposed by Baltagi, Kao, and Wang (2017, BKW hereafter); $\hat k_{BHS}$ is the estimator proposed by Bai, Han, and Shi (2020, BHS hereafter); $\hat k_{MS}$ is the estimator proposed by Ma and Su (2018, MS hereafter); and $\hat k_{QML}$ is the QML estimator. Barigozzi et al. (2018) develop a change point estimator using wavelet transformation, which exhibits similar performance to that of the estimator proposed by Ma and Su (2018). Hence, the comparison with the estimator of Barigozzi et al. (2018) is not reported here, but the results are available upon request. The DGP roughly follows BKW, which can be used to examine various elements that may affect the finite sample performance of the estimators, and we use this DGP for model (3). We calculate the root mean square error (RMSE) and mean absolute error (MAE) of the change point estimators $\hat k_{BKW}$, $\hat k_{BHS}$, and $\hat k_{QML}$, and each experiment is repeated 1000 times, where
$$\text{RMSE} = \sqrt{\frac{1}{1000}\sum_{s=1}^{1000}(\hat k_s - k_0)^2}, \qquad \text{MAE} = \frac{1}{1000}\sum_{s=1}^{1000}|\hat k_s - k_0|.$$
When $T$ is small, Ma and Su's (2018) method may detect no break or multiple breaks; in such cases, the definition of the estimation error for a single break point is not straightforward. For comparison, we compute the RMSE and MAE of the MS estimator using only the replications in which it detects a single break. As the computation of $\hat k_{BHS}$ and $\hat k_{MS}$ requires the number of original factors and that of $\hat k_{BKW}$ and $\hat k_{QML}$ requires the number of pseudo-factors, we set $\hat r$ equal to the true number of original factors for $\hat k_{BHS}$ and $\hat k_{MS}$, and equal to the true number of pseudo-factors for $\hat k_{QML}$ and $\hat k_{BKW}$.

We generate factors and idiosyncratic errors using a DGP similar to that of BKW. Each factor is generated by the following AR(1) process:
$$f_{tp} = \rho f_{t-1,p} + u_{t,p}, \quad t = 2, \cdots, T; \ p = 1, \cdots, r,$$
where $u_t = (u_{t,1}, \cdots, u_{t,r})'$ is i.i.d. $N(0, I_r)$ for $t = 2, \cdots, T$ and $f_1 = (f_{1,1}, \cdots, f_{1,r})'$ is $N(0, \frac{1}{1-\rho^2} I_r)$. The scalar $\rho$ captures the serial correlation of factors, and the idiosyncratic errors are generated by
$$e_{i,t} = \alpha e_{i,t-1} + v_{i,t}, \quad i = 1, \cdots, N; \ t = 2, \cdots, T,$$
where $v_t = (v_{1,t}, \cdots, v_{N,t})'$ is i.i.d. $N(0, \Omega)$ for $t = 2, \cdots, T$ and $e_1 = (e_{1,1}, \cdots, e_{N,1})'$ is $N(0, \frac{1}{1-\alpha^2}\Omega)$. The scalar $\alpha$ captures the serial correlation of the idiosyncratic errors, and $\Omega$ is generated as $\Omega_{ij} = \beta^{|i-j|}$ so that $\beta$ captures the degree of cross-sectional dependence of the idiosyncratic errors. In addition, $u_t$ and $v_t$ are mutually independent for all values of $t$. We set $r = 3$ and $k_0 = T/2$. We consider the following DGPs for factor loadings and investigate the performance of the QML estimator for the three types of breaks discussed in Section 2.

DGP 1.A

We first consider the case in which $C$ is singular, and set $C = [1, 0, 0;\ 0, 1, 0;\ 0, 0, 0]$. In the pre-break regime, $\lambda_{i,1}$ are i.i.d. $N(0, \frac{1}{r} I_r)$ across $i$. In the post-break regime, $\Lambda_2 = (\lambda_{1,2}, \cdots, \lambda_{N,2})' = \Lambda_1 C$. This case corresponds to a Type 2 change with a disappearing factor. The number of pseudo-factors equals the number of original factors, which is 3, and the numbers of pre- and post-break factors are 3 and rank$(C) = 2$, respectively. Table 1 lists the RMSEs and MAEs of three estimators for different values of $(\rho, \alpha, \beta)$. In all cases, $\hat k_{QML}$ has much smaller MAEs and RMSEs than $\hat k_{BKW}$ and $\hat k_{BHS}$. Moreover, the MAEs and RMSEs of $\hat k_{QML}$ tend to decrease as $N$ and $T$ increase, which is in line with the consistency of $\hat k_{QML}$ established in Theorem 3. In addition, the RMSEs and MAEs of $\hat k_{BKW}$ do not converge to zero as $N$ and $T$ increase, which is in line with $\hat k_{BKW}$ having a stochastically bounded estimation error. $\hat k_{BHS}$ does not appear to be consistent when a factor disappears after the break. Moreover, a larger AR(1) coefficient $\rho$ tends to worsen the performance of $\hat k_{BKW}$, but does not have much impact on our QML estimator.
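The data-generating process above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function names and the `seed` argument are hypothetical, the loading variance $\frac{1}{r}I_r$ and the stationary initial variances are taken as described above, and the break is placed at $k_0 = T/2$ (integer division).

```python
import numpy as np


def simulate_factors(T, r, rho, rng):
    """AR(1) factors with N(0, I_r) innovations and a stationary start."""
    f = np.empty((T, r))
    f[0] = rng.normal(scale=np.sqrt(1.0 / (1.0 - rho**2)), size=r)
    for t in range(1, T):
        f[t] = rho * f[t - 1] + rng.normal(size=r)
    return f


def simulate_errors(T, N, alpha, beta, rng):
    """AR(1) errors whose innovations have covariance Omega_ij = beta^|i-j|."""
    idx = np.arange(N)
    omega = float(beta) ** np.abs(idx[:, None] - idx[None, :])
    chol = np.linalg.cholesky(omega)
    e = np.empty((T, N))
    e[0] = chol @ rng.normal(size=N) / np.sqrt(1.0 - alpha**2)
    for t in range(1, T):
        e[t] = alpha * e[t - 1] + chol @ rng.normal(size=N)
    return e


def simulate_panel(N, T, rho, alpha, beta, C, seed=0):
    """Return a T x N panel with loadings Lambda_1 before k0 and Lambda_1 C after."""
    rng = np.random.default_rng(seed)
    r = C.shape[0]
    k0 = T // 2
    f = simulate_factors(T, r, rho, rng)
    e = simulate_errors(T, N, alpha, beta, rng)
    lam1 = rng.normal(scale=np.sqrt(1.0 / r), size=(N, r))   # pre-break loadings
    lam2 = lam1 @ C                                          # post-break loadings
    X = np.empty((T, N))
    X[:k0] = f[:k0] @ lam1.T + e[:k0]
    X[k0:] = f[k0:] @ lam2.T + e[k0:]
    return X, k0
```

For a Type 2 break with one disappearing factor, one could pass `C = np.diag([1.0, 1.0, 0.0])`; the specific matrix is an assumption of this example.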

DGP 1.B

We next consider the case in which $C$ is of full rank. We set $C$ as a lower triangular matrix. The diagonal elements are equal to 0.5, 1.5, and 2.5, and the elements below the diagonal are i.i.d. draws from a standard normal distribution. Under this DGP, the number of pseudo-factors equals the number of original factors. Table 2 reports the performance of three estimators for different values of $(\rho, \alpha, \beta)$. In all cases, $\hat k_{BKW}$ and $\hat k_{QML}$ appear to have stochastically bounded estimation errors, in line with Theorem 1 of BKW and Theorem 1 of this paper. Both $\hat k_{QML}$ and $\hat k_{BKW}$ are inconsistent under this DGP; however, under all settings, our QML estimator tends to have much smaller RMSEs and MAEs than the estimator of BKW. The MAEs and RMSEs of $\hat k_{BHS}$ appear to increase with the sample size; thus, the BHS method cannot handle this case.
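A short sketch of how such a rotation matrix might be drawn is given below; the function name `rotation_type_C` is hypothetical. Because $C$ is triangular, its determinant is the product of the diagonal entries, $0.5 \times 1.5 \times 2.5 = 1.875$, so $C$ has full rank whatever the off-diagonal draws are.

```python
import numpy as np


def rotation_type_C(rng):
    """Lower triangular C with diagonal (0.5, 1.5, 2.5) and N(0, 1) entries below it."""
    C = np.diag([0.5, 1.5, 2.5])
    rows, cols = np.tril_indices(3, k=-1)     # strictly lower triangular positions
    C[rows, cols] = rng.normal(size=rows.size)
    return C
```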

DGP 1.C

In this case, we set $C$ = [1, , 0; 2, , 0; 3, , $m$] and $m \in \{1, 0.8, 0.5, 0.1, 0\}$. As $m$ decreases to zero, the matrix $C$ changes from full rank to singular. We still consider serial correlation in factors and serial correlation and cross-sectional dependence in idiosyncratic errors simultaneously, with $N = 100$, $T = 100$. Table 3 shows that the MAEs and RMSEs of $\hat k_{QML}$ decrease as $m$ decreases, which is in line with our findings in Theorems 1 and 3. In addition, the RMSEs and MAEs of $\hat k_{BKW}$ and $\hat k_{BHS}$ are much larger than those of $\hat k_{QML}$, and do not tend toward zero as $m$ decreases. For each value of $m$, the experiment is repeated 10000 times to more accurately estimate and compare the RMSEs (MAEs) of our QML estimator across different values of $m$.

DGP 1.D

This DGP considers a Type 1 break. In the first regime, the last elements of $\lambda_{i,1}$ are zero for all $i$, and the first two elements of $\lambda_{i,1}$ are both i.i.d. $N(0, I_r)$. In the second regime, $\lambda_{i,2}$ is i.i.d. $N(0, I_r)$ across $i$. As $\lambda_{i,1}$ and $\lambda_{i,2}$ are independent, the numbers of factors in the two regimes are $r_1 = 2$ and $r_2 = 3$, respectively, and the number of pseudo-factors is 5. Because the numbers of pre- and post-break factors are both smaller than the number of pseudo-factors, both $\Sigma_1$ and $\Sigma_2$ are singular matrices. Table 4 reports the MAEs and RMSEs of $\hat k_{QML}$, $\hat k_{BHS}$, and $\hat k_{BKW}$ under this DGP. Both $\hat k_{BHS}$ and our $\hat k_{QML}$ perform well: their MAEs (RMSEs) are less than 0.05 (0.25) for all combinations of $N$, $T$, $\rho$, $\alpha$, and $\beta$. Although $\hat k_{BHS}$ is consistent under this DGP, our QML estimator still has smaller RMSEs than $\hat k_{BHS}$ in most cases reported in Table 4. In addition, $\hat k_{BKW}$ performs better under this DGP than under DGPs 1.A–1.C. However, its estimation error is much larger than that of our QML estimator. This is not surprising because $\hat k_{BKW}$ is not consistent. Finally, a larger AR(1) coefficient $\rho$ tends to yield a larger bias for $\hat k_{BKW}$, but does not have much effect on the performance of $\hat k_{BHS}$ and $\hat k_{QML}$.

In summary, Tables 1 and 2 show that the QML estimator performs much better than $\hat k_{BHS}$ under Type 2 and 3 breaks, which are ruled out under the assumptions of Bai et al. (2020). Table 4 shows that the QML estimator tends to slightly outperform $\hat k_{BHS}$, even though the latter is known to be consistent under Type 1 breaks. The BHS method performs very well for Type 1 changes with smaller breaks. The QML method loses power when the breaks are small, as in the settings considered by BHS, because the dimension of $G$ (determined by the information criteria of Bai and Ng (2002)) is not augmented when the breaks are small; that is, the singularity does not show up in the covariance matrices if the breaks are small enough.
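The summary statistics reported in the tables (MAE, RMSE, and the probability of correct estimation) can be computed from a vector of replicated estimates as sketched below. The function name `summarize_estimates` and the dictionary keys are hypothetical choices for this example.

```python
import numpy as np


def summarize_estimates(k_hats, k0):
    """MAE, RMSE, and share of exact hits for replicated break-point estimates."""
    err = np.asarray(k_hats, dtype=float) - k0
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err**2))),
        "P_correct": float(np.mean(err == 0)),
    }
```

For instance, the estimates 50, 51, 48, 50 with $k_0 = 50$ have errors 0, 1, -2, 0, giving MAE 0.75, RMSE $\sqrt{1.25}$, and a probability of correct estimation of 0.5.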
Table 1: Simulated mean absolute errors (MAEs) and root mean squared errors (RMSEs) of $\hat k_{BKW}$, $\hat k_{BHS}$, and $\hat k_{QML}$ under DGP 1.A.

N, T       BKW MAE   BKW RMSE   BHS MAE   BHS RMSE   QML MAE   QML RMSE
ρ = 0, α = 0, β = 0
100,100     6.3130     8.9546    5.4600     7.7325    1.6070     2.9293
100,200     7.0230    11.9053    7.9580    12.4801    1.2990     2.3206
200,200     5.6730     9.9774    6.7150    10.8610    0.7960     1.5218
200,500     4.6940     8.5732   10.0960    17.9778    0.7340     1.3799
500,500     4.4580     8.5789    8.6770    15.6509    0.3890     0.8597
ρ = 0.…, α = 0, β = 0
100,100     9.7200    12.0612    4.5670     6.9270    1.3570     2.7592
100,200    14.3410    19.5941    7.0110    11.1559    1.0470     2.2070
200,200    13.6260    19.1151    6.7760    10.9099    0.5840     1.2394
200,500    15.4880    27.5716   10.5450    18.7350    0.5190     1.1406
500,500    16.9890    29.5463    8.2030    15.1581    0.3210     0.7944
ρ = 0, α = 0.…, β = 0
100,100     6.5060     9.1533    6.1520     8.6248    2.3740     4.0635
100,200     7.5490    12.4416    8.7150    13.4473    1.6920     3.1464
200,200     6.2890    10.8337    8.4910    13.2894    1.0230     1.9409
200,500     5.1220    10.1068   11.3960    19.4945    0.8110     1.5156
500,500     4.7580     9.5055   10.3660    18.7453    0.4570     0.9407
ρ = 0, α = 0, β = 0.… and ρ = 0.…, α = 0.…, β = 0.…: [rows missing]

Table 2: Simulated mean absolute errors (MAEs) and root mean squared errors (RMSEs) of $\hat k_{BKW}$, $\hat k_{BHS}$, and $\hat k_{QML}$ under DGP 1.B.

N, T       BKW MAE   BKW RMSE   BHS MAE   BHS RMSE   QML MAE   QML RMSE
ρ = 0, α = 0, β = 0
100,100     4.1610     6.6934    8.7430    11.0347    1.2180     2.3259
100,200     4.4450     8.4477   18.5660    22.9913    0.9960     1.8799
200,200     4.9160     8.9420   19.4440    23.6923    0.9060     1.7082
200,500     4.4530     8.8368   49.3330    59.4865    0.9130     1.7085
500,500     3.9420     7.2061   51.9270    61.5507    0.8370     1.5959
ρ = 0.…, α = 0, β = 0
100,100     6.4570     9.4427   10.1710    12.3371    1.9460     3.7691
100,200     9.1750    14.8115   21.3380    25.1834    1.8480     3.6362
200,200     9.6310    15.0080   21.5560    25.2723    1.7850     3.5901
200,500    11.4150    21.3302   51.9850    61.9028    1.6750     3.4218
500,500     9.5430    18.4598   53.6060    62.7128    1.6490     3.5501
ρ = 0, α = 0.…, β = 0
100,100     3.9840     6.4778    7.9990    10.5485    1.0910     2.1824
100,200     4.6820     8.6151   17.6010    22.5002    1.0360     1.9432
200,200     4.6350     8.4454   21.9190    26.0996    0.8770     1.7306
200,500     4.2690     8.2870   50.1790    61.5307    0.8600     1.6474
500,500     4.2040     8.3094   54.8050    64.8615    0.8040     1.5492
ρ = 0, α = 0, β = 0.… and ρ = 0.…, α = 0.…, β = 0.…: [rows missing]

Table 3: Simulated mean absolute errors (MAEs) and root mean squared errors (RMSEs) of $\hat k_{BKW}$, $\hat k_{BHS}$, and $\hat k_{QML}$ under DGP 1.C with N = 100, T = 100 among 10000 replications.

m          BKW MAE   BKW RMSE   BHS MAE   BHS RMSE   QML MAE   QML RMSE
ρ = 0, α = 0, β = 0
1           3.9228     6.4437    7.2579     9.7141    0.6562     1.2903
0.8         3.9425     6.4624    6.6330     9.1145    0.6348     1.2559
0.5         3.7847     6.2319    5.4950     7.9789    0.5420     1.0814
0.1         3.8469     6.2895    4.6050     6.9212    0.5093     1.0568
0           3.8310     6.2414    4.4915     6.8352    0.4969     1.0315
ρ = 0.…, α = 0, β = 0
1           6.0404     9.0280    9.3131    11.5733    0.9478     2.0680
0.8         6.0063     9.0017    8.5168    10.9192    0.8547     1.8960
0.5         5.9803     8.9390    6.5127     9.0641    0.6925     1.5752
0.1         5.9300     8.8833    4.6894     7.0610    0.5178     1.2335
0           6.0197     8.9771    4.5440     6.9049    0.5070     1.2057
ρ = 0, α = 0.…, β = 0
1           3.8349     6.2423    7.1824     9.7359    0.6727     1.3234
0.8         3.8234     6.2331    6.6551     9.2338    0.6535     1.2963
0.5         3.8345     6.3110    5.8040     8.3371    0.6152     1.2362
0.1         3.9127     6.4083    5.0645     7.4846    0.5895     1.1644
0           3.9188     6.4124    4.9815     7.3974    0.5813     1.1551
ρ = 0, α = 0, β = 0.3
1           3.8250     6.3150    6.2535     8.7224    0.6622     1.3039
0.8         3.8135     6.2932    5.6808     8.1438    0.6259     1.2379
0.5         3.8253     6.3061    4.6189     6.9328    0.5619     1.1171
0.1         3.9120     6.4147    3.9299     6.0820    0.5424     1.0949
0           3.8176     6.2881    3.8963     6.0564    0.5199     1.0497
ρ = 0.…, α = 0.…, β = 0.3
1           6.0745     9.0347    8.0648    10.5304    1.0515     2.2669
0.8         6.0041     8.9542    7.3126     9.8433    0.9338     2.0173
0.5         6.0519     9.0124    5.8471     8.4490    0.7798     1.7537
0.1         6.0120     8.9694    4.6376     7.1447    0.6100     1.4401
0           6.0379     8.9861    4.5336     7.0337    0.5850     1.3509

Table 4: Simulated mean absolute errors (MAEs) and root mean squared errors (RMSEs) of $\hat k_{BKW}$, $\hat k_{BHS}$, and $\hat k_{QML}$ under DGP 1.D.

N, T       BKW MAE   BKW RMSE   BHS MAE   BHS RMSE   QML MAE   QML RMSE
ρ = 0, α = 0, β = 0
100,100     0.4330     1.3494    0.0370     0.1975    0.0260     0.1673
100,200     0.3380     1.0900    0.0300     0.1732    0.0240     0.1549
200,200     0.2780     0.7668    0.0180     0.1342    0.0130     0.1140
200,500     0.2850     0.8155    0.0070     0.0837    0.0100     0.1000
ρ = 0.…, α = 0, β = 0
100,100     1.8760     4.8750    0.0120     0.1095    0.0110     0.1049
100,200     1.1140     4.0007    0.0150     0.1225    0.0110     0.1140
200,200     0.8700     3.5000    0.0050     0.0707    0.0020     0.0447
200,500     0.4070     1.3435    0.0030     0.0548    0.0010     0.0316
ρ = 0, α = 0.…, β = 0
100,100     0.4400     1.4519    0.0450     0.2302    0.0410     0.2258
100,200     0.3590     1.3802    0.0440     0.2145    0.0340     0.1897
200,200     0.3080     0.8438    0.0150     0.1225    0.0140     0.1265
200,500     0.2150     0.6656    0.0160     0.1265    0.0120     0.1095
ρ = 0, α = 0, β = 0.… and ρ = 0.…, α = 0.…, β = 0.…: [rows missing]

Tables 5–8 present the probabilities of the correct estimation of the break date. The results are consistent with those displayed in Tables 1–4: the QML estimator $\hat k_{QML}$ detects the true break date with a higher probability than the other estimators in nearly all cases, regardless of the values of $(\rho, \alpha, \beta)$. The MS method sometimes detects more than one or no break; hence, we only compute its probability of correctly estimating $k_0$ under the condition that it detects a single break. The probabilities of a correct estimation of the QML method increase with the sample sizes $N$ and $T$ in Tables 5, 6, and 8.

Table 7 shows that the probabilities of correct estimation of the QML estimators increase as $m$ decreases. A smaller $m$ means that $C$ is closer to a singular matrix. Table 7 is consistent with Table 3, and is in line with Theorems 1 and 3. To explore in more detail the effect of changes in $m$ on the QML estimator, we vary the value of $m$ using finer grids and find a similar pattern to that shown in Table 7. The results are reported in the supplementary appendix.

Figures 1 and 2 show the frequency of the estimated change points under DGP 1.A for $N = 100$, $T = 100$ and $N = 500$, $T = 500$ for 1000 replications.
According to these figures, the QML estimators exhibit the highest frequency around the true break under different settings. When we increase the $(N, T)$ value from 100 to 500, the frequency at the true break point increases and the simulated distribution becomes tighter. This indicates that the QML estimators are highly likely to identify the true break point, which is consistent with our theory. However, the other three methods are found to have much larger variation and substantially lower probabilities of correctly estimating the break point. Thus, the QML estimators are advantageous in this case. Moreover, the simulation result indicates that for a sample size exceeding $N = 5000$, $T = 1000$, the probability that the QML estimator correctly estimates the break point exceeds 90%.

Recall that BKW and QML only have $O_p(1)$ estimation errors under DGP 1.B. However, Table 6 shows that in all cases, the probabilities of correct estimation by the QML estimator are much higher than those of correct estimation by the BKW estimator. Apparently, the BHS and MS methods cannot accurately estimate the true break point in this case. Figures 3 and 4 show the distributions of the estimated change points under DGP 1.B for $N = 100$, $T = 100$ and $N = 500$, $T = 500$, indicating that BHS and MS cannot handle rotational changes. Although the estimation errors of BKW and QML are bounded under all settings, the QML estimators have a much tighter distribution around the true break point.

Table 5: Probability of correct estimation under DGP 1.A.

N, T         BKW      BHS       MS      QML
ρ = 0, α = 0, β = 0
100,100    0.1530   0.1440   0.1626   0.4220
100,200    0.1920   0.1510   0.1863   0.4370
200,200    0.2340   0.1780   0.1307   0.5680
200,500    0.2540   0.2030   0.2020   0.5780
500,500    0.2990   0.2100   0.2123   0.7290
ρ = 0.…, α = 0, β = 0
100,100    0.1050   0.2050   0.2329   0.5290
100,200    0.1250   0.1850   0.1779   0.5510
200,200    0.1390   0.1920   0.1898   0.6660
200,500    0.1750   0.1890   0.2031   0.6940
500,500    0.2100   0.2420   0.2306   0.7810
ρ = 0, α = 0.…, β = 0
100,100    0.1790   0.1300   0.1072   0.3280
100,200    0.1850   0.1380   0.1897   0.4090
200,200    0.2260   0.1650   0.1931   0.5320
200,500    0.2530   0.1730   0.1845   0.5650
500,500    0.2750   0.1920   0.1964   0.6880
ρ = 0, α = 0, β = 0.… and ρ = 0.…, α = 0.…, β = 0.…: [rows missing]

Table 6: Probability of correct estimation under DGP 1.B.

N, T         BKW      BHS       MS      QML
ρ = 0, α = 0, β = 0
100,100    0.2760   0.0690   0.0769   0.4790
100,200    0.2920   0.0540   0.0362   0.5180
200,200    0.2720   0.0320   0.0655   0.5270
200,500    0.3110   0.0140   0.0091   0.5340
500,500    0.2960   0.0100   0.0123   0.5580
ρ = 0.…, α = 0, β = 0
100,100    0.2710   0.0640   0.0909   0.4540
100,200    0.2500   0.0270   0.0398   0.4530
200,200    0.2180   0.0160   0.0200   0.4790
200,500    0.2370   0.0120   0.0144   0.4970
500,500    0.2450   0.0080   0.0080   0.5090
ρ = 0, α = 0.…, β = 0
100,100    0.3050   0.1060   0.1163   0.5180
100,200    0.2930   0.0740   0.0989   0.5020
200,200    0.2890   0.0390   0.0496   0.5540
200,500    0.3000   0.0230   0.0328   0.5630
500,500    0.3090   0.0090   0.0125   0.5780
ρ = 0, α = 0, β = 0.… and ρ = 0.…, α = 0.…, β = 0.…: [rows missing]

Table 7: Probability of correct estimation under DGP 1.C with N = 100, T = 100.

m            BKW      BHS       MS      QML
ρ = 0, α = 0, β = 0
1          0.3044   0.1075   0.1188   0.6079
0.8        0.3033   0.1252   0.1389   0.6153
0.5        0.3009   0.1736   0.1904   0.6467
0.1        0.2976   0.1998   0.2014   0.6680
0          0.2977   0.2031   0.2192   0.6705
ρ = 0.…, α = 0, β = 0
1          0.2841   0.0781   0.1040   0.6051
0.8        0.2871   0.0975   0.1135   0.6254
0.5        0.2896   0.1532   0.1620   0.6641
0.1        0.2876   0.2073   0.2369   0.7131
0          0.2848   0.2194   0.2297   0.7154
ρ = 0, α = 0.…, β = 0
1          0.2973   0.1226   0.1442   0.6063
0.8        0.2981   0.1416   0.1648   0.6134
0.5        0.2988   0.1641   0.1730   0.6219
0.1        0.2993   0.1828   0.1954   0.6316
0          0.2988   0.1860   0.1995   0.6342
ρ = 0, α = 0, β = 0.3
1          0.3018   0.1344   0.1399   0.6075
0.8        0.3016   0.1549   0.1679   0.6164
0.5        0.3044   0.1927   0.2078   0.6383
0.1        0.3009   0.2141   0.2093   0.6461
0          0.3036   0.2211   0.2314   0.6566
ρ = 0.…, α = 0.…, β = 0.3
1          0.2821   0.1519   0.1739   0.5921
0.8        0.2844   0.1710   0.1964   0.6082
0.5        0.2843   0.2327   0.2402   0.6496
0.1        0.2850   0.2889   0.2911   0.6898
0          0.2868   0.2951   0.2966   0.6951

Table 8: Probability of correct estimation under DGP 1.D.

N, T         BKW      BHS       MS      QML
ρ = 0, α = 0, β = 0
100,100    0.7960   0.9640   0.9553   0.9750
100,200    0.8200   0.9700   0.9700   0.9760
200,200    0.8160   0.9820   0.9841   0.9870
200,500    0.8260   0.9930   0.9930   0.9900
ρ = 0.…, α = 0, β = 0
100,100    0.7000   0.9880   0.9864   0.9890
100,200    0.7540   0.9850   0.9859   0.9900
200,200    0.7950   0.9950   0.9949   0.9980
200,500    0.8220   0.9970   0.9970   0.9990
ρ = 0, α = 0.…, β = 0
100,100    0.8020   0.9580   0.9563   0.9630
100,200    0.8170   0.9570   0.9589   0.9670
200,200    0.8140   0.9850   0.9842   0.9870
200,500    0.8510   0.9840   0.9840   0.9880
ρ = 0, α = 0, β = 0.… and ρ = 0.…, α = 0.…, β = 0.…: [rows missing]

Figure 1: Plots of the frequency of the estimated break points among 1000 replications for DGP 1.A and N = 100, T = 100.
Figure 2: Plots of the frequency of the estimated break points among 1000 replications for DGP 1.A and N = 500, T = 500.
Figure 3: Plots of the frequency of the estimated break points among 1000 replications for DGP 1.B and N = 100, T = 100.
Figure 4: Plots of the frequency of the estimated break points among 1000 replications for DGP 1.B and N = 500, T = 500.
In each figure, panels (a)–(d) compare BKW, BHS, MS, and QML for (ρ, α, β) = (0.…, 0, 0), (0, 0.…, 0), (0, 0, 0.…), and (0.…, 0.…, 0.…), respectively.

6. Empirical Application

In the first empirical application, we apply our proposed method to a U.S. macroeconomic dataset (Stock and Watson (2012)) to detect possible structural breaks in the underlying factor model. We use the dataset adopted by Cheng et al. (2016), which comprises monthly observations of 102 U.S. macroeconomic variables. The sample begins after the Great Moderation and ranges from 1985:01 to 2013:01 ($T = 337$). Following Bai et al. (2020), we focus on the subsample period between 2001:12 and 2013:01 ($T = 134$, $N = 102$) because the complete data may have multiple breaks.

Cheng et al. (2016) find that 2007:12 is a single-break date, and that the pre-break and post-break subsamples have one factor and two or three factors, respectively. Following Cheng et al. (2016), Bai et al. (2020) also set the number of factors equal to one and two for the pre- and post-break subsamples, respectively. Then, they implement the LS estimation and obtain the estimated break point $\hat k$ = 2008:12. To implement our QML method, we first use Bai and Ng's information criterion IC1 and determine three pseudo-factors in the complete sample. Based on this result, we compute our QML estimator and obtain 2007:07 as the estimated break point, which we use to split the sample into pre- and post-break subsamples. IC1 of Bai and Ng (2002) detects two pre-break and three post-break factors. From the numbers of pre- and post-break factors and that of pseudo-factors, one factor appears to emerge over time, and the QML estimator is consistent based on Theorem 3.

The second empirical application uses the weekly rates of return of the Nasdaq 100 Index components from April 18, 2019, to October 1, 2020. As all companies have data starting from April 18, 2019, we choose that as the start date. Traditionally, the index was limited to 100 common-stock issues, with only one issue allowed per issuer. Now, the index is limited to 100 issuers, some of which may have multiple issues as index components. The current index has 103 components, representing 100 issuers, four of which are from China: Baidu, JD.com, Ctrip, and NetEase. Thus, the sample size is $T = 76$ and $N = 103$. Because IC1 and IC2 of Bai and Ng (2002) and the methods proposed by Onatski (2010), Ahn and Horenstein (2013), and Fan et al. (2019) yield different numbers of pseudo-factors for all samples, we use different numbers of factors $r$ = 2, […] 000 to 1.72 million stocks in a one-day transaction on February 13, 2020. A week after this, that is, the week of February 20, 2020, the stock market began to fall sharply, and two weeks later, trading in U.S. stocks was halted for the first time. Thus, the factor loading matrix appears to have changed in the early days of the epidemic.
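The pseudo-factor selection step used in the first application can be illustrated with a short sketch. It assumes that "IC1" refers to the criterion $IC_{p1}(k) = \log V(k) + k\,\frac{N+T}{NT}\log\frac{NT}{N+T}$ of Bai and Ng (2002), where $V(k)$ is the average squared residual after removing $k$ principal components; the function name `bai_ng_ic1` and the argument `kmax` are hypothetical, and any standardization of the data is left to the user.

```python
import numpy as np


def bai_ng_ic1(X, kmax):
    """Select the number of factors in a T x N panel by IC_p1 (assumed form).

    V(k) is obtained from the eigenvalues of X X': the residual sum of squares
    after k principal components equals the sum of the remaining eigenvalues.
    kmax must be smaller than min(N, T) so that V(k) stays positive.
    """
    T, N = X.shape
    eigvals = np.linalg.eigvalsh(X @ X.T)[::-1]          # descending order
    total = eigvals.sum()
    penalty = (N + T) / (N * T) * np.log(N * T / (N + T))
    ks = np.arange(kmax + 1)
    V = np.array([(total - eigvals[:k].sum()) / (N * T) for k in ks])
    ic = np.log(V) + ks * penalty
    return int(ks[np.argmin(ic)]), ic
```

The selected number would then be passed to the PCA step as the number of pseudo-factors for the full sample.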

7. Conclusions

We study the QML method for estimating the break point in high-dimensional factor models with a single large structural change. We consider three types of change and develop an asymptotic theory for the QML estimator. We show that the QML estimator is consistent when the covariance matrices of the pre- or post-break factor loadings, or both, are singular. In addition, the estimation error of the QML estimator is $O_p(1)$ when there is a rotation type of change in the factor loading matrix, in which case the covariance matrices of the pre- and post-break loadings are both nonsingular. The limiting distribution of the estimated break point can also be derived in this case. The simulation results show that the QML estimator performs well in finite samples. We use the proposed method to estimate the break point for U.S. macroeconomic data and stock return data. The estimated break date is July 2007 for the macroeconomic data and February 20, 2020, for the stock return data.

Appendix

In model (3),
$$X = G\Lambda' + e, \quad (1)$$
where $G = (g_1, \ldots, g_T)'$, $g_t = Bf_t$ for $t \le k_0$, and $g_t = Cf_t$ for $t > k_0$. $\lambda_i$ and $f_t$ are always $r$-dimensional vectors and both $\Lambda_1$ and $\Lambda_2$ have dimension $N \times r$. Let $\hat G = (\hat g_1, \ldots, \hat g_T)'$ denote the full-sample PCA estimator for $G$, and define
$$\hat\Sigma_1(k) \equiv k^{-1}\sum_{t=1}^{k}\hat g_t\hat g_t', \qquad \hat\Sigma_2(k) \equiv (T-k)^{-1}\sum_{t=k+1}^{T}\hat g_t\hat g_t'.$$
For notational simplicity, let $\hat\Sigma_1 \equiv \hat\Sigma_1(k)$ and $\hat\Sigma_2 \equiv \hat\Sigma_2(k)$. The QML objective function can be expressed as
$$U_{NT}(k) = k\log|\hat\Sigma_1| + (T-k)\log|\hat\Sigma_2|.$$
If $k = k_0$, the objective function is
$$U_{NT}(k_0) = k_0\log|\hat\Sigma_1^0| + (T-k_0)\log|\hat\Sigma_2^0|,$$
where $\hat\Sigma_1^0 = k_0^{-1}\sum_{t=1}^{k_0}\hat g_t\hat g_t'$ and $\hat\Sigma_2^0 = (T-k_0)^{-1}\sum_{t=k_0+1}^{T}\hat g_t\hat g_t'$.

Representations of $\hat g_t$. The full-sample PCA estimator $\hat G$ satisfies the following identity:
$$\hat G = \frac{1}{NT}XX'\hat GV_{NT}^{-1} = GH + \frac{1}{NT}e\Lambda G'\hat GV_{NT}^{-1} + \frac{1}{NT}G\Lambda'e'\hat GV_{NT}^{-1} + \frac{1}{NT}ee'\hat GV_{NT}^{-1}, \quad (2)$$
where $H = \Lambda'\Lambda G'\hat GV_{NT}^{-1}/(NT)$ and $V_{NT}$ is a diagonal matrix comprising the $r$ largest eigenvalues of $XX'/(NT)$. Hence, for each period $t$, we have
$$\hat g_t - H'g_t = V_{NT}^{-1}\left(\frac{\hat G'G}{T}\frac{\Lambda'e_t}{N} + \frac{\hat G'e\Lambda}{NT}g_t + \frac{\hat G'ee_t}{NT}\right). \quad (3)$$
Bai (2003) shows that
$$\hat g_{k_0+1} - H'g_{k_0+1} = O_p(\delta_{NT}^{-1}), \quad (4)$$
$$T^{-1}\sum_{t=1}^{T}\|\hat g_t - H'g_t\|^2 = O_p(\delta_{NT}^{-2}), \quad \text{and} \quad T^{-1}(\hat G'\hat G - H'G'GH) = O_p(\delta_{NT}^{-2}). \quad (5)$$
From (A.1) and Lemma A.2 in Bai (2003), we have the following lemma:

Lemma 1. (i) Under Assumptions 1–8,
$$\max_m m^{-1}\sum_{t=k_0-m}^{k_0}\|\hat g_t - H'g_t\|^2 = O_p\left(\frac{1}{N}\right), \quad (6)$$
$$\max_m m^{-1}\sum_{t=k_0+1}^{k_0+m}\|\hat g_t - H'g_t\|^2 = O_p\left(\frac{1}{N}\right). \quad (7)$$
(ii) Under Assumptions 1–9,
$$\max_m m^{-1}\sum_{t=k_0-m}^{k_0}\|\tilde g_t - H'g_t\|^2 \le \frac{\bar c}{N}, \quad (8)$$
$$\max_m m^{-1}\sum_{t=k_0+1}^{k_0+m}\|\tilde g_t - H'g_t\|^2 \le \frac{\bar c}{N}, \quad (9)$$
where $\bar c > 0$ is a constant.

Proof.

See the supplementary appendix. □

Both $\Sigma_1$ and $\Sigma_2$ are positive definite matrices.

We first consider the case in which both $\Sigma_1$ and $\Sigma_2$ are positive definite matrices. Following Baltagi et al. (2017), we define
$$\zeta_t = \hat g_t\hat g_t' - H'g_tg_t'H \ \text{ for } t = 1, \cdots, T, \qquad \xi_t = H'g_tg_t'H - \Sigma_1 \ \text{ for } t \le k_0, \qquad \xi_t = H'g_tg_t'H - \Sigma_2 \ \text{ for } t > k_0,$$
where $\Sigma_1 = H_0'\Sigma_{G,1}H_0$ and $\Sigma_2 = H_0'\Sigma_{G,2}H_0$ are the pre- and post-break values of $H'E(g_tg_t')H$ and $H_0$ is the probability limit of $H$. Thus, we have $\hat g_t\hat g_t' = \Sigma_1 + \xi_t + \zeta_t$ for $t \le k_0$ and $\hat g_t\hat g_t' = \Sigma_2 + \xi_t + \zeta_t$ for $t > k_0$. $H_0$ is nonsingular by Proposition 1 of Bai (2003).

For $k \le k_0$,
$$\hat\Sigma_1 = \Sigma_1 + \frac{1}{k}\sum_{t=1}^{k}\xi_t + \frac{1}{k}\sum_{t=1}^{k}\zeta_t, \qquad \hat\Sigma_2 = \frac{k_0-k}{T-k}[\Sigma_1 - \Sigma_2] + \Sigma_2 + \frac{1}{T-k}\sum_{t=k+1}^{T}\xi_t + \frac{1}{T-k}\sum_{t=k+1}^{T}\zeta_t. \quad (10)$$
Thus,
$$\hat\Sigma_1 - \hat\Sigma_1^0 = \frac{k_0-k}{kk_0}\sum_{t=1}^{k}(\xi_t + \zeta_t) - \frac{1}{k_0}\sum_{t=k+1}^{k_0}(\xi_t + \zeta_t),$$
$$\hat\Sigma_2 - \hat\Sigma_2^0 = \frac{k_0-k}{T-k}(\Sigma_1 - \Sigma_2) + \frac{1}{T-k}\sum_{t=k+1}^{k_0}(\xi_t + \zeta_t) + \frac{k-k_0}{(T-k)(T-k_0)}\sum_{t=k_0+1}^{T}(\xi_t + \zeta_t). \quad (11)$$
Before analyzing the consistency of the estimated fraction and the boundedness of the estimation error, we need to prove the following lemmas. For any given $0 < \eta \le \min(\tau_0, 1-\tau_0)$ and $M > 0$, define $D_\eta = \{k : (\tau_0-\eta)T \le k \le (\tau_0+\eta)T\}$, $D_\eta^c$ as the complement of $D_\eta$, $\tau_0 = k_0/T$, and $D_{\eta,M} = \{k : (\tau_0-\eta)T \le k \le (\tau_0+\eta)T, |k-k_0| > M\}$.

Lemma 2. Under Assumptions 1–8,
(i) $\max_{[\tau_1T] \le k \le k_0}\left\|\frac{1}{k}\sum_{t=1}^{k}\xi_t\right\| = O_p\left(\frac{1}{\sqrt T}\right)$,
(ii) $\max_{k \in D_\eta,\, k < k_0}$ […]

See the supplementary appendix. ✷ Lemma 3.

Under Assumptions 1–8, max k ∈ D cη ,k

See the supplementary appendix. ✷ Lemma 4.

Under Assumptions 1–8, for $k \in D_{\eta,M}$ and $k < k_0$, if both $\Sigma_1$ and $\Sigma_2$ are positive definite matrices, then
$$\frac{k}{k_0-k}\log|\hat\Sigma_1\hat\Sigma_1^{0-1}| = -\frac{k}{k_0}\frac{1}{k_0-k}\sum_{t=k+1}^{k_0}tr(\xi_t\hat\Sigma_1^{0-1}) + o_p(1),$$
where the $o_p(1)$ term is uniform in $k \in D_{\eta,M}$.

Proof.

See the supplementary appendix. □

Lemma 5.

Under Assumptions 1–8, for $|k-k_0| \le M$ and $k < k_0$, if both $\Sigma_1$ and $\Sigma_2$ are positive definite matrices, then
$$(T-k)\log|\hat\Sigma_2\hat\Sigma_2^{0-1}| = (k_0-k)\,tr\big((\Sigma_1 - \Sigma_2)\Sigma_2^{-1}\big) + \sum_{t=k+1}^{k_0}tr(\xi_t\Sigma_2^{-1}) + o_p(1).$$

Proof.

See the supplementary appendix. □

Proof of $\hat\tau - \tau_0 = o_p(1)$

By symmetry, it suffices to study the case of $k < k_0$. Expanding $U_{NT}(k) - U_{NT}(k_0)$ gives
$$U_{NT}(k) - U_{NT}(k_0) = k\log|\hat\Sigma_1\hat\Sigma_1^{0-1}| + (T-k)\log|\hat\Sigma_2\hat\Sigma_2^{0-1}| - (k_0-k)\log|\hat\Sigma_1^0\hat\Sigma_2^{0-1}|. \quad (12)$$
To prove $\hat\tau - \tau_0 = o_p(1)$, we need to show that for any $\varepsilon > 0$ and $\eta > 0$, $P(|\hat\tau - \tau_0| > \eta) < \varepsilon$ as $(N, T) \to \infty$, that is, $P(\hat k \in D_\eta^c) < \varepsilon$. For notational simplicity, we write $U_{NT}(k)$ as $U(k)$. As $\hat k = \arg\min_k U(k)$, we have $U(\hat k) - U(k_0) \le$

0. If ˆ k ∈ D cη , then min k ∈ D cη U ( k ) − U ( k ) ≤

0. This implies P (ˆ k ∈ D cη ) ≤ P ( min k ∈ D cη U ( k ) − U ( k ) ≤ ε > η > P ( min k ∈ D cη U ( k ) − U ( k ) ≤ < ε as N, T → ∞ .Suppose that min k ∈ D cη U ( k ) − U ( k ) ≤ k ∗ = arg min k ∈ D cη U ( k ) − U ( k ); then, U ( k ∗ ) − U ( k ) ≤ U ( k ∗ ) − U ( k ) | k ∗ − k | ≤

0. As k ∗ ∈ D cη , we have min k ∈ D cη U ( k ) − U ( k ) | k − k | ≤ U ( k ∗ ) − U ( k ) | k ∗ − k | ≤

0. Thus, min k ∈ D cη U ( k ) − U ( k ) ≤ k ∈ D cη U ( k ) − U ( k ) | k − k | ≤

0. Similarly,min k ∈ D cη U ( k ) − U ( k ) | k − k | ≤ k ∈ D cη U ( k ) − U ( k ) ≤

0. Therefore, the following two events are equivalent: { w : min k ∈ D cη U ( k ) − U ( k ) ≤ } = { w : min k ∈ D cη U ( k ) − U ( k ) | k − k | ≤ } . (13)Note that P (min x ∈X a ( x ) + b ( x ) ≤ ≤ P (min x ∈X a ( x ) + min x ∈X b ( x ) ≤

0) = P (min x ∈X a ( x ) + o p (1) ≤

0) (14)if b ( x ) = o p (1) uniformly for x ∈ X .Now, using (12) and (13), we have P ( min k ∈ D cη ,k

0) = P ( min k ∈ D cη ,k

0) (15)where min k ∈ D cη ,k ρ i ( X ) >

1. Thus, for g ( X ) toachieve its minimum value, all eigenvalues of X must be one (i.e., all eigenvalues of the symmetric matrix ˆΣ − / ˆΣ ˆΣ − / should be equal to one); thus, ˆΣ − / ˆΣ ˆΣ − / = I and ˆΣ = ˆΣ . This implies that g ( I r ) = 0 is a unique minimum of g ( X ).Note that ˆΣ − k − k P t =1 H ′ g t g ′ t H = o p (1); thus, ˆΣ − H ′ B Σ F B ′ H = o p (1) under Assumption 1 and the fact that g t = Bf t for t ≤ k . Similarly, ˆΣ − H ′ C Σ F C ′ H = o p (1). As H → p H is a nonsingular matrix and B and C are nonsingular as well,Assumption 1(ii) implies ˆΣ ˆΣ − → p H ′ B Σ F B ′ ( C Σ F C ′ ) − H − ′ = I r , which has positive eigenvalues not equal to one. Thus, the sign of (18) implies that there exists a positive constant c ∆ suchthat min k ∈ D c ,k g ( I r ) (19)with w.p.a.1 as N, T → ∞ , where c ∆ is a constant related to the diﬀerence ∆ = B Σ F B ′ − C Σ F C ′ .Thus, we obtain the result P (ˆ τ = τ ) = 1. ✷ Proof of Theorem 1

To prove ˆ k − k = O p (1), we need to show that for any ε >

0, there exists an

M > P ( | ˆ k − k | > M ) < ε as( N, T ) → ∞ . By the consistency of ˆ τ , for any ε > { τ , − τ } > η > P (ˆ k ∈ D cη ) < ε as ( N, T ) → ∞ . For the given η and M , we have D η,M = { k : ( τ − η ) T ≤ k ≤ ( τ + η ) T, | k − k | > M } ; thus, P ( | ˆ k − k | > M ) = P (ˆ k ∈ D cη )+ P (ˆ k ∈ D η,M ).Hence, it suﬃces to show that for any ε > η >

0, there exists an

M > P (ˆ k ∈ D η,M ) < ε as ( N, T ) → ∞ .Again, by symmetry, it suﬃces to study the case of k < k . Similar to the proof of consistency of ˆ τ , we have P ( min k ∈ D η,M ,k

0, (22) implies P ( max k ∈ D η,M ,k δ/ ≤ P ( max k ∈ D η,M ,k

2) + o (1)= P ( max ( τ − η ) T ≤ k

2) + o (1) . (23)Let m = k − k , P ( max ( τ − η ) T ≤ k

2) = P ( max M , (25)where the fourth line follows from max k ∈ D η,M ,k k ( k − k ) − k P t = k +1 ˆ g t ˆ g ′ t − ˆΣ k < δ holds for a suﬃciently large M by (24) andmin k ∈ D η,M ,k , (26)w.p.a.1 as N, T → ∞ . In addition, by (25), we have P ( (cid:12)(cid:12)(cid:12) min k ∈ D η,M ,k M → ∞ . This shows that P ( min k ∈ D η,M ,k

Let us recall (12),
$$U(k) - U(k_0) = (k_0-k)\left(\frac{k}{k_0-k}\log|\hat\Sigma_1\hat\Sigma_1^{0-1}| + \frac{T-k}{k_0-k}\log|\hat\Sigma_2\hat\Sigma_2^{0-1}| - \log|\hat\Sigma_1^0\hat\Sigma_2^{0-1}|\right).$$
For the second term in the above equation, we have
$$(T-k)\log|\hat\Sigma_2\hat\Sigma_2^{0-1}| = (k_0-k)\,tr\big((\Sigma_1 - \Sigma_2)\Sigma_2^{-1}\big) + \sum_{t=k+1}^{k_0}tr(\xi_t\Sigma_2^{-1}) + o_p(1)$$
by Lemma 5. Similarly, by (11) and Lemma 2, we have
$$k\log|\hat\Sigma_1\hat\Sigma_1^{0-1}| = k\log\big|(\hat\Sigma_1 - \hat\Sigma_1^0)\hat\Sigma_1^{0-1} + I\big|$$
$$= k\log\Big|\Big[\frac{k_0-k}{kk_0}\sum_{t=1}^{k}(\xi_t + \zeta_t) - \frac{1}{k_0}\sum_{t=k+1}^{k_0}(\xi_t + \zeta_t)\Big]\hat\Sigma_1^{0-1} + I\Big|$$
$$= k \cdot tr\Big(\frac{k_0-k}{kk_0}\sum_{t=1}^{k}(\xi_t + \zeta_t)\hat\Sigma_1^{0-1}\Big) - k \cdot tr\Big(\frac{1}{k_0}\sum_{t=k+1}^{k_0}(\xi_t + \zeta_t)\hat\Sigma_1^{0-1}\Big) + o_p(1)$$
$$= -\sum_{t=k+1}^{k_0}tr(\xi_t\Sigma_1^{-1}) + o_p(1).$$
Thus,
$$U(k) - U(k_0) \xrightarrow{d} \sum_{t=k+1}^{k_0}tr\big(\xi_t(\Sigma_2^{-1} - \Sigma_1^{-1})\big) + (k_0-k)\big(tr(\Sigma_1\Sigma_2^{-1}) - r - \log|\Sigma_1\Sigma_2^{-1}|\big).$$
Similarly, for the case of $k > k_0$, the limit can be written as
$$\sum_{t=k_0+1}^{k}tr\big(\xi_t(\Sigma_1^{-1} - \Sigma_2^{-1})\big) + (k-k_0)\big(tr(\Sigma_1^{-1}\Sigma_2) - r - \log|\Sigma_1^{-1}\Sigma_2|\big). \quad □$$

$\Sigma_1$ or $\Sigma_2$, or both, is a singular matrix.

Before proving the theorem, we need to prove the following lemmas, where $A^-$ denotes the Moore–Penrose inverse of $A$, $\rho_i(A)$ represents the $i$-th eigenvalue of matrix $A$, and $\sigma_i(A)$ represents the $i$-th singular value of matrix $A$.

Lemma 6.
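As a worked step, the deterministic drift term in the limit above is nonnegative and vanishes only when the pre- and post-break second moments coincide. The short derivation below is added for illustration and uses only the elementary inequality $x - 1 - \log x \ge 0$ for $x > 0$; it mirrors the eigenvalue argument used in the consistency proof.

```latex
% Let \rho_1, \dots, \rho_r be the eigenvalues of \Sigma_1 \Sigma_2^{-1}.
% They are positive because \Sigma_1 \Sigma_2^{-1} is similar to the
% positive definite matrix \Sigma_2^{-1/2} \Sigma_1 \Sigma_2^{-1/2}.
\operatorname{tr}(\Sigma_1 \Sigma_2^{-1}) - r - \log|\Sigma_1 \Sigma_2^{-1}|
  = \sum_{i=1}^{r} \left( \rho_i - 1 - \log \rho_i \right) \ge 0,
% with equality if and only if \rho_i = 1 for all i, that is,
% \Sigma_2^{-1/2} \Sigma_1 \Sigma_2^{-1/2} = I_r, or \Sigma_1 = \Sigma_2.
```

Hence, under a rotational break with $\Sigma_1 \neq \Sigma_2$, the drift grows linearly in $|k - k_0|$, which is what keeps the estimation error stochastically bounded.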

Under Assumptions 1–8,

$$\max_{k\in[[\tau_1T],\,k_0]}\frac{1}{k}\sum_{s=1}^{k}\Big\|\sum_{t=1}^{T}\hat g_te_t'e_s/NT\Big\|^2 = O_p(\delta_{NT}^{-4}),$$

$$\max_{k\in[k_0,\,[\tau_2T]]}\frac{1}{T-k}\sum_{s=k+1}^{T}\Big\|\sum_{t=1}^{T}\hat g_te_t'e_s/NT\Big\|^2 = O_p(\delta_{NT}^{-4}).$$

Proof: By symmetry, it is sufficient to focus on the case of $k\in[k_0,[\tau_2T]]$.

$$\frac{1}{N^2T^2(T-k)}\sum_{s=k+1}^{T}\Big\|\sum_{t=1}^{T}e_s'e_t\hat g_t'\Big\|^2 \le \frac{2}{N^2T^2(T-k)}\sum_{s=k+1}^{T}\Big\|\sum_{t=1}^{T}(\hat g_t - H'g_t)e_t'e_s\Big\|^2 + \frac{2}{N^2T^2(T-k)}\sum_{s=k+1}^{T}\Big\|\sum_{t=1}^{T}H'g_te_t'e_s\Big\|^2.$$

Recall that $E(e_t'e_s)/N = \gamma_N(s,t)$. Consider the inequality

$$\max_{k\in[k_0,[\tau_2T]]}\frac{1}{N^2T^2(T-k)}\sum_{s=k+1}^{T}\Big\|\sum_{t=1}^{T}H'g_te_t'e_s\Big\|^2 \le \frac{2}{T^3(1-\tau_2)}\sum_{s=1}^{T}\Big\|\sum_{t=1}^{T}H'g_t\frac{e_t'e_s - E(e_t'e_s)}{N}\Big\|^2 + \frac{2}{T^3(1-\tau_2)}\sum_{s=1}^{T}\Big\|\sum_{t=1}^{T}H'g_t\gamma_N(s,t)\Big\|^2,$$

where the first term can be written as

$$\frac{2}{T^2(1-\tau_2)N}\sum_{s=1}^{T}\Big\|\frac{1}{\sqrt{NT}}\sum_{t=1}^{T}\sum_{i=1}^{N}H'g_t[e_{it}e_{is} - E(e_{it}e_{is})]\Big\|^2 = O_p\Big(\frac{1}{NT}\Big) \tag{28}$$

under Assumption 8(i), and the second term is $O_p(T^{-2})$ because the expectation can be bounded by

$$\begin{aligned}
\frac{2}{T^3}\sum_{s=1}^{T}E\Big\|\sum_{t=1}^{T}g_t\gamma_N(s,t)\Big\|^2 &\le \frac{2}{T^3}\sum_{s=1}^{T}\sum_{t=1}^{T}\sum_{u=1}^{T}E(\|g_t\|\|g_u\|)\,|\gamma_N(s,t)||\gamma_N(s,u)| \\
&\le \frac{2}{T^3}\sum_{s=1}^{T}\sum_{t=1}^{T}\sum_{u=1}^{T}\max_tE\|g_t\|^2\,|\gamma_N(s,t)||\gamma_N(s,u)| \\
&\le \frac{2}{T^3}\sum_{s=1}^{T}\max_tE\|g_t\|^2\Big(\sum_{t=1}^{T}|\gamma_N(s,t)|\Big)^2 = O(T^{-2}),
\end{aligned} \tag{29}$$

where we use the facts that $E(\|g_t\|\|g_u\|) \le [E\|g_t\|^2E\|g_u\|^2]^{1/2} \le \max_tE\|g_t\|^2$ under Assumption 1 and $\sum_{t=1}^{T}|\gamma_N(s,t)| \le M$ by Assumption 3(ii).

Next, consider the term

$$\max_{k\in[k_0,[\tau_2T]]}\frac{1}{N^2T^2(T-k)}\sum_{s=k+1}^{T}\Big\|\sum_{t=1}^{T}(\hat g_t - H'g_t)e_t'e_s\Big\|^2 \le \frac{2}{T^3(1-\tau_2)}\sum_{s=1}^{T}\Big\|\sum_{t=1}^{T}(\hat g_t - H'g_t)\frac{e_t'e_s - E(e_t'e_s)}{N}\Big\|^2 + \frac{2}{T^3(1-\tau_2)}\sum_{s=1}^{T}\Big\|\sum_{t=1}^{T}(\hat g_t - H'g_t)\gamma_N(s,t)\Big\|^2,$$

where the first term can be bounded by

$$\begin{aligned}
\frac{2}{T^3(1-\tau_2)}\sum_{s=1}^{T}\Big\|\sum_{t=1}^{T}(\hat g_t - H'g_t)\frac{e_t'e_s - E(e_t'e_s)}{N}\Big\|^2 &\le \frac{2}{T^3(1-\tau_2)N}\sum_{s=1}^{T}\sum_{t=1}^{T}\|\hat g_t - H'g_t\|^2\sum_{t=1}^{T}\Big[\frac{1}{\sqrt N}\sum_{i=1}^{N}\big(e_{it}e_{is} - E(e_{it}e_{is})\big)\Big]^2 \\
&= \frac{2}{NT(1-\tau_2)}\sum_{t=1}^{T}\|\hat g_t - H'g_t\|^2\cdot\frac{1}{T^2}\sum_{s=1}^{T}\sum_{t=1}^{T}\Big[\frac{1}{\sqrt N}\sum_{i=1}^{N}\big(e_{it}e_{is} - E(e_{it}e_{is})\big)\Big]^2 = O_p\Big(\frac{1}{N\delta_{NT}^2}\Big),
\end{aligned} \tag{30}$$

by Assumption 3(v), and the second term is bounded by

$$\frac{2}{T^3(1-\tau_2)}\sum_{s=1}^{T}\Big\|\sum_{t=1}^{T}(\hat g_t - H'g_t)\gamma_N(s,t)\Big\|^2 \le \frac{2}{T}\sum_{t=1}^{T}\|\hat g_t - H'g_t\|^2\cdot\frac{1}{T^2(1-\tau_2)}\sum_{s=1}^{T}\sum_{t=1}^{T}|\gamma_N(s,t)|^2 = O_p\Big(\frac{1}{T\delta_{NT}^2}\Big) \tag{31}$$

because $\sum_{t=1}^{T}|\gamma_N(s,t)|^2 \le \big(\sum_{t=1}^{T}|\gamma_N(s,t)|\big)^2 \le M^2$ under Assumption 3(ii). Combining the results obtained in (28)–(31), we obtain the desired result. ✷

When $C$ is singular, $\hat\Sigma_2(k)$ converges in probability to a singular matrix for $k \ge k_0$. In finite samples, however, the smallest eigenvalue of $\hat\Sigma_2(k)$ is not zero. The following proposition establishes a lower bound for the smallest eigenvalue of $\hat\Sigma_2(k)$, which ensures that it is meaningful to compute the logarithm of $|\hat\Sigma_2(k)|$ in the objective function for any given sample size. Symmetrically, a similar lower bound can be established for the smallest eigenvalue of $\hat\Sigma_1(k)$ for $k \le k_0$ when $B$ is singular. Because of space restrictions, Proposition 1 here only states the result for the case of $\hat\Sigma_2(k)$.

Proposition 1.

Under Assumptions 1–10, for $k \ge k_0$ and $k \le [\tau_2T]$, if $C$ is singular and $\sqrt N/T \to 0$ as $N, T \to \infty$, there exist constants $c_U \ge c_L > 0$ such that

$$P\Big(\min_{k\in[k_0,[\tau_2T]]}\rho_j(\hat\Sigma_2(k)) \ge \frac{c_L}{N}\Big) \to 1, \qquad P\Big(\max_{k\in[k_0,[\tau_2T]]}\rho_j(\hat\Sigma_2(k)) \le \frac{c_U}{N}\Big) \to 1,$$

for $j = r_2+1, \ldots, r$.

Proof: Part 1.

For $k \ge k_0$, $\hat\Sigma_2(k) = (T-k)^{-1}\hat G_k'\hat G_k$, where $\hat G_k = [\hat g_{k+1}, \ldots, \hat g_T]'$. Let $X_k = [X_{k+1}, \ldots, X_T]'$, $e_k = [e_{k+1}, \ldots, e_T]'$, and $G_k = [g_{k+1}, \ldots, g_T]'$. From $XX'\hat G/NT = \hat GV_{NT}$, eq. (2) implies

$$\hat G_k = X_kX'\hat GV_{NT}^{-1}/NT = \frac{1}{NT}(G_k\Lambda' + e_k)(\Lambda G' + e')\hat GV_{NT}^{-1},$$

$$\hat G_k - G_kH = \frac{1}{NT}e_k\Lambda G'\hat GV_{NT}^{-1} + \frac{1}{NT}e_ke'\hat GV_{NT}^{-1} + \frac{1}{NT}G_k\Lambda'e'\hat GV_{NT}^{-1}. \tag{32}$$

In addition, note that

$$\hat\Sigma_2(k) - \frac{1}{T-k}\hat G_k'M_{F_k}\hat G_k = \frac{1}{T-k}\hat G_k'P_{F_k}\hat G_k \ge 0,$$

where $P_{F_k} = F_k(F_k'F_k)^{-1}F_k'$, $M_{F_k} = I_{T-k} - P_{F_k}$, and $F_k = [f_{k+1}, \ldots, f_T]'$. Thus, Weyl's inequality for eigenvalues implies

$$\min_{k\in[k_0,[\tau_2T]]}\rho_j(\hat\Sigma_2(k)) \ge \min_{k\in[k_0,[\tau_2T]]}\Big[\rho_j\Big(\frac{1}{T-k}\hat G_k'M_{F_k}\hat G_k\Big) + \rho_r\Big(\frac{1}{T-k}\hat G_k'P_{F_k}\hat G_k\Big)\Big] \ge \min_{k\in[k_0,[\tau_2T]]}\rho_j\Big(\frac{1}{T-k}\hat G_k'M_{F_k}\hat G_k\Big), \quad j = r_2+1, \ldots, r. \tag{33}$$

Thus, it suffices to find the lower bound for $\min_{k\in[k_0,[\tau_2T]]}\rho_j\big(\hat G_k'M_{F_k}\hat G_k/(T-k)\big)$. As $F_kC' = G_k$ for $k \ge k_0$, we have $M_{F_k}G_k = 0$ and

$$\frac{1}{T-k}\hat G_k'M_{F_k}\hat G_k = \frac{1}{T-k}(\hat G_k - G_kH)'M_{F_k}(\hat G_k - G_kH). \tag{34}$$

Now, using (32), we can obtain

$$\frac{1}{\sqrt{T-k}}M_{F_k}(\hat G_k - G_kH) = \frac{1}{\sqrt{T-k}}M_{F_k}\Big(\frac{1}{NT}e_k\Lambda G'\hat GV_{NT}^{-1} + \frac{1}{NT}G_k\Lambda'e'\hat GV_{NT}^{-1} + \frac{1}{NT}e_ke'\hat GV_{NT}^{-1}\Big) = a_{1k} + a_{2k} + a_{3k}. \tag{35}$$

Let us consider the term $a_{1k}$ in (35). As $\sigma_i(A+B) \le \sigma_i(A) + \sigma_1(B)$, we have

$$\sigma_j\Big(\frac{e_k\Lambda}{N\sqrt{T-k}}\frac{G'\hat G}{T}V_{NT}^{-1}\Big) \le \sigma_j\Big(M_{F_k}\frac{e_k\Lambda}{N\sqrt{T-k}}\frac{G'\hat G}{T}V_{NT}^{-1}\Big) + \sigma_1\Big(P_{F_k}\frac{e_k\Lambda}{N\sqrt{T-k}}\frac{G'\hat G}{T}V_{NT}^{-1}\Big),$$

which implies

$$\begin{aligned}
\min_{k\in[k_0,[\tau_2T]]}\sigma_j\Big(M_{F_k}\frac{e_k\Lambda}{N\sqrt{T-k}}\frac{G'\hat G}{T}V_{NT}^{-1}\Big) &\ge \min_{k\in[k_0,[\tau_2T]]}\sigma_j\Big(\frac{e_k\Lambda}{N\sqrt{T-k}}\frac{G'\hat G}{T}V_{NT}^{-1}\Big) - \max_{k\in[k_0,[\tau_2T]]}\sigma_1\Big(P_{F_k}\frac{e_k\Lambda}{N\sqrt{T-k}}\frac{G'\hat G}{T}V_{NT}^{-1}\Big) \\
&\ge \frac{1}{\sqrt N}\sigma_r\Big(\frac{G'\hat G}{T}V_{NT}^{-1}\Big)\sqrt{\min_{k\in[k_0,[\tau_2T]]}\rho_j\Big(\frac{\Lambda'e_k'e_k\Lambda}{N(T-k)}\Big)} \\
&\quad - \max_{k\in[k_0,[\tau_2T]]}\sqrt{\frac{1}{N(T-k)}\rho_1\Big(V_{NT}^{-1}\frac{\hat G'G}{T}\Big(\frac{\Lambda'e_k'F_k}{\sqrt{NT}}\Big)\Big(\frac{F_k'F_k}{T}\Big)^{-1}\Big(\frac{F_k'e_k\Lambda}{\sqrt{NT}}\Big)\frac{G'\hat G}{T}V_{NT}^{-1}\Big)} \\
&\ge \frac{1}{\sqrt N}c'\cdot\sigma_r\Big(\frac{G'\hat G}{T}V_{NT}^{-1}\Big) + O_p\Big(\frac{1}{\sqrt{NT}}\Big) \ge \frac{1}{\sqrt N}c, \quad \text{w.p.a.1}
\end{aligned} \tag{36}$$

for some constants $c, c' > 0$. The bound on the first term uses the inequality $\sigma_j(AB) \ge \sigma_j(A)\sigma_r(B)$ and the relation $\rho_r(A'A)^{1/2} = \sigma_r(A)$, and the last two inequalities use Assumptions 9 and 10 and the fact that $\frac{G'\hat G}{T}V_{NT}^{-1}$ is nonsingular as $N, T \to \infty$ by Proposition 1 and Lemma A.3 of Bai (2003).

The term $a_{2k}$ in (35) is zero because $M_{F_k}G_k = 0$. For term $a_{3k}$ in (35), we can obtain its upper bound as

$$\begin{aligned}
\Big\|\frac{1}{NT\sqrt{T-k}}M_{F_k}e_ke'\hat GV_{NT}^{-1}\Big\|^2 &\le 2\Big\|\frac{1}{NT\sqrt{T-k}}e_ke'\hat GV_{NT}^{-1}\Big\|^2 + 2\Big\|\frac{1}{NT\sqrt{T-k}}P_{F_k}e_ke'\hat GV_{NT}^{-1}\Big\|^2 \\
&= \frac{2}{N^2T^2(T-k)}\mathrm{tr}\big(V_{NT}^{-1}\hat G'ee_k'e_ke'\hat GV_{NT}^{-1}\big) + \frac{2}{N^2T^2(T-k)}\mathrm{tr}\big(V_{NT}^{-1}\hat G'ee_k'P_{F_k}e_ke'\hat GV_{NT}^{-1}\big).
\end{aligned} \tag{37}$$

For the first term in (37), we have

$$\frac{1}{N^2T^2(T-k)}\big\|V_{NT}^{-1}\hat G'ee_k'e_ke'\hat GV_{NT}^{-1}\big\| = \frac{1}{N^2T^2(T-k)}\Big\|V_{NT}^{-1}\sum_{s=k+1}^{T}\sum_{t=1}^{T}\hat g_te_t'e_se_s'\sum_{u=1}^{T}e_u\hat g_u'V_{NT}^{-1}\Big\| \le \frac{1}{T-k}\|V_{NT}^{-1}\|^2\sum_{s=k+1}^{T}\Big\|\sum_{t=1}^{T}\hat g_te_t'e_s/NT\Big\|^2.$$

Note that

$$\max_{k\in[k_0,[\tau_2T]]}(T-k)^{-1}\sum_{s=k+1}^{T}\Big\|\sum_{t=1}^{T}\hat g_te_t'e_s/NT\Big\|^2 = O_p(\delta_{NT}^{-4}) \tag{38}$$

by Lemma 6. For the second term in (37), we can obtain

$$\max_{k\in[k_0,[\tau_2T]]}\Big\|V_{NT}^{-1}\frac{\hat G'ee_k'F_k}{NT(T-k)}\Big(\frac{F_k'F_k}{T-k}\Big)^{-1}\frac{F_k'e_ke'\hat G}{NT(T-k)}V_{NT}^{-1}\Big\| = O_p(\delta_{NT}^{-4}). \tag{39}$$

To observe this, note that $[F_k'F_k/(T-k)]^{-1} = O_p(1)$ uniformly over $k\in[k_0,[\tau_2T]]$ and

$$\frac{1}{NT(T-k)}\hat G'ee_k'F_k = \frac{1}{NT(T-k)}\sum_{s=k+1}^{T}\sum_{t=1}^{T}\hat g_te_t'e_sf_s' = \frac{1}{T(T-k)}\sum_{s=k+1}^{T}\sum_{t=1}^{T}\hat g_t\Big[\frac{e_t'e_s}{N} - \gamma_N(s,t)\Big]f_s' + \frac{1}{T(T-k)}\sum_{s=k+1}^{T}\sum_{t=1}^{T}\hat g_t\gamma_N(s,t)f_s' = O_p(\delta_{NT}^{-2})$$

uniformly over $k\in[k_0,[\tau_2T]]$, following the derivation of terms I and II in Lemma B.2 of Bai (2003).

Thus, combining the results in (37)–(39), we have

$$\max_{k\in[k_0,[\tau_2T]]}\sigma_1\Big(\frac{1}{NT\sqrt{T-k}}M_{F_k}e_ke'\hat GV_{NT}^{-1}\Big) \le \max_{k\in[k_0,[\tau_2T]]}\Big\|\frac{1}{NT\sqrt{T-k}}M_{F_k}e_ke'\hat GV_{NT}^{-1}\Big\| = O_p(\delta_{NT}^{-2}). \tag{40}$$

Next, rearranging the terms in (35) yields $a_{1k} = \frac{1}{\sqrt{T-k}}M_{F_k}(\hat G_k - G_kH) - a_{3k}$, which implies that $\sigma_j(a_{1k}) \le \sigma_j\big(\frac{1}{\sqrt{T-k}}M_{F_k}(\hat G_k - G_kH)\big) + \sigma_1(-a_{3k})$ and

$$\min_{k\in[k_0,[\tau_2T]]}\sigma_j\Big(\frac{1}{\sqrt{T-k}}M_{F_k}(\hat G_k - G_kH)\Big) \ge \min_{k\in[k_0,[\tau_2T]]}\sigma_j(a_{1k}) - \max_{k\in[k_0,[\tau_2T]]}\sigma_1(a_{3k}) \ge \frac{1}{\sqrt N}c, \quad \text{w.p.a.1} \tag{41}$$

for some $c > 0$, where $\sigma_1(a_{3k})$ is uniformly $O_p(\delta_{NT}^{-2})$ by (40) and dominated by $\sigma_j(a_{1k})$ in (36) under the condition that $\sqrt N/T \to 0$ as $N, T \to \infty$. Hence, combining (33), (34), and (41) yields

$$\min_{k\in[k_0,[\tau_2T]]}\rho_j(\hat\Sigma_2(k)) \ge \min_{k\in[k_0,[\tau_2T]]}\rho_j\Big(\frac{1}{T-k}(\hat G_k - G_kH)'M_{F_k}(\hat G_k - G_kH)\Big) = \min_{k\in[k_0,[\tau_2T]]}\sigma_j^2\Big(\frac{1}{\sqrt{T-k}}M_{F_k}(\hat G_k - G_kH)\Big) \ge \frac{1}{N}c^2 = \frac{1}{N}c_L,$$

w.p.a.1 if $\sqrt N/T \to 0$ as $N, T \to \infty$.

Part 2.

Note that

$$\sigma_j(\hat G_k/\sqrt{T-k}) \le \sigma_j(G_kH/\sqrt{T-k}) + \sigma_1\big((\hat G_k - G_kH)/\sqrt{T-k}\big).$$

In addition, $\sigma_j(G_kH/\sqrt{T-k}) = 0$ for $r_2 < j \le r$ and

$$\max_{k\in[k_0,[\tau_2T]]}\sigma_1\big((\hat G_k - G_kH)/\sqrt{T-k}\big) \le \max_{k\in[k_0,[\tau_2T]]}\frac{1}{\sqrt{T-k}}\|\hat G_k - G_kH\| \le \frac{1}{\sqrt{(1-\tau_2)T}}\|\hat G - GH\| \le \frac{c}{\sqrt N}$$

w.p.a.1 for some $0 < c < \infty$ by Lemma 1 (ii) if $\sqrt N/T \to 0$ as $N, T \to \infty$. Thus,

$$\max_{k\in[k_0,[\tau_2T]]}\rho_j(\hat\Sigma_2(k)) = \max_{k\in[k_0,[\tau_2T]]}\sigma_j^2(\hat G_k/\sqrt{T-k}) \le c_U/N, \quad \text{w.p.a.1}$$

as $N, T \to \infty$ for $j = r_2+1, \ldots, r$ and some positive constant $c_U$. ✷

The following lemma yields a bound on the difference between $|\hat\Sigma_2^0|$ and $|\hat\Sigma_2(k)|$ for $k_0 < k \le \tau_2T$ when $C$ is singular, where $\hat\Sigma_2^0 \equiv \hat\Sigma_2(k_0)$. The same result applies to the difference between $|\hat\Sigma_1^0|$ and $|\hat\Sigma_1(k)|$ for $\tau_1T \le k < k_0$ when $B$ is singular, where $\hat\Sigma_1^0 \equiv \hat\Sigma_1(k_0)$.

Lemma 7.

Under Assumptions 1–10, for k > k and k ≤ [ τ T ] , if C is singular and T /N → κ as N, T → ∞ for < κ < ∞ ,then max k k , we haveˆ g t = H ′ Cf t + (ˆ g t − H ′ g t ) = H ′ C Σ / f ε t + (ˆ g t − H ′ g t )= A / k U k ε t + ( H ′ C Σ / f − A / k U k ) ε t + (ˆ g t − H ′ g t )= A / k U k ε t + O p ( δ − NT ) ε t + (ˆ g t − H ′ g t )by (47) and ˆ G ′ − A / k U k E ′ = O p ( δ − NT ) · E ′ + ˆ G ′ − H ′ G ′ (48)where E ′ ≡ [ ε k +1 , ..., ε k ], G ′ ≡ [ g k +1 , ..., g k ], and O p ( δ − NT ) term is uniform in k < k ≤ τ T . In addition, " T − k T X t = k +1 H ′ g t g ′ t H = 0 , for r − r ≥ . Thus, we have1 T − k max k

1, we havemax k k , and thelast equality follows from Lemma 1 and the result that 1 / | A k | = O p ( N ) by proposition 1 for r = r + 1. Hence, under thecondition T ∝ N , we have max k

Under Assumptions 1–11, for τ T ≤ k < k , if C is singular and T /N → κ as N, T → ∞ for < κ < ∞ , then | ˆΣ ( k ) | − | ˆΣ || ˆΣ | ≥ c · ( k − k ) w . p . a . for a constant c > as N, T → ∞ . Proof :Let us rewrite ˆΣ ( k ) asˆΣ ( k ) = 1 T − k T X t = k +1 ˆ g t ˆ g ′ t = 1 T − k T X t = k +1 ˆ g t ˆ g ′ t + 1 T − k k X t = k +1 ˆ g t ˆ g ′ t = D k + 1 T − k ˆ G ′ ˆ G , where D k ≡ ( T − k ) − P Tt = k +1 ˆ g t ˆ g ′ t and ˆ G ≡ [ˆ g k +1 , ..., ˆ g k ] ′ . By (7.10) of Lange (2010), we have | ˆΣ ( k ) | = | D k | · | I k − k + 1 T − k ˆ G D − k ˆ G ′ |≥ | D k | h ρ (cid:16) T − k ˆ G D − k ˆ G ′ (cid:17)i (55)We would like to ﬁnd the lower bound of the largest eigenvalue of matrix T − k ˆ G D − k ˆ G ′ , which can be written as1 T − k ˆ G D − k ˆ G ′ = 1 | D k | (cid:16) T − k ˆ G D k ˆ G ′ (cid:17) = 1 | D k | ( T − k ) ˆ G  D k − " T − k T X t = k +1 H ′ g t g ′ t H  ˆ G ′ + 1 | D k | ( T − k ) ˆ G " T − k T X t = k +1 H ′ g t g ′ t H ˆ G ′ ≡ | D k | ( P + P ) . (56)The subsequent proof will be performed in two steps. Step 1 . When r − r >

0, we have max τ T ≤ k c w . p . a . or a constant c > N, T → ∞ , because C B = 0 and Σ f is positive deﬁnite according to Assumption 11 (i). As | Σ f | /ρ (Σ f ) > k − k ρ (cid:16) G C ′ Q k C G ′ (cid:17) ≥ c w.p.a.1 for a constant c > N, T → ∞ .For k − k being bounded, ρ (cid:16) C G ′ G C ′ (cid:17) ≥ ρ (cid:16) C g k g ′ k C ′ (cid:17) = ρ (cid:16) C Bf k f ′ k B ′ C ′ (cid:17) = f ′ k B ′ C ′ C Bf k > c (64)for a constant c > C Bf k = 0. Combining (62), (63), and (64), we have ρ ( G C ′ Q k C G ′ ) / ( k − k ) > c w.p.a.1 for a constant c >

0. Thus, we can obtain the lower bound for the RHS of (61) as1 √ k − k σ ( Q / k C H ′ ˆ G ′ ) ≥ | H | c / − O p ( N − / )1 k − k ρ ( ˆ G H C ′ Q k C H ′ ˆ G ′ ) ≥ c w . p . a . N, T → ∞ for a constant c >

0, because H has a nonsingular limit.Based on (56) and Weyl’s inequality, we have1 k − k ρ (cid:16) T − k ˆ G D − k ˆ G ′ (cid:17) ≥ | D k | ( k − k ) ρ r ( P ) + 1 | D k | ( k − k ) ρ ( P )= 1 | D k | ( k − k ) ρ r ( P ) + 1 | D k | ( T − k ) ρ ( ˆ G H C ′ Q k C H ′ ˆ G ′ ) k − k ≥ O p ( T − ) + c NT − k | {z } dominating term w . p . a . c > N, T → ∞ by (58) and (65), where the last line follows from proposition 1 and the O p ( T − ) term isuniform over τ T ≤ k < k . Step 2 . When r − r ≥ r = 0, the term P in (56) is zero. For the term P in (56), we haveˆ G D k ˆ G ′ = [ G H + ( ˆ G − G H )] D k [ H ′ G ′ + ( ˆ G ′ − H ′ G ′ )] . Using similar techniques to those in (60) and (61), we have1 √ k − k σ ( D / k ˆ G ′ ) ≥ √ k − k σ ( D / k H ′ G ′ ) − √ k − k σ ( D / k ( ˆ G ′ − H ′ G ′ )) ≥ √ k − k σ ( D / k H ′ G ′ ) − k D / k ( ˆ G ′ − H ′ G ′ ) k√ k − k = 1 √ k − k σ ( D / k H ′ G ′ ) − O p ( N − ( r − r ) / ) , (67)where the last line is based on the fact that1 k − k k D / k ( ˆ G ′ − H ′ G ′ ) k = 1 k − k tr [( ˆ G − G H ) D k ( ˆ G ′ − H ′ G ′ )] ≤ ρ ( D k ) tr (cid:20) ( ˆ G ′ − H ′ G ′ )( ˆ G − G H ) k − k (cid:21) = O p ( N − ( r − r ) )uniformly over τ T ≤ k < k under the condition N ∝ T , because of Lemma 1, and the fact that ρ ( D k ) = | D k | /ρ r ( D k ) = O p ( N − ( r − r ) ) O p ( N ) (68) y proposition 1.Now, it suﬃces to determine the lower bound of ρ ( G HD k H ′ G ′ ). Similar to (47), we have D / k U − H ′ C Σ / f = O p ( δ − NT ) , (69)uniformly over τ T ≤ k < k , where U = D / k (Σ / f C ′ H ) − . For t ≤ k , we have g t = Bf t ; thus, H ′ g t = H ′ Bf t = D / k U ε t + ( H ′ Bf t − D / k U ε t )and H ′ G ′ = D / k U E ′ + ( H ′ G ′ − D / k U E ′ ) , (70)where we set f t = Σ / f ε t with E ( ε t ε ′ t ) = I r and E ≡ [ ε k +1 , ..., ε k ] ′ .First, we consider the case in which k − k is bounded. 
Note that D k has r eigenvalues of order O p ( N − ( r − r ) ) and r − r eigenvalues of order O p ( N − ( r − r − ) by (68), and let v ( r × r ) and v ( r × ( r − r )) denote the corresponding eigenvectors.By proposition 1, for t ≤ k , ε ′ t U D / k D k D / k U ε t = | D k | ( ε ′ t U U ε t ) = O p ( N − ( r − r ) ) , (71)which implies that D / k U ε t lies in the space spanned by v . Thus, for t ≤ k , f t C ′ HD k H ′ Cf t = k D / k [( H ′ C Σ / f ε t − D / k U ε t ) + D / k U ε t ] k ≤ k D / k ( H ′ C Σ / f − D / k U ) ε t k + O p ( N − ( r − r ) ) ≤ ρ ( D k ) k ( H ′ C Σ / f − D / k U ) ε t k + O p ( N − ( r − r ) ) = O p ( N − ( r − r ) ) , (72)where the second line follows from (71) and the last line follows from (68) and (69) under the condition N ∝ T .To bound ρ ( G HD k H ′ G ′ ), we consider D / k H ′ g k = D / k [ H ′ Bf k − Proj( H ′ Bf k | H ′ C Σ / f )] + D / k Proj( H ′ Bf k | H ′ C Σ / f )= D / k [ H ′ Bf k − Proj( H ′ Bf k | D / k U ) + O p ( δ − NT )] + D / k Proj( H ′ Bf k | H ′ C Σ / f )= D / k [ H ′ Bf k − Proj( H ′ Bf k | D / k U )] + O p ( N − ( r − r ) / ) , (73)where Proj( A | Z ) denotes the projection of A onto the columns of Z , the second line follows from (69), and the O p ( N − ( r − r ) / )term in the third line follows from (68) and the fact that D / k Proj( H ′ Bf k | H ′ C Σ / f ) = O p ( N − ( r − r ) / ) by (72), becauseProj( H ′ Bf k | H ′ C Σ / f ) is a linear combination of H ′ C Σ / f columns. Under Assumption 11 (iii), according to which k Bf k − Proj( Bf k | C ) k ≥ d >

0, we have H ′ Bf k − Proj( H ′ Bf k | H ′ C Σ / f ) bounded away from zero. This implies that the term H ′ Bf k − Proj( H ′ Bf k | D / k U ) in the last line of (73) is also bounded away from zero and lies in the space spanned by v ,because it is, by design, orthogonal to D / k U , which lies in the space of v by (71). As v corresponds to the O p ( N − ( r − r − )eigenvalues of D k , we have ρ ( G HD k H ′ G ′ ) ≥ g ′ k HD k H ′ g k ≥ ρ r − r ( D k ) k H ′ g k k = | D k | ρ r +1 ( D k ) k H ′ g k k ≥ Nc U k H ′ g k k | D k | w.p.a.1 as N, T → ∞ by proposition 1. Thus, for k − k being bounded, ρ ( G HD k H ′ G ′ ) ≥ g ′ k HD k H ′ g k ≥ c N · | D k | (74) .p.a.1. for a constant c > N, T → ∞ under the condition N ∝ T .Second, we consider the case in which k − k → ∞ as N, T → ∞ . Using (70), we rewrite G HD k H ′ G ′ as G HD k H ′ G ′ = [( G H − E U ′ D / k ) + E U ′ D / k ] D k [ D / k U E ′ + ( H ′ G ′ − D / k U E ′ )] . Based on the same techniques as in (60) and (61), the decomposition in (70) implies σ ( D / k ( H ′ G ′ − D / k U E ′ )) ≤ σ ( D / k H ′ G ′ ) + σ ( − D / k D / k U E ′ ) , so we have 1 √ k − k σ ( D / k H ′ G ′ ) ≥ √ k − k σ ( D / k ( H ′ G ′ − D / k U E ′ )) − √ k − k σ ( D / k D / k U E ′ ) ≥ √ k − k σ ( D / k ( H ′ G ′ − D / k U E ′ )) − | D / k | σ ( U ) σ (cid:18) E ′ √ k − k (cid:19) = 1 √ k − k σ ( D / k ( H ′ G ′ − D / k U E ′ )) − O p ( N − ( r − r ) / ) O p (1) (75)where the second inequality is based on the fact that σ ( AB ) ≤ σ ( A ) σ ( B ) and the last line follows from the facts that | D k | is uniformly O p ( N − ( r − r ) ) by proposition 1, σ ( U ) ≤ k U k = O p (1) by the structure of U , and √ k − k σ ( E ′ ) ≤ p kEk / ( k − k ) = O p (1) uniformly over τ T ≤ k < k .Next, we need to determine the lower bound of ρ (( G H − E U D / k ) D k ( H ′ G ′ − D / k U E ′ )) / ( k − k ). 
From (69), wehave H ′ G ′ − D / k U E ′ = H ′ G ′ − H ′ C Σ / f E ′ + ( H ′ C Σ / f E ′ − D / k U E ′ )= H ′ G ′ − H ′ C F ′ + O p ( δ − NT ) E ′ = H ′ ( B − C ) F ′ + O p ( δ − NT ) E ′ (76)where F ′ ≡ [ f k +1 , ..., f k ] = Σ / f E ′ in the second line. Again, using the inequality in (60), we have σ ( D / k ( H ′ G ′ − H ′ C Σ / f E ′ )) ≤ σ ( D / k ( H ′ G ′ − D / k U E ′ )) + σ ( − D / k ( H ′ C Σ / f E ′ − D / k U E ′ )) . Thus, in combination with (76), we obtain σ ( D / k ( H ′ G ′ − D / k U E ′ )) ≥ σ ( D / k ( H ′ G ′ − H ′ C Σ / f E ′ )) − σ ( D / k ( H ′ C Σ / f E ′ − D / k U E ′ ))1 √ k − k σ ( D / k ( H ′ G ′ − D / k U E ′ )) ≥ √ k − k σ ( D / k ( H ′ ( B − C ) F ′ ) − √ k − k σ ( D / k O p ( δ − NT ) E ′ ) ≥ r k − k ρ ( D k ( H ′ ( B − C ) F ′ F ( B − C ) ′ H ) + O p ( N − ( r − r ) / ) ≥ " ρ ( D k ) ρ r H ′ ( B − C ) 1 k − k k X t = k +1 f t f ′ t ( B − C ) ′ H ! / + O p ( N − ( r − r ) / )= (cid:0) ρ ( D k )[ ρ r ( H ′ ( B − C )Σ f ( B − C ) ′ H ) + o p (1)] (cid:1) / | {z } leading term + O p ( N − ( r − r ) / ) ≥ ρ ( D k ) / c (77) .p.a.1 for a constant c > N, T → ∞ , where the O p ( N − ( r − r ) / ) term in the third line follows from (68) and thecondition N ∝ T , the last line follows from the fact that [ ρ r ( H ′ ( B − C )Σ f ( B − C ) ′ H )] / ≥ c > B − C = 0and Σ f are positive deﬁnite by Assumption 1, and the leading term in the last line is O p ( N − ( r − r − / ) by (68). Hence,combining (75) and (77) yields 1 √ k − k σ ( D / k H ′ G ′ ) ≥ ρ ( D k ) / c and1 k − k ρ ( G HD k H ′ G ′ ) ≥ | D k | ρ r ( D k ) c ≥ c c U N | D k | = c N · | D k | (78)w.p.a.1 for a constant c > N, T → ∞ .According to (67), (74), and (78), there exists a constant c > √ k − k σ ( D / k ˆ G ′ ) ≥ p c N · | D k | | {z } leading term − O p ( N − ( r − r ) / ) , w.p.a.1 as N, T → ∞ ; thus, we have 1 k − k ρ ( ˆ G D k ˆ G ′ ) ≥ c N · | D k | w . p . a . N, T → ∞ . 
Hence, 1 k − k ρ (cid:16) T − k ˆ G D − k ˆ G ′ (cid:17) = 1 | D k | T − k ρ (cid:0) ˆ G D k ˆ G ′ (cid:1) k − k ≥ c NT − k w . p . a . N, T → ∞ . Summarizing the results in (66) and (79), we obtain the lower bound of ρ (cid:0) T − k ˆ G D − k ˆ G ′ (cid:1) . Thus, steps 1 and2 are completed.Finally, using the lower bound of the largest eigenvalue of matrix T − k ˆ G D − k ˆ G ′ , we can rewrite (55) as | ˆΣ ( k ) | = | D k | · | I k − k + 1 T − k ˆ G D − k ˆ G ′ | ≥ | D k | h c (cid:16) k − kT − k (cid:17) · N i ≥ | D k | (cid:20) c ( k − k )(1 − τ ) T · N (cid:21) w . p . a . N, T → ∞ . Comparing | D k | and | ˆΣ | , we have | D k | − | ˆΣ | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) T − k T − k T − k T X t = k +1 ˆ g t ˆ g ′ t !(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) − (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) T − k T X t = k +1 ˆ g t ˆ g ′ t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) = h(cid:16) − k − kT − k (cid:17) r − i | ˆΣ | = (cid:20) − r ( k − k ) T − k + r ( r − (cid:16) k − kT − k (cid:17) + ... (cid:21) | ˆΣ | = − c ( k − k ) T − k | ˆΣ | for some positive constant c >

0. In addition, | D k | / | ˆΣ | = (cid:16) T − k T − k (cid:17) r ≥ (cid:16) − τ − τ (cid:17) r . Thus, | ˆΣ ( k ) | − | D k | + | D k | − | ˆΣ || ˆΣ | ≥ | D k || ˆΣ | c ( k − k )(1 − τ ) T · N − c ( k − k ) T − k w . p . a . ≥ (cid:16) − τ − τ (cid:17) r c ( k − k ) N (1 − τ ) T | {z } leading term − c ( k − k ) T − k w . p . a . , as N, T → ∞ , which implies the desired result under the condition N ∝ T . ✷ roof of Theorem 3 We ﬁrst prove the consistency of ˆ τ , then ˆ k − k = O p (1), and ﬁnally, ˆ k − k = o p (1). Again, it suﬃces to study the caseof k < k .To prove ˆ τ − τ = o p (1), we need to show that for any ε > η > P ( | ˆ τ − τ | > η ) < ε as N, T → ∞ . For anygiven 0 < η ≤ min( τ , − τ ), deﬁne D η = { k : ( τ − η ) T ≤ k ≤ ( τ + η ) T } and D cη as the complement of D η . Similar to theproof for the consistency of ˆ τ when B and C are nonsingular, we need to show that P (ˆ k ∈ D cη ) < ε . Recalling (13) and (15),we have P ( min k ∈ D cη ,k

0) = P ( min k ∈ D cη ,k

0) (80)(1). Consider the ﬁrst term kk − k log | ˆΣ ˆΣ − | . When Σ is of full rank, it follows thatmin k ∈ D cη ,k ηT .(2). For the second and third terms, let f ( ˆΣ , ˆΣ ) = min k ∈ D cη ,k

0, (84)implies that ρ i ( ˆΣ ˆΣ − ) is uniformly O p (1) and bounded away from zero; thus, (cid:12)(cid:12) T − kk − k r X i =1 log ρ i ( ˆΣ ˆΣ − ) (cid:12)(cid:12) = O p (1)uniformly over k ∈ D cη for k < k . In addition, we have ρ i ( ˆΣ ˆΣ − ) = O p ( T − ) uniformly over k for i = r + 1 , ..., r byproposition 1 when N ∝ T ; thus, log ρ i ( ˆΣ ˆΣ − ) → −∞ at the rate of log T for i = r + 1 , ..., r . Therefore, (83) can be ewritten as f ( ˆΣ , ˆΣ ) = − r X i =1 log ρ i ( ˆΣ ˆΣ − ) | {z } → + ∞ at the rate log T + O p (1) . When Σ is singular and Σ is singular or nonsingular, we can bound (83) by f ( ˆΣ , ˆΣ ) ≥ min k ∈ D cη ,k T by proposition 1 for i = 1 , ..., r − r ; thus, P ri =1 log ρ i ( ˆΣ − ) → + ∞ at the rate log T . In addition,when ρ (Σ ) >

0, we have ρ ( ˆΣ ) → p ρ (Σ ) >

0; thus, r log ρ ( ˆΣ ) = O p (1). When ρ (Σ ) = 0 (i.e., r = 0), we have ρ ( ˆΣ ) · T ≥ c > c > − r log ρ ( ˆΣ ) → + ∞ at the rate log T . For ρ r ( ˆΣ ), rearrangingthe terms in (84) yields ˆΣ = ( k − k ) Tk ( T − k ) ( k T Σ + T − k T Σ ) + kk T − k T − k Σ + o p (1)= ( k − k ) Tk ( T − k ) [ τ Σ + (1 − τ )Σ ] + kk T − k T − k Σ + o p (1) , where τ Σ + (1 − τ )Σ is a positive deﬁnite matrix under Assumption 11 (i). Thus, ρ r ( ˆΣ ) is O p (1) and bounded awayfrom zero w.p.a.1, and (cid:12)(cid:12) T − kk − k r log ρ r ( ˆΣ ) (cid:12)(cid:12) = O p (1)uniformly over k ∈ D cη for k < k . Combining the above results, we establish the following result: f ( ˆΣ , ˆΣ ) → + ∞ at therate log T . Together with (81) and (82), we havemin k ∈ D cη ,k , w.p.a.1; thus, P ( min k ∈ D cη ,k

0, and hence, ˆ τ → p τ .Next, we show that ˆ k − k = O p (1).Similar to the proof of Theorem 1, for given η and M , deﬁne D η,M = { k : ( τ − η ) T ≤ k ≤ ( τ + η ) T, | k − k | > M } ,such that P ( | ˆ k − k | > M ) = P (ˆ k ∈ D cη ) + P (ˆ k ∈ D η,M ). Hence, it suﬃces to show that for any ε > η >

0, there existsan

M > P (ˆ k ∈ D η,M ) < ε as ( N, T ) → ∞ . Similar to (20) and (80), it suﬃces to show that for any given ε > η >

0, there exists an

M > P ( min k ∈ D η,M ,k is singular,min k ∈ D η,M ,k N ∝ T and the fact that log | ˆΣ | = O p (1) because ˆΣ → p Σ is positive deﬁnite.(ii). When Σ is singular and Σ is either singular or positive deﬁnite, we havemin k ∈ D η,M ,k

0, because T − kk − k log( k − k )log( T ) > T − kk − k / log( Tk − k ) → ∞ .Thus, we have shown that the second and third terms dominate the ﬁrst term, and hence, (85) holds.To indicate the consistency of ˆ k , we will show that for any k < k and k − k ≤ M , the objective function V ( k ) = U ( k ) − U ( k ) diverges to inﬁnity as N, T → ∞ ; thus, the minimum U ( k ) cannot be achieved at a point other than k . For he given M , deﬁne D M = { k : | k − k | ≤ M } , thenmin k ∈ D M ,k U ( k ) → ∞ when k < k and k − k < M as N, T → ∞ . Thus, we prove the consistencyof ˆ k . ✷ References

Ahn, S. C. and Horenstein, A. R., (2013). Eigenvalue ratio test for the number of factors. Econometrica 81(3), pp. 1203–1227.

Ahn, S. C., Lee, Y. H. and Schmidt, P., (2013). Panel data models with multiple time-varying individual effects. Journal of Econometrics 174, pp. 1–14.

Amemiya, T., (1971). The estimation of the variances in a variance-components model. International Economic Review 12, pp. 1–13.

Bai, J., (1997). Estimation of a change point in multiple regression models. The Review of Economics and Statistics 4, pp. 551–563.

Bai, J. and Perron, P., (1998). Estimating and testing linear models with multiple structural change. Econometrica 64, pp. 47–78.

Bai, J., (2000). Vector autoregressive models with structural changes in regression coefficients and variance-covariance matrices. Annals of Economics and Finance 1, pp. 301–306.

Bai, J. and Ng, S., (2002). Determining the number of factors in approximate factor models. Econometrica 70, pp. 191–221.

Bai, J., (2003). Inferential theory for factor models of large dimensions. Econometrica 71, pp. 135–171.

Bai, J., (2010). Common breaks in means and variances for panel data. Journal of Econometrics 157, pp. 78–92.

Bai, J. and Han, X., (2016). Structural changes in high dimensional factor models. Front. Econ. China 11, pp. 9–39.

Bai, J., Han, X. and Shi, Y., (2020). Estimation and inference of change points in high-dimensional factor models. Journal of Econometrics 219, pp. 66–100.

Baltagi, B. H., Kao, C. and Wang, F., (2017). Identification and estimation of a large factor model with structural instability. Journal of Econometrics 197, pp. 87–100.

Baltagi, B. H., Kao, C. and Wang, F., (2020). Estimating and testing high dimensional factor models with multiple structural changes. Journal of Econometrics. https://doi.org/10.1016/j.jeconom.2020.04.005.

Barigozzi, M., Cho, H. and Fryzlewicz, P., (2018). Simultaneous multiple change-point and factor analysis for high-dimensional time series. Journal of Econometrics 206, pp. 187–225.

Bates, B., Plagborg-Moller, M., Stock, J. H. and Watson, M. W., (2013). Consistent factor estimation in dynamic factor models with structural instability. Journal of Econometrics 177, pp. 289–304.

Breitung, J. and Eickmeier, S., (2011). Testing for structural breaks in dynamic factor models. Journal of Econometrics 163, pp. 71–74.

Chen, L., (2015). Estimating the common break date in large factor models. Economics Letters 131, pp. 70–74.

Chen, L., Dolado, J. and Gonzalo, J., (2014). Detecting big structural breaks in large factor models. Journal of Econometrics 180, pp. 30–48.

Cheng, X., Liao, Z. and Schorfheide, F., (2016). Shrinkage estimation of high-dimensional factor models with structural instabilities. Review of Economic Studies 83, pp. 1511–1543.

Fan, J., Guo, J. and Zheng, S., (2020). Estimating number of factors by adjusted eigenvalues thresholding. Journal of the American Statistical Association. https://doi.org/10.1080/01621459.2020.1825448.

Han, X. and Inoue, A., (2015). Tests for parameter instability in dynamic factor models. Econometric Theory 31, pp. 1117–1152.

Kim, D., (2011). Estimating a common deterministic time trend break in large panels with cross sectional dependence. Journal of Econometrics 164, pp. 310–330.

Lange, K., (2010). Numerical Analysis for Statisticians. New York: Springer Verlag.

Ma, S. and Su, L., (2018). Estimation of large dimensional factor models with an unknown number of breaks. Journal of Econometrics 207(1), pp. 1–29.

McAlinn, K., Rockova, V. and Saha, E., (2018). Dynamic Sparse Factor Analysis. https://arxiv.org/abs/1812.04187.

Onatski, A., (2010). Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics 92(4), pp. 1004–1016.

Qu, Z. and Perron, P., (2007). Estimating and testing structural changes in multivariate regressions. Econometrica 75, pp. 459–502.

Stock, J. H. and Watson, M. W., (2008). Forecasting in dynamic factor models subject to structural instability. UK: Oxford University Press.

Stock, J. H. and Watson, M. W., (2012). Disentangling the Channels of the 2007-09 Recession. Brookings Papers on Economic Activity, pp. 81–156.
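
Numerical illustration (a sketch, not part of the original proofs). The singular case treated in Proposition 1 and Theorem 3 can be explored in a small simulation. The Python code below is a minimal sketch under stated assumptions: the objective is taken to be $U(k) = k\log|\hat\Sigma_1(k)| + (T-k)\log|\hat\Sigma_2(k)|$, which is consistent with the difference $U(k) - U(k_0)$ recalled in (12); the data-generating process (Gaussian factors, loadings and errors, and $C = \mathrm{diag}(1, 0)$) is chosen purely for illustration; and the function names `pca_factors` and `qml_objective` are hypothetical helpers, not taken from the paper.

```python
import numpy as np

def pca_factors(X, r):
    """Return sqrt(T) times the top-r eigenvectors of XX'/(NT), so that G'G/T = I_r."""
    T, N = X.shape
    eigval, eigvec = np.linalg.eigh(X @ X.T / (N * T))  # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:r]
    return np.sqrt(T) * eigvec[:, top]

def qml_objective(G_hat, k):
    """Assumed objective: U(k) = k log|S1(k)| + (T - k) log|S2(k)|."""
    T = G_hat.shape[0]
    S1 = G_hat[:k].T @ G_hat[:k] / k          # second moment of factors, t = 1..k
    S2 = G_hat[k:].T @ G_hat[k:] / (T - k)    # second moment of factors, t = k+1..T
    return k * np.linalg.slogdet(S1)[1] + (T - k) * np.linalg.slogdet(S2)[1]

# Illustrative design (an assumption, not the paper's simulation setup).
rng = np.random.default_rng(0)
N, T, k0, r = 200, 200, 100, 2
Lam = rng.standard_normal((N, r))
F = rng.standard_normal((T, r))
C = np.diag([1.0, 0.0])                       # singular post-break matrix (rank 1)
G = F.copy()                                  # g_t = f_t before the break (B = I)
G[k0:] = F[k0:] @ C.T                         # g_t = C f_t after the break
X = G @ Lam.T + rng.standard_normal((T, N))

G_hat = pca_factors(X, r)
ks = np.arange(int(0.1 * T), int(0.9 * T) + 1)    # trimmed search range
U = np.array([qml_objective(G_hat, k) for k in ks])
k_hat = int(ks[np.argmin(U)])

# Smallest eigenvalue of the post-break second-moment matrix evaluated at k0.
S2_at_break = G_hat[k0:].T @ G_hat[k0:] / (T - k0)
small_eig = float(np.linalg.eigvalsh(S2_at_break)[0])
print("k_hat =", k_hat, "| N * smallest eigenvalue =", N * small_eig)
```

In this design the smallest eigenvalue of $\hat\Sigma_2(k_0)$ is positive but small, in line with the $1/N$ order stated in Proposition 1, so the log-determinant in the objective is well defined. Rerunning with larger $N$ and inspecting `N * small_eig` gives a rough check that the product stays bounded.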