Oracle Efficient Estimation of Structural Breaks in Cointegrating Regressions
Karsten Schweikert∗ [Latest update: November 10, 2020]

Abstract
In this paper, we propose an adaptive group lasso procedure to efficiently estimate structural breaks in cointegrating regressions. It is well-known that the group lasso estimator is not simultaneously estimation consistent and model selection consistent in structural break settings. Hence, we use a first step group lasso estimation of a diverging number of breakpoint candidates to produce weights for a second adaptive group lasso estimation. We prove that parameter changes are estimated consistently by group lasso and show that the number of estimated breaks is greater than the true number but still sufficiently close to it. Then, we use these results and prove that the adaptive group lasso has oracle properties if weights are obtained from our first step estimation. Simulation results show that the proposed estimator delivers the expected results. An economic application to the long-run US money demand function demonstrates the practical importance of this methodology.
Keywords:
Adaptive Group Lasso; Change-points; Cointegration; Model Selection; US Money Demand
JEL Classification:
C22, C52
MSC Classification:

∗Address: University of Hohenheim, Core Facility Hohenheim & Institute of Economics, Schloss Hohenheim 1 C, 70593 Stuttgart, Germany, e-mail: [email protected]

Introduction
In this paper, we consider modelling cointegration relationships where the long-run equilibrium may differ for subsamples, thereby allowing for (multiple) structural breaks in the cointegrating regression. We assume that cointegration holds over some (fairly long) period of time, but then shifts to a new 'long-run' relationship. The number of breaks and their location are unknown to the researcher. Although coefficients of long-run equilibrium equations are relatively persistent by definition, accounting for the possibility of structural breaks is crucial in cointegration analysis, which usually involves long sample periods. On the one hand, long time series are needed to study the long-run behaviour of economic systems; on the other hand, employing long time series increases the likelihood of encountering structural change during the sample period. It is widely known that structural breaks, when present, can mask cointegrating relationships and render cointegration tests uninformative (Campos et al., 1996; Gregory et al., 1996; Qu, 2007). Hence, we propose a two-step approach to detect (multiple) structural breaks in cointegrating regressions using penalized regression techniques.

Since time series used for economic analyses have become very long in some instances, detecting (multiple) structural breaks has emerged as an important problem in the econometrics literature. For comprehensive surveys on structural breaks in time series models ('change-point' detection in the statistics literature or 'pattern recognition' in the context of signal processing), see, for example, Perron (2006), Aue and Horváth (2013) and Niu et al. (2015).
While classical structural break models for linear regressions attempt to detect one unknown break via a grid search procedure (Andrews, 1993), it is not feasible to use grid searches for the detection of multiple breaks because the computational cost increases exponentially with the presumed number of breaks (needing least squares operations of order $O(T^m)$ for $m$ breaks). Addressing this issue, Bai and Perron (1998, 2003) use dynamic programming techniques (henceforth Bai-Perron algorithm), requiring at most least squares operations of order $O(T^2)$ for any number of breaks, to add breaks sequentially to the model. Recently, several approaches have been proposed that reframe the task of detecting and estimating structural breaks as a model selection problem employing penalized regressions and related model selection techniques (Davis et al., 2006; Harchaoui and Lévy-Leduc, 2010; Jin et al., 2013; Chan et al., 2014; Ciuperca, 2014; Jin et al., 2016; Qian and Jia, 2016; Qian and Su, 2016; Behrendt and Schweikert, 2020). Instead of grid search procedures which augment linear regression models with parameter changes, model selection procedures take a top-down approach and try to shrink the set of all possible breakpoint candidates to contain only the true breakpoints. These approaches benefit from high computational efficiency and detect structural breaks with high accuracy.

The theory for (multiple) structural breaks in cointegrating regressions is not nearly as developed as the theory for change-points in the statistics and signal processing literature. Most studies are concerned with cointegration testing in the presence of structural instability. One of the most popular cointegration tests with an unknown breakpoint is the one proposed by Gregory and Hansen (1996a,b), in which the location of the break can be estimated via grid search at the minimum of the individual cointegration test statistics.
Hatemi-J (2008) extends the test to account for two breaks and Schweikert (2020) allows for the possibility of nonlinear adjustment to the long-run equilibrium. Maki (2012) employs a hybrid procedure.

Methodology
In the following, we specify a cointegrated system with multiple structural breaks at which it attains new equilibrium states. The cointegrated system does not deviate persistently from each equilibrium until the next break occurs and a new equilibrium is maintained.
Let $\{y_t\}_{t=1}^{\infty}$ denote a scalar process generated by
$$y_t = \sum_{j=1}^{m^0+1} \left[ \mu + \beta_j' X_t + u_t \right] \mathbf{1}\{t_{j-1} \le t < t_j\}, \quad t = 1, 2, \ldots, T, \qquad (1)$$
where $t_j$, $j \in \{0, 1, \ldots, m^0+1\}$, denote the breakpoints with $1 = t_0 < t_1 < \cdots < t_{m^0+1} = T+1$. $\mu$ is the intercept, $\beta_j = (\beta_{j1}, \beta_{j2}, \ldots, \beta_{jN})'$ are regime-dependent coefficients and $\{X_t\}_{t=1}^{\infty}$, where $X_t = (X_{1t}, X_{2t}, \ldots, X_{Nt})'$, follows an $N$-vector integrated process
$$X_t = X_{t-1} + v_t, \quad t = 1, 2, \ldots, T, \qquad (2)$$
where $X_0 = 0$. $\{u_t\}_{t=1}^{\infty}$ and $\{v_t\}_{t=1}^{\infty}$ are mean-zero weakly stationary error processes. For expositional simplicity, we restrict our analysis to cointegrating regressions with a constant intercept across regimes. We make the following assumptions about the vector process $w_t = (u_t, v_t')'$:

Assumption 1.
The vector process $\{w_t\}_{t=1}^{\infty}$ satisfies the following conditions:
(i) $Ew_t = 0$ for $t = 1, 2, \ldots$
(ii) $\{w_t\}_{t=1}^{\infty}$ is weakly stationary.
(iii) $\{w_t\}_{t=1}^{\infty}$ is strong mixing with mixing coefficients of size $-p\beta/(p-\beta)$ and $E|w_t|^p < \infty$ for some $p > \beta > 3/2$.

Further, we assume that the long-run covariance matrix $\Omega_v = \sum_{j=-\infty}^{\infty} E v_t v_{t-j}'$ is positive definite. Note that this specification of the process rules out integrated regressors with a deterministic drift component; relaxing this assumption would be relatively straightforward. Our main results generally hold if the coefficients of included deterministic components, e.g. linear and quadratic trend terms, do not change over the sampling period. A discussion of breaks in the intercept is included in the supplementary material to this paper. The assumption that $\Omega_v$ is positive definite implies that $X_t$ is not itself cointegrated. We denote the number of structural breaks by $m$. While the true number of structural breaks $m^0$ is unknown, we assume that the maximum number of structural breaks $m^*$ is known to the researcher. The estimated number of breakpoints is denoted by $\hat{m}$. The locations of breakpoints relative to sample size, so-called break fractions, are denoted by $\tau_j = t_j/T$, $j \in \{1, 2, \ldots, m+1\}$.

Throughout this paper, we use the following notation to present our main results: let $y_T = (y_1, y_2, \ldots, y_T)'$ denote the vector containing $T$ observations of our response variable and $u_T = (u_1, u_2, \ldots, u_T)'$ denote the error term vector. The vector of $T$ observations for the $N$-dimensional variable $X_t$ is denoted by $X = (X_1, \ldots, X_T)'$. Our design matrix $Z_T$ is a $T \times TN$ matrix defined by
$$Z_T = \begin{pmatrix} X_1' & 0 & \cdots & 0 \\ X_2' & X_2' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ X_T' & X_T' & \cdots & X_T' \end{pmatrix}, \qquad (3)$$
and we define the Gram matrix $\Sigma = Z_T' Z_T / T$. Adjacent columns of $Z_T$ differ only by one entry, which means that the columns are almost identical for $T \to \infty$. Consequently, $\Sigma$ does not converge to a positive definite asymptotic counterpart.
It follows that the restricted eigenvalue condition (Bickel et al., 2009) does not hold and we cannot establish our consistency proofs based on this assumption. See Chan et al. (2014) for a thorough discussion of this issue.

We set $\theta_1 = \beta_1$ and
$$\theta_i = \begin{cases} \beta_{j+1} - \beta_j, & \text{if } i = t_j, \\ 0, & \text{otherwise}, \end{cases} \qquad (4)$$
for $i = 2, \ldots, T$. For the remainder of this article, $\theta_i = 0$ means that $\theta_i$ has all entries equal to zero and $\theta_i \neq 0$ means that $\theta_i$ has at least one non-zero entry. The coefficient vector $\theta(T) = (\theta_1', \theta_2', \ldots, \theta_T')'$ is of length $TN$ and contains all time-specific parameter changes. Because we treat structural breaks as rare events and assume that parameter changes persist for some time, the number of non-zero elements in $\theta(T)$ is assumed to be small, i.e. smaller than $m^*+1$ groups of size $N$.

We denote the true value of a parameter with a 0 superscript. $\{\tau_j^0, j = 1, \ldots, m^0\}$ denotes the set of true break fractions and $\beta_j^0$, $j = 1, \ldots, m^0+1$, defines the true coefficient of the $j$-th regime. For technical reasons, we additionally set $\beta_0 = 0$. We define the index sets $\bar{\mathcal{A}} = \{1 \le i \le T : \theta_i^0 \neq 0\}$ denoting the indices of truly non-zero coefficients (including the baseline coefficient) and $\mathcal{A}^0 = \{i \ge 2 : \theta_i^0 \neq 0\}$ denoting the non-zero parameter changes. The index set obtained from our first step estimation belonging to all estimated non-zero parameter changes is denoted by $\mathcal{A}_T = \{i \ge 2 : \tilde{\theta}_i \neq 0\}$. We note that the first regime's coefficient (before the first breakpoint) is not allowed to be zero. Since we indicate breakpoints with non-zero coefficients in our penalized regression approach, the set $\mathcal{A}^0 = \{t_1^0, t_2^0, \ldots, t_{m^0}^0\}$ is also used to denote true breakpoints. Similarly, the set $\mathcal{A}_T = \{\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_{\hat{m}}\}$ denotes estimated breakpoints, i.e., indices of those coefficients which are estimated to be non-zero. $|\mathcal{A}|$ denotes the cardinality of the set $\mathcal{A}$ and $\mathcal{A}^c$ denotes the complementary set. We use these sets to index rows and columns of vectors and matrices.
For example, let $Z_{T,\mathcal{A}}$, $Z_{T,\mathcal{A}^c}$ contain the columns of $Z_T$ and $\theta_{\mathcal{A}}(T)$, $\theta_{\mathcal{A}^c}(T)$ contain the rows of $\theta(T)$ associated with active and inactive breakpoints, respectively.

For notational convenience, we use '$\Rightarrow$' to signify weak convergence of the associated probability measures and $\overset{p}{\to}$ to denote convergence in probability. Continuous stochastic processes such as a Brownian motion $B(s)$ on $[0,1]$ are simply written as $B$ if no confusion is caused. We also write integrals with respect to the Lebesgue measure such as $\int_0^1 B(s)\,ds$ simply as $\int B$. Throughout the paper, several (distinct) large constants are all denoted by $C$, while small constants are denoted by $\epsilon$.

Using these definitions, our cointegration model described in Equation (1) can be expressed as a high-dimensional regression model in matrix form
$$y_T = Z_T \theta(T) + u_T. \qquad (5)$$
Since only $m^0+1$ groups within $\theta(T)$ are truly non-zero, we need to obtain a sparse solution to the high-dimensional regression problem in Equation (5). This means we frame the detection of structural breaks as a model selection problem and use available methods from this strand of the literature. While the cointegrating vector $(1, 0')$ in principle ensures that $y_t = \mu + u_t$ is stationary under our assumptions, we exclude this case to simplify our exposition. In the following, we need a clear distinction between zero and non-zero coefficients to decide whether their indices belong into the sets $\bar{\mathcal{A}}$ or $\bar{\mathcal{A}}^c$. Allowing zero baseline coefficients would require several case-by-case considerations. To reduce the dimensionality of the estimation problem, we minimize the objective function
$$Q^*(\theta(T)) = \frac{1}{T} \left\| y_T - Z_T \theta(T) \right\|^2 + \lambda_T \sum_{i=1}^{T} \|\theta_i\|, \qquad (6)$$
to obtain the group lasso estimator for $\theta(T)$, which is henceforth denoted by $\tilde{\theta}(T) = \arg\min_{\theta(T)} Q^*$. $\lambda_T$ is the tuning parameter and $\|\cdot\|$ denotes the $L_2$-norm.
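To make the construction concrete, the following sketch builds the design matrix of Equation (3) and minimizes the objective in Equation (6) by proximal gradient descent with block soft-thresholding. This is a simple stand-in for the group LARS algorithm actually used in the paper; all function names, step-size choices and tuning values are our own illustrative assumptions.

```python
import numpy as np

def build_design(X):
    """Build the T x (T*N) break-detection design matrix Z_T of Equation (3):
    row t stacks X_t' in blocks 1..t and zeros elsewhere, so block i of theta
    captures the coefficient change occurring at time i."""
    T, N = X.shape
    Z = np.zeros((T, T * N))
    for t in range(T):
        for i in range(t + 1):
            Z[t, i * N:(i + 1) * N] = X[t]
    return Z

def group_lasso(y, Z, N, lam, n_iter=1000):
    """Minimize (1/T)||y - Z theta||^2 + lam * sum_i ||theta_i|| by proximal
    gradient descent (block soft-thresholding), a stand-in for group LARS."""
    T = len(y)
    theta = np.zeros(Z.shape[1])
    step = 1.0 / (2.0 * np.linalg.norm(Z, 2) ** 2 / T)  # 1/L for the smooth part
    for _ in range(n_iter):
        grad = -2.0 / T * Z.T @ (y - Z @ theta)
        theta = theta - step * grad
        for i in range(T):                      # block soft-thresholding
            g = theta[i * N:(i + 1) * N]
            norm = np.linalg.norm(g)
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            theta[i * N:(i + 1) * N] = shrink * g
    return theta.reshape(T, N)
```

Indices $i \ge 2$ whose estimated block is non-zero are the breakpoint candidates; the cumulative sum of the blocks recovers the implied time-varying coefficient path.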
Unfortunately, the group lasso estimator inherits the same problems, namely estimation inefficiency and model selection inconsistency, as the plain lasso estimator. Similar to the idea first presented in Zou (2006), we reestimate the objective function with individual coefficient weights to alleviate this problem and to try to reduce the number of falsely detected breaks. The statistical properties of adaptive group lasso estimators for a fixed number of groups are investigated in Wang and Leng (2008). Since we have a diverging set of breakpoint candidates, least squares estimation of the full model is not feasible. However, we show that group lasso is a consistent estimator for non-zero parameter changes, giving us appropriate weights for a second step adaptive group lasso estimation. This approach is similar to the ideas put forth in Wei and Huang (2010), Horowitz and Huang (2013), Schmidt and Schweikert (2019), and Behrendt and Schweikert (2020).

As will be demonstrated later, the group lasso estimator only slightly overselects breaks under the right tuning. The algorithm employed to estimate $\tilde{\theta}(T)$ allows us to pre-specify the maximum number of breakpoint candidates $M$, i.e. the maximum number of non-zero groups in $\tilde{\theta}(T)$, and the minimum distance between breaks. Since the group lasso overselects breaks in the first step, $M$ should be set large enough to encompass all true breakpoints and some additional falsely selected non-zero groups. This condition guarantees that $\tilde{\theta}(T)$ always contains $MN$ elements. In turn,
$TN - MN$ columns of $Z_T$ corresponding to zero coefficients are eliminated during the first step, resulting in the $T \times MN$ design matrix $Z_S$. Hence, for given $M \ll T$, the column size of the new design matrix is substantially smaller than the original size $TN$ and no longer depends on the sample size. This allows us to further assume that all eigenvalues of $\Sigma_S = Z_S' Z_S / T$ are contained in the interval $[c_*, c^*]$, where $c_*$ and $c^*$ are two positive constants. This means that we can relate to a restricted eigenvalue condition similar to Bickel et al. (2009) for the second step estimation. While the restricted eigenvalue condition in general does not hold for change-point settings, the dimension reduction of the first step allows us to postulate this assumption for our reduced design matrix. It should be noted that our assumption for the second step estimation is not restrictive for empirical applications because the notion of a long-run equilibrium relationship implies a maximum number of breaks and a minimum regime length. A minimum regime length is further justified by the minimum subsample size needed to precisely estimate parameter changes. Consequently, $M$ should be chosen so that the average regime length in case of equidistantly-spaced breaks still guarantees enough observations per regime to estimate all coefficient changes.

We follow Wang and Leng (2008) and define the adaptive group lasso objective function
$$Q(\theta_S) = \frac{1}{T} \left\| y_T - Z_S \theta_S \right\|^2 + \lambda_S \sum_{i=1}^{M} w_i \|\theta_{S,i}\|, \qquad (7)$$
where $\gamma > 0$ and the group-specific weights $w_i$ are assigned as follows:
$$w_i = \begin{cases} \|\tilde{\theta}_{S,i}\|^{-\gamma} & \text{if } \tilde{\theta}_{S,i} \neq 0, \\ \infty & \text{if } \tilde{\theta}_{S,i} = 0, \end{cases}$$
and we set $0 \times \infty = 0$. $\tilde{\theta}_{S,i}$, $i = 1, \ldots, |\mathcal{A}_T|+1$, denotes the non-zero group lasso coefficient estimates obtained from optimizing the objective function in Equation (6).
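The weighting scheme can be sketched as follows (the function name is ours); groups estimated as zero in the first step receive an infinite weight and are thereby excluded from the second step:

```python
import numpy as np

def adaptive_weights(theta_tilde, gamma=1.0):
    """Group-specific weights of Equation (7): w_i = ||theta_tilde_i||^(-gamma)
    for groups kept by the first step group lasso; groups estimated as zero
    receive an infinite weight (with the convention 0 * inf = 0)."""
    norms = np.linalg.norm(theta_tilde, axis=1)       # one L2-norm per group
    with np.errstate(divide="ignore"):
        return np.where(norms > 0, norms ** (-gamma), np.inf)
```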
The remaining $M - |\mathcal{A}_T| - 1$ groups of $\theta_S$ can be filled with zero groups as long as their selected indices lead to $\Sigma_S$ being a positive definite matrix for all $T$. We denote the estimator minimizing $Q(\theta_S)$ by $\hat{\theta}_S = \arg\min_{\theta_S} Q$. The weight of the first coefficient is usually set to zero to ensure that the system is cointegrated with a cointegrating vector different from $(1, 0')$ if no structural break occurs. Eliminating columns from the initial design matrix requires a mapping of our second step indices to recover the original indices. For notational convenience, we use the mapping $g: \mathbb{N} \to \mathbb{N}$, $i \mapsto g(i) = t_i$, where $t_i$ is the breakpoint corresponding to the index $i$, for this purpose and define the index set $\bar{\mathcal{A}}^*$ ($\mathcal{A}^*$) to pick out the elements that correspond to truly non-zero coefficients (parameter changes).

We note that the major computing cost comes from the first step group lasso estimation considering a large number of observations as potential breakpoints. The second step represents a marginal addition to the total computing time if the first step estimation was sufficiently successful in eliminating inactive breakpoint candidates. The interested reader may consult Chan et al. (2014) for a detailed discussion of computational complexity in this context.

In the following, we study the asymptotic properties of our adaptive group lasso estimator. To discuss asymptotic properties, we need to impose some further assumptions about the location and magnitude of active breakpoints.
Assumption 2.
(i) $I_{\min} = \min_{1 \le j \le m^0+1} |t_j^0 - t_{j-1}^0| > \zeta T$ for some $\zeta > 0$, where $I_{\min}$ is the minimum break interval.
(ii) The break magnitudes are bounded to satisfy $m_\beta = \min_{2 \le j \le m^0+1} \|\beta_j^0 - \beta_{j-1}^0\| > \nu$ for some $\nu > 0$ and $M_\beta = \max_{2 \le j \le m^0+1} \|\beta_j^0 - \beta_{j-1}^0\| < \infty$.
(iii) There exists a constant $C > 0$ such that
$$m' \Sigma m \ge C \sum_{j \in \bar{\mathcal{A}}} \|m_j\|^2,$$
for all $TN \times 1$ vectors $m = (m_1', m_2', \ldots, m_T')'$ whenever $\sum_{j \in \bar{\mathcal{A}}^c} \|m_j\| \le \sum_{j \in \bar{\mathcal{A}}} \|m_j\|$.

Assumption 2(i) requires that the length of the regimes between breaks increases with the sample size and in the same proportions to each other. This allows us to consistently detect and estimate the true break fractions as it makes the break dates asymptotically distinct (Perron, 2006). The first inequality of Assumption 2(ii) is a necessary condition to ensure that a structural break occurs at $t_j^0$. We do not consider small breaks with local-to-zero behaviour in this setting (see Bai et al. (1998) for assumptions used in this context). This assumption is not believed to be restrictive for the intended empirical applications where applied researchers aim to estimate the long-run equilibrium to obtain the error correction term (the residuals) for their follow-up analysis. Essentially, they need optimal in-sample forecasts in terms of mean squared error. While the Bai-Perron algorithm needs at most $O(T^2)$ operations, the group LARS algorithm used to solve Equation (6) has a computational burden of order $O(M^3 + M^2 T)$. Hence, the group LARS algorithm has a stronger dependence on the maximum number of breaks, whereas the Bai-Perron algorithm only depends on sample size. This implies that the Bai-Perron algorithm is better suited for small to moderate samples with a potentially large number of breaks, often found in linear regressions modelling short-run relationships.
Instead, the group LARS algorithm is well-suited for large sample sizes and a small to moderate number of structural breaks, which is often found for long-run relationships in the presence of structural change. It follows from Assumption 2(iii) that the smallest eigenvalue of $\Sigma_{\bar{\mathcal{A}}}$ is greater than or equal to $C$ by letting $m_j = 0$ for $j \in \bar{\mathcal{A}}^c$. Consequently, Assumption 2(iii) ensures that $\Sigma_{\bar{\mathcal{A}}}$ is positive definite for all $T$. This is only the case if $Z_{T,\bar{\mathcal{A}}}$ contains columns which are sufficiently distinct, which in turn means that the intervals between breaks need to be sufficiently large for all $T$. It is important to note that we need Assumption 2(iii) exclusively for the first step estimation. Our second step estimation requires only Assumption 2(i) and (ii) as long as consistent weights are available.

First, we need to show that the initial estimator provides consistent weights for the second step adaptive lasso procedure (Huang et al., 2008). The following theorem provides a consistency result for the group lasso estimator in cointegrating regressions with (possibly) multiple structural breaks.

Theorem 1.
Under Assumption 1 and Assumption 2, if $\lambda_T = 2NcT^{\delta}$ for some $c > 0$ and $1/2 < \delta < 1$, then there exists some $C > 0$ such that, with probability greater than $1 - Cc^{-2}T^{1-2\delta}$,
$$\|\tilde{\theta}(T) - \theta^0(T)\| \le T^{-(1-\delta)} \sqrt{\frac{Nc(m^0+1)M_\beta}{C}}.$$
Remark 1.
The specification of $\lambda_T$ implies that $\lambda_T \to \infty$ for $T \to \infty$. This means we have to apply a stricter penalty for increasing sample sizes to discard a larger set of inactive candidate breaks searching for a fixed number of $m^0$ active breaks. On the other hand, $\lambda_T$ fulfils the condition $\lambda_T/T \to 0$. Since the convergence rate in Theorem 1 deteriorates with $\delta$, it is useful to employ a selection rule for $\lambda_T$ where $\delta$ is small.

Remark 2.
Given that $\lambda_T$ is set optimally such that $\delta$ is only slightly above $1/2$, the convergence rate of our first step group lasso estimator is slightly slower than $T^{1/2}$. This means that we lose a substantial portion of the convergence rate, which is $T$ for fixed breaks under complete information on their location. The reduced convergence rate can be considered the cost for an estimator which is robust against (multiple) structural breaks with unknown location. For comparison, the convergence rate for mean shifts in white noise processes reported in Harchaoui and Lévy-Leduc (2010) is $(T/\log T)^{1/2}$. Instead, Chan et al. (2014) find that the parameter changes in piecewise stationary autoregressive processes have a faster convergence rate which amounts to $\sqrt{T/\log T}$, but this result is based on white-noise assumptions for the error term process.

Theorem 1 shows that it is crucial to let the tuning parameter $\lambda_T$ grow at the right rate. However, this rate provides only limited practical guidance towards the choice of $\lambda_T$. We follow Kock (2016), Qian and Su (2016) and Schmidt and Schweikert (2019) and propose to select $\lambda_T$ by minimizing an information criterion of the form
$$IC^*(\lambda_T) = \log(SSR/T) + \rho_T |\mathcal{A}_T|, \qquad (8)$$
where $SSR$ is the sum of squared residuals resulting from the group lasso estimation of Equation (6) and $|\mathcal{A}_T|$ gives the number of non-zero breakpoint candidates. The penalty function $\rho_T$ allows for different choices. While Kock (2016) suggests to use the BIC for potentially nonstationary autoregressive models, which corresponds to $\rho_T = \log(T)/T$, Qian and Su (2016) propose to use $\rho_T = 1/\sqrt{T}$ for the estimation of structural breaks in stationary time series regressions. In this paper, we follow Schmidt and Schweikert (2019) and employ a modified BIC according to Wang et al. (2009) which incorporates the additional factor $\log\log d_T^*$, where $d_T^*$ denotes the total number of coefficients in the full model.
This modification of the BIC accounts for the fact that the true model must be found in situations where the number of coefficients diverges.

For the next theorem, we temporarily assume that the exact number of breaks is known. This assumption will help us to provide an important consistency result for the estimated location of breakpoints. We note that this temporary assumption will be relaxed for our main results.

Theorem 2.
Under Assumption 1 and Assumption 2, if $m^0$ is fixed and $|\mathcal{A}_T| = m^0$, then for all $\epsilon > 0$,
$$P\left( \max_{1 \le j \le m^0} |\hat{t}_j - t_j^0| \le T\epsilon \right) \to 1, \quad \text{as } T \to \infty.$$

Remark 3.
Dividing by $T$ on both sides of the inequality in Theorem 2 shows that each break fraction can be detected within an $\epsilon$-neighbourhood of its true location. Hence, the convergence rate is similar to the one found in Davis et al. (2006), who use identical assumptions on the minimum break interval. Harchaoui and Lévy-Leduc (2010), allowing for a maximum number of location shifts in white noise processes, report a slightly faster convergence rate. Similarly, Chan et al. (2014) apply group lasso to piecewise stationary autoregressive processes with a potentially diverging number of true breakpoints and report the nearly optimal convergence rate $\log T / T$ if errors are Gaussian.

The previous result is an important building block for our main results. Next, we prove that the group lasso estimator yields a set of estimated breakpoints for which the number of selected breaks is greater than the true number of breaks almost surely when the exact number of breakpoints is unknown. Further, we evaluate the consistency of estimated breakpoints using the Hausdorff distance between the set of estimated breakpoints and the set of true breakpoints. We follow Boysen et al. (2009) and define
$$d_H(A, B) = \max_{b \in B} \min_{a \in A} |b - a|, \quad \text{with } d_H(A, \emptyset) = d_H(\emptyset, B) = 1,$$
where $\emptyset$ is the empty set. The following theorem shows that the set of estimated breakpoints converges to the set of true breakpoints under the Hausdorff distance.

Theorem 3.
If Assumption 1 and Assumption 2 hold, then as $T \to \infty$,
$$P(|\mathcal{A}_T| \ge m^0) \to 1,$$
and for all $\epsilon > 0$,
$$P(d_H(\mathcal{A}_T, \mathcal{A}^0) \le T\epsilon) \to 1.$$

Remark 4.
The first part of Theorem 3 yields the familiar result that the group lasso estimator is not model selection consistent in settings where the restricted eigenvalue condition (Bickel et al., 2009) or the irrepresentable condition (Meinshausen and Bühlmann, 2006; Zhao and Yu, 2006) do not hold for the full design matrix: the estimator tends to overselect breakpoints. We note that Assumption 2(iii) is slightly different from the restricted eigenvalue condition used in Bickel et al. (2009) and restricts only the design submatrix generated from columns containing active breakpoints. This result shows that we do not systematically select too few breaks, which is crucial for the intended second step estimation using weights obtained by group lasso estimation. Ignored breaks would directly result in infinite weights for the second step, which would mean that these breaks could not be recovered.
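The one-sided Hausdorff distance used in Theorem 3 (following Boysen et al., 2009) is straightforward to compute; a minimal sketch with an illustrative function name:

```python
def hausdorff(A, B):
    """One-sided Hausdorff distance d_H(A, B) = max_{b in B} min_{a in A} |b - a|,
    with the convention d_H(A, {}) = d_H({}, B) = 1 (Boysen et al., 2009)."""
    if not A or not B:
        return 1
    return max(min(abs(b - a) for a in A) for b in B)
```

For example, `hausdorff([10, 50], [12, 48])` returns 2: every true breakpoint in the second set lies within two observations of some estimated breakpoint in the first.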
Remark 5.
The second part of Theorem 3 implies that the Hausdorff distance from the set of estimated breakpoints to the true breakpoints diverges slower than the sample size. Consequently, the Hausdorff distance as a percentage of the sample size is bounded by a constant. This provides us with a consistency result for the estimated break fractions and gives us justification to consider multiple structural breaks at once, since the Hausdorff distance evaluates the joint location of all breakpoints.

Finally, we consider the asymptotic properties of the adaptive group lasso estimator with weights obtained from our first step estimation. We note that Theorem 3 allows us to bound the number of breakpoint candidates by a constant. Hence, the dimensionality of the model selection problem no longer depends on the sample size.
Theorem 4.
If Assumption 1 and Assumption 2 hold, $\lambda_S \to 0$ and $\lambda_S T^{(1-\delta)\gamma} \to \infty$ for $1/2 < \delta < 1$ and $\gamma > 0$, then
(a) Consistency: $\|\hat{\theta}_S - \theta_S^0\| = O_p(T^{-1})$.
(b) Model Selection: $P\left(g(\{j \ge 2 : \|\hat{\theta}_{S,j}\| \neq 0\}) = \mathcal{A}^0\right) \to 1$.
(c) Asymptotic distribution:
$$T(\hat{\theta}_{S,\bar{\mathcal{A}}^*} - \theta^0_{S,\bar{\mathcal{A}}^*}) \Rightarrow \left( \int B_{\tau,\bar{\mathcal{A}}^*} B_{\tau,\bar{\mathcal{A}}^*}' \right)^{-1} \left( \int B_{\tau,\bar{\mathcal{A}}^*}\, dU + \Lambda^*_{\bar{\mathcal{A}}^*} \right), \qquad (9)$$
where
$$\Lambda^*_{\bar{\mathcal{A}}^*} = \left[ \Lambda', (1-\tau_1)\Lambda', \ldots, (1-\tau_{m^0})\Lambda' \right]', \qquad \Lambda = \sum_{t=0}^{\infty} E(v_0 u_t'),$$
and $B_{\tau,\bar{\mathcal{A}}^*}$ and $U$ are defined in the proof.

Remark 6.
Although the second step tuning parameter $\lambda_S$ can be chosen by a selection rule independent of the first step tuning parameter $\lambda_T$, its value depends on $\delta$, i.e. how effectively additional coefficients are penalized in the first step and, consequently, how many truly inactive breakpoint candidates remain in our second step design matrix. Since the number of parameters in the full model can now be limited by a pre-specified maximum number of breaks, we suggest using an information criterion like the BIC, which has performed quite well in our simulation experiments.

Remark 7.
Combining parts (a) to (c) of Theorem 4 shows that the adaptive group lasso estimator has oracle properties. This means that the adaptive group lasso performs correct model selection and has the same asymptotic distribution as the least squares estimator if the breaks' location had been known beforehand. Since our regression involves nonstationary components, the asymptotic distribution of the least squares estimator is naturally given as a functional of Brownian motions. Schmidt and Schweikert (2019) use the term 'nonstandard oracle property' to distinguish it from the term used in Fan and Li (2001). The asymptotic bias term $\Lambda$, originating from the dependency between increments of the regressors and the error term of the cointegrating regression, can be eliminated using dynamic augmentation according to Saikkonen (1991) and Stock and Watson (1993).
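A minimal sketch of such a leads-and-lags augmentation: the regression is augmented with leads and lags of the first differences of the regressors, which removes the second-order bias term. The function name, interface, and the handling of the sample endpoints are our own illustrative choices.

```python
import numpy as np

def dols_augment(X, l=1):
    """Dynamic augmentation in the spirit of Saikkonen (1991) and Stock and
    Watson (1993): return the trimmed regressor matrix together with l leads
    and l lags of Delta X_t; the first and last l observations are dropped."""
    T, N = X.shape
    dX = np.vstack([np.zeros((1, N)), np.diff(X, axis=0)])  # dX[t] = X[t] - X[t-1]
    rows = np.arange(l, T - l)                              # usable observations
    aug = np.hstack([dX[rows + k] for k in range(-l, l + 1)])
    return X[rows], aug, rows
```

The augmentation terms `aug` are simply appended as extra regressors; their coefficients are nuisance parameters and are discarded after estimation.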
Remark 8.
It is notable that our estimator has nonstandard oracle properties although the convergence rate of the group lasso estimator is slower than $T^{1/2}$. Zou (2006) argues that the convergence rate of the initial estimator is allowed to be substantially slower than the desired convergence rate of the adaptive lasso estimator if the tuning parameter is specified accordingly.

Monte Carlo Simulations

In this section, we conduct simulation experiments to assess the adequacy of our technical results in Section 2. We investigate the finite sample performance of our adaptive group lasso procedure with respect to the accuracy in finding the exact number of breaks, their location and the magnitude of parameter changes. We consider model specifications with one, two and four breakpoints, respectively. The following DGP is employed to model a multivariate cointegrated system with multiple structural breaks,
$$y_t = \mu + \beta_t' X_t + \vartheta_t, \quad \vartheta_t \sim N(0, \sigma_\vartheta^2), \qquad X_t = X_{t-1} + \omega_t, \quad \omega_t \sim N(0, \Sigma), \qquad (10)$$
where $X_t = (X_{1t}, X_{2t}, \ldots, X_{Nt})'$ and $\Sigma = \mathrm{diag}(\sigma_\omega^2)$, i.e. the innovations of our generated random walk processes have identical normal distributions. $\mu$ is a non-zero intercept and $\beta_t = (\beta_{1t}, \beta_{2t}, \ldots, \beta_{Nt})'$ is a time-varying slope coefficient vector with non-zero baseline value and a finite number of breaks. We note that $\mathrm{cov}(\vartheta_t, \omega_t) = 0$, i.e. our regressors are strictly exogenous and the asymptotic bias reported in Theorem 4 is non-existent.

Naturally, the ability of all structural break estimators to detect breaks depends on the overall signal strength. Niu et al. (2015) define signal strength in change-point models by $S = m_\beta I_{\min}$, where $I_{\min} = \min_{1 \le j \le m^0+1} |t_j^0 - t_{j-1}^0|$ is the minimum distance between breaks and $m_\beta = \min_{2 \le j \le m^0+1} \|\beta_j^0 - \beta_{j-1}^0\|$ is the minimum jump size. For our main simulations concerned with consistency of the adaptive group lasso estimator, we use equal jump sizes for multiple breaks and locate the breaks with equidistant spacing between them.
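A minimal sketch of this DGP (Equation (10)); the default parameter values and the function interface are illustrative choices, not the paper's exact configuration:

```python
import numpy as np

def simulate_dgp(T=200, N=2, breaks=(0.5,), jump=2.0, mu=1.0, base=2.0,
                 sigma_theta=2.0, sigma_omega=1.0, seed=0):
    """Simulate the cointegrated system of Equation (10): random-walk
    regressors and a time-varying slope vector with equal jumps at the given
    break fractions; each jump has Euclidean norm `jump`."""
    rng = np.random.default_rng(seed)
    X = np.cumsum(rng.normal(0.0, sigma_omega, (T, N)), axis=0)
    beta = np.full((T, N), base)
    for tau in breaks:                              # shift all coefficients
        beta[int(tau * T):] += jump / np.sqrt(N)
    y = mu + np.sum(beta * X, axis=1) + rng.normal(0.0, sigma_theta, T)
    return y, X, beta
```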
Hence, overall signal strength is a linear function of the sample size in our simulations. We use a baseline value of two and a jump size of two, which is equal to the standard deviation of the regression error term. Simulations with a better signal-to-noise ratio yield more precise estimates for all sample sizes.

In Table 1, we report our results for $N = 2$ regressors. We specify our model for one break located at $\tau = 0.5$, two breaks at $\tau = (0.33, 0.67)$ and four breaks at $\tau = (0.2, 0.4, 0.6, 0.8)$ to have an equidistant spacing on the unit interval. We first compute the percentages of correct estimation (pce) of the number of breaks $m^0$ and measure the accuracy of the break date estimation conditional on the correct estimation of $m^0$. For this matter, we compute the average Hausdorff distance and divide it by $T$ (hd/T) to compare the values across different sample sizes. The corresponding figures in our tables are reported in percentages. As $T$ grows larger, the number of breaks is detected with increasing precision and the distance between estimated breakpoints and true breakpoints declines to nearly zero. Parameter estimates are already very accurate at small sample sizes. As expected, the parameter changes of models with fewer breakpoints can be estimated more precisely than those of models with a larger number of breakpoints, as indicated by larger standard deviations obtained for the latter at all sample sizes. Comparing these results with those obtained for the Bai-Perron algorithm, where the number of breaks is determined via the BIC, we find that both approaches perform similarly well. The results are reported in Table 2. While the Bai-Perron algorithm estimates the true break fractions slightly more accurately, parameter changes on average have larger standard deviations at all sample sizes. The number of structural breaks is estimated with identical accuracy.

Next, we investigate if dynamic augmentation according to Saikkonen (1991) and Stock and Watson (1993) yields consistent coefficient estimates if the strict exogeneity

Results for $N = 1$ reported in Schmidt and Schweikert (2019) show a very similar pattern. We find that it is slightly more difficult to detect the correct number of breaks in regressions with multiple regressors although the jump size measured as the Euclidean distance is equal for both settings.
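The pce and hd/T summary statistics described above can be computed from a set of Monte Carlo replications along these lines (a hypothetical helper; evaluating the Hausdorff distance symmetrically in both directions is an assumption on our part):

```python
import numpy as np

def evaluate(estimates, true_breaks, T):
    """Summarise Monte Carlo replications: percentage of correct estimation
    (pce) of the number of breaks and the average Hausdorff distance divided
    by T (hd/T), conditional on the correct number; both in percent."""
    def d_h(A, B):  # one-sided Hausdorff distance, d_H(A, {}) = 1
        if not A or not B:
            return 1
        return max(min(abs(b - a) for a in A) for b in B)
    correct = [est for est in estimates if len(est) == len(true_breaks)]
    pce = 100.0 * len(correct) / len(estimates)
    hd = [max(d_h(est, true_breaks), d_h(true_breaks, est)) for est in correct]
    hd_t = 100.0 * np.mean(hd) / T if hd else float("nan")
    return pce, hd_t
```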
Kejriwal and Perron (2008, 2010) obtain estimates of the parameters using the dynamic programming algorithm of Bai and Perron (2003) without modification, since the algorithm itself is valid irrespective of the nature of the regressors and errors, given that it selects break dates minimizing the global sum of squared residuals in a regression.

We can confirm the theoretical claims made in Subsection 2.1 about the computational complexity of the Bai-Perron algorithm in comparison to the group LARS algorithm. We obtain the following computational times (in seconds) for both algorithms. First, using the simulation set-up for Table 1 and a sample size of T = 1000, we have M = 1: (gLARS: 2.61, BP: 11.03), M = 2: (gLARS: 9.24, BP: 11.41), M = 4: (gLARS: 21.27, BP: 15.36). Here, we find that the Bai-Perron algorithm is more robust to a larger number of breaks in terms of computational time. Second, we increase the sample size to T = 10000 and record the following times: M = 1: (gLARS: 4.04, BP: 1013.15), M = 2: (gLARS: 27.35, BP: 1317.45), M = 4: (gLARS: 35.23, BP: 1563.55). In this case, we can confirm that the group LARS algorithm is much more computationally efficient for large sample sizes. All simulations are computed on a computer with an Intel i5-6500 CPU at 3.20GHz and 16GB RAM.

We draw the innovations $(\vartheta_t, \omega_{1t}, \omega_{2t})$ jointly from a multivariate normal distribution with zero mean and covariance matrix

$$V = \begin{pmatrix} \sigma_{\vartheta}^2 & \sigma_{\vartheta\omega_1} & \sigma_{\vartheta\omega_2} \\ \sigma_{\vartheta\omega_1} & \sigma_{\omega_1}^2 & 0 \\ \sigma_{\vartheta\omega_2} & 0 & \sigma_{\omega_2}^2 \end{pmatrix}, \tag{11}$$

with nonzero covariances $\sigma_{\vartheta\omega_1}$ and $\sigma_{\vartheta\omega_2}$. Using this configuration, the strict exogeneity condition is violated for both regressors, but the regressors are still generated by independent processes. If we attempt to detect and estimate structural breaks without dynamic augmentation, we still detect breakpoints precisely but obtain strongly biased coefficient estimates. In Table 5, we report the corresponding results after the inclusion of l = 1 and l = 2 leads and lags.
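A minimal sketch of this kind of experiment follows. The unit innovation variances and the covariance value 0.4 between the equilibrium error and each regressor innovation are our own placeholders (the entries of V used in the paper are not reproduced here), and the break-free design keeps the example short; the leads-and-lags augmentation is in the spirit of Saikkonen (1991) and Stock and Watson (1993):

```python
import numpy as np

rng = np.random.default_rng(0)
T, cov = 500, 0.4  # cov is a placeholder value, not taken from the paper

# innovations (u_t, w1_t, w2_t): u is correlated with both regressor
# innovations, so strict exogeneity fails; w1 and w2 stay independent.
V = np.array([[1.0, cov, cov],
              [cov, 1.0, 0.0],
              [cov, 0.0, 1.0]])
e = rng.multivariate_normal(np.zeros(3), V, size=T)
u, w = e[:, 0], e[:, 1:]
X = np.cumsum(w, axis=0)              # two I(1) regressors
y = X @ np.array([1.0, 1.0]) + u      # cointegrating regression, no breaks

def augment(X, w, l):
    """Append l leads and lags of the regressor first differences."""
    n = X.shape[0]
    cols = [X]
    for k in range(-l, l + 1):
        cols.append(np.roll(w, k, axis=0))
    # drop the l edge rows at each end that are contaminated by wrap-around
    return np.hstack(cols)[l:n - l]

Z = augment(X, w, l=1)
beta = np.linalg.lstsq(Z, y[1:T - 1], rcond=None)[0][:2]
```

Regressing y on the augmented design soaks up the correlation between u and the regressor innovations, so the first two coefficients recover the cointegrating vector.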
Now, we can recover the number, location and magnitude of all breakpoints with accuracy similar to our simulations under strict exogeneity.

In Table 3, we consider partial breaks in the cointegrating vector. We use a model specification according to the DGP in Equation (10) with N = 2 regressors and induce partial structural breaks through $\beta_{1t}$ only. Our estimator is applied to a full structural change model without prior knowledge that $\beta_{2t}$ is constant over the sampling period. Again, we observe that the number of breaks, their timing and their magnitude are consistently estimated. The distance between the set of estimated breakpoints and the set of true breakpoints is larger than in the full break setting in Table 1. This result is not surprising considering that the break magnitudes for partial breaks are smaller, making it more difficult for the adaptive group lasso procedure to detect the true location of the breaks. Consequently, these results also help us to assess how the break magnitude influences the detection rates: reducing the Euclidean distance from 2 to $\sqrt{2}$ roughly doubles the average Hausdorff distance. The convergence rate for the zero parameter changes in $\beta_{2t}$ is almost identical to the convergence rate observed for the non-zero parameter changes in $\beta_{1t}$. This is naturally driven by the joint evaluation of all regressors in each group. Unlike the bi-level estimators proposed in Huang et al. (2009) and Breheny and Huang (2009), the adaptive group lasso procedure is not able to shrink coefficients within active groups to zero. Hence, the convergence rate for the zero parameter changes in $\beta_{2t}$ could in principle be increased if our procedure was extended to feature bi-level shrinkage. However, this is beyond the scope of this paper and is not investigated further at this point. Comparing the results with those obtained for the Bai-Perron algorithm (not reported), we again find that both approaches detect the number of structural breaks with identical accuracy; the Bai-Perron algorithm estimates the true break fractions slightly more accurately, but its parameter changes have larger standard deviations for sample sizes T = 200 and T = 400.

Finally, we investigate how sensitive our procedure is to break fractions located near the boundary of the unit interval. While the properties of tests for structural changes in the literature depend strongly on the trimming parameter (Bai and Perron, 2006), our method to recover breaks should be more robust in this regard. We only need some lateral trimming to ensure that the first and last regimes identified by our adaptive group lasso procedure comprise a sufficiently large number of observations to estimate regime-dependent coefficients. The results for breaks near the boundary are summarized in Table 4. The first and second panels consider one break located close to the lower and the upper boundary of the unit interval, respectively; panel three of Table 4 considers two breaks near both boundaries, the second located at τ = 0.9. Here, we find that the pce is quite low compared to our main results with equidistant spacing of breakpoints. The first break is estimated less accurately than the second break, which can be explained by the fact that parameter changes are measured from one regime to the next and that only a relatively small number of observations is available to estimate the first break. Even a break located at τ = 0.95 can still be accurately detected; however, the standard errors of the parameter changes increase due to the smaller number of observations in the last regime. We conclude that trimming is not necessary to detect breaks located at the end of the sample. Still, we suggest setting a minimum number of observations per regime to ensure that parameter changes are estimated precisely. For our main results, reported in Table 1 and Table 2, we follow Kejriwal and Perron (2008) and use a 15% lateral trimming to compare both methods.

Application: US Money Demand
In this section, we apply our proposed methodology to the US money demand function. In particular, we estimate a long-run money demand specification and investigate the presence of long-run instabilities in a cointegrating framework. Juselius (2006) considers the condition

$$M/P = L(Y, R)$$

for equilibrium in the money market, which relates $M/P$, the ratio of nominal money balances to price levels, to real income $Y$ and the short-term nominal interest rate $R$. Two competing empirical specifications are considered in the literature, namely the semi-log and the log-log specification. The latter is given by $L(Y, R) = \alpha Y^{\beta_1} R^{\beta_2}$, where $\alpha$ is a constant, $\beta_1$ is the income-elasticity, assumed to be unity, and $\beta_2 < 0$. For our empirical application, we choose a log-log specification, which has been found to fit US data quite well (Lucas, 2000; Bae and de Jong, 2007; Ireland, 2009; Mogliani and Urga, 2018). We extend the dataset used by Maki (2012) to span the period from January 1959 to December 2018. Monthly data are obtained from the Federal Reserve Bank of St. Louis. We consider the empirical US money demand function,

$$m^*_t = \mu + \beta_1 y_t + \beta_2 r_t + u_t, \tag{12}$$

where $m^*_t$ and $y_t$ denote the natural logarithm of the ratio of nominal money balances to price levels and the natural logarithm of real income, respectively. According to the log-log specification, we employ the natural logarithm of the short-term nominal interest rate, denoted by $r_t$. The term $u_t$ denotes the equilibrium error of the money demand function if the system is cointegrated. We use M2 as nominal money, the consumer price index as prices, and the index of industrial production as real income. For the interest rate, we use the 6-month Treasury bill rate. All time series are tested for a unit root using the Dickey-Fuller test. The results, which are not reported, support the assumption that all variables are integrated of order one, and we can continue our cointegration analysis.

First, we assume constancy of the parameters and ignore potential structural breaks. Estimation of the long-run equilibrium equation yields coefficients $\hat{\mu} = -0.05$ and $\hat{\beta}_2 = -0.08$, with $\hat{\beta}_1$ slightly below unity. Dynamic augmentation of the cointegrating regression with two leads and lags yields similar coefficients. Recall that, in general, coefficients of log-transformed variables in cointegrating regressions should not be interpreted as elasticities (Johansen, 2005); only in the special case when those variables are strongly exogenous is a ceteris paribus interpretation of the corresponding coefficients permissible. The residual-based cointegration test statistic of $-0.063$ does not lead to a rejection of the null hypothesis at the 10% level. Similar results are obtained for the Phillips-Ouliaris test and the Johansen test. Although it is implausible from a theoretical standpoint that the system is not cointegrated, at least our estimated coefficients have the expected sign and magnitude for post-war data. The estimated income-elasticity measured by $\hat{\beta}_1$ is slightly below the theoretically expected value. The interest-elasticity of money demand, measured by $\beta_2$, is expected to be negative, and Lucas (2000) considers a range of negative values for it. Figure 1 suggests that the relationship between $r_t$, $y_t$ and $m^*_t$ has changed during the sampling period. We observe at least three two-dimensional surfaces which correspond to distinct long-run levels from which $m^*_t$ does not persistently deviate. However, if we consider linear cointegration without the possibility of structural breaks, we infer from Figure 2 that the residual series exhibits a clear trend during the latter half of the sample. We note that the presence of structural breaks might mask the cointegrating relationship. Next, we compare several previously mentioned structural break models with our model selection approach. The Gregory and Hansen (1996a) test indicates a breakpoint at 2008 m06 but does not reject the null hypothesis at the 10% level. Because the GH-test does not model structural breaks under the null hypothesis, the timing of the indicated breakpoint is not informative. The Hatemi-J (2008) test indicates two breakpoints at 1992 m01 and 2008 m06. The null hypothesis of no cointegration can be rejected at the 5% level if these breakpoints are taken into account. The maximum number of breaks chosen for the Maki (2012) test is five. It selects breakpoints at 1986 m05, 1992 m04, 2004-2005, 2008 m11 and 2014 m03 and rejects the null hypothesis of no cointegration at the 1% level. We initially also start with a maximum of five breakpoints for our adaptive group lasso procedure.
However, imposing a minimum regime length of one year to precisely estimate the parameter changes and dynamically augmenting the cointegrating regression results in a model specification with three breakpoints. The final estimates yield break dates 1992 m07, 2005 m12, and 2015 m11. The corresponding breakpoint estimates obtained with the Bai-Perron algorithm are almost identically located at 1991 m10, 2004 m07, and 2014 m06.

Figure 1: Three-dimensional scatterplot of $r_t$ (x-axis), $y_t$ (y-axis) and $m^*_t$ (z-axis).

The income-elasticity from 1959 m01 to 1992 m07 is estimated to be 0.95 and the interest-elasticity amounts to $-0.10$ for the same period. These estimates correspond to the theoretical predictions formulated in Juselius (2006) and to the results reported in empirical papers considering this sample period (Lucas, 1988; Stock and Watson, 1993; Lucas, 2000). The first breakpoint leads to a reduction of the income-elasticity from 0.95 to 0.89, while the interest-elasticity remains largely unchanged. A partial decoupling of money demand from income might be explained by the beginning of the costly Gulf War and a sharp increase in US debt. In turn, the second breakpoint at 2005 m12 has a negligible effect on the income-elasticity (0.89 to 0.90) but results in a larger reduction of the interest-elasticity, from $-0.10$ to $-0.07$. This breakpoint can be related to the beginning Global Financial Crisis of 2007-2008. After the third breakpoint, the interest-elasticity falls further, to $-0.01$; in contrast, the income-elasticity remains very close to unity. The estimated error-correction coefficient is $-0.097$, which means that roughly 10% of long-run deviations are corrected each period.
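The reported adjustment speed can be translated into a half-life of equilibrium deviations. A quick back-of-the-envelope calculation under the estimated coefficient (the geometric-decay interpretation is standard, not taken from the paper):

```python
import math

alpha = -0.097  # estimated error-correction coefficient, monthly data
# each month a fraction |alpha| of the remaining deviation is removed,
# so a deviation decays geometrically at rate (1 + alpha) per month
half_life = math.log(0.5) / math.log(1 + alpha)  # in months
```

With roughly 10% of a deviation corrected per month, half of any deviation from the long-run money demand relation is worked off in a little under seven months.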
Figure 2: Residual series obtained from least squares estimation.

Figure 3: Post-lasso residual series. Estimated regimes are marked by grey and white areas.

Conclusion

In this paper, we propose a penalized regression approach to the problem of detecting an unknown number of structural breaks and their location in cointegrating regressions. Our estimator eliminates irrelevant breakpoints from a set of candidate breakpoints and, hence, follows a top-down approach to the estimation of structural breaks. Practitioners should apply this new methodology as a complement to the Bai-Perron algorithm, which follows a bottom-up approach, i.e., sequentially increasing the number of breaks. Due to the importance of finding the right model specification with respect to the number and location of structural breaks, either approach can serve as a valuable robustness check of the model specification chosen by the other. Ideally, both approaches should indicate the same breakpoints, which would mean that the chosen model specification is sufficiently sparse (bottom-up) and does not ignore important breaks (top-down).

We can show the important theoretical result that the adaptive group lasso estimator has nonstandard oracle properties in settings with a diverging number of breakpoint candidates. This means that the estimator determines the true number of non-zero parameter changes with probability tending to one and consistently estimates their location. The corresponding parameter changes are estimated with the same convergence rate that least squares estimators would have under full information about the number and location of breaks.
The present paper does not consider cointegration testing. It is unclear how optimal cointegration tests can be constructed from the proposed penalized regression approach. An attempt to design such cointegration tests has been made by Schmidt and Schweikert (2019) for a single regressor. Our results depend critically on the stationarity assumption about the error term. Hence, it is required to establish the existence of a cointegration relationship before the penalized regression is estimated. Practitioners should employ cointegration tests which are robust to the presumed number of breaks during the sample period. Further extensions include the use of bi-level selection via the group fused lasso (Huang et al., 2009; Breheny and Huang, 2009) to estimate partial breaks more efficiently, and the possibility to detect structural breaks in system-based approaches with multiple equilibria (Bai et al., 1998; Qu, 2007).
Acknowledgements

I thank Florian Stark, Alexander Schmidt, Markus Mößler, Timo Dimitriadis and the participants of the Doctoral Seminar in Econometrics in Tübingen, German Statistical Week in Trier, ZU Methodenkolloquium in Friedrichshafen, THE Christmas Workshop in Stuttgart, the Seminar at Maastricht University, and the 2nd CSL Symposium in Stuttgart for valuable comments and suggestions. Further, I thank Maike Becker and Manuel Huth for excellent research assistance.
Mathematical Appendix
Lemma 1.
Under Assumption 1 and Assumption 2, for any $c_1 > 0$ and $\delta > 1/2$, there exists some constant $C > 0$ such that

$$P\left(\max_{1 \le s \le T}\ \max_{1 \le j \le N}\ \left| \frac{1}{T} \sum_{i=s}^{T} X_{ji} u_i \right| \ge c_1 T^{\delta}\right) \le C c_1^{-2} T^{1-2\delta}. \tag{A.1}$$

Proof of Lemma 1.
According to Theorem 4.1 of Hansen (1992a), it holds for all $j = 1, \ldots, N$ and $0 \le r \le 1$ that $\frac{1}{T} \sum_{i=1}^{[Tr]} X_{ji} u_i$ has weak limit $\int_0^r B_j \, dU + r \Lambda_j$, where $X_{j[Tr]} = T^{-1/2} \sum_{t=1}^{[Tr]} v_{jt} \Rightarrow B_j(r)$, $\Lambda_j = \sum_{k=0}^{\infty} E[v_{j0} u_k]$, and $B_j(r)$ is a Brownian motion process with variance $\omega_{v_j}^2$. Since the second moment of $\int_0^r B_j \, dU + r \Lambda_j$ is finite, we have $E\left( \left| \frac{1}{T} \sum_{i=1}^{s} X_{ji} u_i \right|^2 \right) \le C_1$ for $1 \le s \le T$ and $T$ large enough. It follows from the Markov inequality that

$$P\left(\max_{1 \le s \le T} \left| \frac{1}{T} \sum_{i=1}^{s} X_{ji} u_i \right| \ge c_1 T^{\delta}\right) \le \sum_{s=1}^{T} P\left( \left| \frac{1}{T} \sum_{i=1}^{s} X_{ji} u_i \right| \ge c_1 T^{\delta} \right) \le \sum_{s=1}^{T} \frac{1}{c_1^2 T^{2\delta}}\, E \left| \frac{1}{T} \sum_{i=1}^{s} X_{ji} u_i \right|^2 \le C_2\, c_1^{-2} T^{1-2\delta}, \tag{A.2}$$

for some $C_2 >$
0. Thus, since $\sum_{i=s}^{T} X_{ji} u_i = \sum_{i=1}^{T} X_{ji} u_i - \sum_{i=1}^{s-1} X_{ji} u_i$, it holds that

$$P\left(\max_{1 \le s \le T} \left| \frac{1}{T} \sum_{i=s}^{T} X_{ji} u_i \right| \ge c_1 T^{\delta}\right) \le P\left( \left| \frac{1}{T} \sum_{i=1}^{T} X_{ji} u_i \right| \ge \frac{c_1 T^{\delta}}{2} \right) + P\left(\max_{1 \le s \le T} \left| \frac{1}{T} \sum_{i=1}^{s-1} X_{ji} u_i \right| \ge \frac{c_1 T^{\delta}}{2}\right) \le C c_1^{-2} T^{1-2\delta}. \tag{A.3}$$

Since $N$ is finite, Equation (A.1) follows. $\square$

Lemma 2. Let $\tilde{\theta}(T)$ be the estimator of $\theta(T)$ as defined in Equation (6). Then it holds under the same conditions as in Theorem 1 that

$$\sum_{s=\hat{t}_j}^{T} X_s \left( y_s - \sum_{i=1}^{s} \tilde{\theta}_i' X_s \right) - T \lambda_T \frac{\tilde{\theta}_{\hat{t}_j}}{\| \tilde{\theta}_{\hat{t}_j} \|} = 0, \quad \forall\, \tilde{\theta}_{\hat{t}_j} \ne 0,$$

and

$$\left\| \sum_{s=j}^{T} X_s \left( y_s - \sum_{i=1}^{s} \tilde{\theta}_i' X_s \right) \right\| \le T \lambda_T, \quad \forall\, j.$$

Furthermore, $\sum_{i=1}^{t} \tilde{\theta}_i = \tilde{\beta}_j$ for $\hat{t}_{j-1} \le t \le \hat{t}_j - 1$, $j = 1, 2, \ldots, |\mathcal{A}_T| + 1$.

Proof of Lemma 2.
This lemma is a direct consequence of the Karush-Kuhn-Tucker (KKT) conditions for group lasso estimators. $\square$
Lemma 3.
A necessary and sufficient condition for the estimator $\hat{\theta}_S$ to be a solution to the adaptive group lasso objective function $Q(\theta)$ is

$$Z_{S,i}' \left( y_T - Z_{S,\mathcal{A}^*} \hat{\theta}_{S,\mathcal{A}^*} \right) - T \lambda_S \|\tilde{\theta}_{S,i}\|^{-\gamma} \frac{\hat{\theta}_{S,i}}{\|\hat{\theta}_{S,i}\|} = 0, \quad \forall\, \hat{\theta}_{S,i} \ne 0,$$

and

$$\left\| Z_{S,i}' \left( y_T - Z_{S,\mathcal{A}^*} \hat{\theta}_{S,\mathcal{A}^*} \right) \right\| \le T \lambda_S \|\tilde{\theta}_{S,i}\|^{-\gamma}, \quad \forall\, \hat{\theta}_{S,i} = 0.$$

Proof of Lemma 3.
This lemma is a direct consequence of the Karush-Kuhn-Tucker (KKT) conditions for adaptive group lasso estimators. $\square$
Proof of Theorem 1.
We prove that the group lasso estimator consistently estimates all coefficients. The first part of our proof is related to the results given in Chan et al. (2014), while the second part uses ideas presented in He and Huang (2016). By definition of $\tilde{\theta}(T)$, it holds that

$$\frac{1}{T} \| y_T - Z_T \tilde{\theta}(T) \|^2 + \lambda_T \sum_{i=1}^{T} \| \tilde{\theta}_i \| \le \frac{1}{T} \| y_T - Z_T \theta(T) \|^2 + \lambda_T \sum_{i=1}^{T} \| \theta_i \|. \tag{A.4}$$

Note that $\bar{\mathcal{A}}$ contains the indices of all truly non-zero coefficients. Inserting $y_T = Z_T \theta(T) + u_T$ into Equation (A.4) yields

$$\begin{aligned} \frac{1}{T} \| Z_T (\theta(T) - \tilde{\theta}(T)) \|^2 &\le 2 \sum_{j=1}^{T} (\tilde{\theta}_j - \theta_j)' \frac{1}{T} \sum_{i=j}^{T} X_i u_i + \lambda_T \sum_{i \in \bar{\mathcal{A}}} \left(\| \theta_i \| - \| \tilde{\theta}_i \|\right) - \lambda_T \sum_{i \in \bar{\mathcal{A}}^c} \| \tilde{\theta}_i \| \\ &\le 2N \sum_{j=1}^{T} \| \tilde{\theta}_j - \theta_j \| \max_{1 \le l \le N} \left| \frac{1}{T} \sum_{i=j}^{T} X_{li} u_i \right| + \lambda_T \sum_{i \in \bar{\mathcal{A}}} \left(\| \theta_i \| - \| \tilde{\theta}_i \|\right) - \lambda_T \sum_{i \in \bar{\mathcal{A}}^c} \| \tilde{\theta}_i \|. \end{aligned} \tag{A.5}$$

Noting that

$$\sum_{j=1}^{T} \| \tilde{\theta}_j - \theta_j \| = \sum_{j \in \bar{\mathcal{A}}} \| \tilde{\theta}_j - \theta_j \| + \sum_{j \in \bar{\mathcal{A}}^c} \| \tilde{\theta}_j \|, \tag{A.6}$$

and using Lemma 1, we have with probability greater than $1 - C c_1^{-2} T^{1-2\delta}$ that

$$\begin{aligned} \frac{1}{T} \| Z_T (\theta(T) - \tilde{\theta}(T)) \|^2 &\le 2 N c_1 T^{\delta} \sum_{j=1}^{T} \| \tilde{\theta}_j - \theta_j \| + \lambda_T \sum_{i \in \bar{\mathcal{A}}} \left(\| \theta_i \| - \| \tilde{\theta}_i \|\right) - \lambda_T \sum_{i \in \bar{\mathcal{A}}^c} \| \tilde{\theta}_i \| \\ &= \lambda_T \sum_{i \in \bar{\mathcal{A}}} \| \tilde{\theta}_i - \theta_i \| + \lambda_T \sum_{i \in \bar{\mathcal{A}}} \left(\| \theta_i \| - \| \tilde{\theta}_i \|\right) \\ &\le 2 \lambda_T \sum_{i \in \bar{\mathcal{A}}} \| \theta_i \| \le 2 \lambda_T (m+1) \max_{1 \le j \le m+1} \| \beta_j - \beta_{j-1} \| = 4 N c_1 T^{\delta} (m+1) M_\beta, \end{aligned} \tag{A.7}$$

where both equalities follow from the definition of $\lambda_T$.

We denote $\sum_{i \in \bar{\mathcal{A}}} \| \theta_i - \tilde{\theta}_i \| = \kappa_1$ and $\sum_{i \in \bar{\mathcal{A}}^c} \| \tilde{\theta}_i \| = \kappa_2$. It holds that

$$\frac{1}{T} \| Z_T (\theta(T) - \tilde{\theta}(T)) \|^2 \le \frac{2}{T} (\tilde{\theta}(T) - \theta(T))' Z_T' u_T + \lambda_T \kappa_1 - \lambda_T \kappa_2. \tag{A.8}$$

Now, we consider two cases: $\kappa_2 > \kappa_1$ and $\kappa_2 \le \kappa_1$. First, we show that $P(\kappa_2 > \kappa_1) \to 0$. We note that the triangle inequality yields

$$\left| \frac{2}{T} (\tilde{\theta}(T) - \theta(T))' Z_T' u_T \right| \le \frac{2}{T} \left| (\tilde{\theta}_{\bar{\mathcal{A}}}(T) - \theta_{\bar{\mathcal{A}}}(T))' Z_{T,\bar{\mathcal{A}}}' u_T \right| + \frac{2}{T} \left| (\tilde{\theta}_{\bar{\mathcal{A}}^c}(T) - \theta_{\bar{\mathcal{A}}^c}(T))' Z_{T,\bar{\mathcal{A}}^c}' u_T \right|. \tag{A.9}$$

Using the Cauchy-Schwarz inequality, we then have

$$\frac{2}{T} \left| (\tilde{\theta}_{\bar{\mathcal{A}}}(T) - \theta_{\bar{\mathcal{A}}}(T))' Z_{T,\bar{\mathcal{A}}}' u_T \right| \le \frac{2}{T} \sum_{i \in \bar{\mathcal{A}}} \| \theta_i - \tilde{\theta}_i \| \, \| Z_{T,\bar{\mathcal{A}}}' u_T \| = O_p(1)\, \kappa_1, \tag{A.10}$$

since $\max_{1 \le j \le N} \left( \sum_{i=1}^{T} X_{ji} u_i \right) = O_p(T)$, which follows from Lemma 1, and $|\bar{\mathcal{A}}| = m+1$ is finite. In contrast, $|\bar{\mathcal{A}}^c|$ diverges as $T \to \infty$. We observe that

$$\| Z_T' u_T \|^2 \le N \max_{1 \le j \le N} \sum_{i=1}^{T-1} \left( \sum_{t=i}^{T} X_{jt} u_t \right)^2 = T \, O_p(T^2), \tag{A.11}$$

for all $j = 1, \ldots, N$, which implies that $\| Z_T' u_T \| = O_p(T^{3/2})$. Consequently, we have

$$\frac{2}{T} \left| (\tilde{\theta}_{\bar{\mathcal{A}}^c}(T) - \theta_{\bar{\mathcal{A}}^c}(T))' Z_{T,\bar{\mathcal{A}}^c}' u_T \right| \le \frac{2}{T} \sum_{i \in \bar{\mathcal{A}}^c} \| \tilde{\theta}_i \| \, \| Z_{T,\bar{\mathcal{A}}^c}' u_T \| = O_p(T^{1/2})\, \kappa_2. \tag{A.12}$$

Hence, using the assumption $\kappa_2 > \kappa_1$, we obtain the contradiction

$$0 \le \frac{1}{T} \| Z_T (\theta(T) - \tilde{\theta}(T)) \|^2 \le \left( O_p(T^{1/2}) - \lambda_T \right) \kappa_2 + \left( O_p(1) + \lambda_T \right) \kappa_1 < \left( O_p(T^{1/2}) + O_p(1) - \lambda_T \right) \kappa_2 < 0, \tag{A.13}$$

for $T \to \infty$, and it follows that $P(\kappa_2 > \kappa_1) \to 0$. Turning to the event $\kappa_2 \le \kappa_1$, Assumption 2(iii) and noting that $X'X/T^2 = O_p(1)$ imply that

$$\| Z_T (\theta(T) - \tilde{\theta}(T)) \|^2 = T^2 (\theta(T) - \tilde{\theta}(T))' \Sigma_T (\theta(T) - \tilde{\theta}(T)) \ge T^2 C \| \theta_{\bar{\mathcal{A}}}(T) - \tilde{\theta}_{\bar{\mathcal{A}}}(T) \|^2. \tag{A.14}$$

Thus, we have

$$\| \theta_{\bar{\mathcal{A}}}(T) - \tilde{\theta}_{\bar{\mathcal{A}}}(T) \| \le \sqrt{\frac{\| Z_T (\theta(T) - \tilde{\theta}(T)) \|^2}{T^2 C}} \le \sqrt{\frac{4 N c_1 T^{\delta+1} (m+1) M_\beta}{T^2 C}} = 2 T^{-(1-\delta)/2} \sqrt{\frac{N c_1 (m+1) M_\beta}{C}} \to 0, \tag{A.15}$$

if $1/2 < \delta < 1$. Combining the results for both cases completes the proof. $\square$
Proof of Theorem 2.
Define $A_{Ti} = \{ |\hat{t}_i - t_i| > T\epsilon \}$, $i = 1, 2, \ldots, m$, such that

$$P\left( \max_{1 \le i \le m} |\hat{t}_i - t_i| > T\epsilon \right) \le \sum_{i=1}^{m} P\left( |\hat{t}_i - t_i| > T\epsilon \right) = \sum_{i=1}^{m} P(A_{Ti}). \tag{A.16}$$

Further define $C_T = \left\{ \max_{1 \le i \le m} |\hat{t}_i - t_i| \le \min_i |t_i - t_{i-1}| / 2 \right\}$. It suffices to show that

$$\sum_{i=1}^{m} P(A_{Ti} \cap C_T) \to 0 \quad \text{and} \quad \sum_{i=1}^{m} P(A_{Ti} \cap C_T^c) \to 0. \tag{A.17}$$

The proof follows along the lines of the proof of Proposition 3 in Harchaoui and Lévy-Leduc (2010) and Theorem 2.2 in Chan et al. (2014). In the following, we focus on $\sum_{i=1}^{m} P(A_{Ti} \cap C_T) \to 0$. On $C_T$, it holds that

$$t_{i-1} < \hat{t}_i < t_{i+1}, \quad \forall\, 1 \le i \le m. \tag{A.18}$$

Next, we split $A_{Ti}$ into the two cases (i) $\hat{t}_i < t_i$ and (ii) $\hat{t}_i > t_i$ to show that $P(A_{Ti} \cap C_T) \to 0$. In case (i), Lemma 2 implies

$$\left\| \sum_{s=\hat{t}_i}^{t_i-1} X_s \left( y_s - \tilde{\beta}_{i+1}' X_s \right) \right\| \le T \lambda_T. \tag{A.19}$$

Note that because $\hat{t}_i < t_i$, the true coefficient has not changed at $\hat{t}_i$. Hence, plugging in $y_s = \beta_i' X_s + u_s$ yields

$$\left\| \sum_{s=\hat{t}_i}^{t_i-1} X_s u_s + \sum_{s=\hat{t}_i}^{t_i-1} X_s (\beta_i - \beta_{i+1})' X_s + \sum_{s=\hat{t}_i}^{t_i-1} X_s (\beta_{i+1} - \tilde{\beta}_{i+1})' X_s \right\| \le T \lambda_T. \tag{A.20}$$

It follows for $\hat{t}_i < t_i$ that

$$\begin{aligned} P(A_{Ti} \cap C_T) \le\;& P\left( \left\{ \left\| \sum_{s=\hat{t}_i}^{t_i-1} X_s (\beta_i - \beta_{i+1})' X_s \right\| \le T \lambda_T \right\} \cap \{ |\hat{t}_i - t_i| > T\epsilon \} \right) \\ &+ P\left( \left\{ \left\| \sum_{s=\hat{t}_i}^{t_i-1} X_s u_s \right\| > \left\| \sum_{s=\hat{t}_i}^{t_i-1} X_s (\beta_i - \beta_{i+1})' X_s \right\| \right\} \cap \{ |\hat{t}_i - t_i| > T\epsilon \} \right) \\ &+ P\left( \left\{ \left\| \sum_{s=\hat{t}_i}^{t_i-1} X_s (\beta_{i+1} - \tilde{\beta}_{i+1})' X_s \right\| > \left\| \sum_{s=\hat{t}_i}^{t_i-1} X_s (\beta_i - \beta_{i+1})' X_s \right\| \right\} \cap A_{Ti} \cap C_T \right) \\ =\;& P(A_{Ti,1}) + P(A_{Ti,2}) + P(A_{Ti,3}). \end{aligned} \tag{A.21}$$

For the first term, we observe that under Assumption 2 and on the set $\{ |\hat{t}_i - t_i| > T\epsilon \}$ it holds that

$$\left\| \sum_{s=\hat{t}_i}^{t_i-1} X_s (\beta_i - \beta_{i+1})' X_s \right\| \ge \lambda_{\min}\left( \sum_{s=\hat{t}_i}^{t_i-1} X_s X_s' \right) \min_i \left\{ \| \beta_i - \beta_{i+1} \| \right\} > \epsilon \nu T^2, \tag{A.22}$$

for sufficiently small $\epsilon$ with probability going to one. Taking into account that $T \lambda_T = O(T^{1+\delta})$ for $1/2 < \delta < 1$, we conclude that $P(A_{Ti,1}) \to 0$ as $T \to \infty$. For the second term, we have

$$\sum_{s=\hat{t}_i}^{t_i-1} X_s u_s = O_p(|t_{i-1} - \hat{t}_i|) = O_p(T), \tag{A.23}$$

but since $\left\| \sum_{s=\hat{t}_i}^{t_i-1} X_s (\beta_i - \beta_{i+1})' X_s \right\| > \epsilon \nu T^2$ with probability going to one, we conclude that the right-hand side of the inequality asymptotically dominates the left-hand side and $P(A_{Ti,2}) \to 0$ as $T \to \infty$. Turning to the third term, we note the definition $\tilde{\beta}_{i+1} = \sum_{j=1}^{t} \tilde{\theta}_j$ for $\hat{t}_i \le t \le \hat{t}_{i+1} - 1$; i.e., $\tilde{\beta}_{i+1}$ is a linear function of the $\tilde{\theta}_j$. Hence, according to Theorem 1 and the Continuous Mapping Theorem (see Billingsley (1999), Theorem 2.7), $\tilde{\beta}_{i+1}$ is a consistent estimator for $\beta_{i+1}$ with convergence rate $T^{(1-\delta)/2}$, $1/2 < \delta < 1$. This means we have $(\beta_{i+1} - \tilde{\beta}_{i+1}) \to 0$ and $P(A_{Ti,3}) \to 0$ as $T \to \infty$. It follows that $P(A_{Ti} \cap C_T \cap \{ \hat{t}_i < t_i \}) \to 0$.

In case (ii), we have

$$\left\| \sum_{s=t_i}^{\hat{t}_i-1} X_s \left( y_s - \tilde{\beta}_i' X_s \right) \right\| \le T \lambda_T. \tag{A.24}$$

Since $\hat{t}_i > t_i$, the true coefficient has changed at $\hat{t}_i$, and we plug in $y_s = \beta_{i+1}' X_s + u_s$, which yields

$$\left\| \sum_{s=t_i}^{\hat{t}_i-1} X_s u_s + \sum_{s=t_i}^{\hat{t}_i-1} X_s (\beta_{i+1} - \beta_i)' X_s + \sum_{s=t_i}^{\hat{t}_i-1} X_s (\beta_i - \tilde{\beta}_i)' X_s \right\| \le T \lambda_T. \tag{A.25}$$

It follows that

$$\begin{aligned} P(A_{Ti} \cap C_T) \le\;& P\left( \left\{ \left\| \sum_{s=t_i}^{\hat{t}_i-1} X_s (\beta_{i+1} - \beta_i)' X_s \right\| \le T \lambda_T \right\} \cap \{ |\hat{t}_i - t_i| > T\epsilon \} \right) \\ &+ P\left( \left\{ \left\| \sum_{s=t_i}^{\hat{t}_i-1} X_s u_s \right\| > \left\| \sum_{s=t_i}^{\hat{t}_i-1} X_s (\beta_{i+1} - \beta_i)' X_s \right\| \right\} \cap \{ |\hat{t}_i - t_i| > T\epsilon \} \right) \\ &+ P\left( \left\{ \left\| \sum_{s=t_i}^{\hat{t}_i-1} X_s (\beta_i - \tilde{\beta}_i)' X_s \right\| > \left\| \sum_{s=t_i}^{\hat{t}_i-1} X_s (\beta_{i+1} - \beta_i)' X_s \right\| \right\} \cap A_{Ti} \cap C_T \right) \\ =\;& P(A_{Ti,1}) + P(A_{Ti,2}) + P(A_{Ti,3}). \end{aligned} \tag{A.26}$$

The same arguments as for case (i) can be used to show that $P(A_{Ti} \cap C_T \cap \{ \hat{t}_i > t_i \}) \to 0$, and hence $P(A_{Ti} \cap C_T) \to 0$. $\square$

Proof of Theorem 3.
We begin with the first part. Suppose that $|\mathcal{A}_T| < m$. Then there exist some $t_i$, $i \in \{1, 2, \ldots, m\}$, and $\hat{t}_s \in \mathcal{A}_T \cup \{0, \infty\}$, $s = 0, 1, \ldots, |\mathcal{A}_T| + 1$, with $t_{i+1} - (t_i \vee \hat{t}_s) \ge T\epsilon/2$ and $(t_{i+2} \wedge \hat{t}_{s+1}) - t_{i+1} \ge T\epsilon/2$, where $\hat{t}_0 = 0$ and $\hat{t}_{|\mathcal{A}_T|+1} = \infty$. Applying Lemma 2 to the intervals $[t_i \vee \hat{t}_s,\, t_{i+1} - 1]$ and $[t_{i+1},\, t_{i+2} \wedge \hat{t}_{s+1} - 1]$ yields

$$\left\| \sum_{r=t_i \vee \hat{t}_s}^{t_{i+1}-1} X_r \left( y_r - \tilde{\beta}_{s+1}' X_r \right) \right\| \le T \lambda_T \tag{A.27}$$

and

$$\left\| \sum_{r=t_{i+1}}^{t_{i+2} \wedge \hat{t}_{s+1}-1} X_r \left( y_r - \tilde{\beta}_{s+1}' X_r \right) \right\| \le T \lambda_T. \tag{A.28}$$

Similar arguments to those used in the proof of Theorem 2 show that either

$$P\left( \left\{ \left\| \sum_{r=t_i \vee \hat{t}_s}^{t_{i+1}-1} X_r (\beta_{i+1} - \tilde{\beta}_{s+1})' X_r \right\| > \left\| \sum_{r=t_i \vee \hat{t}_s}^{t_{i+1}-1} X_r (\beta_i - \beta_{i+1})' X_r \right\| \right\} \cap \left\{ t_{i+1} - t_i \vee \hat{t}_s \ge T\epsilon/2 \right\} \right) \to 0, \tag{A.29}$$

or

$$P\left( \left\{ \left\| \sum_{r=t_{i+1}}^{t_{i+2} \wedge \hat{t}_{s+1}-1} X_r (\beta_{i+2} - \tilde{\beta}_{s+1})' X_r \right\| > \left\| \sum_{r=t_{i+1}}^{t_{i+2} \wedge \hat{t}_{s+1}-1} X_r (\beta_{i+1} - \beta_{i+2})' X_r \right\| \right\} \cap \left\{ t_{i+2} \wedge \hat{t}_{s+1} - t_{i+1} \ge T\epsilon/2 \right\} \right) \to 0, \tag{A.30}$$

has to hold to contradict $|\mathcal{A}_T| < m$. Since $\tilde{\beta}_{s+1}$ is a consistent estimator according to Theorem 1 and the Continuous Mapping Theorem, we either have $\tilde{\beta}_{s+1} \xrightarrow{p} \beta_{i+1}$ or $\tilde{\beta}_{s+1} \xrightarrow{p} \beta_{i+2}$. In the former case, the left-hand side of Inequality (A.29) converges to zero; in the latter case, the left-hand side of Inequality (A.30) converges to zero. Hence, there is no situation in which not at least one of the two probabilities converges to zero. Consequently, we have a contradiction to $|\mathcal{A}_T| < m$.

For the second part, we define $\hat{\mathcal{T}}_k = \{ \hat{t}_1, \hat{t}_2, \ldots, \hat{t}_k \}$. Then it is enough to show that

$$P\left( \{ d_H(\mathcal{A}_T, \mathcal{A}) > T\epsilon,\ m \le |\mathcal{A}_T| \le T \} \right) = \sum_{k=m}^{T} P\left( \{ d_H(\hat{\mathcal{T}}_k, \mathcal{A}) > T\epsilon \} \right) P(|\mathcal{A}_T| = k) \to 0, \tag{A.31}$$

as $T \to \infty$. By Theorem 2, we have already shown that $P( d_H(\hat{\mathcal{T}}_m, \mathcal{A}) > T\epsilon ) \to 0$. It remains to show that

$$\max_{k>m} P\left( d_H(\hat{\mathcal{T}}_k, \mathcal{A}) > T\epsilon \right) \to 0. \tag{A.32}$$

Given $t_i$, we define

$$\begin{aligned} B_{T,k,i,1} &= \left\{ \forall\, 1 \le s \le k,\ |\hat{t}_s - t_i| \ge T\epsilon \text{ and } \hat{t}_s < t_i \right\}, \\ B_{T,k,i,2} &= \left\{ \forall\, 1 \le s \le k,\ |\hat{t}_s - t_i| \ge T\epsilon \text{ and } \hat{t}_s > t_i \right\}, \\ B_{T,k,i,3} &= \left\{ \exists\, 1 \le s \le k-1 \text{ such that } |\hat{t}_s - t_i| \ge T\epsilon,\ |\hat{t}_{s+1} - t_i| \ge T\epsilon \text{ and } \hat{t}_s < t_i < \hat{t}_{s+1} \right\}. \end{aligned} \tag{A.33}$$

Then,

$$\max_{k>m} P\left( d_H(\hat{\mathcal{T}}_k, \mathcal{A}) > T\epsilon \right) = \max_{k>m} P\left( \bigcup_{i=1}^{m} \bigcup_{j=1}^{3} B_{T,k,i,j} \right). \tag{A.34}$$

Using similar arguments as in the proof of Theorem 2, we can show that $\max_{k>m} P\left( \bigcup_{i=1}^{m} B_{T,k,i,j} \right) \to 0$ for all $1 \le j \le 3$. This completes the proof of Theorem 3. $\square$
Proof of Theorem 4.
To prove Theorem 4, we follow ideas similar to those put forthin Wang and Leng (2008) and Zhang and Xiang (2016). As we will note at differentpoints of the proof, the statistical properties of the adaptive group lasso estimator hingecrucially on our first step weights. It is particularly important that our second stepdesign matrix Z S fulfils the restricted eigenvalue condition which can be ensured bythe first step group lasso algorithm.We note that the adaptive group lasso objective function Q ( θ S ) is a strictly convexfunction and show that there is a local minimizer which is superconsistent. Then byglobal convexity of Q ( θ S ), it follows that such a local minimizer must be ˆ θ S . Similaras in Fan and Li (2001), the existence of an above-described local minimizer is impliedby the fact that for any (cid:15) >
0, there is a sufficiently large constant
C >
0, such thatlim inf T P inf v :=( v ,..., v M ) ∈ R MN : k v k = C Q ( θ S + T − v ) > Q ( θ S ) ! > − (cid:15). (A.35)33t holds that Q ( θ S + T − v ) − Q ( θ S )= 1 T k y T − Z S ( θ S + T − v ) k + λ S M X i =1 w i k ( θ S,i + T − v i ) k− T k y T − Z S θ S k − λ S M X i =1 w i k θ S,i k = 1 T v (cid:18) T Z S Z S (cid:19) v − T v Z S u T (A.36)+ λ S M X i =1 w i k ( θ S,i + T − v i ) k − λ S M X i =1 w i k θ S,i k≥ T v (cid:18) T Z S Z S (cid:19) v − T v Z S u T + λ S X g ( i ) ∈A T ∩A k ˜ θ S,i k − γ (cid:16) k ( θ S,i + T − v i ) k − k θ S,i k (cid:17) ≥ T v (cid:18) T Z S Z S (cid:19) v − T v Z S u T − T λ S X g ( i ) ∈A T ∩A k ˜ θ S,i k − γ k v i k = I − I − I . Since the restricted eigenvalue condition holds for Σ S = Z S Z S /T , i.e., its eigenvaluesare positive for all T , and Σ S thus converges to a positive definite random matrix,we have I = O p ( T − ) k v k . Further, it follows from Cauchy-Schwarz inequality andLemma 1 that E | I | = 4 T E ( v Z S u T ) ≤ T k v k E k Z S u T k (A.37)= 1 T k v k O p (1) , and consequently I = O p ( T − ) k v k . Finally, using the Cauchy-Schwarz inequality, wehave I ≤ T λ S X g ( i ) ∈A T ∩A k ˜ θ S,i k − γ / k v k (A.38) ≤ T λ S m / min g ( i ) ∈A T ∩A k ˜ θ S,i k − γ k v k .
We note that $\min_{g(i) \in \mathcal{A}_T \cap \mathcal{A}} \|\tilde{\theta}_{S,i}\|^{-\gamma} = O_p(1)$ since $\tilde{\theta}_{S,i}$ is a consistent estimator according to Theorem 1 and our first step estimation does not ignore relevant breakpoints asymptotically according to Theorem 3. Using the condition $\lambda_S \to 0$, we know that $I_3$ is bounded by $o_p(1)\|v\|$. Hence, we can specify a large enough constant $C$ such that $I_1$ dominates $I_2$ and $I_3$. This completes the proof of part (a).

Next, we turn to the proof of part (b). Lemma 3 gives the necessary and sufficient condition for an estimator to be a solution to the adaptive group lasso objective function as defined by Equation (7). Now, to prove that all truly zero parameters are set to zero almost surely, it suffices to show that
\[
P\Big( \forall g(i) \in \mathcal{A}_T \cap \mathcal{A}^c,\ \big\| \tfrac{1}{T} Z_{g(i)}' \big( y_T - Z_{S,\mathcal{A}^*} \hat{\theta}_{S,\mathcal{A}^*} \big) \big\| \le \lambda_S \|\tilde{\theta}_{S,i}\|^{-\gamma} \Big) \to 1, \quad \text{(A.39)}
\]
or equivalently
\[
P\Big( \exists g(i) \in \mathcal{A}_T \cap \mathcal{A}^c,\ \big\| \tfrac{1}{T} Z_{g(i)}' \big( y_T - Z_{S,\mathcal{A}^*} \hat{\theta}_{S,\mathcal{A}^*} \big) \big\| > \lambda_S \|\tilde{\theta}_{S,i}\|^{-\gamma} \Big) \to 0, \quad \text{(A.40)}
\]
where $Z_{g(i)} = (0, \ldots, 0, X_{g(i)}, X_{g(i)+1}, \ldots, X_T)'$. It holds that
\[
P\Big( \exists g(i) \in \mathcal{A}_T \cap \mathcal{A}^c,\ \big\| \tfrac{1}{T} Z_{g(i)}' \big( y_T - Z_{S,\mathcal{A}^*} \hat{\theta}_{S,\mathcal{A}^*} \big) \big\| > \lambda_S \|\tilde{\theta}_{S,i}\|^{-\gamma} \Big) \quad \text{(A.41)}
\]
\[
\le P\Big( \exists g(i) \in \mathcal{A}_T \cap \mathcal{A}^c,\ \big\| \tfrac{1}{T} Z_{g(i)}' u_T \big\| > \lambda_S \|\tilde{\theta}_{S,i}\|^{-\gamma} - \big\| \tfrac{1}{T} Z_{g(i)}' Z_{S,\mathcal{A}^*} \big( \hat{\theta}_{S,\mathcal{A}^*} - \theta_{S,\mathcal{A}^*} \big) \big\| \Big).
\]
Further, we have
\[
\big\| \tfrac{1}{T} Z_{g(i)}' Z_{S,\mathcal{A}^*} \big( \hat{\theta}_{S,\mathcal{A}^*} - \theta_{S,\mathcal{A}^*} \big) \big\| \le \Big[ \big( \hat{\theta}_{S,\mathcal{A}^*} - \theta_{S,\mathcal{A}^*} \big)' Z_{S,\mathcal{A}^*}' \Big( \tfrac{1}{T^2} Z_{g(i)} Z_{g(i)}' \Big) Z_{S,\mathcal{A}^*} \big( \hat{\theta}_{S,\mathcal{A}^*} - \theta_{S,\mathcal{A}^*} \big) \Big]^{1/2} \quad \text{(A.42)}
\]
\[
\le O_p(T) \, \| \hat{\theta}_{S,\mathcal{A}^*} - \theta_{S,\mathcal{A}^*} \|,
\]
and the first part of Theorem 4 implies that $\|\hat{\theta}_{S,\mathcal{A}^*} - \theta_{S,\mathcal{A}^*}\| = O_p(T^{-1})$ such that $\| \tfrac{1}{T} Z_{g(i)}' Z_{S,\mathcal{A}^*} ( \hat{\theta}_{S,\mathcal{A}^*} - \theta_{S,\mathcal{A}^*} ) \| = O_p(1)$. Hence, we need to prove
\[
P\Big( \exists g(i) \in \mathcal{A}_T \cap \mathcal{A}^c,\ \big\| \tfrac{1}{T} Z_{g(i)}' u_T \big\| > \lambda_S \|\tilde{\theta}_{S,i}\|^{-\gamma} \Big) \to 0. \quad \text{(A.43)}
\]
Considering that Theorem 1 implies $\max_{g(i) \in \mathcal{A}^c} \|\tilde{\theta}_{S,i}\| = O_p(T^{-(1-\delta)/2})$ for $1/2 < \delta < 1$, we have
\[
P\Big( \exists g(i) \in \mathcal{A}_T \cap \mathcal{A}^c,\ \big\| \tfrac{1}{T} Z_{g(i)}' u_T \big\| > \lambda_S \|\tilde{\theta}_{S,i}\|^{-\gamma} \Big) \le P\Big( \exists g(i) \in \mathcal{A}_T \cap \mathcal{A}^c,\ \big\| \tfrac{1}{T} Z_{g(i)}' u_T \big\| > \frac{\lambda_S}{\max_{i \in \mathcal{A}^c} \|\tilde{\theta}_{S,i}\|^{\gamma}} \Big)
\]
\[
\le P\Big( \exists g(i) \in \mathcal{A}_T \cap \mathcal{A}^c,\ \big\| \tfrac{1}{T} Z_{g(i)}' u_T \big\| > \lambda_S \big( C T^{-(1-\delta)/2} \big)^{-\gamma} \Big) \quad \text{(A.44)}
\]
\[
\le \sum_{g(i) \in \mathcal{A}_T \cap \mathcal{A}^c} P\Big( \big\| \tfrac{1}{T} Z_{g(i)}' u_T \big\| > \lambda_S \big( C T^{-(1-\delta)/2} \big)^{-\gamma} \Big) \le \sum_{g(i) \in \mathcal{A}_T \cap \mathcal{A}^c} \frac{E \big\| \tfrac{1}{T} Z_{g(i)}' u_T \big\|^2}{\lambda_S^2 \big( C T^{-(1-\delta)/2} \big)^{-2\gamma}}
\]
for some $C > 0$. Since Lemma 1 implies $E \| \tfrac{1}{T} Z_{g(i)}' u_T \|^2 = O(1)$ and $\lambda_S^2 T^{(1-\delta)\gamma} \to \infty$, we have
\[
\frac{E \| \tfrac{1}{T} Z_{g(i)}' u_T \|^2}{\lambda_S^2 C^{-2\gamma} T^{(1-\delta)\gamma}} \to 0, \quad \text{(A.45)}
\]
for all $g(i) \in \mathcal{A}_T \cap \mathcal{A}^c$. Note that $|\mathcal{A}_T| < M$ for all $T$ and that all remaining indices $i$ not included in $\mathcal{A}_T$ correspond to coefficients which have already been set to zero in the first step.

For the proof of model selection consistency, we still need to show that no truly non-zero parameter changes are set to zero. It holds that
\[
\min_{g(i) \in \mathcal{A}} \|\hat{\theta}_{S,i}\| \ge \min_{g(i) \in \mathcal{A}} \|\theta_{S,i}\| - \max_{g(i) \in \mathcal{A}} \|\hat{\theta}_{S,i} - \theta_{S,i}\|. \quad \text{(A.46)}
\]
Since $\|\hat{\theta}_{S,i} - \theta_{S,i}\| \overset{p}{\to} 0$, we obtain
\[
P\Big( \min_{g(i) \in \mathcal{A}} \|\hat{\theta}_{S,i}\| \ge \nu \Big) \to 1. \quad \text{(A.47)}
\]
This completes the proof of part (b).

Finally, we turn to the proof of part (c). It follows from Lemma 2 that
\[
P\Big( -\tfrac{1}{T} Z_{S,\bar{\mathcal{A}}^*}' \big( y_T - Z_{S,\bar{\mathcal{A}}^*} \hat{\theta}_{S,\bar{\mathcal{A}}^*} \big) + \tfrac{1}{2} \lambda_S \eta = 0 \Big) \to 1, \quad \text{(A.48)}
\]
where
\[
\eta = \bigg( \frac{\hat{\theta}_{S,1}'}{\|\tilde{\theta}_{S,1}\|^{\gamma} \|\hat{\theta}_{S,1}\|}, \frac{\hat{\theta}_{S,2}'}{\|\tilde{\theta}_{S,2}\|^{\gamma} \|\hat{\theta}_{S,2}\|}, \ldots, \frac{\hat{\theta}_{S,m+1}'}{\|\tilde{\theta}_{S,m+1}\|^{\gamma} \|\hat{\theta}_{S,m+1}\|} \bigg)'. \quad \text{(A.49)}
\]
Using $y_T = Z_{S,\bar{\mathcal{A}}^*} \theta_{S,\bar{\mathcal{A}}^*} + u_T$, we have
\[
P\Big( \tfrac{1}{T} Z_{S,\bar{\mathcal{A}}^*}' Z_{S,\bar{\mathcal{A}}^*} \big( \hat{\theta}_{S,\bar{\mathcal{A}}^*} - \theta_{S,\bar{\mathcal{A}}^*} \big) = \tfrac{1}{T} Z_{S,\bar{\mathcal{A}}^*}' u_T - \tfrac{1}{2} \lambda_S \eta \Big) \to 1. \quad \text{(A.50)}
\]
Then, it holds that
\[
T \big( \hat{\theta}_{S,\bar{\mathcal{A}}^*} - \theta_{S,\bar{\mathcal{A}}^*} \big) = \Big( \tfrac{1}{T^2} Z_{S,\bar{\mathcal{A}}^*}' Z_{S,\bar{\mathcal{A}}^*} \Big)^{-1} \tfrac{1}{T} Z_{S,\bar{\mathcal{A}}^*}' u_T - \tfrac{1}{2} \lambda_S \Big( \tfrac{1}{T^2} Z_{S,\bar{\mathcal{A}}^*}' Z_{S,\bar{\mathcal{A}}^*} \Big)^{-1} \eta + o_p(1). \quad \text{(A.51)}
\]
Since $\tfrac{1}{T^2} Z_{S,\bar{\mathcal{A}}^*}' Z_{S,\bar{\mathcal{A}}^*} = O_p(1)$, we have
\[
\Big\| \tfrac{1}{2} \lambda_S \Big( \tfrac{1}{T^2} Z_{S,\bar{\mathcal{A}}^*}' Z_{S,\bar{\mathcal{A}}^*} \Big)^{-1} \eta \Big\| \le \lambda_S C \|\eta\| \le \lambda_S C (m+1)^{1/2} \min_{i \in \bar{\mathcal{A}}} \|\tilde{\theta}_i\|^{-\gamma}. \quad \text{(A.52)}
\]
As in part (a), it holds that $\min_{i \in \bar{\mathcal{A}}} \|\tilde{\theta}_i\|^{-\gamma} = O_p(1)$ and by the conditions of Theorem 4, we have $\lambda_S \to 0$ as $T \to \infty$ such that
\[
\Big\| \tfrac{1}{2} \lambda_S \Big( \tfrac{1}{T^2} Z_{S,\bar{\mathcal{A}}^*}' Z_{S,\bar{\mathcal{A}}^*} \Big)^{-1} \eta \Big\| = o_p(1). \quad \text{(A.53)}
\]
We use $\omega_{v_j}^2$ to denote the long-run variance of the stationary process $\{v_{jt}\}_{t=1}^{\infty}$, $j = 1, \ldots, N$. Note that $\omega_{v_j}^2$ is the $j$-th diagonal element of $\Omega_v$.
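In practice, the long-run variances collected in $\Omega_v$ are unknown and are typically replaced by kernel estimates. The following is a minimal sketch of a Bartlett-kernel long-run variance estimator applied to a simulated AR(1) series; the AR(1) design, sample size, and bandwidth rule are illustrative choices, not taken from the paper.

```python
import numpy as np

def bartlett_lrv(v, bandwidth):
    """Bartlett-kernel estimate of the long-run variance of a scalar series v."""
    v = v - v.mean()
    T = v.size
    lrv = np.dot(v, v) / T  # sample autocovariance at lag 0
    for h in range(1, bandwidth + 1):
        w = 1.0 - h / (bandwidth + 1.0)          # Bartlett weight
        gamma_h = np.dot(v[h:], v[:-h]) / T      # sample autocovariance at lag h
        lrv += 2.0 * w * gamma_h
    return lrv

rng = np.random.default_rng(0)
T = 20000
rho = 0.5
# AR(1): v_t = rho * v_{t-1} + e_t with e_t ~ N(0, 1);
# the true long-run variance is 1 / (1 - rho)^2 = 4
e = rng.standard_normal(T)
v = np.empty(T)
v[0] = e[0]
for t in range(1, T):
    v[t] = rho * v[t - 1] + e[t]

omega_hat = bartlett_lrv(v, bandwidth=int(T ** (1 / 3)))
print(omega_hat)
```

With this sample size the estimate should land close to the true value of 4; the Bartlett weights guarantee a nonnegative estimate, and the $T^{1/3}$ bandwidth is one common rule of thumb.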
Under the conditions of Assumption 1, it holds that
\[
T^{-1/2} X_{j,[Ts]} = T^{-1/2} \sum_{t=1}^{[Ts]} v_{jt} \Rightarrow B_j(s), \quad \text{(A.54)}
\]
for $s \in [0,1]$, $j \in \{1, \ldots, N\}$ and $T \to \infty$, where $B_j(s)$ is a scalar Brownian motion with variance $\omega_{v_j}^2$. Further, it holds that
\[
T^{-1/2} Z_{S,\bar{\mathcal{A}}^*,[Ts]} \Rightarrow \big( B(s)', B(s)' \phi_{\tau_1}(s), \ldots, B(s)' \phi_{\tau_m}(s) \big)' \equiv B_{\tau,\bar{\mathcal{A}}^*}(s), \quad \text{(A.55)}
\]
where
\[
\phi_{\tau_k}(s) = \begin{cases} 0, & s < \tau_k \\ 1, & s \ge \tau_k \end{cases}, \quad k \in \{1, \ldots, m\},\ s \in [0,1], \quad \text{(A.56)}
\]
and $B(s)$ is an $N$-vector Brownian motion process with covariance matrix $\Omega_v$. Using (A.4) in Gregory and Hansen (1996a) and the Continuous Mapping Theorem, we observe that
\[
\tfrac{1}{T^2} Z_{S,\bar{\mathcal{A}}^*}' Z_{S,\bar{\mathcal{A}}^*} \Rightarrow \int_0^1 B_{\tau,\bar{\mathcal{A}}^*}(s) B_{\tau,\bar{\mathcal{A}}^*}(s)' \, ds, \quad \text{(A.57)}
\]
where the weak convergence is uniform over the vector $(\tau_1, \ldots, \tau_m) \in \mathcal{T}$. Further, using (A.3) in Gregory and Hansen (1996a) and Theorem 3.1 in Hansen (1992b), we have the weak convergence to a stochastic integral
\[
\tfrac{1}{T} Z_{S,\bar{\mathcal{A}}^*}' u_T \Rightarrow \int_0^1 B_{\tau,\bar{\mathcal{A}}^*}(s) \, dU(s) + \big( \Lambda', (1-\tau_1)\Lambda', \ldots, (1-\tau_m)\Lambda' \big)', \quad \text{(A.58)}
\]
where $\Lambda = \sum_{t=0}^{\infty} E(v_0 u_t)$. Combining (A.51), (A.53), (A.57), and (A.58) then yields the limiting distribution stated in part (c). This completes the proof. $\Box$

Tables
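Before turning to the tables, the kind of two-step procedure they evaluate can be sketched in a few lines. The following is a deliberately stylized illustration, not the paper's implementation: it uses a stationary regressor rather than the paper's cointegration setting, a coarse candidate grid, hand-picked penalty levels, and a plain proximal-gradient (ISTA) group lasso solver; all tuning constants and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# --- stylized DGP with one structural break at t = 100 (NOT the paper's Eq. (10)) ---
T = 200
true_break = 100
x = rng.standard_normal(T)
u = 0.5 * rng.standard_normal(T)
y = 1.0 + 1.0 * x + (np.arange(T) >= true_break) * (2.0 + 2.0 * x) + u

# design: a baseline group plus one 2-column group (intercept/slope shift) per candidate
candidates = list(range(10, 191, 10))
groups = [np.column_stack([np.ones(T), x])]
for c in candidates:
    d = (np.arange(T) >= c).astype(float)
    groups.append(np.column_stack([d, d * x]))
Z = np.hstack(groups)
idx = [(2 * g, 2 * g + 2) for g in range(len(groups))]  # column range of each group

def group_lasso(Z, y, lam, weights, n_iter=3000):
    """ISTA for 0.5/T * ||y - Z theta||^2 + lam * sum_g weights[g] * ||theta_g||."""
    Tn = len(y)
    step = Tn / np.linalg.norm(Z, 2) ** 2  # 1/L with L = ||Z||_2^2 / T
    theta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        theta = theta - step * Z.T @ (Z @ theta - y) / Tn
        for (a, b), w in zip(idx, weights):  # groupwise soft-thresholding
            nrm = np.linalg.norm(theta[a:b])
            if nrm > 0:
                theta[a:b] *= max(0.0, 1.0 - step * lam * w / nrm)
    return theta

# step 1: plain group lasso (baseline group left unpenalized via weight 0)
w1 = np.array([0.0] + [1.0] * len(candidates))
theta1 = group_lasso(Z, y, lam=0.2, weights=w1)

# step 2: adaptive group lasso with weights ||theta1_g||^(-gamma), gamma = 1
norms1 = np.array([np.linalg.norm(theta1[a:b]) for a, b in idx])
w2 = np.array([0.0] + [1.0 / max(n, 1e-8) for n in norms1[1:]])
theta2 = group_lasso(Z, y, lam=0.05, weights=w2)

norms2 = np.array([np.linalg.norm(theta2[a:b]) for a, b in idx])
active = [candidates[g - 1] for g in range(1, len(groups)) if norms2[g] > 1e-6]
est_break = candidates[int(np.argmax(norms2[1:]))]
print(active, est_break)
```

The first step typically over-selects candidates around the true break; the adaptive weights then penalize small first-step groups heavily, which is what drives the oracle-type sparsity of the second step.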
Table 1: Estimation of (multiple) structural breaks in the slope coefficients (Adaptive Group Lasso)

SB1: µ = 2, θ_k = 2, k = {1, 2}, (τ_1 = 0.5)

T     pce    hd/T   τ̂_1            θ̂_{1,1}       θ̂_{2,1}       θ̂_{1,2}       θ̂_{2,2}
100   100    0.48   0.501 (0.022)  2.01 (0.144)  1.99 (0.168)  2.01 (0.155)  1.98 (0.168)
200   100    0.17   0.500 (0.009)  2.01 (0.073)  2.00 (0.087)  2.00 (0.073)  1.99 (0.086)
400   100    0.08   0.500 (0.004)  2.00 (0.042)  2.00 (0.057)  2.00 (0.040)  2.00 (0.053)

SB2: µ = 2, θ_k = 2, k = {1, 2, 3}, (τ_1 = 0.33, τ_2 = 0.67)

T     pce    hd/T   τ̂_1            τ̂_2            θ̂_{1,1}       θ̂_{2,1}       θ̂_{3,1}       θ̂_{1,2}       θ̂_{2,2}       θ̂_{3,2}
100   99.8   0.69   0.327 (0.024)  0.672 (0.024)  2.01 (0.221)  2.00 (0.254)  1.98 (0.306)  2.02 (0.217)  2.01 (0.269)  1.97 (0.321)
200   100    0.25   0.329 (0.010)  0.670 (0.006)  2.00 (0.092)  2.00 (0.117)  2.00 (0.114)  2.01 (0.093)  2.00 (0.116)  2.00 (0.105)
400   100    0.06   0.330 (0.004)  0.670 (0.002)  2.00 (0.047)  2.00 (0.059)  2.00 (0.053)  2.00 (0.046)  2.00 (0.058)  2.00 (0.055)

SB4: µ = 2, θ_k = 2, k = {1, …, 5}, (τ_1 = 0.2, τ_2 = 0.4, τ_3 = 0.6, τ_4 = 0.8)

T     pce    hd/T   τ̂_1            τ̂_2            τ̂_3            τ̂_4
100   96.5   1.66   0.201 (0.024)  0.402 (0.032)  0.601 (0.026)  0.801 (0.016)
200   100    0.57   0.199 (0.008)  0.400 (0.006)  0.600 (0.006)  0.800 (0.009)
400   100    0.16   0.200 (0.006)  0.400 (0.003)  0.600 (0.002)  0.800 (0.003)

T     θ̂_{1,1}       θ̂_{2,1}       θ̂_{3,1}       θ̂_{4,1}       θ̂_{5,1}
100   2.04 (0.380)  2.01 (0.497)  1.98 (0.544)  1.96 (0.556)  1.99 (0.504)
200   2.02 (0.191)  1.99 (0.204)  2.00 (0.213)  2.00 (0.203)  2.00 (0.203)
400   2.00 (0.074)  2.00 (0.096)  2.00 (0.092)  2.00 (0.087)  2.00 (0.091)

T     θ̂_{1,2}       θ̂_{2,2}       θ̂_{3,2}       θ̂_{4,2}       θ̂_{5,2}
100   2.04 (0.358)  2.01 (0.426)  1.96 (0.495)  1.97 (0.532)  1.99 (0.447)
200   2.02 (0.211)  1.99 (0.222)  2.00 (0.208)  1.99 (0.203)  2.00 (0.203)
400   2.00 (0.074)  2.00 (0.097)  2.00 (0.090)  2.00 (0.087)  2.00 (0.087)

Note: We use 1,000 replications of the data-generating process given in Equation (10). The variances of the error terms are σ_ω² = 1 and σ_ϑ² = 4, respectively. The first panel reports the results for one active breakpoint at τ_1 = 0.5, the second panel considers two active breakpoints at τ_1 = 0.33 and τ_2 = 0.67, and the third panel has four active breakpoints at τ_1 = 0.2, τ_2 = 0.4, τ_3 = 0.6, and τ_4 = 0.8. The baseline coefficients and parameter changes at all breakpoints take the value 2. Standard deviations are given in parentheses.

Table 2: Estimation of (multiple) structural breaks in the slope coefficients (Bai-Perron)

SB1: µ = 2, θ_k = 2, k = {1, 2}, (τ_1 = 0.5)

T     pce    hd/T   τ̂_1            θ̂_{1,1}       θ̂_{2,1}       θ̂_{1,2}       θ̂_{2,2}
100   100    0.23   0.500 (0.009)  2.00 (0.155)  2.00 (0.214)  2.00 (0.156)  1.99 (0.207)
200   100    0.07   0.500 (0.004)  2.00 (0.077)  2.00 (0.109)  2.00 (0.076)  2.00 (0.106)
400   100    0.03   0.500 (0.002)  2.00 (0.039)  2.00 (0.054)  2.00 (0.040)  2.00 (0.054)

SB2: µ = 2, θ_k = 2, k = {1, 2, 3}, (τ_1 = 0.33, τ_2 = 0.67)

T     pce    hd/T   τ̂_1            τ̂_2            θ̂_{1,1}       θ̂_{2,1}       θ̂_{3,1}       θ̂_{1,2}       θ̂_{2,2}       θ̂_{3,2}
100   100    0.40   0.330 (0.008)  0.670 (0.008)  2.00 (0.238)  2.01 (0.322)  1.99 (0.338)  2.01 (0.239)  1.99 (0.338)  1.99 (0.327)
200   100    0.15   0.330 (0.004)  0.670 (0.003)  2.00 (0.120)  1.99 (0.165)  2.00 (0.164)  2.00 (0.119)  2.01 (0.165)  2.00 (0.159)
400   100    0.06   0.330 (0.002)  0.670 (0.002)  2.00 (0.059)  2.00 (0.080)  2.00 (0.082)  2.00 (0.058)  2.00 (0.078)  2.00 (0.082)

SB4: µ = 2, θ_k = 2, k = {1, …, 5}, (τ_1 = 0.2, τ_2 = 0.4, τ_3 = 0.6, τ_4 = 0.8)

T     pce    hd/T   τ̂_1            τ̂_2            τ̂_3            τ̂_4
100   96.7   0.86   0.200 (0.011)  0.401 (0.009)  0.600 (0.011)  0.799 (0.011)
200   100    0.32   0.200 (0.005)  0.400 (0.004)  0.600 (0.004)  0.800 (0.004)
400   100    0.13   0.200 (0.002)  0.400 (0.002)  0.600 (0.001)  0.800 (0.002)

T     θ̂_{1,1}       θ̂_{2,1}       θ̂_{3,1}       θ̂_{4,1}       θ̂_{5,1}
100   1.97 (0.400)  2.05 (0.558)  2.00 (0.550)  2.00 (0.587)  1.98 (0.575)
200   2.00 (0.194)  1.99 (0.266)  2.00 (0.266)  2.00 (0.279)  1.98 (0.271)
400   2.00 (0.096)  2.00 (0.134)  2.00 (0.133)  2.00 (0.135)  2.00 (0.132)

T     θ̂_{1,2}       θ̂_{2,2}       θ̂_{3,2}       θ̂_{4,2}       θ̂_{5,2}
100   2.02 (0.434)  2.00 (0.539)  1.98 (0.559)  1.99 (0.556)  2.01 (0.559)
200   2.00 (0.202)  2.00 (0.274)  2.01 (0.251)  2.00 (0.272)  2.02 (0.270)
400   2.00 (0.095)  2.00 (0.136)  2.00 (0.135)  2.00 (0.133)  2.01 (0.136)

Note: We use 1,000 replications of the data-generating process given in Equation (10). The variances of the error terms are σ_ω² = 1 and σ_ϑ² = 4, respectively. The first panel reports the results for one active breakpoint at τ_1 = 0.5, the second panel considers two active breakpoints at τ_1 = 0.33 and τ_2 = 0.67, and the third panel has four active breakpoints at τ_1 = 0.2, τ_2 = 0.4, τ_3 = 0.6, and τ_4 = 0.8. The baseline coefficients and parameter changes at all breakpoints take the value 2. Standard deviations are given in parentheses.

Table 3: Estimation of partial structural breaks

SB1: µ = 2, θ_1 = 2, θ_{2,1} = 2, θ_{2,2} = 0, (τ_1 = 0.5)

T     pce    hd/T   τ̂_1            θ̂_{1,1}       θ̂_{2,1}       θ̂_{1,2}       θ̂_{2,2}
100   99.6   0.80   0.499 (0.033)  2.01 (0.145)  1.98 (0.186)  2.00 (0.137)  −0.01 (0.158)
200   100    0.26   0.501 (0.013)  2.00 (0.078)  1.99 (0.104)  2.00 (0.074)  0.00 (0.082)
400   100    0.09   0.500 (0.005)  2.00 (0.040)  2.00 (0.053)  2.00 (0.034)  0.00 (0.042)

SB2: µ = 2, θ_1 = 2, θ_{j,1} = 2, θ_{j,2} = 0, j = {2, 3}, (τ_1 = 0.33, τ_2 = 0.67)

T     pce    hd/T   τ̂_1            τ̂_2            θ̂_{1,1}       θ̂_{2,1}       θ̂_{3,1}       θ̂_{1,2}       θ̂_{2,2}       θ̂_{3,2}
100   98.2   1.58   0.328 (0.034)  0.672 (0.032)  2.03 (0.221)  1.99 (0.270)  1.97 (0.293)  2.00 (0.190)  −0.01 (0.230)  0.00 (0.226)
200   100    0.57   0.328 (0.012)  0.671 (0.015)  2.01 (0.105)  1.99 (0.134)  2.00 (0.130)  2.00 (0.092)  0.00 (0.117)  0.00 (0.105)
400   100    0.22   0.330 (0.005)  0.670 (0.005)  2.00 (0.049)  2.00 (0.091)  2.00 (0.078)  2.00 (0.047)  0.00 (0.058)  0.00 (0.055)

SB4: µ = 2, θ_1 = 2, θ_{j,1} = 2, θ_{j,2} = 0, j = {2, …, 5}, (τ_1 = 0.2, τ_2 = 0.4, τ_3 = 0.6, τ_4 = 0.8)

T     pce    hd/T   τ̂_1            τ̂_2            τ̂_3            τ̂_4
100   94.9   2.86   0.205 (0.035)  0.401 (0.037)  0.600 (0.040)  0.797 (0.033)
200   99.4   1.10   0.200 (0.015)  0.400 (0.013)  0.600 (0.015)  0.800 (0.015)
400   100    0.37   0.200 (0.005)  0.400 (0.004)  0.600 (0.004)  0.800 (0.005)

T     θ̂_{1,1}       θ̂_{2,1}       θ̂_{3,1}       θ̂_{4,1}       θ̂_{5,1}
100   2.07 (0.408)  2.00 (1.436)  1.93 (1.527)  1.96 (0.691)  1.98 (0.539)
200   2.03 (0.217)  1.99 (0.245)  1.98 (0.276)  2.00 (0.236)  2.00 (0.211)
400   2.01 (0.085)  1.99 (0.104)  2.00 (0.101)  2.00 (0.097)  2.00 (0.096)

T     θ̂_{1,2}       θ̂_{2,2}       θ̂_{3,2}       θ̂_{4,2}       θ̂_{5,2}
100   2.01 (0.313)  −0.02 (0.533)  −0.01 (0.527)  0.00 (0.502)  0.01 (0.448)
200   2.00 (0.154)  0.00 (0.188)  0.00 (0.183)  0.00 (0.187)  0.01 (0.177)
400   2.00 (0.074)  0.00 (0.094)  0.00 (0.089)  0.00 (0.087)  0.01 (0.087)

Note: We use 1,000 replications of the data-generating process given in Equation (10). The variances of the error terms are σ_ω² = 1 and σ_ϑ² = 4, respectively. The first panel reports the results for one active breakpoint at τ_1 = 0.5, the second panel considers two active breakpoints at τ_1 = 0.33 and τ_2 = 0.67, and the third panel has four active breakpoints at τ_1 = 0.2, τ_2 = 0.4, τ_3 = 0.6, and τ_4 = 0.8. The baseline coefficients take the value 2. We induce partial structural breaks through a change in the first slope coefficient only. Standard deviations are given in parentheses.

Table 4: Break fractions located near the boundaries of the unit interval

SB1: µ = 2, θ_k = 2, k = {1, 2}, (τ_1 = 0.1)

T     pce    hd/T   τ̂_1            θ̂_{1,1}       θ̂_{2,1}       θ̂_{1,2}       θ̂_{2,2}
100   89.0   0.19   0.100 (0.013)  2.01 (0.622)  2.00 (0.621)  2.04 (0.581)  1.96 (0.584)
200   96.6   0.18   0.098 (0.007)  2.00 (0.300)  2.00 (0.302)  2.01 (0.276)  1.99 (0.276)
400   99.7   0.15   0.099 (0.007)  2.01 (0.150)  2.00 (0.151)  2.00 (0.142)  2.00 (0.142)

SB1: µ = 2, θ_k = 2, k = {1, 2}, (τ_1 = 0.9)

T     pce    hd/T   τ̂_1            θ̂_{1,1}       θ̂_{2,1}       θ̂_{1,2}       θ̂_{2,2}
100   95.2   0.21   0.901 (0.008)  2.01 (0.090)  1.98 (0.477)  2.00 (0.081)  1.94 (0.493)
200   98.0   0.18   0.901 (0.008)  2.00 (0.042)  1.99 (0.239)  2.00 (0.044)  2.00 (0.242)
400   99.5   0.08   0.901 (0.005)  2.00 (0.022)  1.99 (0.132)  2.00 (0.021)  2.00 (0.124)

SB2: µ = 2, θ_k = 2, k = {1, 2, 3}, (τ_1 = 0.1, τ_2 = 0.9)

T     pce    hd/T   τ̂_1            τ̂_2            θ̂_{1,1}       θ̂_{2,1}       θ̂_{3,1}       θ̂_{1,2}       θ̂_{2,2}       θ̂_{3,2}
100   86.4   0.50   0.100 (0.016)  0.901 (0.013)  2.01 (0.623)  2.00 (0.624)  1.98 (0.490)  2.04 (0.587)  1.95 (0.589)  1.95 (0.480)
200   94.8   0.47   0.099 (0.017)  0.901 (0.010)  2.00 (0.278)  2.00 (0.280)  1.98 (0.248)  2.02 (0.287)  1.98 (0.288)  1.99 (0.244)
400   99.3   0.28   0.099 (0.008)  0.901 (0.005)  2.01 (0.158)  1.99 (0.159)  2.00 (0.132)  2.01 (0.147)  1.99 (0.146)  2.00 (0.123)

SB2: µ = 2, θ_k = 2, k = {1, 2, 3}, (τ_1 = 0.1, τ_2 = 0.95)

T     pce    hd/T   τ̂_1            τ̂_2            θ̂_{1,1}       θ̂_{2,1}       θ̂_{3,1}       θ̂_{1,2}       θ̂_{2,2}       θ̂_{3,2}
100   83.9   0.49   0.102 (0.023)  0.950 (0.012)  2.02 (0.640)  1.99 (0.638)  1.95 (0.876)  2.05 (0.597)  1.95 (0.599)  1.98 (0.855)
200   92.8   0.40   0.098 (0.008)  0.950 (0.018)  2.00 (0.270)  2.00 (0.270)  1.97 (0.513)  2.02 (0.275)  1.98 (0.277)  1.97 (0.516)
400   97.7   0.26   0.099 (0.007)  0.951 (0.005)  2.01 (0.154)  1.99 (0.154)  1.99 (0.251)  2.01 (0.143)  2.00 (0.143)  2.00 (0.244)

Note: We use 1,000 replications of the data-generating process given in Equation (10). The variances of the error terms are σ_ω² = 1 and σ_ϑ² = 4, respectively. The first panel reports the results for one active breakpoint near the left boundary at τ_1 = 0.1, the second panel considers one active breakpoint near the right boundary at τ_1 = 0.9, the third panel has two active breakpoints at τ_1 = 0.1 and τ_2 = 0.9, and the fourth panel has two active breakpoints at τ_1 = 0.1 and τ_2 = 0.95. The baseline coefficients and parameter changes at all breakpoints take the value 2. Standard deviations are given in parentheses.

Table 5: Endogeneity correction via dynamic augmentation

SB1: µ = 2, θ_k = 2, k = {1, 2}, (τ_1 = 0.5)

T     pce    hd/T   τ̂_1            θ̂_{1,1}       θ̂_{2,1}       θ̂_{1,2}       θ̂_{2,2}
l = 1
100   99.9   0.54   0.500 (0.025)  2.00 (0.153)  1.99 (0.170)  2.01 (0.147)  1.99 (0.170)
200   100    0.14   0.500 (0.008)  2.00 (0.075)  2.00 (0.088)  2.00 (0.069)  2.00 (0.084)
l = 2
100   99.9   0.48   0.500 (0.022)  2.00 (0.169)  1.99 (0.201)  2.01 (0.170)  1.99 (0.212)
200   100    0.16   0.500 (0.009)  2.00 (0.075)  2.00 (0.087)  2.00 (0.071)  2.00 (0.085)

SB2: µ = 2, θ_k = 2, k = {1, 2, 3}, (τ_1 = 0.33, τ_2 = 0.67)

T     pce    hd/T   τ̂_1            τ̂_2            θ̂_{1,1}       θ̂_{2,1}       θ̂_{3,1}       θ̂_{1,2}       θ̂_{2,2}       θ̂_{3,2}
l = 1
100   99.2   0.93   0.329 (0.025)  0.672 (0.019)  2.03 (0.292)  1.97 (0.329)  1.99 (0.274)  2.03 (0.285)  1.98 (0.328)  1.98 (0.284)
200   100    0.35   0.330 (0.008)  0.670 (0.010)  2.00 (0.099)  2.00 (0.128)  2.00 (0.141)  2.01 (0.134)  2.00 (0.128)  2.00 (0.158)
l = 2
100   99.4   1.06   0.329 (0.029)  0.670 (0.026)  2.03 (0.290)  1.97 (0.348)  1.99 (0.283)  2.03 (0.299)  1.97 (0.364)  1.99 (0.310)
200   100    0.37   0.329 (0.008)  0.671 (0.011)  2.00 (0.101)  2.00 (0.135)  2.00 (0.123)  2.00 (0.109)  2.00 (0.138)  2.00 (0.133)

SB4: µ = 2, θ_k = 2, k = {1, …, 5}, (τ_1 = 0.2, τ_2 = 0.4, τ_3 = 0.6, τ_4 = 0.8)
T     pce    hd/T   τ̂_1            τ̂_2            τ̂_3            τ̂_4
l = 1
100   98.3   1.42   0.203 (0.024)  0.402 (0.023)  0.600 (0.022)  0.800 (0.017)
200   99.4   0.59   0.199 (0.010)  0.400 (0.006)  0.600 (0.009)  0.800 (0.009)
l = 2
100   97.9   1.52   0.202 (0.022)  0.401 (0.023)  0.600 (0.023)  0.799 (0.019)
200   99.7   0.64   0.199 (0.011)  0.400 (0.008)  0.600 (0.011)  0.800 (0.011)

T     θ̂_{1,1}       θ̂_{2,1}       θ̂_{3,1}       θ̂_{4,1}       θ̂_{5,1}       θ̂_{1,2}       θ̂_{2,2}       θ̂_{3,2}       θ̂_{4,2}       θ̂_{5,2}
l = 1
100   2.04 (0.386)  2.00 (0.422)  1.95 (0.487)  1.98 (0.457)  2.01 (0.417)  2.05 (0.403)  1.99 (0.476)  1.99 (0.503)  1.95 (0.474)  1.99 (0.417)
200   2.01 (0.160)  2.00 (0.190)  1.99 (0.185)  2.01 (0.258)  1.99 (0.258)  2.01 (0.157)  2.01 (0.193)  1.99 (0.197)  2.01 (0.261)  1.99 (0.257)
l = 2
100   2.04 (0.397)  2.00 (0.415)  2.01 (0.460)  1.97 (0.452)  2.02 (0.373)  2.05 (0.409)  1.98 (0.437)  2.01 (0.480)  1.94 (0.485)  2.01 (0.380)
200   2.01 (0.191)  2.00 (0.195)  1.98 (0.242)  2.01 (0.203)  1.99 (0.181)  2.01 (0.174)  2.01 (0.199)  1.98 (0.261)  2.00 (0.215)  1.99 (0.181)

Note: We use 1,000 replications of the data-generating process given in Equation (10) with an endogenous error term specification. The covariance matrix of the error terms is specified according to Equation (11) with σ_ω² = 1 and σ_ϑ² = 4, respectively. We denote the number of leads and lags with l. The first panel reports the results for one active breakpoint at τ_1 = 0.5, the second panel considers two active breakpoints at τ_1 = 0.33 and τ_2 = 0.67, and the third panel has four active breakpoints at τ_1 = 0.2, τ_2 = 0.4, τ_3 = 0.6, and τ_4 = 0.8. The baseline coefficients and parameter changes at all breakpoints take the value 2. Standard deviations are given in parentheses.

References

Andrews, D. W. K., 1993. Tests for Parameter Instability and Structural Change With Unknown Change Point. Econometrica 61 (4), 821–856.

Arai, Y., Kurozumi, E., 2007. Testing for the Null Hypothesis of Cointegration with a Structural Break. Econometric Reviews 26 (6), 705–739.

Aue, A., Horváth, L., 2013. Structural breaks in time series. Journal of Time Series Analysis 34 (1), 1–16.

Bae, Y., Jong, R. M. D., 2007. Money Demand Function Estimation by Nonlinear Cointegration. Journal of Applied Econometrics 22, 767–793.

Bai, J., Lumsdaine, R. L., Stock, J. H., 1998. Testing for and Dating Common Breaks in Multivariate Time Series. The Review of Economic Studies 65 (3), 395–432.

Bai, J., Perron, P., 1998. Estimating and Testing Linear Models with Multiple Structural Changes. Econometrica 66 (1), 47–78.

Bai, J., Perron, P., 2003. Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18 (1), 1–22.

Bai, J., Perron, P., 2006. Multiple structural change models: A simulation analysis. In: Corbae, D., Durlauf, S., Hansen, B. (Eds.), Econometric Theory and Practice. Cambridge University Press, pp. 212–337.

Ball, L., 2001. Another look at long-run money demand. Journal of Monetary Economics 47 (1), 31–44.

Behrendt, S., Schweikert, K., 2020. A Note on Adaptive Group Lasso for Structural Break Time Series. Econometrics and Statistics, forthcoming.

Bickel, P. J., Ritov, Y., Tsybakov, A. B., 2009. Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics 37 (4), 1705–1732.

Billingsley, P., 1999. Convergence of Probability Measures, 2nd Edition. Wiley, New York.

Boot, T., Pick, A., 2019. Does modeling a structural break improve forecast accuracy? Journal of Econometrics, 1–25.

Boysen, L., Kempe, A., Liebscher, V., Munk, A., Wittich, O., 2009.
Consistencies and rates of convergence of jump-penalized least squares estimators. Annals of Statistics 37 (1), 157–183.

Breheny, P., Huang, J., 2009. Penalized methods for bi-level variable selection. Statistics and its Interface 2 (3), 369–380.

Campos, J., Ericsson, N. R., Hendry, D. F., 1996. Cointegration tests in the presence of structural breaks. Journal of Econometrics 70 (1), 187–220.

Carrion-i Silvestre, J. L., Sanso, A., 2006. Testing for the Null Hypothesis of Cointegration with Structural Breaks. Oxford Bulletin of Economics and Statistics 68 (5), 623–646.

Chan, N. H., Yau, C. Y., Zhang, R.-M., 2014. Group LASSO for Structural Break Time Series. Journal of the American Statistical Association 109 (506), 590–599.

Chen, S. L., Wu, J. L., 2005. Long-run money demand revisited: Evidence from a non-linear approach. Journal of International Money and Finance 24 (1), 19–37.

Ciuperca, G., 2014. Model selection by LASSO methods in a change-point model. Statistical Papers 55 (2), 349–374.

Davidson, J., Monticini, A., 2010. Tests for cointegration with structural breaks based on subsamples. Computational Statistics and Data Analysis 54 (11), 2498–2511.

Davis, R. A., Lee, T. C. M., Rodriguez-Yam, G. A., 2006. Structural Break Estimation for Nonstationary Time Series Models. Journal of the American Statistical Association 101 (473), 223–239.

Fan, J., Li, R., 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96 (456), 1348–1360.

Gregory, A. W., Hansen, B. E., 1996a. Residual-based tests for cointegration in models with regime shifts. Journal of Econometrics 70 (2), 99–126.

Gregory, A. W., Hansen, B. E., 1996b. Tests for Cointegration in Models with Regime and Trend Shifts. Oxford Bulletin of Economics and Statistics 58 (3), 555–560.

Gregory, A. W., Nason, J. M., Watt, D. G., 1996. Testing for structural breaks in cointegrated relationships. Journal of Econometrics 71 (1-2), 321–341.

Hansen, B. E., 1992a.
Convergence to Stochastic Integrals for Dependent Heterogeneous Processes. Econometric Theory 8 (4), 489–500.

Hansen, B. E., 1992b. Efficient estimation and testing of cointegrating vectors in the presence of deterministic trends. Journal of Econometrics 53, 87–121.

Harchaoui, Z., Lévy-Leduc, C., 2010. Multiple Change-Point Estimation With a Total Variation Penalty. Journal of the American Statistical Association 105 (April), 1480–1493.

Hatemi-J, A., 2008. Tests for cointegration with two unknown regime shifts with an application to financial market integration. Empirical Economics 35 (3), 497–505.

He, K., Huang, J. Z., 2016. Asymptotic properties of adaptive group Lasso for sparse reduced rank regression. Stat 5 (1), 251–261.

Hoffman, D. L., Rasche, R. H., 1991. Long-Run Income and Interest Elasticities of Money Demand in the United States. The Review of Economics and Statistics 73 (4), 665–674.

Horowitz, J., Huang, J., 2013. Penalized estimation of high-dimensional models under a generalized sparsity condition. Statistica Sinica 23 (2), 725–748.

Huang, J., Ma, S., Xie, H., Zhang, C. H., 2009. A group bridge approach for variable selection. Biometrika 96 (2), 339–355.

Huang, J., Ma, S., Zhang, C.-H., 2008. Adaptive Lasso for Sparse High-Dimensional Regression Models. Statistica Sinica 18, 1603–1618.

Ireland, P. N., 2009. On the Welfare Cost of Inflation and the Recent Behavior of Money Demand. The American Economic Review 99 (3), 1040–1052.

Jawadi, F., Sousa, R. M., 2013. Money demand in the euro area, the US and the UK: Assessing the role of nonlinearity. Economic Modelling 32 (1), 507–515.

Jin, B., Shi, X., Wu, Y., 2013. A novel and fast methodology for simultaneous multiple structural break estimation and variable selection for nonstationary time series models. Statistics and Computing 23 (2), 221–231.

Jin, B., Wu, Y., Shi, X., 2016. Consistent two-stage multiple change-point detection in linear models. Canadian Journal of Statistics 44 (2), 161–179.

Johansen, S., 2005.
Interpretation of cointegrating coefficients in the cointegrated vector autoregressive model. Oxford Bulletin of Economics and Statistics 67 (1), 93–104.

Juselius, K., 2006. The Cointegrated VAR Model: Methodology and Applications. Oxford University Press, United Kingdom.

Kejriwal, M., Perron, P., 2008. The limit distribution of the estimates in cointegrated regression models with multiple structural changes. Journal of Econometrics 146 (1), 59–73.

Kejriwal, M., Perron, P., 2010. Testing for Multiple Structural Changes in Cointegrated Regression Models. Journal of Business & Economic Statistics 28 (4), 503–522.

Kock, A. B., 2016. Consistent and Conservative Model Selection With the Adaptive Lasso in Stationary and Nonstationary Autoregressions. Econometric Theory 32 (1), 243–259.

Li, Y., Perron, P., 2017. Inference on locally ordered breaks in multiple regressions. Econometric Reviews 36 (1-3), 289–353.

Lucas, R. E., 1988. Money Demand in the United States: A Quantitative Review. Carnegie-Rochester Conference on Public Policy 29, 137–168.

Lucas, R. E., 2000. Inflation and Welfare. Econometrica 68 (2), 247–274.

Lucas, R. E., Nicolini, J. P., 2015. On the stability of money demand. Journal of Monetary Economics 73, 48–65.

Maki, D., 2012. Tests for cointegration allowing for an unknown number of breaks. Economic Modelling 29 (5), 2011–2015.

Meinshausen, N., Bühlmann, P., 2006. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics 34 (3), 1436–1462.

Meltzer, A. H., 1963. The Demand for Money: The Evidence from the Time Series. The Journal of Political Economy 71 (3), 219–246.

Mogliani, M., Urga, G., 2018. On the Instability of Long-Run Money Demand and the Welfare Cost of Inflation in the United States. Journal of Money, Credit and Banking 50 (7), 1645–1660.

Niu, Y. S., Hao, N., Zhang, H., 2015. Multiple Change-point Detection: A Selective Overview. Statistical Science 31 (4), 611–623.

Oka, T., Perron, P., 2018.
Testing for common breaks in a multiple equations system. Journal of Econometrics 204 (1), 66–85.

Perron, P., 2006. Dealing with structural breaks. In: Hassani, H., Mills, T., Patterson, K. (Eds.), Palgrave Handbook of Econometrics - Volume 1: Econometric Theory. Palgrave Macmillan UK, pp. 278–352.

Qian, J., Jia, J., 2016. On stepwise pattern recovery of the fused Lasso. Computational Statistics and Data Analysis 94, 221–237.

Qian, J., Su, L., 2016. Shrinkage Estimation of Regression Models With Multiple Structural Changes. Econometric Theory 32 (6), 1376–1433.

Qu, Z., 2007. Searching for cointegration in a dynamic system. Econometrics Journal 10 (3), 580–604.

Qu, Z., Perron, P., 2007. Estimating and testing structural change in multivariate regressions. Econometrica 75 (2), 459–502.

Saikkonen, P., 1991. Asymptotically efficient estimation of cointegration regressions. Econometric Theory 7 (1), 1–21.

Schmidt, A., Schweikert, K., 2019. Multiple Structural Breaks in Cointegrating Regressions: A Model Selection Approach. SSRN Working Paper, 1–49. URL https://ssrn.com/abstract=3489870