Detection of similar successive groups in a model with diverging number of variable groups
Gabriela CIUPERCA, Matúš MACIAK, François WAHL
Institut Camille Jordan, UMR 5208, Université Claude Bernard Lyon 1, France
Charles University, Faculty of Mathematics and Physics, Prague, Czech Republic
Abstract
In this paper, a linear model with grouped explanatory variables is considered. The idea is to perform an automatic detection of different successive groups of the unknown coefficients under the assumption that the number of groups is of the same order as the sample size. The standard least squares loss function and the quantile loss function are both used together with the fused and adaptive fused penalty to simultaneously estimate and group the unknown parameters. The proper convergence rate is given for the obtained estimators and the upper bound for the number of different successive groups is derived. A simulation study is used to compare the empirical performance of the proposed fused and adaptive fused estimators, and a real application on the air quality data demonstrates the practical applicability of the proposed methods.
Keywords: different successive groups; fused group; diverging-dimensional group model; adaptive penalty.
Subject Classifications: 62F12; 62F35; 62J07.
Introduction

The idea of this paper is to automatically detect different successive groups of unknown coefficients of some explanatory variables in a multivariate linear model. The number of groups is supposed to be of the same order as the number of observations. For a given loss function, the fused type penalties allow this automatic detection of the successive groups of the unknown coefficients. Depending on the assumptions imposed on the model errors, two modeling frameworks are considered: either the standard least squares loss function is used, or the robust quantile loss function is considered instead. Moreover, for each framework, two fused group penalties are proposed: firstly the fused-type penalty, which is later used to construct the adaptive fused penalty leading to a more accurate selection of different successive groups. For each of the two estimators the convergence rates are provided and the upper bound for the number of the different successive groups is derived.

In order to highlight the novelty of our work, we firstly review the state of the art regarding the proposed fused method with the automatic detection of the grouped explanatory variables in the multivariate linear model. Let g ∈ N denote the number of variable groups and let n ∈ N be the total number of the available observations. The fused quantile method for a particular case of non-grouped variables with the quantile level τ = 0.5 was already considered in Liu et al. (2018), where the fused LASSO penalized least absolute deviation (LAD) estimator in a high-dimensional linear model is discussed and the proper convergence rate of the obtained estimator is derived, together with a linearized alternating directional method for finding the numerical solution. The quantile linear model with a finite number of non-grouped explanatory variables is investigated by Jiang et al. (2013) and Jiang et al. (2014) by utilizing the adaptive fused penalization. In Jiang et al. (2013), the oracle property for the difference in the estimated coefficients for two different quantile levels is proved. More precisely, an automatic detection of the unchanged quantile slope coefficients across various quantile levels is discussed. In Jiang et al. (2014), the adaptive fused method is used to automatically select the explanatory variables and to identify their successive differences at the neighboring quantile levels.

CONTACT: G. Ciuperca, Université de Lyon, Université Claude Bernard Lyon 1, CNRS, UMR 5208, Institut Camille Jordan, Bat. Braconnier, M43, blvd du 11 novembre 1918, F-69622 Villeurbanne Cedex, France; E-mail: [email protected]
For a linear quantile regression with g groups of explanatory variables, Ciuperca (2017) shows the oracle property for the adaptive fused estimator when g = O(n^c), for 0 ≤ c < 1. If the model errors satisfy the classical conditions (i.e., zero mean and bounded variance), then the least squares (LS) loss function is more appropriate: in such a case, the high-dimensional linear model with the automatic selection of the corresponding groups of the explanatory variables with the adaptive LASSO penalty is considered by Wei and Huang (2010) for Gaussian errors when the number of groups is much larger than the sample size (g ≫ n), and by Zhang and Xiang (2016) for non-Gaussian errors. These results are further elaborated in Wang and Tian (2019) for a generalized linear model when g = n^c, with 0 < c < 1. The automatic selection of the grouped variables is also considered in Guo et al. (2015), where the SCAD penalty is utilized under the assumption that the number of groups can grow at a certain polynomial rate with n. A combination of the L1 and L2 norms under the Gaussian model errors is investigated in Campbell and Allen (2017), where the authors propose a structured variable selection in order to select at least one variable from each group. To the best of our knowledge, the only paper considering the fused penalty with the main focus on the selection of variable groups is that of Li et al.
(2014), where the LS loss function is penalized with the fused LASSO penalty, the L1 norm being considered both for the magnitudes of the parameters and for the successive differences between the estimated coefficients.

In the present paper, the penalty is of the fused type, that is, it is built on the L_{q,1} norm (with q ≥ 1) of the difference between two successive groups of parameters, while in the papers mentioned just before, the penalty norm is taken of each parameter group itself, the goal there being to automatically select the significant coefficient groups and not the identical successive coefficient groups. A model may contain successive vectors of non-zero parameters that do not differ from each other. A practical example is given in Section 4 of the present paper on the influence of the groups of meteorological variables, measured every hour, on the daily maximum benzene concentration. It is this type of automatic detection that interests us in the present work. Whether for the quantile or LS methods, particular cases of L_{q,1} penalties of the difference between two successive groups of parameter vectors were considered within the framework of automatic change-point detection in linear models. Note, however, that in the literature on linear models with change-points the statistical model is different from that considered in this paper, because the number of model parameters there is constant. For the LS loss function, Zhang and Geng (2015) use a sum of two such penalties, while Qian and Su (2016) employ a single one; for the quantile loss function, Ciuperca and Maciak (2019) consider an L_{q,1}-type penalty as well.

The paper is organized as follows. In Section 2 we introduce the model, assumptions and general notation. In Section 3, fused and adaptive fused group estimators for the LS and quantile loss functions are defined and asymptotically studied. In Section 4 we present a simulation study on the proposed estimators and an application on real data.
The proofs of the results in Section 3 are given in Section 5.

Model and notation

In this section we state the model definition and some general assumptions imposed on the model design. Let us start, however, with some notation which will be used throughout the paper. All limits are taken with respect to n → ∞. All vectors are columns, and matrices and vectors are denoted with a bold face. For some matrix A we denote its transpose by A^⊤ and, for a set A, we denote by |A| its cardinality and by A^c its complement. Expressions µ_min(·) and µ_max(·) are used to refer to the smallest and the largest eigenvalue of some positive definite matrix and, for x = (x_1, ..., x_p)^⊤ ∈ R^p being some p-dimensional vector, ‖x‖_q = (|x_1|^q + ... + |x_p|^q)^{1/q} denotes its L_q norm, while ‖x‖_∞ = max(|x_1|, ..., |x_p|) stands for the maximum norm. If, in addition, x = (x_1^⊤, ..., x_g^⊤)^⊤ is a vector split into g subvectors, then Σ_{j=1}^g ‖x_j‖_q defines the L_{q,1} norm of x. Moreover, β_j = (β_{j,1}, ..., β_{j,p})^⊤ ∈ R^p stands for the corresponding group specific vector of dimension p ∈ N, for any j ∈ {1, ..., g}, where g ∈ N is the number of the successive groups. Last but not least, C denotes some positive generic constant not depending on n which may take different values in different formulas throughout the paper.

In the present paper, we consider a multivariate linear model with g groups of explanatory variables. The number of groups g ∈ N depends on the sample size n ∈ N, g being known, such that g ≤ n/p, while the number of explanatory variables in each group is fixed and does not depend on n. Without reducing any generality, it is assumed that each group of the explanatory variables contains the same number of variables, p ∈ N.
Thus, the overall number of all parameters in the regression model is r_n = gp ≤ n. Let us consider the following linear model with the grouped explanatory variables

    Y_i = Σ_{j=1}^g X_{i,j}^⊤ β_j + ε_i = X_i^⊤ β^g + ε_i,   i = 1, ..., n,    (1)

with β^g ≡ (β_1^⊤, ..., β_g^⊤)^⊤ ∈ R^{r_n}, where β_j ∈ R^p is the vector of parameters for the group j ∈ {1, ..., g}. For each observation i ∈ {1, ..., n}, the vector X_i ∈ R^{r_n} contains the explanatory variables X_{i,j} ∈ R^p from all groups. These group specific explanatory variables are assumed to be deterministic, for any j = 1, ..., g and i = 1, ..., n. The error terms {ε_i}_i are assumed to be independent and the response variable is denoted by Y_i. The true (unknown) vector of parameters is β⁰ = (β⁰_1^⊤, ..., β⁰_g^⊤)^⊤. For p = 1, the model with ungrouped explanatory variables is obtained. Note that the order of appearance of the groups in the model in (1) is important and some natural ordering is required.

Given the data {(Y_i, X_i^⊤); i = 1, ..., n} we would like to automatically determine, using the fused method, whether two successive groups of the explanatory variables have the same influence on the response or not while, at the same time, quantifying the corresponding effect magnitudes. In addition to the example on the air pollution in Section 4, a nice demonstration of the practical applicability of the proposed estimation method can also be seen in the work of Zhou et al. (2012), where the fused group method allows for capturing the temporal smoothness of the predictive biomarkers on the cognitive scores in the progression of the Alzheimer's disease.
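To make the data-generating mechanism in (1) concrete, the following sketch (not part of the paper's implementation; the function name and the parameter values are ours, chosen only for illustration) draws a sample in which two blocks of successive groups share the same coefficient vector, so that the set of truly different successive groups contains a single index:

```python
import numpy as np

def simulate_grouped_model(n, g, p, beta_groups, seed=0):
    """Draw (X, Y) from model (1): Y_i = sum_j X_{i,j}^T beta_j + eps_i."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, g * p))        # X_i in R^{r_n}, r_n = g * p
    eps = rng.normal(size=n)               # i.i.d. errors (standard normal here)
    Y = X @ beta_groups.reshape(-1) + eps
    return X, Y

# g = 4 groups of p = 2 variables: groups 1-2 share one parameter vector and
# groups 3-4 share another, hence the only true change is at group j = 3.
beta = np.array([[1.0, -1.0], [1.0, -1.0], [2.5, 0.5], [2.5, 0.5]])
X, Y = simulate_grouped_model(n=200, g=4, p=2, beta_groups=beta)
```

Here the natural ordering of the groups is simply their position in the coefficient vector, which is exactly what the fused penalty below exploits.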
To achieve the sparsity property between two successive groups of the explanatory variables (in the sense that the corresponding vectors of estimated parameters for two successive groups are mostly the same), the fused and adaptive fused group estimators are proposed and studied with two loss functions: the standard least squares and the quantile check function.

The asymptotic behavior of the group specific estimators for the fused and the adaptive fused method with n ≥ gp is investigated for n → ∞ where, in addition, a deterministic sequence (b_n)_{n ∈ N} is needed, such that

    b_n → 0,   n^{1/2} b_n → ∞.    (2)

An example of a sequence which satisfies (2) is b_n = (n^{-1} log n)^{1/2}.

Unlike Ciuperca (2017), where the number of groups is either fixed or of the order n^c, with 0 < c < 1, the model in (1) assumes that the number of the successive groups may be of the same order as the sample size. A similar model is also considered in Ciuperca and Maciak (2019), where the change-point detection and estimation is performed in the quantile model with a fused type penalty, however, for an unknown vector of parameters with the dimension not depending on n. The same model is also considered in Leonardi and Bühlmann (2016), where the change-point locations are detected by utilizing the LS loss function with the LASSO type penalty.

Assumptions
The following regularity assumptions imposed on the model design are needed. The assumptions required for the model errors will be presented in Subsection 3.1 for the quantile framework and in Subsection 3.2 for the LS framework.

(A1) max_{1 ≤ i ≤ n} ‖X_i‖_∞ ≤ C, for some constant C > 0.

(A2) There exist two positive constants, 0 < m ≤ M < ∞, such that

    m ≤ µ_min(n^{-1} Σ_{i=1}^n X_i X_i^⊤) ≤ µ_max(n^{-1} Σ_{i=1}^n X_i X_i^⊤) ≤ M.

In this section, two estimation frameworks are presented: the automatic detection and estimation of the successive groups of the explanatory variables is considered under two different model error assumptions. For each framework, the asymptotic properties are investigated. Firstly, the fused group estimator is proposed and, afterwards, the adaptive version of the fused group estimator is defined. If the model errors {ε_i}_{1 ≤ i ≤ n} in (1) do not meet the standard conditions for the existence of the first two moments, then a robust version needs to be employed and, therefore, the quantile estimation technique is appropriate. On the other hand, if the conditions E[ε_i] = 0 and Var[ε_i] < ∞ are satisfied, the penalized LS method is considered. The main results are presented for both scenarios in the next two subsections, while the proofs are all postponed to Section 5.

Quantile loss function
Let the model errors in (1) satisfy the following: (A3)
Random errors ε_i, for i = 1, ..., n, are independent and identically distributed (i.i.d.) with the continuous distribution function F, such that F(0) = P[ε_1 ≤ 0] = τ, for some known τ ∈ (0, 1). The corresponding density function f, with the nonzero compact support G ⊆ R, is supposed to be continuous and strictly positive in a neighborhood of zero. Moreover, the first derivative of f is bounded in a neighborhood of zero.

Assumption (A3) on the errors is standard for the quantile regression models when the number of parameters depends on the sample size n ∈ N (see, for instance, Ciuperca (2019) and Wu and Liu (2009)). The standard assumptions E[ε_1] = 0 and E[ε_1²] < ∞ are not considered and, therefore, the least squares method is not appropriate. Since P[ε_1 < 0] = τ, we can consider the quantile method with the fixed quantile level τ ∈ (0, 1), with the corresponding quantile check function ρ_τ(u) = u(τ − 1_{u<0}), for u ∈ R. Thus, for the model in (1) the following quantile random process is obtained

    G_n(β^g) ≡ Σ_{i=1}^n ρ_τ(Y_i − X_i^⊤ β^g),    (3)

with the group quantile estimator defined as

    β̃^g ≡ argmin_{β^g ∈ R^{r_n}} G_n(β^g).    (4)

For the particular case of τ = 0.5 we obtain the median regression, for which the quantile process and the associated estimator (4) reduce to the absolute deviation process and the least absolute deviation estimator, respectively. The following lemma gives the appropriate convergence rate of the group quantile estimator β̃^g.

Lemma 3.1
Under Assumptions (A1), (A2), and (A3) it holds that ‖β̃^g − β⁰‖ = O_P(b_n), where (b_n)_{n ∈ N} is the sequence defined in (2).

The convergence rate of the group quantile estimator for the number of groups g = O(n) is different from that obtained when g = O(n^c), with 0 ≤ c < 1. Indeed, for 0 ≤ c < 1 the convergence rate of β̃^g is of the order O_P((g n^{-1})^{1/2}) = O_P(n^{(c−1)/2}) (see Lemma 1 of Ciuperca (2019)), and the convergence rate of β̃^g from (4) cannot be obtained as a straightforward extension of the situation where g = O(n^c) for 0 ≤ c < 1, when c → 1.

In order to preserve the group effect of the explanatory variables and to simultaneously detect the successive groups of identical parameter vectors, the L_{q,1} norm of the consecutive differences β_j − β_{j−1}, for j = 2, ..., g, is used as a penalty, with some q ≥ 1 fixed. Thus, the following quantile process is considered

    Q_n(β^g) ≡ G_n(β^g) + n λ_n Σ_{j=2}^g ‖β_j − β_{j−1}‖_q.    (5)

For q = 1 the relation (5) gives the process penalized with the standard L1 norm, while for q = 2 the process is penalized by the L_{2,1} norm. The positive sequence (λ_n)_{n ∈ N} plays the role of a tuning parameter, such that it converges to zero as the sample size tends to infinity. An additional condition on (λ_n)_{n ∈ N} will be given later when formulating the theorems with the main results.

Based on the penalized process in (5), the corresponding fused group quantile estimator is obtained as

    β̂^g ≡ argmin_{β^g ∈ R^{r_n}} Q_n(β^g),    (6)

where β̂^g = (β̂_1^⊤, ..., β̂_g^⊤)^⊤. The estimator β̂^g depends on the norm considered in the penalty term of the random process in (5) and, also, on the tuning parameter λ_n > 0. Let us define the set of indexes which form the true different successive groups

    B = { j ∈ {2, ..., g}; β⁰_j ≠ β⁰_{j−1} }.    (7)

Since the values of the true parameter vector β⁰ are unknown, the set B is unknown too. Therefore, an analogous set is considered with respect to the differences of the estimated parameters of two successive groups, as

    B̂_n = { j ∈ {2, ..., g}; β̂_j ≠ β̂_{j−1} }.

It is obvious that this set is used to provide a reasonable estimate for B.

Remark 3.2 The results obtained in this section are also valid for p = 1, which is, to the authors' best knowledge, a case which has not been previously considered in the literature. The number of groups in Ciuperca (2017) is of order n^c, with 0 ≤ c < 1 and, moreover, in Ciuperca (2017) the goal is to select the groups of significant variables simultaneously with the group's inheritance.

The following theorem provides the convergence rate of the fused group quantile estimator defined in (6), under the additional assumption that there is only a finite number of the successive groups with different coefficients. For a suitable choice of the tuning parameter this convergence rate is of the same order as the sequence (b_n) and, moreover, it is the same as the one obtained for the group quantile estimator in Lemma 3.1. The convergence rate of β̂^g does not depend on the L_q norm considered in the penalty term in (5).

Theorem 3.3
Under Assumptions (A1), (A2), and (A3) and the condition in (2), if, moreover, |B| < ∞ and λ_n b_n^{-1} → 0 as n → ∞, then ‖β̂^g − β⁰‖ = O_P(b_n).

Examples of sequences (λ_n)_{n ∈ N}, (b_n)_{n ∈ N} which satisfy (2) and λ_n b_n^{-1} → 0 as n → ∞ are λ_n = n^{-1} (log n)^{3/2} and b_n = (n^{-1} log n)^{1/2}.

Similarly as for the standard LASSO type penalties, the consistent selection of the different successive groups does not occur with probability converging to 1 and some overfitting is observed. The misclassification error |B̂_n \ B| is used to assess the number of the successive groups being mistakenly detected as different. The following theorem provides the upper bound for this misclassification error.

Theorem 3.4
Under the same assumptions as in Theorem 3.3, there exists a positive constant C > 0 such that

    lim_{n→∞} P[ |B̂_n \ B| ≤ C max(b_n λ_n^{-1}, b_n^{-1}) ] = 1.

Note that the upper bound in Theorem 3.4 depends on the tuning parameter λ_n > 0 and the sequence (b_n)_{n ∈ N} and, thus, it can be hypothetically unbounded from above. Nevertheless, this result provides the upper bound for the number of elements in B̂_n; more specifically, it gives the upper bound for the number of successive groups of explanatory variables which have a different estimated effect on the response variable.

Corollary 3.5
Since |B| < ∞ and |B̂_n \ B| ≥ |B̂_n| − |B| with probability one, we can deduce by Theorem 3.4 that lim_{n→∞} P[ |B̂_n| ≤ C max(b_n λ_n^{-1}, b_n^{-1}) ] = 1.

Remark 3.6
For instance, if λ_n = n^{-1} (log n)^{3/2} and b_n = (n^{-1} log n)^{1/2}, then the upper bound given by Theorem 3.4 is

    |B̂_n| ≤ C max( n^{1/2} (log n)^{-1}, n^{1/2} (log n)^{-1/2} ) = C n^{1/2} (log n)^{-1/2},

which implies that the number of elements contained in B̂_n is of smaller order than n^{1/2}; however, it can converge to infinity for n → ∞.

To improve the estimation accuracy of B we consider an adaptive penalty constructed on the basis of the estimator in (6). Let us consider the random process

    Q̌_n(β^g) ≡ G_n(β^g) + n λ_n Σ_{j=2}^g ω̂_{n,j} ‖β_j − β_{j−1}‖_q,    (8)

with the adaptive weights

    ω̂_{n,j} ≡ 1 / max( n^{-1/2}, Σ_{k=1}^p |β̂_{j,k} − β̂_{j−1,k}|^γ ),

for a fixed constant γ > 0, where β̂_j = (β̂_{j,1}, ..., β̂_{j,p})^⊤. Let us remark that for j ∉ B̂_n we have β̂_j − β̂_{j−1} = 0_p. The tuning parameter sequences in relations (5) and (8) may be different, both with a convergence rate faster than the sequence (b_n)_{n ∈ N}. Therefore, the choice of n^{-1/2} in ω̂_{n,j} is used as a deterministic sequence that converges to 0 when β̂_j = β̂_{j−1}, however, with a rate faster than b_n because of the condition n^{1/2} b_n → ∞ in (2). The adaptive fused group quantile estimator for β⁰ is then defined as

    β̌^g ≡ argmin_{β^g ∈ R^{r_n}} Q̌_n(β^g),

where β̌^g = (β̌_1^⊤, ..., β̌_g^⊤)^⊤. By Theorem 3.3, we have that for all j ∈ B there exists a constant c > 0 such that

    lim_{n→∞} P[ ω̂_{n,j} > c ] = 1.    (9)

Therefore, taking into account the relation in (9) and the fact that γ > 0, a proof similar to that of Theorem 3.3 can be used to derive the convergence rate of β̌^g, which is the same as for β̂^g.

Theorem 3.7
Under Assumptions (A1), (A2), and (A3) and the condition in (2), if |B| < ∞, then for any sequence (λ_n)_{n ∈ N} such that λ_n b_n^{-1} → 0 as n → ∞ it holds that ‖β̌^g − β⁰‖ = O_P(b_n).

Considering the adaptive fused group quantile estimator β̌^g we can also define an updated estimator for the set B as

    B̌_n ≡ { j ∈ {2, ..., g}; β̌_j ≠ β̌_{j−1} },

which is indeed more appropriate, as shown by the next theorem, where the upper bound for the cardinality of B̌_n \ B is proved to be much smaller than the one for B̂_n \ B in Theorem 3.4.

Theorem 3.8 Under the same assumptions as in Theorem 3.7, there exists a positive constant C such that

    lim_{n→∞} P[ |B̌_n \ B| ≤ C max(n^{-1/2}, b_n^γ) max(b_n λ_n^{-1}, b_n^{-1}) ] = 1.

Remark 3.9 (i) For γ > 1 and the tuning parameter (λ_n)_{n ∈ N} such that n^{-1/2} b_n λ_n^{-1} → 0 and b_n^{γ+1} λ_n^{-1} → 0, we obtain that max(n^{-1/2}, b_n^γ) max(b_n λ_n^{-1}, b_n^{-1}) → 0, as n → ∞. The examples of sequences (λ_n) and (b_n) from Remark 3.6 satisfy these conditions.

(ii) If 0 < γ ≤ 1, then max(n^{-1/2}, b_n^γ) = b_n^γ. In this case we have b_n^γ max(b_n λ_n^{-1}, b_n^{-1}) ≥ b_n^{γ−1}, and the sequence on the right-hand side of this inequality converges to infinity for γ < 1 and is bounded for γ = 1. Thus, in this case, it seems that we should take the value γ = 1 and the same sequences (b_n), (λ_n) as in Remark 3.6.

Comparing now Theorem 3.4 and Theorem 3.8, we can deduce that the adaptive weights ω̂_{n,j} are responsible for a strong reduction of the number of elements in B̌_n \ B, i.e., of the false discoveries of different successive groups. This is also later confirmed by the simulation study performed in Section 4.

Least squares loss function
In a standard linear regression model the least squares (LS) objective function is standardly used under the following assumption imposed on the model errors:

(A4)
The error terms (ε_i)_{1 ≤ i ≤ n} are i.i.d., such that E[ε_1] = 0 and Var[ε_1] < ∞.

We will now focus on the fused and adaptive fused group estimators based on the least squares objective function. In this case, instead of (3), an analogous empirical process is considered

    L_n(β^g) ≡ Σ_{i=1}^n (Y_i − X_i^⊤ β^g)²,    (10)

with the corresponding estimator given as β̃^g_(LS) ≡ argmin_{β^g ∈ R^{r_n}} L_n(β^g), and the penalized process analogous to (5) is

    U_n(β^g) ≡ L_n(β^g) + n λ_n Σ_{j=2}^g ‖β_j − β_{j−1}‖_q,    (11)

with the corresponding fused group LS estimator β̂^g_(LS) ≡ argmin_{β^g ∈ R^{r_n}} U_n(β^g). Let us note that the classical fused LASSO penalization corresponds to the non-grouped case (p = 1), with a penalty of the form α ν_n^{(1)} Σ_{j=1}^g |β_j| + (1 − α) ν_n^{(1)} Σ_{j=2}^g |β_j − β_{j−1}|.

Lemma 3.10 Under Assumptions (A1), (A2), and (A4), and the sequence (b_n)_{n ∈ N} as in (2), it holds that ‖β̃^g_(LS) − β⁰‖ = O_P(b_n).

Following the lines of the proof of Theorem 3.3, we also obtain the proof of the following theorem.

Theorem 3.11 Under Assumptions (A1), (A2), and (A4) and the condition in (2), if |B| < ∞ and λ_n b_n^{-1} → 0 as n → ∞, then ‖β̂^g_(LS) − β⁰‖ = O_P(b_n).

The estimator of B based on β̂^g_(LS) = (β̂_{1,(LS)}^⊤, ..., β̂_{g,(LS)}^⊤)^⊤ is given in a straightforward way as

    B̂_{n,(LS)} = { j ∈ {2, ..., g}; β̂_{j,(LS)} ≠ β̂_{j−1,(LS)} },

and a result similar to the one in Theorem 3.4 can be derived again.

Theorem 3.12 Under the same assumptions as in Theorem 3.11, there exists a positive constant C such that

    lim_{n→∞} P[ |B̂_{n,(LS)} \ B| ≤ C max(b_n λ_n^{-1}, b_n^{-1}) ] = 1.
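The quantile process in (3)/(5) and the LS process in (10)/(11) differ only in the loss function, sharing the same L_{q,1} fused penalty. A minimal numerical sketch of these objectives, with illustrative function names of our own choosing:

```python
import numpy as np

def check_loss(u, tau):
    # quantile check function: rho_tau(u) = u * (tau - 1{u < 0})
    return u * (tau - (u < 0))

def fused_penalty(beta, g, p, q=2):
    # L_{q,1} fused penalty: sum over j = 2..g of ||beta_j - beta_{j-1}||_q
    B = beta.reshape(g, p)
    return float(np.sum(np.linalg.norm(B[1:] - B[:-1], ord=q, axis=1)))

def Q_n(beta, X, Y, tau, lam, g, p, q=2):
    # penalized quantile process (5)
    return float(np.sum(check_loss(Y - X @ beta, tau))) \
        + len(Y) * lam * fused_penalty(beta, g, p, q)

def U_n(beta, X, Y, lam, g, p, q=2):
    # penalized least squares process (11)
    return float(np.sum((Y - X @ beta) ** 2)) \
        + len(Y) * lam * fused_penalty(beta, g, p, q)
```

Minimizing Q_n or U_n over β yields the fused estimators in (6) and (11); in practice this requires a convex optimization solver, since the fused penalty is non-smooth.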
Similarly as for the quantile framework before, one can again improve the estimation accuracy of B by taking advantage of β̂^g_(LS) and defining the adaptive fused penalty with the corresponding empirical process

    Ǔ_n(β^g) ≡ L_n(β^g) + n λ_n Σ_{j=2}^g ω̂_{n,j,(LS)} ‖β_j − β_{j−1}‖_q,    (12)

where the weights ω̂_{n,j,(LS)} are again constructed on the basis of the fused group LS estimator as ω̂_{n,j,(LS)} ≡ 1 / max( n^{-1/2}, Σ_{k=1}^p |β̂_{j,k,(LS)} − β̂_{j−1,k,(LS)}|^γ ), for some fixed γ > 0 and β̂_{j,k,(LS)} being the k-th component of β̂_{j,(LS)}. Thus, the adaptive fused group LS estimator is

    β̌^g_(aLS) ≡ argmin_{β^g ∈ R^{r_n}} Ǔ_n(β^g),

and the corresponding estimator of B is defined as

    B̌_{n,(aLS)} ≡ { j ∈ {2, ..., g}; β̌_{j,(aLS)} ≠ β̌_{j−1,(aLS)} }.

As for the quantile framework, the sequences (λ_n)_{n ∈ N} in relations (11) and (12) can be different. Finally, using the same arguments as in Theorem 3.7 and following the same lines of the proof, we obtain an analogous result also for β̌^g_(aLS).

Theorem 3.13 Under Assumptions (A1), (A2), and (A4) and the condition in (2), if |B| < ∞, then for any sequence (λ_n)_{n ∈ N} such that λ_n b_n^{-1} → 0 as n → ∞, it holds that ‖β̌^g_(aLS) − β⁰‖ = O_P(b_n).

The results presented in Subsection 3.1 and Subsection 3.2 show that the estimated number of different successive groups is of the same order for both estimation frameworks with the adaptive fused approach and, moreover, the convergence rates of the corresponding estimators for the model parameters are also of the same order, all under the assumption that the true number of groups is bounded. The finite sample performance is investigated in the next section.
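To make the construction of the adaptive weights concrete, the sketch below builds ω̂_{n,j,(LS)}-style weights from an unpenalized ordinary least squares fit playing the role of a pilot estimate (the function names, the choice γ = 1 suggested by Remark 3.9, and the data are ours, for illustration only):

```python
import numpy as np

def adaptive_weights(beta_pilot, g, p, n, gamma=1.0):
    # omega_hat_{n,j} = 1 / max(n^{-1/2}, sum_k |diff_{j,k}|^gamma), j = 2..g
    B = beta_pilot.reshape(g, p)
    sums = np.sum(np.abs(B[1:] - B[:-1]) ** gamma, axis=1)
    return 1.0 / np.maximum(n ** -0.5, sums)   # one weight per j = 2, ..., g

def detected_set(beta_hat, g, p, tol=1e-8):
    # estimate of B: indexes j in {2,...,g} with beta_hat_j != beta_hat_{j-1}
    B = beta_hat.reshape(g, p)
    norms = np.linalg.norm(B[1:] - B[:-1], axis=1)
    return {j + 2 for j in range(g - 1) if norms[j] > tol}

rng = np.random.default_rng(1)
g, p, n = 4, 2, 300
beta0 = np.array([1.0, 1.0, 1.0, 1.0, 3.0, -1.0, 3.0, -1.0])  # change at j = 3
X = rng.normal(size=(n, g * p))
Y = X @ beta0 + rng.normal(size=n)
pilot, *_ = np.linalg.lstsq(X, Y, rcond=None)   # unpenalized pilot estimate
w = adaptive_weights(pilot, g, p, n)
# the weight at the true change (j = 3) is small, while identical successive
# groups receive large weights, so they are penalized more heavily in (12)
```

This is exactly the mechanism behind the reduced false discovery bound of Theorem 3.8: large weights on the null differences push the corresponding estimated differences to zero.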
In this section we firstly present a Monte Carlo simulation study to show some numerical properties of the proposed fused methods for a varying number of groups, different sample sizes and error distributions. Later, the application on the air quality data is presented. The goal is to detect daily moments when the temperature and humidity contributions change their effect with respect to the maximum daily benzene concentration.

Numerical study

The fused group quantile estimator β̂^g defined in terms of the minimization in (5) and the adaptive fused group quantile estimator β̌^g in (8) are both compared with the fused group LS estimator β̂^g_(LS) in (11) and its adaptive version β̌^g_(aLS) in (12) with respect to a wide range of different simulation settings. In order to make the comparison meaningful, the quantile level τ = 0.5 is considered. The dimension of the unknown group specific vector of parameters β ∈ R^p is either p = 1 or p = 3, and three options are used for the number of groups g. The sample size is given as n = pg. The model covariates are randomly generated from the normal distribution and two distributions are used for the error terms (standard normal and Cauchy). The true number of different successive groups in the model takes one of four values, the last one being n/5, where this last option (20 % of the sample size) clearly does not satisfy the model assumptions but it is still included in the simulation setup for comparison purposes. Obviously, if there are two change points in the group parameter then there are three successive groups; analogously, for five changes in the group specific vector parameter there are six successive groups. The corresponding locations of changes between successive groups are determined randomly and the jump magnitudes are also assigned randomly over a range of values to allow for various signal-to-noise ratios.
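The two detection scores used below, the true recovery rate and the overestimation rate, can be computed from the set of true change indexes and the set of detected ones; a small sketch with a hypothetical helper name:

```python
def recovery_rates(true_changes, detected_changes):
    """Return (true recovery rate, overestimation rate) scoring the
    detection of different successive coefficient groups."""
    true_changes, detected_changes = set(true_changes), set(detected_changes)
    n_true = max(len(true_changes), 1)          # guard against an empty set
    true_recovery = len(true_changes & detected_changes) / n_true
    overestimation = len(detected_changes) / n_true
    return true_recovery, overestimation

# two true changes at j = 3 and j = 7; one is found, plus a spurious one:
scores = recovery_rates({3, 7}, {3, 9})   # -> (0.5, 1.0)
```

A perfect detection yields the pair (1.0, 1.0): all true changes are found and no spurious ones are added.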
The regularization parameter equals λ n = n − (log( n )) / for the fusedmethod and λ n = n − (log( n )) / for the adaptive fused method with the adaptive weightsdefined in (8) and (12) for γ = 1 .All four methods are compared with respect to the quality of the final fit and, mainly, the dif-ferent successive coefficient group detection performance. The median (MED) of ( Y i − (cid:98) Y i ) (cid:54) i (cid:54) n and the L norm of the difference between the true vector of parameters and its estimate areused to evaluate the estimation performance while the true recovery rate (the proportion of trulydetected different successive coefficient groups with respect to all unknown changes) and theoverestimation rate (proportion of the number of detected different successive coefficient groupswith respect to the number of true changes) are used to assess the detection performance. Theresults are summarized in Table 1 (for p = 1 ) and Table 2 (for p = 3 ).For M independent Monte Carlo replications let (cid:98) β ( m ) denote the estimate of β g by one ofthe four estimation methods for the m -th Monte Carlo run, with m = 1 , · · · , M . The corre-sponding forecast for Y i is (cid:98) Y i, ( m ) = X (cid:62) i (cid:98) β ( m ) . For each Monte Carlo replication the medianerror med ( m ) = median ( Y i − (cid:98) Y i, ( m ) ; i = 1 , · · · , n ) is obtained and the reported results are av-eraged over all M simulation runs MED = M − (cid:80) Mm =1 med ( m ) . For the parameter estimationthe value of MAD = M − (cid:80) Mm =1 1 pg (cid:80) pgj =1 | β j − (cid:98) β j, ( m ) | is reported.For some illustration of the model there is an example in Figure 1: the number of truedifferent successive groups is two (out of g = 20 in total) and the number of the explanatoryvariables within each group is three ( p = 3 ). The true vector of the group specific parametersfor the first group (group indexes j ∈ { , . . . 
, …}) is β⁽¹⁾ = (1, …, …)⊤ and the true vector of the group-specific parameters for the second group (group indexes j ∈ {…, …, …}) is β⁽²⁾ = (1.…, …, …)⊤. The sample size is n = gp (n = 80). All four proposed estimation methods are applied and the corresponding estimates are given in Figure 1(a) for (10) and (11) and in Figure 1(b) for (3) and (5).

[Figure 1: An illustration of the model in (1) for two truly different successive groups (out of g = 20 in total) and three explanatory variables in each group (p = 3). Both panels plot the parameter estimates against the group index, showing the true values together with the fused and adaptive fused estimates; panel (a) uses the least squares error and panel (b) the quantile check function. The first group-specific parameter vector is the same for the groups j ∈ {…, …, …} and it differs from the second group-specific parameter vector, which is the same for the groups j ∈ {…, …, …}. The Cauchy error terms are considered to visualize the robust flavor of the quantile estimation approach for τ = 0.… (panel (b)) when compared with the standard least squares (panel (a)).]

[Table: Simulation results for the situation where the dimension of the unknown parameter is p = …. The rows correspond to the Gaussian (N) and Cauchy (C) error distributions and, within each, to the four estimators β̂_g(LS), β̃_g(aLS), β̂_g and β̃_g; the column blocks correspond to models with …, … and … different successive groups and to a model with …% different successive groups (g = n). Two goodness-of-fit quantities are provided: the median (MED) of (Y_i − Ŷ_i)_{1≤i≤n} and the mean absolute difference (MAD) between the true parameter vector β and the corresponding empirical estimate β̂. The Recovery column is given in terms of two values: the proportion of truly discovered different successive coefficients (value 1 stands for all true changes being discovered) and the proportion between the number of estimated different successive coefficients and true changes (value 1 stands for the situation where the number of estimated changes equals the number of true changes). An ideal situation is 1/1, which means that all true changes are discovered with no other detections in addition. The results are averaged over … Monte Carlo simulation runs.]
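The goodness-of-fit quantities described in the caption above can be computed as in the following sketch (the function name, the (g × p) array layout with one row of coefficients per group, and the change-detection tolerance are our own assumptions, not the paper's):

```python
import numpy as np

def simulation_metrics(y, y_hat, beta_true, beta_hat, tol=1e-6):
    """MED, MAD and the Recovery pair reported in the simulation tables.

    beta_true, beta_hat: arrays of shape (g, p) with one row of
    coefficients per group; a change occurs between two successive
    groups whose coefficient vectors differ.
    """
    med = np.median(y - y_hat)                   # median residual
    mad = np.mean(np.abs(beta_true - beta_hat))  # mean absolute difference
    true_chg = set(np.flatnonzero(
        np.abs(np.diff(beta_true, axis=0)).max(axis=1) > tol))
    est_chg = set(np.flatnonzero(
        np.abs(np.diff(beta_hat, axis=0)).max(axis=1) > tol))
    # first value: proportion of true changes discovered (1 = all found);
    # second value: number of estimated changes over number of true ones.
    discovered = len(true_chg & est_chg) / max(len(true_chg), 1)
    ratio = len(est_chg) / max(len(true_chg), 1)
    return med, mad, (discovered, ratio)
```

With this convention the ideal Recovery pair is 1/1: every true change is found and no spurious changes are added.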
[Table: Simulation results for the situation where the dimension of the unknown parameter is p = …. The rows correspond to the Gaussian (N) and Cauchy (C) error distributions and, within each, to the four estimators β̂_g(LS), β̃_g(aLS), β̂_g and β̃_g; the column blocks correspond to models with …, … and … different successive groups and to a model with …% different successive groups (g = n/p). Two goodness-of-fit quantities are provided: the median (MED) of (Y_i − Ŷ_i)_{1≤i≤n} and the mean absolute difference (MAD) between the true parameter vector β and the corresponding empirical estimate β̂. The Recovery column is given in terms of two values: the proportion of truly discovered different successive coefficient groups (value 1 stands for all true changes being discovered) and the proportion between the number of estimated different successive coefficient groups and true changes (value 1 stands for the situation where the number of estimated changes equals the number of true changes). An ideal situation is 1/1, which means that all true changes are discovered with no other detections in addition. The results are averaged over … Monte Carlo simulation runs.]

For the models with a fixed number of truly different successive groups, |B| ∈ {…, …, …}, the fused estimators for the least squares and the quantile methods have the same properties, and the same also applies to the adaptive frameworks which, moreover, achieve recovery detection rates close to one. On the other hand, if the assumption |B| < ∞ does not hold, as for the models with …% different successive groups, then not all of the different successive groups are detected and the performance is worse.

The robustness of the quantile methods is obvious when the Cauchy errors are used instead: while the LS-based methods fail in both the estimation and the group detection, the quantile approaches perform comparably well as in the situations with the Gaussian errors.

Application to Air Quality data

In order to demonstrate the practical applicability of the proposed model, we use the air quality data from De Vito et al. (2009), which can be downloaded from the UCI Machine Learning Repository at http://archive.ics.uci.edu/ml/datasets/Air+Quality. There are g = 24 hourly groups and for each group there is the corresponding vector parameter β_j = (β_{Tj}, β_{Hj})⊤ ∈ R², for j = 1, …
, 24, where β_{Tj} is responsible for the contribution of the temperature at 'j' o'clock and β_{Hj} models the effect of the humidity, again at 'j' o'clock. Using the model formulation from Section 2 and the estimation in terms of (5), it can be achieved that most of the corresponding parameter vector estimates are the same. Otherwise, the existing changes in the vector estimates identify some specific daily segments with the same temperature and humidity contribution with respect to the maximum daily benzene concentration. The corresponding magnitudes for both effects in each daily segment are all estimated simultaneously.

Similarly as in the simulation section, four different models are fitted: the fused group LS approach and its adaptive version, presented in Figures 3(a) and 3(b), and the proposed fused group quantile and the adaptive fused group quantile, presented in Figures 3(c) and 3(d). The temperature data and the humidity data are heavily skewed and, therefore, it can be assumed that robust approaches are more appropriate for this situation.

Indeed, while the fused group LS and also its adaptive version cannot identify any specific daily moments which should be used to determine the maximum daily benzene concentration, the fused group quantile and, in particular, its adaptive version clearly identify some segments during the day when the contribution of the temperature and humidity is obvious.

[Figure 2: Daily temperature profiles (left panel) and daily humidity profiles (right panel) for 50 randomly selected days out of 357 available days with full profiles in total. In addition, the maximum benzene concentration is recorded for each day and the corresponding time of the maximum occurrence (in hours) is given in terms of the frequency histograms in both panels.]
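The hourly-group design just described can be sketched as follows (a minimal sketch; the function name and the column ordering, which interleaves the temperature and humidity columns within each hourly group, are our own choices):

```python
import numpy as np

def build_design(temp, humid):
    """Design matrix for the hourly-group model (a sketch).

    temp, humid: arrays of shape (n_days, 24) holding the hourly
    temperature and humidity profile of each day.  Hourly group j
    contributes the column pair (T_j, H_j), matching the group
    parameter beta_j = (beta_Tj, beta_Hj); the response would be the
    maximum daily benzene concentration of each day.
    """
    n_days, g = temp.shape
    X = np.empty((n_days, 2 * g))
    X[:, 0::2] = temp    # columns 2j:     temperature at hour j
    X[:, 1::2] = humid   # columns 2j + 1: humidity at hour j
    return X
```

A fused penalty on the successive differences β_{j+1} − β_j then merges adjacent hourly groups that share the same temperature and humidity effects.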
[Figure 3: The estimated parameter vectors β̂_j = (β̂_{Tj}, β̂_{Hj})⊤ ∈ R², for j = 1, …, 24, plotted against the time of day (in hours) for four different estimation techniques: (a) the fused group LS solution, (b) the adaptive fused group LS solution, (c) the fused group quantile solution and (d) the adaptive fused group quantile solution; each panel shows the daily temperature and the daily humidity effects. The adaptive fused group quantile estimator in panel (d) clearly identifies some instant moments during a day when the temperature and humidity information is relevant for the maximum benzene concentration. In other words, it seems enough to record the temperature and humidity information at 2 pm and, also, after 6 pm.]

Proofs

Throughout the proofs, the following identity for the quantile check function ρ_τ is used: for any x, y ∈ R it holds that
\[
\rho_\tau(x - y) - \rho_\tau(x) = y\,\big(1_{\{x < 0\}} - \tau\big) + \int_0^y \big(1_{\{x \le v\}} - 1_{\{x \le 0\}}\big)\, dv. \tag{13}
\]

Proof of Lemma 3.1. We will show that for all ε > 0 there exists a constant C_ε > 0 such that, for n large enough, we have
\[
\mathbb{P}\Big[\inf_{u \in \mathbb{R}^{r_n},\, \|u\|_1 = 1} G_n\big(\beta + C_\varepsilon\, b_n u\big) > G_n(\beta)\Big] \ge 1 - \varepsilon. \tag{14}
\]
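Identity (13) can be checked numerically, together with two facts used in the decomposition that follows: the variables D_i = (1 − τ)1_{ε_i<0} − τ1_{ε_i≥0} are centered whenever the τ-quantile of the errors is zero, and the remainders R_i = ρ_τ(ε_i − t_i) − ρ_τ(ε_i) − t_i D_i equal the integral term of (13) and are therefore nonnegative. The sketch below is our own code: the integral is approximated by an evenly spaced Riemann sum, and uniform errors are chosen so that P(ε_i < 0) = τ exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
tau = 0.3

def rho(x):
    """Quantile check function rho_tau(x) = x (tau - 1{x < 0})."""
    return x * (tau - (x < 0))

def knight_rhs(x, y, grid=200_000):
    """Right-hand side of identity (13); the integral over [0, y] is
    approximated by an evenly spaced Riemann sum."""
    vs = np.linspace(0.0, y, grid)
    integral = y * np.mean((x <= vs).astype(float) - (x <= 0.0))
    return y * ((x < 0) - tau) + integral

# Identity (13) holds for positive and negative y alike.
for x, y in [(0.3, 1.2), (0.3, -0.5), (-0.7, 0.4)]:
    assert abs(rho(x - y) - rho(x) - knight_rhs(x, y)) < 1e-3

# Uniform errors on (-tau, 1 - tau) satisfy P(eps < 0) = tau exactly,
# so D_i has mean zero; the remainders R_i are nonnegative by (13).
eps = rng.uniform(-tau, 1.0 - tau, size=100_000)
t = rng.standard_normal(100_000)       # plays the role of c b_n X_i^T u
D = (1.0 - tau) * (eps < 0) - tau * (eps >= 0)
R = rho(eps - t) - rho(eps) - t * D
assert abs(D.mean()) < 0.01
assert (R >= -1e-12).all()
```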
Then, for any constant c > 0, we can write the difference G_n(β + c b_n u) − G_n(β) in the form
\[
G_n\big(\beta + c\, b_n u\big) - G_n(\beta) = \mathbb{E}\big[G_n\big(\beta + c\, b_n u\big) - G_n(\beta)\big] + W_n^\top u + \sum_{i=1}^n \big(R_i - \mathbb{E}[R_i]\big), \tag{15}
\]
with the r_n-dimensional random vector W_n ≡ c b_n ∑_{i=1}^n D_i X_i, the random variables D_i ≡ (1 − τ)1_{ε_i<0} − τ 1_{ε_i≥0}, and R_i ≡ ρ_τ(ε_i − c b_n X_i^⊤ u) − ρ_τ(ε_i) − c b_n D_i X_i^⊤ u.

Using Hölder's inequality, we have that |X_i^⊤ u| ≤ ‖X_i‖_∞ ‖u‖_1. Then, for all u ∈ R^{r_n} such that ‖u‖_1 = 1, Assumption (A1) yields |X_i^⊤ u| ≤ C.

Firstly, we study the first term on the right-hand side of relation (15). Using the identity in (13), we obtain
\[
G_n\big(\beta + c\, b_n u\big) - G_n(\beta) = c\, b_n \sum_{i=1}^n X_i^\top u\, D_i + \sum_{i=1}^n \int_0^{c\, b_n X_i^\top u} \big(1_{\{\varepsilon_i \le v\}} - 1_{\{\varepsilon_i \le 0\}}\big)\, dv.
\]

References

Campbell, F. and Allen, G. (2017). Within group variable selection through the Exclusive Lasso. Electronic Journal of Statistics, 11(2), 4220–4257.

Ciuperca, G. (2017). Adaptive fused LASSO in grouped quantile regression. Journal of Statistical Theory and Practice, 11(1), 107–125.

Ciuperca, G. (2019). Adaptive group LASSO selection in quantile models. Statistical Papers, 60(1), 173–197.

Ciuperca, G. and Maciak, M. (2019). Change-point detection in a linear model by adaptive fused quantile method. arXiv:1901.09607.

De Vito, S., Piga, M., Martinotto, L., and Di Francia, G. (2009). CO, NO2 and NOx urban pollution monitoring with on-field calibrated electronic nose by automatic Bayesian regularization. Sensors and Actuators B: Chemical, 143(1), 182–191.

Guo, X., Zhang, H., Wang, Y., and Wu, J.L. (2015). Model selection and estimation in high dimensional regression models with group SCAD. Statistics & Probability Letters, 103(1), 86–92.
He, Q., Kong, L., Wang, Y., Wang, S., Chan, T.A., and Holland, E. (2016). Regularized quantile regression under heterogeneous sparsity with application to quantitative genetic traits. Computational Statistics and Data Analysis, 95(1), 222–239.

Jang, W., Lim, J., Lazar, N.A., Loh, J.M., and Yu, D. (2015). Some properties of generalized fused lasso and its applications to high dimensional data. Journal of the Korean Statistical Society, 44(3), 352–365.

Jiang, L., Wang, H.J., and Bondell, H.D. (2013). Interquantile shrinkage in regression models. Journal of Computational and Graphical Statistics, 22(1), 970–986.

Jiang, L., Bondell, H.D., and Wang, H.J. (2014). Interquantile shrinkage and variable selection in quantile regression. Computational Statistics and Data Analysis, 69(1), 208–219.

Leonardi, F. and Bühlmann, P. (2016). Computationally efficient change point detection for high-dimensional regression. arXiv:1601.03704.

Li, X., Mo, L., Yuan, X., and Zhang, J. (2014). Linearized alternating direction method of multipliers for sparse group and fused LASSO models. Computational Statistics and Data Analysis, 79(1), 203–221.

Liu, Y., Tao, J., Zhang, H., Xiu, X., and Kong, L. (2018). Fused LASSO penalized least absolute deviation estimator for high dimensional linear regression. Numerical Algebra, Control and Optimization, 8(1), 97–117.

Qian, J. and Su, L. (2016). Shrinkage estimation of regression models with multiple structural changes. Econometric Theory, 32(6), 1376–1433.

Wang, M. and Tian, G.L. (2019). Variable selection in quantile regression. Statistical Papers, in press, http://dx.doi.org/10.1007/s00362-017-0882-z.

Wei, F. and Huang, J. (2010). Consistent group selection in high-dimensional linear model. Bernoulli, 16(4), 1369–1384.

Wu, Y. and Liu, Y. (2009). Variable selection in quantile regression. Statistica Sinica, 19(1), 801–817.

Zhang, B. and Geng, J. (2015). Multiple change-points estimation in linear regression models via sparse group lasso.
IEEE Transactions on Signal Processing, 63(9), 2209–2224.

Zhang, C. and Xiang, Y. (2016). On the oracle property of adaptive group lasso in high-dimensional linear models. Statistical Papers, 57(1), 249–265.

Zhou, J., Liu, J., Narayan, V.A., and Ye, J. (2012). Modeling disease progression via fused sparse group lasso. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12).