Segmentation of multiple series using a Lasso strategy
Karine Bertin, Xavier Collilieux, Emilie Lebarbier, Cristian Meza
aa r X i v : . [ s t a t . M E ] J un Segmentation of multiple series using a Lassostrategy
Karine Bertin ∗ , Xavier Collilieux † , Emilie Lebarbier ‡ and Cristian Meza § Abstract
We propose a new semi-parametric approach to the joint segmen-tation of multiple series corrupted by a functional part. This problemappears in particular in geodesy where GPS permanent station coordi-nate series are affected by undocumented artificial abrupt changes andadditionally show prominent periodic variations. Detecting and esti-mating them are crucial, since those series are used to determine av-eraged reference coordinates in geosciences and to infer small tectonicmotions induced by climate change. We propose an iterative procedurebased on Dynamic Programming for the segmentation part and Lassoestimators for the functional part. Our Lasso procedure, based on thedictionary approach, allows us to both estimate smooth functions andfunctions with local irregularity, which permits more flexibility thanprevious proposed methods. This yields to a better estimation of thebias part and improvements in the segmentation. The performance ofour method is assessed using simulated and real data. In particular,we apply our method to data from four GPS stations in Yarragadee,Australia. Our estimation procedure results to be a reliable tool toassess series in terms of change detection and periodic variations esti-mation giving an interpretable estimation of the functional part of themodel in terms of known functions.
The objective of segmentation methods is to detect abrupt changes (calledbreakpoints) in a signal. Such segmentation problems arise in many areas:in biology for the detection of chromosomal aberrations (Picard et al., 2005;Lai et al., 2005), in meteorology and climate (Caussinus and Mestre, 2004)to homogenize temperature and precipitation series or in geodesy for thedetection of changes in GPS location series (Williams, 2003). In the latter ∗ CIMFAV-Facultad de Ingeniería, Universidad de Valparaíso, Valparaíso, Chile † IGN LAREG, Université Paris Diderot Sorbonne Paris Cité, Paris, France ; Observa-toire de Paris, SYRTE, CNRS, UPMC, Paris, France ‡ AgroParisTech UMR518, Paris 5e and INRA UMR518, Paris 5e, FRANCE § CIMFAV-Facultad de Ingeniería, Universidad de Valparaíso, Valparaíso, Chile
We observe M series. We note y m ( t ) the observed signal of the series m attime t and we suppose that it satisfies for m ∈ { , . . . , M } y m ( t ) = µ m ( t ) + f ( x m ( t )) + e m ( t ) , (1)where µ m ( t ) = µ mk if t ∈ I mk = ( τ mk − , τ mk ] , x m represents possible covariates(the simple one is the time t ), f is an unknown function to be estimated, τ mk is the k th breakpoint of the series m , µ mk is the mean of the series m onthe segment I mk and the e m ( t ) are i.i.d centered Gaussian with variance σ .We note K m the number of segments of the m th series and K = P Mm =1 K m the total number of segments. Note that the segmentation is specific to eachseries.For m ∈ { , . . . , M } , the series m has n m observations in the times t mi , i ∈ { , . . . , n m } , so the total number of observations is N = P Mm =1 n m and4he model is y mi = µ mi + f ( x mi ) + e mi , (2) ∀ i ∈ { , . . . , n m } , m ∈ { , . . . , M } , where y mi = y m ( t mi ) , x mi = x m ( t mi ) , e mi = e m ( t mi ) and µ mi = µ m ( t mi ) .We define the vectors y m := ( y mi ) i , x m := ( x mi ) i , e m := ( e mi ) i and µ m :=( µ mk ) k .The parameters of the model are the means µ mk , the breakpoints τ mk , thefunction f , the variance σ and the number of segments K . As usual in the segmentation estimation framework, the parameters are es-timated for a fixed number of segments K for which we propose here aDP-Lasso procedure, then K is choosen using a model selection strategy. Following Bai and Perron (2003), we propose an iterative procedure thatalternates the segmentation part with the estimation of f . The function f corresponds to a bias part common to each series and our objective isto estimate it non-parametrically using a Lasso-type method based on adictionary approach. More specifically, we consider a collection of functions φ = { φ , . . . , φ J } and we propose to estimate f by a linear combination ofthe functions φ j , f λ = J X j =1 λ j φ j , λ ∈ R J . In order to write our estimation algorithm in a matricial form, we con-catenate the means vectors µ m in a vector µ of size K × . We denoteby T ([ N × K ]) the incidence matrix of breakpoints T = Bloc [ T m ] with T m = Bloc (cid:2) n mk (cid:3) of size ([ n m × K m ]), and with n mk = τ mk − τ mk − the lengthof k − th segment for series m . T µ corresponds to the segmentation part.Moreover, we concatenate the vectors y m and x m in the ([ N × ]) vectors Y and X . We denote by F the [ N × J ] matrix F = ( f i,j ) where f i,j = φ j ( X i ) .We denote by T µ ( h ) the segmentation estimated parameters, ( σ ( h ) ) theestimated variance, λ ( h ) the estimated coefficients of the function f , and f ( h ) = f λ ( h ) the estimated function f at iteration ( h ) . At iteration ( h + 1) ,we get: • given λ ( h ) , the segmentation parameters T µ ( h +1) are estimated by: T µ ( h +1) = argmin T µ k Y − T µ − F λ ( h ) k , k · k stands for the L norm in R N . The problem is then reducedto segment Y − F λ ( h ) into K segments. In the case of joint segmen-tation, Picard et al. (2011) proposed a double-stage of DP which usedthe multiple structure and allows us to obtained the best segmentationof all the series into K segments in a more reasonable computationaltime compared to the classical DP. • given T µ ( h +1) and σ ( h ) , the function f is estimated using a Lasso-typestrategy: f ( h +1) = f λ ( h +1) where λ ( h +1) minimizes k Y − T µ ( h +1) − F λ k + 2 J X j =1 r N,j | λ j | , where following Arribas-Gil et al. (2014), r N,j = σ ( h ) k φ j k N √ γ log J with γ > and k φ j k N = qP Nl =1 φ j ( X l ) . • given T µ ( h +1) and f ( h +1) , the variance σ is estimated by ( σ ( h +1) ) = 1 N k Y − T µ ( h +1) − F λ ( h +1) k . The algorithm stops when the difference between parameters of two succes-sive iterations is smaller than ǫ ( − in practice).The final estimators are denoted ˆ τ mk , ˆ µ mk , d T µ , ˆ σ , ˆ λ and ˆ f = f ˆ λ . Remark 1.
From a theoretical point of view, the condition γ > ensuresthat the resulting estimator of f has good properties (oracle performance, seeArribas-Gil et al., 2014). However, in cases in which the Lasso estimationis performed within an iterative procedure involving the estimation of otherparameters than f , the value of γ may also influence the stability of the wholeiterative procedure. Then, γ should be chosen as close as possible to whileallowing for the stability of the iterative algorithm. The last issue is the choice of the number of segments K . We propose hereto use the modified BIC criterion proposed by Zhang and Siegmund (2007)6nd successfully adapted to the joint segmentation by Picard et al. (2011): mBIC JointSeg ( K ) = log (cid:20) Γ (cid:18) N − K + 12 (cid:19)(cid:21) − (cid:18) N − K + 12 (cid:19) log SS wg + (cid:20) − ( K − M ) (cid:21) log ( N ) − M X m =1 k m X k =1 log (ˆ τ mk − ˆ τ mk − ) , where SS wg = k Y − d T µ − F ˆ λ k . In order to assess the performance of our procedure, so-called here
Lasso ,in Section 4.1, we conduct the simulation study described below. We alsopropose to compare our method to the work of Picard et al. (2011), whereeither the function f is estimated using splines or f is viewed as a fixed effectdepending on the time t , i.e. f ( t ) = β t . We call these two approaches Spline and
Position respectively and we perform them on the simulated data usingthe cghseg
R package, in particular using the multiseg
R function. For ourprocedure, we develop our own functions in R using the lars
R package toperform the Lasso estimation of f . In addition in Section 4.2, we illustrateon a climatic data set the need to model correctly the function f in order toavoid false detection in the segmentation. Simulation design.
We consider the model (1) for series m ∈ { , . . . , M } at time t : y m ( t ) = µ m ( t ) + f ( t ) + e m ( t ) , t = 1 , . . . , n (3)where e m ( t ) ∼ N (0 , σ ) i.i.d. The length n of the series is fixed and equalto . We consider two different numbers of series: M ∈ { , } , andfive values for error variance: σ ∈ { . , . , . , . , . } . For each series,the number of segments K follows a Poisson distribution with mean ¯ K =3 and their positions are uniformly distributed. The mean value withineach segment alternates between and a value in {− , − , +1 , +2 } withprobability { . , . , . , . } respectively. The function f is generated as amixture of a sine function with three peaks (see Figure 1): f ( t ) = 0 . × sin (cid:18) π t (cid:19) + 0 . t =0 . × n (4) − t =0 . × n + 2 t =0 . × n . f .Each configuration, i.e. specific values of M and σ , is simulated 100times.For the Lasso strategy, we use a dictionary with functions: Haarfunctions ( t / [0 , (cid:16) t − k (cid:17) , k = 0 , . . . , − , the Fourier functions ( t sin (cid:16) πjt (cid:17) , t cos (cid:16) πjt (cid:17) , j = 1 , . . . , and the functions t t and t t . The Lasso estimator is obtained by LARS algorithm with γ = 2 . . Quality criteria.
To study the quality of the estimation, for each config-uration, we consider several criteria: • For the segmentation parameters, in order to study the global qualityof the estimation, we consider the root-mean-square distance betweenthe true mean and its estimate:RMSE ( µ ) = h N P Mm =1 P nt =1 { µ m ( t ) − ˆ µ m ( t ) } i / where N = M × n .Moreover, to study the performance of the estimation of the breakpointpositioning, we consider both the proportion of erroneously detectedbreakpoints among detected breakpoints (false discovery rate, FDR)and the proportion of undetected true breakpoints among true break-points (false negative rate, FNR).8 For the function f , the root-mean-square distance between f and itsestimate:RMSE ( f ) = (cid:20) n P nt =1 n f ( t ) − ˆ f ( t ) o (cid:21) / is also considered.For each configuration, we consider the average of these criteria over the simulations. Comparison between
Lasso , Spline and
Position . Only the resultswith M = 10 are presented since the results for M = 50 leads to same con-clusions.Figure 2 presents the RMSE ( f ) for the different methods with respectto σ . We observe that the larger is the noise, the worst is the estimationof f due to the confusion between the signal and the noise. Whatever thelevel of noise, Lasso outperforms
Position and
Spline in terms of the non-parametric part estimation. However, the behavior of
Position and
Spline is opposite with respect to σ . For small σ , Spline leads to bad performancescompared to
Lasso and
Position . Indeed, as expected,
Spline tends to cap-ture the smooth part of the signal, i.e. the sinusoidal trend only, whereasthe two others catch both the peaks and the trend. However, for large σ , itis more difficult to detect the peaks of the true function, resulting in closestresults for Lasso and
Spline . Position behaves worstly since, as mentioned inPicard et al. (2011), it tends to catch the trend but also the noise resulting inan erractic estimation of f . The bad estimation of f can have consequenceson the segmentation estimation. Figure 3 summarizes the results for thesegmentation estimation obtained with the different methods with respectto σ . In general, Lasso is sligthly better than the two other methods. For σ larger than . , the results are similar, even for Position for which f is notwell estimated. However for small values of σ , since Spline does not detectthe peaks, they are considered as breakpoints in the segmentation, leadingto bad results: more segments are then detected (see ˆ K − K ), these falsebreakpoints then increase the FDR and so the RMSE ( µ ) .As a conclusion, Position and
Lasso behave similarly for the estimationof the segmentation part. The main difference concerns the estimation of f which is less reliable for Position . An important advantage of
Lasso is itsflexibility in the sense that functions of different regularities can be includedin the dictionary and in particular some functions chosen according to theknowlegde of the expert. The final form of the estimator ˆ f is a sparse linearcombination of the dictionary functions that allows a possible interpretationof f compared to Position (see Section 5).9 iscussion on the quality of the estimation with
Lasso . We firstcompare the results obtained with the true and estimated number of seg-ments. In Figure 2, we observe that the more difficult is the detection (more σ increases), more the number of segments is under-estimated. This resultwas expected and is now classical in the study of model selection for segmen-tation. Indeed, the number of segments is reduced in order to avoid falsedetection. This is illustrated by a less increase of the FDR obtained with theestimated number of segments compared to the true one (Figure 3). Thatleads to a better estimation in terms of segmentation (small RMSE ( µ ) ) andconsequently to a better estimation of the function f (small RMSE ( f ) ). Segmentation and the estimation of f as a function of the numberof series. Table 1 summarizes the relative differences for two criteria, theFDR and the root-mean-square of f between M = 10 and M = 50 , forseveral values of σ as: F DR σ = F DR σ − F DR σ F DR σ ,RM SE ( f ) σ = RM SE ( f ) σ − RM SE ( f ) σ RM SE ( f ) σ , where, for example, F DR σ and RM SE ( f ) σ denote respectively the FDRand the root-mean-square of f for M = 10 series for a specific value of σ .Table 2 shows the percentage of the true functions of the simulated function f selected in the estimator ˆ f against different values of σ , with M = 10 and M = 50 series. The ID function corresponds to the position of thetrue functions in the dictionary with size . Specifically, the first threefunctions (labels , and ) are Haar functions centered in , and and the function is the function x sin (cid:0) π t (cid:1) . In addition, aFDR criterion is calculated, corresponding to the number of false selectedfunctions among the selected ones. As expected, the increase of the numberof series improves the estimation of f (large RM SE ( f ) σ ). For small valuesof σ , the Lasso procedure leads to a good performance in terms of selectedfunctions whatever the number of series: the number of selected functions isclose to the true one, and among them all the true functions of the simulatedfunction are retrieved with less false selection (small FDR function). Thatleads logically to an accurate estimation of f (small RM SE ( f ) Figure 2).For noisy configurations (large σ ), fewer functions are selected, which wasexpected. Indeed, in this case, there are more confusion between noise andsignal, the small peaks (in particular ID 13 and 64) are more difficult todetect. This is particularly true for a small number of series. Remark thatfor M = 50 , the ID 77 and 137 are always selected.Moreover, the better accuracy of the estimation of f observed for M = 50 M = 10 and M = 50 series for F DR and
RM SE ( f ) criteria.Relative differences σ F DR σ RM SE ( f ) σ ID function FDR Mean σ
13 64 77 137 function length0.1 100 100 100 100 0.052 4.270.2 100 100 100 100 0.055 4.29M=10 0.5 26 99 100 100 0.064 3.531.0 5 28 99 99 0.114 2.131.5 0 12 73 76 0.137 1.90.1 100 100 100 100 0.059 4.310.2 100 100 100 100 0.059 4.31M=50 0.5 100 100 100 100 0.068 4.361.0 53 100 100 100 0.084 3.951.5 18 92 100 100 0.108 3.6leads to a better positioning of the breakpoints (see
F DR σ ). This is lessmarked when σ is large. In this section, we want to illustrate the need to model correctly the func-tion f in order to avoid false detection in the segmentation. To this end,we compare our procedure to the results obtained by Picard et al. (2011) intheir study on harvest dates. In this application, the purpose is to detectchanges in the agricultural practices by detecting changes in the grape har-vest dates which are not due to the climatic effect. The data are harvestdates obtained at 10 French stations. The model they proposed is a mixedlinear model containing a segmentation part, a random effect and a climaticeffect modelled by a degree 2 polynomial according to the temperature. Tocompare with our proposed strategy, we avoid the random effect. The model11igure 2: RMSE of f with respect to σ for Lasso △ , Position + , Spline × and Lasso with the true number of segments ◦ for M = 10 .is then written as follows: y m ( t ) = µ mk + f ( x m ( t )) + e m ( t ) , if t ∈ I mk , where y m ( t ) is the grape harvest date and x m ( t ) is the mean temperature ofthe year t for series m . In case (1) , the form of the climatic effect f is fixedto be f ( x m ( t )) = bx mt + cx mt . In case (2) , no assumptions are made on thefunction f and it is estimated using our proposed procedure, for which weconsider a dictionary with 36 functions compound with high resolution levelHaar wavelets, Fourier basis, x , x and x . In the resulting estimator of f obtained with γ = 2 . , five functions are selected. Figure 4 represents thenumber of detected breakpoints per year over all the series for the two mod-els. The result obtained in case (1) is slightly different from the one obtainedin Picard et al. (2011). However, the most important difference compared tothe result obtained by our proposed procedure concerns the year 2003 whichcorresponds to a very hot summer: that year is considered as a breakpoint incase (1) and not in case (2) . This breakpoint appears in the series 6. Figure 5represents respectively the harvest dates of the series 6 and its segmentationafter correction in case (1) (segmentation of y t − ˆ bx t − ˆ cx t ). The temperatureat year 2003 is . . As shown in Figure 6, the correction of the harvestdate at this year by ˆ bx mt + ˆ cx mt is too strong compared to ˆ f ( x mt ) obtainedin case (2) that is why a false breakpoint is added (see Figure 5 bottom).12igure 3: Results for M = 10 with respect to σ . Top: RMSE of µ on theleft, ˆ K − K on the right. Bottom: FDR on the left and FNR on the right. Lasso △ , Position + , Spline × and Lasso with the true number of segments ◦ . 13igure 4: Number of the detected breakpoints over all the stations obtainedin case (1) on the top and case (2) on the bottom.14igure 5: Top: harvest dates of the series 6. Bottom: obtained segmentationof the corrected series in case (1) (on y mt − ˆ bx mt − ˆ cx mt ).15igure 6: Fit of f in case (1) on the top and case (2) on the bottom.16 Application
In this Section, we summarize the results obtained with our estimation pro-cedure for the GPS dataset described in Introduction. In particular, weuse the height coordinate series of four GPS stations in Australia located inYarragadee (YAR1, YAR2, YAR3 and YARR). Those were computed by theJet Propulsion Laboratory (JPL). They can be downloaded at ftp://sideshow.jpl.nasa.gov/pub/JPL_GPS_Timeseries/repro2011b/post/point/ . Weuse the series from their first observations to the 22nd of June 2013 - seriesprovided online are updated everyday. Then the model (1) is considered with M = 4 and n = 2862 , n = 5209 , n = 1443 and n = 2197 , the respectivelengths of the series. Here they have been averaged at weekly scale. For allthese series, the ground motion is assumed to be identically observed andis described with function f ( t ) . Thus, equipment changes or malfunction atindividual station should show up in the segmentation. For those series, JPLdetected changes using a procedure based on sequential F-test applied to thetridimensional coordinate series (M. Heflin, personal communication, 2014).We apply our proposed procedure to these series with a dictionary with functions, which are only Fourier functions: t sin (2 πw i t ) , t cos (2 πw i t ) where w i = i/T , T = max ( t ) − min ( t ) and T /i is larger than 8 weeks sincesmaller period amplitudes are generally negligible (see in Ray et al., 2008).Figure 7 shows the results for the four series: the obtained breakpoints insolid vertical lines, the known equiment changes in dashed vertical line andthe estimated function f in solid line.A total of 50 periods (62 bases) has been selected, among them the ones closeto the well-known frequencies mentioned above (annual and semi-annual)and submultiples of the draconitic periods. 12 long periods - larger than1 year - reflect well-known GPS low-frequency noise as already noticed byAmiri-Simkooei et al., 2007.Heigth breakpoints are detected. Four (GPS week and of theseries YAR2 and and of the series YAR3) correspond exactly toreceiver and antenna changes. The changes at time of the series YAR2is likely to be related to the equipment change at time . In the sameseries, a change at time is detected. This change is not known fromdatabases, however, it is also proposed by JPL. Compared to the JPL of-ficial list of changes, we found three additional changes for YAR2 at GPSweek and the two validated changes at and . Our two otheradditional changes at time of the series YAR1 and at time of theseries YARR are not reported by JPL. Up to now, no explanation has beensupplied for those.As a conclusion, our method found the same known breakpoints as JPL of-ficial list, but includes new validated one. Moreover the bases functionselected in the Lasso procedure furnish relevant geodetic information.17igure 7: Results for height coordinate series of four GPS stations (YAR1,YAR2, YAR3 and YARR): obtained breakpoints in solid vertical lines; knownequiment changes in dashed vertical line; estimated function f in solid line.18 Conclusion
The proposed semi-parametric approach for the segmentation of single ormultiple series has been shown to provide a valuable and reliable tool toassess changes and functional variations in series, as illustrated with ourGPS height series. The search for functions that model ground motionsand periodic errors here was crucial to provide the right segmentation ofthe series and reliable estimates of the breakpoint amplitudes. Conversely,because the segmentation is simultaneous and the number and location ofthe breakpoints unknown, estimated functions are also more reliable. Theycan be used to better interpret ground deformation observations or to en-hance the piece-wise linear coordinate model of the Terrestrial ReferenceFrame (Petit and Luzum, 2010; Altamimi and Dermanis, 2012), widely usedfor geosciences and mapping applications. This would provide a significantimprovement for the users since such coordinates are aimed to be extrapo-lated in the future (up to 5 years). Because the method is totally flexibleand allows for a large number of functions to be included in the dictionary,it could also be applied to GPS series from active tectonic areas where theground motion signal is more complex and should be modeled with additionalfunctions.
Acknowledgements
Karine Bertin is supported by the grant ANILLO ACT–1112, CONICYT-PIA, Chile and FONDECYT project 1141258. Emilie Lebarbier is supportedby the grant CONICYT 870100003 atracción de capital humano avanzadodel extranjero. Cristian Meza is supported by the grant ANILLO ACT–1112,CONICYT-PIA, Chile and FONDECYT project 1141256.
References
Altamimi, Z., X. Collilieux, J. Legrand, B. Garayt, and C. Boucher (2007).Itrf2005: A new release of the international terrestrial reference framebased on time series of station positions and earth orientation parameters.
Journal of Geophysical Research 112 (B09401).Altamimi, Z. and A. Dermanis (2012). The choice of reference system in itrfformulation. In N. Sneeuw, P. NovÃąk, M. Crespi, and F. SansÚ (Eds.),
VII Hotine-Marussi Symposium on Mathematical Geodesy , Volume 137of
International Association of Geodesy Symposia , pp. 329–334. SpringerBerlin Heidelberg.Amiri-Simkooei, A. R., C. C. J. M. Tiberius, and P. J. G. Teunissen (2007).19ssessment of noise in GPS coordinate time series: Methodology and re-sults.
Journal of Geophysical Research (Solid Earth) 112 (B7).Arribas-Gil, A., B. K., M. C., and R. V. (2014). Lasso-type estimators forsemiparametric nonlinear mixed-effects models estimation.
Statistics andComputing 24 (3), 443–460.Bai, J. and P. Perron (2003). Computation and analysis of multiple structuralchange models.
J. Appl. Econ. 18 , 1–22.Caussinus, H. and O. Mestre (2004). Detection and correction of artificialshifts in climate series.
Applied Statistics 53 , 405–425.Dong, D., P. P. Fang, Y. Bock, M. K. Cheng, and S. Miyazaki (2002).Anatomy of apparent seasonal variations from gps-derived site positiontime series.
Journal of Geophysical Research (Solid Earth) 107 (B4), ETG9–1.Gazeaux, J., S. Williams, M. King, M. Bos, R. Dach, M. Deo, A. W. Moore,L. Ostini, E. Petrie, M. Roggero, F. N. Teferle, G. Olivares, and F. H.Webb (2013). Detecting offsets in gps time series: First results from thedetection of offsets in gps experiment.
Journal of Geophysical Research:Solid Earth 118 (5), 2397–2407.King, M., Z. Altamimi, J. Boehm, M. Bos, R. Dach, P. Elosegui, F. Fund,M. Hernandez-Pajares, D. Lavallée, P. Cerveira, R. Riva, P. Steigenberger,T. van Dam, L. Vittuari, S. Williams, and P. Willis (2010). Improved con-straints on models of glacial isostatic adjustment.
A review of the contribu-tion of ground-based geodetic observations, Surveys in Geophysics 31 (5).Lai, W., M. Johnson, R. Kucherlapati, and P. J. Park (2005). Comparativeanalysis of algorithms for identifying amplifications and deletions in arrayCGH data.
Bioinformatics 21 (19), 3763–3770.Petit, G. and B. Luzum (2010). Iers conventions (2010). (iers technical note; 36). Technical report, Frankfurt am Main: Verlag des Bundesamts fürKartographie und Geodäsie. inprint.Picard, F., E. Lebarbier, E. Budinska, and Robin (2011). Joint segmentationof multivariate gaussian processes using mixed linear models.
Comp. Stat. & Data Analysis 55 , 1160–1170.Picard, F., E. Lebarbier, M. Hoebeke, G. Rigaill, B. Thiam, and S. Robin(2011). Joint segmentation, calling and normalization of multiple cghprofiles.
Biostatistics 12 (3), 413–428.Picard, F., S. Robin, M. Lavielle, C. Vaisse, and J.-J. Daudin (2005). Astatistical approach for CGH microarray data analysis.
BMC Bioinfor-matics 6 , 27. 20ay, J., Z. Altamimi, X. Collilieux, and T. van Dam (2008). Anomalousharmonics in the spectra of GPS position estimates.
GPS Solutions 12 (1).Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso.
Journal of the Royal Statistical Society, Series B 58 , 267–288.van Dam, T., X. Collilieux, J. Wuite, Z. Altamimi, and J. Ray (2012). Non-tidal ocean loading: amplitudes and potential effects in gps height timeseries.
Journal of Geodesy 86 (11), 1043–1057.Wahr, J., S. A. Khan, T. van Dam, L. Liu, J. H. van Angelen, M. R. van denBroeke, and C. M. Meertens (2013). The use of gps horizontals for loadingstudies, with applications to northern california and southeast greenland.
Journal of Geophysical Research: Solid Earth 118 (4), 1795–1806.Williams, S. (2003). Offsets in global positioning system time series.
Journalof Geophysical Research (Solid Earth) 108 (19), 2310–+.Williams, S., Y. Bock, P. Fang, P. Jamason, R. Nikolaidis, L. Prawirodirdjo,M. M., and D. Johnson (2004). Error analysis of continuous GPS positiontime series.
Journal of Geophysical Research 109 (B18), B03412.Wu, X., X. Collilieux, Z. Altamimi, B. Vermeersen, R. Gross, and I. Fuku-mori (2012). Accuracy of the international terrestrial reference frame originand earth expansion.
Geophysical Research Letters 38 (L13304).Wu, X., M. B. Heflin, H. Schotman, B. L. A. Vermeersen, D. Dong, R. S.Gross, E. R. Ivins, A. W. Moore, and S. E. Owen (2011). Simultane-ous estimation of global present-day water transport and glacial isostaticadjustment.
Nature Geoscience 3 (9), 642–646.Zhang, N. R. and D. O. Siegmund (2007). A modified Bayes informationcriterion with applications to the analysis of comparative genomic hy-bridization data.