Introduction to Geodetic Time Series Analysis
Machiel S. Bos, Jean-Philippe Montillet, Simon D.P. Williams, Rui M.S. Fernandes
Abstract
The previous chapter gave various examples of geophysical time series and the various trajectory models that can be fitted to them. In this chapter we will focus on how the parameters of the trajectory model can be estimated. It is meant to give researchers new to this topic an easy introduction to the theory, with references to key books and articles where more details can be found. In addition, we hope that it refreshes some of the details for the more experienced readers. We pay special attention to the modelling of the noise, which has received much attention in the literature in recent years, and highlight some of the numerical aspects. The subsequent chapters will go deeper into the theory, explore different aspects and describe the state of the art of this area of research.
M.S. Bos: Instituto Dom Luiz, Universidade da Beira Interior, Portugal, e-mail: [email protected]
J.-P. Montillet: Space and Earth Geodetic Analysis Laboratory, Universidade da Beira Interior, Portugal; Institute of Earth Surface Dynamics, University of Lausanne, Lausanne, Switzerland, e-mail: [email protected]
S.D.P. Williams: National Oceanography Centre, Liverpool, United Kingdom, e-mail: [email protected]
R.M.S. Fernandes: Instituto Dom Luiz, Universidade da Beira Interior, Portugal, e-mail: [email protected]

Geodetic time series consist of a set of observations at various epochs. These observations, stored in a vector y, are not perfect but contain noise which can be described as a set of multivariate random variables. Let us define this noise as the vector w = [W_1, W_2, W_3, ..., W_N], where each W_i is a random variable. If f(w) is the associated probability density function, then the first moment µ_1, the mean of the noise, is defined as (Casella and Berger, 2001):

    µ_1 = E[W] = ∫_{-∞}^{∞} w f(w) dw    (1)

where E is the expectation operator. It assigns to each possible value of the random variable w a weight f(w) over an infinitely small interval dw and sums each of them to obtain the mean expected value E[W]. The second moment µ_2 is defined in a similar manner:

    µ_2 = E[W^2] = ∫_{-∞}^{∞} w^2 f(w) dw = ∫_{-∞}^{∞} w^2 dF(w)    (2)

The last term F is the cumulative distribution. The second moment is better known as the variance. Since we have N random variables, we can compute the covariances E[W_i W_j], where both i and j range from 1 to N. The result is called the covariance matrix. In this book, the probability density function f(w) is assumed to be a Gaussian:

    f(w | µ, σ) = 1/(√(2π) σ) exp[-(w − µ)^2 / (2σ^2)]    (3)

where σ is the standard deviation, the square-root of the variance of the random variable w. This function is very well known and is shown in Figure 1 for zero µ. The standard error is defined as the 1-σ interval and contains on average 68% of the observed values of w. The reason why it is so often encountered in observations is that the central limit theorem states that the sum of various continuous probability distributions always tends to the Gaussian one. An additional property of the Gaussian probability density function is that all its moments higher than two (µ_3, µ_4, ...) can be expressed in terms of the first two. Therefore, the mean and the covariance matrix provide a complete description of the stochastic properties. Actually, we will always assume that the mean of the noise is zero and therefore only need the covariance matrix.
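As a quick numerical illustration of Eqs. (1)-(3), one can draw samples from a Gaussian distribution with NumPy and check the first two moments and the 68% rule; the sample size and random seed below are arbitrary choices for this sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.0, 1.0
w = rng.normal(mu, sigma, size=100_000)

# First and second (central) moments, cf. Eqs. (1) and (2)
m1 = w.mean()
m2 = ((w - m1) ** 2).mean()

# Fraction of samples inside the 1-sigma interval, should be near 68%
frac_1sigma = np.mean(np.abs(w - mu) < sigma)

print(m1, m2, frac_1sigma)
```

With this many samples, the sample moments agree with µ = 0 and σ² = 1 to a few decimals.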
The term in front of the exponential is needed to ensure that the integral of f(w) from −∞ to ∞ produces 1. That is, the total probability of observing a value between these limits is 1, as it should be. We have not one, but several observations with noise in our time series. The probability density function of the multivariate noise is:

    f(w | C) = 1/√((2π)^N det(C)) exp[-½ w^T C^{-1} w]    (4)

Fig. 1: The Gaussian probability density function, together with the 1, 2 and 3σ intervals (containing 68%, 95% and 99.7% of the values).

We assumed that the covariance matrix C is known. The expression f(w | C) should be read as the probability density function f for variable w, for given and fixed covariance matrix C. Next, we assume that our observations can be described by our model g(x, t), where x are the parameters of the model and t the time. The observations are the sum of our model plus the noise:

    y = g(x, t) + w    or    w = y − g(x, t)    (5)

The noise w is described by our Gaussian probability density function with zero mean and covariance matrix C. The probability that we obtained the actual values of our observations is:

    f(y | x, C) = 1/√((2π)^N det(C)) exp[-½ (y − g(x, t))^T C^{-1} (y − g(x, t))]    (6)

However, we don't know the true values of x or the covariance matrix C. We only know the observations. Consequently, we need to rephrase our problem as follows: what values of x and C would produce the largest probability that we observe y? Thus, we are maximising f(x, C | y), which we call the likelihood function L. Furthermore, we normally work with the logarithm of it, which is called the log-likelihood:

    ln(L) = −½ [N ln(2π) + ln(det(C)) + (y − g(x, t))^T C^{-1} (y − g(x, t))]    (7)
We need to find the value of x that maximises this function and the method is therefore called Maximum Likelihood Estimation (MLE). The change from f(y | x, C) to f(x, C | y) is subtle. Assume that the covariance matrix C also depends on parameters that we store in vector x. In this way, we can simplify the expression f(y | x, C) to f(y | x). Bayes' Theorem, expressed in terms of probability distributions, gives us:

    f(x | y) = f(y | x) f(x) / f(y)    (8)

where f(y) and f(x) are our prior probability density functions for the observations y and parameters x, respectively. These represent our knowledge about what observations and parameter values we expect before the measurements were made. If we don't prefer any particular values, these prior probability density functions can be constants and they will have no influence on the maximising of the likelihood function f(x | y) = L.

Another subtlety is that we changed from random noise and fixed parameter values of the trajectory model, f(y | x), to fixed noise and random parameters of the trajectory model, f(x | y). If the trajectory model is for example a linear tectonic motion, then this is a deterministic, fixed velocity, not a random one. However, one should interpret f(x | y) as our degree of trust, our confidence that the estimated parameters x are correct. See also Koch (1990, 2007) and Jaynes (2003). The last one is particularly recommended to learn more about Bayesian statistics.

So far we simply defined our trajectory model as g(x, t). An important class of models that are fitted to the observations are linear models. These are defined as:

    g(x, t) = x_1 g_1(t) + x_2 g_2(t) + ... + x_M g_M(t)    (9)

where x_1 to x_M are assumed to be constants. We can rewrite this in matrix form as follows:

              | g_1(t_1)  g_2(t_1)  ...  g_M(t_1) |   | x_1 |
    g(x, t) = | g_1(t_2)  g_2(t_2)  ...  g_M(t_2) | · | ... | = A x    (10)
              |   ...                        ...  |   | x_M |
              | g_1(t_N)  g_2(t_N)  ...  g_M(t_N) |

Matrix A is called the design matrix.
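To make Eq. (10) concrete, the following sketch builds a design matrix A for an intercept, a linear trend and an annual periodic signal; the daily sampling over ten years is an arbitrary choice for illustration:

```python
import numpy as np

# Hypothetical time vector: daily sampling over 10 years, in years
t = np.arange(0.0, 10.0, 1.0 / 365.25)
omega = 2.0 * np.pi   # angular frequency of an annual signal (rad/yr)

# Columns g_1..g_4: constant, trend, cosine and sine of the annual signal
A = np.column_stack([np.ones_like(t), t,
                     np.cos(omega * t), np.sin(omega * t)])
print(A.shape)
```

Each additional basis function of the trajectory model simply becomes another column of A.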
From Chapter 1 we know that tectonic motion or sea level rise can be modelled by a linear trend (i.e. the Standard Linear Trajectory Model). Thus g_1(t) is a constant and g_2(t) a linear trend. This can be extended to a higher degree polynomial, to model an acceleration for example. Next, in many cases an annual and semi-annual signal is included as well. A periodic signal can be described by its amplitude b_k and its phase-lag ψ_k with respect to some reference epoch:

    g(t) = b_k cos(ω_k t − ψ_k)
         = b_k cos(ψ_k) cos(ω_k t) + b_k sin(ψ_k) sin(ω_k t)
         = c_k cos(ω_k t) + s_k sin(ω_k t)    (11)

Since the unknown phase-lag ψ_k makes the function non-linear, one almost always estimates the amplitudes c_k and s_k, see Chapter 1. These parameters are linear with the functions cos and sin, and from these values one can derive the amplitude b_k and phase-lag ψ_k.

Other models that can be included in g(t) are offsets and post-seismic relaxation functions, see Chapter 1. An example of a combination of all these models into a single trajectory model is shown in Figure 2, which contains a seasonal signal, offsets, two velocities, an instantaneous displacement and an exponential or logarithmic decay.
Fig. 2: Sketch of a trajectory model containing common phenomena.

For linear models, the log-likelihood can be rewritten as:

    ln(L) = −½ [N ln(2π) + ln(det(C)) + (y − Ax)^T C^{-1} (y − Ax)]    (12)

This function must be maximised. Assuming that the covariance matrix is known, it is a constant and does not influence finding the maximum. Next, the term (y − Ax) represents the observations minus the fitted model; these are normally called the residuals r. It is desirable to choose the parameters x in such a way as to make these residuals small. The last term can be written as r^T C^{-1} r and is a quadratic function, weighted by the inverse of matrix C.

Now let us compute the derivative of ln(L):

    d ln(L)/dx = A^T C^{-1} y − A^T C^{-1} A x    (13)

The maximum of ln(L) occurs when this derivative is zero. Thus:

    A^T C^{-1} A x = A^T C^{-1} y  →  x = (A^T C^{-1} A)^{-1} A^T C^{-1} y    (14)

This is the celebrated weighted least-squares equation to estimate the parameters x. Most derivations of this equation focus on the minimisation of the quadratic cost function. However, here we highlight the fact that, for observations that contain Gaussian multivariate noise, the weighted least-squares estimator is a maximum likelihood estimator (MLE). From Eq. (14) it can also be deduced that vector x, like the observation vector y, follows a multivariate Gaussian probability density function.

The variance of the estimated parameters is:

    var(x) = var( (A^T C^{-1} A)^{-1} A^T C^{-1} y )
           = (A^T C^{-1} A)^{-1} A^T C^{-1} var(y) C^{-1} A (A^T C^{-1} A)^{-1}
           = (A^T C^{-1} A)^{-1} A^T C^{-1} C C^{-1} A (A^T C^{-1} A)^{-1}
           = (A^T C^{-1} A)^{-1}    (15)

Next, define the following matrix I(x):

    I(x) = −E[ ∂²/∂x² ln(L) ] = −∫ ( ∂²/∂x² ln(f) ) f dx    (16)

It is called the Fisher Information matrix. As in Eqs. (1) and (2), we use the expectation operator E. Remember that we simply called f our likelihood L, but these are the same. We already used the fact that the log-likelihood as a function of x is horizontal at the maximum value. Let us call this maximum x̂. The second derivative is related to the curvature of the log-likelihood function. The sharper the peak near its maximum, the more accurately we can estimate the parameters x and therefore the smaller their variance will be.

Next, it can be shown that the following inequality holds:

    1 ≤ ∫ (x̂ − x)^2 f dx · ∫ ( ∂ ln(f)/∂x )^2 f dx    (17)

The first integral represents the variance of x̂, see Eq. (2). The second one, after some rewriting, is equal to the Fisher Information matrix. This gives us, for any unbiased estimator, the following Cramér-Rao Lower Bound (Kay, 1993):

    var(x̂) ≥ I(x)^{-1}    (18)

Eq. (18) predicts the minimum variance of the estimated parameters x for a given probability density function f and its relation with the parameters x that we want to estimate. If we use Eq. (13) to compute the second derivative of the log-likelihood, then we obtain:

    I(x) = A^T C^{-1} A    (19)

Comparing this with Eq. (15), one can see that for the case of the weighted least-squares estimator the Cramér-Rao Lower Bound is achieved. Therefore, it is an optimal estimator. Because we also need to estimate the parameters of the covariance matrix C, we shall use MLE, which approaches this lower bound for an increasing number of observations. Therefore, one can be sure that out of all existing estimation methods, none of them will produce a more accurate result than MLE, only equal or worse. For more details, see Kay (1993).

Least-squares and maximum likelihood estimation are well known techniques in various branches of science. In recent years much attention has been paid by geodesists to the structure of the covariance matrix.
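The estimator of Eq. (14) and its covariance of Eq. (15) translate almost directly into NumPy. The sketch below fits a line to synthetic data with unit-variance white noise; the offset 6, slope 2 and noise level are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
A = np.column_stack([np.ones_like(t), t])   # design matrix: offset and trend

# Synthetic observations: offset 6, slope 2, plus white noise
y = 6.0 + 2.0 * t + rng.normal(0.0, 0.5, size=len(t))

C = np.eye(len(t))                          # assumed white-noise covariance
Ci = np.linalg.inv(C)
x = np.linalg.solve(A.T @ Ci @ A, A.T @ Ci @ y)   # Eq. (14)
cov_x = np.linalg.inv(A.T @ Ci @ A)               # Eq. (15), for this C
print(x)
```

In practice one avoids forming C^{-1} explicitly; the Cholesky-based approach discussed later in this chapter is the numerically preferred route.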
If there were no relation between the noise values, then these would be independent random variables and the covariance matrix C would be zero except for the values on its diagonal. However, in almost all geodetic time series these are dependent random variables. In statistics this is called temporal correlation and we should consider a full covariance matrix:

        | σ_11  σ_12  ...      σ_1N    |
    C = | σ_21  σ_22  ...      σ_2N    |    (20)
        |  ...         ...      ...    |
        | σ_N1  ...  σ_N,N−1   σ_NN    |

where σ_12 is the covariance between random variables w_1 and w_2. If we assume that the properties of the noise are constant over time, then we have the same covariance between w_1 and w_2, between w_2 and w_3, and for all other pairs with one time step separation. As a result, σ_12, σ_23, ..., σ_{N−1,N} are all equal. A simple estimator for it is:

    σ_12 = σ_23 = ... = σ_{N−1,N} = 1/(N−1) Σ_{i=1}^{N−1} w_i w_{i+1}    (21)

This is an approximation of the formula to compute the second moment, see Eq. (2), and it is called the empirical or sample covariance. Therefore, one could try the following iterative scheme: fit the linear model to the observations using some a priori covariance matrix, compute the residuals, use these to estimate a more realistic covariance matrix using Eq. (21), and fit again the linear model to the observations, until all estimated parameters have converged.

The previous chapter demonstrated that one of the purposes of the trajectory models is to estimate the linear or secular trend. For time series longer than 2 years, the uncertainty of this trend depends mainly on the noise at the longest observed periods (Bos et al., 2008; He et al., 2019). However, the empirical covariance estimation of Eq. (21) does not result in an accurate estimate of the noise at long periods because only a few observations are used in the computation. In fact, only the first and last observation are used to compute the covariance of the noise at the longest observed period (i.e. σ_1N).

This problem has been solved by defining a model of the noise and estimating the parameters of this noise model. The estimation of the noise model parameters can be achieved using the log-likelihood with a numerical maximisation scheme, but other methods exist such as least-squares variance component estimation (see Chapter 6).

The development of a good noise model started with the paper of Hurst (1957), who discovered that the cumulative water flow of the Nile river depended on the previous years. The influence of the previous years decayed according to a power-law. This inspired Mandelbrot and van Ness (1968) to define the fractional Brownian motion model, which includes both the power-law and fractional Gaussian noises, see also Beran (1994) and Graves et al. (2017). While this research was well known in hydrology and in econometrics, it was not until the publication by Agnew (1992), who demonstrated that most geophysical time series exhibit power-law noise behaviour, that this type of noise modelling started to be applied to geodetic time series. In hindsight, Press (1978) had already demonstrated similar results, but this work has not received much attention in geodesy. That the noise in GNSS time series also falls in this category was demonstrated by Johnson and Agnew (1995). Power-law noise has the property that the power spectral density of the noise follows a power-law curve. On a log-log plot, it converts into a straight line. The equation for power-law noise is:

    P(f) = P_0 (f/f_s)^κ    (22)

where f is the frequency, P_0 is a constant, f_s the sampling frequency and the exponent κ is called the spectral index.

Granger (1980), Granger and Joyeux (1980) and Hosking (1981) demonstrated that power-law noise can be achieved using fractional differencing of Gaussian noise:

    (1 − B)^{−κ/2} v = w    (23)

where B is the backward-shift operator (B v_i = v_{i−1}) and v a vector with independent and identically distributed (IID) Gaussian noise.
Hosking and Granger used the parameter d for the fraction −κ/2; here κ is used in the equations. Hosking's definition of the fractional differencing is:

    (1 − B)^{−κ/2} = Σ_{i=0}^{∞} C(−κ/2, i) (−B)^i
                   = 1 − (κ/2) B − (κ/2)(1 − κ/2)/2 B^2 − ... = Σ_{i=0}^{∞} h_i B^i    (24)

The coefficients h_i can be viewed as a filter that is applied to the independent white noise. These coefficients can be conveniently computed using the following recurrence relation (Kasdin, 1995):

    h_0 = 1
    h_i = (i − κ/2 − 1) h_{i−1} / i    for i > 0    (25)

For large i, the fraction (i − κ/2 − 1)/i is slightly less than 1. Thus, the coefficients h_i only decrease very slowly to zero. This implies that the current noise value w_i depends on many previous values of v. In other words, the noise has a long memory. Actually, the model of fractional Gaussian noise defined by Hosking (1981) is the basic definition of the general class of processes called Auto-Regressive Fractionally Integrated Moving Average (Taqqu et al., 1995). If we ignore the Fractionally Integrated part, then we obtain the Auto-Regressive Moving Average (ARMA) model (Box et al., 2015; Brockwell and Davis, 2002), which is a short-memory noise model. The original definition of the ARIMA processes only considers integer values of the power κ/2, whereas in the fractionally integrated case it is a floating point value, generally in the range of −1 to 1. Montillet and Yu (2015) discussed the application of the ARMA and FARIMA models in modelling GNSS daily position time series and concluded that the FARIMA is only suitable in the presence of a large amplitude coloured noise capable of generating a distribution with large tails (i.e. random-walk, aggregations).
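The recurrence of Eq. (25) is easy to implement. A small sketch (the function name powerlaw_coeffs is ours) also confirms the limiting cases discussed below: white noise (κ = 0) and random walk (κ = −2):

```python
import numpy as np

def powerlaw_coeffs(kappa, n):
    """First n filter coefficients h_i of Eq. (25) (Kasdin, 1995)."""
    h = np.zeros(n)
    h[0] = 1.0
    for i in range(1, n):
        h[i] = (i - kappa / 2.0 - 1.0) * h[i - 1] / i
    return h

print(powerlaw_coeffs(-1.0, 4))   # flicker noise: 1, 0.5, 0.375, 0.3125
print(powerlaw_coeffs(0.0, 4))    # white noise:   1, 0, 0, 0
print(powerlaw_coeffs(-2.0, 4))   # random walk:   1, 1, 1, 1
```

Applying these coefficients as a causal filter to IID Gaussian noise, e.g. with np.convolve, produces the coloured noise of Eq. (23).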
Equation (25) also shows that when the spectral index κ = 0, all coefficients h_i are zero except for h_0. This implies that there is no temporal correlation between the noise values. In addition, Eq. (22) shows that this corresponds to a horizontal line in the power spectral density domain. Using the analogy of the visible light spectrum, this situation of equal power at all frequencies produces white light and it is therefore called white noise. For κ ≠ 0, some values have received a specific colour. For example, κ = −1 produces flicker noise. For κ = −2, Eq. (25) gives h_i = 1 for all values of i. Thus, this noise is a simple sum of all its previous values plus a new random step and is better known as random walk (Mandelbrot, 1999). However, note that the spectral index κ does not need to be an integer value (Williams, 2003).

One normally assumes that v_i = 0 for i < 0. With this assumption, the unit covariance between w_k and w_l with l > k is:

    C(w_k, w_l) = Σ_{i=0}^{k} h_i h_{i+(l−k)}    (26)

Since κ = 0 produces an identity matrix, the associated white noise covariance matrix is represented by the unit matrix I. The general power-law covariance matrix is represented by the matrix J. The sum of white and power-law noise can then be written as (Williams, 2003):

    C = σ_pl² J(κ) + σ_w² I    (27)

where σ_pl and σ_w are the noise amplitudes. It is a widely used combination of noise models to describe the noise in GNSS time series (Williams et al., 2004). Besides the parameters of the linear model (i.e. the trajectory model), maximum likelihood estimation can be used to also estimate the parameters κ, σ_pl and σ_w. This approach has been implemented in various software packages such as CATS (Williams, 2008), est_noise (Langbein, 2010) and Hector (Bos et al., 2013). In recent years random walk noise has also been detected in the time series and this type has been included as well in the covariance matrix (Langbein, 2012; Dmitrieva et al., 2015).

We assumed that v_i = 0 for i < 0. For κ smaller than −1, the noise becomes non-stationary. That is, the variance of the noise increases over time. If it were assumed that the noise had always been present, then the variance would be infinite. Most GNSS time series contain flicker noise, which is just non-stationary. Using the assumption of zero noise before the first observation, the covariance still increases over time but remains finite.

For some geodetic time series, such as tide gauge observations, the power-law behaviour in the frequency domain shows a flattening below some threshold frequency. To model such behaviour, Langbein (2004) introduced the Generalised Gauss-Markov (GGM) noise model, which is defined as:

    (1 − φB)^{−κ/2} v = w    (28)

The only new parameter is φ.
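Equations (26) and (27) can be sketched as follows; the helper powerlaw_cov (our name) builds the unit covariance matrix J(κ) from the filter coefficients of Eq. (25), and κ = 0 indeed reproduces the identity matrix:

```python
import numpy as np

def powerlaw_cov(kappa, n):
    """Unit power-law covariance matrix J(kappa) of Eq. (26)."""
    h = np.zeros(n)
    h[0] = 1.0
    for i in range(1, n):
        h[i] = (i - kappa / 2.0 - 1.0) * h[i - 1] / i
    # C(w_k, w_l) = sum_{i=0}^{k} h_i * h_{i+(l-k)}   (for l >= k)
    J = np.empty((n, n))
    for k in range(n):
        for l in range(k, n):
            J[k, l] = J[l, k] = np.sum(h[:k + 1] * h[l - k:l + 1])
    return J

print(powerlaw_cov(0.0, 4))               # white noise -> identity matrix
C = 0.8**2 * powerlaw_cov(-1.0, 4) + 0.2**2 * np.eye(4)   # Eq. (27), arbitrary amplitudes
```

Note how the diagonal of J(−1) grows with the index: the flicker-noise variance increases over time, as stated in the text.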
The associated recurrence relation to compute the new coefficients h_i is:

    h_0 = 1
    h_i = (i − κ/2 − 1) φ h_{i−1} / i    for i > 0    (29)

If φ = 1, then we obtain again our pure power-law noise model. For any value of φ slightly smaller than one, this term helps to shorten the memory of the noise, which makes it stationary. That is, the temporal correlation decreases faster to zero for increasing lag between the noise values. The power spectrum of this noise model shows a flattening below some threshold frequency, which guarantees that the variance is finite and that the noise is stationary. Finally, it is even possible to generalise this a bit more to a fractionally integrated generalised Gauss-Markov model (FIGGM):

    (1 − φB)^{−κ_2/2} (1 − B)^{−κ_1/2} v = w
    (1 − φB)^{−κ_2/2} u = w    with u = (1 − B)^{−κ_1/2} v    (30)

This is just a combination of the two previous models. One can first apply the power-law filter to v to obtain u and afterwards apply the GGM filter on it to obtain w. Other models will be discussed in this book, such as ARMA (Box et al., 2015; Brockwell and Davis, 2002), but the power-law, GGM and FIGGM capture nicely the long memory property that is present in most geodetic time series. A list of all these noise models and their abbreviations is given in Table 1.

Table 1: Common abbreviations of noise models

    Noise model                                              Abbreviation
    Auto-Regressive Moving Average                           ARMA
    Auto-Regressive Fractionally Integrated Moving Average   ARFIMA or FARIMA
    Flicker noise                                            FN
    Fractionally Integrated GGM                              FIGGM
    Generalised Gauss-Markov                                  GGM
    Power-law                                                PL
    Random Walk                                              RW
    White noise                                              WN
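The GGM recurrence of Eq. (29) differs from Eq. (25) only by the factor φ. The sketch below (the function name ggm_coeffs is ours) shows how φ < 1 makes the coefficients, and hence the memory of the noise, die off faster:

```python
import numpy as np

def ggm_coeffs(kappa, phi, n):
    """First n filter coefficients of the GGM model, Eq. (29) (Langbein, 2004)."""
    h = np.zeros(n)
    h[0] = 1.0
    for i in range(1, n):
        h[i] = (i - kappa / 2.0 - 1.0) * phi * h[i - 1] / i
    return h

# phi = 1 recovers the pure power-law filter of Eq. (25)
h_pl  = ggm_coeffs(-1.0, 1.0, 200)
h_ggm = ggm_coeffs(-1.0, 0.9, 200)
print(h_pl[-1], h_ggm[-1])   # the GGM coefficient is far smaller
```

Chaining the two filters, power-law first and GGM second, gives the FIGGM model of Eq. (30).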
Figure 3 shows examples of white, flicker and random walk noise for a displacement time series. One can see that the white noise varies around a stable mean while the random walk is clearly non-stationary and deviates away from its initial position.
Fig. 3: Examples of white, flicker and random walk noise.

In the previous section we mentioned that power-law noise produces a specific curve in the power spectral density plots. Methods to compute those plots are given by Buttkus (2000). A simple but effective method is based on the Fourier transform, which states that each time series with finite variance can be written as a sum of periodic signals:

    y_n = (1/N) Σ_{k=−N/2}^{N/2} Y_k e^{i 2πkn/N}    for n = [0, ..., N−1]    (31)

Actually, this is called the inverse discrete Fourier transform. The Y_k are complex numbers, denoting the amplitude and phase of the periodic signal with frequency k/T, where T is the observation span. An attentive reader will remember that flicker and random walk noise are non-stationary, while the Fourier transform requires time series with finite variance. However, we never have infinitely long time series, which guarantees that the variance remains within some limit. The coefficients can be computed as follows:

    Y_k = Σ_{n=0}^{N−1} y_n e^{−i 2πkn/N}    for k = [−N/2, ..., N/2]    (32)

The transformation to the frequency domain provides insight into which periodic signals are present in the signal and, in our case, into the noise amplitude at the various frequencies. This is a classic topic and more details can be found in the books by Bracewell (1978) and Buttkus (2000). The one-sided power spectral density S_k is defined as:

    S_0     = |Y_0|^2 / (f_s N)
    S_{N/2} = |Y_{N/2}|^2 / (f_s N)
    S_k     = 2 |Y_k|^2 / (f_s N)    for k = [1, ..., N/2 − 1]    (33)

The frequency f_k associated to each S_k is:

    f_k = k f_s / N    for k = [0, ..., N/2]    (34)

The highest frequency is half the sampling frequency, f_s/2, which is called the Nyquist frequency. The power spectral density (PSD) computed in this manner is called a periodogram. There are many refinements, such as applying window functions, or cutting the time series in segments and averaging the resulting set of PSDs. However, a detail that normally receives little attention is that the Fourier transform produces positive and negative frequencies. Time only increases and there are no negative frequencies. Therefore, one always uses the one-sided power spectral density. Another useful relation is that of Parseval (Buttkus, 2000):

    (1/N) Σ_{n=0}^{N−1} |y_n|^2 = (1/N^2) Σ_{k=−N/2}^{N/2} |Y_k|^2    (35)

Thus, the variance of the noise should be equal to the sum of all S_k values (up to an extra f_s/N scale). The one-sided power spectral densities of the three time series of Figure 3 are plotted in Figure 4. It shows that power-law noise indeed follows a straight line in the power spectral density plots if a log-log scale is used. In fact, the properties of the power-law noise can also be estimated by fitting a line to the power spectral density estimates (Mao et al., 1999; Caporali, 2003).

The PSD of power-law noise generated by fractionally differenced Gaussian noise is (Kasdin, 1995):
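A minimal periodogram following Eqs. (32)-(35) can be written with NumPy's FFT. The white-noise input and f_s = 1 are arbitrary choices for this sketch, and the last lines check Parseval's relation numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 1.0                       # sampling frequency (e.g. 1/day)
N = 1024
y = rng.normal(size=N)         # white noise, for illustration

Y = np.fft.fft(y)              # discrete Fourier transform, Eq. (32)

# One-sided power spectral density, Eq. (33)
S = np.empty(N // 2 + 1)
S[0]    = np.abs(Y[0])**2 / (fs * N)
S[-1]   = np.abs(Y[N // 2])**2 / (fs * N)
S[1:-1] = 2.0 * np.abs(Y[1:N // 2])**2 / (fs * N)
f = np.arange(N // 2 + 1) * fs / N       # Eq. (34); f[-1] is the Nyquist frequency

# Parseval's relation, Eq. (35): variance equals (fs/N) times the sum of S_k
var_time = np.mean(y**2)
var_freq = (fs / N) * np.sum(S)
print(var_time, var_freq)
```

Plotting S against f on a log-log scale for coloured noise reproduces the straight lines of Figure 4.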
Fig. 4: One-sided power spectral density for white (κ = 0), flicker (κ = −1) and random walk (κ = −2) noise. The blue dots are the computed periodogram (Welch's method) while the solid red line is the fitted power-law model.

    S(f) = (2σ²/f_s) (2 sin(πf/f_s))^κ ≈ (2σ²/f_s) (2πf/f_s)^κ = P_0 (f/f_s)^κ    for f ≪ f_s    (36)

For small values of f, this approximates P_0 (f/f_s)^κ. The sine function is the result of having discrete data (Kasdin, 1995). The PSD for GGM noise is:

    S(f) = (2σ²/f_s) [1 + φ² − 2φ cos(2πf/f_s)]^{κ/2}    (37)

For φ = 1, it converts to the pure power-law noise PSD. The Fourier transform, and especially the Fast Fourier Transform, can also be used to filter a time series. For example, Eqs. (23) and (24) represent a filtering of the white noise vector v to produce the coloured noise vector w:

    w_i = Σ_{j=0}^{i} h_{i−j} v_j    (38)

Let us now extend the time series y and the vector h containing the filter coefficients with N zeros. This zero padding allows us to extend the summation to 2N. Using Eq. (32), their Fourier transforms, Y_k and H_k, can be computed. In the frequency domain, convolution becomes multiplication and we have (Press et al., 2007):

    W_k = H_k Y_k    for k = [−N, ..., N]    (39)

Using Eq. (31) and keeping only the first N elements, the vector w with the coloured noise can be obtained.

To explain the principle of maximum likelihood, this section will show some examples of the numerical method using Python 3. For some years Matlab has been the number one choice to analyse and visualise time series. However, in recent years Python has grown in popularity, due to the fact that it is open source and has many powerful libraries. The following examples are made in IPython (https://ipython.org), using the Jupyter notebook web application. How to install this program is described on the aforementioned website. The examples shown here can be downloaded from the publisher website.

The first step is to import the libraries:

    import math
    import numpy as np
    from matplotlib import pyplot as plt
    from scipy.optimize import minimize
    from numpy.linalg import inv
The next step is to create some data, which we will store in NumPy arrays. As in Matlab, the 'linspace' operator creates a simple array of evenly spaced numbers. Furthermore, as the name implies, 'random.normal' creates an array of Gaussian distributed random numbers. We create a line y with slope 2 and offset 6, on which we superimpose the flicker noise w that is created by filtering a white noise vector v with standard deviation σ_pl = 0.5, see Eq. (23).

    N = 500
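The rest of the data-creation code did not survive in this copy. A sketch consistent with the description (offset 6 mm, slope 2 mm/yr, flicker noise with σ_pl = 0.5 generated via Eqs. (23)-(25); the ten-year time span and the variable names are our own assumptions) could look like:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
t = np.linspace(0.0, 10.0, N)          # time in years (assumption)

# Flicker-noise filter coefficients, Eq. (25) with kappa = -1
kappa, sigma_pl = -1.0, 0.5
h = np.zeros(N)
h[0] = 1.0
for i in range(1, N):
    h[i] = (i - kappa / 2.0 - 1.0) * h[i - 1] / i

v = rng.normal(0.0, sigma_pl, size=N)  # IID Gaussian noise
w = np.convolve(h, v)[:N]              # coloured noise, Eq. (38)

# Line with offset 6 mm and slope 2 mm/yr plus the flicker noise
y = 6.0 + 2.0 * t + w
```

Plotting t against y reproduces a series like the one shown in Figure 5.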
Of course, the normal situation is that we are given a set of observations and that we need to estimate the parameters of the trajectory model y(t) = a + bt. However, creating synthetic time series is a very good method to test if your estimation procedures are correct.

Fig. 5: Our synthetic time series containing a simple line plus flicker noise.

First we will estimate the trajectory assuming white noise in the data:
The result should be:
    White noise approximation
    a = 6.728 +/- 0.064 mm
    b = 1.829 +/- 0.080 mm/yr
What we have done here is use weighted least-squares with a white noise model that has unit variance. The real variance of the noise has been estimated from the residuals and the uncertainties of the estimated parameters x have been scaled with it.

At this point the reader will realise that this approach is not justified because the noise is temporally correlated. It will be convenient to define two functions: one that creates the covariance matrix for power-law noise and one that applies weighted least-squares (Williams, 2003; Bos et al., 2008). The function that creates the covariance matrix for power-law noise has been discussed in section 3 and uses Eqs. (25) and (26). The weighted least-squares function contains some numerical tricks. First, the Cholesky decomposition is applied to the covariance matrix (Bos et al., 2008):

    C = U^T U    (40)

where U is an upper triangular matrix. That is, only the elements on and above the diagonal are non-zero. A covariance matrix is a positive definite matrix, which ensures that the Cholesky decomposition always exists. The most important advantage is that one can compute the logarithm of the determinant of matrix C by summing the logarithm of each element on the diagonal of matrix U and multiplying the result by two. The factor two is needed because matrix C is the product U^T U. Using these two functions, we can compute the correct parameters x. The result is:
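The log-determinant trick described above can be sketched in a few lines; the 2×2 matrix is an arbitrary example:

```python
import numpy as np

# Any positive definite covariance matrix will do for the illustration
C = np.array([[4.0, 1.0],
              [1.0, 3.0]])

L = np.linalg.cholesky(C)    # lower triangular factor, C = L L^T
U = L.T                      # so C = U^T U, as in Eq. (40)

# ln(det C) = 2 * sum of the logs of the diagonal of U;
# this avoids forming det(C) itself, which overflows for large N
ln_det_C = 2.0 * np.sum(np.log(np.diag(U)))

print(ln_det_C, np.log(np.linalg.det(C)))
```

For a 2×2 matrix both numbers are trivially equal; for the N×N covariance matrices of real time series only the Cholesky route remains numerically safe.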
    Correct Flicker noise
    a = 6.854 +/- 2.575 mm
    b = 1.865 +/- 4.112 mm/yr
If one compares the two estimates, one assuming white noise and the otherassuming flicker noise, then one can verify that the estimates themselves aresimilar. The largest difference occurs for the estimated errors which are 5times larger for the latter. This also happens in real geodetic time series.Mao et al. (1999) concluded that the velocity error in GNSS time-series couldbe underestimated by factors of 5âĂŞ11 if a pure white noise model is as-sumed. Langbein (2012) demonstrated that an additional factor of two mightbe needed if there is also random walk noise present.For sea level time series Bos et al. (2014) obtained a more moderate fac-tor of 1.5-2 but still, white noise underestimates the true uncertainty of theestimated linear trend. Williams et al. (2014) estimated a factor 6 for theGRACE gravity time series. As discussed in section 3, most geodetic timeseries are temporally correlated and therefore one nowadays avoids the whitenoise model.So far we have assumed that we knew the true value of the spectral index κ and the noise amplitude σ pl . Using MLE, we can estimate these parametersfrom the data: ntroduction to Geodetic Time Series Analysis 19 def log_likelihood(x_noise):sigma_pl = x_noise[0]kappa = x_noise[1]C = create_C(sigma_pl,kappa)[x,C_x,ln_det_C] = leastsquares(C,A,y)r = y - A @ x Note that we inverted the sign of the log-likelihood function because mostsoftware libraries provide minisation subroutines, not maximisation. In ad-dition, it is in this function that we need the logarithm of the determinantof matrix C . If one tries to compute it directly from matrix C , then onequickly encounters too large numbers that create numerical overflow. Thisfunction also shows that we use weighted least-squares to estimate the pa-rameters of the trajectory model while the numerical minisation algorithm(i.e. Nelder-Mead), is only used the compute the noise parameters. 
The reason for using weighted least-squares, which is also a maximum likelihood estimator as we have shown in section 2, is solely speed. Numerical minimisation is a slow process which becomes worse for each additional parameter we need to estimate. The result is:

    sigma_pl= 0.495, kappa=-1.004

These values are close to the true values of σ_pl = 0.5 and κ = −1. The following code can be used to plot the log-likelihood as a function of κ and σ_pl:

    S = np.empty((21,21))
    for i in range(0,21):
        sigma_pl = 1.2 - 0.05*i
        for j in range(0,21):
            kappa = -1.9 + 0.1*j
            x_noise0 = [sigma_pl, kappa]
            S[i,j] = math.log(log_likelihood(x_noise0))

    plt.imshow(S, extent=[-1.9,0.1,0.2,1.2], cmap='nipy_spectral', aspect='auto')
    plt.colorbar()
    plt.ylabel('sigma_pl')
    plt.xlabel('kappa')
    plt.show()

Fig. 6: The log of the log(L) function as a function of κ and σ_pl.

The result is shown in Figure 6, which indeed shows a minimum around σ_pl = 0.5 and κ = −1. Depending on the computer power, it might take some time to produce the values for this figure.

In section 3 we noted that for GNSS time series the power-law plus white noise model is common. Thus, one must add the covariance matrix for white noise, σ_w² I, to the covariance matrix we discussed in the examples. In addition, it is more efficient to write the covariance matrix of the sum of power-law and white noise as follows:

    C = σ_pl² J(κ) + σ_w² I = σ² ( φ J(κ) + (1 − φ) I )    (41)

where σ can be computed using:

    σ = √( rᵀ (φ J(κ) + (1 − φ) I)⁻¹ r / N )    (42)

Since σ can be computed from the residuals, we only need to use our slow numerical minimisation algorithm to find the value of φ (Williams, 2008).

Note that we only analysed 500 observations, while nowadays time series with 7000 observations are not uncommon. If one tries to rerun our examples for this value of N, then one will note that this takes an extremely long time. The main reason is that the inversion of matrix C requires O(N³) operations. Bos et al. (2008, 2013) have investigated how the covariance matrix C can be approximated by a Toeplitz matrix. This is a special type of matrix which has a constant value on each diagonal, and one can compute its inverse using only O(N²) operations. This method has been implemented in the Hector software package that is available from http://segal.ubi.pt/hector .

The Hector software was used to create time series with a length of 5000 daily observations (around 13.7 years) for 20 GNSS stations, which we will call the Benchmark Synthetic GNSS (BSG) time series. This was done for the horizontal and vertical components, producing 60 time series in total. Each contains a linear trend, an annual and a semi-annual signal. The sum of flicker and white noise, w_i, was added to these trajectory models:

    w_i = σ ( √φ Σ_{j=0}^{i} h_{i−j} v_j + √(1 − φ) u_i )    (43)

where both u_i and v_i are Gaussian noise variables with unit variance. The factor φ was defined in Eq. (41).
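Eq. (43) can be turned into a simple noise generator. The sketch below is an illustration, not the Hector implementation; the recursion for the coefficients h_i is an assumption based on standard fractional differencing (Hosking, 1981; Kasdin, 1995).

```python
import numpy as np

def simulate_noise(N, sigma, phi, kappa=-1.0, seed=0):
    # Sum of power-law noise (flicker for kappa = -1) and white noise,
    # following the structure of Eq. (43).
    rng = np.random.default_rng(seed)
    d = -kappa / 2.0
    h = np.zeros(N)
    h[0] = 1.0
    for i in range(1, N):                  # fractional-differencing recursion
        h[i] = (d + i - 1.0) / i * h[i - 1]
    v = rng.standard_normal(N)             # drives the power-law part
    u = rng.standard_normal(N)             # drives the white part
    pl = np.convolve(h, v)[:N]             # sum_j h_{i-j} v_j
    return sigma * (np.sqrt(phi) * pl + np.sqrt(1.0 - phi) * u)
```

With phi = 0 this reduces to pure white noise of amplitude sigma, and with phi = 1 to pure power-law noise, which makes the role of φ in Eq. (41) easy to verify numerically.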
To create our BSG time series we used σ = 1. and φ = 0. for the horizontal components and σ = 4. and φ = 0. for the vertical component. To convert the driving noise amplitude into the usual annual units it must be divided by ∆T^{−κ/4}, where ∆T is the sampling period in years. For the vertical flicker noise amplitude we obtain:

    σ_pl = σ √φ / ∆T^{−κ/4} = 4. × √0. / (1/365.25)^{1/4} = 17.    (44)

The vertical white noise amplitude is 2.6 mm. For the horizontal components these values are σ_pl = 4. and σ_w = 0.

Conclusions

In this chapter we have given a brief introduction to the principles of time series analysis. We paid special attention to the maximum likelihood estimation (MLE) method and to the modelling of power-law noise. We showed that, with our assumptions on the stochastic noise properties, the estimated parameters have their variance bounded by the Cramér-Rao lower bound. Therefore the MLE is an optimal estimator in the sense of being asymptotically unbiased and efficient (minimum variance).

In this book we will present other estimators such as the Kalman filter, the Markov Chain Monte Carlo algorithm and the Sigma-method. All have their advantages and disadvantages, and explaining them was one of the reasons for writing this book. The other reason was to highlight the importance of temporally correlated noise. This phenomenon has been known for a long time but, due to increased computer power, it has now become possible to include it in the analysis of geodetic time series. We illustrated how this can be done with various examples in Python 3 that highlighted some numerical aspects that will help the reader to implement their own algorithms.
References
Agnew DC (1992) The time-domain behaviour of power-law noises. Geophys Res Letters 19(4):333–336, DOI 10.1029/91GL02832
Beran J (1994) Statistics for long-memory processes. Monographs on statistics and applied probability, Chapman & Hall, New York
Beran J (1995) Maximum Likelihood Estimation of the Differencing Parameter for Invertible Short and Long Memory Autoregressive Integrated Moving Average Models. Journal of the Royal Statistical Society Series B (Methodological) 57(4):659–672
Bos MS, Williams SDP, Araújo IB, Bastos L (2008) Fast error analysis of continuous GPS observations. J Geodesy 82:157–166, DOI 10.1007/s00190-007-0165-x
Bos MS, Williams SDP, Araújo IB, Bastos L (2013) Fast error analysis of continuous GNSS observations with missing data. J Geodesy 87:351–360, DOI 10.1007/s00190-012-0605-0
Bos MS, Williams SDP, Araújo IB, Bastos L (2014) The effect of temporal correlated noise on the sea level rate and acceleration uncertainty. Geophys J Int 196:1423–1430, DOI 10.1093/gji/ggt481
Box GEP, Jenkins GM, Reinsel GC, Ljung GM (2015) Time Series Analysis, Forecasting and Control, 5th edn. Wiley
Bracewell R (1978) The Fourier Transform and its Applications, 2nd edn. McGraw-Hill Kogakusha, Ltd., Tokyo
Brockwell P, Davis RA (2002) Introduction to Time Series and Forecasting, 2nd edn. Springer-Verlag, New York
Buttkus B (2000) Spectral Analysis and Filter Theory in Applied Geophysics. Springer-Verlag, Berlin Heidelberg
Caporali A (2003) Average strain rate in the Italian crust inferred from a permanent GPS network - I. Statistical analysis of the time-series of permanent GPS stations. Geophys J Int 155:241–253, DOI 10.1046/j.1365-246X.2003.02034.x
Casella G, Berger R (2001) Statistical Inference, 2nd edn. Duxbury Resource Center
Dmitrieva K, Segall P, DeMets C (2015) Network-based estimation of time-dependent noise in GPS position time series. J Geodesy 89:591–606, DOI 10.1007/s00190-015-0801-9
Granger C (1980) Long memory relationships and the aggregation of dynamic models. Journal of Econometrics 14(2):227–238, DOI 10.1016/0304-4076(80)90092-5
Granger CWJ, Joyeux R (1980) An Introduction to Long-Memory Time Series Models and Fractional Differencing. Journal of Time Series Analysis 1(1):15–29, DOI 10.1111/j.1467-9892.1980.tb00297.x
Graves T, Gramacy R, Watkins N, Franzke C (2017) A Brief History of Long Memory: Hurst, Mandelbrot and the Road to ARFIMA, 1951–1980. Entropy 19(9), DOI 10.3390/e19090437
He X, Bos MS, Montillet JP, Fernandes RMS (2019) Investigation of the noise properties at low frequencies in long GNSS time series. Journal of Geodesy, DOI 10.1007/s00190-019-01244-y
Hosking JRM (1981) Fractional differencing. Biometrika 68:165–176
Hurst HE (1957) A Suggested Statistical Model of some Time Series which occur in Nature. Nature 180:494, DOI 10.1038/180494a0
Jaynes ET (2003) Probability theory: The logic of science. Cambridge University Press, Cambridge
Johnson HO, Agnew DC (1995) Monument motion and measurements of crustal velocities. Geophys Res Letters 22(21):2905–2908, DOI 10.1029/95GL02661
Kasdin NJ (1995) Discrete simulation of colored noise and stochastic processes and 1/f^α power-law noise generation. Proc IEEE 83(5):802–827
Kay SM (1993) Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice-Hall, Inc., Upper Saddle River, NJ, USA
Koch KR (1990) Bayesian Inference with Geodetic Applications. Lecture Notes in Earth Sciences, Springer-Verlag, DOI 10.1007/BFb0048699
Koch KR (2007) Introduction to Bayesian Statistics, 2nd edn. Springer-Verlag, Berlin Heidelberg
Langbein J (2004) Noise in two-color electronic distance meter measurements revisited. J Geophys Res 109:B04406, DOI 10.1029/2003JB002819
Langbein J (2010) Computer algorithm for analyzing and processing borehole strainmeter data. Comput Geosci 36(5):611–619, DOI 10.1016/j.cageo.2009.08.011
Langbein J (2012) Estimating rate uncertainty with maximum likelihood: differences between power-law and flicker-random-walk models. J Geodesy 86:775–783, DOI 10.1007/s00190-012-0556-5
Mandelbrot BB (1999) Multifractals and 1/f Noise. Springer, DOI 10.1007/978-1-4612-2150-0
Mandelbrot BB, van Ness JW (1968) Fractional Brownian motions, fractional noises and applications. SIAM Review 10:422–437
Mao A, Harrison CGA, Dixon TH (1999) Noise in GPS coordinate time series. J Geophys Res 104(B2):2797–2816, DOI 10.1029/1998JB900033
Montillet JP, Yu K (2015) Modeling Geodetic Processes with Levy α-Stable Distribution and FARIMA. Mathematical Geosciences 47(6):627–646
Press WH (1978) Flicker noises in astronomy and elsewhere. Comments on Astrophysics 7:103–119
Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical Recipes 3rd Edition: The Art of Scientific Computing, 3rd edn. Cambridge University Press, New York, NY, USA
Sowell F (1992) Maximum likelihood estimation of stationary univariate fractionally integrated time series models. J Econom 53:165–188
Taqqu MS, Teverovsky V, Willinger W (1995) Estimators for long-range dependence: An empirical study. Fractals 3:785–798
Williams SD, Moore P, King MA, Whitehouse PL (2014) Revisiting GRACE Antarctic ice mass trends and accelerations considering autocorrelation. Earth and Planetary Science Letters 385:12–21, DOI 10.1016/j.epsl.2013.10.016