Fast Time Series Detrending with Applications to Heart Rate Variability Analysis
M. Andrecut

June 21, 2019
Calgary, Alberta, T3G 5Y8, [email protected]
Abstract
Here we discuss a new fast detrending method for the non-stationary RR time series used in Heart Rate Variability analysis. The described method is based on the diffusion equation, and we show numerically that it is equivalent to the widely used Smoothing Priors Approach (SPA) and Wavelet Smoothing Approach (WSA) methods. The speed of the proposed method is comparable to the WSA method, and it is several orders of magnitude faster than the SPA method, which makes it suitable for very long time series analysis.

Keywords: time series analysis, heart rate variability

PACS Nos.: 05.45.Tp
Heart Rate Variability (HRV) is an important indicator used in the analysis of the autonomous nervous system controlling the cardiovascular activity [1]. The HRV is extracted from the fluctuations of the heart beat intervals (RR intervals) obtained from electrocardiogram (ECG) measurements, and it is used for the assessment of different patient groups at risk of cardiac diseases, such as life-threatening arrhythmias, myocardial infarction, and diabetes mellitus [1, 2]. The HRV analysis requires reliable and efficient signal processing techniques for the series of RR intervals. Both time domain and frequency domain analysis can be used to extract useful clinical information from the ECG measurements and the corresponding RR time series [3, 4]. However, the extracted RR time series are non-stationary, and contain trends and artifacts that must be removed before analysis. In particular, the frequency domain analysis is sensitive to the trends embedded in the RR time series [3, 4]. These trends have a complex behavior and are caused by external effects, corresponding to smooth and slowly varying processes, which can have both oscillating and stochastic components. Several methods have been developed to remove these complex trends from the RR time series. Most methods are based on polynomial regression models [5], spectral filters (wavelets etc.) [6], and empirical mode decomposition [7]. One of the most reliable and widely used detrending methods is based on the Smoothing Priors Approach (SPA), which acts as a time-varying finite impulse high-pass filter in the frequency domain [8]. Another frequently used method is based on the Wavelet Smoothing Approach (WSA) [9], which is probably the fastest method currently in use.

In this paper we discuss a new fast detrending method based on the diffusion equation, and we show numerically that it is equivalent to the SPA and WSA methods.
However, our numerical implementation shows that the speed of the diffusion equation approach is comparable to the WSA method, and therefore it is several orders of magnitude faster than the SPA method, which makes it suitable for very long time series analysis. While here we limit our discussion to the HRV analysis, we should mention that the described method can be used to accelerate any time series detrending, smoothing or fitting task.

Let us denote the RR time series by:

    r = [R_1 - R_0, R_2 - R_1, \ldots, R_N - R_{N-1}]^T \in \mathbb{R}^N,    (1)

where N is the number of R peaks detected. The RR time series is considered to have two components [8]:

    r = x + y,    (2)

where x is the nearly stationary component, and y is the slowly varying trend. Typically the trend component can be approximated with a linear model such as:

    y = H\theta + \eta,    (3)

where H is the observation matrix, \theta are the regression parameters and \eta is the observation error. The corresponding regularized least squares problem is:

    \theta(\mu) = \arg\min_{\theta} \left\{ \|H\theta - r\|^2 + \mu \|D_d(H\theta)\|^2 \right\},    (4)

where \mu > 0 is the regularization parameter and D_d is the discrete approximation of the d-th derivative operator. The solution of this problem is given by:

    \theta(\mu) = (H^T H + \mu H^T D_d^T D_d H)^{-1} H^T r,    (5)

such that:

    y(\mu) = H\theta(\mu).    (6)

The observation matrix H can be calculated using different regression functions, like polynomials, Gaussians or sigmoids. However, in order to avoid the selection problem of the basis functions, the SPA simply uses H = I, where I \in \mathbb{R}^{N \times N} is the identity matrix [8]. Also, SPA uses the second order derivative operator D_2 \in \mathbb{R}^{(N-2) \times N} in order to estimate the regularization part of the optimization problem [8]:

    D_2 = \begin{bmatrix}
    1 & -2 & 1 & & & 0 \\
    & 1 & -2 & 1 & & \\
    & & \ddots & \ddots & \ddots & \\
    0 & & & 1 & -2 & 1
    \end{bmatrix}.    (7)

With these SPA parameters, the detrended RR time series is given by:

    x(\mu) = r - y(\mu) = \left[ I - (I + \mu D_2^T D_2)^{-1} \right] r.    (8)

A direct implementation of the SPA method using the above equation is quite costly because of the matrix inversion, which is an O(N^3) operation. A better approach is to exploit the fact that the matrix is symmetric and positive definite, and therefore one can use the Cholesky decomposition:

    I + \mu D_2^T D_2 = L L^T,    (9)

where L is a lower triangular matrix. Then the problem reduces to successively solving two triangular systems of equations:

    L r' = r \;\Rightarrow\; L^T y = r',    (10)

in order to extract the trend y.

WSA detrending is a spectral method using the discrete wavelet decomposition of the RR time series [9, 10]. Wavelets provide an analysis of the signal which is localized in both time and frequency, unlike the Fourier transform which is localized only in frequency. The continuous wavelet transform (CWT) of r(t) is defined as the convolution:

    \hat{r}(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} r(t) \, \Psi\!\left( \frac{t - b}{a} \right) dt,    (11)

where \Psi(t) is the mother wavelet, depending on two parameters: the scale dilation a and the translation b. We should note that there is an infinite number of choices for \Psi(t). In computational tasks one can use the discrete wavelet transform (DWT), which is a sampled version of the CWT, transforming a discrete signal into discrete coefficients in the wavelet domain. Thus, instead of using the continuous coefficient values a, b \in \mathbb{R}, the values \hat{r}(a, b) are calculated over a dyadic discrete grid:

    \hat{r}_{j,k} = \frac{1}{\sqrt{2^j}} \int_{-\infty}^{\infty} r(t) \, \Psi\!\left( \frac{t - k 2^j}{2^j} \right) dt, \quad j, k \in \mathbb{Z}.    (12)

The DWT can be computed efficiently using Mallat's algorithm, which has a complexity of O(N) [11]. The algorithm decomposes the signal into different scales, corresponding to different frequency bands, using low-pass (LP) and respectively high-pass (HP) filters.
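As a concrete illustration of Mallat's decomposition, the following sketch uses the PyWavelets package (the wavelet choice and the number of levels here are illustrative assumptions, not values prescribed by the paper):

```python
import numpy as np
import pywt

rng = np.random.default_rng(1)
signal = np.cumsum(rng.standard_normal(512))   # a random-walk test signal

# Three-level DWT (Mallat's algorithm): the LP outputs give the approximation
# coefficients (cA3), the HP outputs give the detail coefficients per scale.
cA3, cD3, cD2, cD1 = pywt.wavedec(signal, "db4", level=3)

# The inverse transform reconstructs the signal (trim any padding sample).
recon = pywt.waverec([cA3, cD3, cD2, cD1], "db4")[:len(signal)]
```

Each detail block corresponds to one frequency band; detrending methods operate by modifying these coefficient blocks before the inverse transform.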
The output values of the LP filters correspond to the approximation coefficients, while the output values of the HP filters correspond to the detail coefficients. The number of decomposition levels (typically 3 to 5) is a parameter depending on the signal to be analyzed.

Due to the orthogonality of the wavelets, the DWT has the interesting property of transforming noise into noise [9]. This property can be used to smooth the signal, and to extract the trend, by keeping only the coefficients which are stronger than the noise level in the signal. This procedure is called wavelet thresholding, and consists in comparing the coefficients with a threshold in order to decide if they are included in the inverse reconstruction process. Thus, wavelet thresholding sets to zero the coefficients with an absolute value below a certain threshold level \tau, using the hard (H) or the soft (S) thresholding functions:

    \Theta_\tau^H(x) = \begin{cases} 0 & \text{if } |x| \le \tau \\ x & \text{if } |x| > \tau \end{cases}, \qquad
    \Theta_\tau^S(x) = \begin{cases} 0 & \text{if } |x| \le \tau \\ x - \tau & \text{if } x > \tau \\ x + \tau & \text{if } x < -\tau \end{cases}.    (13)

A frequently used estimate of the optimal threshold \tau is the universal threshold, which can be used with both hard and soft thresholding functions [12]:

    \tau = \sigma \sqrt{2 \ln N},    (14)

where \sigma is the standard deviation of the detail wavelet coefficients. We should note that the thresholding procedure is usually applied only to the detail coefficients [13]. After applying the thresholding procedure, the trend can be recovered using the inverse DWT.

Let us rewrite the RR time series as:

    r = [r_0, r_1, \ldots, r_{N-1}]^T \in \mathbb{R}^N,    (15)

where r_i = R_{i+1} - R_i. Also, we consider the following optimization problem:

    y(\mu) = \arg\min_{y} S(y, \mu),    (16)

where:

    S(y, \mu) = F(y) + \mu G(y),    (17)

with:

    F(y) = \sum_i (y_i - r_i)^2,    (18)

and respectively:

    G(y) = \sum_i (y_{i-1} - 2 y_i + y_{i+1})^2.    (19)

Obviously this problem is equivalent to the SPA optimization problem.

Let us now consider the following diffusion equation:

    \frac{\partial y}{\partial t} = \alpha \nabla^2 y,    (20)

where \alpha is the diffusion coefficient.
We also assume that the initial condition of the diffusion equation is:

    y(0) = r \;\Leftrightarrow\; y_i(0) = r_i, \quad i = 0, 1, \ldots, N-1.    (21)

The finite difference iteration of the above equation can be written as:

    y_i(t+1) = y_i(t) + \alpha [y_{i-1}(t) - 2 y_i(t) + y_{i+1}(t)] = y_i(t) + \alpha L_i(t),    (22)

where:

    \frac{\partial y}{\partial t} \bigg|_i \simeq y_i(t+1) - y_i(t),    (23)

and

    \frac{\partial^2 y}{\partial x^2} \bigg|_i \simeq y_{i-1}(t) - 2 y_i(t) + y_{i+1}(t) = L_i(t),    (24)

are the discrete approximations of the time derivative, and respectively of the Laplacian. One can see that the gradient descent solution for the minimization of G(y) is also given by:

    y_i(t+1) = y_i(t) - \gamma \, \partial_{y_i} G(y) = y_i(t) + \gamma L_i(t),    (25)

where \gamma > 0. For \gamma = \alpha this is obviously equivalent to the finite difference solution of the diffusion equation, which therefore minimizes the objective function G(y). Thus, starting from y = r, with the maximum value of the objective optimization function:

    S(y(0), \mu) = \mu G(r),    (26)

we let the diffusion process shrink and smooth y until the minimum of S(y, \mu) is reached for a given \mu. The pseudo-code of the Diffusion Detrending Algorithm (DDA) is listed in Algorithm 1. One can see that the maximum number of iterations is set to N, and the function returns the trend y, the almost stationary component x = r - y, and the time step t when the minimum of S has been reached. Also, the first and the last points of the trend are computed as:

    y_0(t+1) = y_0(t) + 2\alpha [y_1(t) - y_0(t)],    (27)

    y_{N-1}(t+1) = y_{N-1}(t) + 2\alpha [y_{N-2}(t) - y_{N-1}(t)].    (28)

Algorithm 1:
Diffusion Detrending Algorithm (DDA).

    function diffusion_detrending(r, \mu, \alpha)
        N, S, y, t \leftarrow length(r), \infty, r, 0
        while true do
            S^* \leftarrow \sum_i [(y_i - r_i)^2 + \mu (y_{i-1} - 2 y_i + y_{i+1})^2]
            if S^* > S or t > N then
                break
            else
                for i = 1 : N-2 do
                    y_i \leftarrow y_i + \alpha (y_{i-1} - 2 y_i + y_{i+1})
                end for
                S, t \leftarrow S^*, t + 1
                y_0, y_{N-1} \leftarrow y_0 + 2\alpha (y_1 - y_0), y_{N-1} + 2\alpha (y_{N-2} - y_{N-1})
            end if
        end while
        return t, y, r - y
    end function

The performance of the DDA method was estimated using synthetic RR time series generated with the model described in Ref. [14]. In this model, a time series is generated with spectral characteristics similar to a real RR time series. The spectral characteristics of the RR time series, including the low-frequency (LF) and high-frequency (HF) bands, are simulated using a bi-modal spectrum corresponding to the sum of two Gaussian functions:

    S(\nu) = \frac{P_{LF}}{\sqrt{2\pi}\,\sigma_{LF}} \exp\!\left( -\left[ \frac{\nu - \nu_{LF}}{\sqrt{2}\,\sigma_{LF}} \right]^2 \right) + \frac{P_{HF}}{\sqrt{2\pi}\,\sigma_{HF}} \exp\!\left( -\left[ \frac{\nu - \nu_{HF}}{\sqrt{2}\,\sigma_{HF}} \right]^2 \right),    (29)

where \nu_{LF}, \nu_{HF} and \sigma_{LF}, \sigma_{HF} are the means, and respectively the standard deviations, of each band. The power in the LF and HF bands is given by P_{LF} and P_{HF}, respectively. The RR time series with the power spectrum S(\nu) is generated by taking the inverse Fourier transform of a sequence of complex numbers with amplitudes \sqrt{S(\nu)} and phases which are randomly distributed in [0, 2\pi]. By multiplying this time series with an appropriate constant and adding the trend, we obtain a realistic simulation of a RR time series. Typical values are 1000 ms for the average and 25 ms for the standard deviation. Finally, the slowly varying trend is simulated by integrating a Gaussian random process.
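The spectrum-to-series construction described above (bimodal spectrum, random phases, inverse FFT, rescaling) can be sketched as follows. The band parameters used here (`f_lf`, `f_hf`, `width`, `ratio`) are illustrative assumptions chosen to mimic standard LF/HF band values, not quantities taken from the paper's own code:

```python
import numpy as np

def synthetic_rr(N, stdev=25.0, f_lf=0.1, f_hf=0.25, width=0.01, ratio=0.5):
    """Series with a bimodal power spectrum, as in Eq. (29) (illustrative)."""
    nu = np.linspace(0.0, 0.5, N + 1)                   # one-sided frequency grid [Hz]
    S = ratio * np.exp(-((nu - f_lf) / width) ** 2) \
        + np.exp(-((nu - f_hf) / width) ** 2)           # un-normalized bimodal spectrum
    phase = 2.0 * np.pi * np.random.rand(N + 1)         # random phases in [0, 2*pi]
    c = np.sqrt(S) * np.exp(1j * phase)
    c = np.concatenate((c, np.conj(np.flip(c[1:N]))))   # Hermitian-like symmetry
    x = np.real(np.fft.ifft(c))                         # keep the real part
    return (stdev / np.std(x)) * x                      # rescale to the target std dev

x = synthetic_rr(600)   # 2*600 samples, standard deviation 25 ms
```

Adding an integrated Gaussian random process to `x` then yields the full synthetic RR series `r = x + y` used in the experiments.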
Figure 1: Detrending example (see the text for discussion).

An example of a spectrum S(\nu) and the detrending of the corresponding time series r(t) is given in Fig. 1. The power spectrum has the following parameters: \sigma_{LF} = \sigma_{HF} = 0.01, \nu_{LF} = 0.1 Hz, \nu_{HF} = 0.25 Hz, P_{LF}/P_{HF} = 0.5. The y_{DDA}, x_{DDA} are the trend and the stationary component recovered from r = x + y using the DDA method, while y_{SPA}, x_{SPA} are the trend and the stationary component recovered using the SPA method (the time series average was subtracted). The regularization parameter and the diffusion constant were set to \mu = N and \alpha = 0.25. One can see that the DDA and SPA results are practically identical with the originals (similar results are obtained using the WSA method). We should note that the maximum diffusion constant value for which the finite difference solution is stable is \alpha = 0.5 (see Ref. [15] for example).

In Fig. 2 we give an estimation of the optimal regularization parameter \mu using the root mean square error (RMSE) averaged over 1000 time series with the length N = 600, which corresponds to 10 minutes of ECG recording. One can see that the optimal value is \mu \simeq N, for both the SPA and DDA methods. We should note also that in this case the RMSE is smaller for the DDA method.
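The stability bound \alpha \le 0.5 for the explicit finite difference iteration can be checked numerically. The sketch below (illustrative, not part of the paper's code) runs the interior update of Eq. (22) together with the boundary updates of Eqs. (27)-(28), once below and once above the bound:

```python
import numpy as np

def diffuse(r, alpha, steps):
    """Explicit diffusion iterations with the paper's boundary updates."""
    y = np.array(r, dtype=float)
    N = len(y)
    for _ in range(steps):
        # interior points: y_i += alpha * (y_{i-1} - 2 y_i + y_{i+1})
        y[1:N-1] += alpha * (y[0:N-2] - 2.0 * y[1:N-1] + y[2:N])
        # boundary points, Eqs. (27)-(28)
        y[0] += 2.0 * alpha * (y[1] - y[0])
        y[N-1] += 2.0 * alpha * (y[N-2] - y[N-1])
    return y

rng = np.random.default_rng(0)
r = rng.standard_normal(200)
smooth = diffuse(r, 0.25, 500)   # below the bound: the series is smoothed
blowup = diffuse(r, 0.6, 500)    # above the bound: the iteration diverges
```

For \alpha > 0.5 the highest-frequency mode is amplified by a factor of |1 - 4\alpha| per step, so the iteration grows without bound instead of smoothing.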
Figure 2: Optimal regularization parameter \mu as a function of N.

In order to evaluate the performance of the algorithms we used the execution time ratios \rho_1 = t_{SPA}/t_{DDA}, \rho_2 = t_{WSA}/t_{DDA}, and the RMSE as a function of the length N of the time series (Fig. 3). The results are averaged over 1000 time series for each N value. One can see that the ratio \rho_1(N) increases with N, showing that the DDA method is about 250 times faster than the SPA method for time series with length N = 3600, which corresponds to one hour ECG recordings. At the same time, the WSA method is about twice as fast as the DDA method, and this ratio does not depend on the length of the time series. The RMSE decreases with N, and the values for SPA, WSA and DDA are very close, with a slight advantage for DDA. The computations were performed on an Amazon EC2 instance with 64 GB RAM and 16 CPU cores at 3.4 GHz.
Figure 3: The execution time ratios \rho_1 = t_{SPA}/t_{DDA}, \rho_2 = t_{WSA}/t_{DDA}, and the RMSE as a function of the RR time series length N.

The computation with the SPA method quickly becomes prohibitive for very long time series; however, the DDA method has no problems scaling up, similarly to the WSA method. In Fig. 4 we show the execution time and the RMSE for time series with a length up to N = 86400, corresponding to 24 hours of ECG recording (the results are averaged over 1000 time series). One can see that the computation takes only 0.5 seconds for 24 hours long time series using the DDA method.
Figure 4: The execution time t_{DDA} and the RMSE as a function of the RR time series length, for the DDA method, up to 24 hours long time series.
Conclusion
In this paper we presented a novel approach to detrending non-stationary RR time series. The approach is based on the finite difference iterative solution of the diffusion equation, with the initial condition given by the time series values. We have shown numerically that the diffusion method is equivalent to the SPA and WSA methods, which are widely used in HRV analysis. The speed of the proposed DDA method is comparable to the WSA method, and it is several orders of magnitude faster than the SPA method, which makes it suitable for very long time series analysis. Besides the HRV examples discussed here, the described method can be used to accelerate any time series detrending, smoothing or fitting task.
Appendix
Here we provide 'detrending.py', which is a Python3 implementation of the described algorithms (requires NumPy, SciPy, and PyWavelets), with the following functions:

• def RR_timeseries(N, stdev)
Computes a synthetic RR time series with a given length N and standard deviation stdev (stdev=25 is recommended).

• def trend(N, k, a)
Computes a synthetic trend with a given length N and amplitude a. The smoothness is controlled by the parameter k (k=2 is recommended).

• def SPA_detrending(r, mu)
Computes the SPA solution using the Cholesky method, given the time series r and the regularization parameter mu (mu=N is recommended).

• def WSA_detrending(r, wavelet, mode, level)
Computes the WSA solution, given the time series r, the thresholding mode (soft or hard) and the decomposition level. The implementation is based on the PyWavelets module [16].

• def DDA_detrending(r, mu, alpha)
Computes the DDA solution given the time series r, the regularization parameter mu, and the diffusion constant alpha (mu=N, alpha ≤ 0.5 is recommended).

```python
import time
import numpy as np
import scipy.linalg as sl
import pywt

def RR_timeseries(N, stdev):
    # NOTE: the opening lines of this function were lost in the extracted
    # listing; the spectrum setup below is reconstructed from Eq. (29),
    # with the illustrative band parameters of Fig. 1.
    fLF, fHF, ss, alpha = 0.1, 0.25, 0.01, 0.5
    df = 0.5/N
    s = np.zeros(N+1)
    for n in range(N+1):
        s[n] = alpha*np.exp(-np.power((n*df-fLF)/ss, 2))
        s[n] += np.exp(-np.power((n*df-fHF)/ss, 2))
    s = s*(N/np.sum(s))
    r = np.random.rand(N+1)*np.pi*2.0
    r = np.sqrt(s)*(np.cos(r) + np.sin(r)*1.0j)
    r = np.concatenate((r, np.conjugate(np.flip(r[1:N]))))
    r = np.real(np.fft.ifft(r))
    r = (stdev/np.std(r))*r
    return r, s

def trend(N, k, a):
    y = np.random.randn(N)
    for i in range(k):
        y = y - np.mean(y)
        for n in range(0, N-1):
            y[n+1] += y[n]
    y = (y-np.min(y))/(np.max(y)-np.min(y))
    return a*(y-np.mean(y))

def SPA_detrending(r, mu):
    N = len(r)
    D = np.zeros((N-2, N))
    for n in range(N-2):
        D[n,n], D[n,n+1], D[n,n+2] = 1.0, -2.0, 1.0
    D = mu*np.dot(D.T, D)
    for n in range(len(D)):
        D[n,n] += 1.0
    L = sl.cholesky(D, lower=True)
    Y = sl.solve_triangular(L, r, trans='N', lower=True)
    y = sl.solve_triangular(L, Y, trans='T', lower=True)
    return y, r - y

def WSA_detrending(r, wavelet, mode, level):
    coeff = pywt.wavedec(r, wavelet)
    tau = np.std(coeff[-level])*np.sqrt(2*np.log(len(r)))
    # threshold only the detail coefficients, using the requested mode
    coeff[1:] = [pywt.threshold(i, value=tau, mode=mode) for i in coeff[1:]]
    y = pywt.waverec(coeff, wavelet)
    return y, r - y

def DDA_detrending(r, mu, alpha):
    N, s, y, t = len(r), np.inf, np.copy(r), 0
    while True:
        f, g = y - r, y[0:N-2] + y[2:N] - 2*y[1:N-1]
        ss = np.dot(f, f) + mu*np.dot(g, g)
        if ss > s or t > N:
            break
        else:
            s, t = ss, t + 1
            y[1:N-1] = y[1:N-1] + alpha*g
            y[0] = y[0] + 2*alpha*(y[1] - y[0])
            y[N-1] = y[N-1] + 2*alpha*(y[N-2] - y[N-1])
    return t, y, r - y

if __name__ == '__main__':
    np.random.seed(54321)
    N = 1200
    x, s = RR_timeseries(N, 25)
    y = trend(len(x), 2, 200)
    r = x + y
    start_time = time.time()
    y_spa, x_spa = SPA_detrending(r, N)
    end_time = time.time()
    print('SPA execution time=', end_time - start_time, '[s]')
    print('SPA RMS=', np.linalg.norm(y-y_spa)/np.sqrt(N))
    start_time = time.time()
    y_wsa, x_wsa = WSA_detrending(r, "db32", "soft", 3)
    end_time = time.time()
    print('WSA execution time=', end_time - start_time, '[s]')
    print('WSA RMS=', np.linalg.norm(y-y_wsa)/np.sqrt(N))
    start_time = time.time()
    t, y_dd, x_dd = DDA_detrending(r, N, 0.25)
    end_time = time.time()
    print('DDA execution time=', end_time - start_time, '[s]')
    print('DDA RMS=', np.linalg.norm(y-y_dd)/np.sqrt(N))
```

To run the program on a Linux computer use: python3 detrending.py
The output will look like this:
SPA execution time= 0.15083026885986328 [s]
SPA RMS= 0.8331844427187562
WSA execution time= 0.0009303092956542969 [s]
WSA RMS= 0.8856002765903443
DDA execution time= 0.002299070358276367 [s]
DDA RMS= 0.5105247820704635