[PDF] A Fast Algorithm for Calculation of Thêo1

Abstract

Thêo1 is a frequency stability statistic which is similar to the Allan variance but can provide stability estimates at longer averaging factors and with higher confidence. However, the calculation of Thêo1 is significantly slower than the Allan variance, particularly for large data sets, due to a worse computational complexity. A faster algorithm for calculating the `all- τ ' version of Thêo1 is developed by identifying certain repeated sums and removing them with a recurrence relation. The new algorithm has a reduced computational complexity, equal to that of the Allan variance. Computation time is reduced by orders of magnitude for many datasets. The new, faster algorithm does introduce an error due to accumulated floating point errors in very large datasets. The error can be compensated for by increasing the numerical precision used at critical steps. The new algorithm can also be used to increase the speed of ThêoBr and ThêoH which are more sophisticated statistics derived from Thêo1.

Full PDF

IIEEE TRANSACTIONS ON ULTRASONICS, FERROELECTRICS AND FREQUENCY CONTROL 1

A Fast Algorithm forCalculation of Thło1

Ben Lewis ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, includingreprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, orreuse of any copyrighted component of this work in other works. DOI: 0.1109/TUFFC.2020.2996313

Abstract —Thło1 is a frequency stability statistic which issimilar to the Allan variance but can provide stability estimates atlonger averaging factors and with higher conﬁdence. However,the calculation of Thło1 is signiﬁcantly slower than the Allanvariance, particularly for large data sets, due to a worse compu-tational complexity. A faster algorithm for calculating the ‘all- τ ’version of Thło1 is developed by identifying certain repeatedsums and removing them with a recurrence relation. The newalgorithm has a reduced computational complexity, equal to thatof the Allan variance. Computation time is reduced by orders ofmagnitude for many datasets. The new, faster algorithm doesintroduce an error due to accumulated ﬂoating point errorsin very large datasets. The error can be compensated for byincreasing the numerical precision used at critical steps. Thenew algorithm can also be used to increase the speed of ThłoBrand ThłoH which are more sophisticated statistics derived fromThło1. Index Terms —Noise, Stability analysis, Frequency control,Software, Theo1, TheoH.

I. I

NTRODUCTION W HEN characterising a frequency source, it is necessaryto determine the amount and types of noise which de-termine the frequency stability on different timescales. Thereare many statistics used for this purpose, one of which isthe ‘theoretical variance O ( N ) for a dataset of N measurements. Similar statistics with O ( N ) complexityhave been reduced to O ( N ) complexity by the use of anappropriate algorithm [5].For a series of N time deviation points x i , each separatedby an interval τ , Thło1 can be deﬁned [1] asThło1 ( τ = 1 . kτ , N ) = T k N − k )( kτ ) (1)where < k ≤ ( N − / , the averaging time is τ and T k = N − k − (cid:88) i =0 k − (cid:88) δ =0 k − δ ) [( x i − x i − δ + k ) + ( x i +2 k − x i + δ + k )] (2) Department of Physics, University of Strathclyde, UK, email:[email protected] received 19 May 2020

A naive implementation of this deﬁnition of Thło1 will havea complexity of O ( N ) because there are ≈ N/ values of k for which to calculate T k , each taking O ( N ) operationsdue to the nested sum in (2). This can make computationprohibitively expensive for extremely large datasets or inapplications requiring low latency, such as measuring thedynamic stability of an oscillator with a high data rate [6].In some cases it is not necessary to calculate Thło1 for everyvalue of k , sometimes called an ‘all- τ ’ calculation, and it maybe sufﬁcient to only use k equal to powers of two. However,the more sophisticated statistics ThłoBr and ThłoH [7], whichattempt to correct for bias in Thło1 require the calculation ofThło1 for all k as a ﬁrst step. There is a technique called ‘fastTheoBr’ [8] which increases the speed of this calculation byaveraging points within the initial dataset to reduce its size.However, for a ﬁxed amount of averaging, the speed increaseis only a constant factor and does not change the O ( N ) complexity. II. A LGORITHM

One way to produce a faster algorithm for Thło1 is toﬁnd a recurrence relation between parts of the outer sum,which allows calculation of one term from the next withoutperforming the full inner sum. This is made difﬁcult by theterm / ( k − δ ) which forces a different coefﬁcient beforeeach terms as δ is incremented in the inner sum. However,the deﬁnition of T k can be rearranged to move this awkwardterm outside the inner sum by swapping the order of the sumsand using the substitution v = k − δ , so that T k = k (cid:88) v =1 v A k,v (3)where A k,v is deﬁned by A k,v = N − k − (cid:88) i =0 ( x i − x i + v + x i +2 k − x i +2 k − v ) (4) = N − k − (cid:88) i =0 ( x i + x i + v + x i +2 k + x i +2 k − v + 2 x i x i +2 k + 2 x i + v x i +2 k − v − x i x i + v − x i x i +2 k − v − x i + v x i +2 k − x i +2 k x i +2 k − v ) . (5)Some of the expanded terms in (5) have similar forms, andcan be expressed in terms of new summations C ( n ) , deﬁnedas a r X i v : . [ phy s i c s . d a t a - a n ] M a y EEE TRANSACTIONS ON ULTRASONICS, FERROELECTRICS AND FREQUENCY CONTROL 2 C (1) j = j (cid:88) i =0 x i (6) C (2) j = N − j − (cid:88) i =0 x i x i + j (7) C (3) k,j = N − k − (cid:88) i = k x i − j x i + j (8) C (4) k,j = N − k − (cid:88) i =0 x i x i + j + x i +2 k x i +2 k − j . (9)It can then be shown by substitution that A k,v = C (1) N − k − + ( C (1) N − k − v − C (1) v − )+ ( C (1) N − − C (1)2 k − ) + ( C (1) N − v − − C (1)2 k − v − )+ 2( C (2)2 k + C (3) k,k − v − C (4) k,v − C (4) k, k − v ) . (10)The calculation of T k from the C ( n ) can be completed in O ( N ) , so if the C ( n ) could all be calculated in O ( N ) thenthis would reduce the overall complexity of Thło1 to O ( N ) .For C (1 , the deﬁnition is already ≤ O ( N ) , but it can alsobe achieved for C (3 , by using a recurrence relation betweenconsecutive terms to avoid the full sum in (8) and (9): C (3) k,j = C (3) k − ,j − x k − − j x k − j − x N − k − j x N − k + j , j < k (11) C (4) k,j = C (4) k − ,j − x k − − j x k − − x k − − j x k − − x N − k x N − k + j − x N − k +1 x N − k +1 − j , j < k − (12)This allows almost all values of C (3 , to be calculated in arecursive manner, the remaining values are C (3) k,k = C (2)2 k (13) C (4) k, k − = 2 C (2)2 k − − x x k − − x N − k x N − (14) C (4) k, k = 2 C (2)2 k (15)and so C (3 , can be calculated in the required O ( N ) .Because the recurrence relations are for an incremented k value, the technique can only be used when calculating T k for all values of k .In order to calculate T k it is sufﬁcient to calculate C (1) j , ≤ j ≤ N (16) C (2) j , j = 2 k, k − (17) C (3) k,j , ≤ j ≤ k (18) C (4) k,j , ≤ j ≤ k (19)so only these values need to be held in memory. The recursionrelations (11) and (12) can then be used to update C (3 , k,j to N -5 T i m e / s Fig. 1. The time taken by different methods of calculating Thło1. Orangecircles show the naive method. Green ’x’ marks show the new algorithm usinga standard double precision ﬂoating point datatype. Purple ‘+’ marks showthe new algorithm using the int-128 datatype. C (3 , k +1 ,j in place, allowing calculation of T k +1 . The memoryrequirement is only O ( N ) . Speciﬁcally, it requires memory for N double precision values: N values for each of the inputarray x and C (1 , , and N/ values for each of C (3) and theoutput array. The naive algorithm requires storage of N/ values so this is a signiﬁcant increase but is still only 32 MBfor a dataset with N = 10 .In order to calculate Thło1, the algorithm can proceed asfollows:1) Calculate C (1) using (6).2) For each value of k from to (cid:98) ( N − / (cid:99) : a) Calculate required values of C (2) using (7).b) Add new values to C (3 , using (13) to (15).c) Update C (3 , using (11) and (12).d) Calculate A k,v from the C ( n ) using (10).e) Calculate T k from A k,v using (3).An example implementation of the algorithm in C++ canbe found in appendix A.III. A CCURACY

Whilst the new algorithm for Thło1 is faster than the naiveapproach, it has more opportunities for ﬂoating point errors toaccumulate. Equation (10) shows that terms of similar mag-nitude are subtracted from each other, allowing catastrophiccancellation to occur and leading to a loss of precision. Thesize of each term in (10) is ≤ (cid:80) x and the size of the totalis ∼ T k , so the fractional error might be expected to scaleas ∼ (cid:104) x (cid:105) /T k . Thło1 is insensitive to any offset or linearchange in x , so these components can be removed in orderto reduce (cid:104) x (cid:105) without ill-effect. This prevents a signiﬁcantdrop in precision that could be caused by a constant frequencyor phase offset This change alone is sufﬁcient to preventappreciable errors in most practical situations. However, insome cases with large datasets and where the long-term clockstability is dominated by frequency drift the errors could growlarge enough to be signiﬁcant. EEE TRANSACTIONS ON ULTRASONICS, FERROELECTRICS AND FREQUENCY CONTROL 3 N -15 -10 -5 F r a c t i ona l e rr o r Fig. 2. The maximum fractional error introduced by different methods ofcalculating Thło1, measured against the naive method. Green ‘x’ marks (bluediamonds) indicate the standard double precision method with (without) anylinear component removed. Purple ‘+’ marks used the int-128 datatype forincreased accuracy. Results depend strongly on the type of noise simulated,for this test white frequency noise with a linear frequency drift was used.

In order to fully mitigate the ﬂoating point error it isnecessary to use additional bits of precision. Whilst 128-bitﬂoating point types are available in some environments, theyare not generally supported in hardware and are therefore veryslow to use. In contrast, many 64-bit CPUs have hardwaresupport for multiplying two 64-bit integers into a 128-bitinteger. By re-scaling the data and converting it to a 64-bitinteger representation, this CPU instruction can be used tocalculate Thło1 quickly without ﬂoating point errors. Terms ∝ x are stored as 64-bit integers, but terms ∝ x are storedas 128-bit integers. In C or C++ the GCC int128 datatypemay be used with multiplications implemented asi n t 6 4 t x1 , x2 ;i n t 1 2 8 r e s u l t = ( i n t 1 2 8 ) x1 * x2 ;The use of a larger data types does cause a speed reductionof approximately 70%, possibly due to additional memoryoverhead. The conversion between datatypes is O ( N ) andtakes negligible time in most cases.The speed and fractional ﬂoating point error for the differentmethods are shown in Figs. 1 and 2 respectively. The fractionalerror was measured by comparing the value of Thło1 ascalculated with the new algorithm to that calculated with thenaive algorithm. Due to the slow speed of the naive algorithm,only points spaced at powers of two were compared, and themaximum of these errors was taken. Fig. 2 should be takenas indicative only as the details vary signiﬁcantly dependingon the noise type of the simulated data, although the ‘int-128’method had negligible error in all cases tested. To exaggeratethe errors seen, a white frequency noise with added linearfrequency drift was simulated. The linear drift was chosensuch that the frequency stability at the longest and shortestaveraging factors was approximately equal. A linear frequencydrift is particularly difﬁcult for the simpler error reductionmethod (removing any linear component to x ) to deal with, as the dominant x component is quadratic. Despite this, 1 to2 orders of magnitude improvement was seen.IV. C ONCLUSION

Manipulating the deﬁnition of Thło1 has led to an algorithmthat calculates the ‘all- τ ’ version with a reduced computationalcomplexity of O ( N ) . Although the new algorithm initiallylead to loss of precision, this is reduced by removing anylinear component in the dataset. In the cases where the erroris still signiﬁcant, it is made negligible by using an int-128datatype, at the cost of a ≈ slowdown.This algorithm makes the speed of Thło1 (and ThłoBror ThłoH) calculations similar to other time-domain stabilitystatistics such as the Allan variance. The new algorithm canmore easily be used in low-latency applications such as char-acterising oscillators using a high sample rate, and commercialtesting of oscillators and other sensors. It also makes theuse of dynamic Thło statistics easier, with opportunities forcalculating statistics over multiple timescales simultaneously.Further work could incorporate this algorithm into a dynamicThło algorithm and be used to identify changes to clockstability in real-time.All data and code supporting this publication are openlyavailable from the University of Strathclyde Knowledge-Base at https://doi.org/10.15129/4403d30e-4257-4a4a-817f-f459e7465011. A PPENDIX AE XAMPLE IMPLEMENTATION / *========================================* C a l c u l a t e s Theo1 o f a g i v e n d a t a s e t .* The d a t a s e t i s assumed t o be p h a s e* d a t a , t a k e n a t u n i t t i m e i n c r e m e n t s .* Any l i n e a r c o m p o n e n t t o t h e d a t a s e t i s* removed a s a f i r s t s t e p . S h o u l d be* c o m p i l e d w i t h − O3 − f f a s t − math .** X = i n p u t d a t a s e t* N = number o f e l e m e n t s i n X* T = Theo1 o u t p u t** Ben Lewis , U n i v e r s i t y o f S t r a t h c l y d e*======================================* // * Remove any l i n e a r p a r t o f t h e d a t a s e t * / v o i d r e m o v e L i n e a r ( d o u b l e * X, i n t N) { d o u b l e m i d p o i n t = ( ( d o u b l e ) (N − l o n g d o u b l e sum1 = 0 ; l o n g d o u b l e sum2 = 0 ; f o r ( i n t i = 0 ; i < N; i ++) { sum1 += X[ i ] ;sum2 += X[ i ] * ( i − m i d p o i n t ) ; } d o u b l e a = sum1 /N; d o u b l e b = sum2 /N * 1 2 / ( ( d o u b l e )N*N − EEE TRANSACTIONS ON ULTRASONICS, FERROELECTRICS AND FREQUENCY CONTROL 4 f o r ( i n t i = 0 ; i < N; i ++) { X[ i ] − = a + b * ( i − m i d p o i n t ) ; }} / * The c o m p u t a t i o n a l r o u t i n e * / v o i d Theo1 ( d o u b l e *X, d o u b l e *T , i n t N) { / / I n i t i a l i s e a r r a y s i n t k max = (N − d o u b l e * C1 = new d o u b l e [N ] ; d o u b l e * C3 = new d o u b l e [ k max + 1 ] ; d o u b l e * C4 = new d o u b l e [ k max * 2 ] ; / / P r e p r o c e s s by r e m o v i n g l i n e a r p a r t r e m o v e L i n e a r (X, N ) ; / / C a l c u l a t e C1 d o u b l e s = 0 ; f o r ( i n t i = 0 ; i < N; i ++) { s += (X[ i ] *X[ i ] ) ;C1 [ i ] = s ; } / / Main l o o p C3 [ 0 ] = C1 [N − f o r ( i n t k = 1 ; k < =k max ; k ++) { / / C a l c u l a t e C2 v a l u e s d o u b l e C2 2k = 0 ; d o u b l e

C2 2k 1 = 0 ; f o r ( i n t j = 0 ; j < = N − −

1; j ++) { C2 2k += (X[ j ] *X[ j +2* k ] ) ;C2 2k 1 += (X[ j ] *X[ j +2*k − } C2 2k 1 += (X[N − − / / Update C3 , C4 i n p l a c e f o r ( i n t v = 0 ; v < k ; v ++) { C3 [ v ] − = (X[ k − − v ] *X[ k − − k+v ] *X[N − k − v ] ) ; } f o r ( i n t v = 1 ; v < =2*k −

2; v ++) { C4 [ v − − = (X[ 2 * k − − v ] *X[ 2 * k − − − v ] *X[ 2 * k − − − − − } C3 [ k ] = C2 2k ;C4 [ 2 * k −

2] = 2* C2 2k 1 − (X[ 0 ] *X[ 2 * k − − (X[N − − −

1] = 2* C2 2k ; / / C a l c u l a t e un − n o r m a l i s e d T k f r o m C1 − C4 d o u b l e T k = 0 ; d o u b l e

A0 = C1 [N − − C1 [ 2 * k − − −

1] + 2* C2 2k ; f o r ( i n t v = 1 ; v < =k ; v ++) { d o u b l e A1 = A0 − C1 [ v − − − v ] − C1 [ 2 * k − v − − − d o u b l e A2 = C3 [ k − v ] − C4 [ v − − C4 [ 2 * k − v − } / / A p p l y n o r m a l i s a t i o n t o g e t Theo1 T [ k −

1] = ( T k / ( 3 * ( d o u b l e ) ( N − } / / R e l e a s e memory d e l e t e [ ] C1 ; d e l e t e [ ] C3 ; d e l e t e [ ] C4 ; } A CKNOWLEDGMENT

I would like to thank David Howe and Magnus Danielsonfor their discussions regarding this research.R

EFERENCES[1] D. A. Howe and T. K. Peppler, “Very long-term frequency stability: esti-mation using a special-purpose statistic,” in

IEEE International FrequencyControl Symposium and PDA Exhibition Jointly with the 17th EuropeanFrequency and Time Forum, 2003. Proceedings of the 2003 , May 2003,pp. 233–238.[2] J. A. Mcgee and D. A. Howe, “TheoH and allan deviation as power-lawnoise estimators,”

IEEE Transactions on Ultrasonics, Ferroelectrics, andFrequency Control , vol. 54, no. 2, pp. 448–452, 2007.[3] P. D. D. Schwindt, Y. Jau, H. L. Partner, D. K. Serkland, A. Ison,A. McCants, E. Winrow, J. Prestage, J. Kellogg, N. Yu, C. D. Boschen,I. Kosvin, D. Mailloux, D. Scherer, C. Nelson, A. Hati, and D. A. Howe,“Miniature trapped-ion frequency standard with 171yb+,” in , 2015, pp. 752–757.[4] T. E. Parker, S. R. Jefferts, T. P. Heavner, and E. A. Donley, “Operation ofthe NIST-F1 caesium fountain primary frequency standard with a maserensemble, including the impact of frequency transfer noise,”

Metrologia ,vol. 42, no. 5, pp. 423–430, sep 2005.[5] Mingfu Li, Hsin-Min Peng, and Chia-Shu Liao, “Fast computation oftime deviation and modiﬁed allan deviation for telecommunications clockstability characterization,” in

Proceedings International Symposium onParallel Architectures, Algorithms and Networks. I-SPAN 2000 , 2000, pp.156–160.[6] D. A. Howe, N. Ashby, D. Lirette, A. Hati, and C. Nelson, “Limited live-time measurements of frequency spectra,” in , May 2011, pp. 1–4, iSSN: 1075-6787.[7] D. A. Howe, J. McGee-Taylor, and T. Tassett, “ThłoH Bias-removalMethod,” in , Jun. 2006, pp. 788–792, iSSN: 2327-1914.[8] J. A. Taylor and D. A. Howe, “Fast TheoBR: A method for long dataset stability analysis,”

IEEE Transactions on Ultrasonics, Ferroelectrics,and Frequency Control , vol. 57, no. 9, pp. 2091–2094, Sep. 2010.