Denoising scheme based on singular-value decomposition for one-dimensional spectra and its application in precision storage-ring mass spectrometry
DDenoising Scheme Based on Singular-Value Decomposition for One-Dimensional Spectra and ItsApplication in Precision Storage-Ring Mass Spectrometry
X.C. Chen, ∗ Yu. A. Litvinov,
1, 2, 3
M. Wang, Q. Wang,
1, 4 and Y.H. Zhang Institute of Modern Physics, Chinese Academy of Sciences, Lanzhou 730000, China GSI Helmholtzzentrum für Schwerionenforschung GmbH, 64291 Darmstadt, Germany Max-Planck-Institut für Kernphysik, 69117 Heidelberg, Germany University of Chinese Academy of Sciences, Beijing 100049, China (Dated: October 29, 2020)This work concerns noise reduction for one-dimensional spectra in the case that the signal is corrupted by anadditive white noise. The proposed method starts with mapping the noisy spectrum to a partial circulant matrix.In virtue of singular-value decomposition of the matrix, components belonging to the signal are determined byinspecting the total variations of left singular vectors. Afterwards, a smoothed spectrum is reconstructed fromthe low-rank approximation of the matrix consisting of the signal components only. The de-noising effect ofthe proposed method is shown to be highly competitive among other existing non-parametric methods includingmoving average, wavelet shrinkage, and total variation. Furthermore, its applicable scenarios in precisionstorage-ring mass spectrometry are demonstrated to be rather diverse and appealing.
I. INTRODUCTION
Noise is ubiquitous in the outcome of any physical experi-ment, owing to statistical fluctuations and various uncontrol-lable disturbances affecting the measuring system. It is su-perposed over the measured signal, and obscures the inferenceof the underlying ground truth. To reduce the noise effect,usually a batch of independent measurements under the samecondition are performed for averaging, since the noise vari-ance scales inversely with the averaging number. However,the accumulated statistics may be insufficient to bring downthe noise due to pragmatic constraints. Or even the measure-ment is just a one-shot instance. Under those circumstances,data de-noising in the later analysis can only be resorted to asa mitigation.It is quite common that measurement results are encodedin one-dimensional spectra, such as the energy spectrum ofa 𝛾 -radioactive isotope, the frequency response of a notchfilter, or the hourly temperature readings of a thermometer.Mathematically speaking, it is an ordered sequence of real-valued numbers, where usually the signal is smooth while thenoise overlays complex irregularities like kinks and wiggles.In this sense, smoothing can be used interchangeably withde-noising in the context of one-dimensional data analysis.Despite the voluminous data smoothing techniques, thereare in general two categories, namely parametric and non-parametric. A parametric method starts with a data modelbuilt from a priori knowledge on the signal and the noise.Such knowledge can be a theoretical comprehension of themeasuring system, or merely an educated guess. In contrast,a non-parametric method makes a minimal assumption aboutthe noise and then lets the data speak for themselves. Thus withless restrictions, non-parametric methods own more versatilityand applicability.As an example in the non-parametric category, moving av-erage probably is the most intuitive method [1]. The smoothed ∗ [email protected] sequence is formed by replacing every noisy datum with the(weighted) mean of data in its neighborhood. This is es-sentially equivalent to convolution smoothing [2], where thenoisy sequence is convolved with a smoothing kernel suchas binomial, Gaussian, or exponential [3]. In the same vein,the neighborhood for averaging can be extended to encom-pass data with similar values, which has inspired two othermethods: bilateral filter [4] and non-local means [5].When viewed from a different perspective, the convolu-tion smoothing is in fact low-pass filtering in the Fourier-transformed domain [6], where the smoothing kernel is theimpulse response of the filter. Normally the signal is band-limited whereas the noise covers the whole domain, hence alow-pass filter can effectively reject most of the noise. Basedon the same Fourier transform, another method named Fourierthresholding takes a different approach to data de-noising [7].It abandons sinusoidal components whose transform coeffi-cients are below a given threshold, as they more likely belongto the noise.Like Fourier transform, wavelet transform is another pow-erful tool in digital signal processing [8], which decomposesan input sequence into localized oscillatory components. TheFourier-based data de-noising methods can likewise be mi-grated to the wavelet-transformed domain. As a counterpartof the Fourier thresholding, wavelet shrinkage manipulateswavelet coefficients according to a specific criterion to smootha noisy sequence [9, 10].Lastly, it is worth noting a special mainstream method in thenon-parametric category: total variation regularization [11],whose underlying framework differs from the aforementioned.The idea is to minimize the total variation, which is the ℓ -normof its first derivative, of the smoothed sequence. Meanwhile, italso tries to preserve data fidelity as the regularizing condition.Since the ℓ - rather than ℓ -norm is adopted, this method isgood at retaining discontinuities in the noisy sequence [12], butalso suffers from a staircase effect in the smoothed one [13].On the contrary, curve fitting of a pre-defined signal modelby least-squares regression is a typical data de-nosing exam-ple in the parametric category. To apply the fitting, the noisy a r X i v : . [ phy s i c s . d a t a - a n ] O c t sequence can be treated as a whole or in partitions, wherethe latter gives rise to locally weighted regression [14] andSavitzky-Golay filter [15]. If additionally a noise model isdefined, data de-noising can be achieved in a more sophisti-cated manner. Examples are Wiener filter [16] and Kalmanfilter [17], which are both rooted in statistical inference andestimation.In the work presented here, we propose a non-parametricone-dimensional data de-noising method based on Singular-Value Decomposition (SVD). The SVD is a well-developedmatrix factorization technique [18], which can be used to re-alize the idea of principal component analysis [19]. Its ap-plications are prevalent in digital signal processing, especiallyin two-dimensional image-related works such as coding, wa-termarking, and de-noising [20–23]. With the aid of a bijec-tive map from a sequence to a matrix, the SVD also plays arole in one-dimensional data analysis, such as speech recogni-tion [24], fetal electrocardiogram extraction [25], mass spec-trometry [26], and fault diagnosis in mechanical systems [27–30]. In particular for the data de-noising, a noisy sequence isto be decomposed by SVD, where a smoothed sequence canbe reconstructed from a selected subset of the components. II. METHOD
Let us consider a noise-corrupted sequence 𝑥 of length 𝑛 ,which consists of the signal 𝑎 and the noise 𝑑 : 𝑥 𝑖 = 𝑎 𝑖 + 𝑑 𝑖 , ≤ 𝑖 < 𝑛. (1)Here, we have made no assumptions about the noise other thanbeing additive, zero-mean, and white.The proposed data de-noising method mainly includes threesteps: (1) Embed the sequence into a matrix. (2) Apply SVD tothe matrix and determine signal components. (3) Reconstructa smoothed sequence with the selected components. In thefollowing, each step will be elaborated. A. Matrix selection
In literature there are mainly three approaches to embedthe sequence 𝑥 into a matrix. Probably the simplest one isto evenly partition 𝑥 into segments, which are then stacked toform a matrix [25, 27, 31]. This approach is mostly suitablefor a quasi-periodic sequence, when each segment roughlycontains one or multiple periods. Another approach is tobuild a time-frequency representation of 𝑥 [32], which can becomputationally expensive. As a matter of fact, the frequentlyadopted approach is to construct a Hankel matrix [24, 26,28–30, 33, 34], which balances algorithmic performance andcomputational cost.Specifically, the constructed Hankel matrix 𝐻 of size 𝑚 ×( 𝑛 − 𝑚 + ) reads 𝐻 𝑖 𝑗 = 𝑥 𝑖 + 𝑗 , ≤ 𝑖 < 𝑚 ; 0 ≤ 𝑗 ≤ 𝑛 − 𝑚. (2)Apparently, the choice of 𝑚 will somehow influence thesmoothing effect. It is shown that 𝑚 should best be 𝑛 / ground truthHankel matrixpartial circulant matrix FIG. 1. Illustration of different de-noising effects with two matrix-embedding approaches. (a) The green line shows the de-noised resultwith a Hankel matrix, while the blue line corresponds to a partialcirculant matrix. The orange line is the ground truth. (b) Deviationsof the two results with respect to the ground truth. ( 𝑛 + )/
2, depending on the parity of 𝑛 [29]. That is to say, 𝐻 should optimally be (almost) square.However, a drawback of the mapping in Eq. (2), which isoften overlooked, is that the elements of 𝑥 are not put on anequal footing, i.e., the central elements tend to occur moretimes in 𝐻 than the lateral ones. Numerical experiments haveproved that by construction of 𝐻 the signal details close to thesequence ends can badly be recovered after de-noising.Fig. 1 illustrates such a phenomenon with a synthetic se-quence, which is the superposition of a sinc signal and a nor-mal noise. The sequence of length 𝑛 = 𝐻 with a row number 𝑚 = . 𝐻 .To overcome this limitation, we thus propose to construct apartial circulant matrix 𝑋 of size 𝑚 × 𝑛 with 𝑚 ≤ 𝑛 : 𝑋 𝑖 𝑗 = 𝑥 ( 𝑖 + 𝑗 ) mod 𝑛 , ≤ 𝑖 < 𝑚 ; 0 ≤ 𝑗 < 𝑛. (3)Every row of 𝑋 is a cyclic left-shift of the above one by anelement, such that every 𝑥 𝑖 occurs exactly 𝑚 times. By virtue of 𝑋 with the same row number 𝑚 = . 𝐻 . B. Low-rank approximation
For the matrix 𝑋 defined in Eq. (3), the Eckart-Young the-orem has asserted that it can be decomposed to (at most) 𝑚 rank-one matrices [35]: 𝑋 = 𝑚 − ∑︁ 𝑖 = 𝑠 𝑖 u 𝑖 v T 𝑖 , (4)where the singular values 𝑠 𝑖 are non-negative and ordered non-increasingly; the left singular vectors u 𝑖 ∈ R 𝑚 × are orthonor-mal; and similarly for the right singular vectors v 𝑖 ∈ R 𝑛 × .When expressed with matrix notation, Eq. (4) leads to thecanonical form of SVD: 𝑋 = 𝑈 Σ 𝑉 T , (5)where Σ is an 𝑚 × 𝑚 diagonal matrix composed of 𝑠 𝑖 as thenon-zero elements; 𝑈 is an 𝑚 × 𝑚 matrix composed of u 𝑖 asthe column vectors, while 𝑉 is an 𝑛 × 𝑚 matrix composed of v 𝑖 .The physical interpretation for 𝑠 is that each element de-scribes the amount of energy distributed in the correspondingcomponent. Specifically, the total energy of 𝑥 is proportionalto the squared Frobenius norm of 𝑋 : (cid:107) 𝑋 (cid:107) 𝐹 = 𝑚 − ∑︁ 𝑖 = 𝑛 − ∑︁ 𝑗 = 𝑋 𝑖 𝑗 = 𝑚 𝑛 − ∑︁ 𝑖 = 𝑥 𝑖 . (6)Meanwhile, by Eq. (5), (cid:107) 𝑋 (cid:107) 𝐹 can be written as (cid:107) 𝑋 (cid:107) 𝐹 = trace (cid:16) 𝑋 T 𝑋 (cid:17) = trace (cid:16) Σ (cid:17) = 𝑚 − ∑︁ 𝑖 = 𝑠 𝑖 . (7)Since 𝑠 is non-negative and non-increasing, by selecting thelargest 𝑟 singular values and setting the remaining 𝑠 𝑖 to zero,the resultant matrix 𝐴 is the best rank- 𝑟 approximation of 𝑋 in terms of minimizing the approximation error energy [35].This sets the theoretical basis for the data de-noising methodproposed here. In practice, most of the singular values areclose to zero [see Fig. 2(l-a) for illustration], which can beattributed to the noise. Therefore, 𝑋 can be well approximatedwith a few predominant signal components only.That said, one question remains to be answered: whichindex separates the signal components from the noise ones?Despite different solutions in literature, it is certainly impossi-ble to agree on a universal approach that fits in every realisticsituation and fulfills every practical requirement.Nevertheless, a majority of the methods choose to look intothe intrinsic structure of the singular values, or its variantslike singular entropy [29, 36], to search for the “elbow point”in Fig. 2(l-a). To name a few, one can inspect the first-orderdifference [29, 32], the first-order quotient [27, 30], or simplyselect the intersection of two extrapolated lines by eye [31]. Allthose methods depend on extra deciding parameters, which areusually specified ad hoc and vary from case to case. Conse-quently, it is rather difficult to implement them in an automated manner to de-noise a large batch of sequences. What is worse,the smoothed sequence is prone to subjective bias, and thusmay vary from person to person.A more advanced method seeks to approximate 𝑋 whenregularized by the nuclear norm of the low-rank approxima-tor [23, 37]. To select an appropriate regularization parameter,this method relies on various unbiased risk estimators accord-ing to the prior knowledge on the noise statistics. Other sophis-ticated methods based on neural network dictionary learningrequire considerable training data, and often involve many it-erations to solve non-linear minimization problems [25, 33].Apparently, those methods are computationally rather intense.Here, we propose to tackle this challenge by inspecting theleft singular vectors. As can be seen in Fig. 2(l-c), the firstthree belonging to the signal are quite smooth, whereas thenoise vectors are very wiggly. Moreover, there exists a drasticbehavioral change at the critical index, which separates thesignal from the noise. This observation is further confirmedby Figs. 2(c-c) and 2(r-c), where the signal and the noise areindividually inspected. To characterize such an impression, aquantity named Normalized Mean Total Variation (NMTV) isdefined. For a given sequence 𝑤 of length 𝑚 , the NMTV 𝜉 iscalculated as 𝜉 = (cid:205) 𝑚 − 𝑖 = | 𝑤 𝑖 − 𝑤 𝑖 + |( 𝑚 − )( 𝑤 max − 𝑤 min ) . (8)Obviously, 𝜉 is between zero and one, where zero means novariation and one corresponds to a zigzag pattern. A largerNMTV indicates that the sequence is more oscillatory. Hence,the NMTV is able to describe the smoothness of the sequence.Since the noise is usually wiggly, its SVD components can bediscriminated by thresholding on the NMTVs of u 𝑖 . Numericalexperiments have revealed that a universal threshold of 0 . C. Signal reconstruction
Having selected the first 𝑟 SVD components, the low-rankapproximation 𝐴 of 𝑋 is 𝐴 = 𝑟 − ∑︁ 𝑖 = 𝑠 𝑖 u 𝑖 v T 𝑖 . (9)In general, 𝐴 is no longer partially circulant. Consequently,cyclic anti-diagonal averaging is applied to reconstruct thesmoothed sequence ˆ 𝑎 from 𝐴 :ˆ 𝑎 𝑖 = mean 𝑗 + 𝑘 ≡ 𝑖 ( mod 𝑛 ) (cid:0) 𝐴 𝑗𝑘 (cid:1) , ≤ 𝑖 < 𝑛. (10) D. Practical considerations
So far, the choice of 𝑚 has not been addressed. It is nev-ertheless worth noting that when 𝑚 = 𝑛 , i.e., 𝑋 is completelycirculant, the method proposed here is equivalent to the Fourierthresholding [7]. Since any circulant matrix is diagonalizable s (l-b) signal + noise (c-b) signal (r-b) noise U (l-c) 0 1 2 3 4 5(c-c) 0 1 2 3 4 5(r-c)0 25 50 75 1000.00.5 (l-a) 0 25 50 75 1000.00.5 (c-a) 0 25 50 75 1000.00.5 (r-a) FIG. 2. Typical examples of SVD components in the case of noise-corrupted signal (left), sole signal (center), and sole noise (right). (a) Fullspectra of singular values in a non-increasing order, where the blue crosses belong to the signal and the green dots belong to the noise. (b)Zoomed spectra focusing on the largest six singular values. (c) The corresponding left singular vectors. by the unitary discrete Fourier transform matrix, its singularvalues obtained by SVD are in fact the modulus of the Fouriercoefficients of 𝑥 [38].However, to select a larger 𝑚 does not necessarily result ina better smoothing effect, as the signal energy is diluted byexcessive singular values. A relatively small 𝑚 is not muchhelpful either due to an insufficient number of elements forthe cyclic anti-diagonal averaging. Although in practice thechoice of 𝑚 may be subject to the specific purpose of thede-noising problem, one can generally decide on an optimal 𝑚 which minimizes the NMTV of the de-noised sequenceto attain the smoothest result. Numerical experiments havesuggested an empirical rule to achieve an optimal de-noisingeffect: 0 . 𝑛 < 𝑚 < . 𝑛 .Sometimes, it will be unfortunate to notice a ringing arti-fact in the smoothed data, in particular when the signal hasa significant gap between its two ends (Fig. 3). This shouldbe owing to the same reason as for the Gibbs phenomenonin Fourier transform [39]. As a cure for the artifact, a lineartrend, if at all, in the signal should be removed to close thegap before de-noising. Whether the de-trending is practicallynecessary is judged by a comparison between the signal gap Δ and the noise standard deviation 𝜎 .Unfortunately, neither of them are known, and hence theymust be estimated from 𝑥 . Despite the wiggles at both ends of 𝑥 , the respective signal end-values can be recovered somehowby averaging a few neighboring data, based on which Δ is esti-mated. Moreover, 𝜎 can be estimated from the noise singularvalues. By recalling Eqs. (1), (6), and (7), it can be writtenthat 𝑛 − ∑︁ 𝑖 = 𝑎 𝑖 + 𝑛 − ∑︁ 𝑖 = 𝑑 𝑖 + 𝑛 − ∑︁ 𝑖 = 𝑎 𝑖 𝑑 𝑖 = 𝑚 𝑟 − ∑︁ 𝑖 = 𝑠 𝑖 + 𝑚 𝑚 − ∑︁ 𝑖 = 𝑟 𝑠 𝑖 , (11)where, due to the cancellation among 𝑑 𝑖 , the summation of thecross terms on the left side contributes rather little. Therefore, with de-trendingwithout de-trending FIG. 3. Illustration of the ringing artifact in the smoothed data. Theblue line is obtained with linear de-trending, whereas the green linewithout. 𝜎 can be approximated as 𝜎 = √︄ (cid:205) 𝑛 − 𝑖 = 𝑑 𝑖 𝑛 ≈ √︄ (cid:205) 𝑚 − 𝑖 = 𝑟 𝑠 𝑖 𝑚𝑛 . (12)Should 𝜎 turn out to be less than Δ after de-noising, a lineartrend is significantly present and must be removed. However,it may happen sometimes that for the de-trended yet noisy se-quence, 𝜎 is still less than Δ , which suggests that the estimateof the latter by averaging is strongly biased by the signal it-self. Under such circumstances, it is advisable to reduce thenumber of elements for averaging and redo the de-trending.This adaptive de-trending process will surely terminate, in theworst case, when the averaging number becomes one.In summary, the algorithm of the proposed data de-noisingmethod is organized as follows.1. Begin with a noisy sequence 𝑥 and a row number 𝑚 .2. Construct a partial circulant matrix 𝑋 by Eq. (3), thenapply SVD to it.3. Calculate the NMTVs of the left singular vectors u 𝑖 by Eq. (8), then select the smallest 𝑟 that satisfiesNMTV ( u 𝑟 − ) < . ≤ NMTV ( u 𝑟 ) as the approxi-mation rank.4. Estimate the signal gap Δ by averaging a few end-data in 𝑥 , as well as the noise standard deviation 𝜎 by Eq. (12).5. If 𝜎 < Δ , linearly de-trend 𝑥 , then repeat steps 2–4;otherwise proceed.6. Build a low-rank approximation 𝐴 by Eq. (9).7. End with a smoothed sequence ˆ 𝑎 by Eq. (10). III. EXAMPLES
To illustrate the smoothing effect of the proposed method,four synthetic signals of the same length 𝑛 = ≤ 𝑖 < periodic signal: 𝑎 𝑖 = sin ( . 𝜋𝑖 )− sin ( . 𝜋𝑖 )+ cos [ sin ( . 𝜋𝑖 )] , (13)which contains exactly 2 periods. asymmetric peak: 𝑎 𝑖 = erfc ( − . 𝑖 ) · 𝑒 − . 𝑖 , (14)where erfc ( 𝑧 ) = − 𝜋 − / (cid:16)∫ 𝑧 − 𝑧 𝑒 − 𝑡 𝑑𝑡 (cid:17) is the comple-mentary error function. multiple peaks on a ramp: 𝑎 𝑖 = 𝑒 −( . 𝑖 − ) + (cid:2) + ( . 𝑖 − ) (cid:3) − + 𝑒 − . 𝑖 , (15)which contains a Gaussian peak and a Lorentzian peak. chaotic dynamics: It is well known that a Duffing oscillatorwill undergo chaos under certain conditions [40]. Here,we adopt the following parametrization of the Duffingequation: (cid:165) 𝑎 + . (cid:164) 𝑎 − 𝑎 ( − 𝑎 ) = . ( . 𝜋𝑡 ) , (16)where 𝑎 denotes the oscillation amplitude, and the dot-ted versions denote its derivatives with respect to time 𝑡 . With the initial value ( 𝑎, (cid:164) 𝑎 ) = ( , ) , the differen-tial equation is numerically solved by the Runge-Kuttafourth-order method in an interval of 0 ≤ 𝑡 <
50 with aresolution of 0 . 𝜌 ( dB ) =
10 log (cid:18) (cid:104) 𝑎 (cid:105) − (cid:104) 𝑎 (cid:105) (cid:104) 𝑑 (cid:105) − (cid:104) 𝑑 (cid:105) (cid:19) , (17)where (cid:104)·(cid:105) denotes arithmetic mean. It is clear that a smaller 𝜌 will lead to a more wiggly sequence.A fixed row number 𝑚 =
250 is selected for all the smooth-ing tests. The de-noising goodness is assessed by the Normal-ized Root-Mean-Square Deviation (NRMSD) of the smoothedsequence ˆ 𝑎 from the signal 𝑎 , which is defined as 𝛿 ( % ) = √︄ (cid:104)( ˆ 𝑎 − 𝑎 ) (cid:105)(cid:104) 𝑎 (cid:105) − (cid:104) 𝑎 (cid:105) . (18)A smaller 𝛿 indicates a greater power to reveal the groundtruth.The test results are shown in Figs. 4–7. It can be seen that theproposed method is well able to smooth the noisy sequences,regardless of the underlying signals, even when the SNR isdown to 0 dB. The achieved NRMSDs are quite similar forboth types of noise, although the normal noise usually resultsin a slightly larger NRMSD, which may be owing to its largerkurtosis. IV. APPLICATIONS
Owing to its non-parametric nature, the method presentedhere is promising to find its applications in vast data de-noisingscenarios. Furthermore, by allowing for a manual selection ofthe approximation rank, the proposed method will be a pow-erful data analysis technique not just for de-noising. In whatfollows, three data analysis challenges in precision storage-ring mass spectrometry are presented to showcase its diverseapplications in reality [41].
A. Pulse timing
In the isochronous mass spectrometry [42], a relativisticcharged particle periodically circulates in a vacuum environ-ment due to the confinement by magnetic field [43]. A time-of-flight detector consisting of an ultra thin carbon foil is placed inthe vacuum chamber to register every passage of the particle,which is signaled by a sharp voltage pulse [44]. Fig. 8 shows azoomed example of such a pulse. A precise determination ofthe pulse onset is critically important for a high-precision ( < S N R ( d B ) (a) uniform noise normal noise uniformnormal FIG. 4. Illustration of the smoothing effect on the noise-corrupted periodic signals with various SNRs. (a) The noise obeys a uniformdistribution. The orange line is the ground truth, while the blue line is the de-noised approximation. (b) Same as (a), except the noise isnormally distributed. (c) The resultant NRMSDs obtained in different cases. S N R ( d B ) (a) uniform noise normal noise uniformnormal FIG. 5. Same as Fig. 4, but for an asymmetric peak. S N R ( d B ) (a) uniform noise normal noise uniformnormal FIG. 6. Same as Fig. 4, but for a Gaussian peak (left) and a Lorentzian peak (right) on a ramp. S N R ( d B ) (a) uniform noise normal noise uniformnormal FIG. 7. Same as Fig. 4, but for the amplitude of a Duffing oscillator. −20−100 (a) this work (b) moving average v o l t a g e ( m V ) (c) wavelet shrinkage total variation FIG. 8. A zoomed pulse recorded by a time-of-flight detector, whichis de-noised by (a) the proposed method in this work, (b) movingaverage, (c) wavelet shrinkage, and (d) total variation. The crosseson the leading edge mark the 90% to 10% interval used to calculatethe fall time. See Sec. IV A for the detail.
With the same pulse of length 𝑛 =
401 in Fig. 8, we havecompared four de-noising methods of non-parametric type,namely the one proposed here, moving average, wavelet shrink-age, and total variation. The NMTVs of their de-noised data,which are shown in Table I, are controlled to be almost thesame to ensure comparable smoothness. Afterwards, the falltime is measured for each smoothed sequence, except for thetotal variation since the staircase effect unfeasibly allows for aclear definition of falling interval. In what follows, the techni-cal implementing details are presented.
TABLE I. Characteristics of de-noised data by various methods.Method NMTV ( ‰ ) Fall time (ns)this work 5.70 moving average 5.69 0.53wavelet shrinkage 5.71 0.51total variation 5.70 — For the proposed method, a row number 𝑚 =
105 is de-termined. As for the moving average, a Gaussian kernel 𝑔 oflength 45 and standard deviation 9 is employed: 𝑔 𝑖 ∝ 𝑒 − 𝑖 / , − ≤ 𝑖 ≤ . (19)When applying the wavelet shrinkage, the Symlets-4 isadopted to perform the wavelet decomposition. Afterwards,the wavelet coefficients are thresholded on 3 .
406 using thenon-negative garrote scheme [48, 49]. The wavelet transformis carried out in
Python with the
PyWavelets package [50].Lastly, the total variation is based on the algorithm for one-dimensional data in Ref. [51], and computed with the C codeavailable in Ref. [52]. The regularization parameter for totalvariation is selected to be 2 . B. Pre-whitening
In the Schottky mass spectrometry [54], a dedicated res-onator is employed to detect a revolving particle in the storagering by sensing its electromagnetic radiation [55]. Owing tothe periodic motion of the particle, the power spectral densityof its Schottky signal detected by the resonator peaks at each −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5intermediate frequency (MHz)05101520 p o w e r s p e c t r a l d e n s i t y ( a r b . u n i t ) referencefittingthis work FIG. 9. Background estimation of a Schottky spectrum with a sharppeak on top of it. The blue line is a direct fitting of a Lorentzianfunction, whereas the red line is obtained by the proposed method inthis work. The green line acts like a reference. harmonic of the particle’s revolution frequency [56]. Amongthose harmonics, the one in the resonant range of the resonatoris band-pass filtered and mixed down to the base band for dataacquisition [57].Fig. 9 shows an example of a peak harmonic on top of aresonant background. In certain cases, the integral peak areais of the experimental interest, from which the particle numbercan be inferred [56, 58]. This is a prior step for, e.g., themeasurement of the particle’s lifetime [59], or the calibrationof the particle’s position [60]. Therefore, a proper deductionof the background is of the experimental importance.Although the Lorentzian shape of the background is wellunderstood [61], it is improper to directly apply fitting to thespectrum as the peak will strongly deviate the result (see blueline in Fig. 9). In contrast, the method proposed here witha bit of adaptations can elegantly eliminate the peak interfer-ence. First recall that the Fourier coefficients of any noiseare zero-mean circular complex Gaussian [62]. Consequently,the power spectral density of the noise, which is the squaredmodulus of the coefficients, obeys a chi-squared distributionwith 2 degrees of freedom. Its probability density functionis in fact a negative exponent, which happens to be the samedistribution as a speckle noise obeys [63]. Since the specklenoise is usually modeled as a multiplicative noise [64], it isanalogous to conclude that the noise in the Schottky spectrumis also multiplicative.Therefore, a logarithmic transform should first be applied tothe spectrum so as to use the additive model in Eq. (1). Thetransformed spectrum of length 𝑛 = 𝑚 =
290 rows. This time onlythe first 2 components are taken to reconstruct the background.As can be seen in Fig. 9, the resultant approximation of redline is in excellent agreement with the reference of green line,which is obtained by fitting a “bald” spectrum when particlesare purged from the storage ring.The proposed method hence opens up a possibility to deductthe resonant background in situ . It will be no longer neces-sary to interrupt an ongoing experiment just for the referencemeasurements, which increases effective on-target beam time. − − p o w e r s p e c t r a l d e n s i t y ( a r b . u n i t s ) (a) −400 −200 0 200 400intermediate frequency (kHz)0.00.30.6 P W R ( d B / k H z ) (b) FIG. 10. Weak peak detection in a Schottky spectrum by theproposed method in this work. (a) The red line is the de-noised spec-trum, and the green line, which is obtained by averaging 100 similarspectra, provides a reference. All local maxima in the red spectrumare marked with blue crosses, which also respectively indicate theirprominences (vertical extent) and widths (horizontal extent). Thedetected peaks are labelled, in a descending order of PWR, as 1, 2,and 3. (b) The PWR of all eight peak candidates, among which thelargest three are considered as real peaks associated with particles.
This is in particular useful in case the yields of aimed particlesare extremely low.
C. Peak detection
Another common challenge in Schottky mass spectrometryis the detection of weak peaks. Fig. 10(a) shows an exampleof such a case. Apart from the predominant peak in the center,it is impossible to unambiguously detect other peaks in thenoisy spectrum. A normal solution is to continuously collecta number of spectra and average them to improve the SNR.For example, the green line in Fig. 10(a) is obtained from100-fold averaging, which can barely reveal two weaker peakson the left side. An obvious drawback of this approach is thedegradation of time resolution, which, in this case, is 100 timesworse. It is thus unfavorable for the time-resolved Schottkymass spectrometry [65], and in particular for the single-particledecay spectroscopy [66]. Even worse is that sometimes it isunfeasible to collect sufficient spectra, because, e.g., the aimedparticle may have already decayed [67].By using the proposed method, the noisy spectrum of length 𝑛 = 𝑚 =
430 rows. A smoothed spectrumis reconstructed from the first 7 SVD components, which isshown as the red line in Fig. 10(a). Peaks are searched amonglocal maxima in the smoothed spectrum as the candidates. Foreach candidate, its prominence and width are subsequentlymeasured, where the prominence is the relative height fromthe summit to the highest adjacent valley, and the width isthe horizontal span at half prominence [see blue crosses inFig. 10(a) for illustration]. Afterwards, the Prominence-to-Width Ratio (PWR), shown in Fig. 10(b), is calculated foreach candidate to characterize its significance. Based on thisparameter, in total three peaks are decisively identified withthe largest PWRs and labelled with 1, 2, and 3 in Fig. 10(a).The remaining candidates are only considered as bumps dueto their insignificant PWRs. It is clear in Fig. 10(a) that thedetected peaks perfectly align with those in the averaged spec-trum, and, moreover, with a 100-fold improvement of timeresolving power.Once the locations of weak peaks are coarsely determinedfrom the smoothed spectrum, they can subsequently be trans-ferred to a dedicated peak-finding algorithm as the requiredprior knowledge to refine the result [68]. What is more, theproposed method can be deployed to a real-time task to fasttrack particles in a dynamic process such as orbital-electroncapture [69] or bound-state 𝛽 − decay [70] on an event-by-eventbasis. V. CONCLUSION
An SVD-based one-dimensional data de-noising scheme hasbeen proposed. Owing to its non-parametric nature, it worksfor any additive noise, as well as multiplicative noise afterlogarithmic transform. By construction of a partial circu-lant matrix, the proposed method well balances computationalsimplicity and smoothing efficiency. The proposed NMTV criterion to select the optimal approximation rank allows foran integration into automated processes, which is a compellingfeature for certain applications in reality. With the real-worlddata, the proposed method has proved its strong competitive-ness among other non-parametric de-noising methods such asmoving average, wavelet shrinkage, and total variation. Fur-thermore, a few interesting applications of the method otherthan de-noising were also addressed, and its competence hasbeen demonstrated with cases in precision storage-ring massspectrometry. Lastly, a
Python implementation of the pro-posed method is provided and placed in the public domain [71].
ACKNOWLEDGMENTS
This work was partially supported by the CSC and DAADunder the Project Based Personnel Exchange Program (No.57389367), by the Key Research Program of Frontier Sci-ences of CAS (No. QYZDJ-SSW-S, No. XDPB09), by the Chi-nese National Key Program for Science and Technology R&D(No. 2016YFA0400504, No. 2018YFA0404400), and by theMajor State Basic Research Development Program of China(No. 2013CB834401). Y.A.L. acknowledges the support bythe CAS President’s International Fellowship Initiative (No.2016VMA043), by the European Research Council (ERC) un-der the Horizon 2020 Research and Innovation Program (No.682841 “ASTRUm”). Y.H.Z. acknowledges the support by theExtreMe Matter Institute (EMMI) at GSI. [1] R. J. Hyndman, Moving Averages, in
International Encyclope-dia of Statistical Science , edited by M. Lovric (Springer, Berlin,Heidelberg, 2011) pp. 866–869.[2] R. M. Clark, Non-Parametric Estimation of a Smooth Regres-sion Function, J. Royal Stat. Soc. B , 107 (1977).[3] S. W. Roberts, Control Chart Tests Based on Geometric MovingAverages, Technometrics , 239 (1959).[4] C. Tomasi and R. Manduchi, Bilateral filtering for gray and colorimages, in Proc. ICCV ’98 (Narosa Publishing House, Bombay,India, 1998) pp. 839–846.[5] A. Buades, B. Coll, and J. Morel, A non-local algorithm forimage denoising, in
Proc. CVPR ’05 , Vol. 2 (San Diego, CA,USA, 2005) pp. 60–65.[6] J. R. Ullmann, Picture analysis in character recognition, in
Dig-ital Picture Analysis , edited by A. Rosenfeld (Springer, Berlin,Heidelberg, 1976) pp. 295–343.[7] G. C. Huang, Noise Reduction by Adaptive Threshold in DigitalSignal Processing, in
Proc. ISEMC ’74 (San Francisco, CA,USA, 1974) pp. 1–7.[8] I. Daubechies,
Ten Lectures on Wavelets , CBMS-NSF RegionalConference Series in Applied Mathematics (Society for Indus-trial and Applied Mathematics, 1992).[9] D. L. Donoho and I. M. Johnstone, Ideal spatial adaptation bywavelet shrinkage, Biometrika , 425 (1994).[10] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard,Wavelet Shrinkage: Asymptopia?, J. Royal Stat. Soc. B , 301(1995). [11] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variationbased noise removal algorithms, Physica D , 259 (1992).[12] D. Strong and T. Chan, Edge-preserving and scale-dependentproperties of total variation regularization, Inverse Probl. ,S165 (2003).[13] V. Caselles, A. Chambolle, and M. Novaga, The Discontinu-ity Set of Solutions of the TV Denoising Problem and SomeExtensions, Multiscale Model. Simul. , 879 (2007).[14] W. S. Cleveland, Robust Locally Weighted Regression andSmoothing Scatterplots, J. Am. Stat. Assoc. , 829 (1979).[15] A. Savitzky and M. J. E. Golay, Smoothing and Differentiationof Data by Simplified Least Squares Procedures, Anal. Chem. , 1627 (1964).[16] N. Wiener, Extrapolation, interpolation, and smoothing of sta-tionary time series: with engineering applications. (MIT Press,Cambridge, 1949).[17] R. E. Kalman, A New Approach to Linear Filtering and Predic-tion Problems, J. Basic Eng. , 35 (1960).[18] G. Stewart, On the Early History of the Singular Value Decom-position, SIAM Rev. , 551 (1993).[19] K. F. Pearson, LIII. On lines and planes of closest fit to systemsof points in space, Philos. Mag. , 559 (1901).[20] H. Andrews and C. Patterson, Singular Value Decomposition(SVD) Image Coding, IEEE Trans. Commun. , 425 (1976).[21] C.-C. Chang, P. Tsai, and C.-C. Lin, SVD-based digital imagewatermarking scheme, Pattern Recognit. Lett. , 1577 (2005).[22] Q. Guo, C. Zhang, Y. Zhang, and H. Liu, An Efficient SVD-Based Method for Image Denoising, IEEE Trans. Circuits Syst. Video Technol. , 868 (2016).[23] T. Furnival, R. K. Leary, and P. A. Midgley, Denoising time-resolved microscopy image sequences with singular valuethresholding, Ultramicroscopy , 112 (2017).[24] K. Hermus, I. Dologlou, P. Wambacq, and D. V. Comper-nolle, Fully adaptive SVD-based noise removal for robust speechrecognition, in Proc. EUROSPEECH ’99 (Budapest, Hungary,1999) pp. 1951–1954.[25] A. Al-Zaben and A. Al-Smadi, Extraction of foetal ECG bycombination of singular value decomposition and neuro-fuzzyinference system, Phys. Med. Biol. , 137 (2006).[26] L. Chiron, M. A. van Agthoven, B. Kieffer, C. Rolando, and M.-A. Delsuc, Efficient denoising algorithms for large experimentaldatasets and their applications in Fourier transform ion cyclotronresonance mass spectrometry, Proc. Natl. Acad. Sci. , 1385(2014).[27] F. Cong, J. Chen, G. Dong, and F. Zhao, Short-time matrixseries based singular value decomposition for rolling bearingfault diagnosis, Mech. Syst. Signal Process. , 218 (2013).[28] M. Zhao and X. Jia, A novel strategy for signal denoising usingreweighted SVD and its applications to weak fault feature en-hancement of rotating machinery, Mech. Syst. Signal Process. , 129 (2017).[29] X. Zhao and B. Ye, Selection of effective singular values usingdifference spectrum and its application to fault diagnosis ofheadstock, Mech. Syst. Signal Process. , 1617 (2011).[30] H. Jiang, J. Chen, G. Dong, T. Liu, and G. Chen, Study onHankel matrix-based SVD and its application in rolling elementbearing fault diagnosis, Mech. Syst. Signal Process. , 338(2015).[31] T. Schanze, Compression and Noise Reduction of BiomedicalSignals by Singular Value Decomposition, IFAC-PapersOnLine , 361 (2018).[32] H. Hassanpour, A time-frequency approach for noise reduction,Digit. Signal Process. , 728 (2008).[33] W. Gong, H. Li, and D. Zhao, An Improved Denoising ModelBased on the Analysis K-SVD Algorithm, Circuits Syst. SignalProcess. , 4006 (2017).[34] K. Shin, J. Hammond, and P. White, Iterative SVD method fornoise reduction of low-dimensional chaotic time series, Mech.Syst. Signal Process. , 115 (1999).[35] C. Eckart and G. Young, The approximation of one matrix byanother of lower rank, Psychometrika , 211 (1936).[36] W.-X. Yang and P. W. Tse, Development of an advanced noisereduction method for vibration analysis based on singular valuedecomposition, NDT E Int. , 419 (2003).[37] E. J. Candès and B. Recht, Exact Matrix Completion via ConvexOptimization, Found. Comput. Math. , 717 (2009).[38] H. Karner, J. Schneid, and C. W. Ueberhuber, Spectral decom-position of real circulant matrices, Linear Algebra Appl. ,301 (2003).[39] J. W. Gibbs, Fourier’s Series, Nature , 200 (1898).[40] G. Duffing, Erzwungene Schwingungen bei veränderlicherEigenfrequenz und ihre technische Bedeutung , SammlungVieweg No. 41–42 (F. Vieweg & sohn, Braunschweig, 1918).[41] B. Franzke, H. Geissel, and G. Münzenberg, Mass and lifetimemeasurements of exotic nuclei in storage rings, Mass Spectrom.Rev. , 428 (2008).[42] M. Hausmann et al. , First isochronous mass spectrometry at theexperimental storage ring ESR, Nucl. Instrum. Meth. A ,569 (2000).[43] H. S. Xu, Y. H. Zhang, and Y. A. Litvinov, Accurate massmeasurements of exotic nuclei with the CSRe in Lanzhou, Int.J. Mass Spectrom. , 162 (2013). [44] B. Mei et al. , A high performance time-of-flight detector ap-plied to isochronous mass measurement at CSRe, Nucl. Instrum.Meth. A , 109 (2010).[45] X. L. Tu et al. , Direct mass measurements of short-lived 𝐴 = 𝑍 − Ge, As, Se, and Kr and their impact onnucleosynthesis in the 𝑟 𝑝 process, Phys. Rev. Lett. , 112501(2011).[46] Y. H. Zhang et al. , Mass measurements of the neutron-deficient Ti, Cr, Fe, and Ni nuclides: First test of the isobaricmultiplet mass equation in 𝑓 𝑝 -shell nuclei, Phys. Rev. Lett. ,102501 (2012).[47] X. Xu et al. , Identification of the Lowest 𝑇 = , 𝐽 𝜋 = + IsobaricAnalog State in Co and Its Impact on the Understanding of 𝛽 -Decay Properties of Ni, Phys. Rev. Lett. , 182503 (2016).[48] L. Breiman, Better Subset Regression Using the NonnegativeGarrote, Technometrics , 373 (1995).[49] H.-Y. Gao, Wavelet Shrinkage Denoising Using the Non-Negative Garrote, J. Comput. Graph. Stat. , 469 (1998).[50] G. Lee, R. Gommers, F. Wasilewski, K. Wohlfahrt, A. O’Leary,H. Nahrstaedt, and Contributors, PyWavelets—Wavelet Trans-forms in Python, GitHub (2006).[51] L. Condat, A Direct Algorithm for 1-D Total Variation Denois-ing, IEEE Signal Process. Lett. , 1054 (2013).[52] L. Condat, Total variation denoising of 1-D signals—Version2.0 (2017).[53] W. Zhang et al. , A timing detector with pulsed high-voltagepower supply for mass measurements at CSRe, Nucl. Instrum.Meth. A , 38 (2014).[54] Y. A. Litvinov et al. , Mass measurement of cooled neutron-deficient bismuth projectile fragments with time-resolved Schot-tky mass spectrometry at the FRS-ESR facility, Nucl. Phys. A , 3 (2005).[55] F. Nolden et al. , A fast and sensitive resonant Schottky pick-upfor heavy ion storage rings, Nucl. Instrum. Meth. A , 69(2011).[56] S. Chattopadhyay, Some fundamental aspects of fluctuations andcoherence in charged-particle beams in storage rings, AIP Conf.Proc. , 467 (1985).[57] C. Trageser et al. , A new data acquisition system for Schot-tky signals in atomic physics experiments at GSI’s and FAIR’sstorage rings, Phys. Scr. , 014062 (2015).[58] Y. A. Litvinov and F. Bosch, Beta decay of highly charged ions,Rep. Prog. Phys. , 016301 (2011).[59] X. L. Tu et al. , First application of combined isochronous andSchottky mass spectrometry: Half-lives of fully ionized Cr + and Fe + atoms, Phys. Rev. C , 014321 (2018).[60] X. Chen et al. , Accuracy improvement in the isochronous massmeasurement using a cavity doublet, Hyperfine Interact. , 51(2015).[61] X. Chen, Non-interceptive position detection for short-livedradioactive nuclei in heavy-ion storage rings , Ph.D. thesis,Ruprecht-Karls-Universität Heidelberg (2015).[62] J. Schoukens and J. Renneboog, Modeling the noise influenceon the Fourier coefficients after a discrete Fourier transform,IEEE Trans. Instrum. Meas.
IM-35 , 278 (1986).[63] J. W. Goodman, Statistical Properties of Laser Speckle Patterns,in
Laser Speckle and Related Phenomena , edited by J. C. Dainty(Springer, Berlin, Heidelberg, 1975) pp. 9–75.[64] H. H. Arsenault and M. Denis, Image processing in signal-dependent noise, Can. J. Phys. , 309 (1983).[65] Y. A. Litvinov et al. , Precision experiments with time-resolvedSchottky mass spectrometry, Nucl. Phys. A , 473 (2004).[66] P. Kienle et al. , High-resolution measurement of the time-modulated orbital electron capture and of the 𝛽 + decay of hydrogen-like Pm + ions, Phys. Lett. B , 638 (2013).[67] P. M. Walker, Y. A. Litvinov, and H. Geissel, The ILIMA projectat FAIR, Int. J. Mass Spectrom. , 247 (2013).[68] X. Chen, Band-limited peak-finding method for a noisy fre-quency spectrum, in Proc. STORI ’17 (Kanazawa, Japan, 2017). [69] Y. A. Litvinov et al. , Measurement of the 𝛽 + and orbital electron-capture decay rates in fully ionized, hydrogenlike, and helium-like Pr ions, Phys. Rev. Lett. , 262501 (2007).[70] F. Bosch et al. , Observation of bound-state 𝛽 − decay of fully ion-ized Re:
Re-
Os cosmochronometry, Phys. Rev. Lett.77