A Novel Approach for Ridge Detection and Mode Retrieval of Multicomponent Signals Based on STFT
11 A Novel Approach for Ridge Detection andMode Retrieval of Multicomponent SignalsBased on STFT
Nils Laurent, Sylvain Meignen
Abstract
Time-frequency analysis is often used to study non stationary multicomponent signals, which canbe viewed as the surperimposition of modes, associated with ridges in the TF plane. To understandsuch signals, it is essential to identify their constituent modes. This is often done by performing ridgedetection in the time-frequency plane which is then followed by mode retrieval. Unfortunately, existingridge detectors are often not enough robust to noise therefore hampering mode retrieval. In this paper,we therefore develop a novel approach to ridge detection and mode retrieval based on the analysis ofthe short-time Fourier transform of multicomponent signals in the presence of noise, which will proveto be much more robust than state-of-the-art methods based on the same time-frequency representation.
Index Terms
AM/FM multicomponent signals, Short-time Fourier transform, Ridge detection, Mode retrieval.
I. I
NTRODUCTION
Many signals such as audio signals (music, speech, bird songs) [1], medical data (electrocardiogram,thoracic and abdominal movement signals), can be modeled as a superimposition of amplitude- andfrequency-modulated (AM-FM) modes [2], [3], and are therefore called multicomponent signals (MCSs).Time-frequency analysis is often used to deal with such signals [4]–[6], since their modes are associatedwith curves in the time-frequency (TF) plane, called ridges . To extract these ridges is essential whenone is interested in separating a MCS into its constituent modes [7]. For that purpose, several TF-basedtechniques were developed [8] [9] using the idea that the ridges correspond to local maxima along thefrequency axis of the modulus of some time-frequency representation (TFR). Indeed, it was shown in[10], [11] that these maxima approximate the instantaneous frequency (IF) of the modes, the quality ofapproximation depending on the noise level and on the length of the analysis window. This concept has
September 29, 2020 DRAFT a r X i v : . [ ee ss . SP ] S e p then been used on various types of TFRs such as for instance the short-time Fourier transform (STFT)[12], [13], or synchrosqueezed transforms either based on the continuous wavelet transform [14] or STFT[15]–[17]. TF ridges are also used in demodulation approaches still developed for the purpose of moderetrieval (MR) [18], [19]. An alternative TF-based MR technique, not explicitly using ridge detection,consists of finding locally the linear chirp that fits the best the STFT of the signal [20].The main limitation of the above techniques is that at high noise level, the local maxima along thefrequency axis that define the ridges in the noiseless case may no longer exist. In this regard, the analysisof these maxima proposed in [10] assumes a low noise level. In case of high noise level, a study wascarried out on the TFR associated with the Wigner-Ville distribution (WVD) in [21], basically remarkingthat peak searching based techniques are not able to cope with multicomponent or monocomponentsignals contaminated by strong noise. The key ideas of the algorithm proposed in [21] were first that,if the WVD maximum at the considered time instant is not at the IF point, there is a high probabilitythat the IF is at a point having one of the largest WVD values. The second argument was based on theassumption that the IF variations between two consecutive points is not extremely large.Following [21], it seems to be possible to follow the ridges by considering local maxima and thenby chaining them using some proximity criterion in the TF plane. However, as we will see, when oneconsiders STFT as TFR, the noise can generate zeros in the vicinity of true IF location splitting the ridgeinto two chains of local maxima. It is therefore illusory to try and build the ridges by simply chaininglocal maxima in the TF plane. The second argument used in [21] is however very interesting in that even,at a high noise level, the local maxima associated with the signal components are close in the TF plane.Though there exist many different techniques to extract the ridges in the TF plane, mainly usingoptimization procedures as in [9], [22], they rely on an initial so-called skeleton of the transform whichis usually not available in heavy noise situations. Indeed, to reconstruct the modes, these approaches usemodified versions of least-square minimizations in which the data terms are ridge points, supposed tobe reliable enough otherwise the algorithms fail. However, to assign a TF point to a specific ridge isprobably one of the most complicated aspect of RD and should be handled with care. To perform thiskind of initial estimation of ridge points, one needs to carefully analyze the behavior of the coefficientsof the TFR in the vicinity of local maxima. In the present paper, to accurately assign a TF point to aridge, we will make the assumption that the modes are not crossing, our goal being to improve RD inthe TF plane in very noisy situations in a fully adaptive way. It is however possible to deal with crossingmodes by imposing regularity constraints on the extracted modes in the TF plane [8], [9], [15], or byanalyzing the signal in the time domain using a parametric approach [23], but this is not the scope ofthe present paper.
September 29, 2020 DRAFT
Indeed, we aim to propose a fully adaptive RD that is very competitive in noisy situations. For thatpurpose, we will first analyze carefully the energy in the TF plane in the vicinity of local maxima. Inparticular, we will see that some of these local maxima can be gathered into so-called relevant ridgeportions (RRPs) that will be the basis to our new approach. As already mentioned in the context of WVD[21], the effect of noise on TF ridges is to split them into ridge portions that need to be identified andthen gathered together when they correspond to the same mode. This is in essence the goal we pursuein the present paper. The paper is organized as follows: in Section II, we recall basic notations regardingSTFT and MCSs as well the most commonly used TF-based RD technique. Since the latter depends onseveral arbitrary parameters, we recall in Section III, how it can be made more adaptive by using a localchirp rate estimate, as recently proposed in [13]. Then, taking into account that, in noisy situations, amode cannot be associated with a single ridge computed from chaining local maxima in the TF plane,we alternatively consider that it corresponds to a set of so-called relevant ridge portions (RRPs), whichare defined in Section IV. This helps us define a new RD technique in Section V. Section VI is thendevoted to the comparison of the proposed new technique to state-of-the-art TF-based methods for RDand MR, highlighting the improvement brought by the former especially in the presence of heavy noise.An application to the analysis of gravitational-wave concludes the paper.II. D
EFINITIONS AND N OTATIONS
A. Short-Time Fourier Transform
Let ˜ f be a discrete signal of length L altered by an additive noise ε , and such that ˜ f [ n ] = ˜ f ( nL ) : ˜ f := f + ε, (1)and g a discrete real window supported on [ − ML , ML ] . In that context, we define the short time Fouriertransform (STFT) as follows: V g ˜ f [ m, k ] := N − (cid:88) n =0 ˜ f [ n + m − M ] g [ n − M ] e − iπ kN ( n − M ) , (2)with M + 1 ≤ N , where N is the number of frequency bins, the index k corresponding to the frequency k LN . The STFT of the signal ˜ f is invertible, provided g [0] (cid:54) = 0 , since one has: ˜ f [ n ] = 1 g [0] N N − (cid:88) k =0 V g ˜ f [ n, k ] . (3) September 29, 2020 DRAFT
B. Multicomponent Signal Definition
In this paper, we will intensively study MCSs defined as a superimposition of AM-FM components ormodes: f [ n ] = P (cid:88) p =1 f p [ n ] with f p [ n ] = A p [ n ] e i πφ p [ n ] , (4)for some finite P ∈ N , A p [ n ] and φ (cid:48) p [ n ] being respectively the instantaneous amplitude (IA) and frequency(IF) of f p satisfying: A p [ n ] > , φ (cid:48) p [ n ] > and φ (cid:48) p +1 [ n ] > φ (cid:48) p [ n ] for each time index n .We also assume that A p is differentiable with | A (cid:48) p [ n ] | small compared with φ (cid:48) p [ n ] , that the modes areseparated with resolution ∆ and their modulations are bounded by B f , i.e. for each time index n , ∀ ≤ p ≤ P − , φ (cid:48) p +1 [ n ] − φ (cid:48) p [ n ] > ∀ ≤ p ≤ P, | φ (cid:48)(cid:48) p [ n ] | ≤ B f . (5) C. Classical Ridge Detection
A commonly used RD approach was originally proposed in [8], and then used in [14]. The goal is tocompute an integer estimate ϕ p [ n ] of φ (cid:48) p [ n ] N/L by extracting, on the spectrogram, a ridge correspondingto mode p . This is actually done by computing max ϕ (cid:88) ≤ p ≤ P ≤ n ≤ L − | V gf [ n, ϕ p [ n ]] | − α ( ∆ ϕ p [ n ] L N ) − β ( ∆ ϕ p [ n ] L N ) , (6)with ϕ = ( ϕ p ) p =1 , ··· ,P where ϕ p : { , · · · , L − } (cid:55)→ { , · · · , N − } , α and β both positive, and inwhich ∆ ϕ p [ n ] L N = ( ϕ p [ n +1] − ϕ p [ n ]) L N and ∆ ϕ p [ n ] L N = ( ϕ p [ n +1] − ϕ p [ n ]+ ϕ p [ n − L N , are approximations of φ (cid:48)(cid:48) p [ n ] and φ (cid:48)(cid:48)(cid:48) p [ n ] .Taking into account that regularization terms associated with α and β are not relevant when a ridgeis associated with a local maximum of the STFT magnitude and that to consider such penalizationparameters in noisy situations leads to inaccurate IF estimation [19], one can alternatively put a boundon the frequency modulation allowed while extracting the ridges, and then replace (6) by a peelingalgorithm where a first mode is extracted as follows [13]: max ϕ L − (cid:88) n =0 | V gf [ n, ϕ [ n ]] | , s.t. | ∆ ϕ [ n ] | L N ≤ B f . (7)Then, after ϕ is computed, one defines V gf, := V gf , and RD continues replacing V gf, by: V gf, [ n, k ] := , if ϕ [ n ] − (cid:100) ∆ NL (cid:101) ≤ k ≤ ϕ [ n ] + (cid:100) ∆ NL (cid:101) V gf, [ n, k ] , otherwise, (8) September 29, 2020 DRAFT in which (cid:100) X (cid:101) is the smallest integer larger than X . This then enables the computation of ϕ replacing V gf by V gf, in (7), and then the definition of V gf, from V gf, the same way as V gf, from V gf, . Such aprocedure is iterated until P ridges ( ϕ p ) p =1 , ··· ,P are extracted.In practice, to compute a candidate for ϕ , one first fixes an initial time index n , computes k := argmax ≤ k ≤ N − | V gf [ n , k ] | , (9)and puts ϕ [ n ] := k . Then, to define ϕ on { n + 1 , · · · , L − } , one uses the following recurringprinciple starting from n = n : ϕ [ n + 1] := argmax k (cid:26) | V gf [ n + 1 , k ] | , ϕ [ n ] − (cid:100) N B f L (cid:101) ≤ k ≤ ϕ [ n ] + (cid:100) N B f L (cid:101) (cid:27) . (10)The same principle is applied on { , · · · , n − } , again starting from n = n but replacing n + 1 by n − in (10). Finally, other initialization time indices than n are considered to define other candidatesfor ϕ , the ridge finally kept being the one among all candidates, maximizing the energy in the TF plane,i.e. (cid:80) n | V gf [ n, ϕ [ n ]] | . This RD will be called Simple Ridge Detection (S-RD) in the sequel.There are however two strong limitations to S-RD. The first one is that, in a noisy context, each modecannot be associated at each time instant with a local maximum of | V g ˜ f | along the frequency axis. As aresult, S-RD enforces the extraction of P ridges even when these are not associated with modes. This isillustrated in Fig. 1, in which we first display, in Fig. 1 (a), the magnitude of the STFT of a linear chirpalong with the three largest local maximum along the frequency axis at each time instant: we see thatthese maxima are not chained in the TF plane. To have a clearer view of the effect of noise on the ridgewe display in Fig. 1 (b), a zoom on a region containing the expected ridge location and correspondingto the situation where a global maximum along the frequency axis is split into two local maxima. Insuch a case, it transpires that the noise generates a zero of the STFT close to the location of the IFs ofthe modes, surrounded by two local maxima. This is the reason why any technique defining the ridge byfollowing the local maxima along the frequency axis will result in poor IF estimation. Such a paradigm ishowever the basis to RD technique like S-RD, and one of the goal of the present paper is to circumventthis limitation. Note that this figure also illustrates the dependency of RD techniques such as S-RD onthe initialization time n . The second important drawback of S-RD is that the upper bound B f for themodulation is fixed a priori: S-RD does not take into account the value of the modulation and even notits sign. To deal with this issue is also one of the aim of the present paper. In this regard, we first recallin the following section how to introduce some kind of adaptivity in RD by removing the dependenceof S-RD on upper frequency bound B f , as proposed in [13]. September 29, 2020 DRAFT (a) (b)
Fig. 1. (a): Magnitude of the STFT of a noisy linear chirp along with its maxima along the frequency axis; (b): Zoom on theeffect of noise on local maxima; Cyan coefficients are among the two largest coefficients along the frequency axis, red onescorrespond to the third largest along the frequency axis.
III. F
ULLY A DAPTIVE R IDGE D ETECTION
To circumvent the lack of adaptivity in S-RD, a novel approach called modulation based ridge detection (MB-RD) was proposed in [13]. In a nutshell, this approach considers the complex approximation of themodulation of the mode used in the definition of the second order synchrosqueezing transform and definedas [17]: ˜ q f [ n, k ] = 12 iπ V g (cid:48)(cid:48) f [ n, k ] V gf [ n, k ] − ( V g (cid:48) f [ n, k ]) V tgf [ n, k ] V g (cid:48) f [ n, k ] − V tg (cid:48) f [ n, k ] V gf [ n, k ] , (11)in which V g (cid:48) f , V tgf , V g (cid:48)(cid:48) f , V tg (cid:48) f are respectively STFTs of f computed with windows n (cid:55)→ g (cid:48) [ n ] , ( tg )[ n ] , g (cid:48)(cid:48) [ n ] and ( tg (cid:48) )[ n ] . In that context, ˆ q f [ n, k ] = (cid:60) { ˜ q f [ n, k ] } consists of an estimate of the modulation of theclosest mode to [ n, k ] in the TF plane. To extract the first ridge MB-RD uses the same recurring principleas S-RD introduced in Section II-C, but, instead of defining ϕ with B f as upper bound for the modulation,MB-RD uses ˆ q f and a constant C > , meaning (10) is replaced by: ϕ [ n + 1] := argmax k (cid:26) | V gf [ n + 1 , k ] | , ϕ [ n ] + (cid:100) ˆ q f [ n, ϕ [ n ]] NL (cid:99) − C ≤ k ≤ ϕ [ n ] + (cid:100) ˆ q f [ n, ϕ [ n ]] NL (cid:99) + C (cid:27) , (12)in which (cid:100) X (cid:99) is the closest integer to X , the user-defined constant C compensating for potential estimationerrors. The motivation for (12) lies in the fact that ˆ q f is a first order estimate of φ (cid:48)(cid:48) corresponding to thelocal orientation of the ridge.Though MB-RD improves S-RD in that it is more adaptive, both techniques are based on the assumptionthat a mode generates a significant local maximum along the frequency axis at each time instant, which September 29, 2020 DRAFT is not the case in heavy noise situations. Another limitation is that since the ridges are extracted oneafter the other using a peeling algorithm [13], if RD fails for one mode then the detection process willalso most probably fail for the next ones. To deal with this issue, we propose, in the following section,to introduce the concept of relevant ridge portions (RRPs) that will be subsequently used to define ournew RD technique. It will consist in computing all the ridges simultaneously, thus getting rid of thetraditional peeling algorithm.IV. D
EFINITION OF R ELEVANT R IDGE P ORTIONS
Let us consider, for the sake of simplicity that g is the Gaussian window g [ n ] = e − π n σ L , for whichit can be shown that if f is a linear chirp with constant amplitude A , i.e. f [ n ] = Ae iπφ [ n ] with φ (cid:48)(cid:48) aconstant function, one has [17]: | V gf [ n, k ] | ≈ ALσ (1 + σ φ (cid:48)(cid:48) [ n ] ) − e − π σ k LN − φ (cid:48) [ n ])21+ σ φ (cid:48)(cid:48) [ n ]2 , (13)whose standard deviation is: √ πσ (cid:112) σ φ (cid:48)(cid:48) [ n ] ≈ std LC [ n, ϕ [ n ]] := 1 √ πσ (cid:113) σ ˆ q f [ n, ϕ [ n ]] . (14)We then define an interval corresponding to this standard deviation around ϕ [ n ] , the global maximum of | V gf | along the frequency axis at time index n , namely: I LC [ n, ϕ [ n ]] = (cid:20) (cid:98) ϕ [ n ] − std LC [ n, ϕ [ n ]] NL (cid:99) , (cid:100) ϕ [ n ] + std LC [ n, ϕ [ n ]] NL (cid:101) (cid:21) . (15)For a MCS defined as in (4), at each time instant n , | V gf | admits P local maxima along the frequencyaxis, and for each of them, one can define an interval I LC by considering that each mode can be locallyapproximated by a linear chirp. Now, if some noise is added to f to obtain ˜ f , some other local maximanot associated with a mode arise in the TF plane. As the magnitude of | V g ˜ f | at a local maximum is usuallylarger when it corresponds to a mode rather than to noise, we strengthen these maxima by first computingat each local maximum [ n, k ] , the interval I LC [ n, k ] , and then by defining the auxiliary variable: S LC [ n, k ] = (cid:88) k ∈ I LC [ n,k ] | V g ˜ f [ n, k ] | . (16)Using the variable S LC , we then construct P ridge portions starting at time index n , by considering ( ϕ p [ n ]) p =1 , ··· ,P such that ( S LC [ n , ϕ p [ n ]]) p =1 , ··· ,P are the P largest local maxima at time index n .From each of these points, we define ridge portions using a variant of (12), by introducing first ψ p [ n +1] := ϕ p [ n ] + (cid:100) ˆ q ˜ f [ n , ϕ p [ n ]] NL (cid:99) , and then ϕ p [ n + 1] := argmin k {| k − ψ p [ n + 1] | , s.t. S LC [ n + 1 , k ] is a local maximum } . (17) September 29, 2020 DRAFT
The differences with (12) is that we use S LC instead of V g ˜ f and that, for mode p , we look for the closestmaximum to ψ p [ n + 1] along the frequency axis of S LC , thus avoiding the use of an extra parameter C . Then [ n + 1 , ϕ p [ n + 1]] belongs to the ridge starting at [ n , ϕ p [ n ]] only if S LC [ n + 1 , ϕ p [ n + 1]] is among the P + 1 largest maxima of S LC at time index n + 1 , otherwise the ridge construction isstopped. To consider the P + 1 largest maxima is motivated by the fact that when a local maximumcorresponding to a ridge is destroyed by the noise it often gives rise to two maxima as illustrated inFig. 1 (b), therefore the P . largest maxima not to miss any information. Then, we add one to take intoaccount the fact that locally some maxima related to noise may be larger than some others related tosignal.The procedure is then iterated forward and backward (from n ) until the construction of each of the P ridge portions is stopped. Thus, with such a formulation, each initialization point n leads to P ridgeportions, with a priori different lengths. It is important to mention that, contrary to MB-RD and S-RD,we do not construct the ridges one by one using a so-called peeling algorithm as in [13], [14], [17], [20],but alternatively construct P ridge portions at a time, and do not impose that the ridge portions startingat time n last for the whole time span.Now, investigating the stability of these ridge portions with respect to the initialization point n , weconsider other initialization points around n , and compute the associated P ridge portions using the justdescribed procedure. If a ridge portion computed at time index n is present in the set of the P ridgeportions for s successive initialization time indices including n , the latter consists of a relevant ridgeportion (RRP) at scale s . V. RD B ASED ON
RRP S In this section, we are going to build a new RD technique based on RRPs. For that purpose, we firstintroduce a procedure to gather the RRPs associated with the same mode, and then explain how to buildRD from these gathered RRPs.
A. Gathering RRPs Based on TF Location
Our motivation to gather RRPs is that though heavy noise splits the ridge associated with a mode inmany different RRPs, these remain close to each other in the TF plane, following [21] in that matter.Our goal is thus to associate to a mode a set of RRPs by defining the notion of intersection for
RRP s .For that purpose, let us denote ( R j ) j ∈ J the set of RRPs. For each R i , we denote [ n , k ] and [ n , k ] September 29, 2020 DRAFT the beginning and the end of R i , then define I P H [ k ] = (cid:104) (cid:98) k − NL √ πσ (cid:99) , (cid:100) k + NL √ πσ (cid:101) (cid:105) ( P H standing for pure harmonic ). Using these notations, we define the neighborhood of R i , as follows: N R i = { [ n, l ] , n ∈ [ n − ∆ t, n ] , l ∈ I P H [ k ] } (cid:91) { [ n, l ] , [ n, k ] ∈ R i and l ∈ I P H [ k ] } (cid:91) { [ n, l ] , n ∈ [ n , n + ∆ t ] , l ∈ I P H [ k ] } . (18)At each location on a RRP, the frequency neighborhood consists of the interval corresponding to 3 timesthe standard deviation computed assuming no modulation (because to take into account the modulationin that context may lead to instabilities), and at each end of an RRP we extend this neighborhood usingthe time parameter ∆ t , which corresponds to the maximal time distance, measured in number of timebins, between successive RRPs associated with the same mode.In that context, we say that R i and R j are intersecting if N R i (cid:84) N R j (cid:54) = ∅ . All the intersecting RRPsare then gathered together to obtain a set ( M q ) q ∈ Q , that is each M q is the union of some RRPs, andwe then define the energy of M q as: R ( M q ) = (cid:88) [ m,l ] ∈M q S LC [ m, l ] . (19)All the points in M q are then given the energy R ( M q ) . Finally, we only keep in M q only one TF pointper time index: if there are several TF points in M q associated with one time index we only keep in M q the one associated with the most energetic RRP (the energy of R i being defined as R ( R i ) ). B. Definition of New RD Technique
For the sake of simplicity let us now assume that the set M = ( M q ) q ∈ Q is ranked according todecreasing energies. Starting from q = 0 , we consider the first index q such that P elements in ( M q ) q =0 , ··· ,q contains some points associated with the same time index. Let us denote ( M p ) p =1 , ··· ,P , thecorresponding elements of M , ranked according to increasing frequencies. To obtain a first approximationof the p th ridge, we compute the following polynomial approximation: D p = argmin D (cid:88) [ n,k ] ∈M p | k − D ( nL ) | R ( M p ) , p = 1 , · · · , P, (20)where D is some polynomial of degree d . We then associate with each polynomial D p a TF regiondefined as: T F D p = (cid:110) [ n, k ] , ≤ n ≤ L − , k ∈ I P H ( D p ( nL )) , (cid:111) (21)in which I P H ( D p ( nL )) corresponds to the definition given in the previous section except we use roundbrackets instead of square brackets since D p ( nL ) is not located on the frequency grid. September 29, 2020 DRAFT0
If the TF regions ( T F D p ) p =1 , ··· ,P intersect some elements in M other than ( M p ) p =1 , ··· ,P thenthe corresponding RRPs are added, with the corresponding energy, in minimization (20), and updatedapproximation polynomials are computed. Such a procedure is iterated until no new elements in M areintersected by the updated ( T F D p ) p =1 , ··· ,P ( for the sake of simplicity, we still denote by ( D p ) p =1 , ··· ,P the set of polynomials obtained after these iterations). Finally, we define the energy corresponding tothese polynomial approximations as follows: E := P (cid:88) p =1 L − (cid:88) n =0 S LC ( n, D nL )) , (22)in which the round brackets are used to mean D p ( nL ) is not necessarily on the frequency grid. The set ( D p ) p =1 , ··· ,P consists of a first approximation for the P ridges. Then, we consider the next index q larger than q such that there exists a subset ( M p ) p =1 , ··· ,P of ( M q ) q =0 , ··· ,q having one time index incommon and different from ( M p ) p =1 , ··· ,P . Solving the updated optimization problem for each p : D p = argmin D (cid:0) R ( M p ) (cid:1) (cid:88) [ n,k ] ∈M p | k − D ( nL ) | + (cid:0) R ( M p ) (cid:1) (cid:88) [ n,k ] ∈M p | k − D ( nL ) | , (23)and then iterating as explained before, we get an updated set of approximation polynomials which we call ( D p ) p =1 , ··· ,P , to which we associate the energy E following (22). If E > E , then the approximationpolynomials become the set ( D p ) p =1 , ··· ,P , otherwise one keeps the original set ( D p ) p =1 , ··· ,P . Such aprocedure is iterated until all the elements in M have been considered, and we end up with a set ofpolynomials called ( D finp ) p =1 , ··· ,P . We finally rerun this procedure starting with ( D finp ) p =1 , ··· ,P as initialpolynomials values and replacing I P H ( D ( nL )) by ˜ I LC ( n, D finp ( nL )) , defined by: ˜ I LC ( n, D finp ( nL )) = (cid:20) (cid:98) D finp ( nL )) − std LC ( n, D finp ( nL )) NL (cid:99) , (cid:100) D finp ( nL )) + 3 std LC ( n, D finp ( nL )) NL (cid:101) (cid:21) , (24)with std LC ( n, D finp ( nL )) := √ πσ (cid:113) σ ( D finp ) (cid:48) ( nL ) . This post-processing step enables to take intoaccount some ridge portions that were wrongly left apart. Note that if the final set of polynomials obtainedat the end the procedure is such that the polynomials are intersecting, we consider instead the set of nonintersecting polynomials obtained with the just described procedure and associated with the largest energy.For the sake of simplicity, we still denote by ( D finp ) p =1 , ··· ,P this final set of approximation polynomials.In the sequel, we denote [ F − p [ n ] , F + p [ n ]] := ˜ I LC ( n, D finp ( nL )) , and we omit p in the case of amonocomponent signal. Furthermore, the set of RRPs used in the construction of ( D finp ) p =1 , ··· ,P aredenoted by ( M finp ) p =1 , ··· ,P . From now on, the proposed RD method is called RRP-RD. Note that forparticular applications, in particular when the phases of the modes contain fast oscillations, polynomialapproximation may be replaced by spline approximation, as illustrated in Section VI-C on a gravitationalwave signal. September 29, 2020 DRAFT1 (a) (b) (c) (d) (e) (f)
Fig. 2. (a): STFT of a noisy linear chirp (SNR = -10 dB); (b): STFT of a noisy mode with cosine phase (SNR = -10 dB); (c):STFT of a noisy mode with exponential phase (SNR = -10 dB); (d): ridge D fin detected for the signal displayed in (a), withthe set of RRPs M fin used in the computation of the ridge D fin , along with F − [ n ] and F + [ n ] for each n ; (e): same as (d)but for the signal whose STFT is displayed in (b); (f): same as (d) but for the signal whose STFT is displayed in (c) We display in Fig. 2 an illustration of the proposed RD on a noisy linear chirp, a noisy mode withcosine phase, and a noisy mode with exponential phase (displayed on the first row of that figure). In eachcase, the noise is a complex Gaussian noise and the noise level corresponds to an input SNR of − dB(the input SNR is defined as SN R ( f, ˜ f ) = 20 log (cid:16) (cid:107) f (cid:107) (cid:107) ˜ f − f (cid:107) (cid:17) ). On the second row of Fig. 2, we display D fin along with the RRPs used for its construction, namely M fin , and also the interval [ F − [ n ] , F + [ n ]] for each n . These first illustrations show that the proposed procedure seems to be efficient in very noisysituations ; this will be further quantified in Section VI. We also illustrate the procedure on the twomode signals of Fig. 3, which consist either of two linear chirps, two modes with cosine phase, and alast signal made of a linear chirp plus an exponential chirp. The input SNR is still fixed to − dB. Weagain notice that RRP-RD seems to be well adapted to deal with MCS in the presence of heavy noise,regardless of the modulation of the modes. September 29, 2020 DRAFT2 (a) (b) (c) (d) (e) (f)
Fig. 3. (a): STFT of two noisy linear chirps (SNR = -10 dB); (b): STFT of two noisy modes with cosine phases, with differentmodulation (SNR = -10 dB); (c): STFT of a signal made of a linear chirp and a mode with exponential phase (SNR = -10dB); (d): ridges ( D finp ) p =1 , detected for the signal displayed in (a), along with ( M finp ) p =1 , . We also display the interval [ F − p [ n ] , F + p [ n ]] , for each n and p ; (e): same as (d) but for the signal whose STFT is displayed in (b); (f): same as (d) but forthe signal whose STFT is displayed in (c). C. Mode Reconstruction
Having defined RRP-RD, we explain how we proceed with mode reconstruction. A simple strategyconsists of summing the coefficients in [ F − p [ n ] , F + p [ n ]] for each n , namely f p [ n ] ≈ g [0] N (cid:88) k ∈ [ F − p [ n ] ,F − p [ n ]] V g ˜ f [ n, k ] . (25)To take into account the fact that the intervals [ F − p [ n ] , F + p [ n ]] and [ F − p +1 [ n ] , F + p +1 [ n ]] may intersect forsome time index, in such instances these intervals are replaced by [ F − p [ n ] , F + p [ n ]+ F − p +1 [ n ]2 ] and [ F + p [ n ]+ F − p +1 [ n ]2 , F + p +1 [ n ]] respectively. This reconstruction procedure is denoted by RRP-MR in the sequel.An alternative technique for mode reconstruction was recently proposed in [24], and is based on alinear chirp approximation for the mode. In our context, the technique proposed in [24] would consider k := (cid:98) D finp [ n ] NL (cid:101) , and then the following approximation for the STFT of f p (see [24] for details): V gf p [ n, k ] ≈ V g ˜ f [ n, k ] e πσ i ( Dfinp ) (cid:48) ( nL ) σ Dfinp ) (cid:48) ( nL )2 σ [ L ( k − k ) N ( L ( k k ) N − D finp ( nL )) ] . (26) September 29, 2020 DRAFT3
If one denotes ˜ V gf p the estimation of V gf p given by (26), the retrieval of f p can be carried out through: f p [ n ] ≈ g (0) N N − (cid:88) k =0 ˜ V gf p [ n, k ] . (27)This technique applied to RRP-RD will be denoted by RRP-MR-LCR (LCR standing for linear chirpreconstruction ).To compare RRP-MR to reconstruction based on S-RD and MB-RD, it is natural to consider for eachtime n the interval: ¯ I LC [ n, ϕ p [ n ]] = (cid:20) (cid:98) ϕ p [ n ] − std LC [ n, ϕ p [ n ] NL (cid:99) , (cid:100) ϕ p [ n ] + 3 std LC [ n, ϕ p [ n ]] NL (cid:101) (cid:21) := [ F − p [ n ] , F + p [ n ]] , (28)which is the same as (24), except that D finp is replaced by ϕ p , and that the modulation is estimatedby ˆ q ˜ f [ n, ϕ p [ n ]] instead of ( D finp ) (cid:48) . Then the reconstruction of mode p is carried out by means of theformula: f p [ n ] ≈ g [0] N (cid:88) k ∈ [ F − p [ n ] ,F + p [ n ]] V g ˜ f [ n, k ] . (29)For the sake of a fair comparison with RRP-MR, if [ F − p [ n ] , F + p [ n ]] intersects [ F − p +1 [ n ] , F + p +1 [ n ]] these arereplaced by [ F − p [ n ] , F + p [ n ]+ F − p +1 [ n ]2 ] and [ F + p [ n ]+ F − p +1 [ n ]2 , F + p +1 [ n ]] respectively. This type of reconstructiontechnique used with S-RD or MB-RD are called S-MR and MB-MR respectively.VI. N UMERICAL A PPLICATIONS
In this section, we first investigate the quality of RRP-RD compared with S-RD and MB-RD onmulticomponent simulated signals, in Section VI-A. Then, we assess the quality of the just describedmode reconstruction techniques in Section VI-B, on the same simulated signals. We finally investigate thebehavior of RRP-RD on a gravitational-wave signal in Section VI-C (all the Matlab programs enabling thefigures reproduction is available at https://github.com/Nils-Laurent/RRP-RD). Note that to compute theSTFT in all cases, we use a Gaussian window such that its standard deviation corresponds to the minimalR´enyi entropy [25], which is proved to be a good trade-off minimizing interference between the modesin the TF plane [26]. We are aware of recent works on adaptive window determination as developed in[27], [28], but though to choose the window adaptively may ease ridge determination, adaptation requirethe knowledge of a rough version of the ridges, which is very critical in noisy situations.
September 29, 2020 DRAFT4
A. Comparison on RRP-RD, S-RD and MB-RD on Simulated Signals
Our goal in this section is to show that RRP-RD is much more relevant in noisy situations than S-RDand MB-RD. For that purpose we compute RD results for the signals of Fig. 3, when the input SNRvaries between -10 dB and 4 dB. We choose to consider only low input SNRs because at higher SNRsthe modes are associated with a local maximum at each time instant, and the ridge detection is lesschallenging.When the signal is made of two linear chirps as in Fig. 3 (a), the detection results depicted in Fig.4 (a) tell us that RRP-RD performs much better than S-RD and MB-RD. Note that the approximationpolynomials used in RRP-RD are both with degree 5 (to reduce their orders does not significantly changethe results). For that example L = 4096 and ∆ t is set to , meaning the maximal time between tworidge portions associated with the same mode is seconds and s defining the RRPs is set to .Comparing the results associated with S-RD and MB-RD, we notice that the former behaves better thanthe latter at low noise level and similarly at a higher noise level. Such a behavior is related to the factthat the modulation operator ˆ q ˜ f is not accurate at locations where the ridge is split, and MB-RD failsto follow the different ridge portions corresponding to a mode (in these simulations, C is set to ). Onthe contrary, since S-RD uses the fixed modulation parameter B f (here set to ), it is able to followridge portions that are not necessarily connected which explains why it works better that MB-RD at highnoise level. On that example, for positive input SNRs, since S-RD and MB-RD lead to the same results,local maxima along the frequency axis of the spectrogram associated with each mode exist at each timeinstant: the gain in output SNR brought by RRP-RD arises from polynomial approximation. For negativeinput SNRs, such maxima no longer exist for each time instant and the gain brought by RRP-RD isrelated to the relevance of the grouping of RRPs. -10 -5 001020304050607080 (a) -10 -5 0010203040506070 (b) -10 -5 0010203040506070 (c) Fig. 4. (a): Comparison between S-RD, MB-RD and RRP-RD, for the signal of Fig. 3 (a): for each mode p = 1 , , computationof output SNR between IF φ (cid:48) p and estimated IF with respect to input SNR (the results are averaged over 30 realizations of thenoise); (b): same as (a) but for the signal of Fig. 3 (b); (c): same as (a) but for the signal of Fig. 3 (c) September 29, 2020 DRAFT5
Now, analyzing RD results for the signal of Fig. 3 (b), we remark that the conclusions for mode f are similar to those for linear chirps. Indeed, at low input SNRs RRP-RD performs much better than theother two techniques since it better copes with the absence of a local maximum along the frequency axisclose to the IF locations of the modes. On the contrary, when such maxima exist, namely for input SNRssuch that S-RD and MB-RD coincide, the gain in terms of output SNR with RRP-RD is less importantthan in the case of linear chirps since to approximate a cosine phase with a polynomial of degree 5 leadsto larger errors. As for mode f , which is much more modulated that f , we see that at a high noiselevel, RRP-RD is still much better than the other two techniques, but when the noise level decreases,S-RD behaves better than RRP-RD due to inaccuracy in IF approximation using a polynomial of degree5. In such a case, since the modulation operator ˆ q ˜ f is more sensitive to noise when the mode is moremodulated, MB-RD behaves worse than the other two techniques.Finally, regarding the signal of Fig. 3 (c), we only comment on the results related to the mode withexponential phase, for which we again notice that RRP-RD behaves much better than the other two testedmethods at low input SNR, meaning the grouping of RRPs is still performing well in that case. Whenthe noise level decreases, namely when f generates a local maximum along the frequency axis at eachtime instant, RRP-RD and S-RD behaves similarly (an exponential phase can be accurately approximatedby a polynomial with degree 5). On the contrary, the modulation operator ˆ q ˜ f is not accurate enough tofollow the local maxima along the frequency axis of the spectrogram of f . -10 -5 0-505101520 (a) -10 -5 0-50510152025 (b) -10 -5 0-50510152025 (c) Fig. 5. (a): For each mode p = 1 , , output SNR between mode f p of signal of Fig. 3 (a) and reconstructed mode for eachmethods, namely S-MR, MB-MR, RRP-MR, and RRP-MR-LCR (the results are averaged over 30 noise realizations); (b): samebut with signal of Fig. 3 (b); (c): same but with signal of Fig. 3 (c); B. Comparison of the Mode Reconstruction Techniques
In this section, we investigate the quality of mode reconstrutcion techniques S-MR, MB-MR, RRP-MR, and RRP-MR-LCR still for the signals of Fig. 3. Looking at the results of Fig. 5 (a) related to the
September 29, 2020 DRAFT6 signal of Fig. 3 (a), it transpires that while the RRP-RD is much better than S-RD and MB-RD at highnoise levels this is not reflected in mode reconstruction. This is due to the fact that too much noise isincluded in intervals [ F − p [ n ] , F + p [ n ]] and [ F − p [ n ] , F + p [ n ]] for each n , and consequently the coefficientsused for reconstrution are not relevant. Alternatively, when one considers RRP-MR-LCR, the results aresignificantly improved, meaning that at low SNR one had rather use the information on the ridge toreconstruct rather than summing the coefficients in the vicinity of the ridge. Such a conclusion remainstrue for the mode f of the signal of Fig. 3 (b), and also for the mode f of that signal, but only at low inputSNRs. Indeed, at high input SNRs, RRP-MR-LCR is hampered by inaccuracy in phase approximationwith a polynomial of degree 5, which entails larger errors in IF and CR estimations in the linear chirpapproximation involved in this technique. Finally, for the signal of Fig. 3 (c), the reconstruction results arealways much better with RRP-MR-LCR than with the other tested techniques (the ridge approximationprovided by RRP-RD for the exponential chirp being of good quality). To conclude on that part, weshould say that to reconstruct the modes in very noisy scenarios, one had rather use the information onthe ridge to construct a linear chirp approximation than sum the coefficients in the TF plane. C. Application to Gravitational-Wave Signals
In this section, we investigate the applicability of RRP-RD and RRP-MR-LCR to a transient gravitational-wave signal, generated by the coalescence of two stellar-mass black holes. This event, called
GW150914 ,was detected by the LIGO detector Hanford, Washington and closely matches the waveform AlbertEinstein predicted almost 100 years ago in his general relativity theory for the inspiral, the merger of apair of black holes and the ringdown of the resulting single black hole [29]. The observed signal has alength of 3441 samples in T = 0 . seconds. (a) (b) (c) (d) Fig. 6. (a): STFT of the Hanford signal; (b): Spline associated with RRP-RD (tol = 3) along with the ridge portion M fin ; (c):Denoised STFT used by RRP-MR-LCR technique; (d): Signal reconstructed using RRP-MR-LCR along with the one predictedby numerical relativity. September 29, 2020 DRAFT7
We first display in Fig. 6 (a), the STFT of such a signal. Since the latter is first very slightlymodulated during the inspiral phase, then behaves like an exponential chirp during the merging andas a fast decreasing phase during the ringdown, to try to approximate the instantaneous frequency witha polynomial in RRP-RD is not appropriate. Therefore, keeping the same formalism we use splineapproximation instead. This means that, after the merging step, we consider the cubic spline s s = argmin s ∈ S M (cid:90) T ( s (cid:48)(cid:48) ( t )) dt, (30)in which S M is the spline space defined at the knots M (here we omit the subscript p because welook for a single mode), under the constraints: (cid:88) [ n,k ] ∈M | k − s ( nL ) | R ( M ) ≤ D, (31)where D is a tolerance parameter that can easily related to a number of frequency bins: tol = R ( M ) (cid:113) D M ,where M denotes the cardinal of M , should be of the order of a few frequency bins. Then, the sametype of approach as in the polynomial approximation is carried out, integrating new ridge portions inthe minimization process, to obtain in the end the spline s fin corresponding to the set of ridge portions M fin . The spline obtained using RRP-RD along with the set M fin corresponding to the STFT of Fig.6 (a) is displayed in Fig. 6 (b). Then, at each point on the spline, one can define a denoised STFT basedon the linear chirp approximation following (26) in which D finp is replaced by s fin . Such a STFT isdisplayed in Fig. 6 (c). Finally, the signal reconstructed with RRP-MR-LCR is displayed in Fig. 6 (d)along with the signal given by the numerical relativity [30], showing a great similarity between the twosignals. In this respect, to investigate the quality of mode reconstruction, we compute the SNR betweenthe signals obtained with the different reconstruction techniques and that given by the numerical relativity.The results are displayed in Table I, showing that RRP-MR-LCR behaves slightly better than RRP-MR,and that each method is only slightly sensitive to the parameter tol. tol RRP-MR
RRP-MR-LCR
BETWEEN THE SIGNALS OBTAINED WITH
RRP-MR OR RRP-MCR-LR
AND THE SIGNAL GIVEN BY THE NUMERICALRELATIVITY
The results obtained here are interesting in that they reflect that RRP-RD enables the detection of theringdown. Similar results were obtained using higher order syncrosqueezed STFT [31] or second order
September 29, 2020 DRAFT8 synchrosqueezed continuous wavelet transform [32], but we here show that if one uses a robust ridgedetector as RRP-RD, it is not necessary to reassign the transform to detect the ringdown.VII. C
ONCLUSION
In this paper, we have introduced a novel technique to detect the ridges made by the modes of amulticomponent signal in the time-frequency plane. Our focus was to design the technique in sucha way that it enables the computation of the ridges in very noisy situations. For that purpose we havebrought about the notion of relevant ridge portions, which we subsequently used in our ridge detector. Theproposed technique show significant improvements over state-of-the-art methods based on time-frequencyrepresentations on simulated signals, and is also interesting to analyze gravitational-wave signals. Someremaining limitations of the present work that it cannot deal with crossing modes and requires that thenumber of modes is fixed for the whole signal duration. In a near a future, we will investigate how toadapt our algorithm to such situations. R
EFERENCES [1] R. Gribonval and E. Bacry, “Harmonic decomposition of audio signals with matching pursuit,”
IEEE Transactions onSignal Processing , vol. 51, no. 1, pp. 101–111, 2003.[2] Y.-Y. Lin, H.-t. Wu, C.-A. Hsu, P.-C. Huang, Y.-H. Huang, and Y.-L. Lo, “Sleep apnea detection based on thoracic andabdominal movement signals of wearable piezoelectric bands,”
IEEE journal of biomedical and health informatics , vol. 21,no. 6, pp. 1533–1545, 2017.[3] C. L. Herry, M. Frasch, A. J. Seely, and H.-T. Wu, “Heart beat classification from single-lead ecg using the synchrosqueezingtransform,”
Physiological Measurement , vol. 38, no. 2, pp. 171–187, 2017.[4] P. Flandrin,
Time-frequency/time-scale analysis . Academic Press, 1998, vol. 10.[5] B. Boashash,
Time frequency signal analysis and processing - A comprehensive reference . Gulf Professional Publishing,2003.[6] L. Stankovic, M. Dakovic, and T. Thayaparan,
Time-frequency signal analysis with applications . Artech house, 2014.[7] F. Auger, P. Flandrin, Y.-T. Lin, S. McLaughlin, S. Meignen, T. Oberlin, and H.-T. Wu, “Time-frequency reassignmentand synchrosqueezing: An overview,”
IEEE Signal Processing Magazine , vol. 30, no. 6, pp. 32–41, 2013.[8] R. Carmona, W. Hwang, and B. Torresani, “Characterization of signals by the ridges of their wavelet transforms,”
IEEETransactions on Signal Processing , vol. 45, no. 10, pp. 2586–2590, Oct 1997.[9] ——, “Multiridge detection and time-frequency reconstruction,”
IEEE Transactions on Signal Processing , vol. 47, no. 2,pp. 480–492, Feb 1999.[10] L. Stankovi´c, “A measure of some time–frequency distributions concentration,”
Signal Processing , vol. 81, no. 3, pp.621–631, 2001.[11] L. Stankovic, M. Dakovic, and V. Ivanovic, “Performance of spectrogram as if estimator,”
Electronics Letters , vol. 37,no. 12, pp. 797–799, 2001.[12] S. Meignen and D.-H. Pham, “Retrieval of the modes of multicomponent signals from downsampled short-time fouriertransform,”
IEEE Transactions on Signal Processing , vol. 66, no. 23, pp. 6204–6215, 2018.
September 29, 2020 DRAFT9 [13] M. A. Colominas, S. Meignen, and D.-H. Pham, “Fully adaptive ridge detection based on stft phase information,”
IEEESignal Processing Letters , 2020.[14] I. Daubechies, J. Lu, and H.-T. Wu, “Synchrosqueezed wavelet transforms: an empirical mode decomposition-like tool,”
Applied and Computational Harmonic Analysis , vol. 30, no. 2, pp. 243–261, 2011.[15] G. Thakur, E. Brevdo, N. S. FuˇcKar, and H.-T. Wu, “The synchrosqueezing algorithm for time-varying spectral analysis:Robustness properties and new paleoclimate applications,”
Signal Processing , vol. 93, no. 5, pp. 1079–1094, May 2013.[16] T. Oberlin, “Analyse de signaux multicomposantes : contributions `a la d´ecomposition modale empirique, aux repr´esentationstemps-fr´equence et au synchrosqueezing,” Ph.D. dissertation, Math´ematiques Appliqu´ees de l’Universit´e de Grenoble,Novembre 2013.[17] R. Behera, S. Meignen, and T. Oberlin, “Theoretical analysis of the second-order synchrosqueezing transform,”
Appliedand Computational Harmonic Analysis , vol. 45, no. 2, pp. 379–404, 2018.[18] C. Li and M. Liang, “A generalized synchrosqueezing transform for enhancing signal time–frequency representation,”
Signal Processing , vol. 92, no. 9, p. 2264–2274, Sep 2012.[19] S. Meignen, D.-H. Pham, and S. McLaughlin, “On demodulation, ridge detection, and synchrosqueezing for multicomponentsignals,”
IEEE Transactions on Signal Processing , vol. 65, no. 8, pp. 2093–2103, 2017.[20] M. A. Colominas, S. Meignen, and D.-H. Pham, “Time-frequency filtering based on model fitting in the time-frequencyplane,”
IEEE Signal Processing Letters , vol. 26, no. 5, pp. 660–664, 2019.[21] I. Djurovi´c and L. Stankovi´c, “An algorithm for the wigner distribution based instantaneous frequency estimation in a highnoise environment,”
Signal Processing , vol. 84, no. 3, pp. 631–643, 2004.[22] X. Zhu, Z. Zhang, J. Gao, and W. Li, “Two robust approaches to multicomponent signal reconstruction from stft ridges,”
Mechanical Systems and Signal Processing , vol. 115, pp. 720–735, 2019.[23] S. Chen, Z. Peng, Y. Yang, X. Dong, and W. Zhang, “Intrinsic chirp component decomposition by using fourier seriesrepresentation,”
Signal Processing , vol. 137, pp. 319–327, 2017.[24] N. Laurent and S. Meignen, “A novel time-frequency technique for mode retrieval based on linear chirp approximation,”
IEEE Signal Processing Letters , 2020.[25] R. G. Baraniuk, P. Flandrin, A. J. Janssen, and O. J. Michel, “Measuring time-frequency information content using ther´enyi entropies,”
IEEE Transactions on Information theory , vol. 47, no. 4, pp. 1391–1409, 2001.[26] S. Meignen, M. Colominas, and D.-H. Pham, “On the use of r´enyi entropy for optimal window size computation inthe short-time fourier transform,” in
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and SignalProcessing (ICASSP) . IEEE, 2020, pp. 5830–5834.[27] L. Li, H. Cai, H. Han, Q. Jiang, and H. Ji, “Adaptive short-time fourier transform and synchrosqueezing transform fornon-stationary signal separation,”
Signal Processing , vol. 166, p. 107231, 2020.[28] L. Li, H. Cai, and Q. Jiang, “Adaptive synchrosqueezing transform with a time-varying parameter for non-stationary signalseparation,”
Applied and Computational Harmonic Analysis , 2019.[29] B. P. Abbott, R. Abbott, T. Abbott, M. Abernathy, F. Acernese, K. Ackley, C. Adams, T. Adams, P. Addesso, R. Adhikari et al. , “Observation of gravitational waves from a binary black hole merger,”
Physical review letters , vol. 116, no. 6, p.061102, 2016.[30] B. P. Abbott, R. Abbott et al. , “GW151226: Observation of gravitational waves from a 22-solar-mass binary black holecoalescence,”
Physical Review Letters , vol. 116, no. 24, p. 241103, 2016.[31] D. H. Pham and S. Meignen, “High-order synchrosqueezing transform for multicomponent signals analysis-with anapplication to gravitational-wave signal.”
IEEE Trans. Signal Processing , vol. 65, no. 12, pp. 3168–3178, 2017.
September 29, 2020 DRAFT0 [32] S. Meignen, T. Oberlin, and D.-H. Pham, “Synchrosqueezing transforms: From low-to high-frequency modulations andperspectives,”
Comptes Rendus Physique , vol. 20, no. 5, pp. 449–460, 2019., vol. 20, no. 5, pp. 449–460, 2019.