Improved multipath time delay estimation using cepstrum subtraction
IIMPROVED MULTIPATH TIME DELAY ESTIMATION USING CEPSTRUM SUBTRACTION
Eric L. Ferguson ∗ , Stefan B. Williams Australian Centre for Field RoboticsThe University of Sydney, Australia
Craig T. Jin
Computing and Audio Research LaboratoryThe University of Sydney, Australia
ABSTRACT
When a motor-powered vessel travels past a fixed hydrophone in amultipath environment, a Lloyd’s mirror constructive/destructive in-terference pattern is observed in the output spectrogram. The powercepstrum detects the periodic structure of the Lloyd’s mirror patternby generating a sequence of pulses (rahmonics) located at the funda-mental quefrency (periodic time) and its multiples. This sequence isreferred to here as the ‘rahmonic component’ of the power cepstrum.The fundamental quefrency, which is the reciprocal of the frequencydifference between adjacent interference fringes, equates to the mul-tipath time delay. The other component of the power cepstrum isthe non-rahmonic (extraneous) component, which combines with therahmonic component to form the (total) power cepstrum. A dataprocessing technique, termed ‘cepstrum subtraction’, is described.This technique suppresses the extraneous component of the powercepstrum, leaving the rahmonic component that contains the desiredmultipath time delay information. This technique is applied to realacoustic recordings of motor-vessel transits in a shallow water envi-ronment, where the broadband noise radiated by the vessel arrives atthe hydrophone via a direct ray path and a time-delayed multipath.The results show that cepstrum subtraction improves multipath timedelay estimation by a factor of two for the at-sea experiment.
Index Terms — time delay estimation, underwater acoustics,cepstrum, source localization, autocorrelation
1. INTRODUCTION
In underwater acoustics, the Lloyd’s mirror effect [1, 2] is observedas an interference pattern in the spectrogram (variation with timeof the spectrum) of a receiving hydrophone. It occurs when thesinusoidal signal components that compose a source of broadbandradiated noise arrive at the receiver via the direct propagation pathand combine with amplitude-scaled time-delayed replicas arrivingvia indirect propagation paths (multipaths). If the two signal arrivalsare in phase , they reinforce each other through coherent addition re-sulting in constructive interference; when they are in antiphase , theycancel and destructive interference occurs. As a result, the receivedintensities of a source’s spectral lines are observed to undergo a pe-riodic, or cyclic, variation with range [3]. The resultant intensity isgiven by A r = A d + A i + 2 A d A i cos( κδl ) , (1)where A d and A i are the amplitudes of the respective direct path andindirect path arrivals, κ = 2 π/λ ( λ denotes wavelength) is the spa-tial frequency (or wavenumber) and δl is the path length difference ∗ The author gratefully acknowledges the contributions provided by theIEEE Oceanic Society Scholarships, the University of Sydney Electrical andInformation Engineering PhD Scholarship, and the Defence Science andTechnology Group Postgraduate Research Scholarship. between the two paths; the last term in (1) is oscillatory due to inter-ference between the direct path and multipath arrivals, which givesrise to a Lloyd’s mirror interference pattern [4]. Destructive interfer-ence fringes occur when κδl = (2 k − π , k = 1 , , , ..., where k is the fringe order (number). For adjacent destructive interferencefringes ( κ k +1 − κ k ) δl = 2 π . (2)Hence from (2), at any given instant of time, the destructive interfer-ence fringes have a uniform spacing in the direction of the frequencyaxis which is given by ( f k +1 − f k ) = cδl = 1 τ i,d , (3)where c is the isospeed of sound travel in the underwater mediumand τ i,d is the time difference in the arrival times of the signal prop-agating via the respective indirect and direct paths. Note that thetemporal frequency difference f k +1 − f k equates to the reciprocalof the multipath time delay.Fundamentally, the cepstrum is the spectrum of a logarithmicspectrum and it is used in practice to detect periodicity in a frequencyspectrum . The quefrency is a measure of the periodicity of the spec-tral ripple and it has the units of (periodic) time. In speech analysis,periodicity in the voice spectrum is observed as a regular spacingin frequency of the fundamental voice frequency (pitch) and its har-monics [5, 6, 7]. If the spectrum of the logarithm of the speech powerspectrum is computed, a peak will appear corresponding to the pitchperiod of the voiced-speech segment being analysed [6].For the present case of a transit of a broadband source past anacoustic sensor in an underwater acoustic multipath propagation en-vironment, periodic structure is observed in the output spectrogramas an ordered arrangement of Lloyd’s mirror interference fringeswhich form a regular periodic pattern. The quefrency can be con-sidered as the periodic time associated with a series of interferencefringes that are uniformly spaced in frequency. The power cepstrumis generated by taking the inverse Fourier transform of the logarithmof the power spectrum [8]. The first rahmonic is the fundamentalperiod, which is equal to the reciprocal of the frequency interval be-tween any two adjacent destructive interference fringes (or, equally,between any two adjacent constructive interference fringes). Hence,cepstrum analysis is suited to detecting and quantifying periodicityin a spectrogram. It is also suited to multipath time delay estima-tion [9, 10].The original contribution of this research is a novel data pro-cessing technique that suppresses cepstrum components extraneousto the rahmonics, thereby improving the estimation of the multipathtime delay for broadband signals arriving at a receiving hydrophone.This improvement in time delay estimation is demonstrated usingreal data collected during an at-sea experiment. a r X i v : . [ c s . S D ] O c t . UNDERWATER ACOUSTIC SENSOR OUTPUT MODEL2.1. Direct path signal combined with indirect path signal For the present case, the signal of interest is continuous broadbandacoustic noise radiated by a small motor-powered boat underway.The output of the hydrophone, which is located above the seafloor,is modelled as a combination of: (a) an underwater acoustic signalpropagating directly from the source to the sensor s ( t ) , where t de-notes time, and (b) an amplitude-scaled time-delayed replica propa-gating via an indirect path that includes a seafloor boundary reflec-tion αs ( t − τ β ) , where α is the attenuation (or boundary reflection)coefficient ( < α ≤ ), and τ β is the time delay. The combinationof the signal with a single echo can be represented as [7] x ( t ) = s ( t ) + αs ( t − τ β ) . (4)The time delay between the indirect propagation path and thedirect propagation path results in a phase difference between the di-rect path and multipath signals equivalent to πfτ β , where f is thesignal frequency. Assuming that the transmission loss is the samefor each propagation path, the instantaneous power of the resultantsignal is given by [7] (cid:12)(cid:12) X ( f ) (cid:12)(cid:12) = (cid:12)(cid:12) S ( f ) (cid:12)(cid:12) (cid:0) α + 2 α cos(2 πfτ β ) (cid:1) , (5)where the power spectrum of the direct path signal (cid:12)(cid:12) S ( f ) (cid:12)(cid:12) is mod-ulated by a periodic function of the frequency. Taking the logarithmof the output power spectrum converts the product in (5) to a sum oftwo components, i.e. log (cid:12)(cid:12) X ( f ) (cid:12)(cid:12) = log (cid:12)(cid:12) S ( f ) (cid:12)(cid:12) + log (cid:0) α + 2 α cos(2 πfτ β ) (cid:1) , (6) where the additive periodic component has a period determined bythe multipath delay τ β . For the periodic structure of a Lloyd’s mirror interference pattern,the quefrency (or periodic time) of the first harmonic is the recipro-cal of the uniform frequency difference between adjacent destructiveinterference fringe minima. For the present case of Lloyd’s mirrorinterference pattern formation resulting from the combination of thedirect and indirect propagation path signals at the sensor, the que-frency of the first rahmonic is the multipath time delay. In otherwords, cepstrum analysis of the hydrophone’s noisy signal outputprovides an estimate of the multipath time delay τ β , which is thedifference in the arrival times of the direct path signal and the in-direct path signal at the hydrophone. The definition of the powercepstrum C XX ( τ ) can be expressed mathematically as [8] C XX ( τ ) = F − (cid:8) log (cid:12)(cid:12) X ( f ) (cid:12)(cid:12) (cid:9) , (7)where τ is the quefrency (or delay time), F − denotes the inverseFourier transform and (cid:12)(cid:12) X ( f ) (cid:12)(cid:12) is equivalent the two-sided powerspectrum of a received signal x ( t ) .In the present case, the power cepstrum is given by the inverseFourier transform F − of (6), which can be written as C ( τ ) = F − (cid:0) log (cid:12)(cid:12) S ( f ) (cid:12)(cid:12) (cid:1) + ∞ (cid:88) n =1 a n δ ( τ − nτ β ) . (8)where a n is the strength of the n th rahmonic and it is given by a n = 2 n ( − n +1 α n for n = 1 , , , ... (9) The rahmonic strengths are observed to alternate in sign and de-crease as the reciprocal of the rahmonic number n in the same wayas the Mercator series. Hence, the power cepstrum of the receivedsignal consists of the power cepstrum of the direct path signal S ( f ) and a sequence of impulse functions (or rahmonics) of strength a n located at quefrency τ β and its multiples. Figure 1(a) shows the com-puted power cepstrum C ( τ ) when a broadband direct-path signal iscombined with a time-delayed replica ( α = 1 , τ β = 224 µ s). Fig-ure 1(b) and Figure 1(c) show the variation with quefrency of the re-spective power cepstrum components; the non-rahmonic componentwhich is given by C ( τ ) = F − (cid:0) log (cid:12)(cid:12) S ( f ) (cid:12)(cid:12) (cid:1) and the rahmonicimpulse component which is given by C ( τ ) = (cid:80) ∞ n =1 a n δ ( τ − nτ β ) . Figure 1(d) shows the autocorrelation as a function of positivelag with the maximum value of the echo’s sinc function correspond-ing to a lag (i.e. the echo time delay) of µ s. In practice, additivenoise biases the autocorrelation function and may obscure the peakscorresponding to propagation time delay difference [11]. Also, whenthere is more that one indirect propagation path, the identification ofindividual propagation paths using autocorrelation analysis becomesmore difficult, and so cepstrum analysis is preferred because eachpropagation path has a series of more readily identifiable cepstrumpeaks located at the rahmonic quefrencies that characterize a specificpropagation path [11]. A novel data processing technique is presented that suppresses cep-strum components extraneous to the rahmonics. From (8), the powercepstrum of the received signal C ( τ ) consists of the non-rahmoniccomponents C ( τ ) derived from stationary process S ( f ) (which re-mains stationary after inverse Fourier transformation [12]), and a se-quence of rahmonic impulse functions C ( τ ) for another (indepen-dent) stationary process [13]. Note that C ( τ ) is independent of thetime delay variable τ β , whereas C ( τ ) is dependent with impulseslocated at quefrency τ β and its multiples.The ensemble average of the power cepstrum over M realiza-tions may be given by M M (cid:88) m =1 C m ( k ) = 1 M M (cid:88) m =1 C ,m ( k ) + 1 M M (cid:88) m =1 C ,m ( k ) , (10)where C ,m ( k ) and C ,m ( k ) are the m th realizations at discrete que-frency k for the non-rahmonic and rahmonic components, respec-tively. For τ β (cid:54) = 0 and assuming τ β is independent of realization m ,then the expected value of C ( k ) at any quefrency is zero, i.e. lim M →∞ M M (cid:88) m =1 C ,m ( k ) = 0 . (11)Therefore, the non-rahmonic component C ( k ) can be estimated bytaking the average of the received power cepstrum ¯ C ( k ) over a suf-ficiently long time (i.e. for a sufficiently large M ), where ¯ C ( k ) = 1 M M (cid:88) l =1 C m ( k ) . (12)Note that the long time-averaged rahmonic contribution at quefrency k is negligible for a sufficiently large M . This approach is consistentwith other ensemble averaging methods [14].The rahmonic component for realization m can be estimated bysubtracting the estimate of the non-rahmonic component from the
012 (a)1012 (b)0 200 400 600 800 10001012 (c) A u t o c o rr e l a t i o n R xx ( τ ) M a g n i t u d e C ( τ ) M a g n i t u d e C ( τ ) M a g n i t u d e C ( τ ) Fig. 1 . (a) Variation with quefrency of the power cepstrum C ( τ ) of a direct-path signal combined with an indirect-path echo α = 1 , τ β = 224 µ s. (b) Variation with quefrency of the power cepstrumcomponent C ( τ ) . (c) Similar to (b), but for the rahmonic impulsecomponent C ( τ ) . (d) Variation with lag of the autocorrelation func-tion for the direct path signal combined with the multipath echo.received power cepstrum. The cepstrum subtraction process is de-scribed mathematically by, ˆ C ,m ( k ) = C m ( k ) − a ( k ) ¯ C ( k ) , (13)where ˆ C ,m ( k ) is an estimate of the rahmonic component for the m th frame at discrete quefrency k , C m ( k ) is the received power cep-strum for the m th frame at quefrency k , and ¯ C ( k ) is the long time-averaged received cepstrum given by (12). The parameter a ( k ) con-trols the amount of the non-rahmonic component that is subtractedfrom the power cepstrum. For full non-rahmonic component sub-traction a ( k ) = 1 and for over-subtraction a ( k ) > , analogousto spectral subtraction [15]. The cepstrum subtraction process is vi-sualized in Figure 1, where the non-rahmonic contribution in (b) issubtracted from the received power cepstrum (a), leaving only therahmonic component in (c).Note that the cepstrum subtraction method has the same formu-lation as cepstrum mean normalization, which is used in speech pro-cessing, but with the addition of the factor a ( k ) [5, 16, 17]. Ratherthan removing channel-effects (distortions) for Mel-frequency cep-strum coefficients, instead cepstrum components extraneous to therahmonics are suppressed. Also the two methods are applied to theoutputs of different signal models.Figure 2 shows various power cepstrograms for a simulatedmotor-boat transit using synthetic broadband radiated noise. The Q u e f r e n c y ( s ) (a)0 50 100 150 200 250Time (s)050100150200250300 Q u e f r e n c y ( s ) (c)050100150200250300 Q u e f r e n c y ( s ) (b) Fig. 2 . Various power cepstrograms for simulated acoustic data,with quefrencies up to µ s. (a) Variation with time of the powercepstrum, (b) Similar to (a), but with cepstrum subtraction wherethe non-rahmonic component is estimated from the entire transit( M = 2700 ), and (c) Similar to (b) but the non-rahmonic com-ponent is estimated from the last seconds ( M = 20 ) of receivedcepstra, which have significant rahmonic contributions that bias theestimate.source/sensor geometry is the same for the at-sea experiment dis-cussed below in subsection 3.1. Figure 2(a) shows the powercepstrum without cepstrum subtraction. The horizontal bandingassociated with the non-rahmonic component is prominent, whichcan be seen to interfere with the rahmonics. Figure 2(b) shows thatthe effect of cepstrum subtraction is to suppress the non-rahmoniccomponent, thus restoring the fidelity of the rahmonic component.Here, M = 2700 which is sufficiently large so that the contributionof the rahmonic component to the time averaged-power cepstrumis negligible. The strengths of the rahmonics relative to the non-rahmonics have increased. In contrast, Figure 2(c) shows that thecepstrum subtraction process has suppressed both the rahmonicand non-rahmonic components because M = 20 is small, i.e. thecontribution of the rahmonic component to the time-average powercepstrum is significiant. Cepstrum subtraction requires then multi-path time delay τ β to be a rapidly changing function of m , otherwisethe rhamonic strengths bias the estimate of the non-rahmonic com-ponent of the power cepstrum, violating (11).
3. EXPERIMENTAL RESULTS
The multipath time delay, or the time difference in the arrivals ofthe signal via the direct propagation path and the signal via the in-direct seafloor-reflected multipath, is estimated using various cep-strum methods and the autocorrelation function. The performanceof the time delay estimation methods are compared using acousticdata collected during an at-sea experiment. .0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0Cepstrum Subtraction Factor455055606570 M A E ( s ) Fig. 3 . Variation with the cepstrum subtraction factor a ( k ) of themean absolute error for multipath time delay estimation. M A E ( s ) Auto CorrelationReceived CepstrumCepstrum Subtraction
Fig. 4 . Variation with signal-to-noise ratio of the mean absolute errorfor mulitpath time delay estimation.
An experiment was conducted where a propeller-driven boat, whichradiated broadband underwater acoustic noise, transited past a hy-drophone located m above the seafloor in very shallow water( m deep). Fourteen transits were recorded over a two day period,where the vessel approached the sensor from different directions. Aspectrogram of the hydrophone’s output was observed to display aLloyd’s mirror interference pattern over the entire frequency band( ≤ kHz) of the receiver, which was sampled at kHz. Record-ing commenced when the vessel was inbound at a ground range of m. The vessel transited past the sensor, and recording was ter-minated when the vessel was m outbound. The vessel’s positionwas logged at . s intervals using a RF tracking device.The multipath time delay is predicted as a function of thevessel’s ground range, assuming an isovelocity ( ms − ) soundpropagation medium and specular reflection from a flat seafloor [18].Acoustic recordings are divided into . s duration sound clips. Thepower cepstrum, power cepstrum with cepstrum subtraction, andautocorrelation function are computed for each sound clip. Theestimate of the multipath time delay is the quefrency (or lag) corre-sponding to the peak value in each of these functions. The estimatedmultipath time delay is compared with the predicted time delay. Sixminutes of background recordings (with no source present) was alsocollected in order to measure the ambient noise spectrum, which wasused to produce various levels of additive noise for signal-to-noiseratio (SNR) experiments. Figure 3 shows the mean absolute error (MAE) for multipath timedelay estimation with cepstrum subtraction, as a function of the sub-traction factor a ( k ) in (13). When a ( k ) = 0 , there is no cepstrumsubtraction and the time delay is estimated from the received powercepstrum. When a ( k ) = 1 , full cepstrum subtraction occurs andwhen a ( k ) > , there is over-subtraction. Non-rahmonic compo-nents are estimated by taking the mean of the received power cepstrafor the entire acoustic dataset. The MAE decreases with cepstrumsubtraction, which demonstrates the benefit of non-rahmonic com-ponent suppression in the received power cepstrum. The minimumMAE occurs when a ( k ) = 1 . . This level of over-subtraction re-duces the MAE by µ s when compared with nil subtraction i.e. a ( k ) = 0 .The multipath time delay is estimated from (a) the power cep-strum, (b) the power cepstrum with cepstrum (over) subtraction, and(c) the autocorrelation function. The estimate of the multipath timedelay is the quefrency (or lag) corresponding to the peak value ineach of these functions. Colored noise with the same power spectraldensity as background noise recordings (without the source) is addedto recordings in varying amounts in order to change the SNR. Thisenables time delay estimation performance to be tested as a func-tion of SNR. Figure 4 shows that cepstrum subtraction provides theoverall performance, confirming that suppression of non-rahmoniccomponents improves time delay estimation performance.
4. CONCLUSION
Cepstrum analysis enables estimation of the multipath time delayfor broadband signals arriving via direct and indirect signal prop-agation paths at a receiving hydrophone. A cepstrum subtractionmethod suppresses the cepstrum components extraneous to the rha-monics. As a result, the magnitudes of the rahmonics relative tothe non-rahmonics are increased. Also, there is an improvement inmultipath time delay estimation, which is demonstrated using realdata collected during an at-sea experiment. The experimental resultsshowed improved multipath time delay estimation performance in avery shallow water environment when compared with other time de-lay estimation methods, and robustness to decreasing signal-to-noiseratios. Multipath time delay estimation using cepstrum subtractionis superior to using the autocorrelation function when the SNR ≥ .However, without cepstrum subtraction, the autocorrelation functionis better than the power cepstrum for multipath time delay estima-tion in the present application. Cepstrum over-subtraction improvesmultipath time delay estimation.
5. REFERENCES [1] Robert J Urick,
Principles of Underwater Sound , pp. 123–125,McGraw-Hill, New York, 2nd edition, 1975.[2] William M Carey, “Lloyd’s mirror-image interference effects,”
Acoustics Today , vol. 5, no. 2, pp. 14–20, 2009.[3] Daphne Kapolka, Jason K Wilson, Joseph A Rice, and PaulHursky, “Equivalence of the waveguide invariant and two pathray theory methods for range prediction based on lloyd’s mirrorpatterns,” in
Proceedings of Meetings on Acoustics 155 . ASA,2008, vol. 4, p. 070002.[4] Adrian D Jones, David W Bartel, and Paul A Clarke, “An ap-plication of range-frequency striations to seafloor inversion inhallow oceans,” in
Proceedings of Acoustics 2012-Fremantle ,2012, pp. 1–7.[5] A Michael Noll, “Short-time spectrum and “cepstrum” tech-niques for vocal-pitch detection,”
The Journal of the AcousticalSociety of America , vol. 36, no. 2, pp. 296–302, 1964.[6] A Michael Noll, “Cepstrum pitch determination,”
The Journalof the Acoustical Society of America , vol. 41, no. 2, pp. 293–309, 1967.[7] Alan V Oppenheim and Ronald W Schafer, “From frequencyto quefrency: A history of the cepstrum,”
IEEE Signal Pro-cessing Magazine , vol. 21, no. 5, pp. 95–106, 2004.[8] Robert B Randall, “A history of cepstrum analysis and its ap-plication to mechanical problems,”
Mechanical Systems andSignal Processing , vol. 97, pp. 3–19, 2017.[9] Eric L Ferguson, Rishi Ramakrishnan, Stefan B Williams, andCraig T Jin, “Convolutional neural networks for passive mon-itoring of a shallow water environment using a single sensor,”in
Acoustics, Speech and Signal Processing (ICASSP), 2017IEEE International Conference on . IEEE, 2017, pp. 2657–2661.[10] Eric L Ferguson, Stefan B Williams, and Craig T Jin, “Soundsource localization in a multipath environment using convo-lutional neural networks,” in
Acoustics, Speech and SignalProcessing (ICASSP), 2018 IEEE International Conference on .IEEE, 2018, pp. 2386–2390.[11] Julius S Bendat and Allan G Piersol, “Engineering applicationsof correlation and spectral analysis,”
John Wiley & Sons , pp.Sect. 6.2, 145–153, 1993.[12] Julius S Bendat and Allan G Piersol,
Random Data: Analy-sis and Measurement Procedures , chapter 3, pp. 56–98, JohnWiley & Sons, New York, 1st edition, 1971.[13] Julius S Bendat and Allan G Piersol,
Random Data: Analysisand Measurement Procedures , chapter 1, pp. 1–36, John Wiley& Sons, New York, 1st edition, 1971.[14] Julius S Bendat and Allan G Piersol,
Random Data: Analysisand Measurement Procedures , pp. 344–380, John Wiley &Sons, New York, 1st edition, 1971.[15] S V Vaseghi,
Advanced Digital Signal Processing , chapter 12,pp. 321–338, John Wiley & Sons, West Sussex, UK, 4th edi-tion, 2008.[16] Philip N Garner, “Cepstral normalisation and the signal tonoise ratio spectrum in automatic speech recognition,”
SpeechCommunication , vol. 53, no. 8, pp. 991–1001, 2011.[17] Sadaoki Furui, “Cepstral analysis technique for automaticspeaker verification,”
IEEE Transactions on Acoustics, Speech,and Signal Processing , vol. 29, no. 2, pp. 254–272, 1981.[18] Eric L Ferguson, Rishi Ramakrishnan, Stefan B Williams, andCraig T Jin, “Deep learning approach to passive monitoringof the underwater acoustic environment,”