Publication


Featured research published by Patrick A. Naylor.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Estimation of Glottal Closure Instants in Voiced Speech Using the DYPSA Algorithm

Patrick A. Naylor; Anastasis Kounoudes; Jon Gudnason; Mike Brookes

We present the Dynamic Programming Projected Phase-Slope Algorithm (DYPSA) for automatic estimation of glottal closure instants (GCIs) in voiced speech. Accurate estimation of GCIs is an important tool that can be applied to a wide range of speech processing tasks including speech analysis, synthesis and coding. DYPSA is automatic and operates using the speech signal alone without the need for an EGG signal. The algorithm employs the phase-slope function and a novel phase-slope projection technique for estimating GCI candidates from the speech signal. The most likely candidates are then selected using a dynamic programming technique to minimize a cost function that we define. We review and evaluate three existing methods of GCI estimation and compare the new DYPSA algorithm to them. Results are presented for the APLAWD and SAM databases for which 95.7% and 93.1% of GCIs are correctly identified.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Detection of Glottal Closure Instants From Speech Signals: A Quantitative Review

Thomas Drugman; Mark G. Thomas; Jon Gudnason; Patrick A. Naylor; Thierry Dutoit

The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires, however, that the precise locations of the glottal closure instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers. The five techniques compared are the Hilbert Envelope-based detection (HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming Phase Slope Algorithm (DYPSA), the Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS) and the Yet Another GCI Algorithm (YAGA). The efficacy of these methods is first evaluated on clean speech, both in terms of reliability and accuracy. Their robustness to additive noise and to reverberation is also assessed. A further contribution of the paper is the evaluation of their performance on a concrete application of speech processing: the causal-anticausal decomposition of speech. It is shown that for clean speech, SEDREAMS and YAGA are the best performing techniques, both in terms of identification rate and accuracy. ZFR and SEDREAMS also show a superior robustness to additive noise and reverberation.


Signal Processing | 2006

Adaptive algorithms for sparse echo cancellation

Patrick A. Naylor; Jingjing Cui; Mike Brookes

The cancellation of echoes is a vital component of telephony networks. In some cases the echo response that must be identified by the echo canceller is sparse, as for example when telephony traffic is routed over networks with unknown delay such as packet-switched networks. The sparse nature of such a response causes standard adaptive algorithms including normalized LMS to perform poorly. This paper begins by providing a review of techniques that aim to give improved echo cancellation performance when the echo response is sparse. In addition, adaptive filters can also be designed to exploit sparseness in the input signal by using partial update procedures. This concept is discussed and the MMax procedure is reviewed. We proceed to present a new high performance sparse adaptive algorithm and provide comparative echo cancellation results to show the relative performance of the existing and new algorithms. Finally, an efficient low cost implementation of our new algorithm using partial update adaptation is presented and evaluated. This algorithm exploits both sparseness of the echo response and also sparseness of the input signal in order to achieve high performance without high computational cost.
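As an illustration of the partial update idea mentioned in this abstract, the following is a minimal sketch of normalized LMS in which only the taps aligned with the largest-magnitude input samples are adapted at each step (MMax selection). It is not the new algorithm proposed in the paper. The function names, the step size `mu`, the regularization `eps`, and the toy echo path are all assumptions made for the example.

```python
import math
import random


def simulate_echo(x, h):
    """Noise-free echo signal: convolution of input x with echo path h."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def mmax_nlms(x, d, num_taps, num_update, mu=0.5, eps=1e-6):
    """NLMS echo path identification with MMax partial updates.

    At each sample only the `num_update` coefficients whose tap inputs
    have the largest magnitude are adapted; the rest are left unchanged.
    """
    h = [0.0] * num_taps
    buf = [0.0] * num_taps  # buf[0] holds the newest input sample
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        err = dn - sum(hi * bi for hi, bi in zip(h, buf))  # a priori error
        norm = sum(b * b for b in buf) + eps
        # Indices of the num_update largest |x| values in the tap-input vector.
        chosen = sorted(range(num_taps), key=lambda i: abs(buf[i]),
                        reverse=True)[:num_update]
        for i in chosen:
            h[i] += mu * err * buf[i] / norm
    return h


def misalignment_db(h_true, h_est):
    """Normalized misalignment ||h_true - h_est||^2 / ||h_true||^2 in dB."""
    num = sum((a - b) ** 2 for a, b in zip(h_true, h_est))
    den = sum(a * a for a in h_true)
    return 10.0 * math.log10(max(num, 1e-30) / den)


# Toy sparse echo path (hypothetical): two non-zero taps out of 32.
random.seed(1)
echo_path = [0.0] * 32
echo_path[5], echo_path[12] = 1.0, -0.5
far_end = [random.gauss(0.0, 1.0) for _ in range(3000)]
estimate = mmax_nlms(far_end, simulate_echo(far_end, echo_path), 32, 16)
print("misalignment (dB):", misalignment_db(echo_path, estimate))
```

Setting `num_update` equal to `num_taps` reduces the sketch to ordinary NLMS, which makes it easy to compare the cost saving against any loss in convergence speed.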


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Inference of Room Geometry From Acoustic Impulse Responses

Fabio Antonacci; Jason Filos; Mark R. P. Thomas; Emanuel A. P. Habets; Augusto Sarti; Patrick A. Naylor; Stefano Tubaro

Acoustic scene reconstruction is a process that aims to infer characteristics of the environment from acoustic measurements. We investigate the problem of locating planar reflectors in rooms, such as walls and furniture, from signals obtained using distributed microphones. Specifically, localization of multiple two-dimensional (2-D) reflectors is achieved by estimation of the time of arrival (TOA) of reflected signals by analysis of acoustic impulse responses (AIRs). The estimated TOAs are converted into elliptical constraints about the location of the line reflector, which is then localized by combining multiple constraints. When multiple walls are present in the acoustic scene, an ambiguity problem arises, which we show can be addressed using the Hough transform. Additionally, the Hough transform significantly improves the robustness of the estimation for noisy measurements. The proposed approach is evaluated using simulated rooms under a variety of different controlled conditions where the floor and ceiling are perfectly absorbing. Results using AIRs measured in a real environment are also given. Additionally, results showing the robustness to additive noise in the TOA information are presented, with particular reference to the improvement achieved through the use of the Hough transform.
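The elliptical constraint in this abstract follows from simple geometry: a first-order reflection with time of arrival τ travels a total path of cτ, so the reflection point lies on an ellipse whose foci are the source and the microphone, and for a specular reflection the line reflector is tangent to that ellipse. The sketch below computes the ellipse parameters for one source, microphone and TOA. The function names, the nominal speed of sound, and the example coordinates are assumptions for illustration, and the Hough-transform step described in the paper is not reproduced.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value


def ellipse_from_toa(source, mic, toa, c=SPEED_OF_SOUND):
    """Ellipse on which the reflection point must lie.

    Returns (center, semi_major, semi_minor, rotation_rad) for the ellipse
    with foci at `source` and `mic` and total path length c * toa.
    """
    path = c * toa
    dx, dy = mic[0] - source[0], mic[1] - source[1]
    focal = math.hypot(dx, dy)  # distance between the two foci
    if path <= focal:
        raise ValueError("reflected path must be longer than the direct path")
    semi_major = path / 2.0
    semi_minor = math.sqrt(semi_major ** 2 - (focal / 2.0) ** 2)
    center = ((source[0] + mic[0]) / 2.0, (source[1] + mic[1]) / 2.0)
    return center, semi_major, semi_minor, math.atan2(dy, dx)


def satisfies_constraint(point, source, mic, toa, c=SPEED_OF_SOUND, tol=1e-9):
    """True if source -> point -> mic has path length c * toa (within tol)."""
    path = math.dist(point, source) + math.dist(point, mic)
    return abs(path - c * toa) <= tol


# Hypothetical setup: a wall along y = 0, found via the image-source method.
src, mic = (1.0, 2.0), (3.0, 1.0)
image = (src[0], -src[1])                    # source mirrored in the wall
toa = math.dist(image, mic) / SPEED_OF_SOUND
print(ellipse_from_toa(src, mic, toa))
```

Each additional microphone or source position yields another ellipse, and a wall consistent with all measurements is a common tangent to them.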


International Conference on Acoustics, Speech, and Signal Processing | 2008

Blind estimation of reverberation time based on the distribution of signal decay rates

Jimi Yung-Chuan Wen; Emanuël A. P. Habets; Patrick A. Naylor

The reverberation time is one of the most prominent acoustic characteristics of an enclosure. Its value can be used to predict speech intelligibility, and is used by speech enhancement techniques to suppress reverberation. The reverberation time is usually obtained by analysing the decay rate of (i) the energy decay curve that is observed when a noise source is switched off, and (ii) the energy decay curve of the room impulse response. Estimating the reverberation time using only the observed reverberant speech signal, i.e., blind estimation, is required for speech evaluation and enhancement techniques. Recently, (semi) blind methods have been developed. Unfortunately, these methods are not very accurate when the source consists of a human speaker, and unnatural speech pauses are required to detect and/or track the decay. In this paper we extract and analyse the decay rate of the energy envelope blindly from the observed reverberant speech signal in the short-time Fourier transform domain. We develop a method to estimate the reverberation time using a property of the distribution of the decay rates. Experimental results using simulated and real reverberant speech signals demonstrate the performance of the new method.
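For context, the conventional non-blind route (ii) named in this abstract can be sketched as follows: the energy decay curve is obtained by backward integration of the squared impulse response (Schroeder integration), a straight line is fitted to its decay in dB, and the reverberation time is the time the fitted line takes to fall by 60 dB. This is not the blind method of the paper. The function names, the fitting range of -5 dB to -25 dB, and the synthetic exponential impulse response are assumptions for the example.

```python
import math


def schroeder_edc_db(h):
    """Energy decay curve in dB (0 dB at n = 0) via backward integration."""
    edc = [0.0] * len(h)
    acc = 0.0
    for n in range(len(h) - 1, -1, -1):
        acc += h[n] * h[n]
        edc[n] = acc
    if not edc or edc[0] == 0.0:
        raise ValueError("impulse response has no energy")
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in edc]


def rt60_from_edc(edc_db, fs, hi_db=-5.0, lo_db=-25.0):
    """Least-squares line fit to the EDC between hi_db and lo_db,
    extrapolated to a 60 dB decay."""
    pts = [(n / fs, v) for n, v in enumerate(edc_db) if lo_db <= v <= hi_db]
    if len(pts) < 2:
        raise ValueError("not enough points in the fitting range")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_v = sum(v for _, v in pts) / len(pts)
    slope = (sum((t - mean_t) * (v - mean_v) for t, v in pts)
             / sum((t - mean_t) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope


# Synthetic impulse response with amplitude exp(-delta * t). Its energy
# falls by 60 dB at t = 3 * ln(10) / delta, so delta is chosen for 0.5 s.
fs = 8000
delta = 3.0 * math.log(10.0) / 0.5
h = [math.exp(-delta * n / fs) for n in range(fs)]
print("estimated RT60 (s):", rt60_from_edc(schroeder_edc_db(h), fs))
```

With measured impulse responses the lower fitting limit usually has to stay above the noise floor, which is why the range is exposed as a parameter.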


IEEE Transactions on Audio, Speech, and Language Processing | 2009

A Class of Sparseness-Controlled Algorithms for Echo Cancellation

Pradeep Loganathan; Andy W. H. Khong; Patrick A. Naylor

In the context of acoustic echo cancellation (AEC), it is shown that the level of sparseness in acoustic impulse responses can vary greatly in a mobile environment. When the response is strongly sparse, convergence of conventional approaches is poor. Drawing on techniques originally developed for network echo cancellation (NEC), we propose a class of AEC algorithms that can not only work well in both sparse and dispersive circumstances, but also adapt dynamically to the level of sparseness using a new sparseness-controlled approach. Simulation results, using white Gaussian noise (WGN) and speech input signals, show improved performance over existing methods. The proposed algorithms achieve these improvements with only a modest increase in computational complexity.
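To make the "level of sparseness" concrete, the sketch below evaluates one widely used sparseness measure based on the ratio of the l1 and l2 norms of the impulse response. It equals 1 for a single non-zero tap and 0 when all taps have equal magnitude. That this is exactly the measure used in the paper is an assumption here, and the function name is hypothetical.

```python
import math


def sparseness(h):
    """Sparseness of impulse response h, in the range [0, 1].

    xi(h) = L / (L - sqrt(L)) * (1 - ||h||_1 / (sqrt(L) * ||h||_2))
    """
    length = len(h)
    l1 = sum(abs(v) for v in h)
    l2 = math.sqrt(sum(v * v for v in h))
    if length < 2 or l2 == 0.0:
        raise ValueError("need at least two taps and a non-zero response")
    root = math.sqrt(length)
    return (length / (length - root)) * (1.0 - l1 / (root * l2))


# A single dominant tap is highly sparse; a flat response is fully dispersive.
impulse = [0.0] * 64
impulse[10] = 1.0
print(sparseness(impulse), sparseness([0.3] * 64))
```

In an adaptive filter the true response is unknown, so a measure like this would be evaluated on the current coefficient estimate instead.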


IEEE Transactions on Audio, Speech, and Language Processing | 2009

The SIGMA Algorithm: A Glottal Activity Detector for Electroglottographic Signals

Mark R. P. Thomas; Patrick A. Naylor

Accurate estimation of glottal closure instants (GCIs) and opening instants (GOIs) is important for speech processing applications that benefit from glottal-synchronous processing. The majority of existing approaches detect GCIs by comparing the differentiated EGG signal to a threshold and are able to provide accurate results during voiced speech. More recent algorithms use a similar approach across multiple dyadic scales using the stationary wavelet transform. All existing approaches are, however, prone to errors around the transition regions at the end of voiced segments of speech. This paper describes a new method for EGG-based glottal activity detection which exhibits high accuracy over the entirety of voiced segments, including, in particular, the transition regions, thereby giving significant improvement over existing methods. Following a stationary wavelet transform-based preprocessor, detection of excitation due to glottal closure is performed using a group delay function and then true and false detections are discriminated by Gaussian mixture modeling. GOI detection involves additional processing using the estimated GCIs. The main purpose of our algorithm is to provide a ground-truth for GCIs and GOIs. This is essential in order to evaluate algorithms that estimate GCIs and GOIs from the speech signal only, and is also of high value in the analysis of pathological speech where knowledge of GCIs and GOIs is often needed. We compare our algorithm with two previous algorithms against a hand-labeled database. Evaluation has shown an average GCI hit rate of 99.47% and GOI of 99.35%, compared to 96.08% and 92.54% for the best-performing existing algorithm.
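The baseline approach this abstract refers to, comparing the differentiated EGG signal to a threshold, can be sketched in a few lines. This is not the SIGMA algorithm. The sketch assumes an EGG polarity in which glottal closure produces a sharp positive step, and the relative threshold, function name, and synthetic sawtooth signal are all illustrative assumptions.

```python
def detect_gci_from_egg(egg, rel_threshold=0.5):
    """Return sample indices where the differentiated EGG has a local peak
    above rel_threshold times its maximum value."""
    if len(egg) < 3:
        return []
    # diff[i] is the first difference ending at sample i + 1.
    diff = [egg[n] - egg[n - 1] for n in range(1, len(egg))]
    threshold = rel_threshold * max(diff)
    gcis = []
    for i in range(1, len(diff) - 1):
        is_peak = diff[i] >= diff[i - 1] and diff[i] > diff[i + 1]
        if is_peak and diff[i] > threshold:
            gcis.append(i + 1)
    return gcis


# Synthetic EGG (hypothetical): abrupt closure every 80 samples followed by
# a slow linear opening phase.
period = 80
egg = [1.0 - (n % period) / period for n in range(400)]
print(detect_gci_from_egg(egg))
```

A fixed relative threshold of this kind is the sort of rule that can fail where the EGG amplitude falls away at the ends of voiced segments, which is the weakness the abstract highlights.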


IEEE Transactions on Audio, Speech, and Language Processing | 2006

A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech

Mike Brookes; Patrick A. Naylor; Jon Gudnason

Measures based on the group delay of the LPC residual have been used by a number of authors to identify the time instants of glottal closure in voiced speech. In this paper, we discuss the theoretical properties of three such measures and we also present a new measure having useful properties. We give a quantitative assessment of each measure's ability to detect glottal closure instants evaluated using a speech database that includes a direct measurement of glottal activity from a Laryngograph/EGG signal. We find that when using a fixed-length analysis window, the best measures can detect the instant of glottal closure in 97% of larynx cycles with a standard deviation of 0.6 ms and that in 9% of these cycles an additional excitation instant is found that normally corresponds to glottal opening. We show that some improvement in detection rate may be obtained if the analysis window length is adapted to the speech pitch. If the measures are applied to the preemphasized speech instead of to the LPC residual, we find that the timing accuracy worsens but the detection rate improves slightly. We assess the computational cost of evaluating the measures and we present new recursive algorithms that give a substantial reduction in computation in all cases.


International Conference on Acoustics, Speech, and Signal Processing | 2002

The DYPSA algorithm for estimation of glottal closure instants in voiced speech

Anastasis Kounoudes; Patrick A. Naylor; Mike Brookes

We present the DYPSA algorithm for automatic and reliable estimation of glottal closure instants (GCIs) in voiced speech. Reliable GCI estimation is essential for closed-phase speech analysis, from which can be derived features of the vocal tract and, separately, the voice source. It has been shown that such features can be used with significant advantages in applications such as speaker recognition. DYPSA is automatic and operates using the speech signal alone without the need for an EGG or Laryngograph signal. It incorporates a new technique for estimating GCI candidates and employs dynamic programming to select the most likely candidates according to a defined cost function. We review and evaluate three existing methods and compare our new algorithm to them. Results for DYPSA show GCI detection accuracy to within ±0.25 ms on 87% of the test database and fewer than 1% false alarms and misses.


Asilomar Conference on Signals, Systems and Computers | 2006

Efficient Use Of Sparse Adaptive Filters

Andy W. H. Khong; Patrick A. Naylor

We present a novel adaptive algorithm exploiting the sparseness of an impulse response for network echo cancellation. This sparseness-controlled improved proportionate normalized least mean square (SC-IPNLMS) algorithm is based on IPNLMS which allocates a step-size gain proportional to each filter coefficient. The proposed SC-IPNLMS algorithm achieves improved convergence over IPNLMS by estimating the sparseness of the impulse response and allocating gains for each step-size such that a higher weighting is given to the proportionate term of the IPNLMS for sparse impulse responses. For a less sparse impulse response, a higher weighting will be allocated to the NLMS term. Simulation results presented show improved performance over the IPNLMS algorithm during convergence before and after an echo path change has been introduced. We also discuss the computational complexity of the proposed algorithm.
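The two terms this abstract refers to can be seen in the standard IPNLMS gain rule, sketched below: each coefficient receives a gain made of a uniform NLMS term and a proportionate term that grows with the coefficient's magnitude, with a fixed parameter `alpha` setting the balance. The sparseness-dependent weighting that defines SC-IPNLMS is not reproduced here. The function names, default parameters, and regularization constants are assumptions for the example.

```python
def ipnlms_gains(h, alpha=0.0, eps=1e-8):
    """Per-coefficient step-size gains of IPNLMS.

    k_l = (1 - alpha) / (2 L) + (1 + alpha) |h_l| / (2 ||h||_1 + eps)
    The first term is the NLMS term, the second the proportionate term;
    alpha = -1 gives uniform gains 1 / L, i.e. plain NLMS.
    """
    length = len(h)
    l1 = sum(abs(v) for v in h)
    return [(1.0 - alpha) / (2.0 * length)
            + (1.0 + alpha) * abs(v) / (2.0 * l1 + eps) for v in h]


def ipnlms_update(h, x_vec, d, mu=0.5, alpha=0.0, delta=1e-6):
    """One IPNLMS iteration. x_vec[0] is the newest input sample.

    Returns the updated coefficients and the a priori error.
    """
    gains = ipnlms_gains(h, alpha)
    err = d - sum(hi * xi for hi, xi in zip(h, x_vec))
    norm = sum(k * xi * xi for k, xi in zip(gains, x_vec)) + delta
    new_h = [hi + mu * err * k * xi / norm
             for hi, k, xi in zip(h, gains, x_vec)]
    return new_h, err


# Gains for a sparse estimate: the active tap gets the largest step size.
print(ipnlms_gains([0.0, 0.0, 1.0, 0.0]))
```

Read against the abstract, a sparseness-controlled variant would shift weight toward the second term when the estimated response is sparse and toward the first when it is dispersive.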

Collaboration

Top Co-Authors

Nikolay D. Gaubitch, Delft University of Technology

Mike Brookes, Imperial College London

Andy W. H. Khong, Nanyang Technological University

Emanuel A. P. Habets, University of Erlangen-Nuremberg

Toon van Waterschoot, Katholieke Universiteit Leuven