Jon Gudnason
Imperial College London
Publications
Featured research published by Jon Gudnason.
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Patrick A. Naylor; Anastasis Kounoudes; Jon Gudnason; Mike Brookes
We present the Dynamic Programming Projected Phase-Slope Algorithm (DYPSA) for automatic estimation of glottal closure instants (GCIs) in voiced speech. Accurate estimation of GCIs is important for a wide range of speech processing tasks, including speech analysis, synthesis and coding. DYPSA is automatic and operates on the speech signal alone, without the need for an EGG signal. The algorithm employs the phase-slope function and a novel phase-slope projection technique for estimating GCI candidates from the speech signal. The most likely candidates are then selected using a dynamic programming technique to minimize a cost function that we define. We review and evaluate three existing methods of GCI estimation and compare the new DYPSA algorithm to them. Results are presented for the APLAWD and SAM databases, for which 95.7% and 93.1% of GCIs, respectively, are correctly identified.
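To make the phase-slope idea concrete, here is a minimal sketch (not the authors' implementation): the energy-weighted average group delay of a sliding window over the LPC residual reduces to the temporal energy centroid of that window, and its zero crossings mark candidate excitation instants. The LPC order, window length, and crossing sign convention below are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_residual(x, order=20):
    # autocorrelation-method LPC over the whole signal (a real
    # implementation would work frame by frame)
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    a = solve_toeplitz(r[:order], r[1:order + 1])  # Yule-Walker solve
    return lfilter(np.concatenate(([1.0], -a)), [1.0], x)

def phase_slope(e, win_len=40):
    # energy-weighted average group delay of each sliding window,
    # which reduces to the window's temporal energy centroid
    n = np.arange(win_len) - (win_len - 1) / 2.0
    e2 = e ** 2
    num = np.convolve(e2, n[::-1], mode='same')
    den = np.convolve(e2, np.ones(win_len), mode='same') + 1e-12
    return num / den

def gci_candidates(ps):
    # negative-going zero crossings of the phase-slope function
    # (the sign convention depends on how the slope is defined)
    return np.flatnonzero((ps[:-1] >= 0) & (ps[1:] < 0))
```

DYPSA additionally projects phase-slope turning points that fail to cross zero onto the time axis and prunes the candidates with dynamic programming; neither step is shown here.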
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Mike Brookes; Patrick A. Naylor; Jon Gudnason
Measures based on the group delay of the LPC residual have been used by a number of authors to identify the time instants of glottal closure in voiced speech. In this paper, we discuss the theoretical properties of three such measures and we also present a new measure having useful properties. We give a quantitative assessment of each measure's ability to detect glottal closure instants, evaluated using a speech database that includes a direct measurement of glottal activity from a Laryngograph/EGG signal. We find that when using a fixed-length analysis window, the best measures can detect the instant of glottal closure in 97% of larynx cycles with a standard deviation of 0.6 ms and that in 9% of these cycles an additional excitation instant is found that normally corresponds to glottal opening. We show that some improvement in detection rate may be obtained if the analysis window length is adapted to the speech pitch. If the measures are applied to the pre-emphasized speech instead of to the LPC residual, we find that the timing accuracy worsens but the detection rate improves slightly. We assess the computational cost of evaluating the measures and we present new recursive algorithms that give a substantial reduction in computation in all cases.
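For reference, the standard DFT identity behind such group-delay measures fits in a few lines; this generic sketch is not one of the recursive algorithms developed in the paper.

```python
import numpy as np

def group_delay(frame):
    # group delay tau(w) via the DFT identity
    # tau = Re(Xt * conj(X)) / |X|^2, where Xt is the DFT of n*x[n]
    n = np.arange(len(frame))
    X = np.fft.fft(frame)
    Xt = np.fft.fft(n * frame)
    return (Xt * np.conj(X)).real / (np.abs(X) ** 2 + 1e-12)

def energy_weighted_gd(frame):
    # energy-weighted average group delay; by Parseval's theorem this
    # equals the temporal energy centroid sum(n x^2) / sum(x^2)
    w = np.abs(np.fft.fft(frame)) ** 2
    return np.sum(group_delay(frame) * w) / (np.sum(w) + 1e-12)
```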
international conference on acoustics, speech, and signal processing | 2008
Jon Gudnason; Mike Brookes
We propose a novel feature set for speaker recognition that is based on the voice source signal. The feature extraction process uses closed-phase LPC analysis to estimate the vocal tract transfer function. The LPC spectrum envelope is converted to cepstrum coefficients which are used to derive the voice source features. Unlike approaches based on inverse filtering, our procedure is robust to LPC analysis errors and low-frequency phase distortion. We have performed text-independent closed-set speaker identification experiments on the TIMIT and YOHO databases using a standard Gaussian mixture model technique. When the proposed voice source features were combined with mel-frequency cepstrum coefficients, the misclassification rate for the TIMIT database was reduced from 1.51% to 0.16%. For the YOHO database the misclassification rate decreased from 13.79% to 10.07%. The new feature vector also compares favourably to other proposed voice source feature sets.
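The envelope-to-cepstrum step mentioned above is typically done with the classical LPC-to-cepstrum recursion; a sketch, assuming the usual all-pole convention H(z) = 1 / (1 - sum a_k z^-k) rather than the paper's exact pipeline:

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    # classical recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}
    # for n <= p, with the a_n term dropped for n > p
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```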
ieee international radar conference | 2005
Jingjing Cui; Jon Gudnason; Mike Brookes
Automatic target recognition from high range resolution radar profiles remains an important and challenging problem. In this paper, we present a novel feature set for this task that combines a noise-robust superresolution characterisation of the target scattering centres, derived using the MUSIC algorithm, with a representation of the target's radar shadow shape. To obtain the shadow shape features, three alternative spectral estimation methods are investigated. Using a hidden Markov model to represent aspect dependence, we demonstrate that the inclusion of the shadow features results in a significant improvement in recognition performance. Using azimuth apertures of 3° and 6° in a 10-target classification task from the MSTAR database, we obtain overall classification error rates of 1.3% and 0.2% respectively. These results are significantly better than those obtained by other published methods on the same database.
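A bare-bones MUSIC pseudospectrum for a single range profile snapshot might look as follows; the subarray length, search grid, and covariance smoothing are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def music_spectrum(x, n_scatterers, n_grid=512):
    m = len(x) // 2                       # subarray length
    # covariance estimate from overlapping subvectors (smoothing)
    Y = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
    R = Y.conj().T @ Y / Y.shape[0]
    w, V = np.linalg.eigh(R)              # eigenvalues ascending
    En = V[:, :m - n_scatterers]          # noise subspace
    grid = np.linspace(0.0, 1.0, n_grid, endpoint=False)
    ps = np.empty(n_grid)
    for i, f in enumerate(grid):
        a = np.exp(2j * np.pi * f * np.arange(m))   # steering vector
        ps[i] = 1.0 / (np.linalg.norm(En.conj().T @ a) ** 2 + 1e-12)
    return grid, ps                       # peaks ~ scattering centres
```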
international conference on acoustics, speech, and signal processing | 2009
Mark R. P. Thomas; Jon Gudnason; Patrick A. Naylor
This paper presents a data-driven approach to the modelling of voice source waveforms. The voice source is a signal that is estimated by inverse-filtering speech signals with an estimate of the vocal tract filter. It is used in speech analysis, synthesis, recognition and coding to decompose a speech signal into its source and vocal tract filter components. Existing approaches parameterize the voice source signal with physically- or mathematically-motivated models. Though the models are well-defined, estimation of their parameters is not well understood and few are capable of reproducing the large variety of voice source waveforms. Here we present a data-driven approach to classify types of voice source waveforms based upon their mel-frequency cepstrum coefficients with Gaussian mixture modelling. A set of “prototype” waveform classes is derived from a weighted average of voice source cycles from real data. An unknown speech signal is then decomposed into its prototype components and resynthesized. Results indicate that with sixteen voice source classes, low resynthesis errors can be achieved.
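One way to read the prototype construction, sketched with scikit-learn (cycle extraction, length normalisation, and MFCC computation omitted; array names are assumptions):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def prototype_waveforms(feats, cycles, n_classes=16):
    # feats:  (n_cycles, n_mfcc) MFCCs of length-normalised cycles
    # cycles: (n_cycles, cycle_len) the aligned waveforms themselves
    gmm = GaussianMixture(n_components=n_classes, random_state=0)
    resp = gmm.fit(feats).predict_proba(feats)   # responsibilities
    w = resp / resp.sum(axis=0, keepdims=True)   # per-class weights
    return w.T @ cycles                          # one prototype per class
```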
IEEE Transactions on Aerospace and Electronic Systems | 2009
Jon Gudnason; Jingjing Cui; Mike Brookes
The work presented here introduces a procedure for the automatic recognition of ground-based targets from high range resolution (HRR) profile sequences that may be obtained from a synthetic aperture radar (SAR) platform. The procedure incorporates an adaptive target mask and uses a superresolution algorithm to identify the cross-range positions of target scattering centers. These are used to generate a pseudoimage of the target whose low-order discrete cosine transform coefficients form the recognizer feature vector. Within the recognizer, the states of a hidden Markov model (HMM) are used to represent the target orientation and a Gaussian mixture model is used for the feature vector distribution. In a closed-set identification experiment, the misclassification rate for ten MSTAR targets was 2.8%. Results from open-set experiments are also presented, along with an investigation of the effect on recognizer performance of variations in feature vector dimension, azimuth aperture, and target variants.
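The feature extraction stage, low-order discrete cosine transform coefficients of the pseudoimage, is straightforward to sketch; the block size and normalisation below are assumptions.

```python
import numpy as np
from scipy.fft import dct

def dct_features(pseudoimage, keep=6):
    # 2-D DCT via separable 1-D transforms; keep the low-order block
    d = dct(dct(pseudoimage, axis=0, norm='ortho'), axis=1, norm='ortho')
    return d[:keep, :keep].ravel()   # flattened feature vector
```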
workshop on applications of signal processing to audio and acoustics | 2007
Mark R. P. Thomas; Nikolay D. Gaubitch; Jon Gudnason; Patrick A. Naylor
Speech signals for hands-free telecommunication applications are received by one or more microphones placed at some distance from the talker. In an office environment, for example, unwanted signals such as reverberation and background noise from computers and other talkers will degrade the quality of the received signal. These unwanted components have an adverse effect upon speech processing algorithms and impair intelligibility. This paper demonstrates the use of the Multichannel DYPSA algorithm to identify glottal closure instants (GCIs) from noisy, reverberant speech. Using the estimated GCIs, a spatiotemporal averaging technique is applied to attenuate the unwanted components. Experiments with a microphone array demonstrate the dereverberation and noise suppression of the spatiotemporal averaging method, showing up to a 5 dB improvement in segmental SNR and a 0.33 improvement in normalized Bark spectral distortion score.
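A heavily reduced sketch of the GCI-synchronous averaging idea (channel delay compensation, weighting, and the cycle-length handling of the actual method are omitted; all parameters are assumptions):

```python
import numpy as np

def spatiotemporal_average(channels, gcis, n_cycles=2):
    # channels: (n_mics, n_samples), already time-aligned
    # gcis: sample indices of detected glottal closure instants
    x = channels.mean(axis=0)          # spatial (delay-and-sum) average
    y = x.copy()
    for i in range(n_cycles, len(gcis) - 1 - n_cycles):
        L = gcis[i + 1] - gcis[i]      # current larynx cycle length
        acc = np.zeros(L)
        cnt = 0
        for j in range(i - n_cycles, i + n_cycles + 1):
            seg = x[gcis[j]:gcis[j] + L]   # neighbouring cycle
            if len(seg) == L:
                acc += seg
                cnt += 1
        y[gcis[i]:gcis[i + 1]] = acc / max(cnt, 1)
    return y
```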
international conference on acoustics, speech, and signal processing | 2010
Mark R. P. Thomas; Jon Gudnason; Patrick A. Naylor; Bernd Geiser; Peter Vary
Artificial bandwidth extension (ABWE) of speech signals aims to estimate wideband speech (50 Hz – 7 kHz) from narrowband signals (300 Hz – 3.4 kHz). Applying the source-filter model of speech, many existing algorithms estimate vocal tract filter parameters independently of the source signal. However, many current methods for extending the narrowband voice source signal are limited to straightforward signal processing techniques which are only effective for high-band estimation. This paper presents a method for ABWE that employs novel data-driven modelling and an existing spectral mirroring technique to estimate the wideband source signal in both the high and low extension bands. A state-of-the-art Hidden Markov Model-based estimator provides the temporal and spectral envelopes in the missing frequency bands, with which the ABWE speech signal is synthesized. Informal listening tests comparing two existing source estimation techniques and two permutations of the proposed approach show an improvement in the perceived bandwidth of speech signals, in particular towards low frequencies. Subjective tests on the same data show a preference for the proposed techniques over the existing methods under test.
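The spectral mirroring component is classical and tiny: inserting a zero between narrowband samples doubles the sampling rate and reflects the baseband spectrum into the new high band. A sketch (the HMM envelope estimator is well beyond a few lines):

```python
import numpy as np

def spectral_mirror_upsample(x_nb):
    # zero-insertion upsampling: the 0..4 kHz spectrum is mirrored
    # into 4..8 kHz, giving a crude wideband excitation to be shaped
    y = np.zeros(2 * len(x_nb))
    y[::2] = x_nb
    return y
```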
ieee radar conference | 2008
Jingjing Cui; Jon Gudnason; Mike Brookes
This paper presents a novel fusion technique for automatic target recognition from high range resolution radar profiles when observations from multiple viewpoints are available. The fusion technique entails only a straightforward modification of the transition probabilities of a single-viewpoint target model in which a Hidden Markov Model is used to represent the unknown target orientation. Evaluations using the MSTAR database indicate that the new technique can reduce classification errors by about two orders of magnitude when compared to single-viewpoint observations and, in a 10-target classification experiment, gave almost perfect recognition.
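One way to picture the transition-probability modification (an assumed reading, not the paper's exact formulation): if two viewpoints differ in azimuth by a known number of orientation states, the single-view transition matrix can be composed with a deterministic cyclic shift of the state index.

```python
import numpy as np

def fused_transitions(A, offset_states):
    # A: (n, n) single-viewpoint orientation transition matrix
    # compose with a cyclic shift for the known inter-view azimuth gap
    n = A.shape[0]
    shift = np.roll(np.eye(n), offset_states, axis=1)
    return A @ shift
```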
conference of the international speech communication association | 2009
Jon Gudnason; Mark R. P. Thomas; Patrick A. Naylor; Daniel P. W. Ellis
The paper presents voice source waveform modeling techniques based on principal component analysis (PCA) and Gaussian mixture modeling (GMM). The voice source is obtained by inverse-filtering speech with the estimated vocal tract filter. This decomposition is useful in speech analysis, synthesis, recognition and coding. Existing models of the voice source signal are based on function-fitting or physically motivated assumptions and, although they are well defined, estimation of their parameters is not well understood and few are capable of reproducing the large variety of voice source waveforms. Here, a data-driven approach is presented for signal decomposition and classification based on the principal components of the voice source. The principal components are analyzed and the ‘prototype’ voice source signals corresponding to the Gaussian mixture means are examined. We show how an unknown signal can be decomposed into its components and/or prototypes and resynthesized. We show how the techniques are suited to both low-bitrate and high-quality analysis/synthesis schemes.
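A generic PCA analysis/resynthesis loop over length-normalised source cycles, as a sketch of the decomposition step (the GMM classification and prototype stages are omitted; names are assumptions):

```python
import numpy as np

def pca_decompose(cycles, n_comp=8):
    # cycles: (n_cycles, cycle_len) length-normalised source cycles
    mu = cycles.mean(axis=0)
    U, s, Vt = np.linalg.svd(cycles - mu, full_matrices=False)
    W = Vt[:n_comp]                 # leading principal directions
    coeff = (cycles - mu) @ W.T     # analysis (projection)
    recon = mu + coeff @ W          # resynthesis from coefficients
    return coeff, recon
```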