Publication


Featured research published by Mike Brookes.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Estimation of Glottal Closure Instants in Voiced Speech Using the DYPSA Algorithm

Patrick A. Naylor; Anastasis Kounoudes; Jon Gudnason; Mike Brookes

We present the Dynamic Programming Projected Phase-Slope Algorithm (DYPSA) for automatic estimation of glottal closure instants (GCIs) in voiced speech. Accurate estimation of GCIs is an important tool that can be applied to a wide range of speech processing tasks including speech analysis, synthesis and coding. DYPSA is automatic and operates using the speech signal alone without the need for an EGG signal. The algorithm employs the phase-slope function and a novel phase-slope projection technique for estimating GCI candidates from the speech signal. The most likely candidates are then selected using a dynamic programming technique to minimize a cost function that we define. We review and evaluate three existing methods of GCI estimation and compare the new DYPSA algorithm to them. Results are presented for the APLAWD and SAM databases, for which 95.7% and 93.1% of GCIs are correctly identified, respectively.


IEEE Transactions on Audio, Speech, and Language Processing | 2014

PEFAC - A Pitch Estimation Algorithm Robust to High Levels of Noise

Sira Gonzalez; Mike Brookes

We present PEFAC, a fundamental frequency estimation algorithm for speech that is able to identify voiced frames and estimate pitch reliably even at negative signal-to-noise ratios. The algorithm combines a normalization stage, to remove channel dependency and to attenuate strong noise components, with a harmonic summing filter applied in the log-frequency power spectral domain, the impulse response of which is chosen to sum the energy of the fundamental frequency harmonics while attenuating smoothly-varying noise components. Temporal continuity constraints are applied to the selected pitch candidates and a voiced speech probability is computed from the likelihood ratio of two classifiers, one for voiced speech and one for unvoiced speech/silence. We compare the performance of our algorithm with that of other widely used algorithms and demonstrate that it performs well in both high and low levels of additive noise.
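The role of the log-frequency domain can be illustrated with a minimal sketch. On a logarithmic frequency axis the harmonics of any fundamental sit at fixed offsets of log2(k) octaves, so a single comb can score every pitch candidate. The code below is not the PEFAC filter: it uses a plain comb of unit impulses, omits the normalization stage, the temporal constraints and the voicing classifier, and every function name and parameter value is an assumption made for illustration. A plain comb like this is also prone to octave errors when the harmonics have equal strength.

```python
import numpy as np

def harmonic_sum_log_freq(power, freqs, n_harmonics=5, bins_per_octave=96,
                          fmin=50.0, fmax=4000.0):
    """Score pitch candidates by summing harmonic energy on a log-frequency grid.

    power : power spectrum sampled at the (increasing) frequencies in `freqs` (Hz).
    Returns the log-spaced candidate frequencies and their summed-harmonic scores.
    """
    # Build a log-spaced grid: bin i sits at fmin * 2**(i / bins_per_octave).
    n_bins = int(np.floor(np.log2(fmax / fmin) * bins_per_octave)) + 1
    log_freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    log_power = np.interp(log_freqs, freqs, power)

    # Harmonic k lies log2(k) octaves above the fundamental for every f0,
    # so the comb offsets (in bins) do not depend on the candidate pitch.
    offsets = np.round(np.log2(np.arange(1, n_harmonics + 1)) * bins_per_octave)
    offsets = offsets.astype(int)

    score = np.zeros(n_bins)
    for off in offsets:
        score[: n_bins - off] += log_power[off:]
    return log_freqs, score

def estimate_f0(power, freqs, f0_min=60.0, f0_max=400.0):
    """Pick the candidate with the largest harmonic sum inside [f0_min, f0_max]."""
    log_freqs, score = harmonic_sum_log_freq(power, freqs)
    mask = (log_freqs >= f0_min) & (log_freqs <= f0_max)
    return log_freqs[mask][np.argmax(score[mask])]
```

For a synthetic spectrum whose harmonics decay in amplitude, `estimate_f0` returns a value close to the true fundamental, limited by the resolution of the log-frequency grid.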


Signal Processing | 2006

Adaptive algorithms for sparse echo cancellation

Patrick A. Naylor; Jingjing Cui; Mike Brookes

The cancellation of echoes is a vital component of telephony networks. In some cases the echo response that must be identified by the echo canceller is sparse, as for example when telephony traffic is routed over networks with unknown delay such as packet-switched networks. The sparse nature of such a response causes standard adaptive algorithms including normalized LMS to perform poorly. This paper begins by providing a review of techniques that aim to give improved echo cancellation performance when the echo response is sparse. In addition, adaptive filters can also be designed to exploit sparseness in the input signal by using partial update procedures. This concept is discussed and the MMax procedure is reviewed. We proceed to present a new high performance sparse adaptive algorithm and provide comparative echo cancellation results to show the relative performance of the existing and new algorithms. Finally, an efficient low cost implementation of our new algorithm using partial update adaptation is presented and evaluated. This algorithm exploits both sparseness of the echo response and also sparseness of the input signal in order to achieve high performance without high computational cost.
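As a rough illustration of the partial-update idea reviewed in the abstract, the sketch below combines normalized LMS with an MMax-style tap selection: at each sample only the coefficients aligned with the largest-magnitude input samples are adapted. It is not the new algorithm proposed in the paper, and the function name, step size and regularization constant are assumptions chosen for the example.

```python
import numpy as np

def mmax_nlms(x, d, n_taps, n_update, mu=0.5, eps=1e-8):
    """Identify an FIR echo path using NLMS with MMax partial updates.

    x : far-end (input) signal, d : echo signal observed at the microphone.
    Only the `n_update` taps with the largest |input sample| are adapted
    at each iteration; setting n_update == n_taps gives ordinary NLMS.
    """
    w = np.zeros(n_taps)      # current estimate of the echo response
    buf = np.zeros(n_taps)    # tap-input vector, buf[k] = x[n - k]
    err = np.zeros(len(x))
    kth = n_taps - n_update
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        err[n] = d[n] - w @ buf
        # Indices of the n_update largest-magnitude input samples.
        sel = np.argpartition(np.abs(buf), kth)[kth:]
        # Normalize by the energy of the full tap-input vector.
        w[sel] += mu * err[n] * buf[sel] / (buf @ buf + eps)
    return w, err
```

With a white-noise input and a noiseless sparse echo path, updating half of the taps per sample still converges to the true response, at a somewhat slower rate than full-update NLMS.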


IEEE Transactions on Circuits and Systems | 2006

A Spectral Model for RF Oscillators With Power-Law Phase Noise

Arsenia Chorti; Mike Brookes

In this paper, we apply correlation theory methods to obtain a model for the near-carrier oscillator power-spectral density (PSD). Based on the measurement-driven representation of phase noise as a sum of power-law processes, we evaluate closed form expressions for the relevant oscillator autocorrelation functions. These expressions form the basis of an enhanced oscillator spectral model that has a Gaussian PSD at near-carrier frequencies followed by a sequence of power-law regions. New results for the effect of white phase noise, flicker phase noise and random walk frequency modulated phase noise on the near-carrier oscillator PSD are derived. In particular, in the case of 1/f phase noise, we show that despite its lack of stationarity it is possible to derive a closed form expression for its effect on an oscillator PSD and show that the oscillator output can be considered to be wide-sense stationary.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech

Mike Brookes; Patrick A. Naylor; Jon Gudnason

Measures based on the group delay of the LPC residual have been used by a number of authors to identify the time instants of glottal closure in voiced speech. In this paper, we discuss the theoretical properties of three such measures and we also present a new measure having useful properties. We give a quantitative assessment of each measure's ability to detect glottal closure instants evaluated using a speech database that includes a direct measurement of glottal activity from a Laryngograph/EGG signal. We find that when using a fixed-length analysis window, the best measures can detect the instant of glottal closure in 97% of larynx cycles with a standard deviation of 0.6 ms and that in 9% of these cycles an additional excitation instant is found that normally corresponds to glottal opening. We show that some improvement in detection rate may be obtained if the analysis window length is adapted to the speech pitch. If the measures are applied to the preemphasized speech instead of to the LPC residual, we find that the timing accuracy worsens but the detection rate improves slightly. We assess the computational cost of evaluating the measures and we present new recursive algorithms that give a substantial reduction in computation in all cases.
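The abstract does not name the measures it assesses, so the following is only an illustrative sketch of one group-delay-type quantity: the energy-weighted centre of gravity of a sliding window, measured relative to the window centre. For an isolated impulse this value falls linearly as the window slides past and crosses zero when the impulse is at the window centre, which is why negative-going zero crossings can mark excitation instants. Function names and the test signal are assumptions.

```python
import numpy as np

def energy_weighted_group_delay(x, win_len):
    """Centre of gravity of the signal energy in each sliding window.

    Element m describes the window x[m : m + win_len]; the value is the
    energy-weighted mean sample position minus the window centre.
    Windows with zero energy are returned as NaN.
    """
    energy = np.asarray(x, dtype=float) ** 2
    ramp = np.arange(win_len, dtype=float)
    # np.correlate(..., "valid")[m] = sum_k energy[m + k] * v[k]
    num = np.correlate(energy, ramp, mode="valid")
    den = np.correlate(energy, np.ones(win_len), mode="valid")
    delay = np.full(den.shape, np.nan)
    valid = den > 0
    delay[valid] = num[valid] / den[valid] - (win_len - 1) / 2.0
    return delay

def negative_going_zero_crossings(delay, win_len):
    """Sample indices where the delay passes from >= 0 to < 0."""
    with np.errstate(invalid="ignore"):
        idx = np.flatnonzero((delay[:-1] >= 0) & (delay[1:] < 0))
    # Convert window start index to the position of the window centre.
    return idx + (win_len - 1) // 2
```

Applied to an idealized impulse train with a period longer than the window, the zero crossings coincide with the impulse positions. Real LPC residuals are far noisier, which is the situation the paper's quantitative assessment addresses.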


International Conference on Acoustics, Speech, and Signal Processing | 2002

The DYPSA algorithm for estimation of glottal closure instants in voiced speech

Anastasis Kounoudes; Patrick A. Naylor; Mike Brookes

We present the DYPSA algorithm for automatic and reliable estimation of glottal closure instants (GCIs) in voiced speech. Reliable GCI estimation is essential for closed-phase speech analysis, from which can be derived features of the vocal tract and, separately, the voice source. It has been shown that such features can be used with significant advantages in applications such as speaker recognition. DYPSA is automatic and operates using the speech signal alone without the need for an EGG or Laryngograph signal. It incorporates a new technique for estimating GCI candidates and employs dynamic programming to select the most likely candidates according to a defined cost function. We review and evaluate three existing methods and compare our new algorithm to them. Results for DYPSA show GCI detection accuracy to within ±0.25ms on 87% of the test database and fewer than 1% false alarms and misses.


International Conference on Acoustics, Speech, and Signal Processing | 2008

Voice source cepstrum coefficients for speaker identification

Jon Gudnason; Mike Brookes

We propose a novel feature set for speaker recognition that is based on the voice source signal. The feature extraction process uses closed-phase LPC analysis to estimate the vocal tract transfer function. The LPC spectrum envelope is converted to cepstrum coefficients which are used to derive the voice source features. Unlike approaches based on inverse-filtering, our procedure is robust to LPC analysis errors and low-frequency phase distortion. We have performed text-independent closed-set speaker identification experiments on the TIMIT and the YOHO databases using a standard Gaussian mixture model technique. Compared to using mel-frequency cepstrum coefficients alone, the misclassification rate for the TIMIT database was reduced from 1.51% to 0.16% when they were combined with the proposed voice source features. For the YOHO database the misclassification rate decreased from 13.79% to 10.07%. The new feature vector also compares favourably to other proposed voice source feature sets.
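The step of converting an LPC spectrum envelope to cepstrum coefficients can be sketched with the standard recursion for an all-pole model. This shows only that conversion; how the paper then derives voice source features from the cepstrum is not described in the abstract and is not attempted here. The sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and the function name are assumptions.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole envelope 1 / A(z).

    a : coefficients [a_1, ..., a_p] of A(z) = 1 + sum_k a_k z^-k.
    Uses c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k},
    with a_n taken as zero for n > p.
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    c = np.zeros(n_ceps + 1)  # c[0] (log gain) is not computed here
    for n in range(1, n_ceps + 1):
        acc = 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += k * c[k] * a[n - k - 1]
        a_n = a[n - 1] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]
```

A quick check is the single-pole case A(z) = 1 - r z^-1, whose cepstrum is known in closed form as c_n = r^n / n.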


International Conference on Image Processing | 2003

Precise real-time outlier removal from motion vector fields for 3D reconstruction

Andreas Dante; Mike Brookes

Finding the correct correspondences in an image sequence is a significant task for deriving 3D structure from motion. Most research has concentrated on extracting and matching salient feature points for correspondence. Block-matching has largely been disregarded due to its significant number of correspondence-outliers and its complexity. However, nowadays real-time hardware is available to obtain block-motion vectors. We present a fast method to filter out more than 99.7% of all outliers and show that the obtained correspondences can be used to derive the 3D scene depth of real image sequences.


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2004

Multiple light source detection

Christos-Savvas Bouganis; Mike Brookes

This paper presents the V2R algorithm, a novel method for multiple light source detection using a Lambertian sphere as a calibration object. The algorithm segments the image of the sphere into regions that are each illuminated by a single virtual light and subtracts the virtual lights of adjacent regions to estimate the light source vectors. The algorithm uses all pixels within a region to form a robust estimate of the corresponding virtual light. The circumstances under which the light source detection problem lacks a unique solution are discussed in detail and the way in which the V2R algorithm resolves the ambiguity is explained. The V2R algorithm includes novel procedures for identifying the critical lines that bound the regions, for estimating the light source vectors, and for identifying opposite light pairs. Experiments are performed on synthetic and real images and the performance of the V2R algorithm is compared to that of a recent algorithm from the literature. The experimental results demonstrate that the proposed algorithm is robust and that it gives substantially improved accuracy.


Journal of the Acoustical Society of America | 2012

Effects of noise suppression on intelligibility: dependency on signal-to-noise ratios

Gaston Hilkhuysen; Nikolay D. Gaubitch; Mike Brookes; Mark Huckvale

The effects on speech intelligibility of three different noise reduction algorithms (spectral subtraction, minimal mean squared error spectral estimation, and subspace analysis) were evaluated in two types of noise (car and babble) over a 12 dB range of signal-to-noise ratios (SNRs). Results from these listening experiments showed that most algorithms deteriorated intelligibility scores. Modeling of the results with a logit-shaped psychometric function showed that the degradation in intelligibility scores was largely congruent with a constant shift in SNR, although some additional degradation was observed at two SNRs, suggesting a limited interaction between the effects of noise suppression and SNR.
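The "constant shift in SNR" interpretation can be made concrete with a small sketch of a logistic psychometric function. Under that interpretation, the score obtained with noise suppression at a given SNR equals the unprocessed score at an SNR lower by a fixed number of decibels. The midpoint, slope and shift values used here are hypothetical and are not taken from the study.

```python
import numpy as np

def intelligibility(snr_db, midpoint_db, slope, snr_shift_db=0.0):
    """Logistic psychometric function: proportion correct versus SNR.

    snr_shift_db models processing as a constant loss in effective SNR,
    so a positive shift moves the whole curve toward higher SNRs.
    """
    effective_snr = np.asarray(snr_db, dtype=float) - snr_shift_db
    return 1.0 / (1.0 + np.exp(-slope * (effective_snr - midpoint_db)))

# Hypothetical example: unprocessed speech versus a 2 dB effective loss.
snrs = np.arange(-10.0, 3.0, 2.0)
unprocessed = intelligibility(snrs, midpoint_db=-4.0, slope=0.5)
processed = intelligibility(snrs, midpoint_db=-4.0, slope=0.5, snr_shift_db=2.0)
```

Because the curve is monotonic, a positive shift lowers the predicted score at every SNR, and any extra degradation at particular SNRs would appear as a departure from this shifted curve.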

Collaboration


Dive into Mike Brookes's collaborations.

Top Co-Authors

Nikolay D. Gaubitch
Delft University of Technology

Jon Gudnason
Imperial College London

Jingjing Cui
Imperial College London

Yu Wang
University of Cambridge