Jonathan Darch
University of East Anglia
Publications
Featured research published by Jonathan Darch.
Computer Speech & Language | 2007
Qin Yan; Saeed Vaseghi; Esfandiar Zavarehei; Ben Milner; Jonathan Darch; P.R. White; Ioannis Andrianakis
This paper presents a formant tracking linear prediction (LP) model for speech processing in noise. The main focus of this work is on exploiting the correlation of the energy contours of speech, along the formant tracks, for improved formant and LP model estimation in noise. The approach proposed in this paper provides a systematic framework for modelling and exploiting the inter-frame correlation of speech parameters across successive speech frames; the within-frame correlations are modelled by the LP parameters. The formant tracking LP model estimation is composed of three stages: (1) a pre-cleaning spectral amplitude estimation stage, where an initial estimate of the LP model of speech for each frame is obtained; (2) a formant classification and estimation stage, using probability models of formants and Viterbi decoders; and (3) an inter-frame formant de-noising and smoothing stage, where Kalman filters are used to model the formant trajectories and reduce the effect of residual noise on formants. The adverse effects of car and train noise on estimates of formant tracks and LP models are investigated. The evaluation results for the estimation of the formant tracking LP model demonstrate that the proposed combination of the initial noise reduction stage with the formant tracking and Kalman smoothing stages results in a significant reduction in errors and distortions.
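
Stage (3) can be illustrated with a minimal sketch: a scalar Kalman filter de-noising a single formant track under an assumed random-walk state model. The variances and the synthetic track below are illustrative assumptions, not the paper's actual trajectory model or noise statistics.

    import numpy as np

    def kalman_filter_track(track_hz, process_var=30.0**2, meas_var=120.0**2):
        """De-noise one formant-frequency track (Hz) with a scalar Kalman
        filter under an assumed random-walk state model."""
        x, p = track_hz[0], meas_var          # initial state estimate and variance
        out = np.empty_like(track_hz, dtype=float)
        for t, z in enumerate(track_hz):
            p += process_var                  # predict: state variance grows
            k = p / (p + meas_var)            # Kalman gain
            x += k * (z - x)                  # correct with noisy measurement z
            p *= 1.0 - k
            out[t] = x
        return out

    # Toy example: a slowly varying formant track plus measurement noise.
    rng = np.random.default_rng(1)
    clean = 1500.0 + 200.0 * np.sin(np.linspace(0.0, np.pi, 80))
    noisy = clean + rng.normal(0.0, 120.0, clean.shape)
    denoised = kalman_filter_track(noisy)
    print(np.abs(denoised - clean).mean() < np.abs(noisy - clean).mean())  # expect True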
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Ben Milner; Jonathan Darch
This paper examines the effect of applying noise compensation to acoustic speech feature prediction from noisy mel-frequency cepstral coefficient (MFCC) vectors within a distributed speech recognition architecture. An acoustic speech feature (comprising fundamental frequency, formant frequencies, speech/nonspeech classification, and voicing classification) is predicted from an MFCC vector in a maximum a posteriori (MAP) framework using phoneme-specific or global models of speech. The effect of noise is considered, and three different noise compensation methods that have been successful in robust speech recognition are integrated within the MAP framework. Experiments show that noise compensation can be applied successfully to prediction, with the best performance given by a model adaptation method that performs only slightly worse than matched training and testing. Further experiments consider application of the predicted acoustic features to speech reconstruction. A series of human listening tests shows that the predicted features are sufficient for speech reconstruction and that noise compensation improves speech quality in noisy conditions.
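
For a single Gaussian component, MAP prediction of this kind reduces to the standard conditional-mean formula for jointly Gaussian variables. A minimal sketch, with made-up parameters standing in for the trained phoneme-specific or global models:

    import numpy as np

    def map_predict(x, mu_x, mu_y, cov_xx, cov_yx):
        """MAP estimate of acoustic feature y from MFCC vector x when
        (x, y) are modelled as jointly Gaussian:
            y_hat = mu_y + cov_yx @ inv(cov_xx) @ (x - mu_x)."""
        return mu_y + cov_yx @ np.linalg.solve(cov_xx, x - mu_x)

    # Toy joint model: 3 MFCCs jointly Gaussian with a scalar feature
    # (e.g. fundamental frequency); all numbers are illustrative.
    mu_x = np.array([1.0, -0.5, 0.2])
    mu_y = np.array([120.0])                      # mean f0 in Hz
    cov_xx = np.eye(3)
    cov_yx = np.array([[0.6, -0.2, 0.1]]) * 40.0  # feature/MFCC cross-covariance

    x_observed = np.array([1.4, -0.1, 0.0])
    print(map_predict(x_observed, mu_x, mu_y, cov_xx, cov_yx))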
Speech Communication | 2006
Jonathan Darch; Ben Milner; Saeed Vaseghi
Novel methods are presented for predicting formant frequencies and voicing class from mel-frequency cepstral coefficients (MFCCs). It is shown how Gaussian mixture models (GMMs) can be used to model the relationship between formant frequencies and MFCCs. Using such models and an input MFCC vector, a maximum a posteriori (MAP) prediction of formant frequencies can be made. The specific relationship each speech sound has between MFCCs and formant frequencies is exploited by using state-specific GMMs within a framework of a set of hidden Markov models (HMMs). Formant frequency and voicing class prediction for speaker-independent male speech are evaluated on both a constrained-vocabulary connected-digits database and a large-vocabulary database. Experimental results show that for HMM-GMM prediction on the connected digits database, voicing class prediction error is less than 3.5%. Less than 1.8% of frames have formant frequency percentage errors greater than 20%, and the mean percentage error of the remaining frames is less than 3.7%. Further experiments show prediction accuracy under noisy conditions. For example, at a signal-to-noise ratio (SNR) of 0 dB, voicing class prediction error increases to 9.4%, less than 4.3% of frames have formant frequency percentage errors over 20%, and the formant frequency percentage error for the remaining frames is less than 5.7%.
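
The idea of state-specific prediction can be sketched with one joint Gaussian per phoneme and a per-frame maximum-likelihood phoneme choice standing in for the paper's HMM alignment and state-specific GMMs; all model parameters below are invented for illustration.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Illustrative per-phoneme joint models over (2 MFCCs, 1 formant).
    models = {
        "aa": dict(mu_x=np.array([2.0, 0.5]), mu_y=700.0,
                   cov_xx=np.eye(2), cov_yx=np.array([50.0, -20.0])),
        "iy": dict(mu_x=np.array([-1.0, 1.5]), mu_y=300.0,
                   cov_xx=np.eye(2), cov_yx=np.array([30.0, 10.0])),
    }

    def predict_formant(x):
        # 1) localise: pick the phoneme whose MFCC marginal best explains x
        phone = max(models, key=lambda p: multivariate_normal(
            models[p]["mu_x"], models[p]["cov_xx"]).pdf(x))
        m = models[phone]
        # 2) predict: conditional mean of the formant given x in that model
        y_hat = m["mu_y"] + m["cov_yx"] @ np.linalg.solve(m["cov_xx"], x - m["mu_x"])
        return phone, y_hat

    print(predict_formant(np.array([1.8, 0.3])))   # expect the "aa" model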
international conference on acoustics, speech, and signal processing | 2008
Ben Milner; Jonathan Darch; Saeed Vaseghi
This paper examines the effect of applying noise compensation to improve acoustic speech feature prediction from noise-contaminated MFCC vectors, as may be encountered in distributed speech recognition (DSR). A brief review of maximum a posteriori prediction of acoustic speech features (voicing, fundamental and formant frequencies) from MFCC vectors is given. Two noise compensation methods are then applied: spectral subtraction and model adaptation. Spectral subtraction is used to filter noise from the received MFCC vectors, while model adaptation adapts the joint models of acoustic features and MFCCs to account for noise contamination. Experiments examine acoustic feature prediction accuracy in noise, and results show that the two noise compensation methods significantly improve prediction accuracy. Model adaptation was found to be better than spectral subtraction and could restore performance close to that achieved with matched training and testing.
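
A rough sketch of spectral subtraction applied to received MFCC vectors: invert the DCT back to a mel-spectral domain, subtract a noise estimate, floor the result, and transform back. The untruncated DCT, the flooring rule, and the constants are assumptions made to keep the sketch self-contained, not the paper's settings.

    import numpy as np
    from scipy.fft import dct, idct

    def spectral_subtract_mfcc(mfcc, noise_mfcc, floor=0.05):
        """Subtract a noise estimate from a received MFCC vector by
        working in the (inverted) mel-spectral domain."""
        mel = np.exp(idct(mfcc, type=2, norm="ortho"))          # mel spectrum
        noise_mel = np.exp(idct(noise_mfcc, type=2, norm="ortho"))
        cleaned = np.maximum(mel - noise_mel, floor * mel)      # subtract + floor
        return dct(np.log(cleaned), type=2, norm="ortho")       # back to MFCCs

    # Toy usage: a noisy 13-dimensional MFCC frame and a noise estimate
    # averaged over non-speech frames.
    rng = np.random.default_rng(2)
    noisy_frame = rng.normal(0.0, 1.0, 13)
    noise_estimate = rng.normal(0.0, 0.1, 13)
    print(spectral_subtract_mfcc(noisy_frame, noise_estimate).shape)  # (13,)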
international conference on acoustics, speech, and signal processing | 2005
Jonathan Darch; Ben Milner; Xu Shao; Saeed Vaseghi; Qin Yan
This work proposes a novel method of predicting formant frequencies from a stream of mel-frequency cepstral coefficient (MFCC) feature vectors. Prediction is based on modelling the joint density of MFCCs and formant frequencies using a Gaussian mixture model (GMM). Using this GMM and an input MFCC vector, two maximum a posteriori (MAP) prediction methods are developed. The first method predicts formants from the single cluster closest to the input MFCC vector, while the second takes a weighted contribution of formants predicted from all clusters. Experimental results are presented using the ETSI Aurora connected digits database and show that predicted formant frequencies are within 3.2% of reference formant frequencies.
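
Both MAP methods can be sketched against a toy joint GMM: the first predicts from the single most probable cluster, the second takes the posterior-weighted average over all clusters. The mixture parameters below are invented; in the paper they are trained on data.

    import numpy as np
    from scipy.stats import multivariate_normal

    # Illustrative 2-component GMM over the joint (MFCC, formant) space.
    weights = np.array([0.5, 0.5])
    mu_x = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]       # MFCC means
    mu_y = [500.0, 1500.0]                                    # formant means (Hz)
    cov_xx = [np.eye(2), np.eye(2)]
    cov_yx = [np.array([80.0, 0.0]), np.array([0.0, 120.0])]

    def predict(x, method="weighted"):
        # posterior probability of each cluster given the MFCC vector x
        lik = np.array([w * multivariate_normal(m, c).pdf(x)
                        for w, m, c in zip(weights, mu_x, cov_xx)])
        post = lik / lik.sum()
        # per-cluster conditional-mean formant predictions
        preds = np.array([mu_y[k] + cov_yx[k] @ np.linalg.solve(cov_xx[k], x - mu_x[k])
                          for k in range(len(weights))])
        if method == "closest":            # method 1: most probable cluster only
            return preds[post.argmax()]
        return post @ preds                # method 2: posterior-weighted average

    x = np.array([2.5, 0.8])
    print(predict(x, "closest"), predict(x, "weighted"))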
Journal of the Acoustical Society of America | 2008
Jonathan Darch; Ben Milner; Saeed Vaseghi
The aim of this work is to develop methods that enable acoustic speech features to be predicted from mel-frequency cepstral coefficient (MFCC) vectors as may be encountered in distributed speech recognition architectures. The work begins with a detailed analysis of the multiple correlation between acoustic speech features and MFCC vectors. This confirms the existence of correlation, which is found to be higher when measured within specific phonemes rather than globally across all speech sounds. The correlation analysis leads to the development of a statistical method of predicting acoustic speech features from MFCC vectors that utilizes a network of hidden Markov models (HMMs) to localize prediction to specific phonemes. Within each HMM, the joint density of acoustic features and MFCC vectors is modeled and used to make a maximum a posteriori prediction. Experimental results are presented across a range of conditions, such as with speaker-dependent, gender-dependent, and gender-independent constraints, and these show that acoustic speech features can be predicted from MFCC vectors with good accuracy. A comparison is also made against an alternative scheme that replaces the higher-order MFCCs with acoustic features for transmission. This delivers accurate acoustic features but at the expense of a significant reduction in speech recognition accuracy.
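
The multiple correlation in the opening analysis is the correlation between an acoustic feature and its best linear predictor from the MFCC vector. A self-contained sketch on synthetic data (the regression-based estimator is standard; the data are fabricated for illustration):

    import numpy as np

    def multiple_correlation(X, y):
        """Multiple correlation R between a scalar acoustic feature y and an
        MFCC vector X: the correlation of y with its best linear predictor."""
        A = np.column_stack([X, np.ones(len(y))])     # add an intercept term
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return np.corrcoef(y, A @ coef)[0, 1]

    # Synthetic demonstration: a "formant" that is partly a linear function
    # of 13 MFCC-like features plus noise.  Numbers are illustrative only.
    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 13))
    y = X @ rng.normal(size=13) + rng.normal(0.0, 2.0, 500)
    print(round(multiple_correlation(X, y), 3))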
international conference on acoustics, speech, and signal processing | 2007
Jonathan Darch; Ben Milner; Ibrahim Almajai; Saeed Vaseghi
This work develops a statistical framework to predict acoustic features (fundamental frequency, formant frequencies and voicing) from MFCC vectors. An analysis of the correlation between acoustic features and MFCCs is made both globally across all speech and within phoneme classes, and for both speaker-independent and speaker-dependent speech. This leads to the development of both a global prediction method, using a Gaussian mixture model (GMM) to model the joint density of acoustic features and MFCCs, and a phoneme-specific prediction method using a combined hidden Markov model (HMM)-GMM. Prediction accuracy measurements show the phoneme-dependent HMM-GMM system to be more accurate, and prediction to be more accurate from speaker-dependent speech; both findings agree with the correlation analysis.
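
The pooled-versus-within-phoneme contrast can be reproduced on synthetic data: when each class ties the feature tightly to the MFCCs but with different offsets, the pooled correlation is much weaker in magnitude than the within-class correlations. Everything below is fabricated for illustration.

    import numpy as np

    rng = np.random.default_rng(4)

    # Two synthetic "phoneme" classes: within each class the feature tracks
    # one MFCC dimension tightly, but class offsets weaken the pooled link.
    coeffs, feats, labels = [], [], []
    for k, (mfcc_offset, feat_offset) in enumerate([(0.0, 300.0), (4.0, 400.0)]):
        ck = rng.normal(mfcc_offset, 1.0, 300)            # one MFCC dimension
        fk = feat_offset - 60.0 * (ck - mfcc_offset) + rng.normal(0.0, 10.0, 300)
        coeffs.append(ck); feats.append(fk); labels += [k] * 300

    c0 = np.concatenate(coeffs)
    f = np.concatenate(feats)
    labels = np.array(labels)

    global_r = np.corrcoef(c0, f)[0, 1]
    within_r = [np.corrcoef(c0[labels == k], f[labels == k])[0, 1] for k in (0, 1)]
    print(f"global: {global_r:.2f}  within-phoneme: {within_r[0]:.2f}, {within_r[1]:.2f}")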
international conference on acoustics, speech, and signal processing | 2007
Ibrahim Almajai; Ben Milner; Jonathan Darch; Saeed Vaseghi
conference of the international speech communication association | 2006
Ibrahim Almajai; Ben Milner; Jonathan Darch
Computer Speech & Language | 2008
Qin Yan; Saeed Vaseghi; Esfandiar Zavarehei; Ben Milner; Jonathan Darch; P.R. White; Ioannis Andrianakis