Paul M. McCourt | Researchain

Publication

Featured researches published by Paul M. McCourt.

international conference on acoustics speech and signal processing | 1998

Multi-resolution cepstral features for phoneme recognition across speech sub-bands

Paul M. McCourt; S. Vaseghi; Naomi Harte

Multi-resolution sub-band cepstral features strive to exploit discriminative cues in localised regions of the spectral domain by supplementing the full-bandwidth cepstral features with sub-band cepstral features derived from several levels of sub-band decomposition. Multi-resolution feature vectors, formed by concatenating the sub-band cepstral features into an extended feature vector, are shown to yield better performance than conventional MFCCs for phoneme recognition on the TIMIT database. Possible strategies for recombining partial recognition scores from independent multi-resolution sub-band models are explored. By exploiting the sub-band variations in signal-to-noise ratio for linearly weighted recombination of the log-likelihoods, we obtained improved phoneme recognition performance in broadband noise compared to MFCC features. This is an advantage over a purely sub-band approach using non-linear recombination, which is robust only to narrow-band noise.
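The feature construction described in this abstract can be illustrated with a minimal sketch. It assumes a frame of 24 log filterbank energies, three decomposition levels (1, 2 and 4 equal-width sub-bands) and 12, 6 and 3 cepstral coefficients per band; these sizes, the function names and the unnormalised DCT are illustrative assumptions, not the configuration used in the paper.

```python
import math

def dct2(x, n_coeffs):
    """Unnormalised DCT-II of x, keeping the first n_coeffs coefficients."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        for k in range(n_coeffs)
    ]

def multires_cepstra(log_energies, levels=(1, 2, 4), coeffs_per_band=(12, 6, 3)):
    """Concatenate cepstra of the full band and of equal-width sub-bands.

    levels and coeffs_per_band are hypothetical settings chosen for illustration.
    """
    features = []
    for n_bands, n_coeffs in zip(levels, coeffs_per_band):
        width = len(log_energies) // n_bands
        for b in range(n_bands):
            band = log_energies[b * width:(b + 1) * width]
            # Each band gets its own DCT, so its cepstra describe only
            # the local spectral shape inside that band.
            features.extend(dct2(band, min(n_coeffs, len(band))))
    return features

# One synthetic frame of 24 log filterbank energies.
frame = [math.log(1.0 + i) for i in range(24)]
vector = multires_cepstra(frame)
```

With these assumed settings the extended vector has 36 entries: 12 from the full band, 12 from the two half-bands and 12 from the four quarter-bands.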

international conference on acoustics, speech, and signal processing | 2000

State based sub-band LP Wiener filters for speech enhancement in car environments

Aimin Chen; Saeed Vaseghi; Paul M. McCourt

The performance of Wiener filters in restoring the quality and intelligibility of noisy speech depends on (i) the accuracy of the estimates of the power spectra or the correlation values of the noise and the speech processes, and (ii) the Wiener filter structure. In this paper, a Bayesian method is proposed in which model combination and model decomposition are employed to estimate the parameters required to implement sub-band LP Wiener filters. The use of sub-band LP Wiener filters provides advantages in terms of improved parameter estimates and also in restoring the temporal-spectral composition of speech. The method is evaluated, and compared with parallel model combination, using the TIMIT continuous speech database with BMW and VOLVO car noise databases.
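To show why the quality of the power spectrum estimates matters, here is a minimal sketch of the textbook frequency-domain Wiener gain, H = P_speech / (P_speech + P_noise), applied per frequency bin. It is not the state-based sub-band LP method of the paper; the function name and the example spectra are assumptions made for illustration.

```python
def wiener_gain(speech_psd, noise_psd):
    """Per-bin Wiener gain H = Ps / (Ps + Pn); 0.0 where both estimates are zero."""
    return [
        s / (s + n) if (s + n) > 0 else 0.0
        for s, n in zip(speech_psd, noise_psd)
    ]

# Hypothetical power spectrum estimates for three frequency bins.
gains = wiener_gain([1.0, 3.0, 0.0], [1.0, 1.0, 2.0])
```

Bins where the estimated speech power dominates are passed almost unchanged, while bins dominated by noise are attenuated, so any error in either estimate translates directly into a wrong gain.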

Computer Speech & Language | 2000

Multi-resolution sub-band features and models for HMM-based phonetic modelling

Paul M. McCourt; Saeed Vaseghi; Bernard Doherty

HMM acoustic models are typically trained on a single set of cepstral features extracted over the full bandwidth of mel-spaced filterbank energies. In this paper, multi-resolution sub-band transformations of the log energy spectra are introduced based on the conjecture that additional cues for phonetic discrimination may exist in the local spectral correlates not captured by the full-band analysis. In this approach the discriminative contribution from sub-band features is considered to supplement rather than substitute for full-band features. HMMs trained on concatenated multi-resolution cepstral features are investigated, along with models based on linearly combined independent multi-resolution streams, in which the sub-band and full-band streams represent different resolutions of the same signal. For the stream-based models, discriminative training of the linear combination weights to a minimum classification error criterion is also applied. Both the concatenated feature and the independent stream modelling configurations are demonstrated to outperform traditional full-band cepstra for HMM-based acoustic phonetic modelling on the TIMIT database. Experiments on context-independent modelling achieve a best increase on the core test set from an accuracy of 62.3% for full-band models to a 67.5% accuracy for discriminatively weighted multi-resolution sub-band modelling. A triphone accuracy of 73.9% achieved on the core test set improves notably on full-band cepstra and compares well with results previously published on this task.
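The linear combination of independent streams mentioned in this abstract can be sketched as a weighted sum of per-stream log-likelihoods. The example below assumes two streams (full-band and sub-band), two phone classes and hand-picked weights; the discriminative training of the weights is not shown, and all names and numbers are hypothetical.

```python
def combine_streams(stream_loglik, weights):
    """Return the best class and the weighted sum of stream log-likelihoods.

    stream_loglik maps each class label to a list with one log-likelihood
    per stream; weights holds one weight per stream.
    """
    scores = {
        label: sum(w * ll for w, ll in zip(weights, logliks))
        for label, logliks in stream_loglik.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical scores: [full-band stream, sub-band stream].
logliks = {"aa": [-10.0, -14.0], "iy": [-11.0, -12.0]}

full_band_only, _ = combine_streams(logliks, [1.0, 0.0])
equal_weights, _ = combine_streams(logliks, [0.5, 0.5])
```

In this toy case the full-band stream alone prefers "aa", while equal weighting lets the sub-band evidence change the decision to "iy".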

international conference on acoustics, speech, and signal processing | 2000

Full covariance modelling and adaptation in sub-bands

Bernard Doherty; Saeed Vaseghi; Paul M. McCourt

With regard to the current interest in sub-band based modelling in the ASR community, this paper explores the gains in recognition performance and complexity reduction achieved by sub-band based full covariance modelling and speaker adaptation. With sub-band features, instead of a single large covariance matrix, it is now possible to have a set of smaller matrices, making it practical to use Gaussian distributions employing full covariance matrices. This benefit is further demonstrated to give a significant complexity reduction in the implementation of speaker adaptation by maximum likelihood linear regression. The use of sub-band cepstra moreover presents the opportunity of capturing localised discriminative cues which contribute to increased recognition. In light of these gains, this paper explores the advantages of sub-band full covariance modelling and presents experimental evaluation on the WSJCAM0 continuous speech database.
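The complexity argument can be made concrete by counting parameters. The sketch below assumes a 36-dimensional feature vector split into four 9-dimensional sub-bands; these dimensions are illustrative and are not taken from the paper. A symmetric d x d covariance matrix has d(d+1)/2 free parameters, and an MLLR mean transform for d dimensions is a d x (d+1) matrix.

```python
def full_cov_params(d):
    """Free parameters in one symmetric d x d covariance matrix."""
    return d * (d + 1) // 2

def mllr_params(d):
    """Entries in one d x (d + 1) MLLR mean transform."""
    return d * (d + 1)

full_dim = 36           # assumed full feature dimension
band_dims = [9, 9, 9, 9]  # assumed sub-band dimensions

single_cov = full_cov_params(full_dim)
subband_cov = sum(full_cov_params(d) for d in band_dims)
single_mllr = mllr_params(full_dim)
subband_mllr = sum(mllr_params(d) for d in band_dims)
```

Under these assumptions one full covariance matrix needs 666 parameters per Gaussian against 180 for the four sub-band matrices, and the transform shrinks from 1332 to 360 entries.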

international conference on acoustics speech and signal processing | 1999

Discriminative spectral-temporal multiresolution features for speech recognition

Philip McMahon; Naomi Harte; Saeed Vaseghi; Paul M. McCourt

Multi-resolution features, which are based on the premise that there may be more cues for phonetic discrimination in a given sub-band than in another, have been shown to outperform the standard MFCC feature set for both classification and recognition tasks on the TIMIT database. This paper presents an investigation into possible strategies to extend these ideas from the spectral domain into both the spectral and temporal domains. Experimental work on the integration of segmental models, which are better at capturing the longer-term correlation within a phonetic unit, into the discriminative multi-resolution framework is presented. Results are presented which show that including this supplementary temporal information offers an improvement in performance for the phoneme classification task over the standard multi-resolution MFCC feature set with time derivatives appended. Possible strategies for the extension of these techniques into the area of continuous speech recognition are discussed.

international conference on acoustics, speech, and signal processing | 1993

Transform coding at 4.8 kbit/sec using interleaving of transform frames and dual gain-shape vector quantisation

Paul M. McCourt; H. A. Kaouri

The dual gain-shape VQ (vector quantization) transform coding algorithm refines the distribution of quantization resources by supplementing interband bit allocation with intraband reallocation, and, by introducing two levels of vector normalization, allows the design of more efficient codebooks. Better use of limited bit resources at the coding rate of 4.8 kbit/s is thus achieved, yielding a 1 dB improvement in coding performance. The introduction of frame overlapping in the transform domain, by implementing a continuous frame interleave, gives significant improvements in the subjective quality of the coded speech. The effect produced by interleaving could be explained as an exploitation of interframe correlation, in creating vectors for VQ, and as an implicit promotion of frame interdependency in bit allocations.
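As background for the normalization idea, here is a minimal sketch of ordinary single-level gain-shape vector quantization: the vector norm (gain) is quantized with a scalar codebook and the unit-norm direction (shape) with a vector codebook. The paper's dual scheme adds a second level of normalization that is not reproduced here, and the tiny codebooks below are made up for illustration.

```python
import math

def gain_shape_encode(x, gain_codebook, shape_codebook):
    """Return (gain index, shape index) for vector x."""
    gain = math.sqrt(sum(v * v for v in x))
    shape = [v / gain for v in x] if gain > 0 else list(x)
    # Nearest scalar codeword for the gain.
    g_idx = min(range(len(gain_codebook)),
                key=lambda i: abs(gain_codebook[i] - gain))
    # For unit-norm codewords, the largest inner product is the closest shape.
    s_idx = max(range(len(shape_codebook)),
                key=lambda i: sum(a * b for a, b in zip(shape, shape_codebook[i])))
    return g_idx, s_idx

def gain_shape_decode(g_idx, s_idx, gain_codebook, shape_codebook):
    """Rebuild the vector as quantized gain times quantized shape."""
    return [gain_codebook[g_idx] * c for c in shape_codebook[s_idx]]

# Hypothetical codebooks; every shape codeword has unit norm.
gains = [1.0, 2.0, 4.0]
shapes = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]

indices = gain_shape_encode([1.2, 1.6], gains, shapes)
decoded = gain_shape_decode(indices[0], indices[1], gains, shapes)
```

Separating gain from shape lets one shape codebook serve vectors of very different energy, which is the kind of saving that matters at a rate as low as 4.8 kbit/s.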

conference of the international speech communication association | 1998