Publication


Featured research published by Michael J. Carey.


international conference on spoken language processing | 1996

Robust prosodic features for speaker identification

Michael J. Carey; Eluned S. Parris; Harvey Lloyd-Thomas; Stephen J. Bennett

The paper describes the use of prosodic features for speaker identification. Features based on the pitch and energy contours of speech are described and the relative importance of each feature for speaker identification is investigated. The mean and variance of the pitch period in voiced sections of speech are shown to be particularly useful at discriminating between speakers. Fusing these features with a hidden Markov model speaker identification system gave a marked improvement in figure of merit; over 30% gain was achieved on the six NIST 1995 evaluation tests presented. Handset variability is known to have an adverse effect on performance when traditional spectral features are used, e.g. cepstra. Results are presented showing that the prosodic features are more robust to handset variability.
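As a rough illustration of the pitch-period statistics mentioned in the abstract above, the following minimal sketch computes the mean and variance of the pitch period over voiced frames. The function name `pitch_period_stats`, the input format, and the convention that a value of 0 marks an unvoiced frame are assumptions made for illustration; the paper's actual feature extraction is not reproduced here.

```python
from statistics import mean, pvariance


def pitch_period_stats(f0_hz):
    """Return (mean, variance) of the pitch period in seconds over voiced frames.

    f0_hz holds one fundamental-frequency estimate per frame, in Hz.
    A value of 0 is assumed (hypothetically) to mark an unvoiced frame.
    """
    # Pitch period is the reciprocal of the fundamental frequency.
    periods = [1.0 / f for f in f0_hz if f > 0]
    if not periods:
        raise ValueError("no voiced frames in the pitch track")
    return mean(periods), pvariance(periods)


# Toy pitch track: two unvoiced frames and three voiced frames.
track = [0.0, 100.0, 125.0, 0.0, 100.0]
period_mean, period_var = pitch_period_stats(track)
```

These two numbers would then be one input among several to a speaker model; how they are fused with the hidden Markov model scores is not specified in the abstract and is left out of the sketch.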


international conference on acoustics speech and signal processing | 1996

Language independent gender identification

Eluned S. Parris; Michael J. Carey

This paper describes a novel technique specifically developed for gender identification which combines acoustic analysis and pitch. Two sets of hidden Markov models, male and female, are matched to the speech using the Viterbi algorithm, and the most likely sequence of models with corresponding likelihood scores is produced. Linear discriminant analysis is used to normalise the models and reduce bias towards a particular gender. An enhanced version of the pitch estimation algorithm used for IMBE speech coding is used to give an average pitch estimate for the speaker. The information provided by the acoustic analysis and pitch estimation is combined using a linear classifier to identify the gender of the speech. The system was tested on three British English databases, giving less than 1% identification error rate with two seconds of speech. Further tests without optimisation on eleven languages of the OGI database gave error rates less than 5.2% and an average of 2.0%.


international conference on acoustics, speech, and signal processing | 1991

A speaker verification system using alpha-nets

Michael J. Carey; Eluned S. Parris; J.S. Bridle

Speaker verification is performed by comparing the output probabilities of two Markov models of the same phonetic unit. One of these Markov models is speaker-specific, being built from utterances from the speaker whose identity is to be verified. The second model is built from utterances from a large population of speakers. The performance of the system is improved by treating the pair of models as a connectionist network, an alpha-net, which then allows discriminative training to be carried out. Experimental results show that adapting the spectral observation probabilities of each state of the model by the back propagation of errors can correct misclassification errors. The real-time implementation of the system produced an average digit error rate of 4.5% and only one misclassification in 600 trials using a five-digit sequence.
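The core comparison in the abstract above, a speaker-specific model scored against a population model, can be sketched as an average log-likelihood ratio. This is a simplified illustration only: it swaps the Markov models for single diagonal-covariance Gaussians and omits the alpha-net discriminative training entirely. The names `gaussian_loglik` and `llr_score` and the toy parameters are hypothetical.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-likelihood of feature vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def llr_score(frames, speaker, world):
    """Average per-frame log-likelihood ratio of speaker model vs. population model.

    speaker and world are (mean, variance) pairs; a positive score favours
    the claimed speaker.
    """
    total = sum(
        gaussian_loglik(f, *speaker) - gaussian_loglik(f, *world) for f in frames
    )
    return total / len(frames)


# Toy models (assumed values): unit variances, means one unit apart per dimension.
speaker_model = ([1.0, 1.0], [1.0, 1.0])
world_model = ([0.0, 0.0], [1.0, 1.0])
score = llr_score([[1.0, 1.0]], speaker_model, world_model)
```

A claimed identity would be accepted when the score exceeds a threshold chosen on held-out data.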


international conference on acoustics speech and signal processing | 1999

Improving a GMM speaker verification system by phonetic weighting

Roland Auckenthaler; Eluned S. Parris; Michael J. Carey

This paper compares two approaches to speaker verification, Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The GMM-based system outperformed the HMM system, mainly owing to the ability of the GMM to make better use of the training data. The best scoring GMM frames were strongly correlated with particular phonemes, e.g. vowels and nasals. Two techniques were used to try to exploit the different amounts of discrimination provided by the phonemes to improve the performance of the GMM-based system. Applying linear weighting to the phonemes showed that less than half of the phonemes were contributing to the overall system performance. Using an MLP to weight the phonemes provided a significant improvement in performance for male speakers but no improvement has yet been achieved for women.
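One way to picture the linear phoneme weighting described in the abstract above is a weighted average of per-frame scores, where each frame's weight depends on its phoneme label. The sketch below assumes that frame-level scores and phoneme labels are already available; the function name `weighted_score` and the example weights are hypothetical and are not taken from the paper.

```python
def weighted_score(frame_scores, frame_phonemes, weights):
    """Weighted average of per-frame verification scores.

    frame_scores:   one score per frame (e.g. a log-likelihood ratio).
    frame_phonemes: the phoneme label assigned to each frame.
    weights:        phoneme -> non-negative weight; unlisted phonemes get 0.
    """
    num = sum(weights.get(p, 0.0) * s for s, p in zip(frame_scores, frame_phonemes))
    den = sum(weights.get(p, 0.0) for p in frame_phonemes)
    if den == 0:
        raise ValueError("all frames have zero weight")
    return num / den


# Assumed weights: vowels and nasals count, the fricative is ignored.
weights = {"aa": 1.0, "n": 0.5, "s": 0.0}
score = weighted_score([2.0, 1.0, -4.0], ["aa", "n", "s"], weights)
```

Setting a phoneme's weight to zero removes its frames from the decision, which mirrors the observation that fewer than half of the phonemes contributed to performance.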


international conference on acoustics speech and signal processing | 1996

Statistical models for topic identification using phoneme substrings

Jerry H. Wright; Michael J. Carey; Eluned S. Parris

Phoneme substrings that are recurrent within training data are detected and logged using dynamic programming procedures. The resulting keystrings (cluster centroids) are awarded a usefulness rating based on smoothed occurrence probabilities in wanted and unwanted data. The rankings of the keystrings by usefulness measured on training, development test and final test data for three language-pairs from the OGI multi-language corpus are highly consistent, showing that language-specific features are being found. Statistical measures of local association also suggest that keystring occurrences can be correlated in a manner similar to that of keywords for a particular topic. With improved recognition accuracy it should be possible to exploit this information in order to enhance performance in topic identification.


international conference on acoustics, speech, and signal processing | 1995

Improved topic spotting through statistical modelling of keyword dependencies

Jerry H. Wright; Michael J. Carey; Eluned S. Parris

Keywords are chosen on the basis of their usefulness for discriminating a topic from background speech. Good topic recognition can be achieved with a small set of well-chosen keywords, but particular combinations of keywords often achieve better discrimination than can be accounted for by regarding them as independent. This paper describes a higher-order statistical approach involving models of keyword-topic interdependence. A linear-logistic model brings some improvement in performance, but better results are obtained using log-linear contingency table models. Although the potential number of these is very large, good models tend to be simple and are suggested by heuristic measures inferred from the training data. The approach is tested using a broadcast radio database.


international conference on acoustics, speech, and signal processing | 2000

User validation for mobile telephones

Michael J. Carey; Roland Auckenthaler

A combination of text-independent speaker verification and user profiling as a new biometric for crime prevention on mobile telephones is proposed. The verification is carried out on the speech throughout the call and hence obviates the need for direct user involvement while providing high impostor rejection. Low user rejection is achieved by monitoring the pattern of numbers called. While the pattern is substantially unchanged, the speaker verification threshold is low, minimising the level of false rejections. The threshold is raised if the calling pattern deviates from the normal. Analysis of a limited number of user call records shows that users tend to call a small set of numbers repetitively and that deviations from this pattern are infrequent. Tests of a Gaussian mixture model based speaker verification system on an appropriate database gave an equal error rate of 4%, showing that a text-independent system can approach the performance of a text-dependent system.
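The threshold logic described in the abstract above can be sketched as follows: the verification threshold stays low while recent calls go to familiar numbers and is raised when the calling pattern deviates. Every name and numeric value here (`verification_threshold`, `accept_call`, `low`, `high`, `max_unfamiliar`) is a hypothetical choice for illustration; the abstract does not state how deviation is measured or which thresholds were used.

```python
def verification_threshold(recent_numbers, usual_numbers,
                           low=0.0, high=2.0, max_unfamiliar=0.5):
    """Pick a speaker-verification threshold from the recent calling pattern.

    recent_numbers: numbers dialled recently, most recent last.
    usual_numbers:  set of numbers the legitimate user normally calls.
    Returns `low` while the pattern looks normal, `high` once the fraction
    of unfamiliar numbers exceeds `max_unfamiliar` (all values assumed).
    """
    if not recent_numbers:
        return high  # no history yet: be conservative
    unfamiliar = sum(1 for n in recent_numbers if n not in usual_numbers)
    if unfamiliar / len(recent_numbers) > max_unfamiliar:
        return high
    return low


def accept_call(score, recent_numbers, usual_numbers):
    """Accept the speaker if the verification score meets the current threshold."""
    return score >= verification_threshold(recent_numbers, usual_numbers)


usual = {"111", "222"}
normal_pattern = ["111", "222", "111"]
odd_pattern = ["999", "888", "111"]
```

With these assumed values, the same verification score of 1.0 is accepted under `normal_pattern` and rejected under `odd_pattern`.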


international conference on acoustics, speech, and signal processing | 1994

Automatic language identification using sub-word models

Roger C. F. Tucker; Michael J. Carey; Eluned S. Parris

The paper describes initial experiments on automatic language identification with the particular aim of discriminating languages in the same language group. Subword models were built from the English, Dutch and Norwegian sections of the EUROM1 database using fully automatic segmentation based on TIMIT-derived models. Three techniques were then examined. In the first technique only acoustic differences between the phonemes of each language were used. The second technique relied on the relative frequencies of the phonemes of each language, while the third technique combined the two sources of information. The latter technique proved the best giving 97% accuracy for English vs. Dutch, and 90% across the three languages.
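The second technique in the abstract above, which relies on relative phoneme frequencies, might look roughly like a unigram model per language with the decision made by log-likelihood. This sketch assumes add-one smoothing, a shared phoneme inventory, and toy data; none of these details come from the paper, and the names `train_unigram` and `identify_language` are hypothetical.

```python
import math
from collections import Counter


def train_unigram(phonemes, inventory):
    """Estimate smoothed relative phoneme frequencies for one language.

    Add-one smoothing (an assumption) keeps unseen phonemes from having
    zero probability.
    """
    counts = Counter(phonemes)
    total = len(phonemes) + len(inventory)
    return {p: (counts[p] + 1) / total for p in inventory}


def identify_language(phonemes, models):
    """Return the language whose unigram model gives the highest log-likelihood.

    Every phoneme in the input is assumed to be in the shared inventory.
    """
    def loglik(model):
        return sum(math.log(model[p]) for p in phonemes)

    return max(models, key=lambda lang: loglik(models[lang]))


# Toy inventory and training strings, for illustration only.
inventory = ["a", "b", "c"]
models = {
    "X": train_unigram(list("aaab"), inventory),
    "Y": train_unigram(list("bccc"), inventory),
}
guess = identify_language(list("aab"), models)
```

The combined technique from the abstract would add acoustic scores to these frequency-based scores; that step is not sketched here.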


IEEE Signal Processing Letters | 2012

Contrasting the Effects of Different Frequency Bands on Speaker and Accent Identification

Saeid Safavi; Abualsoud Hanani; Martin J. Russell; Peter Jancovic; Michael J. Carey

This letter presents an experimental study investigating the effect of frequency sub-bands on regional accent identification (AID) and speaker identification (SID) performance on the ABI-1 corpus. The AID and SID systems are based on Gaussian mixture modeling. The SID experiments show up to 100% accuracy when using the full 11.025 kHz bandwidth. The best AID performance of 60.34% is obtained when using band-pass filtered (0.23-3.4 kHz) speech. The experiments using isolated narrow sub-bands show that the regions (0-0.77 kHz) and (3.40-11.02 kHz) are the most useful for SID, while those in the region (0.34-3.44 kHz) are best for AID. AID experiments are also performed with intersession variability compensation, which provides the biggest performance gain in the (2.23-5.25 kHz) region.


Speech Communication | 2010

A segment selection technique for speaker verification

Mohaddeseh Nosratighods; Eliathamby Ambikairajah; Julien Epps; Michael J. Carey

The performance of speaker verification systems degrades considerably when the test segments are utterances of very short duration. This might be either due to variations in score-matching arising from the unobserved speech sounds of short speech utterances or the fact that the shorter the utterance, the greater the effect of individual speech sounds on the average likelihood score. In other words, the effects of individual speech sounds have not been cancelled out by a large number of speech sounds in very short utterances. This paper presents a score-based segment selection technique for discarding portions of speech that result in poor discrimination ability in a speaker verification task. Theory is developed to detect the most significant and reliable speech segments based on the probability that the test segment comes from a fixed set of cohort models. This approach, suitable for any duration of test utterance, reduces the effect of acoustic regions of the speech that are not accurately modelled due to sparse training data, and makes a decision based only on the segments that provide the best-matched scores from the segment selection algorithm. The proposed segment selection technique provides reductions in relative error rate of 22% and 7% in terms of minimum Detection Cost Function (DCF) and Equal Error Rate (EER) compared with a baseline that used segment-based normalization, when evaluated on the short utterances of the NIST 2002 Speaker Recognition Evaluation dataset.

Collaboration


Top Co-Authors

Peter Jancovic

University of Birmingham

Saeid Safavi

University of Birmingham

Ying Liu

University of Birmingham

Julien Epps

University of New South Wales
