Fergus R. McInnes
University of Edinburgh
Publication
Featured research published by Fergus R. McInnes.
Journal of the Acoustical Society of America | 2002
Maurílio Nunes Vieira; Fergus R. McInnes; Mervyn A. Jack
This study compared acoustic and electroglottographic (EGG) jitter from [a] vowels of 103 dysphonic speakers. The EGG recordings were chosen according to their intensity, signal-to-noise ratio, and percentage of unvoiced intervals, while acoustic signals were selected based on voicing detection and the reliability of jitter extraction. The agreement between jitter measures was expressed numerically as a normalized difference. In 63.1% (65/103) of the cases the differences fell within +/-22.5%. Positive differences above +22.5% were associated with increased acoustic jitter and occurred in 12.6% (13/103) of the speakers. These were, typically, cases of small nodular lesions without problems in the posterior larynx. On the other hand, substantial rises in EGG jitter leading to differences below -22.5% took place in 24.3% (25/103) of the speakers and were related to hyperfunctional voices, creaky-like voices, small laryngeal asymmetries affecting the arytenoids, or small-to-moderate glottal chinks. A clinically relevant outcome of the study was the possibility of detecting gentle laryngeal asymmetries among cases of large unilateral increase in EGG jitter. These asymmetries can be linked with vocal problems that are often overlooked in endoscopic examinations.
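The banding of cases described in this abstract can be sketched as follows. The abstract does not state how the difference was normalized, so dividing by the mean of the two measures is an assumption made purely for illustration, and the function names are hypothetical. The sign convention (acoustic minus EGG) follows the abstract, where positive differences correspond to increased acoustic jitter.

```python
def normalized_difference(acoustic_jitter, egg_jitter):
    """Signed difference in percent, relative to the mean of the two measures.

    The choice of the mean as the normalizer is an assumption; the abstract
    only says that a normalized difference was used.
    """
    mean = (acoustic_jitter + egg_jitter) / 2.0
    if mean == 0:
        return 0.0
    return 100.0 * (acoustic_jitter - egg_jitter) / mean


def classify(diff_percent, threshold=22.5):
    """Assign a case to one of the three bands reported in the abstract."""
    if diff_percent > threshold:
        return "acoustic_higher"
    if diff_percent < -threshold:
        return "egg_higher"
    return "agreement"


# Case counts taken from the abstract (103 speakers in total).
counts = {"agreement": 65, "acoustic_higher": 13, "egg_higher": 25}
total = sum(counts.values())
shares = {band: round(100.0 * n / total, 1) for band, n in counts.items()}
```

The `shares` dictionary recomputes the percentages quoted in the abstract from the raw counts, which is a quick consistency check on the reported figures.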
Pattern Recognition | 1996
Jeng-Shyang Pan; Fergus R. McInnes; Mervyn A. Jack
Some fast clustering algorithms for vector quantization (VQ) based on the LBG recursive algorithm are presented and compared. Experimental results on speech data show that, compared with the conventional VQ clustering algorithm, the best approach saves more than 99% of the multiplications and gives a considerable saving in the number of additions. The increase in the number of comparisons is moderate. An improved absolute error inequality (AEI) criterion for the Euclidean distortion measure is also proposed and utilized in the VQ clustering algorithm.
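To illustrate how an absolute-error test can avoid multiplications during the nearest-codeword search, here is a minimal sketch of the basic inequality. It is not the improved criterion proposed in the paper, whose details are not given in the abstract; the function name and return values are hypothetical. The test relies on the Cauchy-Schwarz bound: for k-dimensional vectors, the squared sum of absolute errors is at most k times the squared Euclidean distance.

```python
import math


def nearest_codeword(x, codebook):
    """Return (index, squared_distance, full_distance_count) for vector x.

    A codeword is rejected using additions only whenever its summed absolute
    error already proves it cannot beat the current best squared distance.
    """
    k = len(x)
    best_index = 0
    best_d2 = sum((a - b) ** 2 for a, b in zip(x, codebook[0]))
    full_count = 1  # number of full squared-distance computations
    for i in range(1, len(codebook)):
        l1 = sum(abs(a - b) for a, b in zip(x, codebook[i]))
        # Cauchy-Schwarz: l1**2 <= k * d2, so l1 >= sqrt(k * best_d2)
        # implies d2 >= best_d2 and the codeword can be skipped.
        if l1 >= math.sqrt(k * best_d2):
            continue
        d2 = sum((a - b) ** 2 for a, b in zip(x, codebook[i]))
        full_count += 1
        if d2 < best_d2:
            best_index, best_d2 = i, d2
    return best_index, best_d2, full_count
```

In an LBG-style clustering loop this search would be called once per training vector per iteration, so every rejected codeword saves k multiplications at the cost of one extra comparison.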
IEEE Transactions on Speech and Audio Processing | 1998
Nestor Becerra Yoma; Fergus R. McInnes; Mervyn A. Jack
Addresses the problem of speech recognition with signals corrupted by additive noise at moderate signal-to-noise ratio (SNR). A model for additive noise is presented and used to compute the uncertainty about the hidden clean signal so as to weight the estimation provided by spectral subtraction. Weighted dynamic time warping (DTW) and Viterbi (HMM) algorithms are tested, and the results show that weighting the information along the signal can substantially increase the performance of spectral subtraction, an easily implemented technique, even with a poor estimation for noise and without using any information about the speaker. It is also shown that the weighting procedure can reduce the error rate when cepstral mean normalization is also used to cancel the convolutional noise.
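The idea of weighting information along the signal can be sketched with a dynamic time warping routine in which each test frame carries a reliability weight. This is only an illustration under stated assumptions: the abstract does not say how the weights are derived from the noise model or how they enter the DTW recursion, so here they are supplied by the caller and simply scale the local squared Euclidean distance. The function name and the weight range are hypothetical.

```python
def weighted_dtw(reference, test, weights):
    """Accumulated DTW cost with per-frame reliability weights.

    reference, test: lists of feature vectors (lists of numbers).
    weights[j]: assumed reliability of test frame j, 0 (ignore) to 1 (trust).
    """
    n, m = len(reference), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((a - b) ** 2 for a, b in zip(reference[i - 1], test[j - 1]))
            local = weights[j - 1] * d  # unreliable frames contribute less
            cost[i][j] = local + min(cost[i - 1][j],      # reference advances
                                     cost[i][j - 1],      # test advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

With all weights set to 1 this reduces to ordinary DTW with symmetric step constraints, so the weighting can be switched on and off without changing the rest of a recognizer.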
International Conference on Acoustics, Speech, and Signal Processing | 1992
J. L. Hieronymus; D. McKelvie; Fergus R. McInnes
The authors describe the results of an experiment to study the effectiveness of using acoustic stress to improve automatic speech recognition. The CSTR speech recognition system uses hidden semi-Markov models (HSMM) with a separate lexical search component. A hybrid prosodic component has been included which determines the sentence level stress and marks the vowel of stressed syllables as stressed in the phoneme lattice. Lexical stress is marked on all content words in the lexicon. Adding stress information to the system in this way resulted in a 65% reduction in word error rate and a 45% reduction in sentence error rate, relative to a baseline system without prosody.
International Conference on Acoustics, Speech, and Signal Processing | 1998
Néstor Becerra Yoma; Fergus R. McInnes; Mervyn A. Jack
A weighted Viterbi algorithm (HMM) is proposed and applied in combination with spectral subtraction and cepstral mean normalization to cancel both additive and convolutional noise in speech recognition. The weighted Viterbi approach is compared with, and used in combination with, state duration modelling. The results show that a proper weight on the information provided by static parameters can substantially reduce the error rate, and that the weighting procedure improves the robustness of the Viterbi algorithm more than the introduction of temporal constraints does, with a low computational load. Finally, it is shown that the weighted Viterbi algorithm in combination with temporal constraints leads to a high recognition accuracy at moderate SNRs without the need for an accurate noise model.
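One way to picture a weighted Viterbi decoder is to scale each frame's emission log-likelihood by a reliability weight before it enters the recursion. The sketch below makes that assumption explicitly; the abstract does not specify the weighting rule, and the function name, argument layout, and weights are hypothetical.

```python
def weighted_viterbi(log_init, log_trans, log_emit, weights):
    """Best state path and its log score with per-frame emission weights.

    log_init[s], log_trans[p][s], log_emit[t][s]: log probabilities.
    weights[t]: assumed reliability of frame t; 0 ignores the frame's
    acoustic evidence, 1 gives the standard Viterbi algorithm.
    """
    n_states = len(log_init)
    score = [log_init[s] + weights[0] * log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[prev] + log_trans[prev][s]
                             + weights[t] * log_emit[t][s])
            pointers.append(prev)
        score = new_score
        back.append(pointers)
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]
```

Down-weighting a frame lets the transition structure dominate at that point, which is the intuition behind trusting noisy frames less.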
International Conference on Spoken Language Processing | 1996
Maurílio Nunes Vieira; Fergus R. McInnes; Mervyn A. Jack
Dysphonic voices were used to compare electroglottographic (EGG) and acoustic measures of fundamental frequency (F0) and jitter using a wave-matching and an event-based technique. Continuous speech was considered in the first part of the study, where the effects of pre-filtering the acoustic signals and linearly smoothing the F0 contours were analysed. The second part of the investigation compared jitter from sustained vowels (/i/, /a/, /u/), resulting in poor agreement for /i/ and /u/. In /a/ vowels, however, a relatively small mean normalised absolute difference (10.95%) was obtained with a proposed method that combines peak picking and zero crossings, and that is able to detect a waveform pattern observed in such vowels and reject unreliable measures.
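For readers unfamiliar with jitter, a commonly used definition (local jitter) is the mean absolute difference between consecutive glottal cycle periods divided by the mean period. The sketch below uses that definition as an assumption; the abstract does not say which jitter formula the study applied, and the function name is hypothetical. The period sequence would come from whichever cycle-marking method is used, such as peak picking or zero crossings.

```python
def local_jitter_percent(periods):
    """Local jitter in percent from a sequence of cycle periods.

    periods: durations of consecutive cycles, in any consistent time unit.
    """
    if len(periods) < 2:
        raise ValueError("at least two periods are needed")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_abs_diff / mean_period
```

A perfectly periodic signal gives zero jitter, while cycle lengths that alternate between 9 and 11 time units give a large value, because every consecutive pair differs by 2 units against a mean period of 10.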
International Journal of Bank Marketing | 2009
Gareth Peevers; Fergus R. McInnes; Hazel Morton; A. Matthews; Mervyn A. Jack
Purpose – The purpose of this paper is to deliver empirical data comparing the effects of music with the effects of providing waiting time information on customers who are kept on hold when telephoning their bank. It aims to discover if either has a more positive impact on their affective responses (satisfaction), and to discern if these effects are measurably different to a telephone call without music, or waiting time information, and for different durations of wait. Design/methodology/approach – The methodology is an empirical study using bank customers as participants. Questionnaires and user observation techniques are employed to collect quantitative data which are analysed using repeated measures ANOVAs. Findings – Overall the presence of updates, or music, has a positive influence on satisfaction when compared to just a ringing tone, but for a waiting time of one minute music has no influence on satisfaction. The acceptable waiting time threshold plays a very critical influence on satisfaction with ...
IEEE Transactions on Speech and Audio Processing | 2001
Néstor Becerra Yoma; Fergus R. McInnes; Mervyn A. Jack; Sandra Dotto Stump; Lee Luan Ling
This paper addresses the problem of temporal constraints in the Viterbi algorithm using conditional transition probabilities. The results presented here suggest that, in a speaker-dependent small-vocabulary task, the statistical modelling of state durations is not relevant if maximum and minimum state duration restrictions are imposed, and that truncated probability densities give better results than a previously proposed metric [1]. Finally, context-dependent and context-independent temporal restrictions are compared in a connected-word speech recognition task, and it is shown that the former lead to better results with the same computational load.
ACM Multimedia | 2013
Chidansh Amitkumar Bhatt; Maryam Habibi; Sandy Ingram; Stefano Masneri; Fergus R. McInnes; Nikolaos Pappas; Oliver Schreer
This paper presents the MUST-VIS system for the MediaMixer/VideoLectures.NET Temporal Segmentation and Annotation Grand Challenge. The system allows users to visualize a lecture as a series of segments represented by keyword clouds, with relations to other similar lectures and segments. Segmentation is performed using a multi-factor algorithm which takes advantage of the audio (through automatic speech recognition and word-based segmentation) and video (through the detection of actions such as writing on the blackboard). The similarity across segments and lectures is computed using a content-based recommendation algorithm. Overall, the graph-based representation of segment similarity appears to be a promising and cost-effective approach to navigating lecture databases.
Interacting with Computers | 2011
Nancie Gunson; Diarmid Marshall; Fergus R. McInnes; Mervyn A. Jack
This paper describes an experiment to investigate the usability of voiceprints for customer authentication in automated telephone banking. The usability of voiceprint authentication using digits (random strings and telephone numbers) and using sentences (branded and unbranded) is compared in a controlled experiment with 204 telephone banking customers. Results indicate high levels of usability and customer acceptance for voiceprint authentication in telephone banking. Customers find voiceprint authentication based on digits more usable than that based on sentences, and a majority of participants would prefer to use digits.