William A. Ainsworth
Keele University
Publications
Featured research published by William A. Ainsworth.
International Journal of Human-computer Studies / International Journal of Man-machine Studies | 1992
William A. Ainsworth; S. R. Pratt
Abstract In a noisy environment speech recognizers make mistakes. So that these errors can be detected, the system can synthesize the word it has recognized, and the user can respond by saying “correction” when the word was not recognized correctly. The mistake can then be corrected. Two error-correcting strategies have been investigated. In one, repetition-with-elimination, when a mistake has been detected the system eliminates its last response from the active vocabulary and the user repeats the word that has been misrecognized. In the other, elimination-without-repetition, the system suggests the next-most-likely word based on the output of its pattern-matching algorithm. It was found that the former strategy, with the user repeating the word, required fewer trials to correct the recognition errors. A model relating the average number of corrections to the recognition rate has been developed and provides a good fit to the data.
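To make the comparison concrete, the following Monte Carlo sketch contrasts the two strategies under simplified assumptions; the 80% base accuracy, the 20-word active vocabulary, the accuracy-improvement rule and the candidate-rank distribution are all illustrative values, not figures from the paper.

# Illustrative sketch (not the paper's exact model): Monte Carlo comparison of the
# two error-correction strategies under simplified, assumed statistics.
import random

def trials_repetition_with_elimination(p_correct, vocab_size, rng):
    """User repeats the word; each wrong response is removed from the active
    vocabulary, so the per-attempt accuracy improves as candidates are eliminated."""
    remaining = vocab_size
    trials = 0
    while True:
        trials += 1
        # Assumed: accuracy rises as the confusable vocabulary shrinks.
        p = p_correct + (1 - p_correct) * (vocab_size - remaining) / vocab_size
        if rng.random() < p:
            return trials
        remaining -= 1

def trials_elimination_without_repetition(rank_probs, rng):
    """System offers its next-most-likely candidate; the number of trials is the
    rank of the correct word in the recognizer's ordered output."""
    r = rng.random()
    cum = 0.0
    for rank, prob in enumerate(rank_probs, start=1):
        cum += prob
        if r < cum:
            return rank
    return len(rank_probs)

rng = random.Random(0)
N = 10_000
rep = sum(trials_repetition_with_elimination(0.8, 20, rng) for _ in range(N)) / N
# Assumed rank distribution: the correct word is ranked first 80% of the time, then tails off.
nex = sum(trials_elimination_without_repetition([0.8, 0.1, 0.05, 0.03, 0.02], rng) for _ in range(N)) / N
print(f"mean trials, repetition-with-elimination:    {rep:.2f}")
print(f"mean trials, elimination-without-repetition: {nex:.2f}")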
IEEE Transactions on Audio and Electroacoustics | 1973
William A. Ainsworth
The feasibility of converting English text into speech using an inexpensive computer and a small amount of stored data has been investigated. The text is segmented into breath groups, the orthography is converted into a phonemic representation, lexical stress is assigned to appropriate syllables, then the resulting string of symbols is converted by synthesis-by-rule into the parameter values for controlling an analogue speech synthesizer. The algorithms for performing these conversions are described in detail and evaluated independently, and the intelligibility of the resulting synthetic speech is assessed by listening tests.
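The stages of this conversion can be sketched as follows; the letter-to-phoneme rules, phoneme symbols and stress rule below are toy placeholders rather than the algorithms evaluated in the paper.

# A minimal, hypothetical sketch of the pipeline stages described above
# (breath-group segmentation -> letter-to-phoneme rules -> stress assignment).
import re

LETTER_TO_PHONEME = {  # toy subset of letter-to-sound rules
    "ee": "i:", "th": "T", "sh": "S", "a": "{", "e": "e", "i": "I",
    "o": "Q", "u": "V", "s": "s", "p": "p", "t": "t", "k": "k", "m": "m", "n": "n",
}

def breath_groups(text):
    """Split text at major punctuation, approximating breath-group boundaries."""
    return [g.strip() for g in re.split(r"[.,;:!?]", text) if g.strip()]

def to_phonemes(word):
    """Greedy longest-match letter-to-phoneme conversion (toy rules)."""
    phonemes, i = [], 0
    while i < len(word):
        for length in (2, 1):
            chunk = word[i:i + length]
            if chunk in LETTER_TO_PHONEME:
                phonemes.append(LETTER_TO_PHONEME[chunk])
                i += length
                break
        else:
            i += 1  # skip letters the toy rules do not cover
    return phonemes

def assign_stress(phoneme_words):
    """Toy stress rule: mark the first vowel-bearing phoneme of each word."""
    vowels = {"i:", "{", "e", "I", "Q", "V"}
    out = []
    for word in phoneme_words:
        stressed, done = [], False
        for p in word:
            if p in vowels and not done:
                stressed.append("'" + p)
                done = True
            else:
                stressed.append(p)
        out.append(stressed)
    return out

sentence = "this is a test sentence, spoken in breath groups."
for group in breath_groups(sentence):
    words = [to_phonemes(w) for w in group.lower().split()]
    print(group, "->", assign_stress(words))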
IEEE Transactions on Speech and Audio Processing | 1998
Fabrice Plante; Georg Meyer; William A. Ainsworth
An improvement of the speech spectrogram based on the method of reassignment is presented. This method consists of moving each point of the spectrogram to a new point that represents the distribution of the energy in the time-frequency window more accurately. Examples of natural speech show an improvement of the energy localization in both time and frequency domains. This allows a better description of speech features.
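For readers who want to experiment with the technique, the sketch below uses librosa's reassigned-spectrogram routine as a stand-in implementation; the example recording and the analysis parameters (512-point FFT, 64-sample hop) are arbitrary choices, not those of the paper.

# Hedged sketch: each spectrogram cell is moved to the centre of gravity of the
# energy in its analysis window, sharpening harmonics and formants simultaneously.
import numpy as np
import librosa
import matplotlib.pyplot as plt

# Any short speech recording can be substituted for the bundled example here.
y, sr = librosa.load(librosa.example("libri1"), duration=2.0)

freqs, times, mags = librosa.reassigned_spectrogram(y=y, sr=sr, n_fft=512, hop_length=64)

mags_db = librosa.amplitude_to_db(mags, ref=np.max)
plt.scatter(times.ravel(), freqs.ravel(), c=mags_db.ravel(), s=0.5, cmap="magma")
plt.xlabel("Reassigned time (s)")
plt.ylabel("Reassigned frequency (Hz)")
plt.title("Reassigned spectrogram (sketch)")
plt.show()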
International Journal of Human-computer Studies / International Journal of Man-machine Studies | 1988
William A. Ainsworth
Abstract No matter how much the performance of speech recognition systems improves, it is unlikely that perfect recognition will always be possible in practical situations. Environmental sounds will interfere with the recognition. In such circumstances it is sensible to provide feedback so that any errors which occur may be detected and corrected. In some situations, such as when the eyes are busy or over the telephone, it is necessary to provide feedback auditorily. This takes time, so the most efficient procedure should be determined. In the case of entering digits into a computer, the question arises as to whether feedback should be provided after each digit has been spoken or after a string of digits has been recognized. It has been found that this depends upon the accuracy of the recognizer and on the times required for recognizing the utterances and for changing from recognizing to synthesizing speech.
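A back-of-the-envelope model of the trade-off might look like the following; the timing constants and the error-handling assumptions are invented for illustration and are not the model fitted in the paper.

# Illustrative comparison of per-digit feedback with whole-string feedback for
# entering n digits. All times (in seconds) and error-handling rules are assumed.
def time_per_digit_feedback(n, p, t_say=0.6, t_rec=0.4, t_switch=0.3, t_speak_back=0.6):
    """Each digit is echoed immediately; a misrecognition costs one extra exchange."""
    attempts = 1.0 / p                      # expected attempts per digit (geometric)
    per_attempt = t_say + t_rec + 2 * t_switch + t_speak_back
    return n * attempts * per_attempt

def time_string_feedback(n, p, t_say=0.6, t_rec=0.4, t_switch=0.3, t_speak_back=0.6):
    """The whole string is echoed once; each error is then corrected individually."""
    first_pass = n * (t_say + t_rec) + 2 * t_switch + n * t_speak_back
    expected_errors = n * (1 - p)
    return first_pass + expected_errors * time_per_digit_feedback(1, p, t_say, t_rec, t_switch, t_speak_back)

for p in (0.99, 0.95, 0.90, 0.80):
    a = time_per_digit_feedback(8, p)
    b = time_string_feedback(8, p)
    better = "per-digit" if a < b else "string"
    print(f"accuracy {p:.2f}: per-digit {a:5.1f}s  string {b:5.1f}s  -> {better}")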
International Journal of Human-computer Studies / International Journal of Man-machine Studies | 1974
William A. Ainsworth
A system for synthesizing speech from a phonetic input is described. A string of phonetic symbols representing the sentence to be uttered is transformed into the control signals required by a parametric speech synthesizer using a small digital computer. The performance of the system was investigated by listening tests. In the first set of experiments consonant-vowel syllables were synthesized, and presented to listeners for identification. The vowels were readily identified, but the fricatives less so. In the second set of experiments the intelligibility of synthesized sentences was examined. It was found that after about an hour of transcribing the sentences, listeners identified about 90% of the words correctly.
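A toy sketch of the synthesis-by-rule idea follows, in which each phonetic symbol carries target control parameters and frames are generated by interpolating between targets; the formant targets, segment and transition durations are rough illustrative values, not the original rule tables.

# Toy synthesis-by-rule: phoneme targets -> per-frame synthesizer control values.
import numpy as np

TARGETS = {            # phoneme -> (F1, F2) targets in Hz (rough textbook values)
    "a": (750, 1200),
    "i": (300, 2300),
    "u": (320, 800),
}
FRAME_MS = 10
SEG_MS = 120           # assumed steady-state duration per phoneme
TRANS_MS = 60          # assumed transition duration between phonemes

def control_tracks(phonemes):
    """Build per-frame (F1, F2) control values for a parametric synthesizer."""
    frames = []
    for cur, nxt in zip(phonemes, phonemes[1:] + [phonemes[-1]]):
        steady = [TARGETS[cur]] * (SEG_MS // FRAME_MS)
        n_trans = TRANS_MS // FRAME_MS
        trans = [tuple(np.interp(k / n_trans, [0, 1], [a, b])
                       for a, b in zip(TARGETS[cur], TARGETS[nxt]))
                 for k in range(n_trans)]
        frames.extend(steady + trans)
    return np.array(frames)

tracks = control_tracks(list("aiu"))
print(tracks.shape)    # (frames, 2) -> one row of control values per 10 ms frame
print(tracks[::6])     # sample the tracks every 60 ms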
Phonetica | 2001
René Carré; William A. Ainsworth; Paul Jospa; Shinji Maeda; Valerie Pasdeloup
In this paper, the perceptual effects of vowel-to-vowel transitions determined by different temporal variations of the model parameters which specify the shape of the vocal tract area function are investigated. It is shown that (a) the method of deformation of the vocal tract area function between two targets can be perceptually important and (b) conversely, within certain limits, the time course of the parameters from one state to another and the precise synchronization of two parameters are not important for the correct identification of a vowel series. These characteristics are necessary but not sufficient to prove the existence of a phonetic gesture percept.
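The kind of manipulation involved can be illustrated with a toy two-parameter trajectory generator; this is not the authors' vocal-tract model, and the parameter values, durations and delays are arbitrary.

# Toy illustration: two area-function parameters move between vowel targets either
# synchronously or with one parameter's onset staggered by a fixed delay.
import numpy as np

def trajectory(p_start, p_end, duration_ms, delays_ms, frame_ms=5):
    """Each parameter moves linearly from start to end, optionally with its own onset delay."""
    t = np.arange(0, duration_ms, frame_ms)
    out = []
    for s, e, d in zip(p_start, p_end, delays_ms):
        frac = np.clip((t - d) / (duration_ms - max(delays_ms)), 0.0, 1.0)
        out.append(s + frac * (e - s))
    return t, np.stack(out)

# Two area-function parameters (e.g. constriction location and degree, arbitrary units).
t, synchronous = trajectory([1.0, 0.2], [0.1, 0.9], duration_ms=200, delays_ms=[0, 0])
_, staggered = trajectory([1.0, 0.2], [0.1, 0.9], duration_ms=200, delays_ms=[0, 40])
print(synchronous[:, ::8])   # both parameters move together
print(staggered[:, ::8])     # the second parameter starts 40 ms later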
Speech Communication | 1983
Kuldip Kumar Paliwal; William A. Ainsworth; D. Lindsay
Abstract An experiment has been performed in which various two-formant models reported in the literature were assessed as to their ability to predict the formant frequencies obtained in a vowel identification task. An alternative model is proposed in which the auditory processing of vowel sounds is assumed to take place in two stages: a peripheral processing stage and a central processing stage. In the peripheral stage the speech spectrum is transformed to its auditory equivalent and the formant frequencies are extracted from this spectrum using a peak-picking mechanism. The central stage performs a two-formant approximation on the results of the first stage, and it is on this formant pair that vowel identification is taken to operate. The first and second formant frequencies of this two-formant model are taken to be equal to the first and second formant frequencies extracted at the first stage plus a perturbation term which accounts for the interaction effects of the neighbouring formants. The perturbation caused by each of these neighbouring formants is inversely proportional to its separation from the main formant. This model compares favourably with previous models in its prediction of the formant frequencies obtained from the vowel identification task.
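A hedged sketch of the central-stage computation might look like this; the perturbation constant k and the choice of which neighbours perturb each formant are assumptions made for illustration, not values from the paper.

# Sketch: each "main" formant is shifted toward its neighbours by an amount
# inversely proportional to the spectral separation (sign and k are assumed).
def perturbed(f_main, neighbours, k=50_000.0):
    """Main formant (Hz) plus a pull toward each neighbour, falling off with separation."""
    return f_main + sum(k / (fn - f_main) for fn in neighbours)

def effective_two_formants(f1, f2, f3, f4, k=50_000.0):
    """Return an effective (F1', F2') pair from the first four formant frequencies (Hz)."""
    f1_eff = perturbed(f1, [f2], k)          # F1 perturbed by its nearest neighbour F2
    f2_eff = perturbed(f2, [f1, f3, f4], k)  # F2 perturbed by F1, F3 and F4
    return f1_eff, f2_eff

# Example: typical /i/-like formant frequencies.
print(effective_two_formants(f1=280.0, f2=2250.0, f3=2900.0, f4=3500.0))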
IEEE Transactions on Audio and Electroacoustics | 1972
William A. Ainsworth
A system for synthesizing speech is described. It consists of an electronic synthesizer controlled by a small digital computer. The computer uses stored rules to convert a phonetic input into the analogue voltages required for driving the synthesizer. It has been found possible to make the system operate in real time so that the acoustic output is generated at normal speaking rates.
Journal of the Acoustical Society of America | 1999
Dekun Yang; Georg Meyer; William A. Ainsworth
The fundamental process of auditory scene analysis is the organization of the elementary acoustic features in a complex auditory scene into meaningful auditory streams. Two important issues need to be addressed in modelling auditory scene analysis. The first concerns the representation of elementary acoustic features, whilst the second relates to the binding mechanism. This paper presents a neural model for auditory scene analysis in which a two-dimensional amplitude modulation (AM) map is used to represent elementary acoustic features and the synchronization of neural oscillators is adopted as the binding mechanism. The AM map captures the modulation frequencies of sound signals filtered by an auditory filterbank. Since the modulation frequencies are the F0-related features for voiced speech signals, F0-based segregation can be utilized to group the auditory streams. The grouping of F0-related features is attained through the synchronization of nonlinear oscillators.
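The AM-map idea can be sketched as follows; Butterworth band-pass filters stand in for the auditory filterbank, and the bandwidths, channel centre frequencies and modulation range are illustrative choices rather than the model's actual parameters.

# Sketch of an amplitude-modulation (AM) map: a bank of band-pass filters, an
# envelope per channel, and the modulation spectrum of each envelope.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def am_map(signal, fs, centre_freqs, n_mod_bins=64, max_mod_hz=400.0):
    """Return a (channels x modulation-frequency) energy map for the input signal."""
    mod_axis = np.linspace(0.0, max_mod_hz, n_mod_bins)
    rows = []
    for fc in centre_freqs:
        low, high = fc / 1.3, fc * 1.3                      # ~1/3-octave band (assumed)
        sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        envelope = np.abs(hilbert(band))                    # amplitude envelope
        spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
        freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
        keep = freqs <= max_mod_hz
        # resample the kept modulation spectrum onto a fixed number of bins
        rows.append(np.interp(mod_axis, freqs[keep], spectrum[keep]))
    return mod_axis, np.array(rows)

fs = 16000
t = np.arange(0, 0.5, 1.0 / fs)
# Synthetic "voiced" signal: a 1 kHz carrier amplitude-modulated at a 120 Hz F0.
x = (1.0 + 0.8 * np.cos(2 * np.pi * 120 * t)) * np.sin(2 * np.pi * 1000 * t)
mod_axis, m = am_map(x, fs, centre_freqs=[500, 1000, 2000, 4000])
print(m.shape)                          # (4 channels, 64 modulation bins)
print(mod_axis[m.argmax(axis=1)])       # dominant modulation frequency per channel
                                        # (about 120 Hz where the carrier has energy)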
Speech Communication | 1997
William A. Ainsworth; René Carré
Abstract Speech analysis shows that the second formant transitions in vowel–vowel utterances are not always of the same duration as the first formant transitions, nor are they always synchronised. Moreover, the formant transitions often move initially in a different direction from their final target. In order to investigate whether these deviations from linearity and synchrony are perceptually significant, a series of listening tests has been conducted with the vowel pair /a/–/i/. It was found that delays between the first and second formant transitions of up to 30 ms are not perceived, nor are differences in duration of up to 40 ms if the first and second formants start or end simultaneously. If the second formant transition is symmetric in time with respect to the first, differences of up to 50 ms are tolerated. Excursions in second formant transition shape of up to about 500 Hz are also not perceived. These results suggest that most of the deviations from linearity and synchrony found in natural vowel–vowel utterances are not perceptually significant.
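The stimulus manipulation can be illustrated by generating formant tracks with an adjustable F2 onset delay and duration; the formant targets, overall duration and frame rate below are illustrative, not the values used in the experiments.

# Sketch: F1 and F2 ramps for an /a/ -> /i/ transition with a configurable F2 delay.
import numpy as np

def formant_tracks(total_ms=300, f1=(750, 300), f2=(1200, 2300),
                   f1_trans=(100, 100), f2_delay_ms=30, f2_dur_ms=140, frame_ms=5):
    """Return per-frame (F1, F2) values in Hz; transitions are linear ramps."""
    n = total_ms // frame_ms
    t = np.arange(n) * frame_ms
    def ramp(start_hz, end_hz, onset_ms, dur_ms):
        frac = np.clip((t - onset_ms) / dur_ms, 0.0, 1.0)
        return start_hz + frac * (end_hz - start_hz)
    track_f1 = ramp(f1[0], f1[1], onset_ms=f1_trans[0], dur_ms=f1_trans[1])
    track_f2 = ramp(f2[0], f2[1], onset_ms=f1_trans[0] + f2_delay_ms, dur_ms=f2_dur_ms)
    return t, track_f1, track_f2

t, F1, F2 = formant_tracks(f2_delay_ms=30, f2_dur_ms=140)
print(F1[::10])   # F1 ramps 750 -> 300 Hz over 100 ms starting at 100 ms
print(F2[::10])   # F2 ramps 1200 -> 2300 Hz, starting 30 ms later and lasting 40 ms longer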