Thomas C. Walters
Publications
Featured research published by Thomas C. Walters.
Neural Computation | 2010
Richard F. Lyon; Martin Rehn; Samy Bengio; Thomas C. Walters; Gal Chechik
To create systems that understand the sounds that humans are exposed to in everyday life, we need to represent sounds with features that can discriminate among many different sound classes. Here, we use a sound-ranking framework to quantitatively evaluate such representations in a large-scale task. We have adapted a machine-vision method, the passive-aggressive model for image retrieval (PAMIR), which efficiently learns a linear mapping from a very large sparse feature space to a large query-term space. Using this approach, we compare different auditory front ends and different ways of extracting sparse features from high-dimensional auditory images. We tested auditory models that use an adaptive pole-zero filter cascade (PZFC) auditory filter bank and sparse-code feature extraction from stabilized auditory images with multiple vector quantizers. In addition to auditory image models, we compare a family of more conventional mel-frequency cepstral coefficient (MFCC) front ends. The experimental results show a significant advantage for the auditory models over vector-quantized MFCCs. When thousands of sound files with a query vocabulary of thousands of words were ranked, the best precision at top-1 was 73% and the average precision was 35%, reflecting an 18% improvement over the best competing MFCC front end.
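The passive-aggressive ranking step can be sketched as follows. This is a minimal illustration of a PAMIR-style update on one (query, relevant sound, irrelevant sound) triplet, not the authors' implementation; the margin of 1 and the aggressiveness parameter C are placeholder choices.

```python
import numpy as np

def pamir_update(W, q, x_pos, x_neg, C=0.1):
    """One passive-aggressive update for query-to-sound ranking.

    W      : (n_terms, n_features) linear map from sparse audio features
             to the query-term space
    q      : (n_terms,) bag-of-words vector for the text query
    x_pos  : (n_features,) sparse features of a relevant sound
    x_neg  : (n_features,) sparse features of an irrelevant sound
    """
    # Hinge loss on the ranking margin: the relevant sound should score
    # at least 1 higher than the irrelevant one for this query.
    loss = max(0.0, 1.0 - q @ W @ x_pos + q @ W @ x_neg)
    if loss > 0.0:
        # Gradient of the score difference w.r.t. W is q (x_pos - x_neg)^T.
        grad = np.outer(q, x_pos - x_neg)
        tau = min(C, loss / (np.linalg.norm(grad) ** 2 + 1e-12))
        W = W + tau * grad
    return W
```

In training, such updates would be applied over many triplets sampled from the labelled collection, with the sparse auditory (or MFCC) codes supplying x_pos and x_neg.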
Journal of the Acoustical Society of America | 2009
Richard E. Turner; Thomas C. Walters; Jessica J. M. Monaghan; Roy D. Patterson
This paper investigates the theoretical basis for estimating vocal-tract length (VTL) from the formant frequencies of vowel sounds. A statistical inference model was developed to characterize the relationship between vowel type and VTL, on the one hand, and formant frequency and vocal cavity size, on the other. The model was applied to two well known developmental studies of formant frequency. The results show that VTL is the major source of variability after vowel type and that the contribution due to other factors like developmental changes in oral-pharyngeal ratio is small relative to the residual measurement noise. The results suggest that speakers adjust the shape of the vocal tract as they grow to maintain a specific pattern of formant frequencies for individual vowels. This formant-pattern hypothesis motivates development of a statistical-inference model for estimating VTL from formant-frequency data. The technique is illustrated using a third developmental study of formant frequencies. The VTLs of the speakers are estimated and used to provide a more accurate description of the complicated relationship between VTL and glottal pulse rate as children mature into adults.
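As a rough illustration of the scaling idea (not the statistical-inference model developed in the paper), a relative VTL can be estimated by assuming that the formant frequencies of a given vowel scale inversely with vocal-tract length; the reference formants and reference length below are illustrative placeholder values.

```python
import numpy as np

def estimate_vtl(formants_hz, ref_formants_hz, ref_vtl_cm=17.5):
    """Estimate vocal-tract length from measured formant frequencies.

    Assumes all formants of a vowel scale inversely with VTL, so the
    geometric-mean ratio of reference to observed formants gives a
    relative length scale. ref_vtl_cm (~17.5 cm for an adult male) and
    the reference formants are illustrative, not values from the paper.
    """
    formants = np.asarray(formants_hz, dtype=float)
    reference = np.asarray(ref_formants_hz, dtype=float)
    scale = np.exp(np.mean(np.log(reference / formants)))
    return ref_vtl_cm * scale

# e.g. a child's /a/ with higher formants than an adult reference /a/
print(estimate_vtl([950, 1700, 3100], ref_formants_hz=[730, 1090, 2440]))
```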
Archive | 2010
Roy D. Patterson; Etienne Gaudrain; Thomas C. Walters
This chapter is about the sounds made by musical instruments and how we perceive them. It explains the basics of musical note perception, such as why a particular instrument plays a specific range of notes; why instruments come in families; and why we hear distinctive differences between members of a given instrument family, even when they are playing the same note. The answers to these questions might, at first, seem obvious; one could say that brass instruments all make the same kind of sound because they are all made of brass, and the different members of the family sound different because they are different sizes. But answers at this level just prompt more questions, such as: What do we mean when we say the members of a family produce the same sound? What is it that is actually the same, and what is it that is different, when different instruments within a family play the same melody on the same notes? To answer these and similar questions, we examine the relationship between the physical variables of musical instruments, such as the length, mass, and tension of a string, and the variables of auditory perception, such as pitch, timbre, and loudness. The discussion reveals that there are three acoustic properties of musical sounds, as they occur in the air, between the instrument and the listener, that are particularly useful in summarizing the effects of the physical properties on the musical tones they produce, and in explaining how these musical tones produce the perceptions that we hear.
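As a concrete instance of the physical-to-perceptual mapping mentioned above, the fundamental frequency of an ideal string (which largely determines its pitch) follows the standard textbook relation below; this is general acoustics rather than a formula quoted from the chapter.

```latex
% Fundamental frequency of an ideal string of length L, tension T,
% and linear mass density \mu (standard textbook relation):
f_0 = \frac{1}{2L}\sqrt{\frac{T}{\mu}}
```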
Journal of the Acoustical Society of America | 2007
David R. R. Smith; Thomas C. Walters; Roy D. Patterson
Glottal‐pulse rate (GPR) and vocal‐tract length (VTL) are important determinants of the perceived sex and age of the speaker [D. R. R. Smith and R. D. Patterson, J. Acoust. Soc. Am. 118, 3177–3186 (2005)]. Our previous research simulated the voices of variously‐sized speakers of both sexes by manipulating the recorded vowels of one adult male talker. The current study explored whether there are additional cues in the voices of men, women, and children that influence judgements of speaker sex and age. We manipulated the recorded vowels of an adult man, adult woman, young boy, and young girl, and determined the effect upon the distribution of sex and age responses (man, woman, boy, girl). Results show that the distribution of sex and age judgements across the GPR‐VTL plane is heavily influenced by GPR and VTL, but it is also affected by the original talker’s size (or age). The effect of original talker appears to be mainly due to the consistent difference between oral‐pharyngeal length ratios of children an...
international symposium on circuits and systems | 2010
Roy D. Patterson; Thomas C. Walters; Jessica J. M. Monaghan; Christian Feldbauer; Toshio Irino
The syllables of speech contain information about the vocal tract length (VTL) of the speaker as well as the phonetic message. Ideally, the pre-processor used for automatic speech recognition (ASR) should segregate the phonetic message from the VTL information. This paper describes a method to calculate VTL-invariant auditory feature vectors from speech, using a method in which the message and the VTL are segregated. Spectra produced by an auditory filterbank are summarized by a Gaussian mixture model (GMM) to produce a low-dimensional feature vector. These features are evaluated for robustness in comparison with conventional mel-frequency cepstral coefficients (MFCCs) using a hidden-Markov-model (HMM) recognizer. A dynamic, compressive gammachirp (dcGC) auditory filterbank is also introduced. The dcGC provides a level-dependent spectral analysis, with near instantaneous compression, and two-tone suppression.
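A minimal sketch of the spectral-summarization idea follows, assuming each auditory spectrum is treated as an energy distribution over the frequency axis and summarized by the weights, means, and spreads of a small Gaussian mixture fitted with weighted EM. This illustrates the general GMM-summarization step only; it is not the VTL-segregating feature extraction used in the paper.

```python
import numpy as np

def gmm_spectral_features(spectrum, n_components=4, n_iter=50):
    """Summarize one auditory spectrum with a 1-D Gaussian mixture.

    spectrum : (n_channels,) non-negative filterbank energies.
    Returns a low-dimensional vector of mixture weights, means, and
    standard deviations (channel index is used as the frequency axis).
    """
    x = np.arange(len(spectrum), dtype=float)      # channel positions
    w = spectrum / (spectrum.sum() + 1e-12)        # normalized energy weights

    # Initialize components spread across the frequency axis.
    means = np.linspace(x.min(), x.max(), n_components)
    stds = np.full(n_components, len(spectrum) / (2.0 * n_components))
    pis = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each channel.
        resp = pis[:, None] * np.exp(
            -0.5 * ((x - means[:, None]) / stds[:, None]) ** 2) / stds[:, None]
        resp /= resp.sum(axis=0, keepdims=True) + 1e-12
        # M-step: channels are weighted by their spectral energy.
        nk = (resp * w).sum(axis=1) + 1e-12
        means = (resp * w * x).sum(axis=1) / nk
        stds = np.sqrt((resp * w * (x - means[:, None]) ** 2).sum(axis=1) / nk) + 1e-6
        pis = nk / nk.sum()

    return np.concatenate([pis, means, stds])
```

Stacking such per-frame vectors gives a compact alternative to MFCCs; the paper's actual features additionally segregate the VTL information, which this sketch does not attempt.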
Advances in Experimental Medicine and Biology | 2013
Roy D. Patterson; D. Timothy Ives; Thomas C. Walters; Richard F. Lyon
Lyon (J Acoust Soc Am 130:3893-3904, 2011) has described how a cascade of simple asymmetric resonators (CAR) can be used to simulate the filtering of the basilar membrane and how the gain of the resonators can be manipulated by a feedback network to simulate the fast-acting compression (FAC) characteristic of cochlear processing. When the compression is applied to complex tones, each pair of primary components produces both quadratic and cubic distortion tones (DTs), and the cascade architecture of the CAR-FAC system propagates them down to their appropriate place along the basilar membrane, where they combine additively with each other and any primary components at that frequency. This suggests that CAR-FAC systems might be used to study the role of compressive distortion in the perception of complex sounds and that behavioural measurements of cochlear distortion data might be useful when tuning the parameters of CAR-FAC systems.
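The cascade topology can be illustrated with a toy filter cascade in which each stage's output is both a channel output and the input to the next stage, so energy (and any distortion generated along the way) propagates toward the low-frequency end. The resonator coefficients below are arbitrary placeholders; this is a sketch of the cascade idea only, not Lyon's CAR-FAC (it has neither the asymmetric zeros nor the fast-acting compression loop).

```python
import numpy as np
from scipy.signal import lfilter

def toy_filter_cascade(signal, center_freqs_hz, fs=16000.0, q=4.0):
    """Pass a signal through a cascade of second-order resonators.

    The output of stage k is both the k-th channel output and the input
    to stage k+1, which is the basic CAR-style cascade topology
    (no gain control or compression is modelled here).
    """
    x = np.asarray(signal, dtype=float)
    channels = []
    for fc in center_freqs_hz:                 # ordered high to low frequency
        r = np.exp(-np.pi * fc / (q * fs))     # pole radius from bandwidth fc/q
        theta = 2.0 * np.pi * fc / fs          # pole angle
        b = [1.0 - r]                          # rough gain normalization
        a = [1.0, -2.0 * r * np.cos(theta), r * r]
        x = lfilter(b, a, x)                   # cascade: output feeds next stage
        channels.append(x.copy())
    return np.vstack(channels)                 # (n_channels, n_samples)

# e.g. a click through an 8-channel cascade from 6 kHz down to 200 Hz
click = np.zeros(512); click[0] = 1.0
outputs = toy_filter_cascade(click, np.geomspace(6000, 200, 8))
```

Adding the level-dependent gain control (the FAC part) on top of such a cascade is what produces the compressive distortion tones discussed in the abstract.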
Archive | 2010
Roy D. Patterson; Thomas C. Walters; Jessica J. M. Monaghan; Etienne Gaudrain
The purpose of this paper is to draw attention to the definition of timbre as it pertains to the vowels of speech. There are two forms of size information in these “source-filter” sounds, information about the size of the excitation mechanism (the vocal folds), and information about the size of the resonators in the vocal tract that filter the excitation before it is projected into the air. The current definitions of pitch and timbre treat the two forms of size information differently. In this paper, we argue that the perception of speech sounds by humans suggests that the definition of timbre would be more useful if it grouped the size variables together and separated the pair of them from the remaining properties of these sounds.
Journal of the Acoustical Society of America | 2005
Roy D. Patterson; Thomas C. Walters; Toshio Irino
At the heart of each syllable of speech is a vowel; the wave consists of a stream of glottal pulses, each with a resonance attached. The vowel contains three important components of the information in the larger communication: the glottal pulse rate (the pitch), the resonance shape (the message), and the resonance scale (the vocal tract length). Recent experiments on the perception of vowels show that variability in glottal pulse rate and vocal tract length has surprisingly little effect on the human's ability to recognise the vowel or discriminate speaker size, despite the variability it imparts to the spectra of these sounds. We appear to have an automatic normalization process to scale vowels and extract the message independent of the carrier. Many animal calls are like syllables in form and duration, and normalization is essential here as well if animals are to correctly identify the species of the sender and not be confused by changes in pulse rate and resonance scale that simply indicate a size diffe...
Archive | 2011
Geremy A. Heitz; Adam Berenzweig; Jason Weston; Ron Weiss; Sally A. Goldman; Thomas C. Walters; Samy Bengio; Douglas Eck; Jay M. Ponte; Ryan Michael Rifkin
Archive | 2010
Richard F. Lyon; Martin Rehn; Thomas C. Walters; Samy Bengio; Gal Chechik