
Publication


Featured research published by Kathy M. Carbonell.


Frontiers in Psychology | 2014

Speech is not special… again.

Kathy M. Carbonell; Andrew J. Lotto

THE “SPECIALNESS” OF SPEECH

As is apparent from reading the first line of nearly any research or review article on speech, the task of perceiving speech sounds is complex and the ease with which humans acquire, produce and perceive these sounds is remarkable. Despite the growing appreciation for the complexity of the perception of music, speech perception remains one of the most amazing and poorly understood auditory (and, if we may be so bold, perceptual) accomplishments of humans. Over the years, there has been considerable debate on whether this achievement is the result of general perceptual/cognitive mechanisms or “special” processes dedicated to the mapping of speech acoustics to linguistic representations (for reviews see Trout, 2001; Diehl et al., 2004). The most familiar proposal of the “specialness” of speech perception is the various incarnations of the Motor Theory of speech proposed by Liberman et al. (1967; Liberman and Mattingly, 1985, 1989). Given the status of research into audition in the 1950s and 1960s, it is not surprising that speech appeared to require processing not available in “normal” hearing. Much of the work at the time used relatively simple tones and noises to get at the basic psychoacoustics underlying the perception of pitch and loudness (though some researchers, like Harvey Fletcher, were also working on some basics of speech perception; Fletcher and Galt, 1950; Allen, 1996). Liberman and his collaborators discovered that the discrimination of acoustic changes in speech sounds did not look like the psychoacoustic measures of discrimination for pitch and loudness. Instead of following a Weber or Fechner law, the discrimination function had a peak near the categorization boundary between contrasting phonemes—a pattern of perceptual results that is referred to as Categorical Perception (Liberman et al., 1957).

In addition, the acoustic cues to phonemic identity were not readily apparent, with similar spectral patterns resulting in different phonemic percepts and acoustically disparate patterns resulting in identical phonemic percepts—the problem of “lack of invariance” (e.g., Liberman et al., 1952). The perception of these varying acoustic patterns was highly context-sensitive to preceding and following phonetic content in ways that appeared specific to the communicative constraints of speech and not applicable to the perception of other sounds—as in demonstrations of perceptual compensation for coarticulation, speaking rate normalization and talker normalization (e.g., Ladefoged and Broadbent, 1957; Miller and Liberman, 1979; Mann, 1980).

One major source of evidence in favor of a Motor Theory account of speech perception is that information about a speaker’s production (anatomy or kinematics) from non-auditory sources can affect phonetic perception. The famed McGurk effect (McGurk and MacDonald, 1976), in which visual presentation of a talker can alter the auditory phonetic percept, is taken as evidence that listeners are integrating information about production from this secondary source. Fowler and Deckle (1991) demonstrated a similar effect using haptic information gathered by touching the speaker’s face (see also Sato et al., 2010). Gick and Derrick (2009) reported that perception of consonant-vowel tokens in noise is biased toward voiceless stops (e.g., /pa/) when they are accompanied by a small burst of air on the skin of the listener, which could be interpreted as the aspiration that would more likely accompany the release of a voiceless stop. In addition, several studies have demonstrated that manipulations of the listener’s articulators can affect perception, which is supportive of the Motor Theory proposal that the mechanisms of production underlie the perception of speech. For example, Ito et al. (2009) obtained shifts in phoneme categorization resulting from external manipulation of the skin around the listener’s mouth in ways that would correspond to the deformations typical of producing these speech sounds (see also Yeung and Werker, 2013 for a similar demonstration with infants). Recently, Mochida et al. (2013) found that the ability to categorize consonants can be influenced by the simultaneous silent production of these consonants. Typically, these studies are proffered as evidence for a direct role of speech motor processing in speech perception.

Independent of this proposed motor basis of perception, others have suggested the existence of a special speech or phonetic mode of perception based on evidence of neural and behavioral responses to the same stimuli being modulated by whether or not the listener believes the signal to be speech or non-speech (e.g., Tomiak et al., 1987; Vroomen and Baart, 2009; Stekelenburg and Vroomen, 2012).


Journal of Voice | 2015

Discriminating Simulated Vocal Tremor Source Using Amplitude Modulation Spectra

Kathy M. Carbonell; Rosemary A. Lester; Brad H. Story; Andrew J. Lotto

OBJECTIVES/HYPOTHESIS: Sources of vocal tremor are difficult to categorize perceptually and acoustically. This article describes a preliminary attempt to discriminate vocal tremor sources through the use of spectral measures of the amplitude envelope. The hypothesis is that different vocal tremor sources are associated with distinct patterns of acoustic amplitude modulations.

STUDY DESIGN: Statistical categorization methods (discriminant function analysis) were used to discriminate signals from simulated vocal tremor with different sources using only acoustic measures derived from the amplitude envelopes.

METHODS: Simulations of vocal tremor were created by modulating parameters of a vocal fold model corresponding to oscillations of respiratory driving pressure (respiratory tremor), degree of vocal fold adduction (adductory tremor), and fundamental frequency of vocal fold vibration (F0 tremor). The acoustic measures were based on spectral analyses of the amplitude envelope computed across the entire signal and within select frequency bands.

RESULTS: The signals could be categorized (with accuracy well above chance) in terms of the simulated tremor source using only measures of the amplitude envelope spectrum, even when multiple sources of tremor were included.

CONCLUSIONS: These results supply initial support for an amplitude-envelope-based approach to identify the source of vocal tremor and provide further evidence for the rich information about talker characteristics present in the temporal structure of the amplitude envelope.
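The amplitude-envelope analysis described above can be sketched in a few lines. The sketch below is a minimal illustration, not the authors' implementation: it rectifies and smooths a signal to estimate its amplitude envelope, then scans the DFT magnitude of that envelope over a low-frequency range where vocal tremor rates typically fall. The smoothing window, frequency grid, and the synthetic amplitude-modulated test signal are all assumptions made for the example.

```python
import math

def amplitude_envelope(signal, win):
    """Estimate the amplitude envelope: rectify, then smooth with a moving average."""
    rect = [abs(s) for s in signal]
    half = win // 2
    env = []
    for i in range(len(rect)):
        lo, hi = max(0, i - half), min(len(rect), i + half + 1)
        env.append(sum(rect[lo:hi]) / (hi - lo))
    return env

def envelope_spectrum_peak(signal, fs, fmin=2.0, fmax=12.0, step=0.5):
    """Scan the DFT magnitude of the (DC-removed) amplitude envelope over
    fmin..fmax Hz and return the modulation frequency with the largest magnitude."""
    env = amplitude_envelope(signal, win=int(fs / 50))  # ~20 ms smoothing window
    mean = sum(env) / len(env)
    env = [e - mean for e in env]
    n = len(env)
    best_f, best_mag = fmin, -1.0
    for k in range(int((fmax - fmin) / step) + 1):
        f = fmin + k * step
        re = sum(env[t] * math.cos(2 * math.pi * f * t / fs) for t in range(n))
        im = sum(env[t] * math.sin(2 * math.pi * f * t / fs) for t in range(n))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f

# Synthetic "tremor": a 150 Hz carrier whose amplitude is modulated at 5 Hz.
fs = 2000
sig = [(1.0 + 0.5 * math.sin(2 * math.pi * 5 * t / fs)) *
       math.sin(2 * math.pi * 150 * t / fs) for t in range(2 * fs)]
peak = envelope_spectrum_peak(sig, fs)  # should land near the 5 Hz modulation rate
```

A real analysis would use an analytic-signal (Hilbert) envelope and an FFT, but the pipeline shape — envelope extraction, then a spectrum of the envelope — is the same.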


Journal of the Acoustical Society of America | 2012

Discriminating languages with general measures of temporal regularity and spectral variance

Kathy M. Carbonell; Dan Brenner; Andrew J. Lotto

There has been considerable recent interest in distinguishing languages based on their rhythmic differences. A common successful approach involves measures of the relative durations and duration variability of vowels and consonants in utterances. Recent studies have shown that more general measures of temporal regularities in the amplitude envelope in separate frequency bands (the Envelope Modulation Spectrum) can reliably discriminate between English and Spanish [Carbonell et al., J. Acoust. Soc. Am. 129, 2680]. In the current study, these temporal structure measures were supplemented with measures of the mean and variance of spectral energy in octave bands as well as with traditional linguistic measures. Using stepwise discriminant analysis and a set of productions from Japanese, Korean, and Mandarin speakers, this suite of acoustic and linguistic measures was tested together and pitted against each other to determine the most efficient discriminators of language. The results provide insight into what the ...
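The categorization step in studies like this one assigns each utterance's feature vector to a language. The study uses stepwise discriminant analysis; as a much simpler stand-in showing the same spirit, a nearest-centroid rule summarizes each language by the mean of its training vectors and assigns a new utterance to the closest mean. The two-feature vectors below are invented for illustration and are not data from the study.

```python
def fit_centroids(training):
    """training: dict mapping label -> list of equal-length feature vectors.
    Returns one centroid (mean vector) per label."""
    centroids = {}
    for label, vecs in training.items():
        dim = len(vecs[0])
        centroids[label] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
    return centroids

def classify(vec, centroids):
    """Assign vec to the label whose centroid is nearest (squared Euclidean)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(vec, centroids[label]))

# Invented per-utterance features, e.g. [envelope-modulation energy, octave-band variance]
training = {
    "Japanese": [[0.20, 1.1], [0.25, 1.0], [0.22, 1.2]],
    "Mandarin": [[0.60, 2.0], [0.55, 2.2], [0.65, 1.9]],
}
centroids = fit_centroids(training)
label = classify([0.58, 2.1], centroids)  # nearest to the Mandarin centroid
```

Unlike this toy rule, discriminant analysis also weights and selects features by how well they separate the classes, which is what "pitted against each other" refers to above.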


Journal of the Acoustical Society of America | 2011

Stable production rhythms across languages for bilingual speakers.

Kathy M. Carbonell; Kaitlin L. Lansford; Rene L. Utianski; Julie M. Liss; Sarah C. Sullivan; Andrew J. Lotto

There has been a great deal of work on classifying spoken languages according to their perceived or acoustically‐measured rhythmic structures. The current study examined the speech of 12 Spanish‐English bilinguals producing sentences in both languages, using rhythmic measures based on the amplitude envelopes extracted from different frequency regions—the envelope modulation spectrum (EMS). Using discriminant factor analysis, EMS variables demonstrated a moderate ability to classify the language being spoken, suggesting that rhythmic differences between languages survive even when the speaker is controlled. More interesting is the fact that EMS variables could reliably classify which speaker produced each sentence, even across languages. This result suggests that there are stable rhythmic structures in an individual talker’s speech that are apparent above and beyond the structural constraints of the language spoken. The EMS appears capable of describing systematic characteristics of both the talker and the langua...


Journal of the Acoustical Society of America | 2011

Discriminating language and talker using non-linguistic measures of rhythm, spectral energy and f0

Kathy M. Carbonell; Kaitlin L. Lansford; Rene L. Utianski; Julie M. Liss; Andrew J. Lotto

Recent studies have shown that rhythm metrics calculated from amplitude envelopes extracted from octave bands across the spectrum (the envelope modulation spectrum or EMS) can reliably discriminate between spoken Spanish and English even when produced by the same speakers [Carbonell et al., J. Acoust. Soc. Am. 129, 2680]. Additionally, bilingual speakers (seven females and five males) could be discriminated fairly well on EMS variables even across sentences spoken in the different languages. In the current study, EMS, a general acoustic measure with no reference to phoneme/linguistic entities, was supplemented with measures of the mean and variance of spectral energy in each octave band as well as the mean and variance of fundamental frequency. Using stepwise discriminant analysis and the set of bilingual productions of Spanish and English, it was determined that language discrimination was excellent using both EMS and spectral measures, whereas spectral and f0 measures were most informative for speaker dis...


Journal of the Acoustical Society of America | 2010

Presence of preceding sound affects the neural representation of speech sounds: Behavioral data.

Kathy M. Carbonell; Radhika Aravamudhan; Andrew J. Lotto

Traditionally, context‐sensitive speech perception has been demonstrated by eliciting shifts in target sound categorization through manipulation of the phonemic/spectral content of surrounding context. For example, changing the third formant frequency of a preceding context (from /al/ to /ar/) can result in significant shifts in target categorization (from /ga/ to /da/). However, it is probable that the most salient difference in context is between the presence or absence of any other sound. The question becomes whether this large change in context has substantial effects on target categorization as well. In the current study, participants were asked to categorize members of a series of syllables varying from /ga/ to /da/ presented in isolation or following /al/, /ar/, or /a/. The typical shifts in categorization were obtained for /al/ versus /ar/ contexts, but the shift in response between isolated presentation and any of the audible context conditions was much larger (with more /da/ responses in isolat...


Journal of the Acoustical Society of America | 2013

Degraded word recognition in isolation vs a carrier phrase

Kathy M. Carbonell; Andrew J. Lotto

Recognizing a spoken word presented in isolation is a markedly different task from recognizing a word in a carrier phrase. The presence of a carrier phrase provides additional challenges such as lexical segmentation but also provides additional information relevant to word recognition such as speaking rate and talker-specific spectral characteristics. The current set of studies is part of an attempt to determine how target word recognition differs in isolation versus in a carrier phrase. In an initial experiment, a set of 220 spoken CVC words were noise-vocoded (6 channel) and presented to listeners either in isolation or following a noise-vocoded carrier phrase—“The next word on the list is…” The target words were transcribed in each condition and scored for initial consonant accuracy and overall word accuracy. Despite the lack of semantic or syntactic information provided by the carrier phrase, accuracy for both word and consonant recognition was much higher in the carrier phrase context. A second exper...
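Noise vocoding, as used for the stimuli above, replaces the spectral fine structure in each frequency band with noise while preserving each band's amplitude envelope. The sketch below is a deliberately crude stand-in for that process, not the stimulus-generation code from the study: it uses first-order filters and log-spaced band edges, and the channel count, band limits, and envelope cutoff are all assumptions for illustration.

```python
import math
import random

def lowpass(x, fc, fs):
    """One-pole low-pass filter (crude; real vocoders use much steeper filters)."""
    a = 1.0 - math.exp(-2.0 * math.pi * fc / fs)
    y, out = 0.0, []
    for s in x:
        y += a * (s - y)
        out.append(y)
    return out

def bandpass(x, lo, hi, fs):
    """Rough band-pass as the difference of two low-pass outputs."""
    hp = lowpass(x, hi, fs)
    lp = lowpass(x, lo, fs)
    return [h - l for h, l in zip(hp, lp)]

def noise_vocode(x, fs, n_channels=6, fmin=100.0, fmax=4000.0):
    """Channel noise vocoder sketch: split the input into log-spaced bands,
    extract each band's amplitude envelope, and modulate band-limited noise."""
    random.seed(0)  # deterministic noise for reproducibility
    edges = [fmin * (fmax / fmin) ** (i / n_channels) for i in range(n_channels + 1)]
    out = [0.0] * len(x)
    for ch in range(n_channels):
        lo, hi = edges[ch], edges[ch + 1]
        band = bandpass(x, lo, hi, fs)
        env = lowpass([abs(s) for s in band], 30.0, fs)  # smooth rectified band
        noise = bandpass([random.uniform(-1, 1) for _ in x], lo, hi, fs)
        for i in range(len(x)):
            out[i] += env[i] * noise[i]
    return out

fs = 8000
tone = [math.sin(2 * math.pi * 500 * t / fs) for t in range(int(0.2 * fs))]
vocoded = noise_vocode(tone, fs)
```

With only six channels, intelligibility of vocoded speech depends heavily on the temporal envelopes, which is what makes the carrier-phrase comparison above informative.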


Journal of the Acoustical Society of America | 2012

Discriminating vocal tremor source from amplitude envelope modulations

Kathy M. Carbonell; Brad H. Story; Rosemary A. Lester; Andrew J. Lotto

Vocal tremor can have a variety of physiological sources. For example, tremors can result from involuntary oscillation of respiratory muscles (respiratory tremor), or of the muscles responsible for vocal fold adduction (adductory tremor) or lengthening (f0 tremor). While the sources of vocal tremor are distinct, they are notoriously difficult to categorize both perceptually and acoustically. In order to develop acoustic measures that can potentially distinguish sources of tremor, speech samples were synthesized using a kinematic model of the vocal folds attached to a model of the vocal tract and trachea [Titze, JASA, 75, 570-580; Story, 2005, JASA, 117, 3231-3254]. Tremors were created by modulating parameters of the vocal fold model corresponding to the three types mentioned above. The acoustic measures were related to temporal regularities in the amplitude envelope computed across the entire signal and select frequency bands. These measures could reliably categorize the samples by tremor source (as dete...
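The distinction among tremor types above comes down to which production parameter is modulated. As a toy illustration only (nothing like the Titze/Story vocal fold and tract models cited in the abstract), the sketch below modulates either the amplitude of a simple sine source (respiratory-like tremor) or its frequency (f0-like tremor) at the same tremor rate; the rate, depth, and source frequency are invented values.

```python
import math

def tremor_source(fs, dur, f0=120.0, rate=5.0, kind="respiratory", depth=0.25):
    """Toy stand-in for a modulated voice source: a sine wave whose amplitude
    ('respiratory' tremor) or frequency ('f0' tremor) oscillates at `rate` Hz."""
    out, phase = [], 0.0
    for t in range(int(fs * dur)):
        mod = math.sin(2 * math.pi * rate * t / fs)
        if kind == "respiratory":
            amp, f = 1.0 + depth * mod, f0       # modulate level, fixed pitch
        else:  # "f0" tremor
            amp, f = 1.0, f0 * (1.0 + depth * mod)  # fixed level, modulate pitch
        phase += 2 * math.pi * f / fs            # phase accumulation for smooth FM
        out.append(amp * math.sin(phase))
    return out

fs = 8000
resp = tremor_source(fs, 1.0, kind="respiratory")  # level swings above/below 1.0
f0t = tremor_source(fs, 1.0, kind="f0")            # level stays bounded by 1.0
```

The amplitude-envelope measures described in the abstract are sensitive to exactly this difference: the respiratory-like signal carries the tremor directly in its envelope, while the f0-like signal carries it mainly in band-specific envelopes after filtering.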


160th Meeting Acoustical Society of America 2010 | 2012

Absence or presence of preceding sound can change perceived phonetic identity

Kathy M. Carbonell; Andrew J. Lotto

Participants were asked to categorize a series of syllables varying from /ga/ to /da/ presented in isolation or following /al/, /ar/, /a/, or filtered noise bands. Typical shifts in categorization were obtained for /al/ vs. /ar/ contexts as predicted by compensation for coarticulation, but the shift in response between isolated presentation and any of the context conditions was much larger, even when the context was broadband noise. These results suggest that the effect of the presence of any context sound is greater than the effect of the content of the context sounds.


Journal of the Acoustical Society of America | 2010

Presence of preceding sound affects the neural representation of speech sounds: Frequency following response data.

Radhika Aravamudhan; Kathy M. Carbonell; Andrew J. Lotto

A substantial body of literature has focused on context effects in speech perception in which manipulation of the phonemic or spectral content of preceding sounds (e.g., /al/ versus /ar/) results in a shift in the perceptual categorization of a target syllable (e.g., /da/ versus /ga/). In a previous study utilizing the frequency‐following response (FFR) to measure neural correlates of these context effects [R. Aravamudhan, J. Acoust. Soc. Am. 126, 2204], it was noted that the representation of target formant trajectories was much weaker when the stimulus was presented in isolation versus following some type of context. To examine this effect explicitly, a series of syllables varying from /da/ to /ga/ was presented to listeners either in isolation or following the syllables /a/, /al/, or /ar/ (with a 50‐ms silent gap between context and target). FFR measures were obtained from EEG recordings while participants listened passively. The resulting narrow‐band spectrograms over the grand averages demonstrated t...

Collaboration


Dive into Kathy M. Carbonell's collaborations.

Top Co-Authors

Julie M. Liss

Arizona State University
