
Publication


Featured research published by Peter F. Assmann.


Journal of the Acoustical Society of America | 2004

Cochlear implant speech recognition with speech maskers

Ginger S. Stickney; Fan-Gang Zeng; Ruth Y. Litovsky; Peter F. Assmann

Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. The performance of the simulation and implant groups did not improve when the single-talker masker was a different talker compared to the same talker as the target speech, as was found in the normal-hearing control. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced.
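
Since the abstract describes the simulation only at a high level, here is a rough sketch of the general technique, a noise-excited vocoder plus target-masker mixing at a given target-to-masker ratio, in Python. The channel count, filter orders, and envelope cutoff are illustrative assumptions, not the study's parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def mix_at_tmr(target, masker, tmr_db):
    """Scale the masker so the target-to-masker ratio (RMS-based) equals
    tmr_db, then add it to the target.  Assumes masker >= target length."""
    rms = lambda s: np.sqrt(np.mean(s ** 2))
    gain = rms(target) / (rms(masker) * 10 ** (tmr_db / 20))
    return target + gain * masker[:len(target)]

def noise_vocoder(x, fs, n_channels=8, env_cutoff=160.0):
    """Noise-excited vocoder sketch: split x into log-spaced bands,
    extract each band's amplitude envelope, and reimpose the envelope
    on noise limited to the same band."""
    edges = np.logspace(np.log10(100.0), np.log10(0.45 * fs), n_channels + 1)
    env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    noise = np.random.default_rng(0).standard_normal(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(3, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)
        env = np.maximum(sosfiltfilt(env_sos, np.abs(band)), 0.0)
        out += env * sosfiltfilt(band_sos, noise)  # band-limited noise carrier
    return out / (np.max(np.abs(out)) + 1e-12)
```

In a paradigm like this one, the target-masker mixture would be formed first (e.g., mix_at_tmr(target, masker, 0.0)) and the result passed through noise_vocoder before presentation to the normal-hearing simulation group.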


Journal of the Acoustical Society of America | 1986

Modeling the role of inherent spectral change in vowel identification

Terrance M. Nearey; Peter F. Assmann

Statistical analysis of F1 and F2 measurements from nucleus and offglide sections of isolated Canadian English vowels shows significant formant frequency change not only for the "phonetic diphthongs" /e/ and /o/, but also for the "monophthongs" /ɪ/, /ɛ/, and /æ/. In a perceptual experiment, brief sections were extracted from "nucleus" and "offglide" portions of naturally produced vowels. Two sections from each vowel were presented to listeners in each of three conditions: (1) natural order (nucleus followed by offglide); (2) repeated nucleus (nucleus followed by itself); and (3) reverse (offglide followed by nucleus). Listeners' error rates for the natural order condition were comparable to those for unmodified full vowels (averaging 14% and 13%, respectively). Significantly higher error rates were found for the repeated nucleus (32%) and reverse (38%) conditions. Observed confusion matrices were strongly correlated with predictions from a pattern recognition model incorporating the formant measur...
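
A sketch of the general idea behind such a pattern recognition model, using made-up data: a Gaussian (quadratic discriminant) classifier given F1/F2 at both the nucleus and the offglide can separate categories whose nuclei overlap, whereas a nucleus-only model cannot. The classifier choice and all values below are assumptions for illustration, not the paper's model.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fake_tokens(n, nucleus, offglide, spread=40.0):
    """Hypothetical vowel tokens: [F1, F2] (Hz) sampled at the nucleus
    and at the offglide, concatenated into a 4-dimensional feature row."""
    nuc = rng.normal(nucleus, spread, size=(n, 2))
    off = rng.normal(offglide, spread, size=(n, 2))
    return np.hstack([nuc, off])

# Two made-up vowel categories whose nuclei coincide but whose offglides
# diverge, so that inherent spectral change carries the contrast.
A = fake_tokens(100, [550, 1800], [450, 2000])
B = fake_tokens(100, [550, 1800], [600, 1600])
X, y = np.vstack([A, B]), np.array([0] * 100 + [1] * 100)

dynamic = cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=5).mean()
static = cross_val_score(QuadraticDiscriminantAnalysis(), X[:, :2], y, cv=5).mean()
print(f"nucleus+offglide: {dynamic:.2f}  nucleus only: {static:.2f}")
```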


Archive | 2004

The Perception of Speech Under Adverse Conditions

Peter F. Assmann; Quentin Summerfield

Speech is the primary vehicle of human social interaction. In everyday life, speech communication occurs under an enormous range of different environmental conditions. The demands placed on the process of speech communication are great, but nonetheless it is generally successful. Powerful selection pressures have operated to maximize its effectiveness. The adaptability of speech is illustrated most clearly in its resistance to distortion. In transit from speaker to listener, speech signals are often altered by background noise and other interfering signals, such as reverberation, as well as by imperfections of the frequency or temporal response of the communication channel. Adaptations for robust speech transmission include adjustments in articulation to offset the deleterious effects of noise and interference (Lombard 1911; Lane and Tranel 1971); efficient acoustic-phonetic coupling, which allows evidence of linguistic units to be conveyed in parallel (Hockett 1955; Liberman et al. 1967; Greenberg 1996; see Diehl and Lindblom, Chapter 3); and specializations of auditory perception and selective attention (Darwin and Carlyon 1995). Speech is a highly efficient and robust medium for conveying information under adverse conditions because it combines strategic forms of redundancy to minimize the loss of information. Coker and Umeda (1974, p. 349) define redundancy as “any characteristic of the language that forces spoken messages to have, on average, more basic elements per message, or more cues per basic element, than the barest minimum [necessary for conveying the linguistic message].” This definition does not address the function of redundancy in speech communication, however. Coker and Umeda note that “redundancy can be used effectively; or it can be squandered on uneven repetition of certain data, leaving other crucial items very vulnerable to noise. . . . But more likely, if a redundancy is a property of a language and has to be learned, then it has a purpose.” Coker and Umeda conclude that the purpose of redundancy in speech communication is to provide a basis for error correction and resistance to noise.


Journal of the Acoustical Society of America | 2000

Time-varying spectral change in the vowels of children and adults

Peter F. Assmann; William F. Katz

Recent studies have shown that time-varying changes in formant pattern contribute to the phonetic specification of vowels. This variation could be especially important in children's vowels, because children have higher fundamental frequencies (f0s) than adults, and formant-frequency estimation is generally less reliable when f0 is high. To investigate the contribution of time-varying changes in formant pattern to the identification of children's vowels, three experiments were carried out with natural and synthesized versions of 12 American English vowels spoken by children (ages 7, 5, and 3 years) as well as adult males and females. Experiment 1 showed that (i) vowels generated with a cascade formant synthesizer (with hand-tracked formants) were less accurately identified than natural versions; and (ii) vowels synthesized with steady-state formant frequencies were harder to identify than those which preserved the natural variation in formant pattern over time. The decline in intelligibility was similar across talker groups, and there was no evidence that formant movement plays a greater role in children's vowels compared to adults'. Experiment 2 replicated these findings using a semi-automatic formant-tracking algorithm. Experiment 3 showed that the effects of formant movement were the same for vowels synthesized with noise excitation (as in whispered speech) and pulsed excitation (as in voiced speech), although, on average, the whispered vowels were less accurately identified than their voiced counterparts. Taken together, the results indicate that the cues provided by changes in the formant frequencies over time contribute materially to the intelligibility of vowels produced by children and adults, but these time-varying formant frequency cues do not interact with properties of the voicing source.
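
The "steady-state" manipulation amounts to freezing each vowel's formant tracks before resynthesis. A minimal sketch of that step, assuming frame-based formant tracks and leaving the synthesizer itself aside:

```python
import numpy as np

def freeze_formants(tracks):
    """Replace time-varying formant tracks with their values at the
    temporal midpoint, removing inherent spectral change while leaving
    duration (and, in a full synthesizer, f0 and amplitude contours)
    intact.  tracks: (n_frames, n_formants) array of frequencies in Hz."""
    mid = tracks[len(tracks) // 2]
    return np.tile(mid, (len(tracks), 1))

# A hypothetical 50-frame F1/F2/F3 contour for illustration:
tracks = np.column_stack([
    np.linspace(480, 420, 50),    # F1 falling
    np.linspace(1900, 2200, 50),  # F2 rising
    np.full(50, 2800.0),          # F3 flat
])
static_tracks = freeze_formants(tracks)
```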


Journal of the Acoustical Society of America | 1991

Perception of concurrent vowels: effects of harmonic misalignment and pitch-period asynchrony

Quentin Summerfield; Peter F. Assmann

Three experiments examined the ability of listeners to identify steady-state synthetic vowel-like sounds presented concurrently in pairs to the same ear. Experiment 1 confirmed earlier reports that listeners identify the constituents of such pairs more accurately when they differ in fundamental frequency (f0) by about a half semitone or more, compared to the condition where they have the same f0. When the constituents have different f0s, corresponding harmonics of the two vowels are misaligned in frequency and corresponding pitch periods are asynchronous in time. These differences provide cues that might aid identification. Experiments 2 and 3 determined whether listeners can use these cues, divorced from a difference in f0, to improve their accuracy of identification. Harmonic misalignment was beneficial when the constituents had an f0 of 200 Hz so that the harmonics of each constituent were well separated in frequency. Pitch-period asynchrony was beneficial when the constituents had an f0 of 50 Hz so that the onsets of the pitch periods of each constituent were well separated in time. Neither cue was beneficial when both constituents had an f0 of 100 Hz. It is unlikely, therefore, that either cue contributed to the improvement in performance found in Experiment 1 where the constituents were given different f0s close to 100 Hz. Rather, it is argued that performance improved in Experiment 1 primarily because the two f0s specified two pitches that could be used to segregate the contributions of each vowel in the composite waveform.
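
Double-vowel stimuli of this kind are usually built by additive synthesis: two harmonic complexes, each shaped by its own vowel-like spectral envelope, summed into one waveform. The sketch below uses a crude sum-of-resonances envelope and illustrative formant values; it is not the study's synthesizer.

```python
import numpy as np

FS = 16000  # sample rate (Hz), an assumption

def harmonic_vowel(f0, formants, dur=0.5, bw=90.0):
    """Vowel-like harmonic complex: harmonics of f0 weighted by a crude
    sum-of-resonances spectral envelope centered on the given formants."""
    t = np.arange(int(dur * FS)) / FS
    out = np.zeros_like(t)
    for k in range(1, int(FS / 2 / f0)):
        f = k * f0
        amp = sum(1.0 / (1.0 + ((f - fc) / bw) ** 2) for fc in formants)
        out += amp * np.sin(2 * np.pi * f * t)
    return out / np.max(np.abs(out))

# Same-f0 pair: corresponding harmonics coincide in frequency.
# Different-f0 pair (~1 semitone apart): harmonics are misaligned.
v1 = harmonic_vowel(100.0, [730, 1090, 2440])        # /a/-like, illustrative
v2_same = harmonic_vowel(100.0, [270, 2290, 3010])   # /i/-like, illustrative
v2_diff = harmonic_vowel(105.9, [270, 2290, 3010])   # f0 one semitone higher
double_same = v1 + v2_same
double_diff = v1 + v2_diff
```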


Journal of the Acoustical Society of America | 2007

Effects of cochlear implant processing and fundamental frequency on the intelligibility of competing sentences

Ginger S. Stickney; Peter F. Assmann; Janice Chang; Fan-Gang Zeng

Speech perception in the presence of another competing voice is one of the most challenging tasks for cochlear implant users. Several studies have shown that (1) the fundamental frequency (F0) is a useful cue for segregating competing speech sounds and (2) the F0 is better represented by the temporal fine structure than by the temporal envelope. However, current cochlear implant speech processing algorithms emphasize temporal envelope information and discard the temporal fine structure. In this study, speech recognition was measured as a function of the F0 separation of the target and competing sentence in normal-hearing and cochlear implant listeners. For the normal-hearing listeners, the combined sentences were processed through either a standard implant simulation or a new algorithm which additionally extracts a slowed-down version of the temporal fine structure (called Frequency-Amplitude-Modulation-Encoding). The results showed no benefit of increasing F0 separation for the cochlear implant or simulation groups. In contrast, the new algorithm resulted in gradual improvements with increasing F0 separation, similar to that found with unprocessed sentences. These results emphasize the importance of temporal fine structure for speech perception and demonstrate a potential remedy for difficulty in the perceptual segregation of competing speech sounds.
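
The Frequency-Amplitude-Modulation-Encoding algorithm is only named above. Its core idea, keeping each band's amplitude envelope plus a slowed, bandwidth-limited version of the frequency modulation rather than discarding the fine structure, can be sketched roughly as follows; the filter settings and FM cutoff are assumptions, not the published parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def slow_fm_band(band, fs, fc, fm_cutoff=400.0):
    """One band of a FAME-style encoder (rough sketch).  `band` is assumed
    to be already bandpass-filtered around center frequency fc.  The
    output keeps the amplitude envelope and a lowpass-limited ("slowed")
    frequency modulation, resynthesized as an FM carrier."""
    analytic = hilbert(band)
    env = np.abs(analytic)                          # amplitude envelope
    phase = np.unwrap(np.angle(analytic))
    inst_f = np.gradient(phase) * fs / (2 * np.pi)  # instantaneous freq (Hz)
    sos = butter(2, fm_cutoff, btype="low", fs=fs, output="sos")
    slow_f = fc + sosfiltfilt(sos, inst_f - fc)     # keep only slow FM
    carrier = np.cos(2 * np.pi * np.cumsum(slow_f) / fs)
    return env * carrier
```

Replacing slow_f with a constant fc recovers a conventional envelope-only channel, which is one way to see what the added FM contributes.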


Hearing Research | 2006

Effects of electrode design and configuration on channel interactions

Ginger S. Stickney; Philipos C. Loizou; Lakshmi N. Mishra; Peter F. Assmann; Robert V. Shannon; Jane M. Opie

A potential shortcoming of existing multichannel cochlear implants is electrical-field summation during simultaneous electrode stimulation. Electrical-field interactions can disrupt the stimulus waveform prior to neural activation. To test whether speech intelligibility can be degraded by electrical-field interaction, speech recognition performance and interaction were examined for three Clarion electrode arrays: the pre-curved, enhanced bipolar electrode array, the enhanced bipolar electrode with an electrode positioner, and the Hi-Focus electrode with a positioner. Channel interaction was measured by comparing stimulus detection thresholds for a probe signal in the presence of a sub-threshold perturbation signal as a function of the separation between the two simultaneously stimulated electrodes. Correct identification of vowels, consonants, and words in sentences was measured with two speech strategies: one which used simultaneous stimulation and another which used sequential stimulation. Speech recognition scores were correlated with measured electrical-field interaction for the strategy which used simultaneous stimulation but not the strategy which used sequential stimulation. Higher speech recognition scores with the simultaneous strategy were generally associated with lower levels of electrical-field interaction. Electrical-field interaction accounted for as much as 70% of the variance in speech recognition scores, suggesting that electrical-field interaction is a significant contributor to the variability found across patients who use simultaneous strategies.


Ear and Hearing | 2003

Comparison of speech processing strategies used in the Clarion implant processor.

Philipos C. Loizou; Ginger S. Stickney; Lakshmi N. Mishra; Peter F. Assmann

Objective: To evaluate the performance of the various speech processing strategies supported by the Clarion S-Series implant processor.

Design: Five different speech-processing strategies [the Continuous Interleaved Sampler (CIS), the Simultaneous Analog Stimulation (SAS), the Paired Pulsatile Sampler (PPS), the Quadruple Pulsatile Sampler (QPS) and the hybrid (HYB) strategies] were implemented on the Clarion Research Interface platform. These speech-processing strategies varied in the degree of electrode simultaneity, with the SAS strategy being fully simultaneous (all electrodes are stimulated at the same time), the PPS and QPS strategies being partially simultaneous and the CIS strategy being completely sequential. In the hybrid strategy, some electrodes were stimulated using SAS, and some were stimulated using CIS. Nine Clarion CIS users were fitted with the above speech-processing strategies and tested on vowel, consonant and word recognition in quiet.

Results: There were no statistically significant differences in the mean group performance between the CIS and SAS strategies on vowel and sentence recognition. A statistically significant difference was found only on consonant recognition. Individual results, however, indicated that most subjects performed worse with the SAS strategy compared with the CIS strategy on all tests. About 33% of the cochlear implant users benefited from the PPS and QPS strategies on consonant and word recognition.

Conclusions: If temporal information were the primary factor in speech recognition with cochlear implants, then SAS should consistently produce higher speech recognition scores than CIS. That was not the case, however, because most CIS users performed significantly worse with the SAS strategy on all speech tests. Hence, there seems to be a trade-off between improving the temporal resolution with an increasing number of simultaneous channels and introducing distortions from electrical-field interactions. Performance for some CI users improved when the number of simultaneous channels increased to two (PPS strategy) and four (QPS strategy). The improvement with the PPS and QPS strategies must be due to the higher rates of stimulation. The above results suggest that CIS users are less likely to benefit from the SAS strategy, and more likely to benefit from the PPS and QPS strategies, which provide higher rates of stimulation with a small probability of channel interaction.
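
The difference between these strategies is at bottom one of pulse timing. The toy schedule below, with made-up electrode counts and slot durations, shows why grouping electrodes (PPS/QPS-style) raises the per-channel stimulation rate when each pulse slot has a fixed duration, at the cost of possible field summation between simultaneously active electrodes.

```python
import numpy as np

def pulse_schedule(n_electrodes=8, slot=1 / 8000.0, dur=0.005, simultaneous=1):
    """Toy stimulation schedule as (time, electrode) events.  The pulse
    slot is fixed (a hardware-style limit), so firing electrodes in groups
    (simultaneous=2 ~ PPS, 4 ~ QPS) shortens the cycle and raises each
    channel's rate.  simultaneous=1 is CIS-style sequential stimulation."""
    groups = np.arange(n_electrodes).reshape(-1, simultaneous)
    cycle = len(groups) * slot  # time for one pass over all channels
    return [(c * cycle + g * slot, int(e))
            for c in range(int(dur / cycle))
            for g, grp in enumerate(groups)
            for e in grp]

print("CIS rate/channel:", 1 / (8 * (1 / 8000)), "Hz")  # 1000.0 Hz
print("QPS rate/channel:", 1 / (2 * (1 / 8000)), "Hz")  # 4000.0 Hz
```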


Journal of the Acoustical Society of America | 2005

Synthesis fidelity and time-varying spectral change in vowels

Peter F. Assmann; William F. Katz

Recent studies have shown that synthesized versions of American English vowels are less accurately identified when the natural time-varying spectral changes are eliminated by holding the formant frequencies constant over the duration of the vowel. A limitation of these experiments has been that vowels produced by formant synthesis are generally less accurately identified than the natural vowels after which they are modeled. To overcome this limitation, a high-quality speech analysis-synthesis system (STRAIGHT) was used to synthesize versions of 12 American English vowels spoken by adults and children. Vowels synthesized with STRAIGHT were identified as accurately as the natural versions, in contrast with previous results from our laboratory showing identification rates 9%-12% lower for the same vowels synthesized using the cascade formant model. Consistent with earlier studies, identification accuracy was not reduced when the fundamental frequency was held constant across the vowel. However, elimination of time-varying changes in the spectral envelope using STRAIGHT led to a greater reduction in accuracy (23%) than was previously found with cascade formant synthesis (11%). A statistical pattern recognition model, applied to acoustic measurements of the natural and synthesized vowels, predicted both the higher identification accuracy for vowels synthesized using STRAIGHT compared to formant synthesis, and the greater effects of holding the formant frequencies constant over time with STRAIGHT synthesis. Taken together, the experiment and modeling results suggest that formant estimation errors and incorrect rendering of spectral and temporal cues by cascade formant synthesis contribute to lower identification accuracy and underestimation of the role of time-varying spectral change in vowels.


Attention Perception & Psychophysics | 1989

Auditory enhancement and the perception of concurrent vowels

Quentin Summerfield; Peter F. Assmann

Listeners identified both constituents of double vowels created by summing the waveforms of pairs of synthetic vowels with the same duration and fundamental frequency. Accuracy of identification was significantly above chance. Effects of introducing such double vowels by visual or acoustical precursor stimuli were examined. Precursors specified the identity of one of the two constituent vowels. Performance was scored as the accuracy with which the other vowel was identified. Visual precursors were standard English spellings of one member of the vowel pair; acoustical precursors were 1-sec segments of one member of the vowel pair. Neither visual precursors nor contralateral acoustical precursors improved performance over the condition with no precursor. Thus, knowledge of the identity of one of the constituents of a double vowel does not help listeners to identify the other constituent. A significant improvement in performance did occur with ipsilateral acoustical precursors, consistent with earlier demonstrations that frequency components which undergo changes in spectral amplitude achieve enhanced auditory prominence relative to unchanging components. This outcome demonstrates the joint but independent operation of auditory and perceptual processes underlying the ability of listeners to understand speech despite adversely peaked frequency responses in communication channels.
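
The precursor paradigm reduces to simple stimulus assembly: a 1-s copy of one constituent vowel, a short gap, then the double vowel, with the precursor routed to the same ear (ipsilateral) or the opposite ear (contralateral). A sketch assuming mono inputs at a common sample rate; the gap duration is an assumption.

```python
import numpy as np

FS = 16000  # sample rate (Hz), an assumption

def precursor_trial(precursor, double_vowel, gap=0.05, contralateral=False):
    """Two-channel (left, right) trial: precursor, silent gap, then the
    double vowel in the left ear.  The precursor goes to the left ear
    (ipsilateral) or to the right ear (contralateral)."""
    pre = np.concatenate([precursor, np.zeros(int(gap * FS))])
    quiet = np.zeros_like(pre)
    left = np.concatenate([quiet if contralateral else pre, double_vowel])
    right = np.concatenate([pre if contralateral else quiet,
                            np.zeros_like(double_vowel)])
    return np.stack([left, right], axis=1)
```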

Collaboration


Peter F. Assmann's top co-authors include:

Vahid Montazeri (University of Texas at Dallas)
Shaikat Hossain (University of Texas at Dallas)
Sneha V. Bharadwaj (University of Texas at Dallas)
William F. Katz (University of Texas at Dallas)
Daniel J. Hubbard (University of Texas at Dallas)