
Publications

Featured research published by Susan Shaiman.


Journal of the Acoustical Society of America | 2007

Speech signal modification to increase intelligibility in noisy environments

Sungyub Yoo; J. Robert Boston; Amro El-Jaroudi; Ching-Chung Li; John D. Durrant; Kristie Kovacyk; Susan Shaiman

The role of transient speech components in speech intelligibility was investigated. Speech was decomposed into two components, quasi-steady-state (QSS) and transient, using a set of time-varying filters whose center frequencies and bandwidths were controlled to identify the strongest formant components in speech. The relative energy and intelligibility of the QSS and transient components were compared to the original speech. Most of the speech energy was in the QSS component, but this component had low intelligibility. The transient component had much lower energy but was almost as intelligible as the original speech, suggesting that the transient component included speech elements important to speech perception. A modified version of speech was produced by amplifying the transient component and recombining it with the original speech. The intelligibility of the modified speech in background noise was compared to that of the original speech, using a psychoacoustic procedure based on the modified rhyme protocol. Word recognition rates for the modified speech were significantly higher at low signal-to-noise ratios (SNRs), with minimal effect on intelligibility at higher SNRs. These results suggest that amplification of transient information may improve the intelligibility of speech in noise and that this improvement is more effective in severe noise conditions.
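
The amplify-and-recombine step with energy matching lends itself to a compact sketch. Below is a minimal Python illustration, assuming the transient component has already been extracted by the time-varying filters described above; the function name and gain value are illustrative choices, not details from the paper.

import numpy as np

def amplify_transient(speech, transient, gain=2.0):
    """Recombine speech with an amplified transient component,
    rescaling so the output energy matches the original signal.

    speech, transient: 1-D arrays sampled at the same rate.
    gain: illustrative amplification factor (not from the paper).
    """
    modified = speech + gain * transient
    # Match total energy to the original so any intelligibility
    # difference is not simply a loudness effect.
    scale = np.sqrt(np.sum(speech**2) / np.sum(modified**2))
    return modified * scale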


Journal of the Acoustical Society of America | 1989

Kinematic and electromyographic responses to perturbation of the jaw

Susan Shaiman

The task-dependent organization of sensorimotor mechanisms during the production of speech was investigated using a perturbation paradigm. Six subjects received unanticipated jaw perturbations before and during tongue elevation for [ædæ], in which the lips do not participate, and bilabial closure for [æbæ], in which the tongue does not participate. A strain gauge system was used to monitor inferior-superior displacements of the upper lip, lower lip, and jaw, while hooked-wire electrodes monitored muscle activity in various muscles of the lips, jaw, and tongue. Results indicated significant compensatory kinematic adjustments to jaw perturbations in the lips and/or jaw during [æbæ], but no labial compensations during [ædæ] (with the exception of one subject). EMG responses were inconsistent and not necessarily indicative of the kinematic findings. Individual subjects responded to perturbations reliably but differently, using different combinations of involved articulators to achieve bilabial closure and lingua-alveolar contact. The current study supports earlier research which suggests that the components of the motor system are flexibly assembled, based on the requirements of the specific task. That is, compensatory responses to sensory information occur only when such responses are functionally necessary.
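
One way to quantify a compensatory kinematic response of this kind is to compare a perturbed trial's displacement record against the variability envelope of unperturbed control trials. A minimal Python sketch follows, assuming time-aligned displacement records; the 2-SD criterion is an illustrative threshold, not the criterion used in the study.

import numpy as np

def detect_compensation(control_trials, perturbed_trial, k=2.0):
    """Flag samples where a perturbed trial departs from the
    control-trial envelope.

    control_trials: (n_trials, n_samples) array of time-aligned
    displacement records from unperturbed productions.
    perturbed_trial: (n_samples,) displacement record.
    k: width of the variability band in standard deviations
    (an illustrative choice).
    """
    mean = control_trials.mean(axis=0)
    sd = control_trials.std(axis=0, ddof=1)
    deviation = perturbed_trial - mean
    outside = np.abs(deviation) > k * sd
    return deviation, outside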


Experimental Brain Research | 2002

Task-specific sensorimotor interactions in speech production

Susan Shaiman; Vincent L. Gracco

Speaking involves the activity of multiple muscles moving many parts (articulators) of the vocal tract. In previous studies, it has been shown that mechanical perturbation delivered to one moving speech articulator, such as the lower lip or jaw, results in compensatory responses in the perturbed and other non-perturbed articulators, but not in articulators that are uninvolved in the specific speech sound being produced. These observations suggest that the speech motor control system may be organized in a task-specific manner. However, previous studies have not used the appropriate controls to address the mechanism by which this task-specific organization is achieved. A lack of response in a non-perturbed articulator may simply reflect the fact that the muscles examined were not active. Alternatively, there may be a specific gating of somatic sensory signals due to task requirements. The present study was designed to address the nature of the underlying sensorimotor organization. Unanticipated mechanical loads were applied to the upper lip during the “p” in “apa” and “f” in “afa” in six subjects. Both lips are used to produce “p”, while only the lower lip is used for “f”. For “apa”, both upper lip and lower lip responses were observed following upper lip perturbation. For “afa”, no upper lip or lower lip responses were observed following the upper lip perturbation. The differential response of the lower lip, which was phasically active during both speech tasks, indicates that the neural organization of these two speech tasks differs not only in terms of the different muscles used to produce the different movements, but also in terms of the sensorimotor interactions within and across the two lips.


Journal of Phonetics | 2001

Kinematics of compensatory vowel shortening: the effect of speaking rate and coda composition on intra- and inter-articulatory timing

Susan Shaiman

The acoustic shortening of vowels has been demonstrated to occur across a variety of contextual variations, including speaking rate and coda composition (i.e., singleton consonant vs. consonant cluster). The current study examined two possible types of kinematic adjustments which may account for changes in vowel duration. Results indicated that increased speaking rates were effected by changes primarily in the dynamic specification of the jaw opening gesture, but may also have occurred in combination with increased articulatory overlap of the vowel by the successive consonant(s). Consonant cluster productions appeared to be effected solely by increased articulatory overlap, with these kinematic adjustments being maintained across the different speaking rates. As the timing of the jaw closing onset shifted earlier for cluster productions, the relative timing of the upper lip and jaw did not remain invariant. Substantial intersubject variation was observed in the implementation of these kinematic strategies. These findings suggest that, in order to achieve the intended acoustic-perceptual goals, both intra- and inter-articulatory timing may not be absolutely invariant, but rather, systematically and individually organized across task manipulations.
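
Articulatory overlap of the kind examined here can be measured by extracting gesture onset and offset landmarks from displacement traces and intersecting the resulting intervals. The Python sketch below uses a fraction-of-peak-velocity criterion, a common kinematic convention assumed here for illustration and not necessarily the one applied in this study.

import numpy as np

def gesture_interval(displacement, fs, frac=0.2):
    """Estimate gesture onset/offset (in seconds) as the first/last
    points where speed exceeds `frac` of its peak.
    frac: illustrative threshold, not the paper's criterion.
    """
    speed = np.abs(np.gradient(displacement) * fs)
    above = np.flatnonzero(speed >= frac * speed.max())
    return above[0] / fs, above[-1] / fs

def overlap_s(intv_a, intv_b):
    """Temporal overlap of two gesture intervals in seconds (0 if none)."""
    start = max(intv_a[0], intv_b[0])
    end = min(intv_a[1], intv_b[1])
    return max(0.0, end - start)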


Signal Processing | 2007

New signal decomposition method based speech enhancement

C. Tantibundhit; J.R. Boston; Ching-Chung Li; John D. Durrant; Susan Shaiman; Kristie Kovacyk; Amro El-Jaroudi

The auditory system, like the visual system, may be sensitive to abrupt stimulus changes, and the transient component in speech may be particularly critical to speech perception. If this component can be identified and selectively amplified, improved speech perception in background noise may be possible. This paper describes an algorithm to decompose speech into tonal, transient, and residual components. The modified discrete cosine transform (MDCT) was used to capture the tonal component and the wavelet transform was used to capture transient features. A hidden Markov chain (HMC) model and a hidden Markov tree (HMT) model were applied to capture statistical dependencies between the MDCT coefficients and between the wavelet coefficients, respectively. The transient component identified by the wavelet transform was selectively amplified and recombined with the original speech to generate modified speech, with energy adjusted to equal the energy of the original speech. The intelligibility of the original and modified speech was evaluated in eleven human subjects using the modified rhyme protocol. Word recognition rate results show that the modified speech can improve speech intelligibility at low SNR levels (8% at -15 dB, 14% at -20 dB, and 18% at -25 dB) and has minimal effect on intelligibility at higher SNR levels.
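
The overall shape of the pipeline (tonal extraction, transient extraction from the remainder, residual as what is left) can be outlined in a few lines. The Python sketch below substitutes simple coefficient-magnitude thresholding for the paper's HMC and HMT models, and a plain DCT for the MDCT, so it illustrates only the structure of the decomposition, not the actual statistical method.

import numpy as np
import pywt                      # PyWavelets
from scipy.fft import dct, idct

def decompose(speech, wavelet="db4", level=5, keep=0.01):
    """Split speech into tonal / transient / residual parts.

    Simplified stand-in for the paper's method: magnitude
    thresholding replaces the hidden Markov chain/tree models,
    and a plain DCT replaces the MDCT. `keep` is an illustrative
    fraction of coefficients retained at each stage.
    """
    # Tonal: keep the largest DCT coefficients.
    c = dct(speech, norm="ortho")
    thr = np.quantile(np.abs(c), 1 - keep)
    tonal = idct(np.where(np.abs(c) >= thr, c, 0.0), norm="ortho")

    # Transient: keep the largest wavelet coefficients of the rest.
    rest = speech - tonal
    coeffs = pywt.wavedec(rest, wavelet, level=level)
    flat = np.concatenate([np.abs(a) for a in coeffs])
    wthr = np.quantile(flat, 1 - keep)
    kept = [np.where(np.abs(a) >= wthr, a, 0.0) for a in coeffs]
    transient = pywt.waverec(kept, wavelet)[: len(rest)]

    residual = rest - transient
    return tonal, transient, residual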


International Conference on Acoustics, Speech, and Signal Processing | 2006

Speech Enhancement Using Transient Speech Components

C. Tantibundhit; J.R. Boston; Ching-Chung Li; John D. Durrant; Susan Shaiman; Kristie Kovacyk; Amro El-Jaroudi

This paper describes an algorithm to decompose speech into tonal, transient, and residual components. The algorithm uses an MDCT-based hidden Markov chain model to isolate the tonal component and a wavelet-based hidden Markov tree model to isolate the transient component. We suggest that the auditory system, like the visual system, is probably sensitive to abrupt stimulus changes and that the transient component in speech may be particularly critical to speech perception. To test this suggestion, the transient component isolated by our algorithm was selectively amplified and recombined with the original speech to generate enhanced speech, with energy adjusted to be equal to the energy of the original speech. The intelligibility of the original and enhanced speech was evaluated in eleven human subjects by the modified rhyme protocol. The word recognition rates show that the enhanced speech can provide substantial improvement in speech intelligibility at low SNR levels (8% at -15 dB, 14% at -20 dB, and 18% at -25 dB).


Journal of the Acoustical Society of America | 1991

Different phase-stable relationships of the upper lip and jaw for production of vowels and diphthongs.

Susan Shaiman; Robert J. Porter

Relational invariants have been reported in the timing of articulatory gestures across suprasegmental changes, such as rate and stress. In the current study, the relative timing of the upper lip and jaw was investigated across changes in both suprasegmental and segmental characteristics of speech. The onset of upper lip movement relative to the vowel-to-vowel jaw cycle during intervocalic bilabial production was represented as a phase angle, and analyzed across changes in stress, vowel height, and vowel/diphthong identity. Results indicated that the relative timing of the upper lip and jaw varied systematically with changes in stress and vowel/diphthong identity, while remaining constant across changes in vowel height. It appears that modifications in relative timing may be due to adjustments in the jaw cycle as a result of the compound nature of jaw movement for diphthongs as compared to vowels, with further modifications due to the effect of stress on these compound movements.
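
Expressing a movement onset as a phase angle within a cycle is a simple computation; below is a Python sketch under the convention described in the abstract (0 degrees at cycle start, 360 degrees at cycle end), with illustrative argument names.

def lip_phase_angle(t_lip_onset, t_jaw_cycle_start, t_jaw_cycle_end):
    """Upper lip onset expressed as a phase angle (degrees) within
    the vowel-to-vowel jaw cycle."""
    cycle = t_jaw_cycle_end - t_jaw_cycle_start
    return 360.0 * (t_lip_onset - t_jaw_cycle_start) / cycle

For example, an upper lip onset occurring 50 ms into a 200 ms vowel-to-vowel jaw cycle corresponds to a phase angle of 90 degrees.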


Journal of the Acoustical Society of America | 2016

Relationship between tongue positions and formant frequencies in female speakers

Jimin Lee; Susan Shaiman; Gary Weismer

This study examined the relationship (1) between acoustic vowel space and the corresponding tongue kinematic vowel space and (2) between formant frequencies (F1 and F2) and tongue x-y coordinates for the same time sampling point. Thirteen healthy female adults participated in this study. Electromagnetic articulography and synchronized acoustic recordings were utilized to obtain vowel acoustic and tongue kinematic data across ten speech tasks. Intra-speaker analyses showed that for 10 of the 13 speakers the acoustic vowel space was moderately to highly correlated with tongue kinematic vowel space; much weaker correlations were obtained for inter-speaker analyses. Correlations of individual formants with tongue positions showed that F1 varied strongly with tongue position variations in the y dimension, whereas F2 was correlated in equal magnitude with variations in the x and y positions. For within-speaker analyses, the size of the acoustic vowel space is likely to provide a reasonable inference of the size of the tongue working space for most speakers; unfortunately, there is no a priori, obvious way to identify the speakers for whom the covariation is not significant. A second conclusion is that F1 variations reflect tongue height, but F2 is a much more complex reflection of tongue variation in both dimensions.
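
The two analyses named here, vowel space size and formant-position correlation, can be sketched with scipy. The convex hull is one common operationalization of vowel space area and is assumed here for illustration; the paper's exact metric may differ.

import numpy as np
from scipy.spatial import ConvexHull
from scipy.stats import pearsonr

def vowel_space_area(points):
    """Area of the convex hull enclosing 2-D vowel tokens.
    points: (n, 2) array, e.g. columns (F2, F1) or tongue (x, y).
    For 2-D input, ConvexHull.volume is the enclosed area.
    """
    return ConvexHull(points).volume

def formant_position_corrs(f1, f2, x, y):
    """Pearson correlations of each formant with each tongue
    coordinate; f1, f2, x, y are (n,) arrays at matching samples."""
    return {
        "F1~y": pearsonr(f1, y),  # F1 expected to track tongue height
        "F1~x": pearsonr(f1, x),
        "F2~y": pearsonr(f2, y),  # F2 expected to reflect both dims
        "F2~x": pearsonr(f2, x),
    }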


Speech Communication | 2014

Effects of perturbation and prosody on the coordination of speech and gesture

Heather Leavy Rusiewicz; Susan Shaiman; Jana M. Iverson; Neil Szuminsky

The temporal alignment of speech and gesture is widely acknowledged as primary evidence of the integration of spoken language and gesture systems. Yet the widespread acceptance that speech and gesture are temporally coordinated stands in contrast to the scarcity of experimental research on the variables that affect their temporal relationship. Furthermore, the mechanism of the temporal coordination of speech and gesture is poorly understood. Recent experimental research suggests that gestures overlap prosodically prominent points in the speech stream, though the effects of other variables, such as perturbation of speech, have not yet been studied in a controlled paradigm. The purpose of the present investigation was to further investigate the mechanism of this interaction within a dynamic systems framework. Fifteen typical young adults completed a task that elicited the production of contrastive prosodic stress on different syllable positions, with and without delayed auditory feedback, while pointing to corresponding pictures. The coordination of deictic gestures and spoken language was examined as a function of perturbation, prosody, and position of the target syllable. Results indicated that the temporal parameters of gesture were affected by all three variables. The findings suggest that speech and gesture may be coordinated due to internal pulse-based temporal entrainment of the two motor systems.
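
A basic operationalization of temporal alignment in this paradigm is the per-trial lag between the gesture apex and the onset of the stressed syllable, compared across conditions. The following Python sketch assumes those event times have already been extracted; it is an illustration, not the study's analysis code.

import numpy as np

def gesture_speech_lags(t_gesture_apex, t_syllable_onset):
    """Per-trial asynchrony (seconds) between gesture apex and
    stressed-syllable onset; positive means the apex came later.
    Both arguments: (n_trials,) arrays of event times."""
    return np.asarray(t_gesture_apex) - np.asarray(t_syllable_onset)

def condition_summary(lags):
    """Mean lag and its standard error for one condition."""
    lags = np.asarray(lags)
    return lags.mean(), lags.std(ddof=1) / np.sqrt(lags.size)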


Workshop on Applications of Signal Processing to Audio and Acoustics | 2005

Speech enhancement based on transient speech information

Sungyub Yoo; J.R. Boston; John D. Durrant; Kristie Kovacyk; Stacey Karn; Susan Shaiman; Amro El-Jaroudi; Ching-Chung Li

The purpose of this study was to develop signal processing algorithms to enhance the intelligibility of speech in noise. An algorithm based on time-frequency analysis was employed to extract quasi-steady-state (QSS) energy from the speech signal, leaving a residual signal of predominantly transient components. The transient component was selectively amplified and recombined with the original speech to generate enhanced speech. The energy of the enhanced speech was adjusted to be equal to that of the original speech, and the intelligibility of the enhanced speech was compared to that of the original speech in background noise. Psychometric functions showed that the enhanced speech had higher recognition scores at most noise levels, with greater improvement at lower SNRs. These results suggest that subjects can identify enhanced speech better than the original speech in severe noise conditions.
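
Psychometric functions like those reported here are commonly summarized by fitting a logistic curve of recognition rate against SNR. A Python sketch with scipy follows; the functional form and initial guesses are illustrative assumptions, not the study's fitting procedure.

import numpy as np
from scipy.optimize import curve_fit

def logistic(snr, midpoint, slope):
    """Logistic psychometric function: recognition probability vs SNR."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - midpoint)))

def fit_psychometric(snr_db, prop_correct):
    """Fit the midpoint (SNR at 50% correct) and slope to
    word recognition data; p0 holds illustrative initial guesses."""
    p0 = [np.median(snr_db), 0.5]
    params, _ = curve_fit(logistic, snr_db, prop_correct, p0=p0)
    return params  # (midpoint_db, slope)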

Collaboration

Top co-authors of Susan Shaiman:

Ching-Chung Li, University of Pittsburgh
J.R. Boston, University of Pittsburgh
Sungyub Yoo, University of Pittsburgh
Neil Szuminsky, University of Pittsburgh
Stacey Karn, University of Pittsburgh