
Publication


Featured research published by Steven M. Lulich.


Journal of Phonetics | 2010

Subglottal resonances and distinctive features

Steven M. Lulich

This paper addresses the phonetic basis of the distinctive feature [±back]. The second subglottal resonance (Sg2) is known to fall near the boundary between [−back] and [+back] vowels, and it has been claimed that Sg2 actually defines this distinction. In this paper, new evidence in support of this hypothesis is presented from 14 adult and 9 child speakers of American English, in which accelerometer recordings of subglottal acoustics were made simultaneously with speech recordings. The first three formants and the second subglottal resonance were measured, and both Sg2 and F3–3.5 bark were tested as boundaries between front and back vowels in the F2-dimension. It was found that Sg2 provides a reliable boundary between front and back vowels for children of all ages, as well as for adults, whereas F3–3.5 bark provides a similarly reliable boundary only for older children and adults. Furthermore, a study of connected speech in one adult male indicated that Sg2 forms a boundary between front and back vowels in such speech as well as in laboratory speech. Some implications for quantal theory and landmark theory are discussed, as well as the possibility that subglottal resonances might play a broader role in speech production.
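The two boundary criteria compared in this abstract can be made concrete in a few lines. The sketch below assumes Traunmüller's bark conversion; the Sg2 value and formant values are hypothetical illustrations, not measurements from the study, where Sg2 was measured per speaker from accelerometer recordings.

```python
# Sketch of the two front/back vowel boundary criteria: F2 vs. the speaker's
# second subglottal resonance (Sg2), and F2 vs. F3 minus 3.5 bark.
# Assumes Traunmueller's (1990) bark conversion; all values are hypothetical.

def hz_to_bark(f):
    """Convert frequency in Hz to the bark scale (Traunmueller, 1990)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Exact inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def is_front_sg2(f2, sg2):
    """[-back] if F2 lies above the speaker's Sg2."""
    return f2 > sg2

def is_front_f3_rule(f2, f3):
    """[-back] if F2 lies above F3 minus 3.5 bark."""
    boundary_hz = bark_to_hz(hz_to_bark(f3) - 3.5)
    return f2 > boundary_hz

# Hypothetical adult male: Sg2 near 1400 Hz, /i/-like vs. /u/-like formants.
print(is_front_sg2(2200.0, 1400.0))      # front vowel: F2 well above Sg2
print(is_front_sg2(900.0, 1400.0))       # back vowel: F2 below Sg2
print(is_front_f3_rule(2200.0, 2900.0))  # same vowel under the bark rule
```

The Sg2 criterion needs one speaker-specific number, while the F3-based rule must be recomputed per token; the paper's finding that only the former is reliable for young children is consistent with that difference.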


Journal of the Acoustical Society of America | 2012

Subglottal resonances of adult male and female native speakers of American English.

Steven M. Lulich; John R. Morton; Harish Arsikere; Mitchell S. Sommers; Gary K. F. Leung; Abeer Alwan

This paper presents a large-scale study of subglottal resonances (SGRs) (the resonant frequencies of the tracheo-bronchial tree) and their relations to various acoustical and physiological characteristics of speakers. The paper presents data from a corpus of simultaneous microphone and accelerometer recordings of consonant-vowel-consonant (CVC) words embedded in a carrier phrase spoken by 25 male and 25 female native speakers of American English ranging in age from 18 to 24 yr. The corpus contains 17,500 utterances of 14 American English monophthongs, diphthongs, and the rhotic approximant [ɹ] in various CVC contexts. Only monophthongs are analyzed in this paper. Speaker height and age were also recorded. Findings include (1) normative data on the frequency distribution of SGRs for young adults, (2) the dependence of SGRs on height, (3) the lack of a correlation between SGRs and formants or the fundamental frequency, (4) a poor correlation of the first SGR with the second and third SGRs but a strong correlation between the second and third SGRs, and (5) a significant effect of vowel category on SGR frequencies, although this effect is smaller than the measurement standard deviations and therefore negligible for practical purposes.


Journal of the Acoustical Society of America | 2009

Automatic detection of the second subglottal resonance and its application to speaker normalization

Shizhen Emily Wang; Steven M. Lulich; Abeer Alwan

Speaker normalization typically focuses on inter-speaker variabilities of the supraglottal (vocal tract) resonances, which constitute a major cause of spectral mismatch. Recent studies have shown that the subglottal airways also affect spectral properties of speech sounds, and promising results were reported using the subglottal resonances for speaker normalization. This paper proposes a reliable algorithm to automatically estimate the second subglottal resonance (Sg2) from speech signals. The algorithm is calibrated on children's speech data with simultaneous accelerometer recordings from which Sg2 frequencies can be directly measured. A cross-language study with bilingual Spanish-English children is performed to investigate whether Sg2 frequencies are independent of speech content and language. The study verifies that Sg2 is approximately constant for a given speaker and thus can be a good candidate for limited data speaker normalization and cross-language adaptation. A speaker normalization method using Sg2 is then presented. This method is computationally more efficient than maximum-likelihood based vocal tract length normalization (VTLN), with performance better than VTLN for limited adaptation data and cross-language adaptation. Experimental results confirm that this method performs well in a variety of testing conditions and tasks.
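The efficiency argument in this abstract can be illustrated with a minimal sketch: because Sg2 is roughly constant per speaker, a single linear frequency-warping factor can be derived from the ratio of a reference Sg2 to the speaker's Sg2, in the spirit of VTLN but with no per-utterance likelihood search. The warping formula and the reference value below are illustrative assumptions, not the exact procedure of Wang, Lulich, and Alwan.

```python
# Sketch of subglottal-resonance-based speaker normalization as a single
# linear frequency warp. The reference Sg2 and the ratio-based warp factor
# are illustrative assumptions, not the paper's exact procedure.

SG2_REF = 1400.0  # hypothetical reference second subglottal resonance (Hz)

def warp_factor(sg2_speaker, sg2_ref=SG2_REF):
    """One scalar per speaker; contrast with VTLN's per-utterance ML search."""
    return sg2_ref / sg2_speaker

def normalize_frequencies(freqs_hz, sg2_speaker):
    """Linearly warp spectral frequencies toward the reference speaker."""
    alpha = warp_factor(sg2_speaker)
    return [f * alpha for f in freqs_hz]

# A child with a high Sg2 (e.g. 1650 Hz) has formant frequencies scaled
# down toward the adult reference.
print(normalize_frequencies([800.0, 2200.0], 1650.0))
```

Because the factor depends only on one speaker-level constant, it can be estimated from very little adaptation data, which matches the limited-data advantage the abstract reports.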


International Conference on Acoustics, Speech, and Signal Processing | 2008

Speaker normalization based on subglottal resonances

Shizhen Wang; Abeer Alwan; Steven M. Lulich

Speaker normalization typically focuses on variabilities of the supra-glottal (vocal tract) resonances, which constitute a major cause of spectral mismatch. Recent studies show that the subglottal airways also affect spectral properties of speech sounds. This paper presents a speaker normalization method based on estimating the second and third subglottal resonances. Since the subglottal airways do not change for a specific speaker, the subglottal resonances are independent of the sound type (i.e., vowel, consonant, etc.) and remain constant for a given speaker. This context-free property makes the proposed method suitable for limited data speaker adaptation. This method is computationally more efficient than maximum-likelihood based VTLN, with performance better than VTLN especially for limited adaptation data. Experimental results confirm that this method performs well in a variety of testing conditions and tasks.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Automatic height estimation using the second subglottal resonance

Harish Arsikere; Gary K. F. Leung; Steven M. Lulich; Abeer Alwan

This paper presents an algorithm for automatically estimating speaker height. It is based on: (1) a recently-proposed model of the subglottal system that explains the inverse relation observed between subglottal resonances and height, and (2) an improved version of our previous algorithm for automatically estimating the second subglottal resonance (Sg2). The improved Sg2 estimation algorithm was trained and evaluated on recently-collected data from 30 and 20 adult speakers, respectively. Sg2 estimation error was reduced by 29%, on average, compared to the previous algorithm. The height estimation algorithm, employing the inverse relation between Sg2 and height, was trained on data from the above-mentioned 50 adults. It was evaluated on 563 adult speakers in the TIMIT corpus, and the mean absolute height estimation error was found to be less than 5.6 cm.
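The "inverse relation" exploited above can be sketched as a least-squares fit of height against 1/Sg2. The training data and fitted coefficients below are fabricated for illustration only; the paper's actual model was trained on 50 adult speakers.

```python
# Sketch of height estimation from Sg2: fit height = a + b / Sg2 by
# ordinary least squares on 1/Sg2. Toy data are fabricated; taller
# speakers are given lower Sg2 values, mirroring the reported trend.

def fit_inverse_model(sg2_hz, heights_cm):
    """Least-squares fit of height = a + b / Sg2."""
    xs = [1.0 / s for s in sg2_hz]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(heights_cm) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, heights_cm))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def estimate_height(sg2, a, b):
    return a + b / sg2

# Toy training data (Hz, cm): lower Sg2 goes with greater height.
sg2_train = [1250.0, 1350.0, 1450.0, 1550.0]
height_train = [185.0, 176.0, 169.0, 163.0]
a, b = fit_inverse_model(sg2_train, height_train)
print(round(estimate_height(1400.0, a, b), 1))
```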


Journal of the Acoustical Society of America | 2010

A new speech corpus for studying subglottal acoustics in speech production, perception, and technology.

Steven M. Lulich; John R. Morton; Mitchell S. Sommers; Harish Arsikere; Yi‐Hui Lee; Abeer Alwan

Subglottal resonances have received increasing attention in recent studies of speech production, perception, and technology. They affect voice production, divide vowels and consonants into discrete categories, affect vowel perception, and are useful in automatic speech recognition. We present a new speech corpus of simultaneous microphone and (subglottal) accelerometer recordings of 25 adult male and 25 adult female speakers of American English (AE), between 22 and 25 years of age. Additional recordings of 50 gender‐balanced bilingual Spanish/AE speaking adults, as well as 100 child speakers of Spanish and AE, are under way. The AE adult corpus consists of 35 monosyllables (14 “hVd” and 21 “CVb” words, where C is [b, d, g], and V includes all AE monophthongs and diphthongs) in a phonetically neutral carrier phrase (“I said a ____ again”), with 10 repetitions of each word by each speaker, resulting in 17 500 individual microphone (and accelerometer) waveforms. Hand‐labeling of the target vowel in each utterance is currently under way. The corpus fills a gap in the literature on subglottal acoustics and will be useful for future studies in speech production, perception, and technology. It will be freely available to the speech research community. [Work supported in part by the NSF.]


Journal of the Acoustical Society of America | 2011

Automatic estimation of the first subglottal resonance

Harish Arsikere; Steven M. Lulich; Abeer Alwan

This letter focuses on the automatic estimation of the first subglottal resonance (Sg1). A database comprising speech and subglottal data of native American English speakers and bilingual Spanish/English speakers was used for the analysis. Data from 11 speakers (five males and six females) were used to derive an empirical relation among the first formant frequency, fundamental frequency, and Sg1. Using the derived relation, Sg1 was automatically estimated from voiced sounds in English and Spanish sentences spoken by 22 different speakers (11 males and 11 females). The error in estimating Sg1 was less than 50 Hz, on average.
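The estimation step described here, applying an empirical relation among F1, F0, and Sg1 to voiced frames, can be sketched with a generic linear form. The coefficients below are hypothetical placeholders; the letter's actual regression, derived from 11 speakers' accelerometer data, is not reproduced here.

```python
# Sketch of Sg1 estimation from voiced frames via an empirical relation of
# the form Sg1 = c0 + c1*F1 + c2*F0, averaged across frames. Coefficients
# are hypothetical placeholders, not those derived by Arsikere et al.

C0, C1, C2 = 400.0, 0.1, 0.3  # hypothetical regression coefficients

def sg1_from_frame(f1_hz, f0_hz, coeffs=(C0, C1, C2)):
    """Apply the (placeholder) empirical relation to one voiced frame."""
    c0, c1, c2 = coeffs
    return c0 + c1 * f1_hz + c2 * f0_hz

def estimate_sg1(frames):
    """Average the per-frame estimates over all voiced frames."""
    ests = [sg1_from_frame(f1, f0) for f1, f0 in frames]
    return sum(ests) / len(ests)

# Hypothetical (F1, F0) measurements from voiced frames of one speaker.
frames = [(500.0, 120.0), (650.0, 115.0), (420.0, 130.0)]
print(round(estimate_sg1(frames), 1))
```

Averaging over many voiced frames is what makes a sub-50 Hz mean error plausible even when individual frame estimates are noisy.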


International Conference on Acoustics, Speech, and Signal Processing | 2011

Automatic estimation of the second subglottal resonance from natural speech

Harish Arsikere; Steven M. Lulich; Abeer Alwan

This paper deals with the automatic estimation of the second subglottal resonance (Sg2) from natural speech spoken by adults, since our previous work focused only on estimating Sg2 from isolated diphthongs. A new database comprising speech and subglottal data of native American English (AE) speakers and bilingual Spanish/English speakers was used for the analysis. Data from 11 speakers (6 females and 5 males) were used to derive an empirical relation among the second and third formant frequencies (F2 and F3) and Sg2. Using the derived relation, Sg2 was automatically estimated from voiced sounds in English and Spanish sentences spoken by 20 different speakers (10 males and 10 females). On average, the error in estimating Sg2 was less than 100 Hz in at least 9 isolated AE vowels and less than 40 Hz in continuous speech consisting of English or Spanish sentences.


Journal of the Acoustical Society of America | 2009

Source-filter interaction in the opposite direction: subglottal coupling and the influence of vocal fold mechanics on vowel spectra during the closed phase

Steven M. Lulich; Matías Zañartu; Daryush D. Mehta; Robert E. Hillman

Studies of speech source-filter interaction usually investigate the effect of the speech transfer function (loading) on vocal fold vibration and the voice source. In this study we explore how vocal fold mechanics affect the transfer function throughout the glottal cycle, with emphasis on the closed phase. Coupling between the subglottal and supraglottal airways is modulated by the laryngeal impedance. Although coupling is generally thought to occur only during the open phase of vocal fold vibration, a posterior glottal opening and the vocal fold tissue itself can allow sound transmission, thereby introducing coupling during the closed phase as well. The impedance of the vocal fold tissue at closure is shown to be small enough to permit coupling throughout the phonatory cycle, even in the absence of a posterior glottal opening. Open- and closed-phase coupling is characterized using mathematical models of the subglottal and supraglottal airways, and the parallel laryngeal impedances of the membranous glottis, posterior glottal opening, and vocal fold tissue. Examples from sustained vowels are presented, using synchronous recordings of neck skin acceleration, laryngeal high-speed videoendoscopy, electroglottography, and radiated acoustic pressure.
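The central modeling point, three laryngeal transmission paths acting as impedances in parallel, follows the standard rule 1/Z = Σ 1/Zᵢ. The sketch below uses fabricated impedance magnitudes purely to illustrate why closing the membranous glottis alone does not decouple the airways.

```python
# Parallel combination of laryngeal acoustic impedances (membranous
# glottis, posterior glottal opening, vocal fold tissue). Values are
# fabricated, arbitrary-unit magnitudes for illustration only.

def parallel(*impedances):
    """Combine acoustic impedances in parallel: 1/Z = sum(1/Z_i)."""
    return 1.0 / sum(1.0 / z for z in impedances)

z_membranous = 1e9  # membranous glottis closed: effectively infinite
z_posterior = 5e4   # posterior glottal opening path
z_tissue = 2e4      # transmission through the vocal fold tissue

# Even with the membranous glottis closed, the tissue and posterior paths
# keep the combined impedance finite, so subglottal-supraglottal coupling
# persists through the closed phase.
print(parallel(z_membranous, z_posterior, z_tissue))
```

The combined impedance is always below the smallest branch impedance, which is the quantitative form of the abstract's claim that the tissue path alone suffices for closed-phase coupling.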


Journal of the Acoustical Society of America | 2010

Relations among subglottal resonances, vowel formants, and speaker height, gender, and native language.

Harish Arsikere; Yi‐Hui Lee; Steven M. Lulich; John R. Morton; Mitchell S. Sommers; Abeer Alwan

Subglottal resonances (SGRs) have recently been used in automatic speaker normalization (SN), leading to improvements in children's speech recognition [Wang et al. (2009)]. It is hypothesized that human listeners use SGRs for SN as well. However, the suitability of SGRs for SN has not been adequately investigated. SGRs and formants from adult speakers of American English and Mexican Spanish were measured using a new speech corpus with simultaneous (subglottal) accelerometer recordings [Lulich et al. (2010)]. The corpus has been analyzed at a broad level to understand relations among SGRs, speaker height, native language and gender, and formant frequencies, as well as the variation of SGRs across vowels and speakers. It is shown that SGRs are roughly constant for a given speaker, regardless of their native spoken language, but differ from speaker to speaker. SGRs are therefore well suited for use in SN and perhaps in speaker identification. Preliminary analyses also show that SGRs are correlated with each other...

Collaboration


Dive into Steven M. Lulich's collaborations.

Top Co-Authors

Abeer Alwan, University of California
Mitchell S. Sommers, Washington University in St. Louis
Tamás Gábor Csapó, Budapest University of Technology and Economics
Malgorzata E. Cavar, Indiana University Bloomington
Max Nelson, Indiana University Bloomington
Gary Yeung, University of California