Publication


Featured research published by Donald G. Childers.


Journal of the Acoustical Society of America | 1991

Vocal quality factors: analysis, synthesis, and perception.

Donald G. Childers; C. K. Lee

The purpose of this study was to examine several factors of vocal quality that might be affected by changes in vocal fold vibratory patterns. Four voice types were examined: modal, vocal fry, falsetto, and breathy. Three categories of analysis techniques were developed to extract source-related features from speech and electroglottographic (EGG) signals. Four factors were found to be important for characterizing the glottal excitations for the four voice types: the glottal pulse width, the glottal pulse skewness, the abruptness of glottal closure, and the turbulent noise component. The significance of these factors for voice synthesis was studied and a new voice source model that accounted for certain physiological aspects of vocal fold motion was developed and tested using speech synthesis. Perceptual listening tests were conducted to evaluate the auditory effects of the source model parameters upon synthesized speech. The effects of the spectral slope of the source excitation, the shape of the glottal excitation pulse, and the characteristics of the turbulent noise source were considered. Applications for these research results include synthesis of natural sounding speech, synthesis and modeling of vocal disorders, and the development of speaker independent (or adaptive) speech recognition systems.
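The first two of the source factors above, glottal pulse width and pulse skewness, can be illustrated with a simple parametric pulse. The sketch below uses a generic Rosenberg-style cosine pulse, not the paper's actual source model; the parameter names and shapes are illustrative only:

```python
import numpy as np

def glottal_pulse(n_samples, open_quotient=0.6, speed_quotient=2.0):
    """One pitch period of a Rosenberg-style glottal flow pulse.

    open_quotient  -- fraction of the period the glottis is open (pulse width)
    speed_quotient -- opening-time / closing-time ratio (pulse skewness)
    """
    n_open = int(n_samples * open_quotient)
    n_rise = int(n_open * speed_quotient / (1.0 + speed_quotient))
    n_fall = n_open - n_rise
    pulse = np.zeros(n_samples)
    t1 = np.arange(n_rise) / max(n_rise, 1)
    pulse[:n_rise] = 0.5 * (1.0 - np.cos(np.pi * t1))   # opening phase
    t2 = np.arange(n_fall) / max(n_fall, 1)
    pulse[n_rise:n_open] = np.cos(0.5 * np.pi * t2)     # closing phase
    return pulse                                         # closed phase stays 0
```

Narrowing open_quotient shortens the pulse width, while raising speed_quotient skews the pulse toward a more abrupt closure.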


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1986

Two-channel speech analysis

Ashok Krishnamurthy; Donald G. Childers

We present a two-channel technique for improving speech analysis in certain applications. One channel is the signal from an electroglottograph (EGG), which monitors the vibratory motion of the vocal folds. The other channel is the speech signal obtained from a conventional microphone. We show how the EGG can be used as a tool for validating speech processing algorithms and estimating possible lower bounds for both computation and performance of these algorithms, particularly closed-phase speech analysis. Our system is used to classify speech segments as voiced, unvoiced, mixed voiced, and silent and to estimate the fundamental frequency of voicing. This four-way classification is not implemented as a complete algorithm; it still requires some user judgments and decisions. The technical results, however, illustrate an EGG-based algorithm for voiced/unvoiced-silent classification. In addition, we illustrate how automatic on-line inverse filtering can be achieved. The results demonstrate the superiority of the closed-phase covariance analysis method over several other commonly used methods. Source-tract coupling is shown to be a significant factor in linear prediction analysis, a factor commonly ignored to date. Various applications of our two-channel approach are described along with the major disadvantage, namely, that in some situations the EGG channel cannot be acquired.
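As a rough illustration of the classification idea, a toy two-channel frame labeler can be sketched from short-time energies alone. The thresholds and logic here are hypothetical and far simpler than the system described, and the "mixed voiced" class is omitted:

```python
import numpy as np

def classify_frames(speech, egg, frame_len=256,
                    silence_thresh=1e-4, voicing_thresh=1e-3):
    """Label each frame 'silent', 'unvoiced', or 'voiced'.

    Thresholds are illustrative; a real system would calibrate them.
    Frames with speech energy but negligible EGG activity are 'unvoiced'
    (no vocal-fold vibration); frames with EGG activity are 'voiced'.
    """
    labels = []
    for start in range(0, len(speech) - frame_len + 1, frame_len):
        s = speech[start:start + frame_len]
        e = egg[start:start + frame_len]
        speech_energy = np.mean(s ** 2)
        egg_energy = np.mean(e ** 2)
        if speech_energy < silence_thresh:
            labels.append('silent')
        elif egg_energy < voicing_thresh:
            labels.append('unvoiced')
        else:
            labels.append('voiced')
    return labels
```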


Journal of the Acoustical Society of America | 1991

Gender recognition from speech. Part II: Fine analysis

Donald G. Childers; Ke Wu

The purpose of this research was to investigate the potential effectiveness of digital speech processing and pattern recognition techniques for the automatic recognition of gender from speech. In Part I, Coarse Analysis [K. Wu and D. G. Childers, J. Acoust. Soc. Am. 90, 1828-1840 (1991)], various feature vectors and distance measures were examined to determine their appropriateness for recognizing a speaker's gender from vowels, unvoiced fricatives, and voiced fricatives. One recognition scheme based on feature vectors extracted from vowels achieved 100% correct recognition of the speaker's gender using a database of 52 speakers (27 male and 25 female). In this paper, a detailed, fine analysis of the characteristics of vowels is performed, including formant frequencies, bandwidths, and amplitudes, as well as speaker fundamental frequency of voicing. The fine analysis used a pitch-synchronous closed-phase analysis technique. Detailed formant features, including frequencies, bandwidths, and amplitudes, were extracted by a closed-phase weighted recursive least-squares method that employed a variable forgetting factor, i.e., WRLS-VFF. The electroglottograph signal was used to locate the closed-phase portion of the speech signal. A two-way statistical analysis of variance (ANOVA) was performed to test the differences between gender features. The relative importance of grouped vowel features was evaluated by a pattern recognition approach. Numerous interesting results were obtained, including the fact that the second formant frequency was a slightly better recognizer of gender than fundamental frequency, giving 98.1% versus 96.2% correct recognition, respectively. The statistical tests indicated that the spectra for female speakers had a steeper slope (or tilt) than those for males. The results suggest that redundant gender information was embedded in the fundamental frequency and vocal tract resonance characteristics. The feature vectors for female voices were observed to have higher within-group variations than those for male voices. The data in this study were also used to replicate portions of the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] study of vowels for male and female speakers.
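The reported result (F2 slightly outperforming F0) can be mimicked by a toy threshold classifier. The cutoff values below are rough, round textbook-style numbers, not the decision boundaries derived in the study:

```python
def classify_gender(f0_hz, f2_hz, f0_cut=165.0, f2_cut=1700.0):
    """Guess speaker gender from fundamental frequency and second formant.

    Cutoffs are illustrative; the study fit its boundaries to data from
    52 speakers.  Each feature votes; on disagreement we defer to F2,
    which the study found slightly more reliable (98.1% vs 96.2%).
    """
    f0_says_female = f0_hz > f0_cut   # higher F0 suggests female
    f2_says_female = f2_hz > f2_cut   # higher F2 suggests female
    verdict = f2_says_female if f0_says_female != f2_says_female else f0_says_female
    return 'female' if verdict else 'male'
```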


Journal of the Acoustical Society of America | 1986

A model for vocal fold vibratory motion, contact area, and the electroglottogram

Donald G. Childers; Douglas M. Hicks; G. P. Moore; Y. Alsaka

The electroglottogram (EGG) has been conjectured to be related to the area of contact between the vocal folds. This hypothesis has been substantiated only partially via direct and indirect observations. In this paper, a simple model of vocal fold vibratory motion is used to estimate the vocal fold contact area as a function of time. This model employs a limited number of vocal fold vibratory features extracted from ultra high-speed laryngeal films. These characteristics include the opening and closing vocal fold angles and the lag (phase difference) between the upper and lower vocal fold margins. The electroglottogram is simulated using the contact area, and the EGG waveforms are compared to measured EGGs for normal male voices producing both modal and pulse register tones. The model also predicts EGG waveforms for vocal fold vibration associated with a nodule or polyp.
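The central idea, contact area as the overlap of phase-lagged upper and lower fold margins, can be caricatured in a few lines. This is purely illustrative; the paper's model is driven by opening and closing angles measured from the high-speed films:

```python
import numpy as np

def contact_area(n_samples, closed_fraction=0.4, phase_lag=0.1):
    """Toy vocal-fold contact area over one vibratory cycle.

    The upper and lower fold margins make and break contact with a phase
    lag; total contact is taken as the overlap of the two, and the EGG is
    assumed proportional to this overlap.  All values are illustrative.
    """
    t = np.arange(n_samples) / n_samples
    lower = ((t % 1.0) < closed_fraction).astype(float)
    upper = (((t - phase_lag) % 1.0) < closed_fraction).astype(float)
    return lower * upper
```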


Journal of the Acoustical Society of America | 1995

Modeling the glottal volume-velocity waveform for three voice types

Donald G. Childers; Chieteuk Ahn

The purpose of this study was to model features of the glottal volume-velocity waveform for three voice types: modal voice, vocal fry, and breathy voice. The study analyzed data measured from two sustained vowels and one sentence uttered by nine adult, male subjects who represented examples of the three voice types. The primary analysis procedure was glottal inverse filtering, which estimated the glottal volume-velocity waveform. The estimated glottal volume-velocity waveform was then fit to an LF model waveform. Four parameters of the LF model were adjusted to minimize the mean-squared error between the estimated glottal waveform and the LF model waveform. Statistical averages and standard deviations of the four parameters of the LF glottal waveform model were calculated using the data for each voice type. The four LF model parameters characterize important low-frequency features of the glottal waveform, namely, the glottal pulse width, pulse skewness, abruptness of closure of the glottal pulse, and the spectral tilt of the glottal pulse. Statistical analysis included ANOVA and multiple linear regression analysis. The ANOVA results demonstrated that there was a difference in three of the four LF model parameters for the three voice types. The linear regression analysis between the four LF model parameters and a formal rating by a listening test of the quality of the three voice types was used to determine the most significant LF model parameters for each voice type. A simple rule was devised for synthesizing the three voice types with a formant synthesizer using the LF glottal waveform model. Listener evaluations of the synthesized speech tended to confirm the results determined by the analysis procedures.
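The fitting step, adjusting model parameters to minimize mean-squared error against an inverse-filtered waveform, can be sketched with a reduced LF-style pulse and a grid search. The real LF model adds a separate exponential return phase and constraint equations that are omitted here; all names and values are illustrative:

```python
import numpy as np

def lf_like_pulse(t, alpha, te, wg):
    """Reduced LF-style glottal-derivative pulse: an exponentially growing
    sinusoid up to closure time te, zero afterwards (the full LF model's
    return phase is omitted for brevity)."""
    pulse = np.exp(alpha * t) * np.sin(wg * t)
    pulse[t > te] = 0.0
    return pulse

def fit_pulse(t, target, alphas, tes, wg):
    """Grid-search the (alpha, te) pair minimizing mean-squared error
    against an estimated glottal waveform, mirroring the fitting idea
    described in the abstract."""
    best = None
    for a in alphas:
        for te in tes:
            err = np.mean((lf_like_pulse(t, a, te, wg) - target) ** 2)
            if best is None or err < best[0]:
                best = (err, a, te)
    return best[1], best[2]
```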


IEEE Transactions on Biomedical Engineering | 1994

Measuring and modeling vocal source-tract interaction

Donald G. Childers; Chun-Fan Wong

The quality of synthetic speech is affected by two factors: intelligibility and naturalness. At present, synthesized speech may be highly intelligible, but often sounds unnatural. Speech intelligibility depends on the synthesizer's ability to reproduce the formants, the formant bandwidths, and formant transitions, whereas speech naturalness is thought to depend on the excitation waveform characteristics for voiced and unvoiced sounds. Voiced sounds may be generated by a quasiperiodic train of glottal pulses of specified shape exciting the vocal tract filter. It is generally assumed that the glottal source and the vocal tract filter are linearly separable and do not interact. However, this assumption is often not valid, since it has been observed that appreciable source-tract interaction can occur in natural speech. Previous experiments in speech synthesis have demonstrated that the naturalness of synthetic speech does improve when source-tract interaction is simulated in the synthesis process. The purpose of this paper is two-fold: (1) to present an algorithm for automatically measuring source-tract interaction for voiced speech, and (2) to present a simple speech production model that incorporates source-tract interaction into the glottal source model. This glottal source model controls: (1) the skewness of the glottal pulse, and (2) the amount of the first-formant ripple superimposed on the glottal pulse. A major application of the results of this paper is the modeling of vocal disorders.
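The second control, a first-formant ripple riding on the glottal pulse, is easy to visualize with a toy waveform. All numbers below are illustrative, not the paper's model:

```python
import numpy as np

def pulse_with_ripple(n_samples, fs=10000.0, f1=500.0,
                      ripple_amp=0.1, ripple_decay=400.0):
    """Superimpose a decaying first-formant oscillation on a smooth
    open-phase glottal pulse -- a caricature of the source-tract
    interaction effect described above."""
    t = np.arange(n_samples) / fs
    base = np.sin(np.pi * t / t[-1]) ** 2            # smooth open-phase pulse
    ripple = ripple_amp * np.exp(-ripple_decay * t) * np.sin(2 * np.pi * f1 * t)
    return base + ripple
```

Setting ripple_amp to zero recovers the non-interacting pulse, which is the linearly separable case the abstract says is often assumed.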


Journal of the Acoustical Society of America | 1994

Speech synthesis by glottal excited linear prediction

Donald G. Childers; Hwai-Tsu Hu

This paper describes a linear predictive (LP) speech synthesis procedure that resynthesizes speech using a 6th-order polynomial waveform to model the glottal excitation. The coefficients of the polynomial model form a vector that represents the glottal excitation waveform for one pitch period. A glottal excitation codebook with 32 entries for voiced excitation is designed and trained using two sentences spoken by different speakers. The purpose for using this approach is to demonstrate that quantization of the glottal excitation waveform does not significantly degrade the quality of speech synthesized with a glottal excitation linear predictive (GELP) synthesizer. This implementation of the LP synthesizer is patterned after both a pitch-excited LP speech synthesizer and a code excited linear predictive (CELP) speech coder. In addition to the glottal excitation codebook, we use a stochastic codebook with 256 entries for unvoiced noise excitation. Analysis techniques are described for constructing both codebooks. The GELP synthesizer, which resynthesizes speech with high quality, provides the speech scientist with a simple speech synthesis procedure that uses established analysis techniques, that is able to reproduce all speech sounds, and yet also has an excitation model waveform that is related to the derivative of the glottal flow and the integral of the residue. It is conjectured that the glottal excitation codebook approach could provide a mechanism for quantitatively comparing the differences in glottal excitation codebooks for male and female speakers, for speakers with vocal disorders, and for speakers with different voice types such as breathy and vocal fry voices. Conceivably, one could also convert the voice of a speaker with one voice type, e.g., breathy, to the voice of a speaker with another voice type, e.g., vocal fry, by synthesizing speech using the vocal tract LP parameters for the speaker with the breathy voice excited by the glottal excitation codebook trained for vocal fry.
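One plausible reading of the codebook design step: fit a polynomial to each pitch period of the excitation and cluster the coefficient vectors. The sketch below uses a plain k-means for the clustering, which the abstract does not specify; treat it as an assumption, not the paper's method:

```python
import numpy as np

def polynomial_codebook(periods, order=6, size=32, iters=20, seed=0):
    """Fit an order-`order` polynomial to each pitch period and cluster
    the coefficient vectors with a basic k-means, yielding a small
    excitation codebook of cluster centroids."""
    coeffs = np.array([np.polyfit(np.linspace(0.0, 1.0, len(p)), p, order)
                       for p in periods])
    rng = np.random.default_rng(seed)
    centers = coeffs[rng.choice(len(coeffs), size, replace=False)]
    for _ in range(iters):
        # Assign each coefficient vector to its nearest centroid.
        labels = np.argmin(((coeffs[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(size):
            if np.any(labels == k):
                centers[k] = coeffs[labels == k].mean(axis=0)
    return centers
```

Synthesis would then replace each measured period's coefficient vector with its nearest codebook entry, which is the quantization whose perceptual cost the paper evaluates.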


Speech Communication | 1995

Glottal source modeling for voice conversion

Donald G. Childers

This paper describes recent advances in glottal source modeling for speech synthesis. In particular, two procedures for modeling the glottal excitation waveform are described and applied to voice conversion. One model uses a polynomial to represent the glottal excitation waveform for one pitch period. The coefficients of the polynomial model form a vector that is used to design a glottal excitation codebook with 32 entries for voiced excitation. The codebook is designed and trained using two sentences spoken by different speakers. Speech is synthesized using a quantized glottal excitation waveform for one speaker as the excitation for a glottal excitation linear predictive (GELP) synthesizer designed using vocal tract parameters obtained from the speech of another speaker. Our implementation of the LP synthesizer is patterned after both a pitch-excited LP speech synthesizer and a code excited linear predictive (CELP) speech coder. In addition to the glottal excitation codebook, we use a stochastic codebook with 256 entries for unvoiced noise excitation. Analysis techniques are described for constructing both codebooks. The GELP synthesizer, which resynthesizes speech with high quality, provides the speech scientist with a simple speech synthesis procedure that uses established analysis techniques, that is able to reproduce all speech sounds, and yet also has an excitation model waveform that is related to the derivative of the glottal flow and the integral of the residue. Another approach uses the LF glottal volume-velocity waveform to model the characteristics of three voice types: modal, breathy, and vocal fry (creaky). We then convert a modal voice to sound like a breathy or vocal fry voice using the vocal tract characteristics for modal voice and the glottal volume-velocity waveform model for breathy and vocal fry voices as the excitation.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989

Formant speech synthesis: improving production quality

Neal B. Pinto; Donald G. Childers; Ajit L. Lalwani

The authors describe analysis and synthesis methods for improving the quality of speech produced by D. H. Klatt's software formant synthesizer (J. Acoust. Soc. Am., vol. 67, pp. 971-995, 1980). Synthetic speech generated using an excitation waveform resembling the glottal volume-velocity was found to be perceptually preferred over speech synthesized using other types of excitation. In addition, listeners ranked speech tokens synthesized with an excitation waveform that simulated the effects of source-tract interaction higher in naturalness than tokens synthesized without such interaction. A series of algorithms for silent and voiced/unvoiced/mixed excitation interval classification, pitch detection, formant estimation, and formant tracking was developed. The algorithms can utilize two channels of input data, i.e., speech and electroglottographic signals, and can therefore surpass the performance of single-channel (acoustic-signal-based) algorithms. The formant synthesizer was used to study some aspects of the acoustic correlates of voice quality, e.g., male/female voice conversion and the simulation of breathiness, roughness, and vocal fry.
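The formant synthesizer mentioned above is built from second-order digital resonators. A standard textbook resonator (a pole pair at the formant frequency with a given bandwidth, gain normalized to one at DC) can be sketched as follows:

```python
import math

def formant_resonator(x, freq_hz, bw_hz, fs=10000.0):
    """Second-order digital resonator of the kind cascaded in a
    Klatt-style formant synthesizer.  Coefficients follow the standard
    resonator design: pole radius set by the bandwidth, pole angle by
    the formant frequency, and gain normalized so that DC gain is 1."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = a * xn + b * y1 + c * y2   # y[n] = A x[n] + B y[n-1] + C y[n-2]
        y.append(yn)
        y1, y2 = yn, y1
    return y
```

Cascading several such resonators with different center frequencies shapes the excitation spectrum into the vowel's formant pattern.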


IEEE Transactions on Biomedical Engineering | 1984

Electroglottography for Laryngeal Function Assessment and Speech Analysis

Donald G. Childers; J. N. Larar

The methodology of electroglottography is briefly outlined. Major emphasis is given to validating key features of the electroglottographic (EGG) waveform using ultrahigh-speed laryngeal films. We show how the instants of glottal closure and opening may be identified from the EGG waveform. This information may be used to improve speech analysis techniques such as the pitch-synchronous, closed-phase, covariance analysis method. Other applications include pitch detection, the determination of intervals of voicing, unvoicing, mixed voicing, and silence, improving speech synthesis, and assisting the automation of inverse filtering.
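The closure/opening identification described above is commonly done from the EGG derivative (dEGG). A minimal peak-picking sketch, assuming the EGG grows with contact area so that closure appears as a sharp positive peak of the derivative and opening as a negative one (polarity conventions vary between instruments, and the paper's film-based validation is not reproduced here):

```python
import numpy as np

def closure_opening_instants(egg, min_distance=20):
    """Estimate glottal closure and opening instants as positive and
    negative local peaks of the EGG derivative, keeping peaks at least
    min_distance samples apart.  A toy sketch, not a validated detector."""
    degg = np.diff(egg)
    closures, openings = [], []
    for i in range(1, len(degg) - 1):
        if degg[i] > degg[i - 1] and degg[i] >= degg[i + 1] and degg[i] > 0:
            if not closures or i - closures[-1] >= min_distance:
                closures.append(i)
        if degg[i] < degg[i - 1] and degg[i] <= degg[i + 1] and degg[i] < 0:
            if not openings or i - openings[-1] >= min_distance:
                openings.append(i)
    return closures, openings
```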

Collaboration


Donald G. Childers's most frequent co-authors:

Ke Wu

University of Florida

B. Yegnanarayana

International Institute of Information Technology
