Publication


Featured research published by Yingyong Qi.


Journal of the Acoustical Society of America | 1997

Temporal and spectral estimations of harmonics-to-noise ratio in human voice signals

Yingyong Qi; Robert E. Hillman

The quantity, harmonic-to-noise ratio (HNR), has been used to estimate the level of noise in human voice signals. HNR estimation can be accomplished in two ways: (1) on a time-domain basis, in which HNR is computed directly from the acoustic waveform; and (2) on a frequency-domain basis, in which HNR is computed from a transformed representation of the waveform. An algorithm for computing HNR in the frequency domain was modified and tested in the work described here. The modifications were designed to reduce the influence of spectral leakage in the computation of harmonic energy, and to remove the necessity of spectral baseline shifting prescribed in one existing algorithm [G. de Krom, J. Speech Hear. Res. 36, 254-266 (1993)]. Frequency-domain estimations of HNR based on this existing algorithm and our modified algorithm were compared to time-domain estimations on synthetic signals and human pathological voice samples. Results indicated a highly significant, linear correlation between frequency- and time-domain estimations of HNR for our modified approach.
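As a rough illustration of what a time-domain HNR estimate can look like, the sketch below uses the common ensemble-averaging formulation: the average wavelet across pitch periods stands in for the harmonic component, and deviations from that average are treated as noise. This is an assumed, generic formulation rather than the algorithm evaluated in the paper; the function name `hnr_time_domain` and the requirement that periods are already segmented to equal length are hypothetical simplifications.

```python
import math


def hnr_time_domain(periods):
    """Estimate HNR in dB from a list of equal-length pitch periods.

    Assumption: `periods` is a list of lists of samples, already segmented
    and time-normalized to the same length (hypothetical input format).
    """
    n = len(periods)
    length = len(periods[0])
    # Average wavelet: sample-wise mean across all periods.
    avg = [sum(p[t] for p in periods) / n for t in range(length)]
    # Harmonic energy: energy of the average wavelet, counted once per period.
    harmonic = n * sum(a * a for a in avg)
    # Noise energy: squared deviation of every period from the average.
    noise = sum((p[t] - avg[t]) ** 2 for p in periods for t in range(length))
    if noise == 0:
        return math.inf  # perfectly periodic input
    return 10.0 * math.log10(harmonic / noise)
```

Because the estimate depends on how periods are brought to a common length, any misalignment between periods is counted as noise, which is one reason time normalization matters for this family of measures.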


Journal of the Acoustical Society of America | 1999

The estimation of signal-to-noise ratio in continuous speech for disordered voices

Yingyong Qi; Robert E. Hillman; Claudio Milstein

Presented is a method of estimating the signal-to-noise ratio (SNR) of continuous utterances for patients with various types of voice disorders that ranged in severity of dysphonia from mild to severe. The SNR is estimated based on the residual that is left after systematically removing the short- and long-term correlations that exist in the speech signal. Results indicate that the SNR is consistent with human perceptual judgments, particularly those that consistently differentiate close-to-normal versus highly dysphonic voices.


Journal of the Acoustical Society of America | 1995

Enhancement of female esophageal and tracheoesophageal speech

Yingyong Qi; Bernd Weinberg; Ning Bi

Qi [J. Acoust. Soc. Am. 88, 1228-1235 (1990)] has demonstrated that (1) linear predictive (LP) methods can be used to separate vocal tract transfer functions from source functions of vowels produced by alaryngeal talkers and that (2) vowels synthesized with reconstructed transfer functions and totally synthetic voicing excitation sources have improved source-related properties over those present in the original vowels. Here, an extension of this work which is directed to the general goal of developing systems (devices) to enhance the quality of alaryngeal speech is reported. The specific goal of the present project was to determine whether speech, i.e., words spoken by female esophageal and tracheoesophageal talkers, could be enhanced by means of LP-based analysis and synthesis methods. Words spoken by four female alaryngeal talkers were analyzed and synthesized. A perceptual evaluation was completed to permit the quality of the synthetic and the original words to be compared. Listeners generally preferred to listen to the synthesized words, indicating that alaryngeal speech enhancement was accomplished.


Journal of Voice | 1992

Acoustic and temporal correlates of perceived age

Thomas Shipp; Yingyong Qi; Ruth A. Huntley; Harry Hollien

Summary: This research was carried out to discover which of a number of factors relate to perceived age, as determined from speech samples generated in a previous study of age perception. Three groups of 10 male talkers each were formed whose perceived ages were 27–35, 53–57, and 75–85 years, respectively. Acoustic and temporal measures were made of the subjects' one-sentence recordings. Analysis of these data led to group measures and contrasts. Metrics that differentiated among the groups were: speech rate (total time and syllables per second), breath management (number of breaths and breath pause duration), and fundamental frequency.


Journal of the Acoustical Society of America | 1992

Time normalization in voice analysis

Yingyong Qi

The harmonics-to-noise ratio (HNR) has been widely accepted for quantifying the irregular or noise component of voice. HNR, however, is usually inflated by cycle-to-cycle variations of fundamental frequency period because zero padding is used for time normalization of the wavelet. In this study, a new method was developed for analyzing waveform perturbations of voice. In this method, noise components of voice were calculated from the discrepancies between wavelets after they had been optimally aligned in time. The optimal time normalization of wavelets was accomplished using procedures of dynamic time warping (DTW). This method was evaluated using both synthetic and natural voices, and significant reductions in noise were obtained. The harmonics-to-noise ratio obtained using DTW for time normalization was also shown to be independent of fundamental frequency perturbations.
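The contrast between zero padding and dynamic time warping can be sketched with a textbook DTW recurrence. The code below is a generic illustration, not the procedure from the paper: the function names `dtw_cost` and `zero_pad_cost`, the squared-error local cost, and the unconstrained step pattern are all assumptions made for the example.

```python
def dtw_cost(a, b):
    """Minimal accumulated squared error between wavelets a and b under DTW."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best accumulated error aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Allowed moves: advance in a only, in b only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def zero_pad_cost(a, b):
    """Squared error after padding the shorter wavelet with zeros."""
    length = max(len(a), len(b))
    pa = list(a) + [0.0] * (length - len(a))
    pb = list(b) + [0.0] * (length - len(b))
    return sum((x - y) ** 2 for x, y in zip(pa, pb))


# Two wavelets with the same shape but different durations.
short = [0, 1, 2, 1, 0]
stretched = [0, 1, 1, 2, 2, 1, 0]
print(dtw_cost(short, stretched), zero_pad_cost(short, stretched))
```

In this toy case the stretched wavelet differs from the short one only in duration, so the warped comparison attributes no error to it, while the zero-padded comparison reports a nonzero discrepancy that would be counted as noise.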


Journal of the Acoustical Society of America | 1995

Minimizing the effect of period determination on the computation of amplitude perturbation in voice

Yingyong Qi; Bernd Weinberg; Ning Bi; Wolfgang J. Hess

Current methods of computing amplitude perturbation present in human voices depend upon being able to accurately determine fundamental period. In this paper, two methods of estimating the amplitude perturbation present in human voices, which do not depend on accurate determination of the boundaries between fundamental periods, are described. In both of these methods, amplitude perturbation is computed as the variance of an ensemble of periods calculated after these periods have been aligned in time. In one method, time alignment is accomplished using zero-phase transformation. In the second method, an unconstrained dynamic programming procedure is used. The accuracy of estimating amplitude perturbation by these two methods is evaluated using synthetic and natural voice signals and is also compared with an estimation using zero-padding based time alignment. The unconstrained dynamic programming method is shown to provide accurate estimation of voice amplitude perturbation over a variety of signal conditions.


Journal of the Acoustical Society of America | 1992

Analysis of nasal consonants using perceptual linear prediction

Yingyong Qi; Robert A. Fox

Until recently, speech analysis techniques have been built around the all-pole linear predictive model. This study examines the effectiveness of using the perceptual linear predictive method for analyzing nasal consonants. Six speakers (three men and three women) produced 300 CV syllables with initial nasal consonants /m/ and /n/. A threshold-based boundary detection algorithm was developed to extract nasal segments from the CV contexts. Poles of a fifth-order perceptual linear predictive model were calculated and the frequency of the second pole was used to characterize the place of articulation of nasal consonants. Results indicated that the frequency of the second transformed pole was significantly lower for /m/ than for /n/ and was independent of factors such as vowel context and gender of the speaker. A nasal identification rate of 86% was obtained based on the frequency of the second pole. The use of the perceptual linear predictive method may thus overcome some difficulties associated with analyzing nasal consonants.


Journal of the Acoustical Society of America | 1992

An adaptive method for tracking voicing irregularities

Yingyong Qi; Thomas Shipp

A method has been developed for tracking irregularities in the acoustic waveform of a sustained phonation using the adaptive Wiener filter. Irregularities are determined by the technique of correlation cancellation. The algorithm is evaluated using sustained vowels produced by a formant synthesizer and by subjects with and without phonatory disorders. Results indicate that the method is capable of differentiating between normal and abnormal voices. Most significantly, however, it can also track sporadic or nonstationary irregularities in the shape of an individual acoustic wavelet. This method is expected to be a useful tool for the acoustic analysis of voice production.
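The idea of correlation cancellation can be sketched with a normalized LMS (NLMS) adaptive predictor, used here as a simple stand-in for an adaptive Wiener filter. The filter predicts each sample from the signal one period earlier, and whatever it cannot predict is kept as the irregularity. Everything specific in this sketch is an assumption for illustration: the function name `nlms_residual`, the use of NLMS, the known fixed `period`, and the parameter values are not taken from the paper.

```python
def nlms_residual(x, period, taps=1, mu=0.5, eps=1e-6):
    """Return the part of x not predictable from x delayed by one period.

    Assumptions: `period` (in samples) is known and constant; `taps`, `mu`
    and `eps` are illustrative values, not tuned settings.
    """
    w = [0.0] * taps
    residual = [0.0] * len(x)
    for n in range(period + taps - 1, len(x)):
        # Reference input: samples starting one period before sample n.
        u = [x[n - period - k] for k in range(taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        e = x[n] - y  # component uncorrelated with the previous period
        norm = sum(uk * uk for uk in u) + eps
        # NLMS update: step size scaled by the reference energy.
        w = [wk + mu * e * uk / norm for wk, uk in zip(w, u)]
        residual[n] = e
    return residual
```

For a strictly periodic input the residual decays toward zero once the weights settle, so a sporadic change in a single wavelet shows up as a localized burst in the residual.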


Journal of the Acoustical Society of America | 1999

Estimation of minimum glottal flow using optimal, low‐pass filtering

Yingyong Qi; Robert E. Hillman

The amount of minimum glottal flow in each period of a sustained phonation is an important parameter in voice research and clinical practice. In many cases, it is highly desirable to be able to measure the minimum glottal flow automatically. Here, we present a method for estimating the minimum glottal flow using an optimal, low‐pass filter. The cutoff frequency of the low‐pass filter is determined so that the sum of the variance within each "closed" phase of a recorded flow signal and the difference between the recorded and filtered flow signals is minimal. The minimum glottal flow is derived from this optimally low‐pass‐filtered signal. This simple optimization procedure results in a completely automatic estimation of minimum glottal flow. Experiments using synthetic and real flow signals indicated that the method is highly accurate and robust.


Journal of the Acoustical Society of America | 1996

Relationships between time and frequency measurements of harmonics‐to‐noise ratio in human voice signals

Yingyong Qi; Robert E. Hillman

The quantity, harmonic‐to‐noise ratio (HNR), has been used to estimate the level of noise in human voice signals. HNR estimation can be accomplished in two ways: (1) time‐domain techniques, in which HNR is computed directly from the acoustic waveform; and (2) frequency‐domain techniques, in which HNR is computed from a transformed representation of the waveform. It is unclear, however, how to operationally relate time‐ and frequency‐domain estimations of HNR because of the quasiperiodicity of human voice signals. This work demonstrates the relationship between time‐ and frequency‐domain estimations of HNR experimentally. Results indicate that the time‐domain estimation of HNR approximates the average levels of harmonics measured above a reference spectral envelope, which is obtained by cepstral analysis. [Work supported by NIH.]

Collaboration



Top Co-Authors


Thomas Shipp

United States Department of Veterans Affairs


Ruth A. Huntley

University of South Carolina
