Publication


Featured research published by Yifan Gong.


International Conference on Acoustics, Speech, and Signal Processing | 1990

Text-independent speaker recognition by trajectory space comparison

Yifan Gong; Jean-Paul Haton

The principle of trajectory space comparison for text-independent speaker recognition is presented, together with solutions to the space comparison problem based on vector quantization. The recognition rates of the different solutions are compared. The experimental system achieves a 99.5% text-independent speaker recognition rate for 23 speakers, using five phrases for training and five for testing. A speaker-independent continuous speech recognition system is built in which this principle is used for speaker adaptation.
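The vector-quantization step underlying the space comparison can be sketched as follows. This is a minimal illustration of VQ-based speaker identification (a k-means codebook per speaker, identification by minimum quantization distortion), not the authors' exact trajectory-space formulation; all function names and parameters here are hypothetical.

```python
import numpy as np

def train_codebook(vectors, k=4, iters=20, seed=0):
    # Simple k-means codebook, one per speaker (stand-in for the VQ step).
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(vectors[:, None] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = vectors[labels == j].mean(axis=0)
    return codebook

def distortion(vectors, codebook):
    # Average quantization error of the test vectors against one codebook.
    d = np.linalg.norm(vectors[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

def identify(test_vectors, codebooks):
    # Pick the speaker whose codebook quantizes the test trajectory best.
    return min(codebooks, key=lambda s: distortion(test_vectors, codebooks[s]))
```

Text independence comes from the fact that only the distribution of frames matters, not their order.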


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1987

Time domain harmonic matching pitch estimation using time-dependent speech modeling

Yifan Gong; Jean-Paul Haton

A new formulation of the pitch estimation problem is proposed. A speech signal is modeled as a time-dependent sequence of a specified function, which allows both the period and the excitation amplitude of the signal to vary over time. The method exploits the asymmetry of the signal distribution with respect to the time axis. A statistically optimized resemblance function derived from an energy criterion is obtained, and the pitch period is estimated by maximizing this function. Estimates of the position and amplitude of the maximum peak in each period of voiced speech segments are provided simultaneously. A frequency-domain interpretation shows that this approach is equivalent to the harmonic-structure matching process of a recently proposed biological model of pitch perception. Experiments on the performance of the algorithm are presented for clean, noisy, and simulated telephone-line-filtered speech. The results are very encouraging: estimation is almost error-free for clean speech, the first harmonic need not be present, the estimation has high noise immunity, and the speech model can follow rapid pitch variations. The method appears to be at least as efficient as existing pitch determination methods based on harmonic structure searching.
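The "maximize a resemblance function over candidate periods" idea can be sketched generically. The sketch below uses normalized autocorrelation as the resemblance function, which is a common stand-in, not the paper's statistically optimized function; lag bounds and names are illustrative assumptions.

```python
import numpy as np

def estimate_period(x, min_lag, max_lag):
    # Time-domain pitch estimation: score each candidate period with a
    # resemblance function (here: normalized autocorrelation) and return
    # the lag that maximizes it.
    best_lag, best_score = min_lag, -np.inf
    for lag in range(min_lag, max_lag):
        a, b = x[:-lag], x[lag:]
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        score = (a * b).sum() / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

On a 100 Hz tone sampled at 8 kHz, the maximizing lag is the 80-sample period.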


International Conference on Acoustics, Speech, and Signal Processing | 1991

Continuous speech recognition based on high plausibility regions

Yifan Gong; Jean-Paul Haton; F. Mouria

The authors propose an approach to phoneme-based continuous speech recognition when a time function of the plausibility of observing each phoneme (a spotting result) is given. They introduce a criterion for the best sentence based on the sum of the plausibilities of the individual symbols composing the sentence. Using high-plausibility regions to reduce the computational load while maintaining optimality, the method finds the most plausible sentences for the input speech. Two optimization procedures handle the embedded search processes: (1) finding the best path connecting peaks of the plausibility functions of two successive symbols, and (2) finding the best time-transition slot index for two given peaks. Experimental results show that the method gives better recognition precision while requiring about 1/20 of the computing time of traditional DP-based methods. The experimental system obtained a 95% sentence recognition rate on a multispeaker test.
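The "search only high-plausibility regions" idea can be sketched with a small dynamic program: keep a few peaks per symbol and pick one peak per symbol, with time strictly advancing, maximizing the summed plausibility. This is a simplified sketch of the principle, not the paper's two embedded procedures; the function name and `top` parameter are illustrative.

```python
import numpy as np

def best_peak_path(plaus_list, top=3):
    # plaus_list: one plausibility-over-time array per symbol, in order.
    # Keep only the `top` highest-plausibility slots per symbol, then run
    # DP over those peaks instead of over every time slot.
    peaks = [sorted(int(t) for t in np.argsort(p)[-top:]) for p in plaus_list]
    score = {t: plaus_list[0][t] for t in peaks[0]}   # best sum ending at t
    path = {t: [t] for t in peaks[0]}
    for i in range(1, len(plaus_list)):
        new_score, new_path = {}, {}
        for t in peaks[i]:
            prev = [u for u in score if u < t]        # time must advance
            if not prev:
                continue
            u = max(prev, key=lambda u: score[u])
            new_score[t] = score[u] + plaus_list[i][t]
            new_path[t] = path[u] + [t]
        score, path = new_score, new_path
    end = max(score, key=lambda t: score[t])
    return path[end], score[end]
```

Because only a handful of peaks per symbol are examined, the search cost is independent of the total number of time slots, which is the source of the reported speedup.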


Speech Communication | 1998

Assessing the importance of the segmentation probability in segment-based speech recognition

Jan P. Verhasselt; Irina Illina; Jean-Pierre Martens; Yifan Gong; Jean-Paul Haton

The segment-based speech recognition algorithms developed over the years fall into two broad classes. On one hand are those using the conditional segment modeling (CSM) formalism, which requires computing the likelihood of the sequence of acoustic vectors conditioned on the sub-word unit sequence and the corresponding segmentation. On the other hand are those using the posterior segment modeling (PSM) formalism, which requires computing the joint posterior probability of the unit sequence and segmentation conditioned on the sequence of acoustic vectors. The latter probability can be written as the product of a segmentation probability and a unit classification probability. In this paper, we focus on the role of the segmentation probability. After showing that the segmentation probability is not required in the CSM formalism, we motivate its importance in the PSM formalism. Next, we describe its modeling and training. Experiments with two PSM-based recognizers on several speech recognition tasks demonstrate that the segmentation probability is essential for high recognition accuracy. Moreover, the importance of the segmentation probability is shown to be strongly correlated with the magnitudes of the unit probability estimates on segments that do not correspond to a unit.
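The PSM factorization above — joint posterior as segmentation probability times unit classification probability — can be shown with a toy score. The sketch below is only an illustration of why dropping the segmentation term can change hypothesis ranking; the numbers and function name are invented for the example.

```python
def psm_score(seg_prob, unit_probs):
    # PSM joint posterior of (unit sequence, segmentation), factored as
    # P(segmentation | X) * product of per-segment unit probabilities.
    score = seg_prob
    for p in unit_probs:
        score *= p
    return score

# Two competing hypotheses (toy numbers):
a = psm_score(0.9, [0.5])   # plausible segmentation, weaker unit score
b = psm_score(0.2, [0.9])   # implausible segmentation, stronger unit score
```

Ranked by unit probability alone, hypothesis B wins (0.9 > 0.5); including the segmentation probability reverses the order (0.45 > 0.18), which mirrors the paper's point that the segmentation term is essential in the PSM formalism.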


Speech Communication | 1996

Comparative experiments of several adaptation approaches to noisy speech recognition using stochastic trajectory models

Olivier Siohan; Yifan Gong; Jean-Paul Haton

The paper describes experiments on noisy speech recognition using acoustic models based on the framework of Stochastic Trajectory Models (STM). We present the theoretical framework of four different approaches to speech model adaptation: model-specific linear regression, speech feature space transformation, combination of noise and speech models, and STM state-based filtering. Experiments are performed on a speaker-dependent, 1011-word continuous speech recognition application with a word-pair perplexity of 28, using vocabulary-independent acoustic training, context-independent phone models, and various noisy testing environments. To measure the performance of each approach, the variation in recognition rate is studied under different noise types and noise levels. Our results show that the linear regression approach significantly outperforms the other methods for every tested noise type at medium SNRs (between 6 and 24 dB). For Gaussian noise with an SNR between 6 and 24 dB, we observe a reduction of the word error rate of 20% to 59% when linear regression is used, compared to the other methods.
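The linear-regression adaptation idea can be sketched as fitting one affine transform that maps the clean-trained model means into the noisy environment, then applying it to every mean. This is a heavily simplified stand-in (one global transform, target means observed directly) rather than the STM-specific procedure in the paper; all names are illustrative.

```python
import numpy as np

def fit_mean_transform(clean_means, adapt_means):
    # Least-squares affine transform [A; b] mapping clean-model means to
    # their counterparts estimated in the noisy environment.
    X = np.hstack([clean_means, np.ones((len(clean_means), 1))])
    W, *_ = np.linalg.lstsq(X, adapt_means, rcond=None)
    return W

def adapt(means, W):
    # Apply the shared transform to any set of model means.
    X = np.hstack([means, np.ones((len(means), 1))])
    return X @ W
```

Because one transform is shared by all distributions, a small amount of adaptation data can move the whole acoustic model toward the new environment.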


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1991

Signal-to-string conversion based on high likelihood regions using embedded dynamic programming

Yifan Gong; Jean-Paul Haton

A method of signal-to-string conversion based on embedded dynamic programming (DP), which can adapt its search to the variation of the input signal, is proposed. The optimization process is guided by high-valued portions of the likelihood functions of the symbols composing the string and is solved by two embedded dynamic programming processes. Algorithms for the solution are given in a Pascal-like language. When applied to continuous speech recognition on a 100-word vocabulary using the phoneme as the basic recognition unit, the method achieves a 4% improvement in recognition rate over a classical DP-based method.


International Conference on Acoustics, Speech, and Signal Processing | 1991

Non-linear vector interpolation by neural network for phoneme identification in continuous speech

Yifan Gong; Jean-Paul Haton

In acoustic-phonetic decoding of speech, the correlations between vectors in a sequence of analysis frames are assumed to be specific to phonetic units. The authors propose nonlinear vector interpolation techniques to represent this correlation and to recognize phonemes. The interpolation is based on decomposing a frame sequence into two parts and constructing a function that interpolates one part using information from the other. Depending on the quantities to be interpolated, three families of interpolator models are developed. In the recognition system, each phonemic symbol is associated with a nonlinear vector interpolator trained to give minimum interpolation error for that specific phoneme. Multilayer feedforward neural networks implement the nonlinear vector interpolators. In a phoneme-spotting test on continuous speech using 16 PLCC-derived cepstrum coefficients as parametric vectors, the three categories of models gave comparable results.
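The interpolation-as-classifier idea can be sketched as: split each frame sequence in half, train one predictor per phoneme that reconstructs the second half from the first, and classify by minimum reconstruction error. The sketch uses a linear least-squares predictor as a stand-in for the paper's multilayer-perceptron interpolator; all names and shapes are illustrative.

```python
import numpy as np

def train_interpolator(sequences):
    # Learn to predict the second half of a frame sequence from the first
    # (linear least-squares stand-in for the nonlinear MLP interpolator).
    X = np.array([s[:len(s) // 2].ravel() for s in sequences])
    Y = np.array([s[len(s) // 2:].ravel() for s in sequences])
    X1 = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return W

def interp_error(seq, W):
    # Mean squared error of reconstructing the held-out half.
    x = np.hstack([seq[:len(seq) // 2].ravel(), 1.0])
    y = seq[len(seq) // 2:].ravel()
    return float(((x @ W - y) ** 2).mean())

def classify(seq, models):
    # Each phoneme keeps its own interpolator; the phoneme whose
    # interpolator reconstructs the sequence best wins.
    return min(models, key=lambda k: interp_error(seq, models[k]))
```

The classifier never models the frames themselves, only the dependency between the two halves, which is the correlation the paper argues is phoneme-specific.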


International Conference on Acoustics, Speech, and Signal Processing | 1991

Neural network coupled with IIR sequential adapter for phoneme recognition in continuous speech

Yifan Gong; Y. Cheng; Jean-Paul Haton

The authors present an NN-IIR (neural network/infinite impulse response filter) system for phoneme recognition in continuous speech, based on the idea of modeling the recognition process with state-evolution and interpretation equations. This work offers a solution to representing temporal information in phoneme recognition using neural networks and recursive filters, yielding better recognition results for continuous speech. The recognition system has two promising properties: it can deal with sequential structure, and it can learn to interpret speech signals through training. Experiments showed that the NN-IIR network performs well on continuous speech recognition. Preliminary experiments with limited training data indicate that the NN-IIR provides good discrimination power for plosives, which are highly context-dependent.
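The state-evolution/interpretation split can be sketched minimally: a first-order IIR filter carries temporal context in a state vector, and a small network interprets each state. This is an assumption-laden sketch of the general architecture, not the paper's system; `alpha`, `w`, and `b` are illustrative parameters.

```python
import numpy as np

def nn_iir(frames, alpha=0.7, w=None, b=0.0):
    # State evolution: a first-order IIR filter smooths the frame stream,
    # so each state summarizes the recent past.
    # Interpretation: a single-layer "network" (sigmoid readout) scores
    # each state — a stand-in for the trained neural interpreter.
    s = np.zeros(frames.shape[1])
    scores = []
    for x in frames:
        s = alpha * s + (1 - alpha) * x          # state evolution (IIR)
        z = s @ w + b if w is not None else s.sum()
        scores.append(1.0 / (1.0 + np.exp(-z)))  # interpretation
    return np.array(scores)
```

Feeding a sustained input makes the state, and hence the score, rise smoothly, which is how the recursive filter encodes duration and context without an explicit window.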


International Conference on Acoustics, Speech, and Signal Processing | 1992

Nonlinear vectorial interpolation for speaker recognition

Yifan Gong; Jean-Paul Haton

The authors address the problem of speaker recognition using very short utterances for both training and recognition. They propose to exploit speaker-specific correlations between two suitably defined parameter vector sequences. A nonlinear vectorial interpolation technique is used to capture speaker-specific information through least-square-error minimization. The experiments show the feasibility of recognizing a speaker among a population of about 100 persons using an utterance of only one word for both training and recognition.


Speech Communication | 1993

Plausibility functions in continuous speech recognition: the VINICS system

Yifan Gong; Jean-Paul Haton

We propose a new approach to phoneme-based continuous speech recognition when a time function of the plausibility of observing each phoneme is given. We introduce a criterion for the best sentence, related to the sum of the plausibilities of the individual symbols composing the sentence. Using high-plausibility regions to reduce the computational load while keeping optimality, our method finds the most plausible sentences for the input speech, given the plausibility μ_{a,n} of observing each phoneme a at each time slot n. Two optimization procedures handle the embedded search processes: (1) find the best path connecting peaks of the plausibility functions of two successive symbols, and (2) find the best time-transition slot index for two given peaks. Dynamic programming is used in both procedures. Since the best-path-finding algorithm does not search slot by slot, recognition is highly efficient. Experimental results with the VINICS system show that the method gives better recognition precision while requiring about 1/20 of the computing time of traditional DP-based methods. The experimental system obtained a 95% sentence recognition rate on a speaker-dependent test.

Collaboration


Dive into Yifan Gong's collaborations.

Top Co-Authors


Jean-Paul Haton

French Institute for Research in Computer Science and Automation
