Dong-Suk Yuk
Rutgers University
Publications
Featured research published by Dong-Suk Yuk.
international conference on acoustics speech and signal processing | 1999
Dong-Suk Yuk; James L. Flanagan
The performance of well-trained speech recognizers using high quality full bandwidth speech data is usually degraded when they are used in real world environments. In particular, telephone speech recognition is extremely difficult due to the limited bandwidth of the transmission channels. In this paper, neural network based adaptation methods are applied to telephone speech recognition and a new unsupervised model adaptation method is proposed. The advantage of the neural network based approach is that retraining of speech recognizers for telephone speech is avoided. Furthermore, because the multi-layer neural network is able to compute nonlinear functions, it can accommodate the nonlinear mapping between full bandwidth speech and telephone speech. The new unsupervised model adaptation method does not require transcriptions and can be used with the neural networks. Experimental results on the TIMIT/NTIMIT corpora show that the performance of the proposed methods is comparable to that of recognizers retrained on telephone speech.
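A minimal sketch of the feature-mapping idea described above, assuming the compensation is done by a small multi-layer network that maps telephone-band cepstral features to estimates of full-bandwidth features so the original wideband-trained recognizer can be reused; the layer sizes and feature dimension are illustrative assumptions, not details from the paper.

```python
# Hypothetical MLP that maps telephone-band features toward full-bandwidth
# features (not the authors' exact architecture).
import torch
import torch.nn as nn

FEAT_DIM = 13  # assumed MFCC dimension

mapper = nn.Sequential(
    nn.Linear(FEAT_DIM, 64),
    nn.Tanh(),
    nn.Linear(64, 64),
    nn.Tanh(),
    nn.Linear(64, FEAT_DIM),
)

def compensate(telephone_feats: torch.Tensor) -> torch.Tensor:
    """Map a (frames x FEAT_DIM) matrix of telephone features to estimated
    full-bandwidth features, frame by frame."""
    with torch.no_grad():
        return mapper(telephone_feats)
```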
international conference on spoken language processing | 1996
Qiguang Lin; Ea-Ee Jan; ChiWei Che; Dong-Suk Yuk; James L. Flanagan
The paper describes two separate sets of speaker identification experiments. In the first set of experiments, the speech spectrum is selectively used for speaker identification. The results show that the higher portion of the speech spectrum contains more reliable idiosyncratic information on speakers than does the lower portion of equal bandwidth. In the second set of experiments, vector-quantization based Gaussian mixture models (VQGMMs) are developed for text-independent speaker identification. The system has been evaluated in the recent speaker identification evaluation organized by NIST. Details of the system design are given and the evaluation results are presented.
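A hedged sketch of the VQ-initialized GMM idea: a k-means codebook (the VQ step) seeds the Gaussian means, one diagonal-covariance GMM is trained per speaker, and a test utterance is assigned to the speaker whose model gives the highest average log-likelihood. Component counts are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def train_vqgmm(features: np.ndarray, n_mix: int = 32) -> GaussianMixture:
    # VQ codebook seeds the Gaussian means before EM refinement.
    codebook = KMeans(n_clusters=n_mix, n_init=5).fit(features)
    gmm = GaussianMixture(n_components=n_mix, covariance_type="diag",
                          means_init=codebook.cluster_centers_)
    return gmm.fit(features)

def identify(test_feats: np.ndarray, models: dict) -> str:
    # models: speaker name -> trained GaussianMixture
    return max(models, key=lambda spk: models[spk].score(test_feats))
```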
international conference on acoustics speech and signal processing | 1996
ChiWei Che; Qiguang Lin; Dong-Suk Yuk
This paper presents a speaker recognition system based on hidden Markov models (HMMs). The system utilizes concatenated phoneme HMMs and works in a text-prompted mode. Each registered speaker has a separate set of HMMs which are trained using the Baum-Welch algorithm. The speaker recognition system has been evaluated with the YOHO voice verification corpus in terms of both speaker verification and closed-set speaker identification. It is shown that by using 10 seconds of testing speech, error rates of 0.09% for males and 0.29% for females are obtained for speaker identification with a total population of 138 talkers. For speaker verification, under the 0% false rejection condition, the system achieves a false acceptance rate of 0.09% for males and 0% for females. This paper also studies the effects of various factors (such as the number of mixtures and cohort selection) on the performance of speaker recognition.
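An illustrative sketch of the verification decision only, with the HMM scoring abstracted behind a callable: the claimed speaker's log-likelihood is normalized by the average score of a cohort of other registered speakers and compared to a threshold. The normalization scheme and threshold value are assumptions for illustration, not details from the paper.

```python
import numpy as np

def verify(utterance_feats, claimed_model, cohort_models, score_fn, threshold=0.0):
    """score_fn(model, feats) -> average per-frame log-likelihood of the
    utterance under that speaker's concatenated phoneme HMMs."""
    claimed = score_fn(claimed_model, utterance_feats)
    cohort = np.mean([score_fn(m, utterance_feats) for m in cohort_models])
    return (claimed - cohort) >= threshold  # True = accept the claimed identity
```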
international conference on acoustics speech and signal processing | 1999
Prabhu Raghavan; ChiWei Che; Dong-Suk Yuk; James L. Flanagan
The performance of automatic speech recognition systems trained on close-talking data suffers when they are used in a distant-talking environment due to the mismatch between the training and testing conditions. Microphone array sound capture can reduce some of the mismatch by removing ambient noise and reverberation, but on its own it offers insufficient improvement in performance. However, using array signal capture in conjunction with hidden Markov model (HMM) adaptation of the clean-speech models can result in improved recognition accuracy. This paper describes an experiment in which the output of an 8-element microphone array system using MFA processing is used for speech recognition with LT-MLLR adaptation. The recognition is done in two passes. In the first pass, an HMM trained on clean data is used to recognize the speech. Using the results of this pass, the HMM is adapted to the environment using the LT-MLLR algorithm. This adapted model, a product of MFA and LT-MLLR, results in improved recognition performance.
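A stripped-down sketch of the two-pass idea: the first pass with clean-speech models yields a frame-to-Gaussian alignment, a single global affine transform of the Gaussian means is estimated from that alignment (an identity-covariance simplification in the spirit of an MLLR-style mean update, not the LT-MLLR algorithm itself), and a second pass decodes with the adapted means.

```python
import numpy as np

def estimate_global_mean_transform(frames, aligned_means):
    """frames: (T, d) adaptation observations from the first pass.
    aligned_means: (T, d) mean of the Gaussian each frame was aligned to.
    Returns W of shape (d, d+1) such that adapted_mean = W @ [1, mu]."""
    xi = np.hstack([np.ones((len(aligned_means), 1)), aligned_means])  # (T, d+1)
    w_t, *_ = np.linalg.lstsq(xi, frames, rcond=None)                  # (d+1, d)
    return w_t.T

def adapt_means(means, W):
    """Apply the shared transform to every Gaussian mean before the second pass."""
    xi = np.hstack([np.ones((len(means), 1)), means])
    return xi @ W.T
```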
ieee automatic speech recognition and understanding workshop | 1997
Dong-Suk Yuk; ChiWei Che; James L. Flanagan
The laboratory performance of well-trained speech recognizers is usually degraded when they are used in real world environments. Robust speech recognition is therefore an important issue for the successful application of speech recognizers. Neural network based transformation methods are studied to compensate for the mismatched conditions of training and testing. First, a feature transformation neural network is studied. Second, a maximum likelihood neural network is applied to model transformations. The advantage of the neural network based transformation methods is that retraining of the speech recognizer for each particular environment is avoided. Furthermore, because the multi-layer neural network is known to be able to compute nonlinear functions, the neural network based transformation methods are able to establish nonlinear mapping functions between training and testing environments without specific knowledge about the distortion or the mismatched environments.
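One plausible way to write down the maximum-likelihood criterion mentioned above (an assumption for illustration, not a formula taken from the paper): the transformation network g with weights theta adjusts the clean-speech HMM set Lambda, and the weights are chosen to maximize the likelihood of the testing-environment data rather than a squared feature error.

```latex
% Hypothetical formalization of the maximum-likelihood training criterion.
\hat{\theta} \;=\; \arg\max_{\theta} \; \sum_{t=1}^{T} \log p\bigl(\mathbf{x}_t \,\big|\, g_{\theta}(\Lambda)\bigr)
```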
international conference on acoustics speech and signal processing | 1996
Dong-Suk Yuk; ChiWei Che; Limin Jin; Qiguang Lin
Environment-independent continuous speech recognition is important for the successful development of speech recognizers in real world applications. Linear compensation methods do not work well if the mismatches between training and testing environments are not linear. In this paper, a neural network compensation technique is explored to mitigate the distortion resulting from additive noise, distant talking, or telephone channels. The advantage of the neural network compensation method is that retraining of a speech recognizer for each particular application is avoided. Furthermore, since the neural network is trained to transform distorted speech feature vectors into those corresponding to clean speech, it may outperform a speech recognizer retrained on distorted speech. Three experiments are conducted to evaluate the capability of the neural network compensation method: recognition of additive noisy speech, distant-talking speech, and telephone speech.
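A hedged sketch of how such a compensation network could be trained, assuming paired recordings of the same utterances in the distorted and clean conditions are available; the layer sizes, optimizer, and mean-squared-error criterion are illustrative assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

FEAT_DIM = 13  # assumed feature dimension

net = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.Tanh(), nn.Linear(64, FEAT_DIM))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(distorted: torch.Tensor, clean: torch.Tensor) -> float:
    """One gradient step pushing distorted frames toward their clean
    counterparts; both tensors are (frames x FEAT_DIM)."""
    optimizer.zero_grad()
    loss = loss_fn(net(distorted), clean)
    loss.backward()
    optimizer.step()
    return loss.item()
```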
Journal of the Acoustical Society of America | 1998
Dong-Suk Yuk; ChiWei Che; Prabhu Raghavan; Samir Chennoukh; James L. Flanagan
In a large vocabulary continuous speech recognition system, high-level linguistic knowledge can enhance performance. However, the integration of high-level linguistic knowledge and complex acoustic models under an efficient search scheme is still problematic. Higher-order n-grams are so computationally expensive, especially when the size of the vocabulary is large, that real-time processing is not yet possible. In this report, the n-best breadth search algorithm is proposed under the framework of the state space search, which can handle higher-order n-grams and complex subword acoustic models such as cross-word triphones. The n-best breadth search is a combination of the best-first search and the breadth-first search. The proposed algorithm can be extended to handle other types of language models, such as the stochastic context-free grammar, and different types of acoustic models, including neural networks. Compared with the conventional beam-search method, this pilot experiment shows that the proposed algo...
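A simplified, generic illustration of the search idea described in the abstract (not the paper's exact algorithm): hypotheses are expanded level by level in breadth-first fashion, but only the n highest-scoring partial hypotheses survive to the next level, which bounds the cost of applying an expensive higher-order n-gram language model at each expansion.

```python
import heapq

def n_best_breadth_search(start, expand, is_final, n=10, max_depth=50):
    """expand(hyp) -> iterable of (successor, incremental_log_score);
    is_final(hyp) -> bool. Larger scores are better."""
    frontier = [(0.0, start)]
    best_final = None
    for _ in range(max_depth):
        candidates = []
        for score, hyp in frontier:
            for nxt, inc in expand(hyp):
                candidates.append((score + inc, nxt))
        if not candidates:
            break
        # Keep only the n best partial hypotheses at this depth.
        frontier = heapq.nlargest(n, candidates, key=lambda c: c[0])
        for score, hyp in frontier:
            if is_final(hyp) and (best_final is None or score > best_final[0]):
                best_final = (score, hyp)
    return best_final
```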
Journal of the Acoustical Society of America | 1998
Prabhu Raghavan; ChiWei Che; Samir Chenoukh; Dong-Suk Yuk; James L. Flanagan
The performance of automatic speech recognition systems trained on close-talking data suffers when the systems are used in a distant-talking environment due to the mismatch between training and testing conditions. Microphone array sound capture can remove some of the mismatch by removing ambient noise and reverberation, resulting in an approximation to a clean speech signal. However, this alone often does not improve the performance sufficiently. Using array signal capture in conjunction with hidden Markov model (HMM) adaptation of the clean-speech models can result in high recognition accuracy. This paper describes an experiment in which the output of an eight-element microphone array system using MFA processing is used for speech recognition with LT-MLLR adaptation. The recognition is done in two passes. In the first pass, an HMM trained on clean data is used to recognize the speech. Using the results of this pass, the HMM is adapted to the environment using the LT-MLLR algorithm. This adapted model is then...
Journal of the Acoustical Society of America | 1999
Mahesh Krishnamoorthy; Dong-Suk Yuk; Krishna Dayanidhi; Samir Chennoukh; D. Sinder; James L. Flanagan
Significant error in stop consonant recognition is caused by confusion between voiced stop consonants and their unvoiced counterparts. The recognition is based on HMMs which use 12 MFCCs and energy, together with their time derivatives. The voicing state is the distinctive feature that separates homorganic stop consonants. According to an analysis of the recognition error rate, the cepstral features do not appear to accurately represent the voicing state of the modeled phone. To address this, a voiced-unvoiced classifier used in conjunction with the HMMs is proposed to improve the recognition of stop consonants. The recognition is done in two passes. In the first pass, a phone recognizer uses well-trained HMMs to identify a stop consonant. This pass provides the recognized stop consonant in addition to its log probability. In the second pass, the voiced-unvoiced classifier checks whether the voicing state of the phone segment matches its phonetic description. In the case of a mismatch and a low probability of recognition, the voiced (u...
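A hedged sketch of the second-pass check described above: a crude zero-crossing-rate voicing detector stands in for the paper's classifier, and a recognized stop whose segment voicing disagrees with its phonetic voicing, and whose first-pass log probability is low, is replaced by its homorganic counterpart. The thresholds and the detector itself are illustrative assumptions.

```python
import numpy as np

HOMORGANIC = {"b": "p", "p": "b", "d": "t", "t": "d", "g": "k", "k": "g"}
VOICED = {"b", "d", "g"}

def is_voiced_segment(samples: np.ndarray) -> bool:
    """Crude segment-level voicing decision from the zero-crossing rate of
    the raw waveform samples; voiced speech tends to have a low rate."""
    zcr = np.mean(np.abs(np.diff(np.sign(samples))) > 0)
    return zcr < 0.1  # hypothetical threshold

def revise_stop(phone: str, samples: np.ndarray, log_prob: float,
                log_prob_floor: float = -50.0) -> str:
    """Second pass: flip a stop to its homorganic counterpart when the
    measured voicing contradicts the label and the first pass was unsure."""
    if phone not in HOMORGANIC:
        return phone
    voicing_matches = is_voiced_segment(samples) == (phone in VOICED)
    if not voicing_matches and log_prob < log_prob_floor:
        return HOMORGANIC[phone]
    return phone
```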
Archive | 1996
Dong-Suk Yuk; Qiguang Lin; ChiWei Che; Li-jie Jin; James L. Flanagan