Jia-Lin Shen
National Taiwan University
Publications
Featured research published by Jia-Lin Shen.
International Conference on Speech, Image Processing and Neural Networks | 1994
Lin-Shan Lee; Keh-Jiann Chen; Chiu-yu Tseng; Ren-Yuan Lyu; Lee-Feng Chien; Hsin-Min Wang; Jia-Lin Shen; Sung-Chien Lin; Yen-Ju Yang; Bo-Ren Bai; Chi-ping Nee; Chun-Yi Liao; Shueh-Sheng Lin; Chung-Shu Yang; I-Jung Hung; Ming-Yu Lee; Rei-Chang Wang; Bo-Shen Lin; Yuan-Cheng Chang; Rung-Chiung Yang; Yung-Chi Huang; Chen-Yuan Lou; Tung-Sheng Lin
Golden Mandarin (II) is an intelligent, single-chip-based, real-time Mandarin dictation machine with a very large vocabulary for the voice input of unlimited Chinese text into computers. The dictation machine can be installed on any personal computer, in which only a single Motorola DSP 96002D chip is used, and achieves a preliminary character correct rate of around 95% at a speed of 0.6 s per character. Various adaptation/learning functions have been developed for this machine, including fast adaptation to new speakers and on-line learning of the user's voice characteristics, task domains, word patterns, and noise environments, so that the machine can easily be personalized for each user. These adaptation/learning functions are the major subject of this paper.
IEEE Transactions on Speech and Audio Processing | 1997
Hsin-Min Wang; Tai-Hsuan Ho; Rung-Chiung Yang; Jia-Lin Shen; Bo-Ren Bai; Jenn-Chau Hong; Wei-Peng Chen; Tong-Lo Yu; Lin-Shan Lee
This correspondence presents the first known results on complete recognition of continuous Mandarin speech for the Chinese language with a very large vocabulary but very limited training data. Various acoustic and linguistic processing techniques were developed, and a prototype continuous-speech Mandarin dictation machine has been successfully implemented. The best recognition accuracy achieved is 92.2% for the finally decoded Chinese characters.
International Conference on Acoustics, Speech, and Signal Processing | 1995
Hsin-Min Wang; Jia-Lin Shen; Yen-Ju Yang; Chiu-yu Tseng; Lin-Shan Lee
This paper presents the first known results for complete recognition of continuous Mandarin speech for the Chinese language with a very large vocabulary but very limited training data. Although some isolated-syllable-based or isolated-word-based large-vocabulary Mandarin speech recognition systems have been successfully developed, a continuous-speech-based system of this kind has never been reported before. For the successful development of this system, several important techniques were used, including acoustic modeling with a set of sub-syllabic models for base syllable recognition and another set of context-dependent models for tone recognition, a multiple-candidate searching technique based on a concatenated syllable matching algorithm to synchronize base syllable and tone recognition, and a word-class-based Chinese language model for linguistic decoding, as illustrated in the sketch below. The best recognition accuracy achieved is 88.69% for the finally decoded Chinese characters, with 88.69%, 91.57%, and 81.37% accuracy for base syllables, tones, and tonal syllables, respectively.
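As an illustration of the linguistic-decoding step mentioned above, the following is a minimal Python sketch of rescoring per-position syllable candidates with a word lexicon and a bigram language model via dynamic programming. The lexicon, candidate scores, language-model weight, and all numbers are toy assumptions, not the paper's actual word-class-based components.

import math

def decode(candidates, lexicon, bigram, lm_weight=1.0):
    """candidates: list over syllable positions, each a dict {syllable: acoustic log-score}.
    lexicon: dict {syllable tuple: word}; bigram: dict {(prev_word, word): log-probability}.
    Returns the best-scoring word sequence covering all syllable positions."""
    n = len(candidates)
    # best[i] maps last_word -> (score, word_sequence) over prefixes spanning i syllables.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, [])
    for i in range(n):
        for prev_word, (score, seq) in best[i].items():
            for syls, word in lexicon.items():
                j = i + len(syls)
                if j > n:
                    continue
                try:  # acoustic score: the word's syllables must appear among the candidates
                    ac = sum(candidates[i + k][s] for k, s in enumerate(syls))
                except KeyError:
                    continue
                lm = bigram.get((prev_word, word), math.log(1e-6))  # crude back-off floor
                new = score + ac + lm_weight * lm
                if word not in best[j] or new > best[j][word][0]:
                    best[j][word] = (new, seq + [word])
    return max(best[n].values())[1] if best[n] else None

# Toy example with two syllable positions (all values made up):
candidates = [{"tai": -1.0, "dai": -1.5}, {"wan": -0.8, "wang": -2.0}]
lexicon = {("tai", "wan"): "台灣", ("tai",): "台", ("wan",): "灣", ("dai",): "帶"}
bigram = {("<s>", "台灣"): math.log(0.5), ("<s>", "台"): math.log(0.1), ("台", "灣"): math.log(0.3)}
print(decode(candidates, lexicon, bigram))  # ['台灣']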
International Conference on Acoustics, Speech, and Signal Processing | 1997
Lee-Feng Chien; Sung-Chien Lin; Jenn-Chau Hong; Ming-Chiuan Chen; Hsin-Min Wang; Jia-Lin Shen; Keh-Jiann Chen; Lin-Shan Lee
To provide high-performance Chinese information access on the Internet, this paper presents an approach that integrates efficient speech recognition and information retrieval techniques. A working system based on the proposed approach for speech-based retrieval of real-time Chinese netnews services has been implemented and tested, with very encouraging performance.
Computer Speech & Language | 1999
Jia-Lin Shen; Hsin-Min Wang; Ren-Yuan Lyu; Lin-Shan Lee
This paper presents an approach for the automatic selection of phonetically distributed sentence sets for speaker adaptation, and applies the concept to the task of Mandarin speech recognition with a very large vocabulary. This is a different approach to the adaptation data selection problem. A computer algorithm is developed to select minimum sets of phonetically distributed training sentences from a text corpus defining the desired task. These sentence sets not only include an almost minimum number of words and sentences that cover the desired acoustic units, but also have statistical distributions of these acoustic-phonetic units very close to those in the given text corpus defining the desired task. In this way, the more frequently used units can be better trained with higher accuracy, thus improving the overall performance, while the new user needs to produce only a small number of meaningful sentences to train the recognizer. Different sets of sentences selected using different phonetic criteria, taking into consideration the statistics of the different acoustic units in the given corpus, can then be integrated into a multi-stage adaptation procedure. With this procedure, the recognition performance can be improved incrementally, stage by stage, using the adaptation data produced with these sentence sets. The proposed approach is applied to an example task of Mandarin speech recognition with a very large vocabulary, in both isolated-syllable and continuous-speech modes, and covers different subject domains in continuous speech recognition. Although the primary results obtained in this paper are for this example task, it is believed that many of the concepts and techniques developed here will also be very useful for other speaker adaptation problems and other languages.
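The following is a minimal sketch, not the authors' exact algorithm, of one way such a phonetically distributed sentence set could be selected greedily: each candidate sentence is scored by how close the selected set's unit distribution stays to the full-corpus distribution and by how many uncovered units it adds. The corpus format, unit extractor, and weighting are illustrative assumptions.

from collections import Counter

def unit_distribution(sentences, units_of):
    counts = Counter()
    for s in sentences:
        counts.update(units_of(s))
    total = sum(counts.values()) or 1
    return {u: c / total for u, c in counts.items()}

def select_sentences(corpus, units_of, n_select):
    """Greedily pick sentences so the selected set (a) covers previously unseen units and
    (b) keeps its unit distribution close to the full-corpus distribution."""
    target = unit_distribution(corpus, units_of)
    selected, sel_counts = [], Counter()
    for _ in range(n_select):
        best, best_score = None, float("inf")
        for s in corpus:
            if s in selected:
                continue
            trial = sel_counts + Counter(units_of(s))
            total = sum(trial.values())
            # L1 distance between the trial distribution and the corpus distribution,
            # minus a small bonus for newly covered units
            dist = sum(abs(trial.get(u, 0) / total - p) for u, p in target.items())
            new_units = len(set(units_of(s)) - set(sel_counts))
            score = dist - 0.01 * new_units
            if score < best_score:
                best, best_score = s, score
        selected.append(best)
        sel_counts.update(units_of(best))
    return selected

# Toy "sentences" given as syllable lists (hypothetical data):
corpus = [["ba", "ma", "de"], ["shi", "de", "ba"], ["ma", "shi", "ni"], ["ni", "de", "ma"]]
print(select_sentences(corpus, units_of=lambda s: s, n_select=2))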
International Conference on Spoken Language Processing | 1996
Jia-Lin Shen; Wen-Liang Hwang; Lin-Shan Lee
The paper presents the use of a variety of filters on the temporal trajectories of the frequency-band spectrum to extract speech recognition features for environmental robustness. Three kinds of filters for emphasizing the statistically important parts of speech are proposed. First, a bank of RASTA-like band-pass filters designed to fit the statistical peaks of the modulation-frequency spectrum of speech is used. Secondly, a three-channel octave-band filter bank with a smoothed rectangular window spline is applied. Thirdly, a data-driven filter is developed. Experimental results show that significant improvements in speech recognition under noisy environments can be achieved using the proposed feature extraction approach.
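A minimal sketch of the first filter type, assuming a RASTA-like band-pass filter applied along the temporal trajectory of each log filter-bank channel; the specific coefficients and the synthetic input are assumptions, not the paper's exact design.

import numpy as np
from scipy.signal import lfilter

def rasta_like_filter(log_fbank):
    """log_fbank: (num_frames, num_bands) log filter-bank energies.
    Returns band-pass filtered trajectories that emphasize the modulation
    frequencies where speech energy concentrates and suppress very slow
    (convolutional) and very fast components."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # FIR differentiator part
    a = np.array([1.0, -0.98])                        # single-pole integrator
    # Filter each frequency band's trajectory independently along the time axis.
    return lfilter(b, a, log_fbank, axis=0)

# Toy usage: 200 frames, 20 bands of synthetic log energies.
rng = np.random.default_rng(0)
filtered = rasta_like_filter(rng.normal(size=(200, 20)))
print(filtered.shape)  # (200, 20)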
IEEE Transactions on Speech and Audio Processing | 2001
Jeih-weih Hung; Jia-Lin Shen; Lin-Shan Lee
Parallel model combination (PMC) techniques have been very successful and are widely used in many applications to improve the performance of speech recognition systems under noisy environments. However, some of the assumptions and approximations made in this approach, primarily in the domain transformation and parameter combination processes, are not necessarily accurate enough in certain practical situations, which may degrade the achievable performance of PMC. In this paper, the possible sources of performance degradation in these processes are carefully analyzed and discussed. Three new approaches, including the truncated Gaussian approach and the split mixture approach for the domain transformation process and the estimated cross-term approach for the parameter combination process, are proposed in order to handle these problems, minimize such degradation, and improve the accuracy of the PMC techniques. These proposed approaches were analyzed and discussed with two recognition tasks, one relatively simple and the other more complicated and realistic. Both sets of experiments showed that the proposed approaches provide significant improvements over the original PMC method, especially under lower SNR conditions.
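For context, the following sketch shows the standard log-normal PMC combination in the log-spectral domain, the step whose assumptions the paper revisits; the cepstral/log-spectral DCT transforms are omitted and all variable names and numbers are illustrative.

import numpy as np

def lognormal_pmc(mu_s, var_s, mu_n, var_n, gain=1.0):
    """Diagonal-covariance PMC for one Gaussian component.
    mu_s, var_s : clean-speech mean/variance in the log-spectral domain
    mu_n, var_n : noise mean/variance in the log-spectral domain
    gain        : level-matching term between speech and noise."""
    # Log-normal mapping to the linear spectral domain.
    mu_S = np.exp(mu_s + var_s / 2.0) * gain
    var_S = mu_S ** 2 * (np.exp(var_s) - 1.0)
    mu_N = np.exp(mu_n + var_n / 2.0)
    var_N = mu_N ** 2 * (np.exp(var_n) - 1.0)
    # Additivity of speech and noise power (cross-term assumed zero here).
    mu_Y = mu_S + mu_N
    var_Y = var_S + var_N
    # Map the corrupted-speech Gaussian back to the log-spectral domain.
    var_y = np.log(var_Y / mu_Y ** 2 + 1.0)
    mu_y = np.log(mu_Y) - var_y / 2.0
    return mu_y, var_y

mu_y, var_y = lognormal_pmc(np.array([1.0, 0.5]), np.array([0.2, 0.1]),
                            np.array([0.3, 0.2]), np.array([0.05, 0.05]))
print(mu_y, var_y)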
International Conference on Acoustics, Speech, and Signal Processing | 1998
Jeih-weih Hung; Jia-Lin Shen; Lin-Shan Lee
The parallel model combination (PMC) technique has been shown to achieve very good performance for speech recognition under noisy conditions. In this approach, the speech signal and the noise are assumed uncorrelated during modeling. A new correlated PMC is proposed by properly estimating and modeling the nonzero correlation between the speech signal and the noise. Preliminary experimental results show that this correlated PMC can provide significant improvements over the original PMC in terms of both the model differences and the recognition accuracies. Error rate reduction on the order of 14% can be achieved.
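Building on the PMC sketch above, a hedged illustration of where a nonzero speech/noise correlation would enter the combination step: the covariance cross-term is added to the combined linear-domain variance, and rho = 0 recovers the uncorrelated case. The correlation value and interface are assumptions, not the paper's estimation procedure.

import numpy as np

def combine_with_cross_term(mu_S, var_S, mu_N, var_N, rho=0.2):
    """Combine linear-spectral speech and noise statistics when they are treated as
    correlated random variables with correlation coefficient rho."""
    mu_Y = mu_S + mu_N
    var_Y = var_S + var_N + 2.0 * rho * np.sqrt(var_S * var_N)  # covariance cross-term
    return mu_Y, var_Y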
IEEE Transactions on Speech and Audio Processing | 1998
Ren-Yuan Lyu; I-Chung Hong; Jia-Lin Shen; Ming-Yu Lee; Lin-Shan Lee
A segmental probability model (SPM) is proposed for fast and accurate recognition of the highly confusing isolated Mandarin base syllables. The SPM is derived from continuous-density hidden Markov models (CHMM) by deleting the state transition probabilities, abandoning the dynamic programming process, and letting the states segment the base syllables equally and deterministically; several special approaches that take the characteristics of the target vocabulary into account are used to further improve accuracy and speed.
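A minimal sketch of SPM scoring under the description above: the utterance is divided deterministically into equal-length segments, one per state, and each state's Gaussian-mixture log-likelihood is summed, with no transition probabilities and no dynamic programming; recognition would then pick the syllable model with the highest score. Diagonal Gaussians and all numbers are illustrative assumptions.

import numpy as np

def log_gaussian(x, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def spm_score(frames, states):
    """frames: (T, D) feature frames of one isolated syllable.
    states: list of (weights, means, vars) Gaussian-mixture parameters, one per state.
    Frames are split into len(states) equal segments (the last absorbs any remainder)."""
    T, n = len(frames), len(states)
    bounds = [round(i * T / n) for i in range(n + 1)]
    total = 0.0
    for i, (w, means, vars_) in enumerate(states):
        seg = frames[bounds[i]:bounds[i + 1]]
        for x in seg:
            comp = np.log(w) + log_gaussian(x, means, vars_)   # (num_mixtures,)
            total += np.logaddexp.reduce(comp)                 # mixture log-likelihood
        # no transition probabilities and no state-alignment search
    return total

# Toy usage: one 2-state, 2-mixture syllable model over 3-dimensional features (made up).
rng = np.random.default_rng(1)
states = [(np.array([0.5, 0.5]), rng.normal(size=(2, 3)), np.ones((2, 3)))] * 2
print(spm_score(rng.normal(size=(20, 3)), states))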
International Conference on Acoustics, Speech, and Signal Processing | 1996
Jia-Lin Shen; Lin-Shan Lee
This paper presents fast and accurate recognition of continuous Mandarin speech with a very large vocabulary using an improved segmental probability model (SPM) approach. In order to extensively utilize acoustic and linguistic knowledge to further improve the recognition performance, several special techniques are developed. Preliminary simulation results show that the final achievable base syllable recognition accuracy with the improved segmental probability modeling is as high as 91.62%, which represents an 18.48% error rate reduction and a more than threefold speed-up relative to the well-studied sub-syllable-based CHMM. With a tone recognizer and a word-based Chinese language model also included, the achieved recognition accuracy for the finally decoded Chinese characters is 92.10%.