Ren-Yuan Lyu
Chang Gung University
Publication
Featured research published by Ren-Yuan Lyu.
international conference on acoustics, speech, and signal processing | 2006
Dau-Cheng Lyu; Ren-Yuan Lyu; Yuang-Chin Chiang; Chun-Nan Hsu
We propose an integrated approach to automatic speech recognition on code-switching utterances, where speakers switch back and forth between at least two languages. This one-pass framework avoids the accuracy degradation caused by imperfect intermediate decisions of language boundary detection and language identification. It is based on a three-layer recognition scheme consisting of a mixed-language HMM-based acoustic model, a knowledge-based plus data-driven probabilistic pronunciation model, and a tree-structured searching net. A traditional multi-pass recognizer, including language boundary detection, language identification, and language-dependent speech recognition, is also implemented for comparison. Experimental results show that the proposed approach, with a much simpler recognition scheme, achieves accuracy as high as that of the traditional approach.
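A minimal sketch of the tree-structured searching net idea: all pronunciations of all words, from both languages, are merged into one prefix tree, so decoding can stay one-pass with no explicit language decision. The lexicon entries and phone strings below are hypothetical toy values, not the paper's data.

def build_search_net(lexicon):
    """Merge every pronunciation of every word into one prefix tree."""
    root = {"children": {}, "words": []}
    for word, prons in lexicon.items():
        for phones in prons:
            node = root
            for ph in phones:
                node = node["children"].setdefault(
                    ph, {"children": {}, "words": []})
            node["words"].append(word)  # the word ends at this node
    return root

# Toy mixed-language lexicon: one character string may carry pronunciations
# from more than one language, which is what lets a single pass absorb a
# code-switch without a separate language-identification step.
lexicon = {
    "台北": [("t", "ai", "p", "ei"), ("t", "ai", "p", "ak")],  # Mandarin / Taiwanese
    "hello": [("h", "e", "l", "o")],
}
net = build_search_net(lexicon)
print(sorted(net["children"].keys()))  # ['h', 't']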
international conference on advanced learning technologies | 2004
Min-Siong Liang; Rhuei-Cheng Yang; Yuang-Chin Chiang; Dau-Cheng Lyu; Ren-Yuan Lyu
This paper describes a Taiwanese text-to-speech (TTS) system for Taiwanese language learning that uses Taiwanese/Mandarin bilingual lexicon information. The TTS system is organized into three functional modules: a text analysis module, a prosody module, and a waveform synthesis module. We then conducted an experiment to evaluate text analysis and tone sandhi, achieving an 89% labeling accuracy rate and a 65% tone-sandhi accuracy rate. By adopting the proposed Taiwanese TTS components, a talking electronic lexicon, an interactive Taiwanese spelling-learning tool, and a Taiwanese TTS system can be built to help those who want to learn Taiwanese.
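A minimal sketch of the three-module organization the abstract describes (text analysis, then prosody, then waveform synthesis). The function bodies and the lexicon entries are placeholders, not the paper's implementation.

def text_analysis(text, bilingual_lexicon):
    """Look up each character's pronunciation in the bilingual lexicon."""
    return [bilingual_lexicon.get(ch, "?") for ch in text]

def prosody(syllables):
    """Attach placeholder duration/pitch targets to each syllable."""
    return [(s, {"dur_ms": 200, "f0": 180}) for s in syllables]

def synthesize(prosodic_units):
    """Stand-in for waveform synthesis; returns a text description only."""
    return " + ".join(f"{s}[{p['dur_ms']}ms]" for s, p in prosodic_units)

lexicon = {"台": "tai5", "北": "pak4"}  # toy Taiwanese entries
print(synthesize(prosody(text_analysis("台北", lexicon))))
# tai5[200ms] + pak4[200ms]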
international conference natural language processing | 2003
Min-Siong Liang; Ren-Yuan Lyu; Yuang-Chin Chiang
Here, we describe an efficient algorithm for selecting phonetically balanced scripts for collecting a large-scale multilingual speech corpus. The corpus is expected to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin Chinese. To achieve this objective, the first step is to construct a multilingual phonetic alphabet, namely the Formosa phonetic alphabet (ForPA). The multilingual lexicons (Formosa lexicons) are also an important part of building the corpus. To date, a corpus containing the speech of 600 speakers of Taiwanese (Min-nan) and Mandarin Chinese has been completed and is ready for release; it contains about 40 hours of speech in 247 thousand utterances.
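The paper does not spell out the selection algorithm in this abstract; a common greedy formulation, sketched below under that assumption, repeatedly picks the sentence that covers the most not-yet-covered phonetic units. The sentences and the phone inventory are invented toy values.

def select_balanced(sentences, target_phones):
    """Greedily pick sentences covering the most uncovered phones."""
    covered, chosen = set(), []
    remaining = dict(sentences)
    while covered < target_phones and remaining:
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining sentence adds new phones
        covered |= gain
        chosen.append(best)
        del remaining[best]
    return chosen, covered

sentences = {  # sentence id -> set of ForPA-style phone labels (toy data)
    "s1": {"a", "i", "p", "t"},
    "s2": {"a", "k", "ng"},
    "s3": {"o", "e", "h"},
}
target = {"a", "i", "p", "t", "k", "ng", "o", "e", "h"}
scripts, covered = select_balanced(sentences, target)
print(scripts, covered == target)  # ['s1', 's3', 's2'] True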
International Journal of Computational Linguistics and Chinese Language Processing | 2004
Ren-Yuan Lyu; Min-Siong Liang; Yuang-Chin Chiang
The Formosa speech database (ForSDat) is a multilingual speech corpus collected at Chang Gung University and sponsored by the National Science Council of Taiwan. The corpus is expected to cover the three most frequently used languages in Taiwan: Taiwanese (Min-nan), Hakka, and Mandarin. This three-year project has the goal of collecting a phonetically abundant speech corpus of more than 1,800 speakers and hundreds of hours of speech. Recently, the first version of this corpus, containing the speech of 600 speakers of Taiwanese and Mandarin, was finished and is ready to be released. It contains about 49 hours of speech and 247,000 utterances.
international conference on advanced learning technologies | 2007
Min-Siong Liang; Zien-Yong Hong; Ren-Yuan Lyu; Yuang-Chin Chiang
This paper describes an approach to pronunciation error detection for computer-assisted pronunciation teaching (CAPT). We focus on finding the user's actual pronunciation. A data-driven method, rather than a knowledge-based one, was used to generate pronunciation error hypotheses. In the experimental results, the pronunciation detection error rate reached 10.56%. Finally, we applied this technique to our CAPT system.
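A minimal sketch of the data-driven idea: mispronunciation candidates for each target unit are read off a confusion table gathered from learner data rather than written by hand as knowledge rules. The table and counts below are invented for illustration.

def error_hypotheses(confusions, target, min_count=5):
    """Return observed mispronunciations of `target` that occur often
    enough in learner data to be added to the recognition network."""
    return [err for err, n in confusions.get(target, {}).items()
            if n >= min_count]

confusions = {"zh": {"z": 42, "j": 7, "s": 2}}  # target -> observed errors
print(error_hypotheses(confusions, "zh"))  # ['z', 'j']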
International Journal of Computational Linguistics and Chinese Language Processing 10 | 2005
Dau-Cheng Lyu; Ren-Yuan Lyu; Yuang-Chin Chiang; Chun-Nan Hsu
In this paper, a bilingual large-vocabulary speech recognition experiment based on the idea of modeling pronunciation variations is described. The two languages under study are Mandarin Chinese and Taiwanese (Min-nan). These two languages are basically mutually unintelligible, yet they have many words with the same Chinese characters and the same meanings, although they are pronounced differently. Observing the bilingual corpus, we found five types of pronunciation variations for Chinese characters. A one-pass, three-layer recognizer was developed that combines bilingual acoustic models, an integrated pronunciation model, and a tree-structure-based searching net. The recognizer's performance was evaluated under three different pronunciation models. The results showed that the character error rate with the integrated pronunciation models was better than that with pronunciation models built using either the knowledge-based or the data-driven approach alone. The relative frequency ratio was also used as a measure to choose the best number of pronunciation variations for each Chinese character. Finally, the best character error rates on the Mandarin and Taiwanese testing sets were 16.2% and 15.0%, respectively, when the average number of pronunciations per Chinese character was 3.9.
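A minimal sketch of pruning pronunciation variants by relative frequency ratio, the measure the abstract names: a character's variants are kept only if their frequency relative to its most frequent variant clears a threshold. The counts and threshold below are invented placeholders.

def prune_variants(counts, ratio_threshold=0.1):
    """Keep each character's variants whose count, relative to the
    character's most frequent variant, exceeds the threshold."""
    kept = {}
    for char, variant_counts in counts.items():
        top = max(variant_counts.values())
        kept[char] = [v for v, c in variant_counts.items()
                      if c / top > ratio_threshold]
    return kept

counts = {"台": {"tai2(Mandarin)": 900, "tai5(Taiwanese)": 350, "rare": 5}}
print(prune_variants(counts))  # the rare variant is dropped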
international symposium on chinese spoken language processing | 2010
Dau-Cheng Lyu; Ren-Yuan Lyu; Cing-Lei Zhu; Ming-Tat Ko
In this paper, a language identification (LID) task on Mandarin/Taiwanese code-switching utterances is described. The proposed word-based lexical model for this LID system integrates acoustic, phonetic, and lexical cues. The first two cues are obtained from a large-vocabulary continuous speech recognition (LVCSR) system, and the last one is trained as a word-based lexical model. Given a sequence of words recognized by the LVCSR system, the lexical model identifies languages according to the frequency and context of each word. Because the switching unit in code-switching speech is the word, the experiments showed that using a word-based lexical model achieved a 16% relative reduction in classification errors compared with LVCSR-based LID systems.
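A minimal sketch of a word-based lexical cue: each recognized word votes for a language by its relative frequency in per-language training text, and an undecided word falls back on its neighbor's tag as a crude context cue. The frequency tables and words are toy values, not the paper's model.

def tag_languages(words, freq_m, freq_t):
    """Tag each LVCSR-recognized word as Mandarin (M) or Taiwanese (T)."""
    tags = []
    for w in words:
        fm, ft = freq_m.get(w, 0), freq_t.get(w, 0)
        tags.append(None if fm == ft else ("M" if fm > ft else "T"))
    for i, t in enumerate(tags):  # context cue: copy the left neighbor
        if t is None:
            tags[i] = tags[i - 1] if i > 0 else "M"
    return tags

freq_m = {"你好": 120, "今天": 80}   # counts in Mandarin text (toy)
freq_t = {"食飽": 90, "今天": 10}    # counts in Taiwanese text (toy)
print(tag_languages(["你好", "食飽", "今天"], freq_m, freq_t))
# ['M', 'T', 'M']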
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 3, September 2008: Special Issue on Selected Papers from ROCLING XIX | 2008
Dau-Cheng Lyu; Chun-Nan Hsu; Yuang-Chin Chiang; Ren-Yuan Lyu
Because abundant resources are not always available for resource-limited languages, training an acoustic model on unbalanced data for multilingual speech recognition is an interesting research issue. In this paper, we propose a three-step data-driven phone clustering method to train a multilingual acoustic model. The first step is to obtain clustering rules for context-independent phone models, derived from a well-trained acoustic model using a similarity measurement. In the second step, we further cluster the sub-phone units using hierarchical agglomerative clustering with the delta Bayesian information criterion, according to the clustering rules. Then, we apply a parametric modeling technique, model complexity selection, to adjust the number of Gaussian components in each Gaussian mixture, optimizing the acoustic model for the new phone set and the available training data. We used an unbalanced trilingual corpus in which the training sets for Mandarin, Taiwanese, and Hakka make up about 60%, 30%, and 10% of the data, respectively. The experimental results show that the proposed sub-phone clustering approach reduced the relative syllable error rate by 4.5% over the best result of the decision-tree-based approach and by 13.5% over the best result of the knowledge-based approach.
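A minimal sketch of agglomerative clustering with a delta-BIC stopping criterion, in the spirit of the second step above: two clusters merge whenever modeling their pooled frames with one Gaussian is cheaper, by BIC, than keeping two. The per-phone frames are random stand-ins, not the paper's corpus.

import numpy as np

def delta_bic(x, y, lam=1.0):
    """Negative value favors merging clusters x and y of d-dim frames."""
    z = np.vstack([x, y])
    d = z.shape[1]
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(len(z))
    def ll(a):  # log-determinant term of a full-covariance Gaussian fit
        return len(a) * np.log(np.linalg.det(np.cov(a.T) + 1e-6 * np.eye(d)))
    return ll(z) - ll(x) - ll(y) - lam * penalty

def cluster_phones(frames, lam=1.0):
    clusters = dict(frames)
    merged = True
    while merged and len(clusters) > 1:
        merged = False
        names = list(clusters)
        pairs = [(delta_bic(clusters[a], clusters[b], lam), a, b)
                 for i, a in enumerate(names) for b in names[i + 1:]]
        score, a, b = min(pairs)
        if score < 0:  # merging is cheaper than keeping both clusters
            clusters[a + "+" + b] = np.vstack([clusters.pop(a),
                                               clusters.pop(b)])
            merged = True
    return clusters

rng = np.random.default_rng(0)
frames = {  # three phones; two are drawn from the same distribution
    "a_man": rng.normal(0, 1, (200, 3)),
    "a_tai": rng.normal(0, 1, (200, 3)),
    "k":     rng.normal(5, 1, (200, 3)),
}
print(sorted(cluster_phones(frames)))  # ['a_man+a_tai', 'k']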
international symposium on chinese spoken language processing | 2008
Min-Siong Liang; Jian-Yung Hung; Ren-Yuan Lyu; Yuang-Chin Chiang
In this paper, we present a strategy for pronunciation error detection and apply it to computer-assisted pronunciation teaching (CAPT), especially for Mandarin language learning. Our system is divided into two parts: sentence verification (SV) and syllable identification (SI). The first is used to reject out-of-task sentences. We used the likelihood ratio test, computed between the maximum probabilities of a result under two different hypotheses (the null hypothesis and alternative hypothesis models), to measure the degree of deviation and decide whether the student's pronunciation is out-of-task. In the SV part, the experimental results were significant, with an F-score of 91.0%. The second part is applied to recognize the content of the speech read by the speaker. The recognition net is built in a sausage shape from a pronunciation confusion table corresponding to confusion error patterns. The system can then find the mispronounced syllables and give appropriate feedback to correct the user's pronunciation. In the SI stage, the best detection achieved an F-score of 77.2%.
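A minimal sketch of the likelihood ratio test for sentence verification: the utterance's score under the claimed (null) model is compared against its score under the alternative model, and the difference is thresholded. The scores and threshold are placeholder values, not the paper's.

def verify_sentence(loglik_null, loglik_alt, threshold=-2.0):
    """Accept the utterance as in-task if the log likelihood ratio
    log P(O|H0) - log P(O|H1) clears the threshold."""
    llr = loglik_null - loglik_alt
    return llr >= threshold, llr

# e.g. total log-likelihoods of one utterance under the two models
accepted, llr = verify_sentence(loglik_null=-54.1, loglik_alt=-53.0)
print(accepted, round(llr, 2))  # True -1.1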
international symposium on chinese spoken language processing | 2006
Dau-Cheng Lyu; Ren-Yuan Lyu; Yuang-Chin Chiang; Chun-Nan Hsu
Many approaches to automatic spoken language identification (LID) on monolingual speech are successful, but LID on code-switching speech, which requires identifying at least two languages within one acoustic utterance, challenges these approaches. In [6], we successfully used a one-pass approach to recognize Chinese characters in Mandarin-Taiwanese code-switching speech. In this paper, we introduce a classification method, named syllable-based duration classification, that uses three cues, namely the recognized common tonal syllable, its corresponding duration, and the speech signal, to identify the specific language in code-switching speech. Experimental results show that the performance of the proposed LID approach on code-switching speech is close to that of a parallel tonal syllable recognition LID system on monolingual speech.
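A minimal sketch of the duration cue: each recognized common tonal syllable is scored by how typical its duration is for each language, using per-language duration Gaussians, and the better-scoring language wins. All syllable names and statistics are invented for illustration.

import math

def duration_loglik(dur, mean, std):
    """Log-likelihood of a duration under a 1-D Gaussian (up to a constant)."""
    return -0.5 * ((dur - mean) / std) ** 2 - math.log(std)

def classify_syllables(syllables, stats):
    """stats[lang][syllable] = (mean_duration_ms, std_ms)."""
    labels = []
    for syl, dur in syllables:
        best = max(stats, key=lambda lang:
                   duration_loglik(dur, *stats[lang][syl]))
        labels.append((syl, best))
    return labels

stats = {
    "Mandarin":  {"tai2": (180.0, 30.0), "pei1": (160.0, 25.0)},
    "Taiwanese": {"tai2": (230.0, 35.0), "pei1": (210.0, 30.0)},
}
print(classify_syllables([("tai2", 175.0), ("pei1", 220.0)], stats))
# [('tai2', 'Mandarin'), ('pei1', 'Taiwanese')]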