Dau-Cheng Lyu
Chang Gung University
Publications
Featured research published by Dau-Cheng Lyu.
international conference on acoustics, speech, and signal processing | 2006
Dau-Cheng Lyu; Ren-Yuan Lyu; Yuang-Chin Chiang; Chun-Nan Hsu
We propose an integrated approach to automatic speech recognition of code-switching utterances, in which speakers switch back and forth between at least two languages. This one-pass framework avoids the accuracy degradation caused by imperfect intermediate decisions in language boundary detection and language identification. It is based on a three-layer recognition scheme consisting of a mixed-language HMM-based acoustic model, a combined knowledge-based and data-driven probabilistic pronunciation model, and a tree-structured searching net. A traditional multi-pass recognizer, comprising language boundary detection, language identification, and language-dependent speech recognition, was also implemented for comparison. Experimental results show that the proposed approach, with a much simpler recognition scheme, achieves accuracy as high as the traditional approach.
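The core of the tree-structured searching net is that pronunciations from both languages share one prefix tree, so a single decoding pass can hypothesize words from either language without a prior language decision. The sketch below illustrates only that data structure, using a hypothetical phone-level lexicon (the paper's actual phone inventory and decoder are not shown here):

```python
def build_search_net(lexicon):
    """Build a phone-level prefix tree (trie) over a mixed-language lexicon.

    lexicon: dict mapping a word (e.g. a Chinese character's romanization)
    to its list of phone symbols. Words from different languages that share
    phone prefixes share trie branches, so one search pass covers both.
    """
    root = {}
    for word, phones in lexicon.items():
        node = root
        for ph in phones:
            node = node.setdefault(ph, {})
        # "#words" marks a word end; several words may end at the same node
        node.setdefault("#words", []).append(word)
    return root
```

A one-pass decoder would then walk this trie frame by frame, scoring each arc with the mixed-language acoustic model.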
international conference on advanced learning technologies | 2004
Min-Siong Liang; Rhuei-Cheng Yang; Yuang-Chin Chiang; Dau-Cheng Lyu; Ren-Yuan Lyu
This paper describes a Taiwanese text-to-speech (TTS) system for Taiwanese language learning that uses Taiwanese/Mandarin bilingual lexicon information. The TTS system is organized into three functional modules: a text analysis module, a prosody module, and a waveform synthesis module. We then conducted an experiment to evaluate the text analysis and tone sandhi, achieving an 89% labeling accuracy rate and a 65% tone-sandhi accuracy rate. By adopting the proposed Taiwanese TTS component, a talking electronic lexicon system, a Taiwanese interactive spelling learning tool, and a Taiwanese TTS system can be built to help those who want to learn Taiwanese.
International Journal of Computational Linguistics and Chinese Language Processing 10 | 2005
Dau-Cheng Lyu; Ren-Yuan Lyu; Yuang-Chin Chiang; Chun-Nan Hsu
In this paper, a bilingual large-vocabulary speech recognition experiment based on the idea of modeling pronunciation variations is described. The two languages under study are Mandarin Chinese and Taiwanese (Min-nan). These two languages are largely mutually unintelligible, yet they share many words written with the same Chinese characters and carrying the same meanings, although they are pronounced differently. Observing the bilingual corpus, we found five types of pronunciation variation for Chinese characters. A one-pass, three-layer recognizer was developed that combines bilingual acoustic models, an integrated pronunciation model, and a tree-structured searching net. The recognizer's performance was evaluated under three different pronunciation models. The results showed that the character error rate with the integrated pronunciation model was better than with a pronunciation model built using either the knowledge-based or the data-driven approach alone. The relative frequency ratio was also used as a measure to choose the best number of pronunciation variations for each Chinese character. Finally, the best character error rates on the Mandarin and Taiwanese test sets were 16.2% and 15.0%, respectively, when the average number of pronunciations per Chinese character was 3.9.
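The abstract does not spell out the relative frequency ratio formula; one plausible reading, sketched below, is that each pronunciation's corpus count is divided by the count of that character's most frequent pronunciation, and pronunciations below a threshold are dropped. The character key and counts here are invented for illustration:

```python
def prune_pronunciations(counts, threshold=0.1):
    """Keep, per character, the pronunciations whose relative frequency
    ratio (count / count of the character's most frequent pronunciation)
    is at least `threshold`.

    counts: dict mapping character -> {pronunciation: corpus count}.
    Returns dict mapping character -> sorted list of kept pronunciations.
    """
    lexicon = {}
    for ch, prons in counts.items():
        top = max(prons.values())
        lexicon[ch] = sorted(p for p, c in prons.items() if c / top >= threshold)
    return lexicon
```

Lowering the threshold raises the average number of pronunciations per character, which is the quantity the paper tunes (3.9 at the best operating point).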
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 3, September 2008: Special Issue on Selected Papers from ROCLING XIX | 2008
Dau-Cheng Lyu; Chun-Nan Hsu; Yuang-Chin Chiang; Ren-Yuan Lyu
Because abundant resources are not always available for resource-limited languages, training an acoustic model with unbalanced training data for multilingual speech recognition is an interesting research issue. In this paper, we propose a three-step data-driven phone clustering method to train a multilingual acoustic model. The first step is to obtain clustering rules for context-independent phone models, derived from a well-trained acoustic model using a similarity measurement. In the second step, we further cluster the sub-phone units using hierarchical agglomerative clustering with the delta Bayesian information criterion, according to those clustering rules. Then we apply a parametric modeling technique, model complexity selection, to adjust the number of Gaussian components in each mixture, optimizing the acoustic model for the new phoneme set and the available training data. We used an unbalanced trilingual corpus in which the training sets for Mandarin, Taiwanese, and Hakka account for about 60%, 30%, and 10% of the data, respectively. The experimental results show that the proposed sub-phone clustering approach reduced the relative syllable error rate by 4.5% over the best result of the decision-tree-based approach and by 13.5% over the best result of the knowledge-based approach.
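Hierarchical agglomerative clustering with a delta-BIC stopping criterion can be sketched in a few lines. This is a generic 1-D illustration, not the paper's implementation: each phone unit is summarized as a (count, mean, variance) triple, the pair whose merge most improves BIC is merged, and clustering stops when no merge has a negative delta-BIC:

```python
import math

def delta_bic(c1, c2, lam=1.0):
    """Delta-BIC for merging two 1-D Gaussian clusters c = (n, mean, var).
    Positive values mean the two separate models fit better (do not merge)."""
    n1, m1, v1 = c1
    n2, m2, v2 = c2
    n = n1 + n2
    m = (n1 * m1 + n2 * m2) / n
    # pooled variance of the merged cluster
    v = (n1 * (v1 + (m1 - m) ** 2) + n2 * (v2 + (m2 - m) ** 2)) / n
    penalty = lam * 0.5 * 2 * math.log(n)   # 2 free parameters in 1-D
    return 0.5 * (n * math.log(v) - n1 * math.log(v1) - n2 * math.log(v2)) - penalty

def agglomerate(clusters, lam=1.0):
    """Greedily merge the most similar pair until no merge lowers BIC."""
    clusters = list(clusters)
    while len(clusters) > 1:
        d, i, j = min((delta_bic(clusters[i], clusters[j], lam), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d >= 0:   # every remaining merge would hurt BIC: stop
            break
        n1, m1, v1 = clusters[i]
        n2, m2, v2 = clusters[j]
        n = n1 + n2
        m = (n1 * m1 + n2 * m2) / n
        v = (n1 * (v1 + (m1 - m) ** 2) + n2 * (v2 + (m2 - m) ** 2)) / n
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((n, m, v))
    return clusters
```

In the paper's multidimensional setting the same idea applies with full or diagonal covariances and a correspondingly larger parameter penalty.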
international symposium on chinese spoken language processing | 2006
Dau-Cheng Lyu; Ren-Yuan Lyu; Yuang-Chin Chiang; Chun-Nan Hsu
Many approaches to automatic spoken language identification (LID) on monolingual speech are successful, but LID on code-switching speech, where at least two languages must be identified within one acoustic utterance, challenges these approaches. In [6], we successfully used a one-pass approach to recognize Chinese characters in Mandarin-Taiwanese code-switching speech. In this paper, we introduce a classification method, named syllable-based duration classification, that identifies the specific language in code-switching speech from three clues: the recognized common tonal syllable, its corresponding duration, and the speech signal. Experimental results show that the performance of the proposed LID approach on code-switching speech is close to that of a parallel tonal-syllable-recognition LID system on monolingual speech.
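The abstract does not detail the classifier; one simple reading of the duration clue, sketched here under that assumption, is to compare the observed duration of a tonal syllable shared by both languages against per-language Gaussian duration models and pick the more likely language. The model values below are invented for illustration:

```python
import math

def classify_language(syllable, duration, duration_models):
    """Among the languages that share this tonal syllable, pick the one
    whose Gaussian duration model best explains the observed duration.

    duration_models: dict lang -> {syllable: (mean_sec, std_sec)}.
    """
    def log_gauss(x, mean, std):
        # log N(x; mean, std) up to a constant
        return -0.5 * ((x - mean) / std) ** 2 - math.log(std)

    candidates = {lang: models[syllable]
                  for lang, models in duration_models.items()
                  if syllable in models}
    return max(candidates, key=lambda lang: log_gauss(duration, *candidates[lang]))
```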
international conference on acoustics, speech, and signal processing | 2008
Dau-Cheng Lyu; Ren-Yuan Lyu
Clustering the phoneme set for accurate modeling is important in multilingual speech recognition, especially when the available training corpora for the languages are mismatched in size, as is the case between a major language, like Mandarin, and a minor language, like Taiwanese. In this paper, we present a data-driven approach that not only acquires a proper phoneme set but also optimizes the acoustic modeling in this situation. To obtain a phoneme set suitable for the unbalanced corpus, we use agglomerative hierarchical clustering with the delta Bayesian information criterion. Then, to train each acoustic model, we apply a parametric modeling technique, model complexity selection, to adjust the number of mixtures, optimizing the acoustic model for the new phoneme set and the available training data. The experimental results are very encouraging: the proposed approach reduces the relative syllable error rate by 7.8% over the best result of the knowledge-based approach.
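Model complexity selection can be read as a BIC-style trade-off: more Gaussian components raise training likelihood but pay a parameter penalty that grows with the (possibly small) amount of data for that phone. The sketch below assumes the per-mixture-count log-likelihoods have already been computed by EM; the table in the example is invented:

```python
import math

def select_num_mixtures(loglik_by_m, n_frames, dim, lam=1.0):
    """Pick the mixture count M maximizing a BIC score.

    loglik_by_m: dict M -> total training log-likelihood of an M-component
    diagonal-covariance GMM for this phone (precomputed by EM).
    n_frames: number of training frames for the phone; dim: feature dimension.
    """
    def bic(m, ll):
        # each component has dim means + dim variances + 1 weight;
        # the weights are constrained to sum to one, hence the -1
        n_params = m * (2 * dim + 1) - 1
        return ll - lam * 0.5 * n_params * math.log(n_frames)

    return max(loglik_by_m, key=lambda m: bic(m, loglik_by_m[m]))
```

Phones with little training data (e.g. the minor language's) thus settle on fewer components, which is exactly the balancing effect wanted for an unbalanced corpus.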
decision support systems | 2008
Dau-Cheng Lyu; Ren-Yuan Lyu; Yuang-Chin Chiang; Chun-Nan Hsu
This paper addresses a content management problem in situations where we have a collection of spoken documents in audio stream format in one language and a collection of related text documents in another. In our case, we have a huge digital archive of audio broadcast news in Taiwanese, but its transcriptions are unavailable. Meanwhile, we have a collection of related text-based news stories, but they are written in Chinese characters. Due to the lack of a standard written form for Taiwanese, manual transcription of the spoken documents is prohibitively expensive, and automatic transcription by speech recognition is infeasible because of its poor performance on spontaneous Taiwanese speech. We present an approximate solution that aligns Taiwanese spoken documents with related text documents in Mandarin. The idea is to take advantage of the abundance of Mandarin text documents available in our application to compensate for the limitations of speech recognition systems. Experimental results show that even though our speech recognizer for spontaneous Taiwanese performs poorly, our approach still achieves a high alignment accuracy (82.5%), sufficient for content management.
international workshop on cellular neural networks and their applications | 2005
Hong-wen Sie; Dau-Cheng Lyu; Zhong-Ing Liou; Ren-Yuan Lyu; Yuang-Chin Chiang
In this paper, we describe a multilingual ASR engine embedded on a PDA. Our ASR engine supports multiple languages, including Mandarin, Taiwanese, and English, simultaneously, based on a unified three-layer framework and a one-stage searching strategy. In the framework, there is a unified acoustic model for all the considered languages, a multiple-pronunciation lexicon, and a searching network whose nodes represent Chinese characters and English syllables with their multiple pronunciations. Under this architecture, the system not only reduces its memory and computational requirements but also handles characters with multiple pronunciations. Because the computing resources of a PDA are quite limited compared to a PC, much of the work in this paper addresses alleviating those limitations. The experimental results show that the system performs well, achieving a recognition rate of about 90% on a voice-command task with a limited vocabulary.
international workshop on cellular neural networks and their applications | 2005
Dau-Cheng Lyu; Bo-Hou Yang; Ren-Yuan Lyu; Chun-Nan Hsu
This paper describes a system for indexing Taiwanese TV speech news to World Wide Web Chinese text documents. The system is based on two main techniques: automatic speech recognition (ASR) and bilingual text alignment. For the former, we use a speech-to-text approach to recognize the anchors' utterances in the TV news as Taiwanese tonal syllable sequences. We then translate the Chinese text documents, obtained from the corresponding news website, into Taiwanese tonal syllables using a bilingual pronunciation lexicon. Afterward, a dynamic programming algorithm performs syllable-level alignment to link the TV news with the documents. A speech corpus of about 100 speakers and text data of 840k Chinese characters were used to train the acoustic and language models for ASR. A bilingual lexicon containing 70k entries serves as the resource for both the pronunciation model for ASR and the statistical translation model for bilingual text alignment. Finally, the document index system was evaluated on TV news comprising 40 stories, and the indexing accuracy rate is over 82% on average.
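Syllable-level alignment by dynamic programming is essentially global sequence alignment. The sketch below is a textbook Needleman-Wunsch variant, not the paper's exact scoring: recognized and translated tonal syllable sequences are aligned with match, substitution, and gap scores, and the backtrace recovers the linked pairs:

```python
def align(rec, ref, match=1, sub=-1, gap=-1):
    """Global DP alignment of two syllable sequences.
    Returns (total score, list of aligned (rec_syl, ref_syl) pairs,
    with None marking an insertion or deletion)."""
    n, m = len(rec), len(ref)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if rec[i - 1] == ref[j - 1] else sub
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # backtrace to recover the aligned pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = match if rec[i - 1] == ref[j - 1] else sub
        if score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((rec[i - 1], ref[j - 1])); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            pairs.append((rec[i - 1], None)); i -= 1
        else:
            pairs.append((None, ref[j - 1])); j -= 1
    while i > 0:
        pairs.append((rec[i - 1], None)); i -= 1
    while j > 0:
        pairs.append((None, ref[j - 1])); j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

In the paper's setting, `rec` would be the ASR output for a news story and `ref` the syllables translated from a candidate web document; the alignment score ranks candidate documents for indexing.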
conference of the international speech communication association | 2008
Dau-Cheng Lyu; Ren-Yuan Lyu