Publication


Featured research published by Chiu-yu Tseng.


international conference on acoustics, speech, and signal processing | 1990

A real-time Mandarin dictation machine for Chinese language with unlimited texts and very large vocabulary

Lin-Shan Lee; Chiu-yu Tseng; Hung-yan Gu; Fu-hua Liu; C.H. Chang; Sung-Hsien Hsieh; Chia-ping Chen

A successfully implemented real-time Mandarin dictation machine which recognizes Mandarin speech with unlimited texts and very large vocabulary for the input of Chinese characters to computers is described. Isolated syllables including the tones are first recognized using specially trained hidden Markov models with special feature parameters. The exact characters are then identified from the syllables using a Markov Chinese language model. The real-time implementation is on an IBM PC/AT, connected to a set of special hardware boards on which ten TMS 320C25 chips operate in parallel. It takes only 0.45 s to dictate a character.
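The two-stage design, syllable recognition followed by character identification with a language model, can be illustrated with a small sketch. The homophone table, bigram probabilities, and syllables below are invented for illustration; the paper's actual Markov Chinese language model and lexicon are far larger.

```python
import math

# Toy homophone table: each recognized syllable (with tone number) maps
# to several candidate Chinese characters.  Invented for illustration.
HOMOPHONES = {
    "shi4": ["是", "事", "世"],
    "jie4": ["界", "介", "借"],
}

# Toy character-bigram log-probabilities P(c2 | c1); unlisted pairs
# fall back to a small floor probability.
BIGRAM = {
    ("世", "界"): math.log(0.6),
    ("是", "界"): math.log(0.05),
}
FLOOR = math.log(1e-4)

def decode(syllables):
    """Pick the best character sequence for a syllable sequence using
    a character bigram (Viterbi-style merge per last character)."""
    beams = [(0.0, [])]  # (log-prob, character sequence)
    for syl in syllables:
        expanded = []
        for logp, seq in beams:
            for ch in HOMOPHONES[syl]:
                prev = seq[-1] if seq else None
                trans = BIGRAM.get((prev, ch), FLOOR) if prev else 0.0
                expanded.append((logp + trans, seq + [ch]))
        # Keep only the best hypothesis ending in each character.
        best = {}
        for logp, seq in expanded:
            if seq[-1] not in best or logp > best[seq[-1]][0]:
                best[seq[-1]] = (logp, seq)
        beams = list(best.values())
    return "".join(max(beams)[1])

print(decode(["shi4", "jie4"]))  # the strong 世界 bigram wins over 是界
```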


international conference on speech image processing and neural networks | 1994

Golden Mandarin(II)-an intelligent Mandarin dictation machine for Chinese character input with adaptation/learning functions

Lin-Shan Lee; Keh-Jiann Chen; Chiu-yu Tseng; Ren-Yuan Lyu; Lee-Feng Chien; Hsin-Min Wang; Jia-Lin Shen; Sung-Chien Lin; Yen-Ju Yang; Bo-Ren Bai; Chi-ping Nee; Chun-Yi Liao; Shueh-Sheng Lin; Chung-Shu Yang; I-Jung Hung; Ming-Yu Lee; Rei-Chang Wang; Bo-Shen Lin; Yuan-Cheng Chang; Rung-Chiung Yang; Yung-Chi Huang; Chen-Yuan Lou; Tung-Sheng Lin

Golden Mandarin (II) is an intelligent single-chip-based real-time Mandarin dictation machine for the Chinese language with a very large vocabulary for the input of unlimited Chinese texts into computers using voice. This dictation machine can be installed on any personal computer, in which only a single-chip Motorola DSP 96002D is used, with a preliminary character correct rate of around 95% at a speed of 0.6 s per character. Various adaptation/learning functions have been developed for this machine, including fast adaptation to new speakers and on-line learning of the voice characteristics, task domains, word patterns, and noise environments of its users, so the machine can be easily personalized for each user. These adaptation/learning functions are the major subject of the paper.
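The abstract does not spell out the adaptation algorithms, but speaker adaptation of HMM Gaussian means is commonly done with maximum a posteriori (MAP) style interpolation between a speaker-independent prior and the new speaker's data. The sketch below illustrates that generic idea only; it is not the machine's documented method, and all numbers are invented.

```python
import numpy as np

def map_adapt_mean(prior_mean, adapt_frames, tau=10.0):
    """MAP-style update of a Gaussian mean from speaker-specific frames.

    tau controls trust in the speaker-independent prior: with little
    adaptation data the result stays near prior_mean, with more data it
    moves toward the new speaker's sample mean.
    """
    n = len(adapt_frames)
    sample_mean = np.mean(adapt_frames, axis=0)
    return (tau * prior_mean + n * sample_mean) / (tau + n)

prior = np.array([0.0, 0.0])
frames = np.full((10, 2), 2.0)        # 10 adaptation frames from a new speaker
print(map_adapt_mean(prior, frames))  # halfway between prior and data: [1. 1.]
```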


Speech Communication | 2005

Fluent speech prosody: Framework and modeling

Chiu-yu Tseng; ShaoHuang Pin; Yehlin Lee; Hsin-Min Wang; Yong-cheng Chen

The prosody of fluent connected speech is much more complicated than concatenating individual sentence intonations into strings. We analyzed speech corpora of read Mandarin Chinese discourses from a top-down perspective on perceived units and boundaries, and consistently identified speech paragraphs of multiple phrases that reflected discourse rather than sentence effects in fluent speech. Subsequent cross-speaker and cross-speaking-rate acoustic analyses of the identified speech paragraphs revealed systematic cross-phrase prosodic patterns in every acoustic parameter, namely F0 contours, duration adjustment, intensity patterns, and, in addition, boundary breaks. We therefore argue for a higher prosodic node that governs, constrains, and groups phrases to derive speech paragraphs. A hierarchical multi-phrase framework is constructed to account for the governing effect, with complementary production and perceptual evidence. We show how cross-phrase F0 and syllable duration pattern templates are derived to account for the tune and rhythm characteristic of fluent speech prosody, and argue for a prosody framework that specifies phrasal intonations as subjacent sister constituents subject to higher terms. Output fluent speech prosody is thus the cumulative result of contributions from every prosodic layer. To test our framework, we further construct a modular prosody model of multiple-phrase grouping with four corresponding acoustic modules and begin testing the model with speech synthesis. To conclude, we argue that any prosody framework of fluent speech should include prosodic contributions above individual sentences in production, with consideration of their perceptual effects on on-line processing; the development of unlimited TTS could benefit most appreciably by capturing and including cross-phrase relationships in prosody modeling.
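The framework's central claim, that output prosody is the cumulative result of contributions from every prosodic layer, can be sketched as a simple additive model. The layer names follow the paper's hierarchy loosely, but the contour shapes and semitone values below are invented for illustration.

```python
# Toy additive model: each prosodic layer contributes a per-syllable F0
# offset (in semitones), and the output contour is their sum.

def f0_contour(n_syllables, layers):
    """Sum the per-syllable F0 offsets contributed by each layer."""
    contour = [0.0] * n_syllables
    for contribution in layers.values():
        for i in range(n_syllables):
            contour[i] += contribution[i]
    return contour

layers = {
    # Discourse-level declination across the whole speech paragraph.
    "speech_paragraph": [2.0, 1.5, 1.0, 0.5, 0.0, -0.5],
    # Phrase-level rise-fall within each of two three-syllable phrases.
    "phrase":           [1.0, -1.0, -2.0, 1.0, -1.0, -2.0],
    # Lexical tone targets on individual syllables.
    "tone":             [3.0, -2.0, 1.0, 0.0, 2.0, -1.0],
}

print(f0_contour(6, layers))  # [6.0, -1.5, 0.0, 1.5, 1.0, -3.5]
```

Removing or swapping a layer changes the whole contour, which mirrors the paper's point that sentence-level intonation alone cannot reproduce fluent-speech prosody.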


international conference on acoustics, speech, and signal processing | 1993

Golden Mandarin (II)-an improved single-chip real-time Mandarin dictation machine for Chinese language with very large vocabulary

Lin-Shan Lee; Chiu-yu Tseng; Keh-Jiann Chen; I-Jung Hung; Ming-Yu Lee; Lee-Feng Chien; Yumin Lee; Ren-Yuan Lyu; Hsin-Min Wang; Yung-Chuan Wu; Tung-Sheng Lin; Hung-yan Gu; Chi-ping Nee; Chun-Yi Liao; Yeng-Ju Yang; Yuan-Cheng Chang; Rung-Chiung Yang

Golden Mandarin (II) is an improved single-chip real-time Mandarin dictation machine with a very large vocabulary for the input of unlimited Chinese sentences into computers using voice. In this dictation machine only a single-chip Motorola DSP 96002D on an Ariel DSP-96 card is used, with a preliminary character correct rate of around 95% in speaker-dependent mode at a speed of 0.36 s per character. This is achieved by many new techniques, primarily a segmental probability modeling technique for syllable recognition especially considering the characteristics of Mandarin syllables, and a word-lattice-based Chinese character bigram for character identification especially considering the structure of the Chinese language.
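The core idea of segmental probability modeling is to divide a syllable's frames into a fixed number of segments and score each segment with its own distribution, rather than letting an HMM choose the alignment. The sketch below is a deliberately simplified diagonal-Gaussian version of that idea; the segment counts, feature dimensions, and parameters are invented.

```python
import numpy as np

def segmental_log_likelihood(frames, segment_means, segment_vars):
    """Split the syllable's frames evenly into as many segments as there
    are models, then sum per-frame diagonal-Gaussian log-likelihoods
    under each segment's model."""
    pieces = np.array_split(frames, len(segment_means))
    total = 0.0
    for piece, mu, var in zip(pieces, segment_means, segment_vars):
        diff = piece - mu
        total += np.sum(-0.5 * (np.log(2 * np.pi * var) + diff ** 2 / var))
    return total

# A 10-frame, 2-dimensional "syllable": low values then high values.
frames = np.vstack([np.zeros((5, 2)), np.full((5, 2), 3.0)])

good = segmental_log_likelihood(
    frames, [np.zeros(2), np.full(2, 3.0)], [np.ones(2), np.ones(2)])
swapped = segmental_log_likelihood(
    frames, [np.full(2, 3.0), np.zeros(2)], [np.ones(2), np.ones(2)])
print(good > swapped)  # True: the model with segments in the right order wins
```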


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989

The synthesis rules in a Chinese text-to-speech system

Lin-Shan Lee; Chiu-yu Tseng; Ming Ouhyoung

The synthesis rules developed for a successfully implemented Chinese text-to-speech system are described in detail. The design approach is based on a syllable concatenation that is rooted in the special characteristics of the Chinese language. Special attention is given to the lexical tones and other prosodic rules, such as concatenation rules, sandhi rules, stress rules, intonation patterns, syllable duration rules, pause insertion rules, and energy modification rules. The rules are derived from the acoustic properties of Mandarin Chinese and therefore are useful not only in designing other Chinese text-to-speech systems, but also in understanding the characteristics of Mandarin sentences and processing Mandarin speech signals for other purposes such as segmentation or recognition.
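Sandhi rules of the kind the paper mentions can be implemented as rewriting passes over tone sequences. The sketch below applies only the well-known Mandarin third-tone rule (a third tone before another third tone surfaces as a second tone); it is a single simplified rule, not the paper's full rule set, and real systems condition on phrasing as well.

```python
def apply_third_tone_sandhi(tones):
    """Mandarin third-tone sandhi: tone 3 before another tone 3
    surfaces as tone 2.  Applied right to left over a list of tone
    numbers (1-5, with 5 the neutral tone)."""
    result = list(tones)
    for i in range(len(result) - 2, -1, -1):
        if result[i] == 3 and result[i + 1] == 3:
            result[i] = 2
    return result

print(apply_third_tone_sandhi([3, 3]))     # [2, 3]
print(apply_third_tone_sandhi([3, 3, 3]))  # [3, 2, 3] under right-to-left application
```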


IEEE Transactions on Speech and Audio Processing | 1993

Golden Mandarin (I)-A real-time Mandarin speech dictation machine for Chinese language with very large vocabulary

Lin-Shan Lee; Chiu-yu Tseng; Hung-yan Gu; Fu-hua Liu; Chen-hao Chang; Yueh-hong Lin; Yumin Lee; Shih-Lung Tu; Shew-Heng Hsieh; Chian-hung Chen

The first successfully implemented real-time Mandarin dictation machine, which recognizes Mandarin speech with very large vocabulary and almost unlimited texts for the input of Chinese characters into computers, is described. The machine is speaker-dependent, and the input speech is in the form of sequences of isolated syllables. The machine can be decomposed into two subsystems. The first subsystem recognizes the syllables using hidden Markov models. Because every syllable can represent many different homonym characters and form different multisyllabic words with syllables on its right or left, the second subsystem is needed to identify the exact characters from the syllables and correct the errors in syllable recognition. The real-time implementation is on an IBM PC/AT, connected to three sets of specially designed hardware boards on which seven TMS 320C25 chips operate in parallel. The preliminary test results indicate that it takes only about 0.45 s to dictate a syllable (or character) with an accuracy on the order of 90%.


IEEE Transactions on Signal Processing | 1991

Isolated-utterance speech recognition using hidden Markov models with bounded state durations

Hung-yan Gu; Chiu-yu Tseng; Lin-Shan Lee

Hidden Markov models (HMMs) with bounded state durations (HMM/BSD) are proposed to explicitly model the state durations of HMMs and more accurately consider the temporal structures existing in speech signals in a simple, direct, but effective way. A series of experiments was conducted for speaker-dependent applications using 408 highly confusing first-tone Mandarin syllables as the example vocabulary. It was found that in the discrete case the recognition rate of HMM/BSD (78.5%) is 9.0%, 6.3%, and 1.9% higher than that of conventional HMMs and of HMMs with Poisson and gamma distributed state durations, respectively. In the continuous case (partitioned Gaussian mixture modeling), the recognition rates of HMM/BSD (88.3% with 1 mixture, 88.8% with 3 mixtures, and 89.4% with 5 mixtures) are 6.3%, 5.0%, and 5.5% higher than those of conventional HMMs; they are also 5.9% and 3.9% (with 1 and 3 mixtures) higher than HMMs with Poisson distributed state durations, and 3.1% and 1.8% (with 1 and 3 mixtures) higher than HMMs with gamma distributed state durations.
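The bounded-duration idea can be shown with a minimal segmentation search: a left-to-right model where each state must emit between d_min and d_max consecutive frames. This sketch keeps only the observation scores; the paper's HMM/BSD also carries transition and observation model details that are omitted here, and the score matrix below is invented.

```python
def viterbi_bsd(obs_logp, d_min, d_max):
    """Best log-prob of segmenting T frames over left-to-right states
    when state s must last between d_min[s] and d_max[s] frames.
    obs_logp[s][t] is the log-likelihood of frame t under state s."""
    n_states, T = len(obs_logp), len(obs_logp[0])
    NEG = float("-inf")
    # best[s][t]: best score with states 0..s-1 done, covering frames 0..t-1
    best = [[NEG] * (T + 1) for _ in range(n_states + 1)]
    best[0][0] = 0.0
    for s in range(n_states):
        for t in range(T + 1):
            if best[s][t] == NEG:
                continue
            for d in range(d_min[s], d_max[s] + 1):  # duration is bounded
                if t + d > T:
                    break
                score = best[s][t] + sum(obs_logp[s][t:t + d])
                if score > best[s + 1][t + d]:
                    best[s + 1][t + d] = score
    return best[n_states][T]

# Two states; state 0 fits the first two frames, state 1 the last two.
obs_logp = [[0.0, 0.0, -5.0, -5.0],
            [-5.0, -5.0, 0.0, 0.0]]
print(viterbi_bsd(obs_logp, d_min=[1, 1], d_max=[3, 3]))  # 0.0
```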


IEEE Transactions on Speech and Audio Processing | 1993

Improved tone concatenation rules in a formant-based Chinese text-to-speech system

Lin-Shan Lee; Chiu-yu Tseng; Ching Jiang Hsieh

A set of improved tone concatenation rules to be used in a formant-based Mandarin Chinese text-to-speech system is presented. This system concatenates prestored syllables superimposed by additional tone patterns to obtain speech sentences for unlimited text, with the acoustic properties of each syllable modified by a set of synthesis rules. The tone concatenation rules are the most important among these synthesis rules, because they tell how the tone patterns for the syllables should be modified in an arbitrary sentence under various conditions of concatenating syllables of different tones on both sides. The improved tone concatenation rules are obtained empirically by carefully analyzing the tone pattern behavior under various tone concatenation conditions for many sentences in a database. A total of 14 representative tone patterns are defined for the five tones, and different rules about which pattern should be used under what kind of tone concatenation conditions are organized in detail. Preliminary subjective tests indicate that these rules actually give better synthesized speech for a formant-based Chinese text-to-speech system.
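A rule set of this kind amounts to a lookup keyed on the tones of the neighboring syllables, with fallbacks to less specific contexts. The sketch below shows that selection mechanism only; the rule entries and pattern names are invented, whereas the paper defines 14 concrete patterns for the five Mandarin tones.

```python
# (previous_tone, this_tone, next_tone) -> tone pattern name.
# None matches "any / sentence edge".  Entries are invented examples.
RULES = {
    (4, 1, None): "tone1_lowered_onset",   # tone 1 after a falling tone
    (None, 1, None): "tone1_citation",
}

def select_pattern(prev_tone, tone, next_tone):
    """Pick the most specific matching rule, backing off to the
    citation pattern for the tone when no rule applies."""
    for key in [(prev_tone, tone, next_tone),
                (prev_tone, tone, None),
                (None, tone, None)]:
        if key in RULES:
            return RULES[key]
    return f"tone{tone}_citation"

print(select_pattern(4, 1, 2))        # tone1_lowered_onset
print(select_pattern(None, 1, None))  # tone1_citation
```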


international conference on acoustics, speech, and signal processing | 1995

Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary but limited training data

Hsin-Min Wang; Jia-Lin Shen; Yen-Ju Yang; Chiu-yu Tseng; Lin-Shan Lee

This paper presents the first known results for complete recognition of continuous Mandarin speech for the Chinese language with a very large vocabulary but very limited training data. Although some isolated-syllable-based or isolated-word-based large-vocabulary Mandarin speech recognition systems have been successfully developed, a continuous-speech-based system of this kind has never been reported before. For successful development of this system, several important techniques have been used, including acoustic modeling with a set of sub-syllabic models for base syllable recognition and another set of context-dependent models for tone recognition, a multiple-candidate searching technique based on a concatenated syllable matching algorithm to synchronize base syllable and tone recognition, and a word-class-based Chinese language model for linguistic decoding. The best recognition accuracy achieved is 88.69% for finally decoded Chinese characters, with 88.69%, 91.57%, and 81.37% accuracy for base syllables, tones, and tonal syllables respectively.
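A word-class-based language model factors the word bigram through word classes, P(w_i | w_{i-1}) ≈ P(class_i | class_{i-1}) · P(w_i | class_i), which needs far less training data than a full word bigram. The sketch below illustrates that factorization; the classes, words, and probabilities are invented and much coarser than the paper's model.

```python
import math

WORD_CLASS = {"我": "PRON", "看": "VERB", "書": "NOUN"}
CLASS_BIGRAM = {("PRON", "VERB"): 0.5, ("VERB", "NOUN"): 0.4}
WORD_GIVEN_CLASS = {"我": 0.2, "看": 0.1, "書": 0.05}

def class_bigram_logprob(words):
    """Log-probability of a word sequence under the class-bigram
    factorization (the first word is taken as given)."""
    logp = 0.0
    for prev, cur in zip(words, words[1:]):
        p_class = CLASS_BIGRAM.get((WORD_CLASS[prev], WORD_CLASS[cur]), 1e-6)
        logp += math.log(p_class) + math.log(WORD_GIVEN_CLASS[cur])
    return logp

print(class_bigram_logprob(["我", "看", "書"]))  # sum of the two factored bigrams
```

Because probabilities are estimated per class rather than per word pair, unseen but grammatical word pairs still get reasonable scores, which is the point of using classes when training data is limited.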


2009 Oriental COCOSDA International Conference on Speech Database and Assessments | 2009

Phonetic aspects of content design in AESOP (Asian English Speech cOrpus Project)

Tanya Visceglia; Chiu-yu Tseng; Mariko Kondo; Helen M. Meng; Yoshinori Sagisaka

This research is part of the ongoing multinational collaboration “Asian English Speech cOrpus Project” (AESOP), whose aim is to build up an Asian English speech corpus representing the varieties of English spoken in Asia. AESOP is an international consortium of linguists, speech scientists, psychologists and educators from Japan, Taiwan, Hong Kong, China, Thailand, Indonesia and Mongolia. Its primary aim is to collect and compare Asian English speech corpora from the countries listed above in order to derive a set of core properties common to all varieties of Asian English, as well as to discover features that are particular to individual varieties. Each research team will use a common recording setup and share an experimental task set, and will develop a common, open-ended annotation system. Moreover, AESOP-collected corpora will be an open resource, available to the research community at large. The initial stage of the phonetics aspect of this project will be devoted to designing spoken-language tasks which will elicit production of a large range of English segmental and suprasegmental characteristics. These data will be used to generate a catalogue of acoustic characteristics particular to individual varieties of Asian English, which will then be compared with the data collected by other AESOP members in order to determine areas of overlap between L1 and L2 English as well as differences among varieties of Asian English.

Collaboration

Top co-authors of Chiu-yu Tseng:

Lin-Shan Lee (National Taiwan University)
Hung-yan Gu (National Taiwan University)
Fu-Chiang Chou (National Taiwan University)
Fu-hua Liu (National Taiwan University)