Publications


Featured research published by Tung-Hui Chiang.


International Conference on Computational Linguistics (COLING) | 1992

Syntactic ambiguity resolution using a discrimination and robustness oriented adaptive learning algorithm

Tung-Hui Chiang; Yi-Chung Lin; Keh-Yih Su

In this paper, a discrimination- and robustness-oriented adaptive learning procedure is proposed for the task of syntactic ambiguity resolution. Owing to insufficient training data and the approximation error introduced by the language model, traditional statistical approaches, which resolve ambiguities indirectly and implicitly via the maximum likelihood method, fail to achieve high performance in real applications. The proposed method remedies these problems by adjusting the parameters to maximize the accuracy rate directly. To make the algorithm robust, the possible variations between the training corpus and the real task are also taken into account by enlarging the separation margin between the correct candidate and its competing members. Significant improvement has been observed in testing: the accuracy rate of syntactic disambiguation is raised from 46.0% to 60.62% by this approach.
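
The margin idea in the abstract can be sketched as a simple discriminative update: adjust feature weights whenever the correct parse does not beat its best competitor by a fixed margin. This is a minimal illustrative sketch, not the paper's exact formulation; the feature names, learning rate, and update rule are all assumptions.

```python
# Hypothetical sketch: enlarge the separation margin between the
# correct candidate and its strongest competitor (perceptron-style
# update; names and numbers are illustrative, not from the paper).

def margin_update(weights, correct, competitors, margin=1.0, lr=0.1):
    """One discriminative update over a single ambiguous sentence.

    weights     -- dict mapping feature name -> float
    correct     -- feature dict of the correct parse
    competitors -- list of feature dicts of competing parses
    """
    def score(feats):
        return sum(weights.get(f, 0.0) * v for f, v in feats.items())

    best_rival = max(competitors, key=score)
    # Only update when the separation margin is violated.
    if score(correct) - score(best_rival) < margin:
        for f, v in correct.items():
            weights[f] = weights.get(f, 0.0) + lr * v
        for f, v in best_rival.items():
            weights[f] = weights.get(f, 0.0) - lr * v
    return weights

weights = {}
correct = {"vp_attach": 1.0}          # hypothetical feature of the correct parse
rivals = [{"np_attach": 1.0}]         # hypothetical feature of a competitor
for _ in range(20):
    margin_update(weights, correct, rivals)
# Updates stop once the correct candidate leads by at least the margin.
```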


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 1992

A unified framework to incorporate speech and language information in spoken language processing

Keh-Yih Su; Tung-Hui Chiang; Yi-Chung Lin

To enhance the performance of spoken language processing, a unified framework is proposed to integrate speech and language information. The framework uses probabilistic formulations to characterize the different language analyses generated by a language processing module. Since probabilistic formulations are used in both the speech and language processing modules, information from the two modules can be easily integrated. To further improve performance, a discrimination- and robustness-oriented learning procedure is proposed to adjust the parameters of the probabilistic formulations. Significant improvement has been observed in the task of reading aloud Chinese computer manuals, which operates in a speaker-dependent, isolated-word mode.
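
The integration step described above amounts to scoring each candidate with a weighted combination of acoustic and language-model probabilities. The sketch below assumes a simple weighted log-linear combination; the candidate names, probabilities, and weights are illustrative, and the weights stand in for the kind of parameters a discriminative learning procedure would adjust.

```python
import math

# Hypothetical sketch: combine speech (acoustic) and language-model
# probabilities into one score per candidate and pick the best.

def combined_score(acoustic_prob, language_prob, w_ac=1.0, w_lm=1.0):
    """Weighted sum of log-probabilities from both modules."""
    return w_ac * math.log(acoustic_prob) + w_lm * math.log(language_prob)

# (P_acoustic, P_language) per candidate -- toy numbers for illustration.
candidates = {
    "candidate_a": (0.20, 0.01),
    "candidate_b": (0.15, 0.30),
}
best = max(candidates, key=lambda c: combined_score(*candidates[c]))
# candidate_b wins: its language-model probability outweighs the
# small acoustic advantage of candidate_a.
```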


International Journal of Computational Linguistics & Chinese Language Processing | 1996

An Overview of Corpus-Based Statistics-Oriented (CBSO) Techniques for Natural Language Processing

Keh-Yih Su; Tung-Hui Chiang; Jing-Shin Chang

A Corpus-Based Statistics-Oriented (CBSO) methodology, which attempts to avoid the drawbacks of both traditional rule-based approaches and purely statistical approaches, is introduced in this paper. Rule-based approaches, with rules induced by human experts, were long the dominant paradigm in the natural language processing community. Such approaches, however, suffer from serious difficulties in knowledge acquisition in terms of cost and consistency, so it is very difficult for such systems to be scaled up. Statistical methods, which can acquire knowledge automatically from corpora, are becoming more and more popular, in part to amend the shortcomings of rule-based approaches. However, most simple statistical models, which adopt almost no existing linguistic knowledge, often result in a large parameter space and thus require an unaffordably large training corpus even for well-justified linguistic phenomena. The CBSO approach is a compromise between these two extremes of the knowledge-acquisition spectrum. It emphasizes the use of well-justified linguistic knowledge in developing the underlying language model, and the application of statistical optimization techniques on top of high-level constructs, such as annotated syntax trees, rather than on surface strings, so that only a training corpus of reasonable size is needed and long-distance dependencies between constituents can be handled. In this paper, corpus-based statistics-oriented techniques are reviewed, and general techniques applicable to CBSO approaches are introduced. In particular, the following important issues are addressed: (1) general tasks in developing an NLP system; (2) why CBSO is the preferred choice among different strategies; (3) how to achieve good performance systematically using a CBSO approach; and (4) frequently used CBSO techniques. Several examples are also reviewed.


International Conference on Computational Linguistics (COLING) | 1994

Automatic model refinement: with an application to tagging

Yi-Chung Lin; Tung-Hui Chiang; Keh-Yih Su

Statistical NLP models usually consider only coarse information and a very restricted context to make parameter estimation feasible. To reduce the modeling error introduced by a simplified probabilistic model, the Classification and Regression Tree (CART) method is adopted in this paper to select more discriminative features for automatic model refinement. Because features are adopted dependently while splitting the classification tree in CART, the amount of training data in each terminal node is small, which makes the labeling of terminal nodes not robust. This over-tuning phenomenon cannot be completely removed by the cross-validation (i.e., pruning) process. A probabilistic classification model based on the selected discriminative features is therefore proposed to use the training data more efficiently. In tagging the Brown Corpus, this probabilistic classification model reduces the error rate of the top 10 error-dominant words from 5.71% to 4.35%, a 23.82% improvement over the unrefined model.
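
The CART-style feature selection described above can be illustrated by choosing, at each split, the binary feature that most reduces label entropy. This is a generic sketch of the technique, not the paper's tagging features; the toy features ("cap", "suffix_s") and tags are assumptions.

```python
import math
from collections import Counter

# Illustrative sketch: pick the most discriminative binary feature
# by weighted-entropy reduction, as a CART split would.

def entropy(labels):
    """Shannon entropy (bits) of a non-empty label list."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def best_split(samples):
    """samples: list of (feature_dict, label).
    Returns the feature whose split leaves the lowest weighted entropy."""
    features = {f for feats, _ in samples for f in feats}
    n = len(samples)

    def split_entropy(f):
        yes = [lab for feats, lab in samples if feats.get(f)]
        no = [lab for feats, lab in samples if not feats.get(f)]
        return (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)

    return min(features, key=split_entropy)

# Toy tagging data: capitalization perfectly separates proper nouns,
# while the "-s" suffix does not separate the tags at all.
data = [
    ({"cap": 1, "suffix_s": 0}, "NNP"),
    ({"cap": 1, "suffix_s": 1}, "NNP"),
    ({"cap": 0, "suffix_s": 1}, "NNS"),
    ({"cap": 0, "suffix_s": 0}, "NN"),
]
chosen = best_split(data)   # "cap" is selected first
```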


Computer Speech & Language | 1995

The effects of learning, parameter tying and model refinement for improving probabilistic tagging

Yi-Chung Lin; Tung-Hui Chiang; Keh-Yih Su

To reduce the estimation error introduced by insufficient training data, the parameters of probabilistic models are usually smoothed by techniques such as Good–Turing smoothing and back-off smoothing. However, the discriminative power of the model cannot be significantly enhanced by smoothing alone. Therefore, in this paper an adaptive learning method is adopted to enhance the discriminative power of a probabilistic model. In addition, a novel tying scheme is proposed to tie the unreliable parameters that never or rarely occur in the training data, so that those parameters have a better chance of being adjusted by the learning procedure. In the task of tagging the Brown Corpus, this approach greatly reduces the number of parameters from 578,759 to 27,947 and reduces the error rate on ambiguous words (i.e. words with more than one possible part of speech) from 5.48% to 4.93%, corresponding to a 10.4% error reduction rate. Furthermore, a probabilistic model is usually simplified to enable reliable estimation of its parameters from the limited amount of training data. As a consequence, the modelling error is increased because some discriminative features are sacrificed in simplifying the model. Therefore, a probabilistic classification model is proposed to reduce the modelling error by better using the discriminative features selected by the Classification and Regression Tree method. The proposed model achieves a 19.16% error reduction rate for the top 30 error-contributing words, which contribute 31.64% of the overall tagging errors.
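
The tying scheme can be sketched as mapping rarely seen contextual parameters onto a shared per-tag parameter, so the learning procedure sees enough data to adjust them. The count threshold, key shapes, and the `<TIED>` placeholder below are illustrative assumptions, not the paper's exact scheme.

```python
from collections import Counter

# Hypothetical sketch of parameter tying: (tag, word) parameters seen
# fewer than `threshold` times share one tied parameter per tag.

def build_tying_map(counts, threshold=2):
    """counts: Counter of (tag, word) -> training frequency.
    Reliable parameters keep their own key; rare ones are tied
    to a shared per-tag key."""
    tying = {}
    for (tag, word), c in counts.items():
        if c >= threshold:
            tying[(tag, word)] = (tag, word)
        else:
            tying[(tag, word)] = (tag, "<TIED>")
    return tying

counts = Counter({("NN", "dog"): 5, ("NN", "axolotl"): 1,
                  ("VB", "run"): 7, ("VB", "defenestrate"): 1})
tying = build_tying_map(counts)
# Rare (tag, word) pairs now share a per-tag tied parameter,
# shrinking the parameter space.
```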


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 1994

On jointly learning the parameters in a character-synchronous integrated speech and language model

Tung-Hui Chiang; Yi-Chung Lin; Keh-Yih Su

A joint learning algorithm that enables the parameters of an integrated speech and language model to be trained jointly is proposed in this paper. The integrated model enhances the spoken language system with high-level knowledge and operates in a character-synchronous mode. It was tested on the task of recognizing isolated Chinese characters in speaker-independent mode with a very large vocabulary of 90,495 words, and a character accuracy rate of 88.26% was obtained. In contrast, only a 75.71% accuracy rate was achieved by the baseline system, which directly couples the speech recognizer with a character bigram language module. The parameters of both the speech and language modules were then jointly adjusted according to their contribution to discrimination, and the dynamic-range variations among the parameters of the different modules were also tuned during learning. After applying this procedure to the character-synchronous integration model, a very promising character accuracy of 94.16% (a 75.96% error reduction rate) was obtained.


ROCLING | 1992

Statistical Models for Word Segmentation and Unknown Word Resolution

Tung-Hui Chiang; Jing-Shin Chang; Ming-Yu Lin; Keh-Yih Su


ROCLING | 1993

A Preliminary Study on Unknown Word Problem in Chinese Word Segmentation

Ming-Yu Lin; Tung-Hui Chiang; Keh-Yih Su


ROCLING | 1992

Discrimination Oriented Probabilistic Tagging

Yi-Chung Lin; Tung-Hui Chiang; Keh-Yih Su


Computational Linguistics | 1995

Robust learning, smoothing, and parameter tying on syntactic ambiguity resolution

Tung-Hui Chiang; Keh-Yih Su; Yi-Chung Lin

Collaboration


Dive into Tung-Hui Chiang's collaborations.

Top Co-Authors

Keh-Yih Su, National Tsing Hua University
Yi-Chung Lin, National Tsing Hua University
Jing-Shin Chang, National Tsing Hua University
Ming-Yu Lin, National Tsing Hua University