
Publication


Featured research published by Jing-Shin Chang.


Journal of the Acoustical Society of America | 1995

Multiple score language processing system

Keh-Yih Su; Jing-Shin Chang; Jong-Nae Wang; Mei-Hui Su

A language processing system includes a mechanism for measuring the syntax trees of sentences of material to be translated and a mechanism for truncating syntax trees in response to the measuring mechanism. In a particular embodiment, a Score Function is provided for disambiguating or truncating ambiguities on the basis of composite scores, generated at different stages of the processing.
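The truncation mechanism above can be sketched as composite-score pruning: each analysis gets one combined score, and only the top few survive. The additive combination and all candidate data below are invented for illustration; the patent's actual Score Function is more elaborate:

```python
def composite_score(lexical, syntactic, semantic):
    """Combine per-knowledge-source log-scores into one composite score
    (a hypothetical additive combination, not the patent's exact formula)."""
    return lexical + syntactic + semantic

def truncate(analyses, k=2):
    """Keep only the k best-scoring analyses, discarding the rest."""
    return sorted(analyses, key=lambda a: a["score"], reverse=True)[:k]

# Toy ambiguous sentence with three candidate syntax trees.
candidates = [
    {"tree": "T1", "score": composite_score(-2.1, -3.0, -1.5)},
    {"tree": "T2", "score": composite_score(-1.0, -2.2, -1.1)},
    {"tree": "T3", "score": composite_score(-4.0, -2.8, -2.6)},
]
best = truncate(candidates, k=2)  # keeps T2 and T1
```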


International Conference on Computational Linguistics | 1992

A new quantitative quality measure for machine translation systems

Keh-Yih Su; Ming-Wen Wu; Jing-Shin Chang

In this paper, an objective quantitative quality measure is proposed for evaluating the performance of machine translation systems. The proposed method compares the raw translation output of an MT system with the final version revised for the customers, and then computes the editing effort required to convert the raw translation into the final version. In contrast to other proposals, the evaluation can be performed quickly and automatically, so it provides rapid feedback on any system change, and a system designer can thus quickly identify the strengths and weaknesses of a particular configuration. Application of such a measure to improving system performance on-line in a parameterized, feedback-controlled system is demonstrated. Furthermore, because the revised version is used directly as the reference, the performance measure reflects the real quality gap between system performance and customer expectations. A system designer can therefore concentrate on practically important topics rather than on theoretically interesting issues.
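The editing effort in this scheme is essentially a distance between the raw output and the revised reference. A minimal sketch, assuming a token-level Levenshtein distance stands in for the paper's editing-effort measure:

```python
def edit_distance(raw_tokens, revised_tokens):
    """Levenshtein distance over tokens: the number of insertions,
    deletions, and substitutions needed to turn the raw MT output
    into the revised version."""
    m, n = len(raw_tokens), len(revised_tokens)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if raw_tokens[i - 1] == revised_tokens[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

raw = "he go to school yesterday".split()
revised = "he went to school yesterday".split()
effort = edit_distance(raw, revised)  # one substitution: "go" -> "went"
```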


Computational Linguistics and Chinese Language Processing | 1997

An Unsupervised Iterative Method for Chinese New Lexicon Extraction

Jing-Shin Chang; Keh-Yih Su

An unsupervised iterative approach for extracting a new lexicon (or unknown words) from a Chinese text corpus is proposed in this paper. Instead of using a non-iterative segmentation-merging-filtering-and-disambiguation approach, the proposed method iteratively integrates contextual constraints (among word candidates) and a joint character association metric to progressively improve the segmentation results of the input corpus (and thus the new word list).

An augmented dictionary, which includes potential unknown words in addition to known words, is used to segment the input corpus, unlike traditional approaches which use only known words for segmentation. In the segmentation process, the augmented dictionary imposes contextual constraints over known words and potential unknown words within input sentences; an unsupervised Viterbi training process is then applied to ensure that the selected potential unknown words (and known words) maximize the likelihood of the input corpus. In addition, the joint character association metric, which reflects the global character association characteristics across the corpus, is derived by integrating several commonly used word association metrics, such as mutual information and entropy, with a joint Gaussian mixture density function; such integration allows the filter to use multiple features simultaneously to evaluate character association, unlike traditional filters, which apply multiple features independently.

The proposed method then allows the contextual constraints and the joint character association metric to enhance each other: the joint association metric is applied iteratively to remove unlikely unknown words from the augmented dictionary, and the segmentation result is used to improve the estimation of the joint association metric. The refined augmented dictionary and improved estimation are then used in the next iteration to acquire better segmentation and carry out more reliable filtering. Experiments show that both the precision and recall rates improve almost monotonically, in contrast to non-iterative segmentation-merging-filtering-and-disambiguation approaches, which often sacrifice precision for recall or vice versa. With a corpus of 311,591 sentences, the performance is 76% (bigram), 54% (trigram), and 70% (quadragram) in F-measure, which is significantly better than the non-iterative approach, with F-measures of 74% (bigram), 46% (trigram), and 58% (quadragram).
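One ingredient of the joint association metric is a character association score such as mutual information. A minimal sketch of pointwise mutual information over adjacent character pairs (the paper integrates several such metrics with a Gaussian mixture density function, which is omitted here; the toy corpus is invented):

```python
import math
from collections import Counter

def char_bigram_pmi(corpus_sentences):
    """Pointwise mutual information for adjacent character pairs,
    one of several association metrics combined in the paper."""
    chars = Counter()
    pairs = Counter()
    for sent in corpus_sentences:
        chars.update(sent)
        pairs.update(zip(sent, sent[1:]))
    n_chars = sum(chars.values())
    n_pairs = sum(pairs.values())
    pmi = {}
    for (a, b), c in pairs.items():
        p_ab = c / n_pairs
        p_a = chars[a] / n_chars
        p_b = chars[b] / n_chars
        pmi[a + b] = math.log(p_ab / (p_a * p_b))
    return pmi

# Toy corpus: "電腦" (computer) recurs as a unit across sentences.
corpus = ["電腦很快", "電腦當機", "我的電腦"]
scores = char_bigram_pmi(corpus)
```

Raw PMI over-rewards rare pairs, which is one reason the paper combines multiple metrics rather than relying on any single one.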


Meeting of the Association for Computational Linguistics | 1994

A Corpus-based Approach to Automatic Compound Extraction

Keh-Yih Su; Ming-Wen Wu; Jing-Shin Chang

An automatic compound retrieval method is proposed to extract compounds from a text. It uses n-gram mutual information, relative frequency counts, and parts of speech as the features for compound extraction. The problem is modeled as a two-class classification problem based on the distributional characteristics of n-gram tokens in the compound and non-compound clusters. The recall and precision of the proposed approach are 96.2% and 48.2% for bigram compounds and 96.6% and 39.6% for trigram compounds on a testing corpus of 49,314 words. A significant reduction in processing time has also been observed.
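The feature-based two-class decision can be sketched roughly as follows. The thresholds and the toy corpus are invented for illustration; the paper trains the classifier on the distributional characteristics of the compound and non-compound clusters rather than on fixed thresholds, and it also uses part-of-speech features, omitted here:

```python
import math
from collections import Counter

def bigram_features(words):
    """Mutual information and relative frequency for adjacent word pairs,
    two of the features used for compound extraction."""
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    feats = {}
    for (a, b), c in bigrams.items():
        p_ab = c / n_bi
        mi = math.log(p_ab / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        rel_freq = c / n_bi  # relative frequency of the bigram
        feats[(a, b)] = (mi, rel_freq)
    return feats

def is_compound(feat, mi_threshold=1.0, freq_threshold=0.2):
    """Toy stand-in for the two-class classifier: accept a candidate
    only if both features clear invented thresholds."""
    mi, rel_freq = feat
    return mi > mi_threshold and rel_freq > freq_threshold

words = "machine translation is hard , machine translation needs data".split()
feats = bigram_features(words)
```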


Meeting of the Association for Computational Linguistics | 1992

GPSM: A Generalized Probabilistic Semantic Model for Ambiguity Resolution

Jing-Shin Chang; Yih-Fen Luo; Keh-Yih Su

In natural language processing, ambiguity resolution is a central issue and can be regarded as a preference assignment problem. In this paper, a Generalized Probabilistic Semantic Model (GPSM) is proposed for preference computation. An effective semantic tagging procedure is proposed for tagging semantic features. A semantic score function is then derived from a general score function that integrates lexical, syntactic, and semantic preferences under a uniform formulation. The semantic score measure shows substantial improvement in structural disambiguation over a syntax-based approach.


Archive | 1991

GLR Parsing with Scoring

Keh-Yih Su; Jong-Nae Wang; Mei-Hui Su; Jing-Shin Chang

In a machine translation system, the number of possible analyses associated with a given sentence is usually very large due to the ambiguous nature of natural languages. However, it is desirable that only the best one or two analyses be translated and passed to the post-editor, so as to reduce the post-editing effort. In addition, the processing time for a sentence is usually limited when a large number of sentences is processed in batch mode. Therefore, it is important, in a practical machine translation system, to obtain within a reasonably short time the best syntax tree carrying the best annotated semantic interpretation. This is only possible with an intelligent parsing algorithm that can truncate undesirable analyses as early as possible and avoid wasting time on parsing ambiguous constructions that will eventually be discarded.
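The early truncation described above can be sketched as a generic beam-pruning step over partial analyses. This is a simplification with invented data; the chapter's GLR-specific scoring and pruning are more involved:

```python
def prune(partial_parses, beam_width=3, margin=2.0):
    """Discard partial analyses that fall outside the beam, either by
    rank (beam_width) or by scoring too far below the current best."""
    if not partial_parses:
        return []
    ranked = sorted(partial_parses, key=lambda p: p[1], reverse=True)
    best = ranked[0][1]
    return [p for p in ranked[:beam_width] if best - p[1] <= margin]

# Hypothetical partial analyses at one parsing step: (fragment, running score).
step = [("NP -> Det N", -1.2), ("NP -> N", -3.5),
        ("VP -> V NP", -0.8), ("S -> NP", -5.0)]
survivors = prune(step, beam_width=3)
```

Pruning by both rank and score margin keeps the search narrow even when many analyses have similar scores, which is the point of truncating undesirable analyses early.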


Machine Translation | 1990

Some key issues in designing MT systems

Keh-Yih Su; Jing-Shin Chang

Development of a machine translation system (MTS) requires many tradeoffs in terms of the variety of available formalisms and control mechanisms. The tradeoffs involve issues in the generative power of the grammar, the formal linguistic power and efficiency of the parser, manipulation flexibility for knowledge bases, knowledge acquisition, the degree of expressiveness and uniformity of the system, integration of the knowledge sources, and so forth. In this paper we discuss some basic decisions that must be made in constructing a large system. Our experience with an operational English-Chinese MTS, ArchTran, is presented to illustrate decision making related to procedural tradeoffs.


Meeting of the Association for Computational Linguistics | 1994

An Automatic Treebank Conversion Algorithm for Corpus Sharing

Jong-Nae Wang; Jing-Shin Chang; Keh-Yih Su

An automatic treebank conversion method is proposed in this paper to convert one treebank into another. A new treebank associated with a different grammar can be generated automatically from the old one, so that the information in the original treebank is transformed into the new one and can be shared among different research communities. The simple algorithm achieves a conversion accuracy of 96.4% when tested on 8,867 sentences between two major grammar revisions of a large MT system.


Language Resources and Evaluation | 2007

Mining atomic Chinese abbreviations with a probabilistic single character recovery model

Jing-Shin Chang; Wei-Lun Teng

An HMM-based single character recovery (SCR) model is proposed in this paper to extract a large set of atomic abbreviations and their full forms from a text corpus. An "atomic abbreviation" here refers to an abbreviated word consisting of a single Chinese character. This task is important since Chinese abbreviations cannot be enumerated exhaustively, but the abbreviation process for compound words appears to be compositional: one can often decode an abbreviated word character by character into its full form. With a large atomic abbreviation dictionary, multiple-character abbreviations can then be handled more easily based on the compositional property of abbreviations.
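The character-by-character decoding can be sketched as a small Viterbi search, where each abbreviated character emits candidate full words and a bigram model scores adjacent expansions. All words and probabilities below are invented for illustration; they are not from the paper:

```python
import math

# Toy model: each abbreviated character expands to candidate full words
# (emission), and a bigram model scores adjacent expansions (transition).
CANDIDATES = {
    "台": {"台灣": 0.6, "台北": 0.3, "台中": 0.1},
    "大": {"大學": 0.7, "大樓": 0.3},
}
BIGRAM = {("台灣", "大學"): 0.5, ("台北", "大學"): 0.2}

def recover(abbrev, default_p=0.01):
    """Viterbi-style decoding of the most likely full-word sequence,
    expanding the abbreviation one character at a time."""
    # each entry maps a partial expansion path to its log-probability
    paths = {(w,): math.log(p) for w, p in CANDIDATES[abbrev[0]].items()}
    for ch in abbrev[1:]:
        new_paths = {}
        for w, p in CANDIDATES[ch].items():
            # best predecessor path for this candidate expansion
            prev, score = max(
                paths.items(),
                key=lambda kv: kv[1] + math.log(BIGRAM.get((kv[0][-1], w), default_p)),
            )
            trans = math.log(BIGRAM.get((prev[-1], w), default_p))
            new_paths[prev + (w,)] = score + trans + math.log(p)
        paths = new_paths
    return max(paths, key=paths.get)

full_form = recover("台大")  # decodes to ("台灣", "大學")
```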


Computational Linguistics and Chinese Language Processing | 1996

An Overview of Corpus-Based Statistics-Oriented (CBSO) Techniques for Natural Language Processing

Keh-Yih Su; Tung-Hui Chiang; Jing-Shin Chang

A Corpus-Based Statistics-Oriented (CBSO) methodology, which attempts to avoid the drawbacks of both traditional rule-based approaches and purely statistical approaches, is introduced in this paper. Rule-based approaches, with rules induced by human experts, had been the dominant paradigm in the natural language processing community. Such approaches, however, suffer from serious difficulties in knowledge acquisition in terms of cost and consistency, so it is very difficult for such systems to be scaled up. Statistical methods, with the capability of automatically acquiring knowledge from corpora, are becoming more and more popular, in part to amend the shortcomings of rule-based approaches. However, most simple statistical models, which adopt almost nothing from existing linguistic knowledge, often result in a large parameter space and thus require an unaffordably large training corpus even for well-justified linguistic phenomena.

The CBSO approach is a compromise between these two extremes of the knowledge-acquisition spectrum. It emphasizes the use of well-justified linguistic knowledge in developing the underlying language model, and the application of statistical optimization techniques on top of high-level constructs, such as annotated syntax trees, rather than on surface strings, so that only a training corpus of reasonable size is needed and long-distance dependencies between constituents can be handled.

In this paper, corpus-based statistics-oriented techniques are reviewed, and general techniques applicable to CBSO approaches are introduced. In particular, we address the following important issues: (1) general tasks in developing an NLP system; (2) why CBSO is the preferred choice among different strategies; (3) how to achieve good performance systematically using a CBSO approach; and (4) frequently used CBSO techniques. Several examples are also reviewed.

Collaboration

Top co-authors of Jing-Shin Chang:

- Keh-Yih Su (National Tsing Hua University)
- Chao-Lin Liu (National Tsing Hua University)
- Tung-Hui Chiang (National Tsing Hua University)
- Wei-Lun Teng (National Chi Nan University)
- Yi-Chung Lin (National Tsing Hua University)
- Yi-Hsuan Chuang (National Chengchi University)
- Ming-Yu Lin (National Tsing Hua University)
- Sheng-Sian Lin (National Chi Nan University)
- Shih-Jay Chiou (National Chi Nan University)
- Shu-Fan Shih (National Chi Nan University)