Jason S. Chang | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jason S. Chang is active.

Explore More

Publication

Featured researches published by Jason S. Chang.

north american chapter of the association for computational linguistics | 2003

Acquisition of English-Chinese transliterated word pairs from parallel-aligned texts using a statistical machine transliteration model

Chun-Jen Lee; Jason S. Chang

This paper presents a framework for extracting English and Chinese transliterated word pairs from parallel texts. The approach is based on the statistical machine transliteration model to exploit the phonetic similarities between English words and corresponding Chinese transliterations. For a given proper noun in English, the proposed method extracts the corresponding transliterated word from the aligned text in Chinese. Under the proposed approach, the parameters of the model are automatically learned from a bilingual proper name list. Experimental results show that the average rates of word and character precision are 86.0% and 94.4%, respectively. The rates can be further improved with the addition of simple linguistic processing.

meeting of the association for computational linguistics | 2006

FAST -- An Automatic Generation System for Grammar Tests

Chia-Yin Chen; Hsien-Chin Liou; Jason S. Chang

This paper introduces a method for the semi-automatic generation of grammar test items by applying Natural Language Processing (NLP) techniques. Based on manually-designed patterns, sentences gathered from the Web are transformed into tests on grammaticality. The method involves representing test writing knowledge as test patterns, acquiring authentic sentences on the Web, and applying generation strategies to transform sentences into items. At runtime, sentences are converted into two types of TOEFL-style question: multiple-choice and error detection. We also describe a prototype system FAST (Free Assessment of Structural Tests). Evaluation on a set of generated questions indicates that the proposed method performs satisfactory quality. Our methodology provides a promising approach and offers significant potential for computer assisted language learning and assessment.

ACM Transactions on Asian Language Information Processing | 2006

Alignment of bilingual named entities in parallel corpora using statistical models and multiple knowledge sources

Chun-Jen Lee; Jason S. Chang; Jyh-Shing Roger Jang

Named entity (NE) extraction is one of the fundamental tasks in natural language processing (NLP). Although many studies have focused on identifying NEs within monolingual documents, aligning NEs in bilingual documents has not been investigated extensively due to the complexity of the task. In this article we introduce a new approach to aligning bilingual NEs in parallel corpora by incorporating statistical models with multiple knowledge sources. In our approach, we model the process of translating an English NE phrase into a Chinese equivalent using lexical translation/transliteration probabilities for word translation and alignment probabilities for word reordering. The method involves automatically learning phrase alignment and acquiring word translations from a bilingual phrase dictionary and parallel corpora, and automatically discovering transliteration transformations from a training set of name-transliteration pairs. The method also involves language-specific knowledge functions, including handling abbreviations, recognizing Chinese personal names, and expanding acronyms. At runtime, the proposed models are applied to each source NE in a pair of bilingual sentences to generate and evaluate the target NE candidates; the source and target NEs are then aligned based on the computed probabilities. Experimental results demonstrate that the proposed approach, which integrates statistical models with extra knowledge sources, is highly feasible and offers significant improvement in performance compared to our previous work, as well as the traditional approach of IBM Model 4.

Information Sciences | 2006

Extraction of transliteration pairs from parallel corpora using a statistical transliteration model

Chun-Jen Lee; Jason S. Chang; Jyh-Shing Roger Jang

This paper describes a framework for modeling the machine transliteration problem. The parameters of the proposed model are automatically acquired through statistical learning from a bilingual proper name list. Unlike previous approaches, the model does not involve the use of either a pronunciation dictionary for converting source words into phonetic symbols or manually assigned phonetic similarity scores between source and target words. We also report how the model is applied to extract proper names and corresponding transliterations from parallel corpora. Experimental results show that the average rates of word and character precision are 93.8% and 97.8%, respectively.

中文計算語言學期刊 | 2004

Bilingual Collocation Extraction Based on Syntactic and Statistical Analyses

Chien-Cheng Wu; Jason S. Chang

In this paper, we describe an algorithm that employs syntactic and statistical analysis to extract bilingual collocations from a parallel corpus. Collocations are pervasive in all types of writing and can be found in phrases, chunks, proper names, idioms, and terminology. Therefore, automatic extraction of monolingual and bilingual collocations is important for many applications, including natural language generation, word sense disambiguation, machine translation, lexicography, and cross language information retrieval. Collocations can be classified as lexical or grammatical collocations. Lexical collocations exist between content words, while a grammatical collocation exists between a content word and function words or a syntactic structure. In addition, bilingual collocations can be rigid or flexible in both languages. Rigid collocation refers to words in a collocation must appear next to each other, or otherwise (flexible/elastic). We focus in this paper on extracting rigid lexical bilingual collocations. In our method, the preferred syntactic patterns are obtained from idioms and collocations in a machine-readable dictionary. Collocations matching the patterns are extracted from aligned sentences in a parallel corpus. We use a new alignment method based on punctuation statistics for sentence alignment. The punctuation-based approach is found to outperform the length-based approach with precision rates approaching 98%. The obtained collocations are subsequently matched up based on cross-linguistic statistical association. Statistical association between the whole collocations as well as words in collocations is used to link a collocation with its counterpart collocation in the other language. We implemented the proposed method on a very large Chinese-English parallel corpus and obtained satisfactory results.

meeting of the association for computational linguistics | 2004

TANGO: bilingual collocational concordancer

Jia-Yan Jian; Yu-Chia Chang; Jason S. Chang

In this paper, we describe TANGO as a collocational concordancer for looking up collocations. The system was designed to answer users query of bilingual collocational usage for nouns, verbs and adjectives. We first obtained collocations from the large monolingual British National Corpus (BNC). Subsequently, we identified collocation instances and translation counterparts in the bilingual corpus such as Sinorama Parallel Corpus (SPC) by exploiting the word-alignment technique. The main goal of the concordancer is to provide the user with a reference tools for correct collocation use so as to assist second language learners to acquire the most eminent characteristic of native-like writing.

meeting of the association for computational linguistics | 2003

TotalRecall: A Bilingual Concordance for Computer Assisted Translation and Language Learning

Jian-Cheng Wu; Kevin C. Yeh; Thomas C. Chuang; Wen-Chie Shei; Jason S. Chang

This paper describes a Web-based English-Chinese concordance system, Total-Recall, developed to promote translation reuse and encourage authentic and idiomatic use in second language writing. We exploited and structured existing high-quality translations from the bilingual Sinorama Magazine to build the concordance of authentic text and translation. Novel approaches were taken to provide high-precision bilingual alignment on the sentence, phrase and word levels. A browser-based user interface (UI) is also developed for ease of access over the Internet. Users can search for word, phrase or expression in English or Chinese. The Web-based user interface facilitates the recording of the user actions to provide data for further research.

empirical methods in natural language processing | 2009

Acquiring Translation Equivalences of Multiword Expressions by Normalized Correlation Frequencies

Ming-Hong Bai; Jia-Ming You; Keh-Jiann Chen; Jason S. Chang

In this paper, we present an algorithm for extracting translations of any given multiword expression from parallel corpora. Given a multiword expression to be translated, the method involves extracting a short list of target candidate words from parallel corpora based on scores of normalized frequency, generating possible translations and filtering out common subsequences, and selecting the top-n possible translations using the Dice coefficient. Experiments show that our approach outperforms the word alignment-based and other naive association-based methods. We also demonstrate that adopting the extracted translations can significantly improve the performance of the Moses machine translation system.

meeting of the association for computational linguistics | 2005

Learning Source-Target Surface Patterns for Web-based Terminology Translation

Jian-Cheng Wu; Tracy Lin; Jason S. Chang

This paper introduces a method for learning to find translation of a given source term on the Web. In the approach, the source term is used as query and part of patterns to retrieve and extract translations in Web pages. The method involves using a bilingual term list to learn source-target surface patterns. At runtime, the given term is submitted to a search engine then the candidate translations are extracted from the returned summaries and subsequently ranked based on the surface patterns, occurrence counts, and transliteration knowledge. We present a prototype called TermMine that applies the method to translate terms. Evaluation on a set of encyclopedia terms shows that the method significantly outperforms the state-of-the-art online machine translation systems.

conference of the association for machine translation in the americas | 2002

Adaptive Bilingual Sentence Alignment

Thomas C. Chuang; Jason S. Chang

We present a new approach to the problem of aligning English and Chinese sentences in a bilingual corpus based on adaptive learning. While using length information alone produces surprisingly good results for aligning bilingual French and English sentences with success rates well over 95%, it does not fair as well for the alignment of English and Chinese sentences. The crux of the problem lies in greater variability of lengths and match types of the matched sentences. We propose to cope with such variability via a two-pass scheme under which model parameters can be learned from the data at hand. Experiments show that under the approach bilingual English-Chinese texts can be aligned effectively across diverse domains, genres and translation directions with accuracy rates approaching 99%.

Explore More