Jian-Cheng Wu | Researchain

Publication

Featured research published by Jian-Cheng Wu.

Meeting of the Association for Computational Linguistics | 2003

TotalRecall: A Bilingual Concordance for Computer Assisted Translation and Language Learning

Jian-Cheng Wu; Kevin C. Yeh; Thomas C. Chuang; Wen-Chie Shei; Jason S. Chang

This paper describes a Web-based English-Chinese concordance system, TotalRecall, developed to promote translation reuse and encourage authentic and idiomatic use in second language writing. We exploited and structured existing high-quality translations from the bilingual Sinorama Magazine to build the concordance of authentic text and translation. Novel approaches were taken to provide high-precision bilingual alignment on the sentence, phrase and word levels. A browser-based user interface (UI) was also developed for ease of access over the Internet. Users can search for a word, phrase or expression in English or Chinese. The Web-based user interface facilitates the recording of user actions to provide data for further research.

Meeting of the Association for Computational Linguistics | 2005

Learning Source-Target Surface Patterns for Web-based Terminology Translation

Jian-Cheng Wu; Tracy Lin; Jason S. Chang

This paper introduces a method for learning to find translations of a given source term on the Web. In the approach, the source term is used as a query and as part of patterns to retrieve and extract translations in Web pages. The method involves using a bilingual term list to learn source-target surface patterns. At runtime, the given term is submitted to a search engine; candidate translations are then extracted from the returned summaries and ranked based on the surface patterns, occurrence counts, and transliteration knowledge. We present a prototype called TermMine that applies the method to translate terms. Evaluation on a set of encyclopedia terms shows that the method significantly outperforms state-of-the-art online machine translation systems.

Conference of the Association for Machine Translation in the Americas | 2004

Extraction of Name and Transliteration in Monolingual and Parallel Corpora

Tracy Lin; Jian-Cheng Wu; Jason S. Chang

Named entities in free text represent a challenge to text analysis in Machine Translation and Cross Language Information Retrieval. These phrases are often transliterated into another language with a different sound inventory and writing system. Named entities found in free text are often not listed in bilingual dictionaries. Although it is possible to identify and translate named entities on the fly without a list of proper names and transliterations, an extensive list of existing transliterations will certainly ensure a high precision rate. We use a seed list of proper names and transliterations to train a Machine Transliteration Model. With the model, it is possible to extract proper names and their transliterations in monolingual or parallel corpora with high precision and recall rates.

International Joint Conference on Natural Language Processing | 2004

Bilingual sentence alignment based on punctuation statistics and lexicon

Thomas C. Chuang; Jian-Cheng Wu; Tracy Lin; Wen-Chie Shei; Jason S. Chang

This paper presents a new method of aligning bilingual parallel texts based on punctuation statistics and lexical information. Punctuation statistics are shown to be an effective means of achieving good results. Sentence alignment of bilingual texts in disparate language pairs such as English and Chinese is reportedly more difficult. We examine the feasibility of using punctuation for high-accuracy sentence alignment. An encouraging precision rate is demonstrated in aligning sentences in bilingual parallel corpora based solely on punctuation statistics. Improved results were obtained when both punctuation statistics and lexical information were employed. We have experimented with an implementation of the proposed method on the parallel corpora of Sinorama Magazine and Records of the Hong Kong Legislative Council with satisfactory results.

North American Chapter of the Association for Computational Linguistics | 2009

Learning Bilingual Linguistic Reordering Model for Statistical Machine Translation

Han-Bin Chen; Jian-Cheng Wu; Jason S. Chang

In this paper, we propose a method for learning a reordering model for BTG-based statistical machine translation (SMT). The model focuses on linguistic features from bilingual phrases. Our method involves extracting reordering examples as well as features such as part-of-speech and word class from aligned parallel sentences. The features are classified with special consideration of phrase lengths. We then use these features to train a maximum entropy (ME) reordering model. With the model, we performed Chinese-to-English translation tasks. Experimental results show that our bilingual linguistic model outperforms state-of-the-art phrase-based and BTG-based SMT systems by 2.41 and 1.31 BLEU points, respectively.

International Joint Conference on Natural Language Processing | 2005

Web-based unsupervised learning for query formulation in question answering

Yi-Chia Wang; Jian-Cheng Wu; Tyne Liang; Jason S. Chang

Converting questions to effective queries is crucial to open-domain question answering systems. In this paper, we present a Web-based unsupervised learning approach for transforming a given natural-language question into an effective query. The method involves querying a search engine for Web passages that contain the answer to the question, extracting patterns that characterize fine-grained classification for answers, and linking these patterns with n-grams in answer passages. Independent evaluation on a set of questions shows that the proposed approach outperforms a naive keyword-based approach in terms of mean reciprocal rank and human effort.
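
For reference, mean reciprocal rank (MRR), the metric named in the abstract above, averages the reciprocal of the rank at which the first correct answer appears for each question. The snippet below is a minimal sketch; the function name `mean_reciprocal_rank`, the use of `None` for questions with no correct answer, and the sample ranks are illustrative assumptions, not data from the paper.

```python
def mean_reciprocal_rank(first_correct_ranks):
    """Compute MRR from 1-based ranks of the first correct answer.

    Each entry is the rank of the first correct answer for one question,
    or None when no correct answer was returned (contributing 0).
    """
    if not first_correct_ranks:
        raise ValueError("at least one question is required")
    total = sum(1.0 / rank for rank in first_correct_ranks if rank is not None)
    return total / len(first_correct_ranks)


# Hypothetical ranks for four questions (not results from the paper).
ranks = [1, 2, None, 4]
mrr = mean_reciprocal_rank(ranks)
print(mrr)
```

A higher MRR means correct answers tend to appear nearer the top of the returned list.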

Meeting of the Association for Computational Linguistics | 2004

Subsentential translation memory for computer assisted writing and translation

Jian-Cheng Wu; Thomas C. Chuang; Wen-Chie Shei; Jason S. Chang

This paper describes a database of translation memory, TotalRecall, developed to encourage authentic and idiomatic use in second language writing. TotalRecall is a bilingual concordancer that supports search queries in English or Chinese for relevant sentences and translations. Although initially intended for learners of English as a Foreign Language (EFL) in Taiwan, it is a gold mine of texts in English or Mandarin Chinese. TotalRecall is particularly useful for those who write in or translate into a foreign language. We exploited and structured existing high-quality translations from bilingual corpora, the Taiwan-based Sinorama Magazine and the Official Records of the Hong Kong Legislative Council, to build a bilingual concordance. Novel approaches were taken to provide high-precision bilingual alignment on the subsentential and lexical levels. A browser-based user interface was developed for ease of access over the Internet. Users can search for a word, phrase or expression in English or Mandarin. The Web-based user interface facilitates the recording of user actions to provide data for further research.

Meeting of the Association for Computational Linguistics | 2015

WriteAhead: Mining Grammar Patterns in Corpora for Assisted Writing

Tzu-Hsi Yen; Jian-Cheng Wu; Jim Chang; Joanne Boisson; Jason S. Chang

This paper describes WriteAhead, a resource-rich, Interactive Writing Environment that provides L2 learners with writing prompts, as well as "get it right" advice, to help them write fluently and accurately. The method involves automatically analyzing reference and learner corpora, extracting grammar patterns with example phrases, and computing dubious, overused patterns. At run-time, as the user types (or mouses over) a word, the system automatically retrieves and displays the grammar patterns and examples most relevant to the word. The user can opt for patterns from a general corpus, academic corpus, learner corpus, or commonly overused dubious patterns found in a learner corpus. WriteAhead proactively engages the user with steady, timely, and spot-on information for effective assisted writing. Preliminary experiments show that WriteAhead fulfills the design goal of fostering learner independence and encouraging self-editing, and is likely to induce better writing and improve writing skills in the long run.

Conference on Computational Natural Language Learning | 2014

NTHU at the CoNLL-2014 Shared Task

Jian-Cheng Wu; Tzu-Hsi Yen; Jim Chang; Guan-Cheng Huang; Jimmy C. M. Chang; Hsiang-Ling Hsu; Yu-Wei Chang; Jason S. Chang

In this paper, we describe a system for correcting grammatical errors in texts written by non-native learners. In our approach, a given sentence with syntactic features is sent to a number of modules, each of which focuses on a specific error type. A main program integrates corrections from these modules and outputs the corrected sentence. We evaluated our system on the official test data of the CoNLL-2014 shared task and obtained 0.30 in F-measure.
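
The F-measure reported above combines precision and recall into a single score. The snippet below is a minimal sketch of the general F-beta formula; the abstract does not state which beta weighting was used, and the function name `f_beta` and the sample precision and recall values are hypothetical.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta < 1 weights precision more heavily; beta > 1 favors recall.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values, not the system's actual precision or recall.
balanced = f_beta(0.4, 0.2)            # beta = 1 (F1)
precision_weighted = f_beta(0.4, 0.2, beta=0.5)
print(balanced, precision_weighted)
```

With these sample values the precision-weighted score is higher than the balanced one, because precision exceeds recall.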

Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing | 2014

Chinese Spell Checking Based on Noisy Channel Model

Hsun-wen Chiu; Jian-Cheng Wu; Jason S. Chang

Chinese spell checking is an important component of many Chinese NLP applications, including word processors, search engines, and automatic essay rating. Compared to English, Chinese has no word boundaries, and there are various Chinese input methods that cause different kinds of typos. Therefore, it is more difficult to develop a spell checker for Chinese. In this paper, we introduce a novel method for correcting Chinese errors based on sound or shape similarity. In our approach, potential typos in a given sentence are corrected using a channel model and a character-based language model within the noisy channel framework. In the training phase, we estimate the channel probabilities for each character based on n-grams in a Web corpus. At run-time, the system generates correction candidates for each character in the given sentence and selects the appropriate correction using the channel model and the language model. The experimental results show that the proposed method achieves significantly better accuracy and recall than more complicated methods in previous work.
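
The noisy channel idea described above can be illustrated with a toy example: each candidate correction is scored by the product of a channel probability P(observed | intended) and a language model probability, and the highest-scoring candidate is selected. The snippet below is a minimal sketch under stated assumptions, not the authors' implementation. All probabilities are invented for illustration, the names `CHANNEL`, `BIGRAM`, and `correct` are hypothetical, and a character bigram model with exhaustive enumeration stands in for whatever language model and search procedure the paper actually uses.

```python
import math
from itertools import product

# Toy channel model: CHANNEL[intended][observed] = P(observed | intended).
# Values are invented for illustration, not estimated from any corpus.
CHANNEL = {
    "再": {"再": 0.9, "在": 0.1},
    "在": {"在": 0.9, "再": 0.1},
}

# Toy character bigram language model, also invented.
BIGRAM = {
    ("<s>", "再"): 0.02,
    ("<s>", "在"): 0.05,
    ("再", "見"): 0.3,
    ("在", "見"): 0.001,
}
UNSEEN = 1e-6  # fallback probability for bigrams missing from the table


def channel_prob(observed, intended):
    """P(observed | intended); characters outside the table are never confused."""
    if intended in CHANNEL:
        return CHANNEL[intended].get(observed, 0.0)
    return 1.0 if observed == intended else 0.0


def candidates(observed):
    """The observed character plus every character that can be mistyped as it."""
    cands = {observed}
    for intended, outputs in CHANNEL.items():
        if outputs.get(observed, 0.0) > 0.0:
            cands.add(intended)
    return sorted(cands)


def score(observed, intended):
    """Log of channel probability times bigram language model probability."""
    total, prev = 0.0, "<s>"
    for o, c in zip(observed, intended):
        total += math.log(channel_prob(o, c))
        total += math.log(BIGRAM.get((prev, c), UNSEEN))
        prev = c
    return total


def correct(sentence):
    """Pick the best-scoring candidate sequence by exhaustive enumeration."""
    options = [candidates(ch) for ch in sentence]
    best = max(product(*options), key=lambda cand: score(sentence, cand))
    return "".join(best)


print(correct("在見"))
```

Exhaustive enumeration grows exponentially with sentence length, so it only suits short inputs like this one; a dynamic programming search would be the usual choice for longer sentences.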
