Chenhui Chu | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Chenhui Chu is active.

Explore More

Publication

Featured researches published by Chenhui Chu.

international conference on computational linguistics | 2014

Iterative Bilingual Lexicon Extraction from Comparable Corpora with Topical and Contextual Knowledge

Chenhui Chu; Toshiaki Nakazawa; Sadao Kurohashi

In the literature, two main categories of methods have been proposed for bilingual lexicon extraction from comparable corpora, namely topic model and context based methods. In this paper, we present a bilingual lexicon extraction system that is based on a novel combination of these two methods in an iterative process. Our system does not rely on any prior knowledge and the performance can be iteratively improved. To the best of our knowledge, this is the first study that iteratively exploits both topical and contextual knowledge for bilingual lexicon extraction. Experiments conduct on Chinese---English and Japanese---English Wikipedia data show that our proposed method performs significantly better than a state---of---the---art method that only uses topical knowledge.

ACM Transactions on Asian Language Information Processing | 2013

Chinese-Japanese Machine Translation Exploiting Chinese Characters

Chenhui Chu; Toshiaki Nakazawa; Daisuke Kawahara; Sadao Kurohashi

The Chinese and Japanese languages share Chinese characters. Since the Chinese characters in Japanese originated from ancient China, many common Chinese characters exist between these two languages. Since Chinese characters contain significant semantic information and common Chinese characters share the same meaning in the two languages, they can be quite useful in Chinese-Japanese machine translation (MT). We therefore propose a method for creating a Chinese character mapping table for Japanese, traditional Chinese, and simplified Chinese, with the aim of constructing a complete resource of common Chinese characters. Furthermore, we point out two main problems in Chinese word segmentation for Chinese-Japanese MT, namely, unknown words and word segmentation granularity, and propose an approach exploiting common Chinese characters to solve these problems. We also propose a statistical method for detecting other semantically equivalent Chinese characters other than the common ones and a method for exploiting shared Chinese characters in phrase alignment. Results of the experiments carried out on a state-of-the-art phrase-based statistical MT system and an example-based MT system show that our proposed approaches can improve MT performance significantly, thereby verifying the effectiveness of shared Chinese characters for Chinese-Japanese MT.

acm transactions on asian and low resource language information processing | 2016

Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora: A Case Study on Chinese--Japanese Wikipedia

Chenhui Chu; Toshiaki Nakazawa; Sadao Kurohashi

Parallel corpora are crucial for statistical machine translation (SMT); however, they are quite scarce for most language pairs and domains. As comparable corpora are far more available, many studies have been conducted to extract either parallel sentences or fragments from them for SMT. In this article, we propose an integrated system to extract both parallel sentences and fragments from comparable corpora. We first apply parallel sentence extraction to identify parallel sentences from comparable sentences. We then extract parallel fragments from the comparable sentences. Parallel sentence extraction is based on a parallel sentence candidate filter and classifier for parallel sentence identification. We improve it by proposing a novel filtering strategy and three novel feature sets for classification. Previous studies have found it difficult to accurately extract parallel fragments from comparable sentences. We propose an accurate parallel fragment extraction method that uses an alignment model to locate the parallel fragment candidates and an accurate lexicon-based filter to identify the truly parallel fragments. A case study on the Chinese--Japanese Wikipedia indicates that our proposed methods outperform previously proposed methods, and the parallel data extracted by our system significantly improves SMT performance.

meeting of the association for computational linguistics | 2017

An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation

Chenhui Chu; Raj Dabre; Sadao Kurohashi

In this paper, we propose a novel domain adaptation method named “mixed fine tuning” for neural machine translation (NMT). We combine two existing approaches namely fine tuning and multi domain NMT. We first train an NMT model on an out-of-domain parallel corpus, and then fine tune it on a parallel corpus which is a mix of the in-domain and out-of-domain corpora. All corpora are augmented with artificial tags to indicate specific domains. We empirically compare our proposed method against fine tuning and multi domain methods and discuss its benefits and shortcomings.

meeting of the association for computational linguistics | 2016

Dependency Forest based Word Alignment

Hitoshi Otsuki; Chenhui Chu; Toshiaki Nakazawa; Sadao Kurohashi

A hierarchical word alignment model that searches for k-best partial alignments on target constituent 1-best parse trees has been shown to outperform previous models. However, relying solely on 1-best parses trees might hinder the search for good alignments because 1-best trees are not necessarily the best for word alignment tasks in practice. This paper introduces a dependency forest based word alignment model, which utilizes target dependency forests in an attempt to minimize the impact on limitations attributable to 1-best parse trees. We present how k-best alignments are constructed over target-side dependency forests. Alignment experiments on the Japanese-English language pair show a relative error reduction of 4% of the alignment score compared to a model with 1-best parse trees.

Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers | 2016

Cross-language Projection of Dependency Trees with Constrained Partial Parsing for Tree-to-Tree Machine Translation.

Yu Shen; Chenhui Chu; Fabien Cromieres; Sadao Kurohashi

Tree-to-tree machine translation (MT) that utilizes syntactic parse trees on both source and target sides suffers from the non-isomorphism of the parse trees due to parsing errors and the difference of annotation criterion between the two languages. In this paper, we present a method that projects dependency parse trees from the language side that has a high quality parser, to the side that has a low quality parser, to improve the isomorphism of the parse trees. We first project a part of the dependencies with high confidence to make a partial parse tree, and then complement the remaining dependencies with partial parsing constrained by the already projected dependencies. MT experiments verify the effectiveness of our proposed method.

16th Annual Conference of the European Association for Machine Translation, EAMT 2012 | 2012