Publication


Featured research published by Ruiqiang Zhang.


North American Chapter of the Association for Computational Linguistics | 2006

Subword-based Tagging by Conditional Random Fields for Chinese Word Segmentation

Ruiqiang Zhang; Genichiro Kikui; Eiichiro Sumita

We proposed two approaches to improve Chinese word segmentation: subword-based tagging and a confidence measure approach. We found that the former performed better than the existing character-based tagging, and that the latter improved segmentation further by combining the former with dictionary-based segmentation. In addition, the latter can be used to balance out-of-vocabulary rates and in-vocabulary rates. With these techniques we achieved higher F-scores on the CITYU, PKU and MSR corpora than the best results from Sighan Bakeoff 2005.


International Conference on Computational Linguistics | 2004

A unified approach in speech-to-speech translation: integrating features of speech recognition and machine translation

Ruiqiang Zhang; Genichiro Kikui; Hirofumi Yamamoto; Taro Watanabe; Frank K. Soong; Wai Kit Lo

In this study, based on a statistically trained speech translation system, we combine distinctive features derived from its two modules, speech recognition and statistical machine translation, in a log-linear model. The translation hypotheses are then rescored and translation performance is improved. The feature weights of the log-linear model were tuned to optimize standard translation evaluation metrics, including BLEU, NIST, multiple-reference word error rate and its position-independent counterpart. The experimental results show significant improvement over the baseline IBM Model 4 in all automatic translation evaluation metrics. The largest improvement was in BLEU, at 7.9% absolute.
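
The rescoring step described in this abstract can be illustrated with a minimal sketch. It assumes each hypothesis carries log-domain feature values from both modules; the feature names (`asr_acoustic`, `asr_lm`, `tm`, `target_lm`), the weights, and the toy N-best list are hypothetical and are not taken from the paper, whose actual feature set and tuning procedure are not given here.

```python
from typing import Dict, List, Tuple


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of log-domain feature values for one hypothesis."""
    return sum(weights[name] * value for name, value in features.items())


def rescore(
    nbest: List[Tuple[str, Dict[str, float]]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Score every hypothesis and return them sorted from best to worst."""
    scored = [(hyp, loglinear_score(feats, weights)) for hyp, feats in nbest]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy N-best list; all names and numbers are illustrative assumptions.
nbest = [
    ("please show me the menu",
     {"asr_acoustic": -12.0, "asr_lm": -8.0, "tm": -6.0, "target_lm": -9.0}),
    ("please show me a menu",
     {"asr_acoustic": -11.0, "asr_lm": -9.0, "tm": -7.5, "target_lm": -8.0}),
]
weights = {"asr_acoustic": 0.2, "asr_lm": 0.2, "tm": 1.0, "target_lm": 0.5}

ranked = rescore(nbest, weights)
print(ranked[0][0])
```

In a real system the weights would be tuned on held-out data against a metric such as BLEU rather than set by hand as they are here.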


Workshop on Statistical Machine Translation | 2008

Improved Statistical Machine Translation by Multiple Chinese Word Segmentation

Ruiqiang Zhang; Keiji Yasuda; Eiichiro Sumita

Chinese word segmentation (CWS) is a necessary step in Chinese-English statistical machine translation (SMT) and its performance has an impact on the results of SMT. However, there are many settings involved in creating a CWS system, such as various specifications and CWS methods. This paper investigates the effect of these settings on SMT. We tested dictionary-based and CRF-based approaches and found there was no significant difference between the two in the quality of the resulting translations. We also found the correlation between the CWS F-score and SMT BLEU score was very weak. This paper also proposes two methods of combining the advantages of different specifications: a simple concatenation of training data and a feature interpolation approach in which the same types of features of translation models from various CWS schemes are linearly interpolated. We found these approaches were very effective in improving the quality of translations.


Meeting of the Association for Computational Linguistics | 2006

Subword-Based Tagging for Confidence-Dependent Chinese Word Segmentation

Ruiqiang Zhang; Genichiro Kikui; Eiichiro Sumita

We proposed subword-based tagging for Chinese word segmentation to improve on the existing character-based tagging. The subword-based tagging was implemented using the maximum entropy (MaxEnt) and the conditional random fields (CRF) methods. We found that the proposed subword-based tagging outperformed the character-based tagging in all comparative experiments. In addition, we proposed a confidence measure approach to combine the results of a dictionary-based and a subword-tagging-based segmentation. This approach can produce an ideal tradeoff between the in-vocabulary rate and out-of-vocabulary rate. Our techniques were evaluated using the test data from Sighan Bakeoff 2005. We achieved higher F-scores than the best results in three of the four corpora: PKU (0.951), CITYU (0.950) and MSR (0.971).
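
For readers unfamiliar with how segmentation F-scores such as those above are computed, the following is a minimal sketch of the standard word-level precision, recall and F-score. It assumes a predicted word counts as correct only when its character span matches a gold word exactly; the function names and the toy sentence are illustrative and are not the official bakeoff scoring script.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmented sentence to the set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return word-level precision, recall and F-score for one sentence."""
    gold_spans = word_spans(gold)
    pred_spans = word_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Toy example: the prediction splits the last gold word in two.
gold = ["我们", "喜欢", "机器翻译"]
pred = ["我们", "喜欢", "机器", "翻译"]
precision, recall, f_score = segmentation_prf(gold, pred)
print(round(precision, 3), round(recall, 3), round(f_score, 3))
```

Corpus-level scores are normally obtained by summing the correct, predicted and gold word counts over all sentences before dividing, rather than by averaging per-sentence values.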


ACM Transactions on Speech and Language Processing | 2008

Chinese word segmentation and statistical machine translation

Ruiqiang Zhang; Keiji Yasuda; Eiichiro Sumita

Chinese word segmentation (CWS) is a necessary step in Chinese-English statistical machine translation (SMT) and its performance has an impact on the results of SMT. However, there are many choices involved in creating a CWS system, such as various specifications and CWS methods. Each set of choices creates a new CWS scheme, but whether it will produce a superior or inferior translation has remained unknown to date. This article examines the relationship between CWS and SMT. The effects of CWS on SMT were investigated using different specifications and CWS methods. Four specifications were selected for investigation: Beijing University (PKU), Hong Kong City University (CITYU), Microsoft Research (MSR), and Academia Sinica (AS). We created 16 CWS schemes under different settings to examine the relationship between CWS and SMT. Our experimental results showed that the MSR specification produced the lowest-quality translations. In examining the effects of CWS methods, we tested dictionary-based and CRF-based approaches and found there was no significant difference between the two in the quality of the resulting translations. We also found the correlation between the CWS F-score and SMT BLEU score was very weak. We analyzed CWS errors and their effect on SMT by evaluating systems trained with and without these errors. This article also proposes two methods for combining the advantages of different specifications: a simple concatenation of training data and a feature interpolation approach in which the same types of features of translation models from various CWS schemes are linearly interpolated. We found these approaches were very effective in improving the quality of translations.
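
The feature interpolation idea mentioned at the end of this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: each translation model is reduced to a dictionary of phrase translation probabilities, a phrase pair missing from a model contributes zero, and the weights are fixed by hand. The `interpolate` function, the toy tables and the weights are hypothetical; the abstract does not give the authors' exact formulation.

```python
from typing import Dict, List, Tuple

# (source phrase, target phrase) -> translation probability
PhraseTable = Dict[Tuple[str, str], float]


def interpolate(tables: List[PhraseTable], weights: List[float]) -> PhraseTable:
    """Linearly interpolate the same feature across several phrase tables."""
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    merged: PhraseTable = {}
    for table, weight in zip(tables, weights):
        for pair, prob in table.items():
            # Pairs absent from a table implicitly contribute 0 for that table.
            merged[pair] = merged.get(pair, 0.0) + weight * prob
    return merged


# Toy tables from two hypothetical CWS schemes that segment differently.
scheme_a = {("机器 翻译", "machine translation"): 0.8, ("翻译", "translation"): 0.6}
scheme_b = {("机器翻译", "machine translation"): 0.9, ("翻译", "translation"): 0.4}

merged = interpolate([scheme_a, scheme_b], [0.5, 0.5])
print(len(merged))
```

Because the schemes segment the source differently, the merged table keeps phrase pairs from both, which is one plausible way the combination could cover more source-side segmentations than any single scheme.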


Meeting of the Association for Computational Linguistics | 2007

Boosting Statistical Machine Translation by Lemmatization and Linear Interpolation

Ruiqiang Zhang; Eiichiro Sumita

Data sparseness is one of the factors that degrade statistical machine translation (SMT). Existing work has shown that using morpho-syntactic information is an effective solution to data sparseness. However, less effort has been devoted to Chinese-to-English SMT using English morpho-syntactic analysis. We found that, although English is a language with little inflection, using English lemmas in training can significantly improve the quality of word alignment, which in turn yields better translation performance. We carried out comprehensive experiments on multiple training sets of varied sizes to verify this. We also proposed a new, effective linear interpolation method to integrate multiple homologous features of translation models.


International Conference on Acoustics, Speech, and Signal Processing | 2000

Integrating detailed information into a language model

Ruiqiang Zhang; Ezra Black; Andrew M. Finch; Yoshinori Sagisaka

Applying natural language processing techniques to language modeling is a key problem in speech recognition. This paper describes a maximum entropy-based approach to language modeling in which words, together with syntactic and semantic tags in the long history, are used as a basis for complex linguistic questions. These questions are integrated with a standard trigram language model, or with a standard trigram language model combined with long-history word triggers, and the resulting language model is used to rescore the N-best hypotheses output by the ATRSPREC speech recognition system. The technique removed 24% of the correctable error of the recognition system.


Speech Communication | 2006

Integration of Speech Recognition and Machine Translation: Speech Recognition Word Lattice Translation

Ruiqiang Zhang; Genichiro Kikui

An important issue in speech translation is to minimize the negative effect of speech recognition errors on machine translation. We propose a novel statistical machine translation decoding algorithm for speech translation to improve speech translation quality. The algorithm can translate the speech recognition word lattice, where more hypotheses are utilized to bypass the misrecognized single-best hypothesis. The decoding involves converting the recognition word lattice to a translation word graph by a graph-based search, followed by a fine rescoring by an A* search. We show that a speech recognition confidence measure implemented by posterior probability is effective in improving speech translation. The proposed techniques were tested in a Japanese-to-English speech translation task, in which we measured the translation results in terms of a number of automatic evaluation metrics. The experimental results demonstrate a consistent and significant improvement in speech translation achieved by the proposed techniques.


IWSLT | 2006

The NiCT-ATR statistical machine translation system for the IWSLT 2006 evaluation

Ruiqiang Zhang; Hirofumi Yamamoto; Michael Paul; Hideo Okuma; Keiji Yasuda; Yves Lepage; Etienne Denoual; Daichi Mochihashi; Andrew M. Finch; Eiichiro Sumita


International Joint Conference on Natural Language Processing | 2008

Method of Selecting Training Data to Build a Compact and Efficient Translation Model

Keiji Yasuda; Ruiqiang Zhang; Hirofumi Yamamoto; Eiichiro Sumita

Collaboration


Dive into Ruiqiang Zhang's collaborations.

Top Co-Authors

Eiichiro Sumita
National Institute of Information and Communications Technology

Genichiro Kikui
Okayama Prefectural University

Hirofumi Yamamoto
National Institute of Information and Communications Technology

Keiji Yasuda
National Institute of Information and Communications Technology

Andrew M. Finch
National Institute of Information and Communications Technology

Etienne Denoual
National Institute of Information and Communications Technology

Hideo Okuma
National Institute of Information and Communications Technology