Tong Xiao | Researchain

Publication

Featured research published by Tong Xiao.

ACM Transactions on Asian Language Information Processing | 2011

Language Modeling for Syntax-Based Machine Translation Using Tree Substitution Grammars: A Case Study on Chinese-English Translation

Tong Xiao; Jingbo Zhu; Muhua Zhu

The poor grammaticality of Machine Translation (MT) output motivates syntax-based approaches to language modeling. However, previous studies showed that syntax-based language modeling using (Context-Free) Treebank Grammars was not very helpful in improving BLEU scores for Chinese-English machine translation. In this article we further study this issue in the context of Chinese-English syntax-based Statistical Machine Translation (SMT), where Synchronous Tree Substitution Grammars (STSGs) are utilized to model the translation process. In particular, we develop a Tree Substitution Grammar-based language model for syntax-based MT, and present three methods to efficiently integrate the proposed language model into MT decoding. In addition, we design a simple and effective method to adapt syntax-based language models for MT tasks. We demonstrate that the proposed methods are able to benefit a state-of-the-art syntax-based MT system. On the NIST Chinese-English MT evaluation corpora, we finally achieve an improvement of 0.6 BLEU points over the baseline.

Meeting of the Association for Computational Linguistics | 2014

A Hybrid Approach to Skeleton-based Translation

Tong Xiao; Jingbo Zhu; Chunliang Zhang

In this paper we explicitly consider sentence skeleton information for Machine Translation (MT). The basic idea is that we translate the key elements of the input sentence using a skeleton translation model, and then cover the remaining segments using a full translation model. We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data.

Meeting of the Association for Computational Linguistics | 2015

NiuParser: A Chinese Syntactic and Semantic Parsing Toolkit

Jingbo Zhu; Muhua Zhu; Qiang Wang; Tong Xiao

We present a new toolkit, NiuParser, for Chinese syntactic and semantic analysis. It can handle a wide range of Natural Language Processing (NLP) tasks in Chinese, including word segmentation, part-of-speech tagging, named entity recognition, chunking, constituent parsing, dependency parsing, and semantic role labeling. The NiuParser system runs fast and shows state-of-the-art performance on several benchmarks. Moreover, it is very easy to use for both research and industrial purposes. Advanced features include the Software Development Kit (SDK) interfaces and a multi-thread implementation for system speed-up.

Natural Language Engineering | 2015

Improving syntactic rule extraction through deleting spurious links with translation span alignment

Jingbo Zhu; Qiang Li; Tong Xiao

Most statistical machine translation systems rely on word alignments to extract translation rules. This approach suffers from a practical problem: even one spurious word alignment link can prevent some desirable translation rules from being extracted. To address this issue, this paper presents two approaches, referred to as sub-tree alignment and phrase-based forced decoding methods, to automatically learn translation span alignments from parallel data. Then, we improve the translation rule extraction by deleting spurious links and inserting new links based on bilingual translation span correspondences. Comparison experiments are designed to demonstrate the effectiveness of the proposed approaches.

International Conference on the Computer Processing of Oriental Languages | 2016

Better Addressing Word Deletion for Statistical Machine Translation

Qiang Li; Dongdong Zhang; Mu Li; Tong Xiao; Jingbo Zhu

Word deletion (WD) problems have a critical impact on the adequacy of translation and can lead to poor comprehension of lexical meaning in the translation result. This paper studies how the word deletion problem can be handled in statistical machine translation (SMT) in detail. We classify this problem into desired and undesired word deletion based on spurious and meaningful words. Consequently, we propose four effective models to handle undesired word deletion. To evaluate word deletion problems, we develop an automatic evaluation metric that highly correlates with human judgement. Translation systems are simultaneously tuned for the proposed evaluation metric and BLEU using minimum error rate training (MERT). The experimental results demonstrate that our methods achieve significant improvements in word deletion problems on Chinese-to-English translation tasks.

International Conference Natural Language Processing | 2018

Source Segment Encoding for Neural Machine Translation

Qiang Wang; Tong Xiao; Jingbo Zhu

Sequential word encoding lacks explicit representations of structural dependencies (e.g. tree, segment) over the source words in neural machine translation. Instead of using source syntax, in this paper we propose a source segment encoding (SSE) approach that models source segments in the encoding process using two methods. One is to encode off-the-shelf n-grams of the source sentence into the original source memory. The other is to jointly learn an optimal segmentation model with the translation model in an end-to-end manner without any supervision of segmentation. Experimental results show that the SSE method yields an improvement of 2.1+ BLEU points over the baselines on the Chinese-English translation task.

CCL | 2017

Context Sensitive Word Deletion Model for Statistical Machine Translation

Qiang Li; Yaqian Han; Tong Xiao; Jingbo Zhu

Word deletion (WD) errors can lead to poor comprehension of the meaning of source translated sentences in phrase-based statistical machine translation (SMT), and have a critical impact on the adequacy of the translation results generated by SMT systems. In this paper, we first classify word deletions into two categories, wanted and unwanted word deletions. For these two kinds of word deletions, we propose a maximum entropy based word deletion model to improve the translation quality in phrase-based SMT. Our proposed model is based on features automatically learned from a real-world bitext. In our experiments on Chinese-to-English news and web translation tasks, the results show that our approach is capable of generating more adequate translations compared with the baseline system, and our proposed word deletion model yields a +0.99 BLEU improvement and a (-2.20) TER reduction on the NIST machine translation evaluation corpora.

International Journal of Computer Processing of Languages | 2008

An Effective Approach for Coreference Resolution

Feiliang Ren; Jingbo Zhu; Huizhen Wang; Tong Xiao

We present a machine learning approach for coreference resolution of noun phrases. In our method, we use CRFs as the basic training model, and use an active learning method to generate combined features so as to use existing features more effectively. We also propose a novel clustering algorithm which uses both linguistic knowledge and statistical knowledge. We build a coreference resolution system based on the proposed method and evaluate its performance from three aspects: the contributions of active learning; the effects of different clustering algorithms; and the resolution performance of different kinds of NPs. Experimental results show that additional performance gain can be obtained by using the active learning method; that the clustering algorithm has a great effect on coreference resolution performance and our clustering algorithm is very effective; and that the key to coreference resolution is to improve the performance of normal noun resolution, especially pronoun resolution.

Meeting of the Association for Computational Linguistics | 2012