Network

Latest external collaborations at the country level.

Hotspot

Research topics where Chengqing Zong is active.

Publication


Featured research published by Chengqing Zong.


IEEE Intelligent Systems | 2013

Feature Ensemble Plus Sample Selection: Domain Adaptation for Sentiment Classification

Rui Xia; Chengqing Zong; Xuelei Hu; Erik Cambria

Domain adaptation problems often arise in sentiment classification. Here, the feature ensemble plus sample selection (SS-FE) approach is proposed, which takes both labeling adaptation and instance adaptation into account. A feature ensemble (FE) model is first proposed to learn a new labeling function by reweighting features. A PCA-based sample selection (PCA-SS) method is then proposed as an aid to FE. Experimental results show that SS-FE achieves significant improvements over FE or PCA-SS alone, owing to its joint treatment of labeling adaptation and instance adaptation.
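
The abstract does not detail the PCA-SS selection criterion, so the following sketch is only an illustration under assumptions: source reviews are ranked by their distance to the target-domain centroid in a PCA subspace fitted on target-domain data, and the closest ones are kept. The function name and parameters are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_sample_selection(X_source, X_target, n_components=50, keep_ratio=0.5):
    """Rank source samples by distance to the target-domain centroid in a
    PCA subspace fitted on target-domain data, and keep the closest ones."""
    pca = PCA(n_components=n_components)
    Z_tgt = pca.fit_transform(X_target)        # learn the target-domain subspace
    Z_src = pca.transform(X_source)            # project source samples into it
    dist = np.linalg.norm(Z_src - Z_tgt.mean(axis=0), axis=1)
    n_keep = max(1, int(len(X_source) * keep_ratio))
    return np.argsort(dist)[:n_keep]           # indices of retained source samples
```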


International Joint Conference on Natural Language Processing | 2009

A Framework of Feature Selection Methods for Text Categorization

Shoushan Li; Rui Xia; Chengqing Zong; Chu-Ren Huang

In text categorization, feature selection (FS) is a strategy that aims to make text classifiers more efficient and accurate. However, when facing a new task, it is still difficult to quickly choose a suitable method from the many FS methods proposed in previous studies. In this paper, we propose a theoretical framework of FS methods based on two basic measurements: a frequency measurement and a ratio measurement. Six popular FS methods are then discussed in detail under this framework. Moreover, guided by this theoretical analysis, we propose a novel method called weighted frequency and odds (WFO) that combines the two measurements with trained weights. Experimental results on datasets from both topic-based and sentiment classification tasks show that the new method is robust across different tasks and numbers of selected features.
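
As an illustration of the framework rather than the exact WFO definition, the sketch below combines a frequency measurement P(t|c) with a ratio measurement log(P(t|c)/P(t|~c)) using a single weight lam; the function name, the smoothing constant, and the zeroing of terms that favour the other class are assumptions.

```python
import math

def combined_fs_score(p_t_pos, p_t_neg, lam=0.5, eps=1e-9):
    """Score term t by combining a frequency measurement P(t|c) with a
    ratio (odds-like) measurement log(P(t|c) / P(t|~c)), weighted by lam."""
    if p_t_pos <= p_t_neg:
        return 0.0                                       # term does not favour class c
    freq = p_t_pos                                       # frequency measurement
    ratio = math.log((p_t_pos + eps) / (p_t_neg + eps))  # ratio measurement
    return (freq ** lam) * (ratio ** (1.0 - lam))

# lam close to 1 emphasises the frequency term; lam close to 0 emphasises the ratio term.
```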


International Conference on Computational Linguistics | 2008

Domain Adaptation for Statistical Machine Translation with Domain Dictionary and Monolingual Corpora

Hua Wu; Haifeng Wang; Chengqing Zong

Statistical machine translation systems are usually trained on large amounts of bilingual and monolingual text. In this paper, we propose a method for domain adaptation in statistical machine translation for the case where no in-domain bilingual corpora exist. The method first uses out-of-domain corpora to train a baseline system and then uses in-domain translation dictionaries and in-domain monolingual corpora to improve in-domain performance. We propose an algorithm that combines these resources in a unified framework. Experimental results indicate that our method achieves absolute improvements of 8.16 and 3.36 BLEU points on Chinese-to-English and English-to-French translation, respectively, compared with baselines trained only on out-of-domain corpora.
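
The abstract does not spell out how the resources are combined. One standard ingredient of this kind of adaptation, shown below purely as a hedged illustration and not as the paper's algorithm, is to interpolate a language model trained on the in-domain monolingual corpus with the out-of-domain model; the function name and the weight `lam` are hypothetical.

```python
def interpolate_lm_prob(p_in, p_out, lam=0.7):
    """Linearly interpolate an in-domain language model probability (from a
    model trained on in-domain monolingual text) with an out-of-domain one."""
    return lam * p_in + (1.0 - lam) * p_out

# Example: the in-domain LM assigns 0.02 to a word in context, the
# out-of-domain LM assigns 0.005; the adapted probability is 0.0155.
print(interpolate_lm_prob(0.02, 0.005))
```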


Meeting of the Association for Computational Linguistics | 2014

Bilingually-constrained Phrase Embeddings for Machine Translation

Jiajun Zhang; Shujie Liu; Mu Li; Ming Zhou; Chengqing Zong

We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases) that can distinguish phrases with different meanings. The BRAE is trained to minimize the semantic distance between translation equivalents while maximizing the semantic distance between non-translation pairs. After training, the model can embed each phrase semantically in both languages and can transform the semantic embedding space of one language into that of the other. We evaluate the proposed method on two end-to-end SMT tasks (phrase-table pruning and decoding with phrasal semantic similarities), both of which need to measure the semantic similarity between a source phrase and its translation candidates. Extensive experiments show that the BRAE is remarkably effective in these two tasks.
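
The full BRAE objective also involves the recursive auto-encoders' reconstruction error, which is omitted here. The sketch below, with hypothetical function names, only illustrates the bilingual constraint as a max-margin term that pulls translation equivalents together and pushes non-translation pairs apart.

```python
import numpy as np

def semantic_distance(src_vec, tgt_vec):
    """Squared Euclidean distance between a source-phrase embedding and a
    (transformed) target-phrase embedding."""
    return float(np.sum((src_vec - tgt_vec) ** 2))

def bilingual_margin_loss(src_vec, tgt_vec, neg_tgt_vec, margin=1.0):
    """Hinge loss: a translation pair should be closer than a
    non-translation pair by at least `margin`."""
    pos = semantic_distance(src_vec, tgt_vec)       # translation equivalent
    neg = semantic_distance(src_vec, neg_tgt_vec)   # non-translation pair
    return max(0.0, margin + pos - neg)
```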


Meeting of the Association for Computational Linguistics | 2008

Multi-domain Sentiment Classification

Shoushan Li; Chengqing Zong

This paper addresses a new task in sentiment classification, called multi-domain sentiment classification, which aims to improve performance by fusing training data from multiple domains. To this end, we propose two fusion approaches, feature-level and classifier-level, that use training data from multiple domains simultaneously. Experimental studies show that multi-domain sentiment classification with the classifier-level approach performs much better than single-domain classification (which uses the training data of each domain individually).
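
The abstract does not give the exact fusion rule, so the sketch below is only one plausible form of classifier-level fusion: each per-domain classifier emits a probability for the positive class and the results are combined with weights. The function name and the uniform weighting are assumptions.

```python
import numpy as np

def classifier_level_fusion(domain_probs, weights=None):
    """Fuse the positive-class probabilities predicted by the per-domain
    classifiers for a single test review."""
    probs = np.asarray(domain_probs, dtype=float)
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))   # uniform weights
    fused = float(np.dot(weights, probs))
    return "positive" if fused >= 0.5 else "negative"

# Example: three domain classifiers output 0.9, 0.4 and 0.7 for "positive".
print(classifier_level_fusion([0.9, 0.4, 0.7]))   # -> "positive"
```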


IEEE Transactions on Knowledge and Data Engineering | 2015

Dual Sentiment Analysis: Considering Two Sides of One Review

Rui Xia; Feng Xu; Chengqing Zong; Qianmu Li; Yong Qi; Tao Li

The bag-of-words (BOW) model is the most popular way to represent text in statistical machine learning approaches to sentiment analysis. However, the performance of BOW is sometimes limited by fundamental deficiencies in handling the polarity-shift problem. We propose a model called dual sentiment analysis (DSA) to address this problem in sentiment classification. We first propose a novel data-expansion technique that creates a sentiment-reversed review for each training and test review. On this basis, we propose a dual training algorithm that uses original and reversed training reviews in pairs to learn a sentiment classifier, and a dual prediction algorithm that classifies test reviews by considering both sides of a review. We also extend the DSA framework from polarity (positive-negative) classification to 3-class (positive-negative-neutral) classification by taking neutral reviews into consideration. Finally, we develop a corpus-based method to construct a pseudo-antonym dictionary, which removes DSA's dependency on an external antonym dictionary for review reversion. We conduct a wide range of experiments spanning two tasks, nine datasets, two antonym dictionaries, three classification algorithms, and two types of features. The results demonstrate the effectiveness of DSA in supervised sentiment classification.
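
A minimal sketch of the data-expansion step follows. A toy antonym dictionary and pre-tokenised input stand in for the paper's corpus-based pseudo-antonym dictionary and its more careful handling of negation; both are assumptions for illustration only.

```python
# Toy antonym dictionary; the paper builds a pseudo-antonym dictionary from a corpus.
ANTONYMS = {"good": "bad", "bad": "good", "great": "terrible",
            "terrible": "great", "love": "hate", "hate": "love"}

def reverse_review(tokens, label):
    """Create the sentiment-reversed counterpart of a review and flip its label."""
    reversed_tokens = [ANTONYMS.get(t.lower(), t) for t in tokens]
    reversed_label = "negative" if label == "positive" else "positive"
    return reversed_tokens, reversed_label

# The original and reversed reviews are then used in pairs for dual training.
print(reverse_review(["the", "acting", "is", "great"], "positive"))
# (['the', 'acting', 'is', 'terrible'], 'negative')
```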


Empirical Methods in Natural Language Processing | 2016

Exploiting Source-side Monolingual Data in Neural Machine Translation

Jiajun Zhang; Chengqing Zong

Neural Machine Translation (NMT) based on the encoder-decoder architecture has recently become a new paradigm. Researchers have shown that target-side monolingual data can greatly enhance the decoder model of NMT. However, source-side monolingual data has not been fully explored, although it should be useful for strengthening the encoder model of NMT, especially when the parallel corpus is far from sufficient. In this paper, we propose two approaches to make full use of source-side monolingual data in NMT. The first employs a self-learning algorithm to generate large-scale synthetic parallel data for NMT training. The second applies a multi-task learning framework that uses two NMT models to predict the translation and the reordered source-side monolingual sentences simultaneously. Extensive experiments demonstrate that the proposed methods obtain significant improvements over a strong attention-based NMT baseline.
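
A minimal sketch of the self-learning idea: forward-translate source-side monolingual sentences with the baseline model and retrain on the union of real and synthetic pairs. `train_nmt` and `translate` are hypothetical callables, not an actual toolkit API, and the single-pass loop is a simplification.

```python
def self_learning(parallel_data, source_monolingual, train_nmt, translate):
    """parallel_data: list of (source, target) pairs; source_monolingual: list of sources."""
    baseline = train_nmt(parallel_data)                   # train on real bitext
    synthetic = [(src, translate(baseline, src))          # model-produced targets
                 for src in source_monolingual]
    return train_nmt(parallel_data + synthetic)           # retrain on the union
```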


IEEE Intelligent Systems | 2015

Deep Neural Networks in Machine Translation: An Overview

Jiajun Zhang; Chengqing Zong

Deep neural networks (DNNs) are widely used in machine translation (MT). This article gives an overview of DNN applications in various aspects of MT.


ACM Transactions on Asian Language Information Processing | 2008

A Structure-Based Model for Chinese Organization Name Translation

Yufeng Chen; Chengqing Zong

Named entity (NE) translation is a fundamental task in multilingual natural language processing. The performance of a machine translation system depends heavily on the precise translation of the NEs it contains, and among all NE types the organization name (ON) is the most complex to translate. In this article, the structure of ONs is investigated and a hierarchical, structure-based ON translation model for a Chinese-to-English translation system is presented. First, the model performs ON chunking; then both the translation of words within chunks and the reordering of chunks are handled by a synchronous context-free grammar (CFG). The CFG rules are extracted from bilingual ON pairs by a training procedure. The main contributions of this article are: (1) defining appropriate chunk units for analyzing the internal structure of Chinese ONs; (2) making chunk-based ON translation feasible and flexible via a hierarchical CFG derivation; and (3) proposing a training architecture that automatically learns the synchronous CFG for constructing ONs from chunk units in aligned bilingual ON pairs. Experiments show that the proposed approach translates Chinese ONs into English with an accuracy of 93.75% and significantly improves the performance of a baseline statistical machine translation (SMT) system.
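
A toy illustration of chunk-level translation with a single synchronous rule follows. The chunk labels, the rule, and the tiny lexicon are assumptions made for illustration, not the grammar learned in the paper.

```python
# Toy lexicon for chunk translation (illustrative only).
LEXICON = {"中国科学院": "Chinese Academy of Sciences",
           "自动化研究所": "Institute of Automation"}

# One synchronous rule: a Chinese ON of the form [PARENT][CHILD] is rendered
# in English as CHILD, PARENT.
RULES = {("PARENT", "CHILD"): (1, 0)}

def translate_on(chunks):
    """chunks: list of (label, chinese_text) pairs."""
    labels = tuple(label for label, _ in chunks)
    order = RULES.get(labels, tuple(range(len(chunks))))   # default: keep order
    return ", ".join(LEXICON.get(chunks[i][1], chunks[i][1]) for i in order)

print(translate_on([("PARENT", "中国科学院"), ("CHILD", "自动化研究所")]))
# -> "Institute of Automation, Chinese Academy of Sciences"
```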


International Conference on Computational Linguistics | 2003

Chinese Utterance Segmentation in Spoken Language Translation

Chengqing Zong; Fuji Ren

This paper presents an approach to segmenting Chinese utterances for a spoken language translation (SLT) system in which Chinese speech is the source input. The approach is proposed as a supplement to sentence boundary detection in speech recognition: it identifies the boundaries of simple sentences and fixed expressions within the speech recognition results of a Chinese input utterance. Plausible boundaries of split units are determined using several methods, including keyword detection, pattern matching, and syntactic analysis. Preliminary experimental results show that the approach helps improve the performance of SLT systems.
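
A minimal sketch of the keyword-detection step is shown below. The keyword list is a toy assumption, and the paper additionally validates candidate boundaries with pattern matching and syntactic analysis, which are omitted here.

```python
import re

# Toy list of split keywords (discourse connectives); illustrative only.
SPLIT_KEYWORDS = ["然后", "但是", "所以", "请问"]

def segment_utterance(text):
    """Split a recognised Chinese utterance at toy keyword boundaries."""
    pattern = "(" + "|".join(map(re.escape, SPLIT_KEYWORDS)) + ")"
    parts = re.split(pattern, text)
    segments, current = [], ""
    for piece in parts:
        if piece in SPLIT_KEYWORDS:        # keyword starts a new segment
            if current:
                segments.append(current)
            current = piece
        else:
            current += piece
    if current:
        segments.append(current)
    return segments

print(segment_utterance("我想订一张明天去上海的机票然后请帮我查一下天气"))
# -> ['我想订一张明天去上海的机票', '然后请帮我查一下天气']
```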

Collaboration


An overview of Chengqing Zong's collaborations.

Top Co-Authors

Jiajun Zhang, Chinese Academy of Sciences
Yu Zhou, Chinese Academy of Sciences
Bo Xu, Chinese Academy of Sciences
Keh-Yih Su, National Tsing Hua University
Feifei Zhai, Chinese Academy of Sciences
Haoran Li, Chinese Academy of Sciences
Rui Xia, Nanjing University of Science and Technology
Kun Wang, Chinese Academy of Sciences
Shaonan Wang, Chinese Academy of Sciences
Haitong Yang, Chinese Academy of Sciences