Yabin Zheng
Tsinghua University
Publications
Featured research published by Yabin Zheng.
Empirical Methods in Natural Language Processing | 2009
Zhiyuan Liu; Peng Li; Yabin Zheng; Maosong Sun
Keyphrases are widely used as a brief summary of documents. Since manual assignment is time-consuming, various unsupervised ranking methods based on importance scores have been proposed for keyphrase extraction. In practice, the keyphrases of a document should not only be statistically important in the document, but also give good coverage of its content. Based on this observation, we propose an unsupervised method for keyphrase extraction. First, the method finds exemplar terms by leveraging clustering techniques, which ensures that the exemplar terms semantically cover the document. Then the keyphrases are extracted from the document using the exemplar terms. Our method outperforms the state-of-the-art graph-based ranking method (TextRank) by 9.5% in F1-measure.
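As an illustration of the clustering-based exemplar idea described in this abstract, here is a minimal Python sketch: terms are clustered by their sentence-occurrence patterns and the term nearest each centroid is taken as an exemplar. The vectorizer, the choice of k-means, and the function name are illustrative assumptions, not the paper's exact pipeline; the final step of selecting phrases that contain an exemplar term is likewise left out.

```python
# Sketch only: clustering-based exemplar term selection (assumed setup,
# not the paper's exact method).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer


def exemplar_keyterms(sentences, n_clusters=3):
    # Represent each term by its per-sentence occurrence pattern,
    # a crude co-occurrence proxy for term relatedness.
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(sentences)              # sentences x terms
    terms = vec.get_feature_names_out()
    term_vectors = X.T.toarray().astype(float)    # terms x sentences

    # Cluster the terms; the term closest to each centroid becomes an
    # exemplar, so the exemplars jointly cover the document's topics.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(term_vectors)
    exemplars = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(term_vectors[members] - km.cluster_centers_[c], axis=1)
        exemplars.append(terms[members[np.argmin(dists)]])

    # Keyphrases would then be candidate phrases containing an exemplar term;
    # here we simply return the exemplars themselves.
    return exemplars
```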
ACM Transactions on Asian Language Information Processing | 2011
Zhiyuan Liu; Yabin Zheng; Lixing Xie; Maosong Sun; Liyun Ru; Yonghui Zhang
Nowadays, user behavior analysis and collaborative filtering have drawn a large body of research in the machine learning community, with the goal of either enhancing the user experience or discovering useful information hidden in the data. In this article, we conduct extensive experiments on a Chinese input method data set that records the word lists users have used. From this collaborative perspective, we aim to solve two natural language processing tasks: related word retrieval and new word detection. Motivated by the observation that two words are usually highly related if they co-occur frequently in users' records, we propose a novel semantic relatedness measure between words that takes both user behavior and collaborative filtering into consideration. We utilize this measure to perform the related word retrieval and new word detection tasks. Experimental results on both tasks indicate the applicability and effectiveness of our method.
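A minimal sketch of the co-occurrence intuition behind such a relatedness measure: words kept by similar sets of users receive a high cosine similarity over a binary user-word matrix. The binary weighting, the absence of smoothing, and the function name are assumptions; the paper's actual measure and its new-word-detection step differ.

```python
# Sketch only: cosine relatedness over a binary user x word matrix
# (assumed weighting, not the paper's exact measure).
import numpy as np


def related_words(user_word_lists, query, top_k=5):
    vocab = sorted({w for words in user_word_lists for w in words})
    index = {w: i for i, w in enumerate(vocab)}

    # Binary user x word matrix: 1 if the user's word list contains the word.
    M = np.zeros((len(user_word_lists), len(vocab)))
    for u, words in enumerate(user_word_lists):
        for w in words:
            M[u, index[w]] = 1.0

    # Cosine similarity between word columns: words used by similar
    # sets of users are treated as semantically related.
    col = M[:, index[query]]
    norms = np.linalg.norm(M, axis=0) * np.linalg.norm(col) + 1e-12
    sims = (M.T @ col) / norms
    ranked = [w for w in sorted(vocab, key=lambda w: -sims[index[w]]) if w != query]
    return ranked[:top_k]
```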
International Conference on Machine Learning and Cybernetics | 2011
Lixing Xie; Yabin Zheng; Zhiyuan Liu; Maosong Sun; Canhui Wang
This paper proposes an automatic scheme to extract Chinese abbreviations and their corresponding definitions from large-scale anchor texts. The method is motivated by the observation that the more frequently two anchor texts point to the same web page, the more related they are. Since abbreviation-definition pairs are highly related, they can be extracted from such related words. Our method involves three steps. First, we utilize external statistical features to extract candidate abbreviation-definition pairs from anchor texts. Second, we extract internal features from the candidate pairs and adopt Conditional Random Fields (CRFs) to compute a score for each candidate pair. Finally, we combine the external and internal features to generate the final pairs. Experimental results show that the method accurately extracts Chinese abbreviation-definition pairs from anchor texts and that combining external and internal features is effective for this task.
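A minimal sketch of the candidate-generation step only: anchor texts that frequently point to the same URL are paired, with the shorter string treated heuristically as the abbreviation candidate. The input format, thresholds, and the shorter-is-abbreviation heuristic are assumptions; the CRF-based internal scoring and the feature combination described in the abstract are not reproduced.

```python
# Sketch only: candidate abbreviation-definition pairs from anchor texts
# (assumed heuristics; CRF scoring omitted).
from collections import Counter, defaultdict
from itertools import combinations


def candidate_pairs(anchor_records, min_count=2):
    """anchor_records: iterable of (anchor_text, target_url) tuples."""
    anchors_per_url = defaultdict(set)
    for text, url in anchor_records:
        anchors_per_url[url].add(text.strip())

    # Count how often two distinct anchor texts co-refer to the same URL.
    pair_counts = Counter()
    for anchors in anchors_per_url.values():
        for a, b in combinations(sorted(anchors), 2):
            # Heuristic: treat the shorter string as the abbreviation candidate.
            short, long_ = (a, b) if len(a) < len(b) else (b, a)
            pair_counts[(short, long_)] += 1

    return [(pair, c) for pair, c in pair_counts.most_common() if c >= min_count]
```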
International Conference on Natural Computation | 2008
Yabin Zheng; Shaohua Teng; Zhiyuan Liu; Maosong Sun
Traditional text classification methods make a basic assumption: the training and test sets are homologous, i.e., drawn from the same distribution. This assumption may not hold in the real world, especially in the Web environment, where documents change over time and a pre-trained model may be out of date when applied to newly emerging documents. However, some information in the training set remains useful. In this paper we propose a novel method that uses transfer learning to discover the common knowledge shared by the training and test sets, and then builds a model on this knowledge to fit the distribution of the test set. The model is refined iteratively, in a self-training process, by adding the most confident instances from the unlabeled test set to the training set until convergence. Preliminary experiments show that our method achieves an improvement of approximately 8.92% over the standard supervised-learning method.
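A minimal sketch of the self-training loop described above, using logistic regression as a stand-in classifier and pseudo-labeling the most confident unlabeled test instances each round. The classifier choice, batch size, and function name are assumptions; the transfer-learning step that extracts the shared common knowledge is not reproduced.

```python
# Sketch only: self-training with confidence-based pseudo-labeling
# (assumed classifier; transfer-learning step omitted).
import numpy as np
from sklearn.linear_model import LogisticRegression


def self_train(X_train, y_train, X_test, rounds=5, per_round=50):
    X_lab, y_lab = np.asarray(X_train), np.asarray(y_train)
    X_unlab = np.asarray(X_test)
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        picked = np.argsort(-proba.max(axis=1))[:per_round]

        # Move the most confident pseudo-labeled test instances into training data.
        X_lab = np.vstack([X_lab, X_unlab[picked]])
        y_lab = np.concatenate([y_lab, clf.classes_[proba[picked].argmax(axis=1)]])
        X_unlab = np.delete(X_unlab, picked, axis=0)

        # Retrain on the enlarged labeled set.
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    return clf
```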
Asia Information Retrieval Symposium | 2009
Yabin Zheng; Zhiyuan Liu; Shaohua Teng; Maosong Sun
In this paper, we propose an efficient text classification method using term projection. First, we use a modified χ2 statistic to project terms into predefined categories, which is more efficient than other clustering methods. Afterwards, we utilize the generated clusters as features to represent the documents. The classification is then performed either in a rule-based manner or via SVM. Experimental results show that our modified χ2 feature selection method outperforms the traditional χ2 statistic, especially at lower dimensionalities, and that our method is more efficient than Latent Semantic Analysis (LSA) on a homogeneous data set. Meanwhile, we can reduce the feature dimensionality by three orders of magnitude, saving training and testing cost while maintaining comparable accuracy. Moreover, with a small training set we gain an improvement of approximately 4.3% on a heterogeneous data set compared to the traditional method, which indicates that our method has better generalization capability.
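A minimal sketch of term projection with the standard χ2 statistic (the paper's modified statistic is not reproduced): each term is assigned to the category where its χ2 score is highest, and documents are then represented by counts over these term clusters. The function names and the binary document-term representation are illustrative assumptions.

```python
# Sketch only: chi-square term projection and cluster features
# (standard chi2; the paper's modified statistic is an assumption gap).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer


def project_terms(docs, labels):
    """Assign every vocabulary term to the category with the highest chi2 score."""
    vec = CountVectorizer()
    X = (vec.fit_transform(docs) > 0).toarray().astype(float)  # docs x terms, binary
    labels = np.asarray(labels)
    categories = list(np.unique(labels))
    N = float(len(docs))

    best_cat = {}
    for j, term in enumerate(vec.get_feature_names_out()):
        scores = []
        for c in categories:
            in_c = labels == c
            A = X[in_c, j].sum()      # docs in c that contain the term
            B = X[~in_c, j].sum()     # docs outside c that contain the term
            C = in_c.sum() - A        # docs in c without the term
            D = (~in_c).sum() - B     # docs outside c without the term
            denom = (A + C) * (B + D) * (A + B) * (C + D)
            scores.append(N * (A * D - C * B) ** 2 / denom if denom else 0.0)
        best_cat[term] = categories[int(np.argmax(scores))]
    return vec, best_cat, categories


def cluster_features(docs, vec, best_cat, categories):
    """Represent each document by how many of its tokens fall into each term cluster."""
    X = vec.transform(docs).toarray()
    cat_index = {c: i for i, c in enumerate(categories)}
    F = np.zeros((len(docs), len(categories)))
    for j, term in enumerate(vec.get_feature_names_out()):
        F[:, cat_index[best_cat[term]]] += X[:, j]
    return F
```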
Empirical Methods in Natural Language Processing | 2010
Zhiyuan Liu; Wenyi Huang; Yabin Zheng; Maosong Sun
Conference on Computational Natural Language Learning | 2011
Zhiyuan Liu; Xinxiong Chen; Yabin Zheng; Maosong Sun
International Joint Conference on Artificial Intelligence | 2009
Yabin Zheng; Zhiyuan Liu; Maosong Sun; Liyun Ru; Yonghui Zhang
International Joint Conference on Artificial Intelligence | 2011
Yabin Zheng; Chen Li; Maosong Sun
Meeting of the Association for Computational Linguistics | 2011
Yabin Zheng; Lixing Xie; Zhiyuan Liu; Maosong Sun; Yang Zhang; Liyun Ru