Yabin Zheng
Tsinghua University
Publications
Featured research published by Yabin Zheng.
Empirical Methods in Natural Language Processing | 2009
Zhiyuan Liu; Peng Li; Yabin Zheng; Maosong Sun
Keyphrases are widely used as a brief summary of documents. Since manual assignment is time-consuming, various unsupervised ranking methods based on importance scores have been proposed for keyphrase extraction. In practice, the keyphrases of a document should not only be statistically important in the document, but also give good coverage of its content. Based on this observation, we propose an unsupervised method for keyphrase extraction. First, the method finds exemplar terms by leveraging clustering techniques, which ensures that the exemplar terms semantically cover the document. Then the keyphrases are extracted from the document using the exemplar terms. Our method outperforms the state-of-the-art graph-based ranking method (TextRank) by 9.5% in F1-measure.
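As an illustration of the clustering-based exemplar idea described in this abstract, here is a minimal Python sketch: terms are clustered by their sentence-occurrence patterns and the term nearest each centroid is taken as an exemplar. The vectorizer, the choice of k-means, and the function name are illustrative assumptions, not the paper's exact pipeline; the final step of selecting phrases that contain an exemplar term is likewise left out.

```python
# Sketch only: clustering-based exemplar term selection (assumed setup,
# not the paper's exact method).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer


def exemplar_keyterms(sentences, n_clusters=3):
    # Represent each term by its per-sentence occurrence pattern,
    # a crude co-occurrence proxy for term relatedness.
    vec = CountVectorizer(stop_words="english")
    X = vec.fit_transform(sentences)              # sentences x terms
    terms = vec.get_feature_names_out()
    term_vectors = X.T.toarray().astype(float)    # terms x sentences

    # Cluster the terms; the term closest to each centroid becomes an
    # exemplar, so the exemplars jointly cover the document's topics.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(term_vectors)
    exemplars = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(term_vectors[members] - km.cluster_centers_[c], axis=1)
        exemplars.append(terms[members[np.argmin(dists)]])

    # Keyphrases would then be candidate phrases containing an exemplar term;
    # here we simply return the exemplars themselves.
    return exemplars
```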
ACM Transactions on Asian Language Information Processing | 2011
Zhiyuan Liu; Yabin Zheng; Lixing Xie; Maosong Sun; Liyun Ru; Yonghui Zhang
Nowadays, user behavior analysis and collaborative filtering have drawn a large body of research in the machine learning community, with the goal of either enhancing the user experience or discovering useful information hidden in the data. In this article, we conduct extensive experiments on a Chinese input method data set that records the word lists users have used. From this collaborative perspective, we aim to solve two natural language processing tasks: related word retrieval and new word detection. Motivated by the observation that two words are usually highly related if they co-occur frequently in users' records, we propose a novel semantic relatedness measure between words that takes both user behavior and collaborative filtering into consideration. We utilize this measure to perform the related word retrieval and new word detection tasks. Experimental results on both tasks indicate the applicability and effectiveness of our method.
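A minimal sketch of the co-occurrence intuition behind such a relatedness measure: words kept by similar sets of users receive a high cosine similarity over a binary user-word matrix. The binary weighting, the absence of smoothing, and the function name are assumptions; the paper's actual measure and its new-word-detection step differ.

```python
# Sketch only: cosine relatedness over a binary user x word matrix
# (assumed weighting, not the paper's exact measure).
import numpy as np


def related_words(user_word_lists, query, top_k=5):
    vocab = sorted({w for words in user_word_lists for w in words})
    index = {w: i for i, w in enumerate(vocab)}

    # Binary user x word matrix: 1 if the user's word list contains the word.
    M = np.zeros((len(user_word_lists), len(vocab)))
    for u, words in enumerate(user_word_lists):
        for w in words:
            M[u, index[w]] = 1.0

    # Cosine similarity between word columns: words used by similar
    # sets of users are treated as semantically related.
    col = M[:, index[query]]
    norms = np.linalg.norm(M, axis=0) * np.linalg.norm(col) + 1e-12
    sims = (M.T @ col) / norms
    ranked = [w for w in sorted(vocab, key=lambda w: -sims[index[w]]) if w != query]
    return ranked[:top_k]
```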
International Conference on Machine Learning and Cybernetics | 2011
Lixing Xie; Yabin Zheng; Zhiyuan Liu; Maosong Sun; Canhui Wang
This paper proposes an automatic scheme to extract Chinese abbreviations and their corresponding definitions from large-scale anchor texts. The method is motivated by the observation that the more frequently two anchor texts point to the same web page, the more related they are. Since abbreviation-definition pairs are highly related, they can be extracted from such related words. Our method involves three steps. First, we utilize external statistical features to extract candidate abbreviation-definition pairs from anchor texts. Second, we extract internal features from the candidate pairs and adopt Conditional Random Fields (CRFs) to compute a score for each candidate pair. Finally, we combine the external and internal features to generate the final pairs. Experimental results show that the method accurately extracts Chinese abbreviation-definition pairs from anchor texts and that combining external and internal features is effective for this task.
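A minimal sketch of the candidate-generation step only: anchor texts that frequently point to the same URL are paired, with the shorter string treated heuristically as the abbreviation candidate. The input format, thresholds, and the shorter-is-abbreviation heuristic are assumptions; the CRF-based internal scoring and the feature combination described in the abstract are not reproduced.

```python
# Sketch only: candidate abbreviation-definition pairs from anchor texts
# (assumed heuristics; CRF scoring omitted).
from collections import Counter, defaultdict
from itertools import combinations


def candidate_pairs(anchor_records, min_count=2):
    """anchor_records: iterable of (anchor_text, target_url) tuples."""
    anchors_per_url = defaultdict(set)
    for text, url in anchor_records:
        anchors_per_url[url].add(text.strip())

    # Count how often two distinct anchor texts co-refer to the same URL.
    pair_counts = Counter()
    for anchors in anchors_per_url.values():
        for a, b in combinations(sorted(anchors), 2):
            # Heuristic: treat the shorter string as the abbreviation candidate.
            short, long_ = (a, b) if len(a) < len(b) else (b, a)
            pair_counts[(short, long_)] += 1

    return [(pair, c) for pair, c in pair_counts.most_common() if c >= min_count]
```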
International Conference on Natural Computation | 2008
Yabin Zheng; Shaohua Teng; Zhiyuan Liu; Maosong Sun
Traditional text classification methods make a basic assumption: the training and test sets are homologous, i.e., drawn from the same distribution. This assumption may not hold in the real world, especially in the Web environment, where documents change over time and a pre-trained model may be out of date when applied to newly emerging documents. However, some information in the training set remains useful. In this paper we propose a novel method that uses transfer learning to discover the common knowledge shared by the training and test sets, and then builds a model on this knowledge to fit the distribution of the test set. The model is refined iteratively, in a self-training process, by adding the most confident instances from the unlabeled test set to the training set until convergence. Preliminary experiments show that our method achieves an improvement of approximately 8.92% over the standard supervised-learning method.
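A minimal sketch of the self-training loop described above, using logistic regression as a stand-in classifier and pseudo-labeling the most confident unlabeled test instances each round. The classifier choice, batch size, and function name are assumptions; the transfer-learning step that extracts the shared common knowledge is not reproduced.

```python
# Sketch only: self-training with confidence-based pseudo-labeling
# (assumed classifier; transfer-learning step omitted).
import numpy as np
from sklearn.linear_model import LogisticRegression


def self_train(X_train, y_train, X_test, rounds=5, per_round=50):
    X_lab, y_lab = np.asarray(X_train), np.asarray(y_train)
    X_unlab = np.asarray(X_test)
    clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)

    for _ in range(rounds):
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        picked = np.argsort(-proba.max(axis=1))[:per_round]

        # Move the most confident pseudo-labeled test instances into training data.
        X_lab = np.vstack([X_lab, X_unlab[picked]])
        y_lab = np.concatenate([y_lab, clf.classes_[proba[picked].argmax(axis=1)]])
        X_unlab = np.delete(X_unlab, picked, axis=0)

        # Retrain on the enlarged labeled set.
        clf = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    return clf
```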
Asia Information Retrieval Symposium | 2009
Yabin Zheng; Zhiyuan Liu; Shaohua Teng; Maosong Sun
In this paper, we propose an efficient text classification method using term projection. First, we use a modified χ2 statistic to project terms into predefined categories, which is more efficient than other clustering methods. Afterwards, we utilize the generated clusters as features to represent the documents. The classification is then performed either in a rule-based manner or via SVM. Experimental results show that our modified χ2 feature selection method outperforms the traditional χ2 statistic, especially at lower dimensionalities, and that our method is more efficient than Latent Semantic Analysis (LSA) on a homogeneous data set. Meanwhile, we can reduce the feature dimensionality by three orders of magnitude, saving training and testing cost while maintaining comparable accuracy. Moreover, with a small training set we gain an improvement of approximately 4.3% on a heterogeneous data set compared to the traditional method, which indicates that our method has better generalization capability.
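A minimal sketch of term projection with the standard χ2 statistic (the paper's modified statistic is not reproduced): each term is assigned to the category where its χ2 score is highest, and documents are then represented by counts over these term clusters. The function names and the binary document-term representation are illustrative assumptions.

```python
# Sketch only: chi-square term projection and cluster features
# (standard chi2; the paper's modified statistic is an assumption gap).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer


def project_terms(docs, labels):
    """Assign every vocabulary term to the category with the highest chi2 score."""
    vec = CountVectorizer()
    X = (vec.fit_transform(docs) > 0).toarray().astype(float)  # docs x terms, binary
    labels = np.asarray(labels)
    categories = list(np.unique(labels))
    N = float(len(docs))

    best_cat = {}
    for j, term in enumerate(vec.get_feature_names_out()):
        scores = []
        for c in categories:
            in_c = labels == c
            A = X[in_c, j].sum()      # docs in c that contain the term
            B = X[~in_c, j].sum()     # docs outside c that contain the term
            C = in_c.sum() - A        # docs in c without the term
            D = (~in_c).sum() - B     # docs outside c without the term
            denom = (A + C) * (B + D) * (A + B) * (C + D)
            scores.append(N * (A * D - C * B) ** 2 / denom if denom else 0.0)
        best_cat[term] = categories[int(np.argmax(scores))]
    return vec, best_cat, categories


def cluster_features(docs, vec, best_cat, categories):
    """Represent each document by how many of its tokens fall into each term cluster."""
    X = vec.transform(docs).toarray()
    cat_index = {c: i for i, c in enumerate(categories)}
    F = np.zeros((len(docs), len(categories)))
    for j, term in enumerate(vec.get_feature_names_out()):
        F[:, cat_index[best_cat[term]]] += X[:, j]
    return F
```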
Empirical Methods in Natural Language Processing | 2010
Zhiyuan Liu; Wenyi Huang; Yabin Zheng; Maosong Sun
Conference on Computational Natural Language Learning | 2011
Zhiyuan Liu; Xinxiong Chen; Yabin Zheng; Maosong Sun
International Joint Conference on Artificial Intelligence | 2009
Yabin Zheng; Zhiyuan Liu; Maosong Sun; Liyun Ru; Yonghui Zhang
International Joint Conference on Artificial Intelligence | 2011
Yabin Zheng; Chen Li; Maosong Sun
Meeting of the Association for Computational Linguistics | 2011
Yabin Zheng; Lixing Xie; Zhiyuan Liu; Maosong Sun; Yang Zhang; Liyun Ru