Zhongwu Zhai
Tsinghua University
Publication
Featured research published by Zhongwu Zhai.
web search and data mining | 2011
Zhongwu Zhai; Bing Liu; Hua Xu; Peifa Jia
In sentiment analysis of product reviews, one important problem is to produce a summary of opinions based on product features/attributes (also called aspects). However, for the same feature, people can express it with many different words or phrases. To produce a useful summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature group. Although several methods have been proposed to extract product features from reviews, limited work has been done on clustering or grouping of synonym features. This paper focuses on this task. Classic methods for solving this problem are based on unsupervised learning using some forms of distributional similarity. However, we found that these methods do not do well. We then model it as a semi-supervised learning problem. Lexical characteristics of the problem are exploited to automatically identify some labeled examples. Empirical evaluation shows that the proposed method outperforms existing state-of-the-art methods by a large margin.
knowledge discovery and data mining | 2011
Zhongwu Zhai; Bing Liu; Hua Xu; Peifa Jia
In opinion mining of product reviews, one often wants to produce a summary of opinions based on product features. However, for the same feature, people can express it with different words and phrases. To produce an effective summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature. Topic modeling is a suitable method for the task. However, instead of simply letting topic modeling find groupings freely, we believe it is possible to do better by giving it some pre-existing knowledge in the form of automatically extracted constraints. In this paper, we first extend a popular topic modeling method, called Latent Dirichlet Allocation (LDA), with the ability to process large scale constraints. Then, two novel methods are proposed to extract two types of constraints automatically. Finally, the resulting constrained-LDA and the extracted constraints are applied to group product features. Experiments show that constrained-LDA outperforms the original LDA and the latest mLSA by a large margin.
Expert Systems With Applications | 2011
Zhongwu Zhai; Hua Xu; Bada Kang; Peifa Jia
Features play a fundamental role in sentiment classification. How to effectively select different types of features to improve sentiment classification performance is the primary topic of this paper. Ngram features are commonly employed in text classification tasks; in this paper, sentiment-words, substrings, substring-groups, and key-substring-groups, which have never been considered in the sentiment classification area before, are also extracted as features. The extracted features are then compared and analyzed. To demonstrate generality, we use two authoritative Chinese data sets in different domains to conduct our experiments. Our statistical analysis of the experimental results indicates the following: (1) different types of features possess different discriminative capabilities in Chinese sentiment classification; (2) character bigram features perform the best among the Ngram features; (3) substring-group features have greater potential to improve the performance of sentiment classification by combining substrings of different lengths; (4) sentiment words or phrases extracted from existing sentiment lexicons are not effective for sentiment classification; (5) effective features are usually of varying lengths rather than fixed lengths.
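As a rough illustration of the character Ngram features mentioned in this abstract, the sketch below counts contiguous character bigrams in an unsegmented string. The function name `char_ngrams` and the sample review text are assumptions made for this example; the paper's own extraction and selection pipeline is not described here.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count contiguous character n-grams, ignoring whitespace.

    Hypothetical helper for illustration only; not the authors' code.
    """
    chars = [c for c in text if not c.isspace()]
    return Counter(
        "".join(chars[i:i + n]) for i in range(len(chars) - n + 1)
    )


# Invented sample review text ("the quality is very good").
features = char_ngrams("质量很好", n=2)
print(features)
```

Character-level Ngrams need no word segmentation, which is one reason they are a common baseline for Chinese text.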
web intelligence | 2008
Zhongwu Zhai; Hua Xu; Peifa Jia
Opinion leaders play a very important role in information diffusion; they are found in all fields of society and influence the opinions of the masses in their fields. Most proposed algorithms for identifying opinion leaders in Internet social networks are global measure algorithms and usually overlook the fact that opinion leaders are field-limited. We propose and test several algorithms, including interest-field based algorithms and global measure algorithms, to identify opinion leaders in BBS. Our experiments show that different algorithms are sensitive to different indicators; the interest-field based algorithms, which take into account not only the social networks' structure but also the users' interest space, are more reasonable and effective in identifying opinion leaders in BBS. The interest-field based algorithms are sensitive to the high status nodes in the social network, and their performance relies on the quality of field discovery.
Tsinghua Science & Technology | 2010
Zhongwu Zhai; Hua Xu; Peifa Jia
This paper is an empirical study of unsupervised sentiment classification of Chinese reviews. The focus is on exploring ways to improve the performance of unsupervised sentiment classification based on the limited existing sentiment resources in Chinese. On the one hand, all available Chinese sentiment lexicons, both individual and combined, are evaluated under our proposed framework. On the other hand, domain dependent sentiment noise words are identified and removed using unlabeled data to improve the classification performance. To the best of our knowledge, this is the first such attempt. Experiments have been conducted on three open datasets in two domains, and the results show that the proposed algorithm for removing sentiment noise words can improve the classification performance significantly.
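To make the idea of lexicon-based unsupervised classification with noise word removal concrete, here is a minimal sketch of a generic word-counting classifier that skips a given set of noise words. It is not the framework evaluated in the paper: the function `classify`, the toy lexicons, and the noise set are all invented for illustration, and how noise words are identified from unlabeled data is not shown.

```python
def classify(tokens, positive, negative, noise=frozenset()):
    """Label a tokenized review by counting lexicon hits.

    Tokens in `noise` are ignored even if they appear in a lexicon.
    Generic sketch under stated assumptions; not the authors' algorithm.
    """
    score = 0
    for token in tokens:
        if token in noise:
            continue
        if token in positive:
            score += 1
        elif token in negative:
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy lexicons and review, invented for this example.
positive_words = {"good", "clean"}
negative_words = {"cheap", "noisy"}
review = ["good", "room", "and", "cheap"]

without_filter = classify(review, positive_words, negative_words)
with_filter = classify(review, positive_words, negative_words, noise={"cheap"})
print(without_filter, with_filter)
```

In this toy case "cheap" is listed as negative in the lexicon but is treated as a noise word for the domain, so filtering it changes the label.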
knowledge discovery and data mining | 2010
Zhongwu Zhai; Hua Xu; Jun Li; Peifa Jia
An open problem in machine learning-based sentiment classification is how to extract complex features that outperform simple features; figuring out which types of features are most valuable is another. Most studies focus primarily on character or word Ngram features, but substring-group features have never been considered in the sentiment classification area before. In this study, substring-group features are extracted and selected for sentiment classification by means of a transductive learning-based algorithm. To demonstrate generality, experiments have been conducted on three open datasets in three different languages: Chinese, English and Spanish. The experimental results show that the proposed algorithm's performance is usually superior to the best performance in related work, and that the proposed feature subsumption algorithm for sentiment classification is multilingual. Compared to the inductive learning-based algorithm, the experimental results also illustrate that the transductive learning-based algorithm can significantly improve the performance of sentiment classification. As for term weighting, the experiments show that “tfidf-c” outperforms all other term weighting approaches in the proposed algorithm.
international conference on computational linguistics | 2010
Zhongwu Zhai; Bing Liu; Hua Xu; Peifa Jia
national conference on artificial intelligence | 2011
Zhongwu Zhai; Bing Liu; Lei Zhang; Hua Xu; Peifa Jia
IEEE Intelligent Systems | 2012
Zhongwu Zhai; Bing Liu; Jingyuan Wang; Hua Xu; Peifa Jia
international conference natural language processing | 2009
Zhongwu Zhai; Hua Xu; Jun Li; Peifa Jia