Songbo Tan | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Songbo Tan is active.

Explore More

Publication

Featured researches published by Songbo Tan.

Expert Systems With Applications | 2009

A survey on sentiment detection of reviews

Huifeng Tang; Songbo Tan; Xueqi Cheng

The sentiment detection of texts has been witnessed a booming interest in recent years, due to the increased availability of online reviews in digital form and the ensuing need to organize them. Till to now, there are mainly four different problems predominating in this research community, namely, subjectivity classification, word sentiment classification, document sentiment classification and opinion extraction. In fact, there are inherent relations between them. Subjectivity classification can prevent the sentiment classifier from considering irrelevant or even potentially misleading text. Document sentiment classification and opinion extraction have often involved word sentiment classification techniques. This survey discusses related issues and main approaches to these problems.

Expert Systems With Applications | 2008

An empirical study of sentiment analysis for chinese documents

Songbo Tan; Jin Zhang

Up to now, there are very few researches conducted on sentiment classification for Chinese documents. In order to remedy this deficiency, this paper presents an empirical study of sentiment categorization on Chinese documents. Four feature selection methods (MI, IG, CHI and DF) and five learning methods (centroid classifier, K-nearest neighbor, winnow classifier, Naive Bayes and SVM) are investigated on a Chinese sentiment corpus with a size of 1021 documents. The experimental results indicate that IG performs the best for sentimental terms selection and SVM exhibits the best performance for sentiment classification. Furthermore, we found that sentiment classifiers are severely dependent on domains or topics.

Expert Systems With Applications | 2005

Neighbor-weighted K-nearest neighbor for unbalanced text corpus

Songbo Tan

Text categorization or classification is the automated assigning of text documents to pre-defined classes based on their contents. Many of classification algorithms usually assume that the training examples are evenly distributed among different classes. However, unbalanced data sets often appear in many practical applications. In order to deal with uneven text sets, we propose the neighbor-weighted K-nearest neighbor algorithm, i.e. NWKNN. The experimental results indicate that our algorithm NWKNN achieves significant classification performance improvement on imbalanced corpora.

Expert Systems With Applications | 2006

An effective refinement strategy for KNN text classifier

Songbo Tan

Due to the exponential growth of documents on the Internet and the emergent need to organize them, the automated categorization of documents into predefined labels has received an ever-increased attention in the recent years. A wide range of supervised learning algorithms has been introduced to deal with text classification. Among all these classifiers, K-Nearest Neighbors (KNN) is a widely used classifier in text categorization community because of its simplicity and efficiency. However, KNN still suffers from inductive biases or model misfits that result from its assumptions, such as the presumption that training data are evenly distributed among all categories. In this paper, we propose a new refinement strategy, which we called as DragPushing, for the KNN Classifier. The experiments on three benchmark evaluation collections show that DragPushing achieved a significant improvement on the performance of the KNN Classifier.

european conference on information retrieval | 2009

Adapting Naive Bayes to Domain Adaptation for Sentiment Analysis

Songbo Tan; Xueqi Cheng; Yuefen Wang; Hongbo Xu

In the community of sentiment analysis, supervised learning techniques have been shown to perform very well. When transferred to another domain, however, a supervised sentiment classifier often performs extremely bad. This is so-called domain-transfer problem. In this work, we attempt to attack this problem by making the maximum use of both the old-domain data and the unlabeled new-domain data. To leverage knowledge from the old-domain data, we proposed an effective measure, i.e., Frequently Co-occurring Entropy (FCE), to pick out generalizable features that occur frequently in both domains and have similar occurring probability. To gain knowledge from the new-domain data, we proposed Adapted Naive Bayes (ANB), a weighted transfer version of Naive Bayes Classifier. The experimental results indicate that proposed approach could improve the performance of base classifier dramatically, and even provide much better performance than the transfer-learning baseline, i.e. the Naive Bayes Transfer Classifier (NTBC).

conference on information and knowledge management | 2005

A novel refinement approach for text categorization

Songbo Tan; Xueqi Cheng; Moustafa Ghanem; Bin Wang; Hongbo Xu

In this paper we present a novel strategy, DragPushing, for improving the performance of text classifiers. The strategy is generic and takes advantage of training errors to successively refine the classification model of a base classifier. We describe how it is applied to generate two new classification algorithms; a Refined Centroid Classifier and a Refined Naïve Bayes Classifier. We present an extensive experimental evaluation of both algorithms on three English collections and one Chinese corpus. The results indicate that in each case, the refined classifiers achieve significant performance improvement over the base classifiers used. Furthermore, the performance of the Refined Centroid Classifier implemented is comparable, if not better, to that of state-of-the-art support vector machine (SVM)-based classifier, but offers a much lower computational cost.

conference on information and knowledge management | 2007

A novel scheme for domain-transfer problem in the context of sentiment analysis

Songbo Tan; Gaowei Wu; Huifeng Tang; Xueqi Cheng

In this work, we attempt to tackle domain-transfer problem by combining old-domain labeled examples with new-domain unlabeled ones. The basic idea is to use old-domain-trained classifier to label some informative unlabeled examples in new domain, and retrain the base classifier over these selected examples. The experimental results demonstrate that proposed scheme can significantly boost the accuracy of the base sentiment classifier on new domain.

Expert Systems With Applications | 2008

An improved centroid classifier for text categorization

Songbo Tan

In the context of text categorization, Centroid Classifier has proved to be a simple and yet efficient method. However, it often suffers from the inductive bias or model misfit incurred by its assumption. In order to address this issue, we propose a novel batch-updated approach to enhance the performance of Centroid Classifier. The main idea behind this method is to take advantage of training errors to successively update the classification model by batch. The technique is simple to implement and flexible to text data. The experimental results indicate that the technique can significantly improve the performance of Centroid Classifier.

web search and data mining | 2010

Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon

Weifu Du; Songbo Tan; Xueqi Cheng; Xiaochun Yun

Domain-oriented sentiment lexicons are widely used for fine-grained sentiment analysis on reviews; therefore, the automatic construction of domain-oriented sentiment lexicon is a fundamental and important task for sentiment analysis research. Most of existing construction approaches take only the kind of relationships between words into account, which makes them have a lot of room for improvement. This paper proposes an adapted information bottleneck method for the construction of domain-oriented sentiment lexicon. This approach can naturally make full use of the mutual reinforcement between documents and words by fusing three kinds of relationships either from words to documents or from words to words; either homogeneous or heterogeneous; either within-domain or cross-domain. The experimental results demonstrate that proposed method could dramatically improve the accuracy of the baseline approach on the construction of out-of-domain sentiment lexicon.

north american chapter of the association for computational linguistics | 2009

An Iterative Reinforcement Approach for Fine-Grained Opinion Mining

Weifu Du; Songbo Tan

With the in-depth study of sentiment analysis research, finer-grained opinion mining, which aims to detect opinions on different review features as opposed to the whole review level, has been receiving more and more attention in the sentiment analysis research community recently. Most of existing approaches rely mainly on the template extraction to identify the explicit relatedness between product feature and opinion terms, which is insufficient to detect the implicit review features and mine the hidden sentiment association in reviews, which satisfies (1) the review features are not appear explicit in the review sentences; (2) it can be deduced by the opinion words in its context. From an information theoretic point of view, this paper proposed an iterative reinforcement framework based on the improved information bottleneck algorithm to address such problem. More specifically, the approach clusters product features and opinion words simultaneously and iteratively by fusing both their semantic information and co-occurrence information. The experimental results demonstrate that our approach outperforms the template extraction based approaches.

Explore More