Fangtao Li | Researchain

Publication

Featured research published by Fangtao Li.

International Joint Conference on Artificial Intelligence | 2011

Incorporating reviewer and product information for review rating prediction

Fangtao Li; Nathan Nan Liu; Hongwei Jin; Kai Zhao; Qiang Yang; Xiaoyan Zhu

Traditional sentiment analysis mainly considers binary classification of reviews, but in many real-world sentiment classification problems, non-binary review ratings are more useful. This is especially true when consumers wish to compare two products, neither of which is negative. Previous work has addressed this problem by extracting various features from the review text to learn a predictor. Since the same word may have different sentiment effects when used by different reviewers on different products, we argue that it is necessary to model such reviewer- and product-dependent effects in order to predict review ratings more accurately. In this paper, we propose a novel learning framework that incorporates reviewer and product information into a text-based learner for rating prediction. The reviewer, product, and text features are modeled as a three-dimensional tensor, and tensor factorization techniques are then employed to alleviate the data sparsity problem. Extensive experiments demonstrate the effectiveness of our model, which significantly outperforms state-of-the-art methods, especially for reviews of unpopular products and reviews written by inactive reviewers.
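
The abstract does not give the model's objective, so the following is only a hedged sketch of one way a reviewer × product × text-feature tensor can feed a rating predictor. It assumes that reviewers, products, and text features share a rank-R CP factorization, so the weight of feature k in a review by reviewer i of product j is w_ijk = Σ_f U[i,f]·P[j,f]·W[k,f], and the predicted rating is a global bias plus Σ_k x_k·w_ijk. The names `fit_tensor_rating_model` and `predict`, the SGD training loop, and every hyperparameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def fit_tensor_rating_model(samples, n_reviewers, n_products, n_features,
                            rank=2, lr=0.05, lam=1e-3, epochs=100, seed=0):
    """Fit reviewer/product/feature factors by SGD on squared rating error.

    samples: list of (reviewer_id, product_id, x, rating), where x is a
    length-n_features vector of text features (e.g. word indicators).
    Assumed model: rating ~ b + sum_k x_k * sum_f U[i,f] * P[j,f] * W[k,f].
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(0.0, 0.3, (n_reviewers, rank))
    P = rng.normal(0.0, 0.3, (n_products, rank))
    W = rng.normal(0.0, 0.3, (n_features, rank))
    b = float(np.mean([s[3] for s in samples]))  # start the bias at the mean rating
    order = np.arange(len(samples))
    for _ in range(epochs):
        rng.shuffle(order)
        for t in order:
            i, j, x, y = samples[t]
            idx = np.nonzero(x)[0]          # only active features get updates
            z = x[idx] @ W[idx]             # text features projected to rank space
            up = U[i] * P[j]                # reviewer-product interaction
            err = b + up @ z - y
            # Gradients of 0.5 * err**2 plus L2 penalties, using the old values.
            gU = err * P[j] * z + lam * U[i]
            gP = err * U[i] * z + lam * P[j]
            gW = err * np.outer(x[idx], up) + lam * W[idx]
            U[i] -= lr * gU
            P[j] -= lr * gP
            W[idx] -= lr * gW
            b -= lr * err
    return U, P, W, b


def predict(model, i, j, x):
    """Predicted rating for reviewer i, product j, and text features x."""
    U, P, W, b = model
    return float(b + (U[i] * P[j]) @ (x @ W))
```

In this reading, a reviewer or product with few reviews still receives feature weights through the shared factor matrix W, which is one plausible way factorization can mitigate sparsity.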

Conference on Information and Knowledge Management | 2013

Adaptive co-training SVM for sentiment classification on tweets

Shenghua Liu; Fuxin Li; Fangtao Li; Xueqi Cheng; Huawei Shen

Sentiment classification is an important problem in tweet mining. Twitter lacks labeled data, as well as a rating mechanism for generating it. Moreover, topics on Twitter are diverse, whereas sentiment classifiers are usually dedicated to a specific domain or topic. It is therefore challenging to make sentiment classification adapt to diverse topics without sufficient labeled data. To address this, we formally propose an adaptive multiclass SVM model that transfers an initial common sentiment classifier to a topic-adaptive one. To tackle the sparsity of tweets, non-text features are explored in addition to the conventional text features, and the features are intuitively split into two views. An iterative algorithm is proposed for solving this model by alternating among three steps: optimization, unlabeled data selection, and adaptive feature expansion. The algorithm alternately minimizes two independent margin-based objectives, one on each view, to learn coefficient matrices, which are used collaboratively to select unlabeled tweets from the topic being adapted to. Topic-adaptive sentiment words are then expanded based on this selection, which in turn helps the first two steps find more confident unlabeled tweets and boosts the final performance. Compared with well-known supervised sentiment classifiers and semi-supervised approaches, our algorithm achieves promising average accuracy gains on six topics from a public tweet corpus.
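
The co-training idea behind the algorithm can be illustrated in a much simpler, binary setting. This hedged sketch trains one linear SVM per feature view (via Pegasos-style subgradient descent on the hinge loss, a stand-in for whatever solver the paper used), lets each view pseudo-label the unlabeled examples it is most confident about, and retrains on the enlarged pool. It omits the paper's multiclass formulation, topic adaptation, and sentiment-word expansion; the functions `train_linear_svm`, `decision`, `co_train`, and `predict_two_views` and all parameters are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    X: (n, d) features; y: labels in {-1, +1}. Returns weights with a bias
    appended as the last entry (the bias is regularized too, for simplicity).
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for k in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[k] * (Xb[k] @ w)
            w *= 1.0 - eta * lam            # shrink step from the regularizer
            if margin < 1:                  # hinge loss active: move toward y[k]
                w += eta * y[k] * Xb[k]
    return w


def decision(w, X):
    """Linear SVM score for each row of X (bias is the last weight)."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w


def co_train(X1, X2, y, labeled, unlabeled, rounds=5, per_round=4):
    """Two-view co-training: each view's SVM pseudo-labels its most
    confident unlabeled examples, which then join the shared labeled pool."""
    y = np.asarray(y, dtype=float).copy()
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for X in (X1, X2):
            if not unlabeled:
                break
            w = train_linear_svm(X[labeled], y[labeled])
            scores = decision(w, X[unlabeled])
            top = np.argsort(-np.abs(scores))[:per_round]
            picked = [unlabeled[t] for t in top]
            y[picked] = np.where(scores[top] >= 0, 1.0, -1.0)
            labeled += picked
            picked_set = set(picked)
            unlabeled = [u for u in unlabeled if u not in picked_set]
    return (train_linear_svm(X1[labeled], y[labeled]),
            train_linear_svm(X2[labeled], y[labeled]))


def predict_two_views(w1, w2, X1, X2):
    """Combine both views by summing their SVM scores."""
    return np.where(decision(w1, X1) + decision(w2, X2) >= 0, 1, -1)
```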

International Conference on Innovative Computing, Information and Control | 2007

An Aggressive Feature Selection Method based on Rough Set Theory

Fangtao Li; Tao Guan; Xian Zhang; Xiaoyan Zhu

Feature selection is an important component of text classification for reducing data dimensionality. In this paper, we optimize Johnson's heuristic algorithm for rough set reduction and then propose an aggressive feature selection method for text categorization. The method integrates the advantages of knowledge reduction in rough set (RS) theory with those of the conventional feature selection methods information gain (IG) and document frequency (DF). This is the first time a rough-set-based feature selection method has been evaluated on a large-scale dataset, Reuters. The results show that the proposed method obtains higher categorization accuracy than IG and DF with far fewer features. In addition, compared with the original rough set reduction, the proposed method significantly reduces computational time. Several discretization widths are adopted for the Reuters dataset; with our method, the number of features is reduced by 93.5% and 88.4%, with F1 decreases of only 0.61% and 0.13%, respectively.
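
For reference, the classic (unoptimized) Johnson heuristic that the paper starts from can be sketched as a greedy pass over the discernibility matrix: repeatedly pick the attribute that distinguishes the most remaining object pairs with different decisions, then drop the pairs it covers. The paper's optimizations and its combination with IG and DF are not shown. In a text-categorization setting, `rows` would hold discretized term weights per document and `decisions` the category labels; that input layout and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations


def discernibility_entries(rows, decisions):
    """Attribute sets that distinguish each pair of objects whose decisions differ."""
    entries = []
    for a, b in combinations(range(len(rows)), 2):
        if decisions[a] != decisions[b]:
            diff = frozenset(k for k, (u, v) in enumerate(zip(rows[a], rows[b])) if u != v)
            if diff:  # inconsistent pairs (identical attributes) cannot be separated
                entries.append(diff)
    return entries


def johnson_reduct(rows, decisions):
    """Greedy Johnson reduct: return the sorted indices of the selected attributes."""
    entries = discernibility_entries(rows, decisions)
    reduct = []
    while entries:
        counts = Counter(k for e in entries for k in e)
        # Most frequent attribute; ties go to the smallest index.
        best = max(sorted(counts), key=lambda k: counts[k])
        reduct.append(best)
        entries = [e for e in entries if best not in e]
    return sorted(reduct)
```

For example, with `rows = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1)]` and decisions `["pos", "pos", "neg", "neg"]`, attributes 0 and 2 each separate every differing pair, and the tie-break selects `[0]`.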

International Conference on Computational Linguistics | 2008

Answer Validation by Information Distance Calculation

Fangtao Li; Xian Zhang; Xiaoyan Zhu

In this paper, an information-distance-based approach is proposed to perform answer validation for question answering systems. To validate an answer candidate, the approach calculates the conditional information distance between the question focus and the candidate under a given set of condition patterns. Heuristic methods are designed to extract the question focus and to generate proper condition patterns from the question. General-purpose search engines are employed to estimate the Kolmogorov complexity and hence the information distance. Experimental results show that our approach is stable and flexible, and that it outperforms traditional TF-IDF methods.
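
A common way to estimate information distance from search engine page counts (it is the numerator of the normalized Google distance) approximates K(x) ≈ log₂ N − log₂ f(x), where f(x) is the number of pages containing x and N is the index size. Then K(x|y) ≈ K(x,y) − K(y) = log₂ f(y) − log₂ f(x,y), and the max-distance max(K(x|y), K(y|x)) becomes max(log₂ f(x), log₂ f(y)) − log₂ f(x,y), with N cancelling out. The hedged sketch below applies this formula to counts conditioned on a pattern c, that is f(x,c), f(y,c), and f(x,y,c). How the paper conditions on its pattern set and combines multiple patterns is not specified in the abstract, and the counts are plain integers supplied by the caller rather than results from any real search API; `info_distance` and `rank_candidates` are hypothetical names.

```python
import math


def info_distance(f_x, f_y, f_xy):
    """Page-count estimate of max(K(x|y), K(y|x)) in bits.

    f_x, f_y: hit counts for each term; f_xy: hit count for both together.
    For a conditional distance, pass counts of queries that also include the
    condition pattern c, i.e. f(x,c), f(y,c), f(x,y,c).
    """
    if min(f_x, f_y, f_xy) <= 0:
        raise ValueError("all counts must be positive")
    return max(math.log2(f_x), math.log2(f_y)) - math.log2(f_xy)


def rank_candidates(counts):
    """counts: {candidate: (f(focus,c), f(cand,c), f(focus,cand,c))}.
    Returns candidates sorted from smallest to largest estimated distance."""
    return sorted(counts, key=lambda a: info_distance(*counts[a]))
```

With counts (1000, 100, 50) the estimate is log₂ 1000 − log₂ 50 = log₂ 20 ≈ 4.32 bits; a candidate that co-occurs with the focus more often relative to its own frequency, such as (1000, 800, 500), gets a distance of 1 bit and ranks first.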

Information Technology and Computer Science | 2010

Summarizing Similar Questions for Chinese Community Question Answering Portals

Yang Tang; Fangtao Li; Minlie Huang; Xiaoyan Zhu

As online community question answering (cQA) portals such as Yahoo! Answers and Baidu Zhidao have attracted hundreds of millions of questions, making use of these questions and their answers has become increasingly important for cQA websites. Prior approaches use information retrieval techniques to provide a ranked list of questions based on their similarity to the query. Because question and answer quality vary widely, users must spend a lot of time finding the truly best answers among the retrieved results. In this paper, we develop an answer retrieval and summarization system that responds to a user's query directly with an accurate and comprehensive answer summary instead of a list of similar questions. To fully exploit the relations between queries and questions, between questions and answers, and between answers and sentences, we propose a new probabilistic scoring model to distinguish high-quality answers from low-quality ones. Building on these relations, we summarize answers using a maximum coverage model. Experimental results on data extracted from Chinese cQA websites demonstrate the efficacy of the proposed method.
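
One standard way to solve a maximum coverage summarization objective is the greedy algorithm, which carries a (1 − 1/e) approximation guarantee when the number of selected sentences is bounded. The hedged sketch below picks up to k sentences, each represented as a set of concepts (for example, words), to maximize the total weight of covered concepts. The concept weights stand in for whatever scores the paper's probabilistic model produces, and the paper may use a different solver or a length budget; `greedy_max_coverage` is a hypothetical name.

```python
def greedy_max_coverage(sentences, weights, k):
    """Greedy weighted maximum coverage.

    sentences: list of sets of concepts; weights: concept -> weight.
    Returns the indices of the chosen sentences in selection order.
    """
    covered = set()
    chosen = []
    for _ in range(min(k, len(sentences))):
        best, best_gain = None, 0.0
        for i, concepts in enumerate(sentences):
            if i in chosen:
                continue
            # Marginal gain: weight of concepts not yet covered by the summary.
            gain = sum(weights.get(c, 0.0) for c in concepts - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining sentence adds new weight
            break
        chosen.append(best)
        covered |= sentences[best]
    return chosen
```

With sentences `[{"a", "b"}, {"b", "c"}, {"c", "d", "e"}]` and unit weights, the greedy pass first takes sentence 2 (gain 3), then sentence 0 (gain 2), and stops there because sentence 1 adds nothing new.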

International Conference on Computational Linguistics | 2010