Fangtao Li
Tsinghua University
Publication
Featured research published by Fangtao Li.
international joint conference on artificial intelligence | 2011
Fangtao Li; Nathan Nan Liu; Hongwei Jin; Kai Zhao; Qiang Yang; Xiaoyan Zhu
Traditional sentiment analysis mainly considers binary classification of reviews, but in many real-world sentiment classification problems, non-binary review ratings are more useful. This is especially true when consumers wish to compare two products, neither of which is negative. Previous work has addressed this problem by extracting various features from the review text for learning a predictor. Since the same word may have different sentiment effects when used by different reviewers on different products, we argue that it is necessary to model such reviewer- and product-dependent effects in order to predict review ratings more accurately. In this paper, we propose a novel learning framework to incorporate reviewer and product information into the text-based learner for rating prediction. The reviewer, product and text features are modeled as a three-dimensional tensor. Tensor factorization techniques can then be employed to reduce the data sparsity problem. We perform extensive experiments to demonstrate the effectiveness of our model, which significantly improves on state-of-the-art methods, especially for reviews involving unpopular products and inactive reviewers.
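The abstract does not give the model's equations, so the following is only a minimal sketch of how a factorized reviewer × product × word weight tensor could be used for prediction. It assumes a CP-style factorization with hypothetical factor matrices `U` (reviewers), `V` (products) and `T` (words); the names, the rank and the toy values are illustrative and not taken from the paper.

```python
def predict_rating(U, V, T, reviewer, product, text_features):
    """Score a review with a CP-factorized reviewer x product x word weight tensor.

    The per-word weight W[r][p][f] is approximated as
    sum_k U[r][k] * V[p][k] * T[f][k], so the full tensor is never built.
    `text_features` maps word index -> feature value (e.g. a count).
    """
    rank = len(U[reviewer])
    score = 0.0
    for k in range(rank):
        # Project the sparse text features onto latent dimension k.
        text_k = sum(value * T[f][k] for f, value in text_features.items())
        score += U[reviewer][k] * V[product][k] * text_k
    return score


# Toy factors (hypothetical values): 2 reviewers, 2 products, 3 words, rank 2.
U = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 1.0], [1.0, 0.5]]
T = [[2.0, 1.0], [0.5, 3.0], [0.0, 0.0]]

# The same text (word 0 once, word 1 twice) scored for two different reviewers.
review = {0: 1.0, 1: 2.0}
rating_r0 = predict_rating(U, V, T, reviewer=0, product=0, text_features=review)
rating_r1 = predict_rating(U, V, T, reviewer=1, product=0, text_features=review)
```

In this toy setup the identical text receives different scores for the two reviewers, which is the reviewer-dependent effect the abstract argues for. Because each reviewer and product is described by only `rank` numbers, sparse reviewers and products share statistical strength through the common word factors.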
conference on information and knowledge management | 2013
Shenghua Liu; Fuxin Li; Fangtao Li; Xueqi Cheng; Huawei Shen
Sentiment classification is an important problem in tweet mining. Twitter lacks labeled data and a rating mechanism for generating it. Moreover, topics on Twitter are diverse, while sentiment classifiers are usually dedicated to a specific domain or topic. It is therefore a challenge to make sentiment classification adaptive to diverse topics without sufficient labeled data. We formally propose an adaptive multiclass SVM model that transfers an initial common sentiment classifier to a topic-adaptive one. To tackle tweet sparsity, non-text features are explored in addition to the conventional text features, and the two feature types are naturally split into two views. An iterative algorithm is proposed for solving this model by alternating among three steps: optimization, unlabeled data selection, and adaptive feature expansion. The algorithm alternately minimizes the margins of two independent objectives on the different views to learn coefficient matrices, which are used collaboratively to select unlabeled tweets from the topic that the algorithm is adapting to. Topic-adaptive sentiment words are then expanded based on this selection, which in turn helps the first two steps find more confident unlabeled tweets and boosts the final performance. Compared with well-known supervised sentiment classifiers and semi-supervised approaches, our algorithm achieves promising increases in average accuracy on the 6 topics from a public tweet corpus.
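The abstract does not state the exact selection criterion, so the sketch below only illustrates the general idea of the unlabeled data selection step in a two-view setting. It assumes, as a simplification, that a tweet is selected when the text-view and non-text-view classifiers agree on a label and both exceed a confidence threshold; the function name, the score dictionaries and the threshold are hypothetical.

```python
def select_confident(unlabeled_ids, text_scores, nontext_scores, threshold):
    """Pick unlabeled items on which both views agree with enough confidence.

    `text_scores[item]` and `nontext_scores[item]` map label -> confidence.
    Returns a dict of item -> agreed label (an assumed rule, not the paper's).
    """
    selected = {}
    for item in unlabeled_ids:
        label_a, conf_a = max(text_scores[item].items(), key=lambda kv: kv[1])
        label_b, conf_b = max(nontext_scores[item].items(), key=lambda kv: kv[1])
        if label_a == label_b and min(conf_a, conf_b) >= threshold:
            selected[item] = label_a
    return selected


# Hypothetical per-view confidences for three unlabeled tweets.
text_view = {
    "t1": {"pos": 0.90, "neg": 0.10},
    "t2": {"pos": 0.90, "neg": 0.10},
    "t3": {"pos": 0.55, "neg": 0.45},
}
nontext_view = {
    "t1": {"pos": 0.80, "neg": 0.20},
    "t2": {"pos": 0.30, "neg": 0.70},
    "t3": {"pos": 0.60, "neg": 0.40},
}
newly_labeled = select_confident(["t1", "t2", "t3"], text_view, nontext_view, 0.7)
```

Here only `t1` is selected: the views disagree on `t2`, and both are too uncertain about `t3`. In an iterative scheme, the selected tweets would be added to the training set before the next optimization step.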
international conference on innovative computing, information and control | 2007
Fangtao Li; Tao Guan; Xian Zhang; Xiaoyan Zhu
Feature selection is an important component of text classification because it reduces data dimensionality. In this paper, we optimize Johnson's heuristic algorithm for rough set reduction, and then propose an aggressive feature selection method for text categorization. This method integrates the advantages of knowledge reduction in rough set (RS) theory and the conventional feature selection methods information gain (IG) and document frequency (DF). This is the first time that a rough-set-based feature selection method has been tested on the large-scale Reuters data set. The results show that the proposed method obtains higher categorization accuracy than IG and DF with far fewer features. In addition, compared with the original rough set reduction, the proposed method reduces computation time significantly. For the Reuters data set, several discretization widths are adopted; with our method, the number of features is reduced by 93.5% and 88.4%, with decreases in F1 measure of only 0.61% and 0.13%, respectively.
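For readers unfamiliar with the two baseline criteria named above, here is a minimal sketch of document frequency (DF) and information gain (IG) for a single term. It does not implement the rough set reduction or the authors' combined method; the toy corpus and function names are illustrative assumptions.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def document_frequency(docs, term):
    """Number of documents containing the term."""
    return sum(1 for words, _ in docs if term in words)


def information_gain(docs, term):
    """Reduction in class entropy from knowing whether the term is present."""
    labels = sorted({label for _, label in docs})

    def label_counts(subset):
        return [sum(1 for _, l in subset if l == label) for label in labels]

    with_term = [d for d in docs if term in d[0]]
    without_term = [d for d in docs if term not in d[0]]
    gain = entropy(label_counts(docs))
    for subset in (with_term, without_term):
        if subset:
            gain -= len(subset) / len(docs) * entropy(label_counts(subset))
    return gain


# Toy corpus (hypothetical): each document is (set of words, class label).
docs = [
    ({"the", "oil", "price"}, "energy"),
    ({"the", "oil", "export"}, "energy"),
    ({"the", "wheat", "price"}, "grain"),
    ({"the", "corn", "export"}, "grain"),
]
ig_oil = information_gain(docs, "oil")
ig_the = information_gain(docs, "the")
df_the = document_frequency(docs, "the")
```

In this toy corpus "oil" separates the two classes perfectly and gets the maximum gain of one bit, while "the" occurs in every document and carries no class information despite its high document frequency.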
international conference on computational linguistics | 2008
Fangtao Li; Xian Zhang; Xiaoyan Zhu
In this paper, an information-distance-based approach is proposed to perform answer validation for question answering systems. To validate an answer candidate, the approach calculates the conditional information distance between the question focus and the candidate under a certain condition pattern set. Heuristic methods are designed to extract the question focus and generate proper condition patterns from the question. General search engines are employed to estimate the Kolmogorov complexity, and hence the information distance. Experimental results show that our approach is stable and flexible, and outperforms traditional tf-idf methods.
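As a rough illustration of estimating an information distance from search engine hit counts, the sketch below uses the well-known normalized search distance (often called normalized Google distance). This is the unconditional variant, not the conditional distance with condition patterns that the paper describes, and the hit counts and index size are made-up values rather than real query results.

```python
import math


def normalized_search_distance(count_x, count_y, count_xy, total_pages):
    """Normalized distance between two terms estimated from page counts.

    count_x, count_y: pages containing each term alone
    count_xy: pages containing both terms
    total_pages: assumed size of the search engine index
    Smaller values mean the terms are more closely related.
    """
    log_x, log_y = math.log(count_x), math.log(count_y)
    log_xy = math.log(count_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(total_pages) - min(log_x, log_y))


# Hypothetical hit counts for a question focus and two answer candidates.
INDEX_SIZE = 1_000_000
dist_a = normalized_search_distance(1000, 1000, 1000, INDEX_SIZE)  # always co-occur
dist_b = normalized_search_distance(1000, 1000, 10, INDEX_SIZE)    # rarely co-occur

# Rank candidates by ascending distance to the question focus.
ranking = sorted([("candidate_a", dist_a), ("candidate_b", dist_b)], key=lambda c: c[1])
```

A candidate that almost always co-occurs with the question focus gets a distance near zero, so it is ranked ahead of a candidate that rarely appears alongside it.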
information technology and computer science | 2010
Yang Tang; Fangtao Li; Minlie Huang; Xiaoyan Zhu
As online community question answering (cQA) portals like Yahoo! Answers and Baidu Zhidao have attracted hundreds of millions of questions, how to utilize these questions and their corresponding answers becomes increasingly important for cQA websites. Prior approaches focus on using information retrieval techniques to provide a ranked list of questions based on their similarity to the query. Due to the high variance in question and answer quality, users have to spend a lot of time finding the truly best answers in the retrieved results. In this paper, we develop an answer retrieval and summarization system that directly provides an accurate and comprehensive answer summary, instead of a list of similar questions, in response to a user's query. To fully explore the relations between queries and questions, between questions and answers, and between answers and sentences, we propose a new probabilistic scoring model to distinguish high-quality answers from low-quality answers. By fully exploiting these relations, we summarize answers using a maximum coverage model. Experimental results on data extracted from Chinese cQA websites demonstrate the efficacy of our proposed method.
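The maximum coverage idea behind the summarization step can be sketched with the standard greedy approximation: repeatedly pick the sentence that covers the most concepts not yet covered. This is a generic, unweighted version under assumed inputs; the paper's model additionally uses probabilistic quality scores, which are not reproduced here, and the sentence ids and concept sets are hypothetical.

```python
def greedy_max_coverage(sentences, budget):
    """Greedily choose up to `budget` sentences that cover the most concepts.

    `sentences` maps a sentence id to the set of concepts it mentions.
    Returns the chosen ids in selection order and the set of covered concepts.
    """
    covered = set()
    chosen = []
    remaining = dict(sentences)
    while remaining and len(chosen) < budget:
        # Pick the sentence with the largest number of uncovered concepts.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Hypothetical answer sentences and the concepts each one mentions.
candidate_sentences = {
    "s1": {"a", "b", "c"},
    "s2": {"c", "d"},
    "s3": {"a", "b"},
}
summary, covered_concepts = greedy_max_coverage(candidate_sentences, budget=2)
```

With a budget of two sentences, `s1` is chosen first and `s2` second, because `s3` adds nothing that `s1` has not already covered. Redundant answers are thus skipped in favor of ones that contribute new information.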
international conference on computational linguistics | 2010
Fangtao Li; Chao Han; Minlie Huang; Xiaoyan Zhu; Yingju Xia; Shu Zhang; Hao Yu
national conference on artificial intelligence | 2010
Fangtao Li; Minlie Huang; Xiaoyan Zhu
north american chapter of the association for computational linguistics | 2010
Zhicheng Zheng; Fangtao Li; Minlie Huang; Xiaoyan Zhu
meeting of the association for computational linguistics | 2012
Fangtao Li; Sinno Jialin Pan; Ou Jin; Qiang Yang; Xiaoyan Zhu
Theory and Applications of Categories | 2009
Fangtao Li; Zhicheng Zheng; Fan Bu; Yang Tang; Xiaoyan Zhu; Minlie Huang