Fangzhao Wu
Tsinghua University
Publication
Featured research published by Fangzhao Wu.
Neurocomputing | 2016
Fangzhao Wu; Yongfeng Huang; Yangqiu Song
Microblog sentiment analysis is a fundamental problem for many interesting applications. Existing microblog sentiment classification methods judge the sentiment polarity mainly according to textual content. However, since microblog messages are very short and noisy, and their sentiment polarities are often ambiguous and context-dependent, the accuracy of microblog sentiment classification is usually unsatisfactory. Fortunately, microblog messages lie in social media and contain rich social contexts. The social context information often implies sentiment connections between microblog messages. For example, a microblogging user usually expresses the same sentiment when posting multiple messages towards the same topic. Motivated by these observations, in this paper we propose a structured microblog sentiment classification (SMSC) framework. Our framework can combine social context information with textual content information to improve microblog sentiment classification accuracy. Two kinds of social contexts are used in our framework, i.e., social connections between microblog messages brought by the same author and social connections brought by social relations between users. In our framework, social context information is formulated as the graph structure over the sentiments of microblog messages. The objective function of our framework is a tradeoff between the agreement with content-based sentiment predictions and the consistency with social contexts. An efficient optimization algorithm is introduced to solve our framework. Experimental results on two Twitter sentiment analysis benchmark datasets indicate that our method can outperform baseline methods consistently and significantly.
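The trade-off between agreement with content-based predictions and consistency with social contexts can be pictured with a small sketch. The abstract does not give the objective, so the quadratic form below, the weight `lam`, and the function name `refine_sentiments` are assumptions made for illustration, not the SMSC formulation itself.

```python
def refine_sentiments(content_scores, edges, lam=1.0, iters=200):
    """Illustrative graph-regularised refinement (assumed objective).

    Minimises  sum_i (s_i - p_i)^2 + lam * sum_{(i, j, w)} w * (s_i - s_j)^2
    where p_i is the content-based score of message i and each edge (i, j, w)
    links two messages through a social context (same author, or related users).
    Solved with Gauss-Seidel coordinate updates.
    """
    n = len(content_scores)
    neighbours = [[] for _ in range(n)]
    for i, j, w in edges:
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))
    s = list(content_scores)
    for _ in range(iters):
        for i in range(n):
            # Setting the derivative with respect to s_i to zero gives this update.
            num = content_scores[i] + lam * sum(w * s[j] for j, w in neighbours[i])
            den = 1.0 + lam * sum(w for _, w in neighbours[i])
            s[i] = num / den
    return s


# Three messages by one author on one topic; the middle one is ambiguous.
scores = refine_sentiments([0.9, -0.1, 0.8], [(0, 1, 1.0), (1, 2, 1.0)])
```

In this toy case the ambiguous middle message is pulled toward the positive sentiment of its neighbours, which is the kind of effect the abstract attributes to social context.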
conference on information and knowledge management | 2015
Fangzhao Wu; Jinyun Shu; Yongfeng Huang; Zhigang Yuan
The popularity of microblogging platforms, such as Twitter, makes them important for information dissemination and sharing. However, spammers also regard them as ideal places to conduct social spamming. The large numbers of social spammers and spam messages severely harm the user experience and hinder the healthy development of microblogging systems. Thus, effectively detecting social spammers and spam messages in microblogging is of great value. Existing studies mainly regard social spammer detection and spam message detection as two separate tasks. However, social spammers and spam messages are strongly connected, since social spammers tend to post more spam messages and spam messages are highly likely to have been posted by social spammers. Combining social spammer detection with spam message detection has the potential to boost the performance of each task. In this paper, we propose a unified framework for social spammer and spam message co-detection in microblogging. Our framework utilizes the posting relations between users and messages to combine social spammer detection with spam message detection. In addition, we extract the social relations between users as well as the connections between messages, and incorporate them into our framework as regularization terms over the prediction results. We also introduce an efficient optimization method to solve our framework. Extensive experiments on a real-world microblog dataset demonstrate that our framework can significantly and consistently improve the performance of both social spammer detection and spam message detection.
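A rough sketch of how posting relations can couple the two detection tasks follows. It assumes real-valued spam scores, a quadratic coupling term weighted by a hypothetical `alpha`, and simple alternating updates; the abstract does not state the actual model, and the user-user and message-message regularization terms are omitted here.

```python
def co_detect(user_prior, msg_prior, author_of, alpha=1.0, iters=100):
    """Illustrative co-detection by alternating updates (assumed objective).

    Minimises  sum_u (x_u - a_u)^2 + sum_m (y_m - b_m)^2
               + alpha * sum_m (y_m - x_author(m))^2
    where a_u and b_m are independent spam scores for users and messages,
    and author_of[m] is the index of the user who posted message m.
    """
    x = list(user_prior)
    y = list(msg_prior)
    posts = [[] for _ in user_prior]
    for m, u in enumerate(author_of):
        posts[u].append(m)
    for _ in range(iters):
        # Message step: pull each message score toward its author's score.
        for m, u in enumerate(author_of):
            y[m] = (msg_prior[m] + alpha * x[u]) / (1.0 + alpha)
        # User step: pull each user score toward the scores of their messages.
        for u, ms in enumerate(posts):
            x[u] = (user_prior[u] + alpha * sum(y[m] for m in ms)) / (1.0 + alpha * len(ms))
    return x, y


# User 0 looks borderline alone but posted two spam-like messages.
users, messages = co_detect([0.5, 0.1], [0.9, 0.8, 0.1], author_of=[0, 0, 1])
```

In this toy run the borderline user's score rises because of the messages they posted, while the second user and their message stay low.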
Neurocomputing | 2016
Fangzhao Wu; Jinyun Shu; Yongfeng Huang; Zhigang Yuan
Microblogging websites, such as Twitter, have become popular platforms for information dissemination and sharing. However, they are also full of spammers who frequently conduct social spamming on them. The large numbers of social spammers and spam messages severely harm the user experience and hinder the healthy development of microblogging systems. Thus, effectively detecting social spammers and spam messages is of great value to both microblogging users and websites. Existing studies usually treat social spammer detection and spam message detection as two separate tasks. However, social spammers and spam messages have strong inherent connections, since social spammers tend to post more spam messages and spam messages are highly likely to have been posted by social spammers. Thus, combining social spammer detection with spam message detection has the potential to boost the performance of both tasks. In this paper, we propose a unified approach for social spammer and spam message co-detection in microblogging. Our approach utilizes the posting relations between users and messages to combine social spammer detection with spam message detection. In addition, we extract the social relations between users and the connections between messages to refine detection results. We regard these social contexts as the graph structure over the detection results and incorporate them into our approach as regularization terms. We also introduce an efficient optimization algorithm to solve the model of our approach and propose an accelerated method to tackle the most time-consuming step. Extensive experiments on a real-world microblog dataset demonstrate that our approach can improve the performance of both social spammer detection and spam message detection effectively and efficiently.
Information Sciences | 2016
Fangzhao Wu; Yangqiu Song; Yongfeng Huang
Microblogging services, such as Twitter, are very popular for information release and dissemination. Analyzing the sentiments in massive microblog messages is useful for sensing the public's opinions on various topics, which has wide applications in both academic and industrial fields. However, microblog sentiment analysis is a challenging task, because microblog messages are short and noisy, and contain massive user-invented acronyms and informal words. It is expensive and time-consuming to manually annotate sufficient samples for training an accurate and robust microblog sentiment classifier. Fortunately, unlabeled microblog messages can provide a lot of useful sentiment knowledge. For example, emoticons are frequently used in microblog messages and they usually indicate sentiment orientations. In this paper, we propose to extract useful sentiment knowledge from massive unlabeled messages to enhance microblog sentiment classification. Three kinds of sentiment knowledge, i.e., contextual similarity knowledge, word-sentiment knowledge, and contextual polarity knowledge, are explored. We propose a unified framework to incorporate the heterogeneous sentiment knowledge into the learning of microblog sentiment classifiers. An efficient optimization method based on ADMM is introduced to solve the model of our framework and an accelerated algorithm is proposed to tackle the most time-consuming step. Extensive experiments were conducted on three benchmark Twitter datasets. The experimental results show that our approach can improve the performance of microblog sentiment classification effectively and efficiently.
Information Fusion | 2017
Fangzhao Wu; Yongfeng Huang; Zhigang Yuan
Extract and fuse four kinds of sentiment knowledge from multiple sources. A unified model to fuse knowledge for domain-specific sentiment classification. An efficient algorithm to solve the model of our approach. Extensive experiment results validate effectiveness and efficiency of our approach.
Analyzing the sentiments in massive user-generated online data, such as product reviews and microblogs, has become a hot research topic. It can help customers, companies and expert systems make more informed decisions. Sentiment analysis is widely known as a domain-dependent problem. Different domains usually have different sentiment expressions and a general sentiment classifier is not suitable for all domains. A natural solution to this problem is to train a domain-specific sentiment classifier for each target domain. However, the labeled data in the target domain is usually insufficient, and it is costly and time-consuming to annotate enough samples. In order to tackle this problem, we propose a novel approach to train domain-specific sentiment classifiers by fusing the sentiment knowledge from multiple sources. Sentiment information from four sources is extracted and fused in our approach. The first source is sentiment lexicons, which contain sentiment polarities of general sentiment words. The second source is the sentiment classifiers of multiple source domains. The third source is the unlabeled data in the target domain, from which we extract domain-specific sentiment relations among words. The fourth source is the labeled data in the target domain. We propose a unified framework to fuse these four kinds of sentiment knowledge and train a domain-specific sentiment classifier for the target domain. In addition, we present an efficient optimization algorithm to solve the model of our approach. Extensive experiments are conducted on both an Amazon product review dataset and a Twitter dataset. Experimental results show that by fusing the sentiment information extracted from multiple sources, our approach can effectively improve the performance of sentiment classification and reduce the dependence on labeled data. For instance, our approach can achieve an accuracy of 87.22% in the Kitchen domain when only 200 samples in the target domain are labeled. The performance improvements of our approach compared with a purely supervised sentiment classifier are 8.98% and 7.92% on the Amazon and Twitter datasets respectively.
IEEE Transactions on Knowledge and Data Engineering | 2017
Fangzhao Wu; Zhigang Yuan; Yongfeng Huang
We propose a collaborative multi-domain sentiment classification approach to train sentiment classifiers for multiple domains simultaneously. In our approach, the sentiment information in different domains is shared to train more accurate and robust sentiment classifiers for each domain when labeled data is scarce. Specifically, we decompose the sentiment classifier of each domain into two components, a global one and a domain-specific one. The global model can capture the general sentiment knowledge and is shared by various domains. The domain-specific model can capture the specific sentiment expressions in each domain. In addition, we extract domain-specific sentiment knowledge from both labeled and unlabeled samples in each domain and use it to enhance the learning of domain-specific sentiment classifiers. Besides, we incorporate the similarities between domains into our approach as regularization over the domain-specific sentiment classifiers to encourage the sharing of sentiment information between similar domains. Two kinds of domain similarity measures are explored, one based on textual content and the other one based on sentiment expressions. Moreover, we introduce two efficient algorithms to solve the model of our approach. Experimental results on benchmark datasets show that our approach can effectively improve the performance of multi-domain sentiment classification and significantly outperform baseline methods.
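The decomposition of each domain's classifier into a shared global component and a domain-specific component can be sketched as below. This is a minimal illustration under assumptions: a bag-of-words linear model, logistic loss, plain stochastic gradient descent, and L2 shrinkage on the domain-specific part only. The names `train_multi_domain` and `predict` are hypothetical, and the domain-similarity regularization described in the abstract is left out.

```python
import math
from collections import defaultdict


def train_multi_domain(data, epochs=200, lr=0.5, l2_specific=0.01):
    """data: list of (domain, set_of_words, label) with label in {+1, -1}.

    The score of a text in domain d is the sum over its words of
    w_global[word] + w_specific[d][word].
    """
    w_global = defaultdict(float)
    w_specific = defaultdict(lambda: defaultdict(float))
    for _ in range(epochs):
        for domain, words, label in data:
            score = sum(w_global[t] + w_specific[domain][t] for t in words)
            # Derivative of log(1 + exp(-label * score)) with respect to score.
            g = -label / (1.0 + math.exp(label * score))
            for t in words:
                w_global[t] -= lr * g
                w_specific[domain][t] -= lr * (g + l2_specific * w_specific[domain][t])
    return w_global, w_specific


def predict(w_global, w_specific, domain, words):
    score = sum(w_global.get(t, 0.0) + w_specific[domain].get(t, 0.0) for t in words)
    return 1 if score >= 0 else -1


# "unpredictable" is positive for movies but negative for cars;
# "good" and "bad" behave the same way in both domains.
data = [
    ("movie", {"good"}, 1), ("movie", {"unpredictable"}, 1), ("movie", {"bad"}, -1),
    ("car", {"good"}, 1), ("car", {"unpredictable"}, -1), ("car", {"bad"}, -1),
]
w_global, w_specific = train_multi_domain(data)
```

On this toy data the shared weights carry the words that behave consistently across domains, while the domain-specific weights absorb the word whose polarity flips. A domain with no training data falls back on the global component alone.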
conference on information and knowledge management | 2016
Fangzhao Wu; Sixing Wu; Yongfeng Huang; Songfang Huang; Yong Qin
Sentiment domain adaptation is widely studied to tackle the domain-dependence problem in the sentiment analysis field. Existing domain adaptation methods usually train a sentiment classifier in a source domain and adapt it to the target domain using transfer learning techniques. However, when the sentiment feature distributions of the source and target domains are significantly different, the adaptation performance will heavily decline. In this paper, we propose a new sentiment domain adaptation approach by adapting the sentiment knowledge in general-purpose sentiment lexicons to a specific domain. Since the general sentiment words of general-purpose sentiment lexicons usually convey consistent sentiments in different domains, they have better generalization performance than a sentiment classifier trained in a source domain. In addition, we propose to extract various kinds of contextual sentiment knowledge from massive unlabeled samples in the target domain and formulate them as sentiment relations among sentiment expressions. These relations can propagate the sentiment information in general sentiment words to massive domain-specific sentiment expressions. We also propose a unified framework to incorporate these different kinds of sentiment knowledge and learn an accurate domain-specific sentiment classifier for the target domain. Moreover, we propose an efficient optimization algorithm to solve the model of our approach. Extensive experiments on benchmark datasets validate the effectiveness and efficiency of our approach.
conference on information and knowledge management | 2014
Fangzhao Wu; Jun Xu; Hang Li; Xin Jiang
This paper addresses the problem of post-processing of ranking in search, referred to as post ranking. Although important, no research seems to have been conducted on the problem, particularly with a principled approach, and in practice ad-hoc ways of performing the task are being adopted. This paper formalizes the problem as constrained optimization in which the constraints represent the post-processing rules and the objective function represents the trade-off between adherence to the original ranking and satisfaction of the rules. The optimization amounts to refining the original ranking result based on the rules. We further propose a specific probabilistic implementation of the general formalization on the basis of the Bradley-Terry model, which is theoretically sound, effective, and efficient. Our experimental results, using benchmark datasets and enterprise search dataset, show that the proposed method works much better than several baseline methods of utilizing rules.
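The trade-off between adherence to the original ranking and satisfaction of the rules can be illustrated with a small sketch. The Bradley-Terry model gives the probability that item a is preferred to item b as exp(s_a) / (exp(s_a) + exp(s_b)). How the paper combines this with the constraints is not stated in the abstract, so the objective below, the weight `c`, and the function name `post_rank` are assumptions.

```python
import math


def post_rank(orig_scores, rules, c=5.0, lr=0.1, iters=500):
    """Illustrative post ranking by gradient ascent (assumed objective).

    Maximises  -0.5 * sum_d (s_d - s0_d)^2  +  c * sum_{(a, b)} log P(a beats b)
    where s0 are the original ranking scores, each rule (a, b) says that a
    should be ranked above b, and P is the Bradley-Terry preference probability.
    """
    s = dict(orig_scores)
    for _ in range(iters):
        # Adherence term: pull every score back toward its original value.
        grad = {d: -(s[d] - orig_scores[d]) for d in s}
        for a, b in rules:
            p = 1.0 / (1.0 + math.exp(-(s[a] - s[b])))  # P(a beats b)
            grad[a] += c * (1.0 - p)
            grad[b] -= c * (1.0 - p)
        for d in s:
            s[d] += lr * grad[d]
    return sorted(s, key=s.get, reverse=True)


original = {"d1": 3.0, "d2": 2.0, "d3": 1.0}
reranked = post_rank(original, rules=[("d3", "d1")])
```

With a single rule that `d3` should outrank `d1`, the two affected scores move toward each other until the rule is satisfied, while `d2`, which no rule mentions, keeps its original score. A smaller `c` favours the original ranking more strongly.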
conference on information and knowledge management | 2013
Fangzhao Wu; Yangqiu Song; Shixia Liu; Yongfeng Huang; Zhenyu Liu
Correlated topical trend detection is very useful in analyzing public and social media influence. In this paper, we propose an algorithm that can both detect the correlation and discover the corresponding keywords that trigger the correlation. To detect the correlation, we use a projection vector to project two text streams onto the same space, and then use a least squares cost function to regress one text stream over the other with different time lags. To extract the corresponding keywords, we impose non-negative sparsity constraints over the projection parameters. In addition, we present an accelerated algorithm based on Nesterov's method to efficiently solve the optimization problem. In our experiments, we use both synthetic and real data sets to demonstrate the advantages and capabilities of the proposed algorithm over CCA on the follower link prediction problem.
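A simplified sketch of lagged least squares regression with non-negative sparsity constraints is given below. It assumes the second stream has already been reduced to a single time series `y`, whereas the paper projects both text streams, and it uses a standard accelerated proximal gradient scheme with Nesterov-style momentum. The function name `fit_lagged` and the toy data are hypothetical.

```python
def fit_lagged(X, y, lag, lam=0.01, iters=2000):
    """Minimise 0.5 * sum_t (X[t] . w - y[t + lag])^2 + lam * sum(w), with w >= 0.

    X is a list of keyword-count rows (one per time step) and y a list of
    values for the other stream. Returns the weights and the squared-error loss.
    """
    rows = X[:len(X) - lag]
    targets = y[lag:]
    n_feat = len(X[0])
    # 1 / ||X||_F^2 is a safe step size: it never exceeds 1 / lambda_max(X^T X).
    step = 1.0 / sum(v * v for r in rows for v in r)
    w = [0.0] * n_feat
    z = list(w)  # look-ahead point used by the momentum step
    t = 1.0
    for _ in range(iters):
        resid = [sum(a * b for a, b in zip(r, z)) - tgt for r, tgt in zip(rows, targets)]
        grad = [sum(res * r[k] for res, r in zip(resid, rows)) for k in range(n_feat)]
        # Gradient step, then projection that enforces sparsity and non-negativity.
        w_new = [max(0.0, zk - step * (gk + lam)) for zk, gk in zip(z, grad)]
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        z = [wn + (t - 1.0) / t_new * (wn - wo) for wn, wo in zip(w_new, w)]
        w, t = w_new, t_new
    loss = 0.5 * sum((sum(a * b for a, b in zip(r, w)) - tgt) ** 2
                     for r, tgt in zip(rows, targets))
    return w, loss


# Toy streams: y follows twice the count of keyword 0, one step later.
X = [[1, 0], [0, 1], [2, 1], [0, 0], [3, 1], [1, 2], [0, 1], [2, 0]]
y = [0, 2, 0, 4, 0, 6, 2, 0]
best_lag = min(range(3), key=lambda lag: fit_lagged(X, y, lag)[1])
weights, _ = fit_lagged(X, y, best_lag)
```

Comparing the loss across candidate lags picks out the delay at which the streams are correlated, and the non-zero entries of the weight vector point to the keywords that drive the correlation.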
meeting of the association for computational linguistics | 2017
Fangzhao Wu; Yongfeng Huang; Jun Yan
Domain adaptation is an important technology for handling the domain dependence problem in the sentiment analysis field. Existing methods usually rely on sentiment classifiers trained in source domains. However, their performance may heavily decline if the distributions of sentiment features in the source and target domains differ significantly. In this paper, we propose an active sentiment domain adaptation approach to handle this problem. Instead of the source domain sentiment classifiers, our approach adapts general-purpose sentiment lexicons to the target domain with the help of a small number of labeled samples, which are selected and annotated in an active learning mode, as well as the domain-specific sentiment similarities among words mined from unlabeled samples of the target domain. A unified model is proposed to fuse different types of sentiment information and train a sentiment classifier for the target domain. Extensive experiments on benchmark datasets show that our approach can train accurate sentiment classifiers with fewer labeled samples.