Weike Pan | Researchain

Publication

Featured research published by Weike Pan.

International Joint Conference on Artificial Intelligence | 2011

Transfer learning to predict missing ratings via heterogeneous user feedbacks

Weike Pan; Nathan Nan Liu; Evan Wei Xiang; Qiang Yang

Data sparsity due to missing ratings is a major challenge for collaborative filtering (CF) techniques in recommender systems. This is especially true for CF domains where the ratings are expressed numerically. We observe that, while we may lack the information in numerical ratings, we may have more data in the form of binary ratings. This is especially true when users can easily express themselves with their likes and dislikes for certain items. In this paper, we explore how to use the binary preference data expressed in the form of like/dislike to help reduce the impact of data sparsity of more expressive numerical ratings. We do this by transferring the rating knowledge from some auxiliary data source in binary form (that is, likes or dislikes), to a target numerical rating matrix. Our solution is to model both numerical ratings and like/dislike in a principled way, using a novel framework of Transfer by Collective Factorization (TCF). In particular, we construct the shared latent space collectively and learn the data-dependent effect separately. A major advantage of the TCF approach over previous collective matrix factorization (or bifactorization) methods is that we are able to capture the data-dependent effect when sharing the data-independent knowledge, so as to increase the overall quality of knowledge transfer. Experimental results demonstrate the effectiveness of TCF at various sparsity levels as compared to several state-of-the-art methods.

Knowledge-Based Systems | 2015

Adaptive Bayesian personalized ranking for heterogeneous implicit feedbacks

Weike Pan; Hao Zhong; Congfu Xu; Zhong Ming

Implicit feedbacks have recently received much attention in recommendation communities due to their close relationship with real industry problem settings. However, most works only exploit users’ homogeneous implicit feedbacks such as users’ transaction records from “bought” activities, and ignore the other type of implicit feedbacks like examination records from “browsed” activities. The latter are usually more abundant though they are associated with high uncertainty w.r.t. users’ true preferences. In this paper, we study a new recommendation problem called heterogeneous implicit feedbacks (HIF), where the fundamental challenge is the uncertainty of the examination records. As a response, we design a novel preference learning algorithm to learn a confidence for each uncertain examination record with the help of transaction records. Specifically, we generalize Bayesian personalized ranking (BPR), a seminal pairwise learning algorithm for homogeneous implicit feedbacks, and learn the confidence adaptively, which is thus called adaptive Bayesian personalized ranking (ABPR). ABPR has the merits of uncertainty reduction on examination records and accurate pairwise preference learning on implicit feedbacks. Experimental results on two public data sets show that ABPR is able to leverage uncertain examination records effectively, and can achieve better recommendation performance than the state-of-the-art algorithm on various ranking-oriented evaluation metrics.
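The abstract above builds on Bayesian personalized ranking (BPR). As a rough illustration only, the following sketch shows a plain BPR stochastic gradient step on "bought" records, written in pure Python. It does not reproduce the adaptive confidence learning of ABPR, and the names `bpr_step`, `train_bpr`, and `bought`, as well as the learning rate and regularization values, are assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    """One SGD ascent step on ln sigmoid(x_uij), where x_uij is the score
    of user u for item i minus the score for item j (i bought, j not)."""
    x_uij = dot(U[u], V[i]) - dot(U[u], V[j])
    g = 1.0 - sigmoid(x_uij)  # derivative of ln sigmoid(x) with respect to x
    for f in range(len(U[u])):
        uf, vif, vjf = U[u][f], V[i][f], V[j][f]
        U[u][f] += lr * (g * (vif - vjf) - reg * uf)
        V[i][f] += lr * (g * uf - reg * vif)
        V[j][f] += lr * (-g * uf - reg * vjf)


def train_bpr(bought, n_users, n_items, d=4, epochs=200, seed=0):
    """Train latent factors from a dict mapping user id to a set of bought items."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(n_items)]
    pairs = [(u, i) for u, items in bought.items() for i in sorted(items)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for u, i in pairs:
            # Sample an item the user has not bought as the "negative" item.
            j = rng.randrange(n_items)
            while j in bought[u]:
                j = rng.randrange(n_items)
            bpr_step(U, V, u, i, j)
    return U, V


# Toy data (hypothetical): user 0 bought items 0 and 1, user 1 bought items 2 and 3.
bought = {0: {0, 1}, 1: {2, 3}}
U, V = train_bpr(bought, n_users=2, n_items=4)
```

In a heterogeneous setting such as the one the abstract describes, "browsed" records would presumably enter as additional, less certain pairs; how their confidence is learned is specific to the paper and is not sketched here.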

Artificial Intelligence | 2013

Transfer learning in heterogeneous collaborative filtering domains

Weike Pan; Qiang Yang

A major challenge for collaborative filtering (CF) techniques in recommender systems is the data sparsity that is caused by missing and noisy ratings. This problem is even more serious for CF domains where the ratings are expressed numerically, e.g. as 5-star grades. We assume the 5-star ratings are unordered bins instead of ordinal relative preferences. We observe that, while we may lack the information in numerical ratings, we sometimes have additional auxiliary data in the form of binary ratings. This is especially true given that users can easily express themselves with their preferences expressed as likes or dislikes for items. In this paper, we explore how to use these binary auxiliary preference data to help reduce the impact of data sparsity for CF domains expressed in numerical ratings. We solve this problem by transferring the rating knowledge from some auxiliary data source in binary form (that is, likes or dislikes), to a target numerical rating matrix. In particular, our solution is to model both the numerical ratings and ratings expressed as like or dislike in a principled way. We present a novel framework of Transfer by Collective Factorization (TCF), in which we construct a shared latent space collectively and learn the data-dependent effect separately. A major advantage of the TCF approach over the previous bilinear method of collective matrix factorization is that we are able to capture the data-dependent effect when sharing the data-independent knowledge. This allows us to increase the overall quality of knowledge transfer. We present extensive experimental results to demonstrate the effectiveness of TCF at various sparsity levels, and show improvements of our approach as compared to several state-of-the-art methods.

Mining Text Data | 2012

Transfer Learning for Text Mining

Weike Pan; Erheng Zhong; Qiang Yang

Over the years, transfer learning has received much attention in machine learning research and practice. Researchers have found that a major bottleneck associated with machine learning and text mining is the lack of high-quality annotated examples to help train a model. In response, transfer learning offers an attractive solution for this problem. Various transfer learning methods are designed to extract the useful knowledge from different but related auxiliary domains. In its connection to text mining, transfer learning has found novel and useful applications. In this chapter, we will review some of the most recent developments in transfer learning for text mining, explain related algorithms in detail, and project future developments of this field. We focus on two important topics: cross-domain text document classification and heterogeneous transfer learning that uses labeled text documents to help classify images.

SIAM International Conference on Data Mining | 2013

CoFiSet: Collaborative filtering via learning pairwise preferences over item-sets

Li Chen; Weike Pan

Collaborative filtering aims to make use of users’ feedbacks to improve the recommendation performance, and has been deployed in various industry recommender systems. Some recent works have switched from exploiting explicit feedbacks of numerical ratings to implicit feedbacks like browsing and shopping records, since such data are more abundant and easier to collect. One fundamental challenge of leveraging implicit feedbacks is the lack of negative feedbacks, because there are only some observed relatively “positive” feedbacks, making it difficult to learn a prediction model. Previous works address this challenge by proposing some pointwise or pairwise preference assumptions on items. However, such assumptions with respect to items may not always hold; for example, a user may dislike a bought item or like an item not bought yet. In this paper, we propose a new and relaxed assumption of pairwise preferences over item-sets, which defines a user’s preference on a set of items (item-set) instead of on a single item. The relaxed assumption can give us more accurate pairwise preference relationships. With this assumption, we further develop a general algorithm called CoFiSet (collaborative filtering via learning pairwise preferences over item-sets). Experimental results show that CoFiSet performs better than several state-of-the-art methods on various ranking-oriented evaluation metrics on two real-world data sets. Furthermore, CoFiSet is very efficient as shown by both the time complexity and CPU time.
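To make the item-set assumption concrete, here is a small sketch under one plausible reading: a user's preference for an item-set is taken as the average of the per-item scores, and the pairwise term compares a set of bought items with a set of items not bought. The averaging rule, the function names, and the toy vectors are assumptions for illustration and are not taken from the abstract.

```python
import math


def item_score(user_vec, item_vec):
    """Inner-product score of one user for one item."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


def itemset_score(user_vec, item_vecs, items):
    """Assumed set-level preference: mean of the per-item scores."""
    return sum(item_score(user_vec, item_vecs[i]) for i in items) / len(items)


def itemset_pair_loglik(user_vec, item_vecs, bought_set, unbought_set):
    """ln sigmoid(score(bought_set) - score(unbought_set))."""
    diff = (itemset_score(user_vec, item_vecs, bought_set)
            - itemset_score(user_vec, item_vecs, unbought_set))
    return -math.log1p(math.exp(-diff))


# Hypothetical toy data: items 0 and 1 were bought, items 2 and 3 were not.
user_vec = [1.0, 0.0]
item_vecs = {0: [2.0, 0.0], 1: [-1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.0]}

bought_score = itemset_score(user_vec, item_vecs, [0, 1])
unbought_score = itemset_score(user_vec, item_vecs, [2, 3])
loglik = itemset_pair_loglik(user_vec, item_vecs, [0, 1], [2, 3])
```

In this toy case the bought item 1 scores lower than the unbought item 3 on its own, yet the bought set still scores higher than the unbought set. That is the kind of situation a set-level assumption tolerates and a strict item-level pairwise assumption does not.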

IEEE Intelligent Systems | 2014

Interaction-Rich Transfer Learning for Collaborative Filtering with Heterogeneous User Feedback

Weike Pan; Zhong Ming

A real recommender system can usually make use of more than one type of user feedback (for example, numerical ratings and binary ratings) to learn a user's true preferences. Recent work has proposed a transfer learning algorithm called transfer by collective factorization (TCF) to exploit such heterogeneous user feedback. TCF works by sharing data-independent knowledge and modeling data-dependent effects simultaneously. However, TCF is a batch algorithm and updates the model parameters only once after scanning all data, which might not be efficient enough for real systems. This article proposes a novel and efficient transfer learning algorithm called interaction-rich transfer by collective factorization (iTCF), which extends the efficient collective matrix factorization (CMF) algorithm by providing more interactions between the user-specific latent features. The assumption underlying iTCF is that the predictability of the same user's rating behaviors in two related data sets is likely to be similar. Considering the shared predictability, the authors derive novel update rules for iTCF in a stochastic algorithmic framework. The advantages of iTCF include its efficiency compared with TCF, and its higher prediction accuracy compared with CMF. Experimental results on three real-world datasets show the effectiveness of iTCF over the state-of-the-art methods.

IEEE Intelligent Systems | 2014

An adaptive fusion algorithm for spam detection

Congfu Xu; Baojun Su; Yunbiao Cheng; Weike Pan; Li Chen

Spam detection has become a critical component in various online systems such as email services, advertising engines, social media sites, and so on. Here, the authors use email services as an example, and present an adaptive fusion algorithm for spam detection (AFSD), which is a general, content-based approach and can be applied to nonemail spam detection tasks with little additional effort. The proposed algorithm uses n-grams of nontokenized text strings to represent an email, introduces a link function to make the prediction scores of online learners more comparable, trains the online learners in a mistake-driven manner via thick thresholding to obtain highly competitive online learners, and designs update rules to adaptively integrate the online learners to capture different aspects of spam. The prediction performance of AFSD is studied on five public competition datasets and on one industry dataset, with the algorithm achieving significantly better results than several state-of-the-art approaches, including the champion solutions of the corresponding competitions.

International Joint Conference on Artificial Intelligence | 2011

Source-selection-free transfer learning

Evan Wei Xiang; Sinno Jialin Pan; Weike Pan; Jian Su; Qiang Yang

Transfer learning addresses the problem that labeled training data are insufficient to produce a high-performance model. Typically, given a target learning task, most transfer learning approaches require the designers to select one or more auxiliary tasks as sources. However, how to automatically select the right source data to enable effective knowledge transfer is still an unsolved problem, which limits the applicability of transfer learning. In this paper, we go one step further and propose a novel transfer learning framework, known as source-selection-free transfer learning (SSFTL), to free users from the need to select source domains. Instead of asking the users for source and target data pairs, as traditional transfer learning does, SSFTL turns to some online information sources such as the World Wide Web or Wikipedia for help. The source data for transfer learning can be hidden somewhere within this large online information source, but the users do not know where they are. Based on the online information sources, we train a large number of classifiers. Then, given a target task, SSFTL builds a bridge between the labels of the potential source candidates and the target domain data, using some large online social media with tag cloud as a label translator. An added advantage of SSFTL is that, unlike many previous transfer learning approaches, which are difficult to scale up to the Web scale, SSFTL is highly scalable and can offload much of the training work to an offline stage. We demonstrate the effectiveness and efficiency of SSFTL through extensive experiments on several real-world datasets in text classification.

IEEE Intelligent Systems | 2017

Collaborative Recommendation with Multiclass Preference Context

Weike Pan; Zhong Ming

Factorization- and neighborhood-based methods have been recognized as state-of-the-art approaches for collaborative recommendation tasks. In this article, the authors take user ratings as categorical multiclass preferences and propose a novel method called matrix factorization with multiclass preference context (MF-MPC), which integrates an enhanced neighborhood based on the assumption that users with similar past multiclass preferences (instead of one-class preferences in SVD++) will have similar tastes in the future. The main merit of MF-MPC is its ability to make use of the multiclass preference context in the factorization framework in a fine-grained manner and thus inherit the advantages of those two methods. Experimental results on three real-world datasets show that their solution can perform significantly better than factorization-based methods, neighborhood-based methods, and integrated methods with a one-class preference context.

Conference on Information and Knowledge Management | 2014

HGMF: Hierarchical Group Matrix Factorization for Collaborative Recommendation

Xin Wang; Weike Pan; Congfu Xu

Matrix factorization is one of the most powerful techniques in collaborative filtering, which models the (user, item) interactions behind historical explicit or implicit feedbacks. However, plain matrix factorization may not be able to uncover structural correlations among users and items, such as communities and taxonomies. As a response, we design a novel algorithm, i.e., hierarchical group matrix factorization (HGMF), in order to explore and model the structural correlations among users and items in a principled way. Specifically, we first define four types of correlations, including (user, item), (user, item group), (user group, item) and (user group, item group); we then extend plain matrix factorization with a hierarchical group structure; finally, we design a novel clustering algorithm to mine the hidden structural correlations. In the experiments, we study the effectiveness of our HGMF for both rating prediction and item recommendation, and find that it is better than some state-of-the-art methods on several real-world data sets.
