Nathan Nan Liu
Hong Kong University of Science and Technology
Publication
Featured research published by Nathan Nan Liu.
international conference on data mining | 2008
Rong Pan; Yunhong Zhou; Bin Cao; Nathan Nan Liu; Rajan Lukose; Martin B. Scholz; Qiang Yang
Many applications of collaborative filtering (CF), such as news item recommendation and bookmark recommendation, are most naturally thought of as one-class collaborative filtering (OCCF) problems. In these problems, the training data usually consist simply of binary data reflecting a user's action or inaction, such as page visitation in the case of news item recommendation or webpage bookmarking in the bookmarking scenario. This kind of data is usually extremely sparse (only a small fraction are positive examples), so ambiguity arises in the interpretation of the non-positive examples. Negative examples and unlabeled positive examples are mixed together, and we are typically unable to distinguish them. For example, we cannot really attribute a user not bookmarking a page to a lack of interest or a lack of awareness of the page. Previous research addressing this one-class problem only considered it as a classification task. In this paper, we consider the one-class problem under the CF setting. We propose two frameworks to tackle OCCF. One is based on weighted low-rank approximation; the other is based on negative example sampling. The experimental results show that our approaches significantly outperform the baselines.
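The weighted low-rank approximation idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the alternating least squares solver, the regularization strength, and the uniform weight of 0.1 on unlabeled entries are all assumptions chosen for illustration. Positive entries get full weight, while unlabeled entries are treated as weak negatives.

```python
import numpy as np

def weighted_als(R, W, rank=2, lam=0.1, iters=10, seed=0):
    """Fit R ~ U @ V.T by minimizing sum(W * (R - U V^T)^2) + lam * (|U|^2 + |V|^2).

    Hypothetical helper: alternating least squares is one possible solver.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    U = rng.normal(scale=0.1, size=(m, rank))
    V = rng.normal(scale=0.1, size=(n, rank))
    reg = lam * np.eye(rank)
    for _ in range(iters):
        # Update each user vector with the item factors held fixed.
        for u in range(m):
            Wu = W[u][:, None]
            U[u] = np.linalg.solve(V.T @ (Wu * V) + reg, V.T @ (W[u] * R[u]))
        # Update each item vector with the user factors held fixed.
        for i in range(n):
            Wi = W[:, i][:, None]
            V[i] = np.linalg.solve(U.T @ (Wi * U) + reg, U.T @ (W[:, i] * R[:, i]))
    return U, V

def weighted_loss(R, W, U, V, lam=0.1):
    """Regularized weighted squared error, the objective minimized above."""
    err = R - U @ V.T
    return float(np.sum(W * err ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2)))

# Toy one-class data: 1 = observed positive, 0 = unlabeled (not a known negative).
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
# Assumed weighting: full confidence in positives, low confidence in unlabeled cells.
W = np.where(R > 0, 1.0, 0.1)
U, V = weighted_als(R, W)
```

Scores for ranking would then be read from `U @ V.T`; how the weights on unlabeled entries should be chosen is the design question, and the constant used here is only a placeholder.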
IEEE Intelligent Systems | 2010
Qiang Yang; Zhi-Hua Zhou; Wenji Mao; Wei Li; Nathan Nan Liu
In recent years, social behavioral data have been expanding exponentially due to the tremendous success of various outlets on the social Web (aka Web 2.0) such as Facebook, Digg, Twitter, Wikipedia, and Delicious. As a result, there's a need for social learning to support the discovery, analysis, and modeling of human social behavioral data. The goal is to discover social intelligence, which encompasses a spectrum of knowledge that characterizes human interaction, communication, and collaboration. The social Web has thus become fertile ground for machine learning and data mining research. This special issue gathers state-of-the-art research in social learning and is devoted to exhibiting some of the best representative work in this area.
conference on information and knowledge management | 2011
Ou Jin; Nathan Nan Liu; Kai Zhao; Yong Yu; Qiang Yang
With the rapid growth of social Web applications such as Twitter and online advertisements, the task of understanding short texts is becoming more and more important. Most traditional text mining techniques are designed to handle long text documents. For short text messages, many of the existing techniques are not effective due to the sparseness of text representations. To understand short messages, we observe that it is often possible to find topically related long texts, which can be used as auxiliary data when mining the target short text data. In this article, we present a novel approach to cluster short text messages via transfer learning from auxiliary long text data. We show that while some previous work enhances short text clustering with related long texts, most of it ignores the semantic and topical inconsistencies between the target and auxiliary data, which hurts clustering performance. To accommodate the possible inconsistency between source and target data, we propose a novel topic model, the Dual Latent Dirichlet Allocation (DLDA) model, which jointly learns two sets of topics on short and long texts and couples the topic parameters to cope with the potential inconsistency between data sets. We demonstrate through large-scale clustering experiments on both advertisements and Twitter data that we can obtain superior performance over several state-of-the-art techniques for clustering short text documents.
conference on information and knowledge management | 2009
Nathan Nan Liu; Min Zhao; Qiang Yang
A central goal of collaborative filtering (CF) is to rank items by their utilities with respect to individual users in order to make personalized recommendations. Traditionally, this is often formulated as a rating prediction problem. However, it is more desirable for CF algorithms to address the ranking problem directly without going through an extra rating prediction step. In this paper, we propose the probabilistic latent preference analysis (pLPA) model for ranking predictions by directly modeling user preferences with respect to a set of items rather than the rating scores on individual items. From a user's observed ratings, we extract his preferences in the form of pairwise comparisons of items, which are modeled by a mixture distribution based on the Bradley-Terry model. An EM algorithm for fitting the corresponding latent class model as well as a method for predicting the optimal ranking are described. Experimental results on real-world data sets demonstrated the superiority of the proposed method over several existing CF algorithms based on rating predictions in terms of the ranking performance measure NDCG.
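Two ingredients named in this abstract, extracting pairwise comparisons from ratings and the Bradley-Terry model, can be sketched as follows. This is a simplified illustration rather than the pLPA model itself: it shows a single Bradley-Terry component instead of the mixture, it omits the EM fitting step, and the function names and the log-scale score parameterization are assumptions.

```python
import math
from itertools import combinations

def pairwise_preferences(ratings):
    """Turn one user's {item: rating} dict into (preferred, other) pairs.

    Items with equal ratings carry no preference and are skipped.
    """
    pairs = []
    for (a, ra), (b, rb) in combinations(ratings.items(), 2):
        if ra > rb:
            pairs.append((a, b))
        elif rb > ra:
            pairs.append((b, a))
    return pairs

def bradley_terry_prob(score_i, score_j):
    """P(item i is preferred over item j) under Bradley-Terry.

    Scores are on a log scale, so this equals
    exp(score_i) / (exp(score_i) + exp(score_j)).
    """
    return 1.0 / (1.0 + math.exp(score_j - score_i))

# Hypothetical user who rated three items.
pairs = pairwise_preferences({"a": 5, "b": 3, "c": 3})
```

Here `pairs` holds the comparisons "a over b" and "a over c"; the tie between b and c produces no pair. In a latent class model of the kind the abstract describes, each class would hold its own item scores and the per-class probabilities would be mixed.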
international joint conference on artificial intelligence | 2011
Weike Pan; Nathan Nan Liu; Evan Wei Xiang; Qiang Yang
Data sparsity due to missing ratings is a major challenge for collaborative filtering (CF) techniques in recommender systems. This is especially true for CF domains where the ratings are expressed numerically. We observe that, while we may lack information in the form of numerical ratings, we may have more data in the form of binary ratings. This is especially true when users can easily express themselves with their likes and dislikes for certain items. In this paper, we explore how to use binary preference data expressed in the form of like/dislike to help reduce the impact of data sparsity on more expressive numerical ratings. We do this by transferring the rating knowledge from some auxiliary data source in binary form (that is, likes or dislikes) to a target numerical rating matrix. Our solution is to model both numerical ratings and like/dislike in a principled way, using a novel framework of Transfer by Collective Factorization (TCF). In particular, we construct the shared latent space collectively and learn the data-dependent effect separately. A major advantage of the TCF approach over previous collective matrix factorization (or bifactorization) methods is that we are able to capture the data-dependent effect when sharing the data-independent knowledge, so as to increase the overall quality of knowledge transfer. Experimental results demonstrate the effectiveness of TCF at various sparsity levels as compared to several state-of-the-art methods.
ubiquitous computing | 2008
Derek Hao Hu; Sinno Jialin Pan; Vincent Wenchen Zheng; Nathan Nan Liu; Qiang Yang
Recognizing and understanding the activities of people from sensor readings is an important task in ubiquitous computing. Activity recognition is also a particularly difficult task because of the inherent uncertainty and complexity of the data collected by the sensors. Many researchers have tackled this problem in an overly simplistic setting by assuming that users often carry out single activities one at a time or multiple activities consecutively, one after another. However, so far there has been no formal exploration of the degree to which humans perform concurrent or interleaving activities, and no thorough study of how to detect multiple goals in a real-world scenario. In this article, we ask the fundamental questions of whether users often carry out multiple concurrent and interleaving activities or single activities in their daily life, and if so, whether such complex behavior can be detected accurately using sensors. We define several classes of complexity levels under a goal taxonomy that describe different granularities of activities, and relate the recognition accuracy to different complexity levels or granularities. We present a theoretical framework for recognizing multiple concurrent and interleaving activities, and evaluate the framework in several real-world ubiquitous computing environments.
international joint conference on artificial intelligence | 2011
Fangtao Li; Nathan Nan Liu; Hongwei Jin; Kai Zhao; Qiang Yang; Xiaoyan Zhu
Traditional sentiment analysis mainly considers binary classifications of reviews, but in many real-world sentiment classification problems, non-binary review ratings are more useful. This is especially true when consumers wish to compare two products, neither of which is negative. Previous work has addressed this problem by extracting various features from the review text for learning a predictor. Since the same word may have different sentiment effects when used by different reviewers on different products, we argue that it is necessary to model such reviewer- and product-dependent effects in order to predict review ratings more accurately. In this paper, we propose a novel learning framework to incorporate reviewer and product information into the text-based learner for rating prediction. The reviewer, product and text features are modeled as a three-dimensional tensor. Tensor factorization techniques can then be employed to reduce the data sparsity problems. We perform extensive experiments to demonstrate the effectiveness of our model, which significantly improves on state-of-the-art methods, especially for reviews with unpopular products and inactive reviewers.
conference on recommender systems | 2011
Nathan Nan Liu; Xiangrui Meng; Chao Liu; Qiang Yang
Recommender systems have to deal with the cold start problem as new users and/or items are always present. Rating elicitation is a common approach for handling cold start. However, a principled model for guiding how to select the most useful ratings is still lacking. In this paper, we propose a principled approach to identify representative users and items using representative-based matrix factorization. Not only do we show that the selected representatives are superior to other competing methods in terms of achieving a good balance between coverage and diversity, but we also demonstrate that ratings on the selected representatives are much more useful for making recommendations (about 10% better than competing methods). In addition to illustrating how representatives help solve the cold start problem, we also argue that the problem of finding representatives is itself an important problem that deserves further investigation, for both its practical value and its technical challenges.
conference on information and knowledge management | 2010
Nathan Nan Liu; Evan Wei Xiang; Min Zhao; Qiang Yang
Most collaborative filtering algorithms are based on certain statistical models of user interests built from either explicit feedback (e.g., ratings, votes) or implicit feedback (e.g., clicks, purchases). Explicit feedback is more precise but more difficult to collect from users, while implicit feedback is much easier to collect though less accurate in reflecting user preferences. In the existing literature, separate models have been developed for each of these two forms of user feedback due to their heterogeneous representation. However, in most real-world recommender systems both explicit and implicit user feedback are abundant and could potentially complement each other. It is desirable to be able to unify these two heterogeneous forms of user feedback in order to generate more accurate recommendations. In this work, we developed matrix factorization models that can be trained from explicit and implicit feedback simultaneously. Experimental results on multiple datasets showed that our algorithm could effectively combine these two forms of heterogeneous user feedback to improve recommendation quality.
ACM Transactions on Intelligent Systems and Technology | 2013
Nathan Nan Liu; Luheng He; Min Zhao
Most existing collaborative filtering models only consider the use of user feedback (e.g., ratings) and metadata (e.g., content, demographics). However, in most real-world recommender systems, context information, such as time and social networks, is also a very important factor that could be considered in order to produce more accurate recommendations. In this work, we address several challenges for the context-aware movie recommendation tasks in CAMRa 2010: (1) how to combine multiple heterogeneous forms of user feedback? (2) how to cope with dynamic user and item characteristics? (3) how to capture and utilize social connections among users? For the first challenge, we propose a novel ranking-based matrix factorization model to aggregate explicit and implicit user feedback. For the second challenge, we extend this model to a sequential matrix factorization model to enable time-aware parametrization. Finally, we introduce a network regularization function to constrain user parameters based on social connections. To the best of our knowledge, this is the first study that investigates the collective modeling of social and temporal dynamics. Experiments on the CAMRa 2010 dataset demonstrated clear improvements over many baselines.
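The abstract does not state the form of the network regularization function. One common choice, shown here purely as an assumption and not as the authors' formulation, penalizes the squared distance between the latent vectors of socially connected users, so that friends are pulled toward similar parameters. The function name and the edge-list representation are hypothetical.

```python
import numpy as np

def network_regularizer(P, edges):
    """Sum of squared distances between latent vectors of connected users.

    P is a (num_users, rank) matrix of user factors; edges is a list of
    (u, v) index pairs, one per social connection. Assumed form only.
    """
    return float(sum(np.sum((P[u] - P[v]) ** 2) for u, v in edges))

# Three hypothetical users in a 2-dimensional latent space, connected in a chain.
P = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0]])
penalty = network_regularizer(P, [(0, 1), (1, 2)])
```

In a training objective, a term like this would be scaled by a tunable coefficient and added to the factorization loss.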