Hongzhi Yin
University of Queensland
Publication
Featured research published by Hongzhi Yin.
knowledge discovery and data mining | 2015
Weiqing Wang; Hongzhi Yin; Ling Chen; Yizhou Sun; Shazia Wasim Sadiq; Xiaofang Zhou
With the rapid development of location-based social networks (LBSNs), spatial item recommendation has become an important means to help people discover attractive and interesting venues and events, especially when users travel out of town. However, this recommendation is very challenging compared to traditional recommender systems. A user can visit only a limited number of spatial items, leading to a very sparse user-item matrix. Most of the items visited by a user are located within a short distance from where he/she lives, which makes it hard to recommend items when the user travels to a faraway place. Moreover, user interests and behavior patterns may vary dramatically across different geographical regions. In light of this, we propose Geo-SAGE, a geographical sparse additive generative model for spatial item recommendation. Geo-SAGE considers both users' personal interests and the preference of the crowd in the target region, by exploiting both the co-occurrence pattern of spatial items and the content of spatial items. To further alleviate the data sparsity issue, Geo-SAGE exploits the geographical correlation by smoothing the crowd's preferences over a well-designed spatial index structure called a spatial pyramid. We conduct extensive experiments, and the experimental results clearly demonstrate that our Geo-SAGE model outperforms the state of the art.
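The abstract above names a spatial pyramid but does not describe its layout. As a rough illustration only, the sketch below assumes a quadtree-like pyramid in which level h splits a bounding box into 2^h × 2^h equal cells; the function name `pyramid_cells` and its parameters are hypothetical and not taken from the paper. It shows how a single location maps to one cell per level, which is the kind of coarse-to-fine hierarchy over which regional preferences could be smoothed.

```python
def pyramid_cells(lat, lon, bbox, levels):
    """Return the (level, row, col) cell containing a point at each pyramid level.

    bbox is (min_lat, min_lon, max_lat, max_lon). Level h is assumed to split
    the bounding box into 2**h x 2**h equal cells (an assumption of this sketch).
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
        raise ValueError("point lies outside the bounding box")
    cells = []
    for level in range(levels):
        n = 2 ** level  # number of cells per side at this level
        # Clamp so that points on the upper boundary fall into the last cell.
        row = min(int((lat - min_lat) / (max_lat - min_lat) * n), n - 1)
        col = min(int((lon - min_lon) / (max_lon - min_lon) * n), n - 1)
        cells.append((level, row, col))
    return cells
```

Cells at coarser levels cover more check-ins, so a sparse fine-grained cell could borrow statistical strength from its ancestors; how Geo-SAGE actually combines the levels is not stated in the abstract.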
ACM Transactions on Knowledge Discovery From Data | 2015
Hongzhi Yin; Bin Cui; Ling Chen; Zhiting Hu; Chengqi Zhang
This article proposes LA-LDA, a location-aware probabilistic generative model that exploits location-based ratings to model user profiles and produce recommendations. Most existing recommendation models do not consider the spatial information of users or items; however, LA-LDA supports three classes of location-based ratings, namely spatial user ratings for nonspatial items, nonspatial user ratings for spatial items, and spatial user ratings for spatial items. LA-LDA consists of two components, ULA-LDA and ILA-LDA, which are designed to take into account user and item location information, respectively. The component ULA-LDA explicitly incorporates and quantifies the influence from local public preferences to produce recommendations by considering user home locations, whereas the component ILA-LDA recommends items that are closer in both taste and travel distance to the querying users by capturing item co-occurrence patterns, as well as item location co-occurrence patterns. The two components of LA-LDA can be applied either separately or collectively, depending on the available types of location-based ratings. To demonstrate the applicability and flexibility of the LA-LDA model, we deploy it to both top-k recommendation and cold start recommendation scenarios. Experimental evidence on large-scale real-world data, including the data from Gowalla (a location-based social network), DoubanEvent (an event-based social network), and MovieLens (a movie recommendation system), reveals that LA-LDA models user profiles more accurately by outperforming existing recommendation models for top-k recommendation and the cold start problem.
ACM Transactions on Information Systems | 2016
Hongzhi Yin; Bin Cui; Xiaofang Zhou; Weiqing Wang; Zi Huang; Shazia Wasim Sadiq
Point-of-Interest (POI) recommendation has become an important means to help people discover attractive and interesting places, especially when users travel out of town. However, the extreme sparsity of a user-POI matrix creates a severe challenge. To cope with this challenge, we propose a unified probabilistic generative model, the Topic-Region Model (TRM), to simultaneously discover the semantic, temporal, and spatial patterns of users’ check-in activities, and to model their joint effect on users’ decision making for selection of POIs to visit. To demonstrate the applicability and flexibility of TRM, we investigate how it supports two recommendation scenarios in a unified way, that is, hometown recommendation and out-of-town recommendation. TRM effectively overcomes data sparsity by the complementarity and mutual enhancement of the diverse information associated with users’ check-in activities (e.g., check-in content, time, and location) in the processes of discovering heterogeneous patterns and producing recommendations. To support real-time POI recommendations, we further extend the TRM model to an online learning model, TRM-Online, to track changing user interests and speed up the model training. In addition, based on the learned model, we propose a clustering-based branch and bound algorithm (CBB) to prune the POI search space and facilitate fast retrieval of the top-k recommendations. We conduct extensive experiments to evaluate the performance of our proposals on two real-world datasets, including recommendation effectiveness, overcoming the cold-start problem, recommendation efficiency, and model-training efficiency. The experimental results demonstrate the superiority of our TRM models, especially TRM-Online, compared with state-of-the-art competitive methods, by making more effective and efficient mobile recommendations. 
In addition, we study the importance of each type of pattern in the two recommendation scenarios, respectively, and find that exploiting temporal patterns is most important for the hometown recommendation scenario, while the semantic patterns play a dominant role in improving the recommendation effectiveness for out-of-town users.
conference on information and knowledge management | 2016
Min Xie; Hongzhi Yin; Hao Wang; Fanjiang Xu; Weitong Chen; Sen Wang
With the rapid prevalence of smart mobile devices and the dramatic proliferation of location-based social networks (LBSNs), location-based recommendation has become an important means to help people discover attractive and interesting points of interest (POIs). However, the extreme sparsity of the user-POI matrix and the cold-start issue create severe challenges, causing CF-based methods to degrade significantly in their recommendation performance. Moreover, location-based recommendation requires spatiotemporal context awareness and dynamic tracking of the user's latest preferences in a real-time manner. To address these challenges, we build on recent advances in embedding learning techniques and propose a generic graph-based embedding model, called GE. GE jointly captures the sequential effect, geographical influence, temporal cyclic effect and semantic effect in a unified way by embedding the four corresponding relational graphs (POI-POI, POI-Region, POI-Time and POI-Word) into a shared low-dimensional space. Then, to support real-time recommendation, we develop a novel time-decay method to dynamically compute the user's latest preferences based on the embeddings of his/her checked-in POIs learnt in the latent space. We conduct extensive experiments to evaluate the performance of our model on two real large-scale datasets, and the experimental results show its superiority over other competitors, especially in recommending cold-start POIs. In addition, we study the contribution of each factor to improving location-based recommendation and find that both the sequential effect and the temporal cyclic effect play more important roles than geographical influence and the semantic effect.
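The time-decay idea in the abstract above can be illustrated with a small sketch. The abstract does not give the decay function, so the exponential form, the `decay_rate` parameter and the function name `user_preference` are all assumptions made here for illustration, not the formulation used in GE.

```python
import math

def user_preference(checkins, poi_vectors, now, decay_rate=0.1):
    """Combine the embeddings of visited POIs into one user vector.

    checkins:    list of (poi_id, timestamp) pairs
    poi_vectors: dict mapping poi_id to a list of floats (same length for all)
    decay_rate:  hypothetical hyperparameter; exponential decay is an
                 assumption of this sketch, not taken from the paper
    """
    dim = len(next(iter(poi_vectors.values())))
    user_vec = [0.0] * dim
    for poi_id, ts in checkins:
        # Older check-ins receive exponentially smaller weights.
        weight = math.exp(-decay_rate * (now - ts))
        for i, x in enumerate(poi_vectors[poi_id]):
            user_vec[i] += weight * x
    return user_vec
```

Because the user vector is a weighted sum of fixed POI embeddings, it can be recomputed whenever a new check-in arrives without retraining the embeddings, which is one plausible reading of how such a scheme supports real-time recommendation.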
conference on information and knowledge management | 2015
Hongzhi Yin; Xiaofang Zhou; Yingxia Shao; Hao Wang; Shazia Wasim Sadiq
Point-of-Interest (POI) recommendation has become an important means to help people discover attractive and interesting locations, especially when users travel out of town. However, the extreme sparsity of the user-POI matrix creates a severe challenge. To cope with this challenge, a growing line of research has exploited the temporal effect, geographical-social influence, content effect and word-of-mouth effect. However, current research lacks an integrated analysis of the joint effect of the above factors to deal with the issue of data sparsity, especially in the out-of-town recommendation scenario, which has been ignored by most existing work. In light of the above, we propose a joint probabilistic generative model to mimic user check-in behaviors in a process of decision making, which strategically integrates the above factors to effectively overcome data sparsity, especially for out-of-town users. To demonstrate the applicability and flexibility of our model, we investigate how it supports two recommendation scenarios in a unified way, i.e., home-town recommendation and out-of-town recommendation. We conduct extensive experiments to evaluate the performance of our model on two real large-scale datasets in terms of both recommendation effectiveness and efficiency, and the experimental results show its superiority over other competitors.
international conference on data engineering | 2013
Hongzhi Yin; Bin Cui; Hua Lu; Yuxin Huang; Junjie Yao
Web 2.0 users generate and spread huge amounts of messages in online social media. Such user-generated content is a mixture of temporal topics (e.g., breaking events) and stable topics (e.g., user interests). Due to their different natures, it is important and useful to distinguish temporal topics from stable topics in social media. However, such discrimination is very challenging because the user-generated texts in social media are very short and thus lack useful linguistic features for precise analysis using traditional approaches. In this paper, we propose a novel solution to detect both stable and temporal topics simultaneously from social media data. Specifically, a unified user-temporal mixture model is proposed to distinguish temporal topics from stable topics. To improve this model's performance, we design a regularization framework that exploits prior spatial information in a social network, as well as a burst-weighted smoothing scheme that exploits temporal prior information in the time dimension. We conduct extensive experiments to evaluate our proposal on two real data sets obtained from Del.icio.us and Twitter. The experimental results verify that our mixture model is able to distinguish temporal topics from stable topics in a single detection process. Our mixture model enhanced with the spatial regularization and the burst-weighted smoothing scheme significantly outperforms competitor approaches in terms of topic detection accuracy and discrimination between stable and temporal topics.
IEEE Transactions on Knowledge and Data Engineering | 2017
Hongzhi Yin; Weiqing Wang; Hao Wang; Ling Chen; Xiaofang Zhou
Point-of-interest (POI) recommendation has become an important way to help people discover attractive and interesting places, especially when they travel out of town. However, the extreme sparsity of the user-POI matrix and cold-start issues severely hinder the performance of collaborative filtering-based methods. Moreover, user preferences may vary dramatically across geographical regions due to different urban compositions and cultures. To address these challenges, we build on recent advances in deep learning and propose a Spatial-Aware Hierarchical Collaborative Deep Learning model (SH-CDL). The model jointly performs deep representation learning for POIs from heterogeneous features and hierarchically additive representation learning for spatial-aware personal preferences. To combat data sparsity in spatial-aware user preference modeling, both the collective preferences of the public in a given target region and the personal preferences of the user in adjacent regions are exploited in the form of social regularization and spatial smoothing. To deal with the multimodal heterogeneous features of the POIs, we introduce a late feature fusion strategy into our SH-CDL model. The extensive experimental analysis shows that our proposed model outperforms the state-of-the-art recommendation models, especially in out-of-town and cold-start recommendation scenarios.
international conference on data engineering | 2016
Bolong Zheng; Kai Zheng; Xiaokui Xiao; Han Su; Hongzhi Yin; Xiaofang Zhou; Guohui Li
It is nowadays quite common for road networks to have textual content on the vertices, which describes auxiliary information (e.g., business, traffic, etc.) associated with the vertex. In such road networks, which are modelled as weighted undirected graphs, each vertex is associated with one or more keywords, and each edge is assigned a weight, which can be its physical length or travelling time. In this paper, we study the problem of keyword-aware continuous k nearest neighbour (KCkNN) search on road networks, which computes the k nearest vertices that contain the query keywords issued by a moving object and maintains the results continuously as the object moves on the road network. Reducing the query processing costs in terms of computation and communication has attracted considerable attention in the database community, with interesting techniques proposed. This paper proposes a framework, called a Labelling AppRoach for Continuous kNN query (LARC), on road networks to cope with KCkNN queries efficiently. First, we build a pivot-based reverse label index and a keyword-based pivot tree index to improve the efficiency of keyword-aware k nearest neighbour (KkNN) search by avoiding massive network traversals and sequential probing of keywords. To reduce the frequency of unnecessary result updates, we develop the concepts of dominance interval and dominance region on road networks, which share a similar intuition with the safe region for processing continuous queries in Euclidean space but are more complicated and thus require a more dedicated design. For high-frequency keywords, we resolve the dominance interval when the query results change. In addition, a path-based dominance updating approach is proposed to compute the dominance region efficiently when the query keywords are of low frequency. We conduct extensive experiments by comparing our algorithms with the state-of-the-art methods on real data sets. 
The empirical observations have verified the superiority of our proposed solution in all aspects of index size, communication cost and computation time.
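To make the KkNN problem statement above concrete, the sketch below answers a single snapshot query with plain Dijkstra expansion over a weighted undirected graph. This is the kind of network traversal that the label and pivot-tree indexes are meant to avoid; it is not LARC. The function name `keyword_knn`, the adjacency-list layout, and the reading that a vertex must contain all query keywords are assumptions of this sketch.

```python
import heapq

def keyword_knn(graph, keywords, source, query_keywords, k):
    """Return up to k (vertex, distance) pairs nearest to source by network distance.

    graph:    dict vertex -> list of (neighbour, weight); each undirected edge
              is listed in both directions
    keywords: dict vertex -> set of keywords attached to that vertex
    A vertex qualifies only if it contains every query keyword (assumption).
    Vertex ids must be mutually comparable (e.g. all strings) for heap ties.
    """
    required = set(query_keywords)
    dist = {source: 0.0}
    heap = [(0.0, source)]
    settled = set()
    results = []
    while heap and len(results) < k:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue  # stale heap entry
        settled.add(v)
        # Vertices are settled in nondecreasing distance order,
        # so the first k matches are the k nearest matches.
        if required <= keywords.get(v, set()):
            results.append((v, d))
        for u, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return results
```

A continuous query would have to rerun this search as the object moves, which is the cost that the dominance interval and dominance region concepts aim to reduce by identifying where the current result stays valid.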
advanced data mining and applications | 2011
Hongzhi Yin; Bin Cui; Yuxin Huang
Given a task T, a pool of experts with different skills, and a social network G that captures social relationships and various interactions among these experts, we study the problem of finding a wise group of experts, a subset of the pool, to perform the task. We call this the Expert Group Formation problem in this paper. In order to reduce various potential social influence among team members and avoid following the crowd, we require that the members of the group not only meet the skill requirements of the task, but also be diverse. To quantify the diversity of a group of experts, we propose one metric based on the social influence incurred by the subgraph in G that only involves the group's members. We analyze the problem of Diverse Expert Group Formation and show that it is NP-hard. We explore its connections with existing combinatorial problems and propose novel algorithms for its approximate solution. To the best of our knowledge, this is the first work to study diversity in the social graph and facilitate its effect in the Expert Group Formation problem. We conduct extensive experiments on the DBLP dataset, and the experimental results show that our framework works well in practice and gives useful and intuitive results.
international conference on data mining | 2017
Hongzhi Yin; Hongxu Chen; Xiaoshuai Sun; Hao Wang; Yang Wang; Quoc Viet Hung Nguyen
With the rapid rise of various e-commerce and social network platforms, users are generating large amounts of heterogeneous behavior data, such as purchase history, adding-to-favorite, adding-to-cart and click activities, and this kind of user behavior data is usually binary, only reflecting a user's action or inaction (i.e., implicit feedback data). Tensor factorization is a promising means of modeling heterogeneous user behaviors by distinguishing different behavior types. However, ambiguity arises in the interpretation of the unobserved user behavior records, which mix both real negative examples and potential positive examples. Existing tensor factorization models either ignore unobserved examples or treat all of them as negative examples, leading to either poor prediction performance or huge computation cost. In addition, the distribution of positive examples w.r.t. behavior types is heavily skewed, so existing tensor factorization models are biased towards behavior types with a large number of positive examples. In this paper, we propose a scalable probabilistic tensor factorization model (SPTF) for heterogeneous behavior data and develop a novel negative sampling technique to optimize SPTF by leveraging both observed and unobserved examples with much lower computational costs and higher modeling accuracy. To overcome the heavy skewness of the behavior data distribution, we propose a novel adaptive ranking-based positive sampling approach to speed up model convergence and improve the prediction accuracy for sparse behavior types. Our proposed model optimization techniques enable SPTF to scale to large behavior datasets. Extensive experiments have been conducted on a large-scale e-commerce dataset, and the experimental results show the superiority of our proposed SPTF model in terms of prediction accuracy and scalability.
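The middle ground between ignoring unobserved entries and treating all of them as negatives can be illustrated by drawing a small sample of unobserved (user, item, behavior) triples. The sketch below uses plain uniform sampling, which is an assumption made here for illustration; the abstract does not describe SPTF's actual sampling distribution, and the name `sample_negatives` is hypothetical.

```python
import random

def sample_negatives(observed, n_users, n_items, n_behaviors, n_samples, seed=None):
    """Draw distinct unobserved (user, item, behavior) triples uniformly at random.

    observed: set of (user, item, behavior) index triples with positive feedback,
              all within the given index ranges
    Uniform sampling is an assumption of this sketch, not SPTF's technique.
    """
    rng = random.Random(seed)
    total = n_users * n_items * n_behaviors
    if n_samples > total - len(observed):
        raise ValueError("not enough unobserved entries to sample from")
    negatives = set()
    while len(negatives) < n_samples:
        triple = (
            rng.randrange(n_users),
            rng.randrange(n_items),
            rng.randrange(n_behaviors),
        )
        # Reject triples that were actually observed as positive examples.
        if triple not in observed:
            negatives.add(triple)
    return sorted(negatives)
```

Training on the observed positives plus such a sample keeps the cost proportional to the number of observed entries rather than to the full tensor size, which is the trade-off the abstract highlights.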