[PDF] Spatial Object Recommendation with Hints: When Spatial Granularity Matters

Abstract

Existing spatial object recommendation algorithms generally treat objects identically when ranking them. However, spatial objects often cover different levels of spatial granularity and thereby are heterogeneous. For example, one user may prefer to be recommended a region (say Manhattan), while another user might prefer a venue (say a restaurant). Even for the same user, preferences can change at different stages of data exploration. In this paper, we study how to support top-k spatial object recommendations at varying levels of spatial granularity, enabling spatial objects at varying granularity, such as a city, suburb, or building, as a Point of Interest (POI). To solve this problem, we propose the use of a POI tree, which captures spatial containment relationships between POIs. We design a novel multi-task learning model called MPR (short for Multi-level POI Recommendation), where each task aims to return the top-k POIs at a certain spatial granularity level. Each task consists of two subtasks: (i) attribute-based representation learning; (ii) interaction-based representation learning. The first subtask learns the feature representations for both users and POIs, capturing attributes directly from their profiles. The second subtask incorporates user-POI interactions into the model. Additionally, MPR can provide insights into why certain recommendations are being made to a user based on three types of hints: user-aspect, POI-aspect, and interaction-aspect. We empirically validate our approach using two real-life datasets, and show promising performance improvements over several state-of-the-art methods.

Full PDF

SSpatial Object Recommendation with Hints: When SpatialGranularity Matters

Hui Luo , † , Jingbo Zhou ∗ , , , Zhifeng Bao , Shuangli Li , ,J. Shane Culpepper , Haochao Ying , Hao Liu , , Hui Xiong ∗ , RMIT University, Business Intelligence Lab, Baidu Research National Engineering Laboratory of Deep Learning Technology and Application, China University of Science and Technology of China, Zhejiang University, Rutgers University{hui.luo,zhifeng.bao,shane.culpepper}@rmit.edu.au{zhoujingbo,liuhao30,v_lishuangli}@baidu.com,[email protected],[email protected]

ABSTRACT

Manhattan ), while another user might prefera venue (say a restaurant ). Even for the same user, preferencescan change at different stages of data exploration. In this paper,we study how to support top- 𝑘 spatial object recommendationsat varying levels of spatial granularity, enabling spatial objects atvarying granularity, such as a city, suburb, or building, as a Pointof Interest (POI). To solve this problem, we propose the use of a POI tree , which captures spatial containment relationships betweenPOIs. We design a novel multi-task learning model called

MPR (short for M ulti-level P OI R ecommendation ), where each task aimsto return the top- 𝑘 POIs at a certain spatial granularity level. Eachtask consists of two subtasks: (i) attribute-based representationlearning; (ii) interaction-based representation learning. The firstsubtask learns the feature representations for both users and POIs,capturing attributes directly from their profiles. The second subtaskincorporates user-POI interactions into the model. Additionally,

MPR can provide insights into why certain recommendations arebeing made to a user based on three types of hints: user-aspect , POI-aspect , and interaction-aspect . We empirically validate our approachusing two real-life datasets, and show promising performance im-provements over several state-of-the-art methods.

KEYWORDS

Spatial Object Recommendation; POI Tree; Attention Network

ACM Reference Format:

Hui Luo, Jingbo Zhou, Zhifeng Bao, Shuangli Li, J. Shane Culpepper, HaochaoYing, Hao Liu, Hui Xiong. 2020. Spatial Object Recommendation with Hints: † This work was done when Hui Luo visited Baidu Research. ∗ Jingbo Zhou and Hui Xiong are corresponding authors.Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor profit or commercial advantage and that copies bear this notice and the full citationon the first page. Copyrights for components of this work owned by others than theauthor(s) must be honored. Abstracting with credit is permitted. To copy otherwise, orrepublish, to post on servers or to redistribute to lists, requires prior specific permissionand/or a fee. Request permissions from [email protected].

SIGIR ’20, July 25–30, 2020, Virtual Event, China © 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.ACM ISBN 978-1-4503-8016-4/20/07...$15.00https://doi.org/10.1145/3397271.3401090

When Spatial Granularity Matters. In

Proceedings of the 43rd InternationalACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR ’20), July 25–30, 2020, Virtual Event, China.

ACM, New York, NY, USA,10 pages. https://doi.org/10.1145/3397271.3401090

Spatial object recommendation is an important location-based ser-vice with many practical applications, where the most relevantvenues [31, 34] or regions [21] are recommended based on spatial,temporal, and textual information. Existing spatial object recom-mendation methods [20, 24, 29] usually do not differentiate thegranularity of spatial objects (i.e., building versus suburb), whenranking a list of top- 𝑘 objects. However, the most appropriate gran-ularity of spatial object ranking may vary at different stages ofdata exploration for a user, and can vary from one user to another,which is hard to predict a priori. Choosing the most appropriatespatial granularity based on the recommendation scenario is oftencritical [1]. For example, if a user is planning to visit America for aholiday, they may initially want to be recommended a particularregion such as “Los Angeles” or “New York” at the beginning ofdata exploration. The user might also wish to drill down for spe-cific venue recommendations such as a restaurant or a bar as theexploration continues.Therefore, user expectations at varying spatial granularity ofPOIs (Point of Interests, i.e., a region or a venue) should be satisfiedby the recommender system adaptively and dynamically. Note thata recommended region or venue is referred to as a POI for easeof readability in this paper. We refer to this as the multi-level POIrecommendation problem, which aims to recommend the top- 𝑘 POIcandidates from each level of spatial granularity. Dynamic selectionof the most appropriate recommendation level(s) is driven by userinteractions and application constraints. Elucidating all integration-specific details of our proposed model for an end-to-end productionsystem is beyond the scope of this paper.To solve this multi-level POI recommendation problem, a straight-forward solution is to build a separate recommendation model foreach level of spatial granularity, and then apply an existing POIrecommendation algorithm directly. However, this approach hasone drawback: it may not fully leverage mutual information amongPOIs at different spatial granularity levels. For example, a user mayprefer to visit an area because of the POIs contained in that area.Therefore, a major challenge must be addressed:

How can we achievea one-size-fits-all model to make effective recommendations at everylevel of spatial granularity?

In other words, instead of designing a r X i v : . [ c s . I R ] J a n igure 1: User check-in records at varying spatial granularity.a best-match recommendation model for level independently, thespatial containment relationships between all of the levels shouldbe considered. If a user has visited a POI 𝑝 (like a shop), there willbe a check-in record for the parent POI(s) (like a shopping mall)covering 𝑝 . Such information can heavily influence and assist therecommendation of the parent POI(s).In this paper, POIs are structured as a tree based on their spatialcontainment — defined as the relationship of a child POI whichis fully covered by a higher level POI [11]. For example, a restau-rant is within a mall, which in turn is within a suburb (CBD) ofa city in Figure 1, allowing recommendation to be made at anylevel (i.e., a particular spatial granularity) in the POI tree. We thenpropose a new technique called MPR (short for M ulti-level P OI R ecommendation), which employs multi-task learning in orderto jointly train the model using every available level of spatialgranularity. Each task corresponds to recommending POIs locatedat a certain spatial granularity. Our approach is able to leveragedata that is much sparser than prior work [12, 15, 24, 29], whichused only the check-in metadata found in commonly datasets suchas Foursquare or Gowalla. Our two test collections were gener-ated using real-life data from an online map service which is alsomore heterogeneous than the collections commonly used in simi-lar studies. Moreover, the sparsity of a user-POI check-in matrixfor Foursquare and Gowalla – the most commonly used ones byexisting work – is around 99 .

9% [18], while our datasets are muchmore sparse (i.e., around 99 . POI context graph to describe the geospatial influ-ence factors between any two POIs at the same level, which mapsthree different sources of spatial relationships — co-search , co-visit ,and geospatial distance .Lastly, it is worth noting that our proposed model can be usedto directly justify recommendations to a user for any level of spa-tial granularity. Providing justification for recommendations hasbeen shown to be an important factor in user satisfaction [2, 3].For instance, when Alice is in a dilemma about a recommendedPOI, we can provide recommendation hints along with the recom-mended POIs, in three aspects where the latter two are unique in Table 1:

A summary of key notations. The midlines partition thevariables by section – Section 2, 3, and 4 respectively.

Symbol Description Dimension U , P The set of users and POIs, respectively I The set of user-POI interactions T The POI tree with 𝐿 levels 𝑙 The 𝑙 -th level of T 𝑚 (or 𝑛 𝑙 ) The number of users (or POIs at the 𝑙 -th level of T ) 𝑝 𝑙𝑖 The 𝑖 -th POI node located at the 𝑙 -th level of T 𝐶 ( 𝑝 𝑙𝑖 ) The child POIs rooted at 𝑝 𝑙𝑖 𝑓 𝑢 (or 𝑓 𝑝 ) The number of users’ (or POIs’) explicit features; 𝑓 = 𝑓 𝑢 + 𝑓 𝑝 . 𝑟 The latent factor size of explicit features. X (or Y 𝑙 ) The observed matrix between users (or POIs at the 𝑙 -th levelof T ) and their attributes R 𝑚 × 𝑓 (or R 𝑛𝑙 × 𝑓 ) U 𝑢 (or U 𝑙𝑝 ) The explicit feature representations of users (or POIs at the 𝑙 -th level of T ) R 𝑚 × 𝑟 (or R 𝑛𝑙 × 𝑟 ) V 𝑇 The shared latent explicit feature representations of both usersand POIs at the 𝑙 -th level of T R 𝑟 × 𝑓 XA (or YA 𝑙 ) The direct attribute matrix of users (or POIs at the 𝑙 -th level of T ) R 𝑚 × 𝑓𝑢 (or R 𝑛𝑙 × 𝑓𝑝 ) XT (or YT 𝑙 ) The inverse attribute matrix of users (or POIs at the 𝑙 -th levelof T ) R 𝑚 × 𝑓 𝑝 (or R 𝑛𝑙 × 𝑓𝑢 ) 𝑝 + 𝑗 , 𝑝 − 𝑗 The positive and negative POI instances 𝑟 𝑙 The latent factor size of implicit features at the 𝑙 -th level of T 𝑑 The hidden layer size of the attention network O 𝑙 The user-POI check-in matrix R 𝑚 × 𝑛𝑙 S 𝑙 The feature-based check-in matrix R 𝑚 × 𝑛𝑙 G 𝑙 The historical check-in matrix R 𝑚 × 𝑛𝑙 H 𝑙𝑢 (or H 𝑙𝑝 ) The implicit feature representations of users (or POIs at the 𝑙 -th level of T ) R 𝑚 × 𝑟𝑙 (or R 𝑛𝑙 × 𝑟𝑙 ) A 𝑙𝑝 The inner-level propagated POI feature representation R 𝑛𝑙 × 𝑟𝑙 + our model. We can provide: (1) user-aspect hint based on the userprofile: Chinatown appears to be an important area since she lovesdumplings based on her user profile. (2) POI-aspect hint based onthe POI tree: the particular region (such as the CBD) is initially rec-ommended since it contains several relevant shops and restaurantsof interest to the user. (3) interaction-aspect hint based on the POIcontext graph: the

State Library is also recommended because shehas visited a library several times before.In summary, we make the following contributions: • We are the first to explore the multi-level POI recommendationproblem, which aims to simultaneously recommend POIs at dif-ferent levels of spatial granularity (Section 2). • We propose a novel model

MPR using multi-task learning, whereeach task caters for one level of the spatial granularity. Each taskhas two subtasks: attribute-based representation learning (Sec-tion 3) and interaction-based representation learning (Section 4). • Our model can provide specific hints on why certain POI recom-mendations are being made, namely user-aspect , POI-aspect , and interaction-aspect hints (Section 5). • We perform extensive experiments on two large-scale real-lifedatasets to evaluate the performance of our model. Our experi-mental results show promising improvements over several state-of-the-art POI recommendation algorithms (Section 6).

Throughout this paper, all vectors are represented by bold lowercaseletters and are column vectors (e.g., x ), where the 𝑖 -th element isshown as a scalar (e.g., x 𝑖 ). All matrices are denoted by bold upperase letters (e.g., M ); the element located in the 𝑖 -th row and 𝑗 -thcolumn of matrix M is marked as M 𝑖,𝑗 . Also, we use calligraphiccapital letters (e.g., U ) to denote sets and use normal lowercaseletters (e.g., 𝑢 ) to denote scalars. Note that, the superscript 𝑙 is usedin certain symbols to denote the 𝑙 -th level of T , such as Y 𝑙 . Forclarity of exposition, Table 1 summarizes the key notations used inthis work, where only the dimensions of matrices are reserved. In a recommender system, there are a set of users U = { 𝑢 , 𝑢 , ..., 𝑢 𝑚 } and a set of POIs P = { 𝑝 , 𝑝 , ..., 𝑝 𝑛 } available. Each user 𝑢 𝑖 ∈ U has an attribute set derived from a user profile, such as age andhobby. Each POI 𝑝 𝑗 ∈ P has two components: (i) a parent POI, in-dicating that 𝑝 𝑗 is covered geospatially, and the parent POI may beempty if 𝑝 𝑗 is a root area; (ii) an attribute set, which is derived fromthe POI profile and typically contains attributes such as a tag orcategory. Based on spatial containment relationships among POIs,we construct a POI tree (see Definition 1) over P to predict POIsfor each level of spatial granularity. Definition 1. (POI Tree) A POI tree T is a tree structure of 𝐿 levels,where each node represents a spatial object, 𝐻 𝑙 denotes the 𝑙 -thlevel of T , and 𝑛 𝑙 is the number of POI nodes at level 𝐻 𝑙 . A node 𝑝 𝑙𝑖 is the parent of a node 𝑝 𝑙 + 𝑗 if 𝑝 𝑙𝑖 contains 𝑝 𝑙 + 𝑗 in geo-space. Wedenote 𝐶 ( 𝑝 𝑙𝑖 ) as all child POIs rooted at 𝑝 𝑙𝑖 . An illustrative exampleof a POI tree is shown in Figure 1. User-POI Interaction.

Each instance of the interaction I betweena user 𝑢 𝑖 and a POI 𝑝 𝑗 is a tuple ⟨ 𝑢 𝑖 , 𝑝 𝑗 , 𝑠 𝑖 𝑗 ⟩ , where the score 𝑠 𝑖 𝑗 corresponds to a “binary value”, indicating whether 𝑢 𝑖 has visited 𝑝 𝑗 (e.g., 𝑠 𝑖 𝑗 = 𝑢 𝑖 has checked in 𝑝 𝑗 ; otherwise, 𝑠 𝑖 𝑗 = Definition 2. (Multi-level POI Recommendation) Given a user,their historical user-POI interactions, a pre-built POI tree T , and aparameter 𝑘 , return the top- 𝑘 most relevant POIs at each level of T . Model Architecture.

The architecture of the model

MPR is shownin Figure 2. Taking the input of historical user-POI interactions anda pre-built POI tree T based on spatial containment relationship, MPR outputs the top- 𝑘 POIs for each level of T . To achieve the goalshown in Definition 2, we leverage multi-task learning to implementa joint optimization over all levels of the POI tree, where each taskincludes two main subtasks for the given POI level: attribute-basedrepresentation learning (Section 3) and interaction-based representa-tion learning (Section 4).The first subtask explores the attributes of both users and POIsby mapping them to two embedding spaces: X and Y 𝑙 . These areinduced from two sources of information: (i) XA and YA 𝑙 , whichare attributes directly derived from the user and POI profile, respec-tively. (ii) XT and YT 𝑙 , which are derived from the user and POIattribute distributions obtained from check-in statistics.The second subtask focuses on how to model the interactionsbetween users and POIs to further capture personal preferences. Ad-ditionally, we model two important matrices: (i) the inter-level POIfeatures matrix A 𝑙𝑝 propogated from child POIs using an attentionmechanism; and (ii) the geospatial influence matrix U 𝑙𝑔 between POIs derived from a POI context graph (Section 4.2.2), each edgeof which contains one of the three types of spatial relationshipsbetween any two POIs at the same level, i.e., co-search , co-visit , and geospatial distance .These two subtasks are combined using shared latent factors (i.e., U 𝑢 and U 𝑙𝑝 ), in order to guarantee that the feature representations ofusers and POIs at the 𝑙 -th level of T will remain unchanged despiteattributes and interactions being modeled in separate subtasks. Objective.

As each task in

MPR incorporates two different learningobjectives for each subtask, we train a joint model by optimizingthe sum of two loss functions as follows. L = 𝜆 L + 𝜆 L + || Θ || 𝐹 (1)where L and L are the loss functions for the first and secondobjectives applied across all levels of T . The computational detailsof these two loss functions are described further in Section 3 andSection 4, respectively. 𝜆 and 𝜆 are hyper-parameters to balancethe trade-off between the two loss functions, and || Θ || 𝐹 is the L2regularization used by the model to minimize overfitting, and || · || 𝐹 is the Frobenius norm. Traditional methods usually leverage historical user-POI interac-tions by mapping users and POIs to a shared latent space via fac-torization over a user-POI affinity matrix. However, the learnedlatent space rarely provides any insight into why a user prefers aPOI [37]. Worse still, such data is often quite sparse [8], which maynot be sufficient to provide meaningful signals.To address this limitation, we leverage the attributes of both usersand POIs, which provide complimentary evidences (i.e., the “user-aspect hint” introduced in Section 5) to reveal to a user why certainPOIs are being recommended. This allows a user to interactivelyprovide additional information to align the current recommenda-tions with their information need. We refer to these attributes thatcan be directly derived from the dataset as explicit features , e.g.,user’s age and hobby. In contrast, implicit features correspond tothe attributes inferred from available data. To this end, we learn anattribute-based representation for our recommender system.

Before introducing details on model training using the above at-tributes, we define the first loss function to be used in our approach.Similar to previous matrix factorization models for user-POI check-in records, we derive a factorization model over the observed user-attribute matrix X ∈ R 𝑚 × 𝑓 and POI-attribute matrix Y 𝑙 ∈ R 𝑛 𝑙 × 𝑓 tolearn explicit feature representations for users and POIs, where 𝑓 isthe total number of explicit features of users and POIs. This can beachieved by minimizing the following loss function: L = || U 𝑢 V 𝑇 − X || 𝐹 + 𝐿 ∑︁ 𝑙 = || U 𝑙𝑝 V 𝑇 − Y 𝑙 || 𝐹 (2)where U 𝑢 ∈ R 𝑚 × 𝑟 and U 𝑙𝑝 ∈ R 𝑛 𝑙 × 𝑟 are two learned parameters tomodel the explicit feature representations of users and POIs, whichare then combined with a shared latent vector V 𝑇 ∈ R 𝑟 × 𝑓 . Here, 𝑟 is the latent factor magnitude. igure 2: Architecture of the multi-level POI recommendation (

MPR ) model. (Best viewed in color)

We first show how to build the matrix X to incorporate the attributevalues of users. X is in turn a concatenation of two matrices cap-turing different contexts – (1) a direct attribute matrix XA directlyobtained from user attributes; (2) an inverse attribute matrix XT induced from the empirical user check-in distribution: X = XA ⊕ XT (3)where XA ∈ R 𝑚 × 𝑓 𝑢 , XT ∈ R 𝑚 × 𝑓 𝑝 , X ∈ R 𝑚 × 𝑓 , 𝑓 = 𝑓 𝑢 + 𝑓 𝑝 , and ⊕ is the concatenation operator.Similarly, we construct the attribute matrix Y 𝑙 for POIs at the 𝑙 -th level of T , which in turn is a concatenation of a direct attributematrix YA 𝑙 and an inverse attribute matrix YT 𝑙 : Y 𝑙 = YA 𝑙 ⊕ YT 𝑙 (4)where YA 𝑙 ∈ R 𝑛 𝑙 × 𝑓 𝑝 , YT 𝑙 ∈ R 𝑛 𝑙 × 𝑓 𝑢 , Y 𝑙 ∈ R 𝑛 𝑙 × 𝑓 . We use 𝑓 𝑢 and 𝑓 𝑝 to denote the number of user features and POI features gener-ated from their respective attributes. The concatenation process isillustrated in the lower left corner of Figure 2. Constructing the direct attribute matrix.

Raw attribute valuescan be numerical (e.g., the age is 18) or binary (e.g., a hobby suchas reading ). We empirically define various decision rules to splitan attribute 𝑎 𝑘 into two decision features. For any numerical at-tribute (e.g., age), a threshold 𝜃 𝑘 is selected to split the attributeinto [ 𝑎 𝑘 < 𝜃 𝑘 ] and [ 𝑎 𝑘 ≥ 𝜃 𝑘 ] . Note that, multiple threshold valuescan also be used to split one attribute empirically, which generatesa corresponding number of features. For a binary attribute (e.g.,country), we have [ 𝑎 𝑘 = 𝜃 𝑘 ] or [ 𝑎 𝑘 ≠ 𝜃 𝑘 ] . XA 𝑖,𝑘 = (cid:26) 𝑢 𝑖 satisfies the decision rule over 𝑎 𝑘 𝑙 -th level of T , we model the direct attribute matrices XA (Eq. 5) and YA 𝑙 (Eq. 6) as a concatenation of one-hot vectors,where an element of value 1 denotes a fulfilled decision rule. YA 𝑙 𝑗,𝑘 = (cid:26) 𝑝 𝑗 satisfies the decision rule over 𝑎 𝑘 Constructing the inverse attribute matrix.

We assume thatusers visit only the venues they are interested in, e.g., if Alice oftengoes to the library, she may be a book-lover. However, such info hasto be inferred as it may not be available in the user profile (hobbies).This assumption allows us to enrich the raw data, and is a form of weak supervision [5]. Leveraging the attributes of POIs visited byusers in this manner somewhat mitigates sparsity and cold-startissues commonly encountered in recommendation modeling.If a POI 𝑝 𝑗 , which has an attribute 𝑎 𝑘 and was visited by a user 𝑢 𝑖 for 𝑡𝑝 𝑖 𝑗𝑘 times, then 𝑡𝑝 𝑖𝑘 = (cid:205) 𝑡𝑝 𝑖 𝑗𝑘 and each element in theuser inverse attribute matrix XT is computed as follows (assumemin-max normalization): XT 𝑖,𝑘 =  𝑡𝑝 𝑖𝑘 − 𝑡𝑝 ↓ 𝑖𝑘 𝑡𝑝 ↑ 𝑖𝑘 − 𝑡𝑝 ↓ 𝑖𝑘 If 𝑢 𝑖 visited 𝑝 𝑗 that has 𝑎 𝑘 𝑡𝑝 ↑ 𝑖𝑘 and 𝑡𝑝 ↓ 𝑖𝑘 are the highest and lowest check-in frequencyfor 𝑢 𝑖 , respectively.Similarly, attributes for the users who checked in a specific POI 𝑝 𝑗 represent the inverse attributes . Suppose a POI was visited by 𝑡𝑢 𝑖 𝑗𝑘 users who have an attribute 𝑎 𝑘 , then 𝑡𝑢 𝑗𝑘 = (cid:205) 𝑡𝑢 𝑖 𝑗𝑘 and eachelement in the POI inverse attribute matrix YT 𝑙 is: YT 𝑙 𝑗,𝑘 =  𝑡𝑢 𝑗𝑘 − 𝑡𝑢 ↓ 𝑗𝑘 𝑡𝑢 ↑ 𝑗𝑘 − 𝑡𝑢 ↓ 𝑗𝑘 If 𝑝 𝑗 was visited by 𝑢 𝑖 who has 𝑎 𝑘 𝑡𝑢 ↑ 𝑗𝑘 and 𝑡𝑢 ↓ 𝑗𝑘 are the largest and the smallest number ofusers who visit 𝑝 𝑗 , respectively. INTERACTION-BASED REPRESENTATIONLEARNING

In this section, we will show how to further boost the recommen-dation performance by exploiting user-POI interactions.

We leverage the Bayesian Personalized Ranking (BPR) [22] principleto construct the loss function L for the second subtask. Specif-ically, following the popular negative sampling strategy [13, 26],a negative POI instance 𝑝 − 𝑗 which the user never visited is pairedwith a positive POI instance 𝑝 + 𝑗 , and the pairwise log loss can becomputed by maximizing the difference between the predictionscores of the positive and negative samples. L is shown as follows: L = − 𝐿 ∑︁ 𝑙 = 𝑚 ∑︁ 𝑖 = 𝑛 𝑙 ∑︁ 𝑗 = ln 𝜎 ( O 𝑙𝑖,𝑝 + 𝑗 − O 𝑙𝑖,𝑝 − 𝑗 ) (9)where O 𝑙𝑖,𝑝 + 𝑗 (or O 𝑙𝑖,𝑝 − 𝑗 ) is the predicted score w.r.t. a positive POI 𝑝 + 𝑗 (or a negative POI 𝑝 − 𝑗 ) located at the 𝑙 -th POI level for the 𝑖 -th user.Here, we add the minus sign in the front to match the minimizationobjective with Eq. 1. The user-POI check-in matrix O 𝑙 ∈ R 𝑚 × 𝑛 𝑙 willbe further elaborated next. We incorporate two matrices S 𝑙 ∈ R 𝑚 × 𝑛 𝑙 and G 𝑙 ∈ R 𝑚 × 𝑛 𝑙 into O 𝑙 through a linear combination, where S 𝑙 denotes the feature-basedcheck-in matrix , and G 𝑙 is the historical check-in matrix . A config-urable parameter 𝜏 is used to control the relative contributions ofthese two matrices, resulting in the following equation: O 𝑙 = S 𝑙 + 𝜏 G 𝑙 (10)By combining S 𝑙 and G 𝑙 , we obtain the final top- 𝑘 recommendedresults sorted by similarity score O 𝑙 . This process is illustrated inthe lower right corner of Figure 2. Next, we show how to construct S 𝑙 and G 𝑙 . In order tofully leverage the interaction data of users and POIs, the feature-based check-in matrix S 𝑙 located at the 𝑙 -th level is built based on thefeature representation P 𝑙 and Q 𝑙 w.r.t. users and POIs, respectively: S 𝑙 = P 𝑙 ( Q 𝑙 ) 𝑇 , P 𝑙 = U 𝑢 ⊕ H 𝑙𝑢 ⊕ A 𝑙𝑢 , Q 𝑙 = U 𝑙𝑝 ⊕ H 𝑙𝑝 ⊕ A 𝑙𝑝 (11)In Eq. 11, P 𝑙 ∈ R 𝑚 ×( 𝑟 + 𝑟 𝑙 + 𝑟 𝑙 + ) is a concatenation of three ma-trices w.r.t. users: U 𝑢 , H 𝑙𝑢 , and A 𝑙𝑢 . Specifically, U 𝑢 is the explicitfeature representation of users. H 𝑙𝑢 ∈ R 𝑚 × 𝑟 𝑙 is the implicit featurerepresentation of users, and A 𝑙𝑢 ∈ R 𝑚 × 𝑟 𝑙 + is a trainable matrixparameter to match A 𝑙𝑝 in the same space. Here, 𝑟 𝑙 denotes thelatent factor size of implicit features at the 𝑙 -th level of T .Accordingly, Q 𝑙 ∈ R 𝑛 𝑙 ×( 𝑟 + 𝑟 𝑙 + 𝑟 𝑙 + ) incorporates three kinds ofinformation w.r.t. the POIs at the 𝑙 -th level of T : U 𝑙𝑝 is the ex-plicit feature representation, H 𝑙𝑝 ∈ R 𝑛 𝑙 × 𝑟 𝑙 is the implicit featurerepresentation, and A 𝑙𝑝 ∈ R 𝑛 𝑙 × 𝑟 𝑙 + is the inter-level POI feature rep-resentation propagated from child POIs with an attention network.Recall that U 𝑢 and U 𝑙𝑝 were described in Section 3. We nowdescribe the details on how to construct implicit feature represen-tations H 𝑙𝑢 and H 𝑙𝑝 , and how to produce an inter-level POI featurerepresentation A 𝑙𝑝 . Implicit feature representation.

Some features that influenceuser preferences may be implicit. For example, Alice might go tohistorical libraries because she loves the classical architecture there,or for other unknown reasons which cannot be inferred. Thesetypes of features can be learned by using two matrices H 𝑙𝑢 and H 𝑙𝑝 w.r.t. users and POIs, respectively. Inter-level propagated POI feature representation.

The fea-ture information covered by a child POI can also be used by itsparent POI. For instance, the attributes of child POIs (e.g., a restau-rant or a store) can be aggregated into its parent POI (e.g., a mall).In particular, for each parent POI 𝑝 𝑙𝑖 , we also propagate a learnedimplicit feature representation (i.e., an embedding vector h 𝑙 + 𝑗 in H 𝑙 + 𝑝 ) from each child POI 𝑝 𝑙 + 𝑗 to 𝑝 𝑙𝑖 , producing the inter-level fea-ture representation a 𝑙𝑖 for 𝑝 𝑙𝑖 to leverage the inter-level information.Here, we denote A 𝑙𝑝 ∈ R 𝑛 𝑙 × 𝑟 𝑙 + as the inter-level POI feature repre-sentation matrix for all POIs at the 𝑙 -th level of T , where a 𝑙𝑖 is anembedding vector for a POI in A 𝑙𝑝 . Next, we show how to induce a 𝑙𝑖 in detail.One possible way to learn a 𝑙𝑖 is to augment the implicit featuresin all its child POIs. However, different child POIs might providedifferent contributions when influencing the parent POIs. For exam-ple, many users may visit a shopping mall (a parent POI) frequentlyfor a popular grocery store (a child POI) and nothing else.To mitigate this issue, we propagate learned implicit featuresfrom a child POI h 𝑙 + 𝑗 using various attention weights throughout T in order to learn the best inter-level feature representation a 𝑙𝑖 fora parent POI 𝑝 𝑙𝑖 . Specifically, we use a multi-layer perceptron (MLP)when learning attention weights each child POI 𝑝 𝑙 + 𝑗 rooted at 𝑝 𝑙𝑖 .  𝑤 𝑙 + 𝑗 = 𝐹 ( h 𝑙 + 𝑗 ) = 𝑅𝑒𝐿𝑈 ( d ( W 𝑙 + h 𝑙 + 𝑗 + b )) + 𝑏 ˜ 𝑤 𝑙 + 𝑗 = 𝜎 ( 𝑤 𝑙 + 𝑗 ) = 𝑒𝑥𝑝 ( 𝑤 𝑙 + 𝑗 ) (cid:205) 𝑝𝑙 + 𝑡 ∈ 𝐶 ( 𝑝𝑙𝑖 ) 𝑒𝑥𝑝 ( 𝑤 𝑙 + 𝑡 ) a 𝑙𝑖 = (cid:205) 𝑝 𝑙 + 𝑗 ∈ 𝐶 ( 𝑝 𝑙𝑖 ) ˜ 𝑤 𝑙 + 𝑗 h 𝑙 + 𝑗 (12)where the implicit feature embedding h 𝑙 + 𝑗 ∈ R 𝑟 𝑙 + of child POI 𝑝 𝑙 + 𝑗 is the input, and 𝑅𝑒𝐿𝑈 ( 𝑥 ) = 𝑚𝑎𝑥 ( , 𝑥 ) is applied as the activationfunction to produce 𝑤 𝑙 + 𝑗 in the first formula. W 𝑙 + ∈ R 𝑑 × 𝑟 𝑙 + isa transpose matrix, b ∈ R 𝑑 denotes a bias vector, 𝑏 refers to abias variable, and d ∈ R × 𝑑 projects the attention weight for a POInode where the hidden layer size of the attention network is 𝑑 . 𝐶 ( 𝑝 𝑙𝑖 ) indicates all child POIs rooted at 𝑝 𝑙𝑖 .After computing the attention weight 𝑤 𝑙 + 𝑗 , we normalize it toobtain ˜ 𝑤 𝑙 + 𝑗 using a softmax function 𝜎 (·) as shown in the secondformula. Finally, a 𝑙𝑖 is produced using the resulting child POIs andattention weights in the third formula. The complete architectureof our attention network mechanism is depicted in the centre ofFigure 2. Intuitively, a POI can-didate may be recommended if it is located near a previously visitedPOI. To exploit spatial containment, we first construct 𝐿 POI con-text graphs, one for each POI level. Each POI context graph embedsthe contextual information of the POIs. The mechanism used toncorporate contextual information between a POI candidate and avisited POI into our recommendation model is described next.

POI context graph.

For ease of illustration, we use a single POIcontext graph as an example and omit superscripts (i.e., 𝑙 ) whendenoting a particular level in T . Specifically, we represent a POIcontext graph as G = ⟨V , E⟩ , where V is the set of POIs, and E isthe set of edges between any two connected POIs. Given any twoPOIs 𝑝 and 𝑝 ( 𝑝 , 𝑝 ∈ V ), we define three types of edge relations,such that E can be further weighted using multiple geospatialinfluence factors. • Co-search.

If a user searches for a restaurant and a coffee shopwithin a short time interval using a map application, and thenvisits the restaurant, we can infer that a coffee shop has a higherlikelihood of relevance the next time the user views the map [42].Thus, we use 𝛿 ( 𝑝 , 𝑝 | Δ 𝑡 ) to denote the co-occurrence searchfrequency between two POIs 𝑝 and 𝑝 within a fixed sessioninterval Δ 𝑡 (e.g., 30 minutes) for all users. • Co-visit.

If a user first visits a restaurant and then goes to a coffeeshop and locations are being tracked for the user, we assume thatthe coffee shop has a higher priority for recommendations madewhen a user is located in a restaurant. We use 𝜓 ( 𝑝 , 𝑝 | Δ 𝑡 ) torepresent the visit frequency chronologically between 𝑝 and 𝑝 within a fixed time interval Δ 𝑡 (e.g., 30 minutes). • Geospatial distance.

According to Tobler’s first law of geography[23], “everything is related to everything else, but near thingsare more related than distant things”. The nearby objects oftenhave underlying relationships and influence, thus we also applya geospatial distance factor which captures the geographical in-fluence. Here, we use 𝜁 ( 𝑝 , 𝑝 ) to denote the inverse Euclideandistance between 𝑝 and 𝑝 .Note that G is constructed before training. The edge weightsderived using these three geospatial factors are normalized usingsigmoid function, which is defined as 𝜎 ( 𝑥 ) = /( + 𝑒𝑥𝑝 (− 𝑥 )) . Graph-based geospatial influence representation.

Given a POIcandidate 𝑝 𝑖 to be recommended and a historical POI check-in tra-jectory Q 𝑢 𝑘 for a user 𝑢 𝑘 , we define the geospatial influence matrix 𝑈 𝑙𝑔,𝑢 𝑘 ∈ R 𝑛 𝑙 × 𝑟 , and incorporate POI context info using Eq. 13. Sinceusing every visited POI from Q 𝑢 𝑘 is not scalable, we only choose asubset Q 𝑢 𝑘 𝑠 containing the top- 𝑡 frequently visited POIs from Q 𝑢 𝑘 for each user 𝑢 𝑘 such that Q 𝑢 𝑘 𝑠 ⊂ Q 𝑢 𝑘 and |Q 𝑢 𝑘 𝑠 | = 𝑡 . Specifically,we denote the embedding vector for 𝑝 𝑖 in 𝑈 𝑙𝑔,𝑢 𝑘 as 𝜇 𝑙,𝑝 𝑖 𝑔,𝑢 𝑘 , and theembedding vector for 𝑝 𝑗 in 𝑈 𝑙𝑝 as 𝜐 𝑙𝑝 𝑗 . Thus for the recommendedPOI 𝑝 𝑖 and the historical visited POI 𝑝 𝑗 , we can get: 𝜇 𝑙,𝑝 𝑖 𝑔,𝑢 𝑘 = 𝑡 ∑︁ 𝑝 𝑗 ∈Q 𝑢𝑘𝑠 𝛿 ( 𝑝 𝑖 , 𝑝 𝑗 | Δ 𝑡 ) 𝜓 ( 𝑝 𝑖 , 𝑝 𝑗 | Δ 𝑡 ) 𝜁 ( 𝑝 𝑖 , 𝑝 𝑗 ) 𝜐 𝑙𝑝 𝑗 (13)where 𝑡 is set to 3 in our experiment. Consequently, the embeddingvector 𝑔 𝑙𝑢 𝑘 for the user 𝑢 𝑘 in the historical check-in matrix G 𝑙 canbe computed as: 𝑔 𝑙𝑢 𝑘 = 𝜔 𝑘 ( 𝑈 𝑙𝑔,𝑢 𝑘 ) 𝑇 (14)where 𝜔 𝑘 is an embedding vector for 𝑢 𝑘 in 𝑈 𝑢 . Finally, G 𝑙 can bebuilt as G 𝑙 = [ . . . , ( 𝑔 𝑙𝑢 𝑘 ) 𝑇 . . . . ] 𝑇 for all users.Note that POI recommendation task can be formalized as a top- 𝑘 ranking problem. Once we have learned the model parameters in MPR , given a user, a ranking score for each POI located at the 𝑙 -th level of T can be obtained from the matrix O 𝑙 , and then the POIswith top- 𝑘 highest ranking scores will be recommended to the user. It is desirable to complement recommendations with an intuitionas to why certain results are produced, since it may not always beobvious to the user [36]. Our approach provides such additionalbenefit by enabling (i) user-aspect hint: user attributes used bythe model can be derived; (ii)

POI-aspect hint: when a parent POIis recommended, specific child POIs can be discovered; and (iii) interaction-aspect hint: if we recommend a new POI, we can high-light data from historical check-in venues that were most relevant.

User-aspect.

We assume that a user 𝑢 𝑖 has visited a POI 𝑝 𝑗 basedon the attributes of that POI. Our model captures the top- 𝐾 featuresfor 𝑢 𝑖 from an explicit feature embedding vector uf , obtained froma row vector from M 𝑢 matrix, which is computed by M 𝑢 = U 𝑢 V 𝑇 (as mentioned in Section 3). 𝐾 is set to 5 in our experiment. Thus,the column index set 𝐵 𝑖 = ⟨ 𝑏 𝑖 , 𝑏 𝑖 , ..., 𝑏 𝑖𝐾 ⟩ are the top- 𝐾 ranked in uf . The matrix M 𝑙𝑝 = U 𝑙𝑝 V 𝑇 is used to determine the POI explicitfeature embedding vector pf 𝑙 and find the corresponding POI fea-ture prediction values based on 𝐵 𝑖 . We can then expose the POIfeature with the highest value to 𝑢 𝑖 for recommendation evidence. Figure 3:

Illustration example on user-aspect hint.An illustrative example of a user-aspect hint is shown in Figure 3.After obtaining the two matrices M 𝑢 and M 𝑙𝑝 , say for the user 𝑢 , theuser feature with the highest 𝐾 values (assuming that 𝐾 =

3, then 𝐵 = ⟨ , , ⟩ ) is located in the second, third, and fourth columnusing the embedding uf . Then, the corresponding POI featureswhose column indexes drop into 𝐵 are identified, where the POIfeature with the highest value 0 . pf 𝑙 canthen be presented as a hint to the user. POI-aspect.

Intuitions about parent POI recommendations can bederived from the attention influence weights computed for eachchild POI (as described in Section 4.2.1). If we recommend a parentPOI 𝑝 to a user 𝑢 𝑖 , a set of important child POIs can be shown,ordered by attention scores. Thus the contribution ratio for eachchild POI 𝑝 𝑗 ( 𝑝 𝑗 ∈ 𝐶 ( 𝑝 )) over all child POIs 𝐶 ( 𝑝 ) is computed by a ui ⊙ a pj (cid:205) 𝑝𝑐 ∈ 𝐶 ( 𝑝𝑗 ) a ui ⊙ a pc , where a u i is a user embedding in A 𝑙𝑢 , a p j and a p c are two POI embedding vectors in A 𝑙𝑝 , and ⊙ is the dot productoperator. We mark the child POI with the highest contribution ratioas a “hot” POI which might attract the user. Interaction-aspect.

For any recommended POI 𝑝 𝑗 , we can eas-ily evaluate the contribution to examine whether the historicalcheck-in information influences the final prediction. We define thecontribution ratio as the prediction score G 𝑙𝑢 𝑖 ,𝑝 𝑗 (as introduced inSection 4.2.2) on historical interactions divided by the total pre-dicted score O 𝑙𝑢 𝑖 ,𝑝 𝑗 , which is 𝛾 = G 𝑙𝑢𝑖,𝑝𝑗 O 𝑙𝑢𝑖,𝑝𝑗 . If 𝛾 exceeds a threshold,e assume that the historical check-in information is the importantcontributor when recommending 𝑝 𝑗 to 𝑢 𝑖 . We investigate the following four research questions: • RQ1.

How does our proposed

MPR model perform when com-pared with the state-of-the-art POI recommendation methods? • RQ2.

How does

MPR perform when varying the hyper-parametersettings (e.g., embedding size )? • RQ3.

How can

MPR be used to provide recommendation hints? • RQ4.

How do different components in

MPR contribute to theoverall performance?We evaluate all methods using two real-world city-wide datasets,

Beijing and

Chengdu , from Baidu Maps , which is one of the mostpopularly used map services in China. Both datasets are randomlysampled as a portion of whole data from Baidu Maps. Due to spacelimitations, we only show the experimental results for the Beijing dataset, except when answering

RQ1 . Similar performance trendswere observed for the

Chengdu dataset when answering RQ2-RQ4. • The POI tree T . We trace the profile for each POI and then recur-sively search its parent POI to build T . A three-level POI tree isbuilt: 𝐻 , 𝐻 , and 𝐻 from top to bottom. For example, a spatialcontainment path in T on the Beijing dataset is

Wudaokou (a fa-mous neighborhood in

Beijing ) → Tsinghua University → TsinghuaGarden , which are located at 𝐻 , 𝐻 , and 𝐻 , respectively. • Check-in data.

Each check-in has the following info: userId, poiIdand a check-in timestamp. We filter out users with fewer than 10check-in POIs and POIs visited by fewer than 10 users. To buildthe check-in data on 𝐻 and 𝐻 , the check-in records from userswas used and we also aggregated the check-ins in the parentPOIs if any of their child POIs were visited. • User and POI profile.

Each user has their own attributes such asage and hobby, and 𝑓 𝑢 =173 user features are extracted. EachPOI has a parent POI, and its own attributes, where 𝑓 𝑝 =467representative POI features are available after filtering out thoseattributes shared by fewer than 10 POIs. Setup.

We partitioned the check-in data into a training set, a vali-dation set, and a test set. The first two months of check-ins wereused for training in the

Beijing testset, and the first three monthsin

Chengdu . The most recent 15 days of check-ins were used as thetest data and all remaining ones were used in the validation data inboth datasets. A negative sample was randomly selected for eachpositive sample during training. Any check-in that occurred in thetraining set was pruned from both the validation and test set, toensure that any POI recommended had never been visited by theuser before.For each model, the parameters were tuned on the validationdata to find the best values that maximized 𝑃 @ 𝑘 , and used for alltest predictions. Mini-batch adaptive gradient descent [6] is usedto control the learning step size dynamically. All experiments wereimplemented in Python on a GPU-CPU platform using a GTX 1080GPU. https://map.baidu.com Evaluation Metrics.

We adopt two commonly-used performancemetrics [18]: Precision ( 𝑃 @ 𝑘 ), and Normalized Discounted Cumu-lative Gain ( 𝑁 𝐷𝐶𝐺 @ 𝑘 ). These two metrics were used to evaluatethe model performance since 𝑃 @ 𝑘 is commonly used when eval-uating the coverage of recommendation results, and 𝑁 𝐷𝐶𝐺 @ 𝑘 captures additional signals about the overall effectiveness of thetop- 𝑘 recommendations, and supports graded relevance. Parameter Settings.

The parameters Δ 𝑡 and Δ 𝑡 are set to 30minutes by default. The adjustable parameter 𝜏 for graph-basedgeospatial influence is set to 1 by default, and the regularizationparameters are set as follows: 𝜆 = .

01 and 𝜆 = .

1, both of whichare set according to the experiment evaluation using the validationdataset. Furthermore, the hidden factor size 𝑟 𝑙 of the POI levels arefixed, and we empirically set the attention layer size 𝑑 to be thesame as 𝑟 𝑙 , which is equal to 150 discovered during the parametertuning experiment shown in Table 3. Baselines.

To validate the performance of our model

MPR , wecompared directly against the following state-of-the-art methods.Note that, these baselines all treat POIs as isomorphic, thus we haveto construct multiple models, one for each POI level, in order togenerate comparable output to our approach. • WRMF (Weighted Regularized Matrix Factorization) [10]: a point-wise latent factor model that distinguishes user observed andunobserved check-in data by using confidence values to adapt toimplicit feedback data from a user. • BPRMF (Bayesian Personalized Ranking) [22]: a pair-wise learn-ing framework for implicit feedback data, combined with matrixfactorization as the internal predictor. • PACE (Preference and Context Embedding) [29]: a neural embed-ding approach that generally combines user check-in behaviorsand context information from users and POIs through a graph-based semi-supervised learning framework. • SAE-NAD (Self-attentive Autoencoders with Neighbor-AwareInfluence) [20]: explicitly integrates spatial information into anautoencoder framework and uses a self-attention mechanism togenerate user representation from historical check-in records.

Table 2 compares all methods usingdifferent 𝑘 values on both datasets. The key observations can besummarized as follows. • Our model

MPR achieves the best performance on all metricsat every single level of spatial granularity, demonstrating therobustness of our model. Specifically, the NDCG@10 for

MPR on Beijing has: (1) a 4 .

5% improvement over the best baseline SAE-NAD at the 𝐻 level; (2) a 4 .

5% improvement over the strongestbaseline WRMF at the 𝐻 level; and (3) a 5% improvement overthe best baseline SAE-NAD at the 𝐻 level. • In term of P@10,

MPR substantially outperforms WRMF andBPRMF (42 .

6% and 4 .

7% respectively) at the 𝐻 level. This resultsfrom WRMF and BPRMF treating each POI level independentlywhen training the model. Clearly, MPR benefits from jointlyoptimizing the loss for every level of T in order to achieve itscollaborative training goal. able 2: Model performance comparisons on the

Beijing and

Chengdu dataset. Entries marked △ and ▲ correspond to statistical significanceusing a paired t-test with Bonferroni correction at 95% and 99 .

9% confidence intervals respectively. Comparisons are relative to PACE.

Level Model

Beijing Chengdu

P@5 NDCG@5 P@10 NDCG@10 P@20 NDCG@20 P@5 NDCG@5 P@10 NDCG@10 P@20 NDCG@20 𝐻 WRMF 0.056 ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼ BPRMF 0.079 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ PACE 0.067 0.104 0.053 0.124 0.043 0.156 0.087 0.117 0.074 0.152 0.054 0.181SAE-NAD 0.078 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ MPR ▲ ▲ ▲ ▲ ▲ ▲ △ △ ▲ ▲ ▲ ▲ 𝐻 WRMF 0.009 0.017 0.007 0.022 0.005 0.026 ▲ ▽ ▲ ▲ ▲ ▲ ▼ ▼ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ △ ▲ ▲ ▲ 𝐻 WRMF 0.008 ▲ ▲ ▲ ▲ ▲ △ △ ▼ ▲ ▲ ▲ △ △ ▲ PACE 0.007 0.008 0.005 0.009 0.004 0.010 0.016 0.023 0.016 0.032 0.009 0.035SAE-NAD 0.008 ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ △ ▲ ▲ ▲ MPR ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ ▲ Table 3:

Impact of Parameters 𝜏 and 𝑟 𝑙 on Beijing dataset

Level Metric 𝜏 𝑟 𝑙 𝐻 P@10 0.067 0.067

NDCG@10 0.161 𝐻 P@10 0.007 𝐻 P@10 0.007 • Both PACE and SAE-NAD directly incorporate geospatial dis-tance information. We can observe that SAE-NAD outperformsPACE in most cases. One potential reason is that although PACEbuilds a context graph to model important geographical influ-ences, it ignores the historical visit information when extract-ing the POI-POI co-visit relations that are used by SAE-NAD.However, SAE-NAD employs an autoencoder and self-attentionmechanism when constructing POI-and-POI relations, while ourmodel

MPR is able to learn the geospatial influence relationsacross all T levels, and the additive benefits are clear. As such,we believe that the inter-level relations captured by our modelare both flexible and effective. • We performed a Bonferroni corrected paired t-test and show thesignificance across all three levels for all four baselines. Compar-isons were relative to PACE, which has recently been adapted tosolve several different location-based recommendation problems.

MPR . We also found that

MPR performsrelatively well at both 𝐻 and 𝐻 levels, since the upper level cap-tures much richer information from the lower levels using theattention mechanism. Implicit feature representations of child POIsare aggregated from child to parent, increasing the data availablewhen learning the new model. In contrast, for POIs at the 𝐻 level,these signals are not available, and thus the overall performancecompared with the other baselines exhibits less dramatic perfor-mance improvements, but is still effective. Table 4:

Ablation study on the

Beijing dataset.

Level Metric M1 M2 M3 𝐻 P@10 0.066 0.067 0.067NDCG@10 0.156 0.160 0.162 𝐻 P@10 0.007 0.008 0.008NDCG@10 0.020 0.022 0.023 𝐻 P@10 0.006 0.006 0.007NDCG@10 0.010 0.011 0.021

Table 3 shows the re-sults when varying 𝜏 (in Eq. 10) from 0 . .

4, in order to con-trol the tradeoff between the feature-based check-in matrix andhistory-based check-in matrix. With the increase of 𝜏 , the effective-ness 𝑁 𝐷𝐶𝐺 @10 of POI recommendations at 𝐻 and 𝐻 are moresensitive than that at 𝐻 . From the results, we observe that the 𝑁 𝐷𝐶𝐺 @10 at the 𝐻 level first goes up, and then begins to dropoff. Considering the holistic performance for all these three levels,our model adopts the setting 𝜏 = We also investigated the perfor-mance when varying the embedding size 𝑟 𝑙 from 50 to 250 in Table 3.The 𝑁 𝐷𝐶𝐺 @10 of both 𝐻 and 𝐻 improved as expected since theselevels have access to additional information from the lower levels.However, although the precision of 𝐻 and 𝐻 peak when 𝑟 𝑙 = 𝑟 𝑙 =

150 sinceit offered the best trade-off based on our internal experiments.

We analyzed our model and created several heat maps to demon-strate how recommendation hints might be created in Figure 4. Allvalues are min-max normalized for direct comparisons in the figure.

Figure 4a illustrates the POI feature pre-diction values, where a row represents a recommended POI, and a a) User-apsect hint (b) POI-aspect hint (c) Interaction-aspect hint

Figure 4: Visualization heat maps of three recommendationhints on the

Beijing dataset. The larger a value is, the darkercolor its corresponding cell has. column denotes a POI feature. In the figure, users were randomlysampled and we selected five user features which best representedthe sampled user preferences according to the learned user fea-ture prediction matrix M 𝑢 . Then we recorded the column numbers,which are V113, V173, V174, V175, and V178. We then recommendedfive POIs (i.e., 5, 140, 283, 291, and 421), and extracted the POI featureprediction values from the learned POI feature prediction matrix M 𝑙𝑝 by the corresponding recorded columns accordingly (e.g., V113).When examining the heatmap of the resulting POI feature values,we can clearly observe the POI feature which has the highest value.For example, when we recommended POI 421st to the user, theV174 feature had the greatest contribution. Figure 4b depicts the child POI attentionscores, where a row represents a recommended parent POI, anda column denotes a child POI. Specifically, we first chose the top-5 parent POIs recommended to a user. For each recommendedparent POI, we analyzed the attention scores and displayed thetop-5 child POIs ( 𝑃 𝑃

5) that had the highest attention score. Thescore contribution ratios for each child POI are then displayed. Thechild POI with the highest attention score can be interpreted asfollows. When POI 421 st was recommended, we can observe that ithad a child POI 𝑃 Figure 4c shows the contributionpercentages (i.e., 𝛾 ) from the historical POIs used in the overallprediction, where a row refers to a user, and a column is a rec-ommended POI. In this experiment, we randomly chose five users.For each user, we produce five recommended POIs ( 𝑃 𝑃 𝛾 w.r.t. a historical POI exceeds a fixed threshold (say 0 . th user as a concrete example, the geospa-tial influence from historical POIs had a strong influence on therecommendations of 𝑃 𝑃 In this section, we present an ablation study to better understandthe influence of two core submodules: (i) child POI features propa-gated to a parent POI bottom-up using the attention mechanism(Section 4.2.1); (ii) the geospatial influence factors between POIs de-rived from a POI context graph, which map three different sourcesof spatial relationships between any two POIs at the same POIlevel (Section 4.2.2). We evaluated three variants of models withor without the above core submodules: (1) M1: our model withoutboth submodules (i) and (ii); (2) M2: our model without submodule(ii); (3) M3: our model

MPR . The experimental results when 𝑘 = 𝐻 lacks the propagated child POI features, the joint training acrossall POI levels still provides additional performance benefits. Whencomparing M2 and M3, we find that M3 also achieves consistentperformance improvements for 𝑁 𝐷𝐶𝐺 @10, reaffirming the impor-tance of geospatial influence in the POI context graph.

POI recommendation has been intensively studied in recent years,with a focus on how to integrate spatial and temporal properties[30, 33, 34]. Recent advances in machine learning techniques haveinspired several innovative methods, such as sequential embed-ding [39], graph-based embedding [28], autoencoder-based mod-els [20] and semi-supervised learning methods [29]. We refer theinterested readers to a comprehensive survey [18] on POI recom-mendation. In the remainder of this section, we review the mostclosely related work to our own.

Category-aware POI Recommendation.

Categories of POIs vis-ited by a user often capture preferred activities, thus they are impor-tant indicators to model user preferences [16, 27, 41]. Liu et al. [17]exploited the transition patterns of user preferences over locationcategories to enhance recommendation performance. Specifically, aPOI category tree is built, where the top level has food or entertain-ment , while the bottom level includes Asian restaurant or bar . Zhaoet al. [38] showed that a POI has different influences in differentsub-categories. Based on the hierarchical categories of each POI,they devised a geographical matrix factorization method (which isa variant of GeoMF [14]) for recommendation. The essential differ-ence is that, each POI in [38] is still a single node but with multipleinfluence areas for hierarchical categories, whereas in our problema POI has a tree structure constructed by spatial containment re-lationship. He et al. [9] adopted a two-step mode in their model,which predicted the category preference of next POI first and thenderived the ranking list of POIs within the corresponding category.However, these studies differ from our work. They maintain ahierarchical structure of POI categories, but we focus on how toexploit the spatial containment, rather than semantic categories.

Recommendation based on a Spatial Hierarchy.

The utility ofexploiting hierarchical structures of either users or items for itemrecommendation has been discussed in several prior studies [19, 25,35]. Here we mainly highlight the key difference between existingapproaches involving spatial hierarchy and ours.Yin et al. [32] split the whole geographical area into a spatialpyramid of varying grid cells at different levels. The main purposeof such a spatial pyramid was to overcome the data sparsity prob-lem. If the check-in data w.r.t. a region is sparse, then the check-indata generated by its ancestor regions can be used. Feng et al. [7]proposed a latent representation model to incorporate geographicalinfluence, where all POIs are divided into different regions hierar-chically and a binary tree is built over the POIs in each region. Onemajor difference is that they aim to predict a set of users who willvisit a given POI in a given future period. Chang et al. [4] proposeda hierarchical POI embedding model from two data layers (i.e., acheck-in context layer and a text content layer), neither of which isrelated with the tree structure of POIs in our work. Zheng et al. [40]leveraged the hierarchy property of geographic spaces to mine usersimilarity by exploring people’s movements on different scales ofeographic spaces. They assume that users who share similar loca-tion histories on geographical spaces of finer granularities may bemore correlated. Therefore, these methods are not straightforwardto cope with our multi-level POI recommendation problem.In summary, we are the first to define the multi-level POI recom-mendation problem, and utilize a POI hierarchical tree structurebased on spatial containment to achieve POI recommendationsfrom varying spatial granularity.

In this work, we proposed and studied the multi-level POI recom-mendation problem. We show how to create POI recommendationsat varying levels of spatial granularity by constructing a POI tree,derived from various spatial containment relationships betweenitems. Different from existing POI recommendation studies whichsupport the next-POI recommendation, we provide more recom-mendation strategies which can be used directly by a wide varietyof geographically based recommendation engines. To address thisproblem, we proposed a multi-task learning model called

MPR ,where each task seamlessly combines two subtasks: attribute-basedrepresentation learning and interaction-based representation learn-ing. We also provide three different recommendation hint typeswhich can be produced using our model. Finally, we compared ourmodel with several state-of-the-art approaches and two real-worlddatasets, thus demonstrating the effectiveness of our new approach.In future work, we will explore techniques to incorporate temporalinformation into our model and further boost the effectiveness.

This work was partially supported by NSFC 71531001 and 91646204,ARC DP180102050, DP200102611, DP190101113, and Google FacultyResearch Awards.

REFERENCES [1] Jie Bao, Yu Zheng, David Wilkie, and Mohamed Mokbel. 2015. Recommendationsin location-based social networks: a survey.

GeoInformatica

19, 3 (2015), 525–565.[2] Ramesh Baral and Tao Li. 2017. PERS: A Personalized and Explainable POIRecommender System. arXiv preprint arXiv:1712.07727 (2017).[3] Ramesh Baral, XiaoLong Zhu, SS Iyengar, and Tao Li. 2018. ReEL: Review AwareExplanation of Location Recommendation. In

UMAP . 23–32.[4] Buru Chang, Yonggyu Park, Donghyeon Park, Seongsoon Kim, and JaewooKang. 2018. Content-Aware Hierarchical Point-of-Interest Embedding Model forSuccessive POI Recommendation.. In

IJCAI . 3301–3307.[5] Mostafa Dehghani, Hamed Zamani, Aliaksei Severyn, Jaap Kamps, and W BruceCroft. 2017. Neural ranking models with weak supervision. In

SIGIR . 65–74.[6] John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methodsfor online learning and stochastic optimization.

JMLR

12 (2011), 2121–2159.[7] Shanshan Feng, Gao Cong, Bo An, and Yeow Meng Chee. 2017. Poi2vec: Geo-graphical latent representation for predicting future visitors. In

AAAI . 102–108.[8] Mengyue Hang, Ian Pytlarz, and Jennifer Neville. 2018. Exploring student check-in behavior for improved point-of-interest prediction. In

SIGKDD . 321–330.[9] Jing He, Xin Li, and Lejian Liao. 2017. Category-aware Next Point-of-InterestRecommendation via Listwise Bayesian Personalized Ranking. In

IJCAI . 1837–1843.[10] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering forimplicit feedback datasets. In

ICDM . 263–272.[11] Yehuda E Kalay. 1982. Determining the spatial containment of a point in generalpolyhedra.

Computer Graphics and Image Processing

19, 4 (1982), 303–334.[12] Huayu Li, Yong Ge, Richang Hong, and Hengshu Zhu. 2016. Point-of-interestrecommendations: Learning potential check-ins from friends. In

SIGKDD . 975–984.[13] Xutao Li, Gao Cong, Xiao-Li Li, Tuan-Anh Nguyen Pham, and Shonali Krish-naswamy. 2015. Rank-geofm: A ranking based geographical factorization methodfor point of interest recommendation. In

SIGIR . 433–442.[14] Defu Lian, Cong Zhao, Xing Xie, Guangzhong Sun, Enhong Chen, and YongRui. 2014. GeoMF: joint geographical modeling and matrix factorization for point-of-interest recommendation. In

SIGKDD . 831–840.[15] Bin Liu, Hui Xiong, Spiros Papadimitriou, Yanjie Fu, and Zijun Yao. 2014. A gen-eral geographical probabilistic factor model for point of interest recommendation.

TKDE

27, 5 (2014), 1167–1179.[16] Hao Liu, Yongxin Tong, Panpan Zhang, Xinjiang Lu, Jianguo Duan, and Hui Xiong.2019. Hydra: A Personalized and Context-Aware Multi-Modal TransportationRecommendation System. In

SIGKDD . 2314–2324.[17] Xin Liu, Yong Liu, Karl Aberer, and Chunyan Miao. 2013. Personalized point-of-interest recommendation by mining users’ preference transition. In

CIKM .733–738.[18] Yiding Liu, Tuan-Anh Nguyen Pham, Gao Cong, and Quan Yuan. 2017. Anexperimental evaluation of point-of-interest recommendation in location-basedsocial networks.

PVLDB

10, 10 (2017), 1010–1021.[19] Kai Lu, Guanyuan Zhang, Rui Li, Shuai Zhang, and Bin Wang. 2012. Exploitingand exploring hierarchical structure in music recommendation. In

AIRS . 211–225.[20] Chen Ma, Yingxue Zhang, Qinglong Wang, and Xue Liu. 2018. Point-of-InterestRecommendation: Exploiting Self-Attentive Autoencoders with Neighbor-AwareInfluence. In

CIKM . 697–706.[21] Tuan-Anh Nguyen Pham, Xutao Li, and Gao Cong. 2017. A general model forout-of-town region recommendation. In

WWW . 401–410.[22] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme.2009. BPR: Bayesian personalized ranking from implicit feedback. In

UAI . 452–461.[23] Waldo R Tobler. 1970. A computer movie simulating urban growth in the Detroitregion.

Economic geography

46 (1970), 234–240.[24] Hao Wang, Huawei Shen, Wentao Ouyang, and Xueqi Cheng. 2018. ExploitingPOI-Specific Geographical Influence for Point-of-Interest Recommendation. In

IJCAI . 3877–3883.[25] Suhang Wang, Jiliang Tang, Yilin Wang, and Huan Liu. 2015. Exploring ImplicitHierarchical Structures for Recommender Systems.. In

IJCAI . 1813–1819.[26] Weiqing Wang, Hongzhi Yin, Zi Huang, Qinyong Wang, Xingzhong Du, andQuoc Viet Hung Nguyen. 2018. Streaming ranking based recommender systems.In

SIGIR . 525–534.[27] Yuan Xia, Jingbo Zhou, Jingjia Cao, Yanyan Li, Fei Gao, Kun Liu, Haishan Wu,and Hui Xiong. 2018. Intent-aware audience targeting for ride-hailing service. In

ECML/PKDD . 136–151.[28] Min Xie, Hongzhi Yin, Hao Wang, Fanjiang Xu, Weitong Chen, and Sen Wang.2016. Learning graph-based poi embedding for location-based recommendation.In

CIKM . 15–24.[29] Carl Yang, Lanxiao Bai, Chao Zhang, Quan Yuan, and Jiawei Han. 2017. Bridgingcollaborative filtering and semi-supervised learning: a neural approach for poirecommendation. In

SIGKDD . 1245–1254.[30] Zijun Yao, Yanjie Fu, Bin Liu, Yanchi Liu, and Hui Xiong. 2016. POI recommenda-tion: A temporal matching between POI popularity and user regularity. In

ICDM .549–558.[31] Mao Ye, Peifeng Yin, Wang-Chien Lee, and Dik-Lun Lee. 2011. Exploiting geo-graphical influence for collaborative point-of-interest recommendation. In

SIGIR .325–334.[32] Hongzhi Yin, Weiqing Wang, Hao Wang, Ling Chen, and Xiaofang Zhou. 2017.Spatial-aware hierarchical collaborative deep learning for POI recommendation.

TKDE

29, 11 (2017), 2537–2551.[33] Quan Yuan, Gao Cong, Zongyang Ma, Aixin Sun, and Nadia Magnenat Thalmann.2013. Time-aware point-of-interest recommendation. In

SIGIR . 363–372.[34] Quan Yuan, Gao Cong, and Aixin Sun. 2014. Graph-based point-of-interestrecommendation with geographical and temporal influences. In

CIKM . 659–668.[35] Weijia Zhang, Hao Liu, Yanchi Liu, Jingbo Zhou, and Hui Xiong. 2020. Semi-Supervised Hierarchical Recurrent Graph Neural Network for City-Wide ParkingAvailability Prediction. In

AAAI .[36] Yongfeng Zhang and Xu Chen. 2018. Explainable Recommendation: A Surveyand New Perspectives. arXiv preprint arXiv:1804.11192 (2018).[37] Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and ShaopingMa. 2014. Explicit factor models for explainable recommendation based onphrase-level sentiment analysis. In

SIGIR . 83–92.[38] Pengpeng Zhao, Xiefeng Xu, Yanchi Liu, Ziting Zhou, Kai Zheng, Victor S Sheng,and Hui Xiong. 2017. Exploiting Hierarchical Structures for POI Recommendation.In

ICDM . 655–664.[39] Shenglin Zhao, Tong Zhao, Irwin King, and Michael R Lyu. 2017. Geo-teaser:Geo-temporal sequential embedding rank for point-of-interest recommendation.In

WWW . 153–162.[40] Yu Zheng, Lizhu Zhang, Zhengxin Ma, Xing Xie, and Wei-Ying Ma. 2011. Recom-mending friends and locations based on individual location history.

TWEB

5, 1(2011), 5:1–5:44.[41] Jingbo Zhou, Shan Gou, Renjun Hu, Dongxiang Zhang, Jin Xu, Airong Jiang, YingLi, and Hui Xiong. 2019. A Collaborative Learning Framework to Tag Refinementfor Points of Interest. In

SIGKDD . 1752–1761.[42] Jingbo Zhou, Hongbin Pei, and Haishan Wu. 2018. Early warning of humancrowds based on query data from Baidu maps: Analysis based on Shanghaistampede. In