Sequenced Route Query with Semantic Hierarchy

Abstract

The trip planning query searches for preferred routes starting from a given point through multiple Points of Interest (PoIs) that match user requirements. Although previous studies have investigated trip planning queries, they lack flexibility for finding routes because all of them output routes that strictly match user requirements. We study trip planning queries that output multiple routes in a flexible manner. We propose a new type of query called the skyline sequenced route (SkySR) query, which searches for all sequenced routes that users may prefer by extending the shortest route search with the semantic similarity of the PoIs in the route. Flexibility is achieved by the semantic hierarchy of the PoI categories. We propose an efficient algorithm for the SkySR query, the bulk SkySR algorithm, which simultaneously searches for sequenced routes and effectively prunes unnecessary routes. Experimental evaluations show that the proposed approach significantly outperforms the existing approaches in terms of response time (by up to four orders of magnitude). Moreover, we develop a prototype service that uses the SkySR query and conduct a user test to evaluate its usefulness.



Yuya Sasaki†, Yoshiharu Ishikawa‡, Yasuhiro Fujiwara§†, Makoto Onizuka†
† Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
‡ Graduate School of Information Science, Nagoya University, Nagoya, Japan
§ NTT Software Innovation Center, Tokyo, Japan


Recently, technological advances in devices such as smartphones and automobile navigation systems have allowed users to obtain real-time location information easily. This has triggered the development of location-based services such as Foursquare, which exploit rich location information to improve service quality. Users of location-based services often want to find short routes that pass through multiple Points of Interest (PoIs); consequently, trip planning queries that find the shortest routes passing through user-specified categories have attracted considerable attention [4, 10]. If multiple PoI categories, e.g., restaurant and shopping mall, are given as an ordered list (i.e., a category sequence), the trip planning query searches for a sequenced route that passes through PoIs matching the user-specified categories in order.

Example 1.1.

Figure 1 shows a road network with the following PoIs: "Asian restaurant", "Italian restaurant", "Gift shop", "Hobby shop", and "Arts & Entertainment (A&E)". Assume that a user wants to go to an Asian restaurant, an A&E place, and a gift shop in this order from start point $v_q$. The sequenced route query outputs a route from $v_q$ that satisfies the user requirement ⟨Asian restaurant, A&E, gift shop⟩. Existing approaches find the shortest route based on the user query. However, such approaches may find an unexpectedly long route because the matching PoIs may be distant from the start point.

© 2018 Copyright held by the owner/author(s). Published in Proceedings of the 21st International Conference on Extending Database Technology (EDBT), March 26-29, 2018, ISBN 978-3-89318-078-3 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

Figure 1: An example of a road network with PoIs. (Figure not reproduced. Legend entries: user location, Food, Shop & Service, Arts & Entertainment, Asian restaurant, Italian restaurant, gift shop, hobby shop; the start point $v_q$ and routes $R_1$, $R_2$, and $R_3$ are marked.)

Figure 2: Examples of category trees in Foursquare. (Figure not reproduced. Nodes shown: Food, Shop & Service, Asian, Italian, Japanese, Sushi, Bakery, Gift shop, Hobby shop, Clothing store, Men's store.)

A major problem with the existing approaches is that they only output routes that perfectly match the given categories [5, 14, 16]. To overcome this problem, we introduce flexible similarity matching based on PoI category classification to find shorter routes. In the real world, category classification often forms a semantic hierarchy, which we refer to as a category tree. For example, in Foursquare (https://developer.foursquare.com/categorytree), the "Food" category tree includes "Asian restaurant," "Italian restaurant," and "Bakery" as subcategories, and the "Shop & Service" category tree includes "Gift shop," "Hobby shop," and "Clothing store" as subcategories (Figure 2). We employ this semantic hierarchy to evaluate routes in terms of two aspects: route length and the semantic similarity between the categories of the PoIs in the route and those specified in the user query. As a result, we can find effective sequenced routes that semantically match the user requirement. For example, in Figure 1, route R […] skyline sequenced route (SkySR) query, which applies the skyline concept to route length and semantic similarity (i.e., we consider route length and semantic similarity as route scores). Given a start point and a sequence of PoI categories, a SkySR query searches for sequenced routes that are no worse than any other route in terms of length and semantic similarity.

Table 1: Example routes in New York City
Approach              | Distance (m) | Sequenced route
Existing (e.g., [16]) | 3239         | Cupcake Shop → Art Museum → Jazz Club
Proposed              | 3239         | Cupcake Shop → Art Museum → Jazz Club
Proposed              | 1858         | Dessert Shop → Art Museum → Jazz Club
Proposed              | 1392         | Dessert Shop → Museum → Jazz Club
Proposed              | 823          | Dessert Shop → Museum → Music Venue

Example 1.2.

Table 1 shows real-world examples of sequenced routes in New York City where a user plans to go to a cupcake shop, an art museum, and then a jazz club in this order. The existing approaches output a single route that matches the user's requirement perfectly. The proposed approach can output three additional routes that are shorter than the route found by the existing approach, and these additional routes still satisfy the user query semantically. The user can select a preferred route among all four routes depending on how far he or she is willing to walk or how much time is available.

The SkySR query can provide effective trip plans; however, it incurs significant computational cost because a large number of routes can match the user requirement. Therefore, the SkySR query requires an efficient algorithm. The challenge is to search for SkySRs efficiently by reducing the search space without sacrificing the exactness of the result. We propose the bulk SkySR algorithm (BSSR for short), which finds exact SkySRs efficiently. Recall that the scores of SkySRs are no worse than those of other sequenced routes. BSSR exploits the branch-and-bound algorithm [9], which effectively prunes unnecessary routes based on the upper and lower bounds of route scores. In addition, to further improve efficiency, we employ four techniques to optimize BSSR: (1) we initially find sequenced routes to calculate the upper bound; (2) we tighten the upper bound by arranging the priority queue; (3) we tighten the lower bound by introducing minimum distances; and (4) we keep intermediate results for later processing, which we refer to as on-the-fly caching. Our approach significantly outperforms existing approaches in terms of response time (by up to four orders of magnitude) without increasing memory usage or sacrificing the exactness of the result.

The main contributions of this paper are as follows.
• We introduce a semantic hierarchy to the route search query, which allows us to search for routes flexibly.
• We propose the skyline sequenced route (SkySR) query, which finds all preferred routes related to a specified category sequence with a semantic hierarchy (Section 4).
• We propose an exact and efficient algorithm and its optimization techniques to process SkySR queries (Section 5).
• We discuss variations and extensions of the SkySR query, which can be applied to various user requirements and environments (Section 6).
• We demonstrate through extensive experiments that the proposed approach works well in terms of response time and memory usage (Section 7).
• We develop a prototype service that employs the SkySR query and conduct a user test to evaluate its usefulness (Section 8).

The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 describes the problem formulation, and Section 4 defines the SkySR query. Section 5 presents the proposed algorithm. In Section 6, we discuss variations and extensions of the SkySR query. Sections 7 and 8 present the experiment and user test results, respectively, and Section 9 concludes the paper.

First, we review trip planning query studies related to the SkySR query. Then, we review studies related to the skyline operator. To the best of our knowledge, no study has considered a skyline sequenced route; thus, our problem cannot be solved efficiently using existing approaches.

Trip planning:

We categorize trip planning queries in Table 2. Note that all existing trip planning queries only output routes that perfectly match the user-specified category sequences. Moreover, since most trip planning queries assume Euclidean distance, they cannot find SkySRs, which assume road network distance. Dai et al. [4] proposed a personalized sequenced route, assuming that PoIs have ratings as well as categories and that users assign weighting factors as preferences. Although this personalized sequenced route considers route lengths and ratings, it only outputs the route that perfectly matches the given categories and has the best score based on lengths, ratings, and preferences. Only the optimal sequenced route (OSR) is applicable to finding SkySRs without modification, because the OSR and SkySR are based on the same settings (except for scoring). Sharifzadeh et al. [16] proposed two algorithms to find OSRs in road networks: the Dijkstra-based solution and the Progressive Neighbor Exploration (PNE) approach. The main difference between these algorithms is that the Dijkstra-based solution employs the Dijkstra algorithm to search for PoIs, whereas the PNE approach employs nearest neighbor search. These algorithms have been reported to be comparable in performance [16]. Thus, we consider both algorithms to verify the performance of the proposed approach.

Skyline:

The skyline operator was proposed in [2]. Few studies have considered the skyline concept for route searches. Recently, the skyline route (or skyline path) has received considerable attention [1, 6, 8, 13, 17, 18, 20]. A skyline route assumes that edges on road networks are associated with multiple costs, such as distance, travel time, and tolls, and the objective is to find skyline routes from a start point to a destination considering these multiple costs. However, since we specify a category sequence rather than a destination, conventional skyline route algorithms cannot be applied to find SkySRs. The continuous skyline query in road networks (e.g., [7]) searches for the skyline PoIs for a moving object considering both the PoI category and the distances to the moving object. Because continuous skyline queries search for a single PoI category, these solutions are not applicable to SkySR queries, which obtain routes that pass through multiple PoIs.

Table 3 summarizes the notations used in this paper. We assume a connected graph $G = (V \cup P, E)$, where $V$, $P$, and $E \subseteq (V \cup P) \times (V \cup P)$ represent the sets of vertices, PoI vertices, and edges, respectively. This graph corresponds to a road network that contains PoIs. The numbers of vertices, PoI vertices, and edges are denoted $|V|$, $|P|$, and $|E|$, respectively. PoI vertex $p \in P$ is associated with category $c \in C$, where $C$ is the set of categories.

Table 2: Types of trip planning queries
Type                                    | Distance metric      | Order   | Destination | Result      | Scores
SkySR (proposed)                        | Network              | Total   | Yes or No   | Exact       | Length and semantic
Optimal sequenced route (OSR) [16]      | Euclidean or Network | Total   | Yes or No   | Exact       | Length
Sequenced route [5, 14]                 | Network              | Total   | Yes         | Exact       | Length
Personalized sequenced route [4]        | Euclidean            | Total   | No          | Approximate | Length and rating
Trip planning [10]                      | Euclidean or Network | None    | Yes         | Approximate | Length
Multi rule partial sequenced route [3]  | Euclidean            | Partial | No          | Approximate | Length
Multi rule partial sequenced route [11] | Euclidean            | Partial | No          | Exact       | Length
Multi-type nearest neighbor [12]        | Euclidean            | None    | No          | Exact       | Length

Table 3: Notations
Symbol             | Meaning
$V$                | Set of vertices
$P$                | Set of PoI vertices
$E$                | Set of edges
$p$                | PoI vertex
$C$                | Set of categories
$c$                | Category
$t$                | Category tree
$c_p$              | Category of PoI vertex $p$
$t_c$              | Category tree of $c$
$P_c$              | Set of PoI vertices associated with $c$
$P_t$              | Set of PoI vertices associated with $t$
$S$                | Category sequence (sequence of categories)
$R$                | Route (sequence of PoI vertices)
$S_R$              | Sequential PoI categories in $R$
$l(R)$             | Length score of $R$
$s(R)$             | Semantic score of $R$
$\mathcal{R}$      | Set of routes
$\mathcal{E}(R)$   | Set of super-routes of $R$
$\mathcal{S}$      | Minimal set of sequenced routes
$S_q$              | Category sequence specified by user
$v_q$              | Start point specified by user

We denote the category of PoI vertex $p$ as $c_p$, and assume that each PoI is associated with a single category. Each category is associated with a category tree $t$, and we denote the category tree of category $c$ as $t_c$. We denote the set of PoI vertices associated with $c$ and the set of PoI vertices associated with the category tree $t$ as $P_c$ and $P_t$, respectively. If a PoI vertex is associated with category $c$, it is also associated with all ancestor categories of $c$ in $t_c$. Each edge $e(u_i, u_j)$ in $E$ is associated with a weight $w(u_i, u_j) \geq 0$, which can represent either travel duration or distance. Next, we define several terms required to introduce the skyline sequenced route (SkySR).

Definition 3.1. (Category sequence) A category sequence $S = \langle c_S[1], c_S[2], \ldots, c_S[|S|] \rangle$ is a sequence of categories, where $|S|$ is the size of $S$ and $c_S[i] \in C$ denotes the $i$-th category in $S$. A super-category sequence of $S$ is a category sequence whose $i$-th category is either $c_S[i]$ or an ancestor of $c_S[i]$ in the category tree ($1 \leq i \leq |S|$).

Definition 3.2. (Route) A route $R = \langle p_R[1], \ldots, p_R[|R|] \rangle$ is a sequence of PoI vertices in a road network, where $p_R[i] \in P$ denotes the $i$-th PoI vertex in $R$ and $|R|$ denotes the size of $R$. $S_R$ denotes the category sequence of $R$ (i.e., $\langle c_{p_R[1]}, \ldots, c_{p_R[|R|]} \rangle$). In addition, we define a super-route of $R$ as an extension of $R$ such as $\langle R, p_i, p_j, \ldots \rangle$; in other words, a super-route of $R$ is obtained by appending a sequence of PoI vertices to the end of $R$. $\mathcal{R}$ and $\mathcal{E}(R)$ denote a set of routes and the set of super-routes of $R$, respectively. Moreover, given a route $R = \langle p_R[1], \ldots, p_R[|R|] \rangle$ and a PoI vertex $p$, we define $R \oplus p = \langle p_R[1], \ldots, p_R[|R|], p \rangle$.

Definition 3.3. (Category similarity)

Given two categories $c$ and $c'$, the similarity $sim(c, c') \in [0, 1]$ is calculated by an arbitrary function such as the Wu and Palmer similarity or path length [15, 19]. We assume the following relations in the similarity.
• $c$ is irrelevant to $c'$ if they belong to different category trees; thus, $sim(c, c') = 0$.
• $c$ semantically matches $c'$ if $c$ and $c'$ are in the same category tree; thus, $0 < sim(c, c') \leq 1$.
• $c$ perfectly matches $c'$ if $c$ and $c'$ are the same; thus, $sim(c, c') = 1$.
We now define the sequenced route using the above definitions. The difference between our definition of a sequenced route and the previous definition [16] is that we consider category similarity.

Definition 3.4. (Sequenced route)

Given category sequence $S = \langle c_S[1], \ldots, c_S[|S|] \rangle$, $R = \langle p_R[1], \ldots, p_R[|R|] \rangle$ is a sequenced route of category sequence $S$ if and only if it satisfies (i) $|R| = |S|$, (ii) $c_S[i]$ semantically matches $c_{p_R[i]}$ for all $i$ such that $1 \leq i \leq |S|$, and (iii) all PoI vertices in $R$ are pairwise distinct.

Definition 3.5. (Route scores)

Given category sequence $S$ and vertex $v$ as a start point, we define two scores for route $R$: the length score $l(R) \in [0, \infty]$ and the semantic score $s(R) \in [0, 1]$. We define the length score $l(R)$ as follows:

$$l(R) = D(v, p_R[1]) + \sum_{i=1}^{|R|-1} D(p_R[i], p_R[i+1]), \qquad (1)$$

where $D(u_i, u_j)$ denotes the smallest sum of edge weights over the routes between vertices (or PoIs) $u_i$ and $u_j$. The semantic score $s(R)$ is calculated by an aggregation function $f$ as follows:

$$s(R) = f(h_1, h_2, \ldots, h_{|R|}), \qquad (2)$$

where $h_i$ denotes $sim(c_S[i], c_{p_R[i]})$. We assume that if all $h_i = 1$, then $s(R) = 0$; i.e., if all PoI vertices in a route perfectly match the categories, the semantic score of the route is 0. We also assume that, for a route $R$ that is not yet a sequenced route, $s(R)$ is the minimum semantic score that $R$ can have once it is extended into a sequenced route. Without loss of generality, routes with smaller length and semantic scores are preferred.

Here, we define the SkySR query. Intuitively, a SkySR is a potential route that may be the best route with respect to the user's requirement. A potential route is a route that is not dominated by any other route; the notion of dominance is used in the skyline operator [2]. We define dominance for sequenced routes and the SkySR query in the following.
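To make Definitions 3.3 to 3.5 concrete before dominance is introduced, the following Python sketch computes category similarity, checks Definition 3.4, and evaluates Equations (1) and (2) on a toy network. It is illustrative only and rests on assumptions not fixed by the text: the category trees are a hypothetical fragment based on Figure 2, similarity is a Wu-Palmer-style measure $2 \cdot depth(lcs) / (depth(c) + depth(c'))$ with root depth 1, and the aggregation $f = 1 - \prod_i h_i$ is one hypothetical choice that satisfies the requirement $s(R) = 0$ when every $h_i = 1$ (the paper leaves $f$ open). All names (`sim`, `length_score`, `semantic_score`, the vertices `vq`, `p1` to `p3`) are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical category trees based on the Figure 2 examples
# (child -> parent; None marks the root of a tree).
PARENT: Dict[str, Optional[str]] = {
    "Food": None, "Asian restaurant": "Food", "Italian restaurant": "Food",
    "Shop & Service": None, "Gift shop": "Shop & Service", "Hobby shop": "Shop & Service",
}

def root_path(c: str) -> List[str]:
    """Return [c, parent(c), ..., root of t_c]."""
    path, nxt = [c], PARENT[c]
    while nxt is not None:
        path.append(nxt)
        nxt = PARENT[nxt]
    return path

def sim(c: str, c2: str) -> float:
    """Wu-Palmer-style similarity with root depth 1 (one possible choice for Def. 3.3)."""
    a, b = root_path(c), root_path(c2)
    if a[-1] != b[-1]:                      # different trees: irrelevant, sim = 0
        return 0.0
    in_b = set(b)
    lcs = next(x for x in a if x in in_b)   # deepest common ancestor
    return 2 * len(root_path(lcs)) / (len(a) + len(b))

def is_sequenced_route(route: List[str], seq: List[str], cat_of: Dict[str, str]) -> bool:
    """Definition 3.4: same size, semantic match at every position, distinct PoIs."""
    return (len(route) == len(seq)
            and all(sim(c, cat_of[p]) > 0 for c, p in zip(seq, route))
            and len(set(route)) == len(route))

Graph = Dict[str, List[Tuple[str, float]]]

def shortest_dists(graph: Graph, src: str) -> Dict[str, float]:
    """Plain Dijkstra: D(src, u) for every reachable u."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def length_score(graph: Graph, start: str, route: List[str]) -> float:
    """Equation (1): start-to-first distance plus consecutive PoI distances."""
    total, prev = 0.0, start
    for p in route:
        total += shortest_dists(graph, prev).get(p, math.inf)
        prev = p
    return total

def semantic_score(route: List[str], seq: List[str], cat_of: Dict[str, str]) -> float:
    """Equation (2) with the hypothetical f = 1 - prod(h_i); positions not yet
    filled count as h_i = 1, i.e., the minimum score reachable by extension."""
    prod = 1.0
    for c, p in zip(seq, route):
        prod *= sim(c, cat_of[p])
    return 1.0 - prod

# Toy undirected network (weights in arbitrary units).
EDGES = [("vq", "p1", 4.0), ("vq", "p2", 1.0), ("p1", "p3", 2.0), ("p2", "p3", 2.0)]
G: Graph = {}
for a, b, w in EDGES:
    G.setdefault(a, []).append((b, w))
    G.setdefault(b, []).append((a, w))
CAT = {"p1": "Asian restaurant", "p2": "Italian restaurant", "p3": "Gift shop"}
SEQ = ["Asian restaurant", "Gift shop"]
```

On this toy network, ⟨p2, p3⟩ has $l = 3$ and $s = 0.5$, while ⟨p1, p3⟩ has $l = 6$ and $s = 0$: a shorter route that only semantically matches versus a longer perfect match, which is exactly the trade-off the skyline captures.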

Definition 4.1. (Dominance)

Let $\mathcal{R}$ be the set of all sequenced routes starting from point $v$ for category sequence $S$. For two sequenced routes $R, R' \in \mathcal{R}$, we say that $R$ dominates $R'$ if (i) $l(R) < l(R')$ and $s(R) \leq s(R')$, or (ii) $s(R) < s(R')$ and $l(R) \leq l(R')$. If two sequenced routes have the same length and semantic scores, the routes are equivalent in terms of dominance, and a set of sequenced routes is minimal if it contains no equivalent routes.

Definition 4.2. (SkySR query) Given vertex $v_q$ as a start point and category sequence $S_q$, a skyline sequenced route is a sequenced route that is not dominated by any other route. Let $\mathcal{R}$ be the set of all sequenced routes from start point $v_q$ for category sequence $S_q$, and let $\mathcal{S}$ be a minimal set of the sequenced routes. The SkySR query returns $\mathcal{S}$ such that all $R \in \mathcal{S}$ are SkySRs and every $R' \in \mathcal{R} \setminus \mathcal{S}$ is dominated by or equivalent to some $R \in \mathcal{S}$.

A naive solution to find SkySRs is to first enumerate SkySR candidates by iteratively executing OSR queries for all super-category sequences of $S_q$ and then check the dominance among the resulting routes. The number of super-category sequences of $S_q$ increases exponentially as the depth of the categories in the category tree and the size of $S_q$ increase: if the $i$-th category of $S_q$ has $d_i$ choices (itself and its ancestors), there are $\prod_{i=1}^{|S_q|} d_i$ super-category sequences. Thus, although OSR algorithms can find a sequenced route efficiently, we must repeat many searches, and the naive solution incurs a significantly high computational cost.

In this section, we present the proposed approach, which we refer to as the bulk SkySR algorithm (BSSR), that finds SkySRs efficiently. Section 5.1 presents the BSSR design policy, and Section 5.2 explains the BSSR procedure. In Section 5.3, we propose optimization techniques for BSSR. We also theoretically analyze its performance in Section 5.4. Finally, we show a running example of BSSR in Section 5.5. In Section 5, we assume undirected graphs in which each PoI vertex is associated with only one category and users give sequences of single PoI categories. However, in a real application, the graphs would be directed, each PoI vertex would be associated with multiple categories, and users may specify complex categories. Section 6 describes how we handle these conditions.
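The dominance test of Definition 4.1, the minimal-set maintenance implied by Definition 4.2, and the super-category enumeration that the naive solution iterates over can be sketched as follows. This is a minimal illustration, not the authors' code: routes are represented only by their $(l(R), s(R))$ scores plus a label, and the helper names and the small category tree are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Scores = Tuple[float, float]  # (length score l(R), semantic score s(R))


def dominates(a: Scores, b: Scores) -> bool:
    """Definition 4.1: a is strictly better in one score and no worse in the other."""
    (la, sa), (lb, sb) = a, b
    return (la < lb and sa <= sb) or (sa < sb and la <= lb)


def update_skyline(skyline: List[Tuple[Scores, str]], scores: Scores, route: str) -> bool:
    """Keep a minimal skyline: reject a route that is dominated by or equivalent
    to a member; otherwise evict the members it dominates and add it."""
    if any(dominates(s, scores) or s == scores for s, _ in skyline):
        return False
    skyline[:] = [(s, r) for s, r in skyline if not dominates(scores, s)]
    skyline.append((scores, route))
    return True


# Hypothetical category trees (child -> parent; None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "Food": None, "Asian restaurant": "Food",
    "Shop & Service": None, "Gift shop": "Shop & Service",
}


def self_and_ancestors(c: str) -> List[str]:
    chain, nxt = [c], PARENT[c]
    while nxt is not None:
        chain.append(nxt)
        nxt = PARENT[nxt]
    return chain


def super_category_sequences(seq: List[str]) -> List[Tuple[str, ...]]:
    """All sequences whose i-th entry is seq[i] or one of its ancestors (Def. 3.1)."""
    return list(product(*(self_and_ancestors(c) for c in seq)))
```

Here `super_category_sequences(["Asian restaurant", "Gift shop"])` yields four sequences, from the original one up to `("Food", "Shop & Service")`; with deeper trees and longer $S_q$ the count is the product of the chain lengths, which is why running one OSR query per sequence scales poorly.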

Our idea for improving efficiency is to find sequenced routes simultaneously (i.e., to search for sequenced routes in bulk) in order to reduce the search space. We have two choices as the basis for our approach: the Dijkstra-based and the nearest neighbor-based approaches [16]. We use the Dijkstra-based approach as the basis of our algorithm. Recall that a SkySR query has two scores for a route, i.e., length and semantic scores. To find all SkySRs, we must find routes that have small semantic scores even if they have large length scores. However, PoIs that are included in routes with small semantic scores could be distant from the start point. Although the nearest neighbor-based approach finds the closest PoIs, it cannot efficiently find such distant PoIs. On the other hand, the Dijkstra-based approach searches for all PoI vertices that match a PoI category. Therefore, the Dijkstra-based approach is more suitable for the SkySR query than the nearest neighbor-based approach.

Although our approach finds sequenced routes simultaneously, it entails a large number of executions of the Dijkstra algorithm: as the number of candidate PoIs increases, the number of possible routes increases, so the search space does not shrink effectively on its own. To effectively reduce the search space, we exploit the branch-and-bound algorithm, which uses the upper and lower bounds of each branch of the search space to solve an optimization problem efficiently. In BSSR, each branch corresponds to a route. We compute the upper and lower bounds while finding the set of SkySRs. Specifically, we compute the upper bound from the sequenced routes already found, and we compute the lower bound from the route currently being searched (i.e., a route that is not yet a sequenced route). With the upper and lower bounds, we can safely prune unnecessary routes to improve efficiency.

To further increase efficiency, we propose optimization techniques for BSSR. To exploit the branch-and-bound algorithm, we need to initialize the upper bound. Thus, we first search for a sequenced route to initialize the upper bound. However, finding a sequenced route may incur a high computational cost. Therefore, we propose a nearest neighbor-based initial search method (NNinit) that finds sequenced routes efficiently by greedily finding PoI vertices. In addition, to update the upper bound effectively, we assign a priority to each route and use a priority queue to find routes that are likely to give an effective upper bound. To compute the lower bound, we compute the possible minimum distance and add it to the length score of a route to safely prune unnecessary routes. Moreover, to avoid executing the Dijkstra algorithm repeatedly from the same vertices, we materialize the search results of the Dijkstra algorithm and reuse them to search for PoI vertices. By using BSSR with these optimization techniques, we can process the SkySR query efficiently.
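The idea of materializing Dijkstra results and reusing them can be illustrated with a simple memo table. This sketch rests on a simplifying assumption: it caches complete single-source distance maps per source vertex, whereas the on-the-fly caching technique of Section 5.3.4 has its own policy for what is stored. The class name `CachedDijkstra` and its `runs` counter are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


class CachedDijkstra:
    """Memoizes single-source shortest-path distances per source vertex."""

    def __init__(self, graph: Graph) -> None:
        self.graph = graph
        self.cache: Dict[str, Dict[str, float]] = {}
        self.runs = 0  # how many Dijkstra searches were actually executed

    def distances(self, src: str) -> Dict[str, float]:
        """Return D(src, u) for all reachable u, searching at most once per src."""
        if src not in self.cache:
            self.runs += 1
            self.cache[src] = self._dijkstra(src)
        return self.cache[src]

    def _dijkstra(self, src: str) -> Dict[str, float]:
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:  # stale heap entry
                continue
            for v, w in self.graph.get(u, []):
                if d + w < dist.get(v, math.inf):
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        return dist


g: Graph = {"a": [("b", 2.0)], "b": [("a", 2.0), ("c", 1.0)], "c": [("b", 1.0)]}
oracle = CachedDijkstra(g)
oracle.distances("a")
oracle.distances("a")  # served from the cache; no second search
```

This trades memory for time, since every cached source keeps a full distance map; it only pays off when many routes in the priority queue end at the same PoI vertex.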

The bulk SkySR algorithm (BSSR) finds all SkySRs by finding sequenced routes simultaneously while checking dominance on demand. The naive solution must execute OSR queries for all super-category sequences of $S_q$ one by one because it only searches for the PoIs that perfectly match the given category. In contrast, BSSR searches for all PoIs that semantically match the given category. The basic process of BSSR is simple, as shown in Algorithm 1: (i) search for the PoI vertices that match the first category from start point $v_q$ and insert the routes found into priority queue $Q_b$, which stores all found routes (line 4); (ii) fetch a route from $Q_b$ (line 6); (iii) search for the next PoI vertices that semantically match the next category $c_d$ from PoI vertex $p_d$, which is the end of the fetched route, and insert the fetched route extended with each of the found PoI vertices into $Q_b$ (lines 7–9); and (iv) if $Q_b$ is not empty, return to (ii); otherwise, output the minimal set of sequenced routes $\mathcal{S}$ (line 10). In steps (i) and (iii), we find PoI vertices from the end of the fetched route using a Dijkstra algorithm modified for the SkySR query, as described in Section 5.2.2.

Algorithm 1: Bulk SkySR algorithm
1  procedure BSSR($v_q$, $S_q$)
2    $\mathcal{S} \leftarrow \emptyset$;
3    priority_queue $Q_b \leftarrow \emptyset$;
4    mDijkstra($\emptyset$, $c_{S_q}[1]$, $v_q$, $Q_b$, $\mathcal{S}$);
5    while $Q_b$ is not empty do
6      $R \leftarrow Q_b$.dequeue();
7      $c_d \leftarrow c_{S_q}[|R| + 1]$;
8      $p_d \leftarrow p_R[|R|]$;
9      mDijkstra($R$, $c_d$, $p_d$, $Q_b$, $\mathcal{S}$);
10   return $\mathcal{S}$;
11 end procedure

We search for sequenced routes simultaneously to reduce the search space. Our idea to safely reduce the search space is to exploit the branch-and-bound algorithm, which prunes unnecessary parts of the search space. This section describes the theoretical background for using the branch-and-bound algorithm. We use the following three lemmas to reduce the search space.

Lemma 5.1.

Let $\mathcal{S}$ be a minimal set of sequenced routes found while searching for SkySRs and $\mathcal{S}'$ be the minimal set of sequenced routes after finding the SkySRs. If sequenced route $R$ is dominated by a sequenced route in $\mathcal{S}$, $R$ cannot be included in $\mathcal{S}'$.

Proof: From Definition 4.2, we search for a set of SkySRs, which are not dominated by other sequenced routes. If we find a sequenced route not dominated by any sequenced route in $\mathcal{S}$, we update $\mathcal{S}$ by inserting the new sequenced route and deleting the sequenced routes dominated by it. Therefore, no sequenced route in $\mathcal{S}$ after the update is dominated by a sequenced route in $\mathcal{S}$ prior to the update. As a result, the sequenced routes in $\mathcal{S}'$ are not dominated by any sequenced route in $\mathcal{S}$. In other words, $R$ is not included in $\mathcal{S}'$ if there is a sequenced route $R'$ in $\mathcal{S}$ such that $l(R') \leq l(R)$ and $s(R') \leq s(R)$. □

Lemma 5.2.

Let $\mathcal{E}(R)$ be the set of super-routes of $R$ starting from the same start point. For any route $R'$ in $\mathcal{E}(R)$, the length and semantic scores $l(R')$ and $s(R')$ cannot be less than $l(R)$ and $s(R)$, respectively.

Proof: Let $R'$ be a route included in $\mathcal{E}(R)$, so that $p_{R'}[i] = p_R[i]$ for $1 \leq i \leq |R|$. Since $D(u_i, u_j) \geq 0$, the following property holds from Equation (1) of Definition 3.5:

$$
\begin{aligned}
l(R') &= D(v_q, p_{R'}[1]) + \sum_{i=1}^{|R'|-1} D(p_{R'}[i], p_{R'}[i+1]) \\
&= D(v_q, p_R[1]) + \sum_{i=1}^{|R|-1} D(p_R[i], p_R[i+1]) + \sum_{i=|R|}^{|R'|-1} D(p_{R'}[i], p_{R'}[i+1]) \\
&\geq D(v_q, p_R[1]) + \sum_{i=1}^{|R|-1} D(p_R[i], p_R[i+1]) = l(R).
\end{aligned}
$$

Therefore, we have $l(R) \leq l(R')$. Moreover, $s(R)$ is the minimum semantic score that $R$ can have when it becomes a sequenced route; thus, even if PoI vertices are appended to $R$, we have $s(R) \leq s(R')$. As a result, $l(R) \leq l(R')$ and $s(R) \leq s(R')$. □

In terms of the branch-and-bound algorithm, Lemmas 5.1 and 5.2 give us the upper and lower bounds of the scores of a route, respectively. We can prune routes according to the following lemma.

Lemma 5.3. (Pruning condition) If (i) $R$ is a sequenced route included in the set $\mathcal{S}$ of sequenced routes and (ii) $l(R) \leq l(R')$ and $s(R) \leq s(R')$, no route in $\mathcal{E}(R')$ can be included in $\mathcal{S}$.

Proof: If $l(R) \leq l(R')$ and $s(R) \leq s(R')$, $R'$ is not included in $\mathcal{S}$ (Lemma 5.1). From Lemma 5.2, the scores of $R'$ cannot become less than $l(R')$ and $s(R')$ even if we expand $R'$. Therefore, no route in $\mathcal{E}(R')$ can be included in $\mathcal{S}$, because each such route is dominated by or equivalent to the sequenced route $R$. □

Lemma 5.3 gives us a length score threshold for a route: if the length score of a route is greater than or equal to this threshold, we can prune the route. We define the length score threshold of a route as follows:

Definition 5.4.

The threshold $\bar{l}(R)$ of the length score of route $R$ is given by the following equation:

$$\bar{l}(R) = \min_{R' \in \mathcal{S}} \{\, l(R') \mid s(R') \leq s(R) \,\}. \qquad (3)$$

If $\bar{l}(R) \leq l(R)$, we can safely prune $R$ because neither $R$ nor any of its super-routes can be included in the result (Lemma 5.3). Thus, we can reduce the search space without sacrificing the exactness of the result. Equation (3) has a small computational cost because $\mathcal{S}$ includes only a small number of sequenced routes, as shown in Section 7.

We search for the next PoI vertices that semantically match the next PoI category using the modified Dijkstra algorithm. The modified Dijkstra algorithm prunes unnecessary routes based on Lemma 5.3. Moreover, based on the following lemma, it terminates unnecessary traversal of the graph and avoids inserting unnecessary routes.

Lemma 5.5.

Let $R = \langle p_R[1], \ldots, p_R[i], p_R[i+1], p_R[i+2], \ldots, p_R[|R|] \rangle$ be a route and $p_{i:i+1}$ be a PoI vertex on the shortest path between $p_R[i]$ and $p_R[i+1]$. Route $R$ must be dominated by or equivalent to another route if $sim(c_S[i+1], c_{p_{i:i+1}}) \geq sim(c_S[i+1], c_{p_R[i+1]})$.

Proof: Let $R' = \langle p_R[1], \ldots, p_R[i], p_{i:i+1}, p_R[i+2], \ldots, p_R[|R|] \rangle$ be the route that differs from $R$ only in that $p_{i:i+1}$ replaces $p_R[i+1]$. Since $p_{i:i+1}$ is on the shortest path between $p_R[i]$ and $p_R[i+1]$, we have $D(p_R[i], p_R[i+1]) = D(p_R[i], p_{i:i+1}) + D(p_{i:i+1}, p_R[i+1])$, and by the triangle inequality $D(p_{i:i+1}, p_R[i+1]) + D(p_R[i+1], p_R[i+2]) \geq D(p_{i:i+1}, p_R[i+2])$; hence $l(R) \geq l(R')$. Moreover, if $sim(c_S[i+1], c_{p_{i:i+1}}) \geq sim(c_S[i+1], c_{p_R[i+1]})$, we have $s(R) \geq s(R')$. Therefore, $R$ is dominated by or equivalent to $R'$ because $l(R) \geq l(R')$ and $s(R) \geq s(R')$. □

Lemma 5.5 gives us two properties for the SkySR query: (i) if the path to a PoI vertex passes through another PoI vertex with an equal or higher category similarity, we can ignore the former PoI vertex; and (ii) if we find a PoI vertex that perfectly matches the given category, we do not need to traverse the graph beyond that PoI vertex. As a result, using Lemmas 5.3 and 5.5, we can efficiently find the next PoI vertices.

Algorithm 2 shows the pseudocode for the modified Dijkstra algorithm, which is used to find PoI vertices that semantically match $c_d$ from $p_d$. In priority queue $Q_d$ for the modified Dijkstra algorithm, the top vertex is the closest vertex to $p_d$. The queue is initialized to $p_d$ (line 3). The closest vertex to $p_d$ is dequeued from $Q_d$ (line 5). $R_t$ is the route obtained by extending $R_d$ with the fetched vertex $u$ (line 7). If the length score of $R_t$ is greater than or equal to the threshold of $R_d$, the modified Dijkstra algorithm terminates (Lemma 5.3) (line 8).
We check whether (i) $u$ semantically matches $c_d$ and (ii) $u$ is not reached through another PoI vertex whose category similarity is greater than or equal to that of $u$ (line 9). If both conditions hold and the length score of $R_t$ is less than its threshold (line 10), we insert $R_t$ into the priority queue $Q_b$ or, if $R_t$ is a sequenced route, into the set of sequenced routes (lines 10–12). Otherwise, we skip inserting $R_t$ (Lemmas 5.3 and 5.5). The neighbor vertices of $u$ are inserted into $Q_d$ unless $u$ perfectly matches $c_d$ (Lemma 5.5) (lines 13–17).

In this section, we propose four optimization techniques for BSSR. Section 5.3.1 explains an initial search for sequenced routes and proposes NNinit. We then explain how to tighten the upper and lower bounds in Sections 5.3.2 and 5.3.3, respectively. Furthermore, in Section 5.3.4 we propose an on-the-fly caching technique that reuses previous search results of the modified Dijkstra algorithm.
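Before the optimizations are applied, the basic procedure of Algorithms 1 and 2 (Section 5.2) can be sketched in Python. This is a minimal, illustrative sketch under stated assumptions, not the authors' implementation: the Wu-Palmer-style similarity, the aggregation $f = 1 - \prod_i h_i$, ordering $Q_b$ by length score, and all names (`bssr`, `m_dijkstra`, `threshold`, `update`, `via_best`) are hypothetical choices. None of the four optimizations is included. The start vertex is assumed not to be a PoI, the Lemma 5.5 check is approximated by tracking the best similarity of any not-yet-used PoI on the Dijkstra shortest-path tree, and ties between equal-length shortest paths are broken arbitrarily.

```python
import heapq
import itertools
import math
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
Route = Tuple[Tuple[str, ...], float, float]  # (PoI vertices, l(R), product of h_i)
Entry = Tuple[float, float, Tuple[str, ...]]  # (l(R), s(R), PoI vertices)


def bssr(graph: Graph, cat_of: Dict[str, str], parent: Dict[str, Optional[str]],
         seq: List[str], vq: str) -> List[Entry]:
    """Unoptimized BSSR sketch; returns the minimal skyline sorted by length."""
    def root_path(c: str) -> List[str]:
        path, nxt = [c], parent[c]
        while nxt is not None:
            path.append(nxt)
            nxt = parent[nxt]
        return path

    def sim(c: str, c2: str) -> float:  # Wu-Palmer-style similarity (assumed)
        a, b = root_path(c), root_path(c2)
        if a[-1] != b[-1]:
            return 0.0
        in_b = set(b)
        lcs = next(x for x in a if x in in_b)
        return 2 * len(root_path(lcs)) / (len(a) + len(b))

    skyline: List[Entry] = []

    def threshold(s: float) -> float:  # Equation (3); infinity if no route qualifies
        return min((l for l, s2, _ in skyline if s2 <= s), default=math.inf)

    def update(l: float, s: float, pois: Tuple[str, ...]) -> None:
        if any(l2 <= l and s2 <= s for l2, s2, _ in skyline):
            return  # dominated by or equivalent to a route already found
        skyline[:] = [e for e in skyline if not (l <= e[0] and s <= e[1])]
        skyline.append((l, s, pois))

    tie = itertools.count()
    q_b: List[Tuple[float, int, Route]] = []  # Q_b ordered by length score (assumed)

    def m_dijkstra(r_d: Route, c_d: str, p_d: str) -> None:  # Algorithm 2
        pois, l_d, prod_d = r_d
        used = set(pois)
        thr_d = threshold(1.0 - prod_d)
        dist: Dict[str, float] = {p_d: 0.0}
        via_best: Dict[str, float] = {p_d: 0.0}  # best sim of an unused PoI passed so far
        visited = set()
        q_d = [(0.0, p_d)]
        while q_d:
            d, u = heapq.heappop(q_d)
            if u in visited:
                continue
            visited.add(u)
            if l_d + d >= thr_d:  # line 8 (Lemma 5.3)
                break
            h = sim(c_d, cat_of[u]) if u in cat_of and u not in used else 0.0
            if h > via_best[u]:  # line 9: semantic match not shadowed (Lemma 5.5)
                l_t, prod_t = l_d + d, prod_d * h
                if l_t < threshold(1.0 - prod_t):  # line 10
                    if len(pois) + 1 == len(seq):
                        update(l_t, 1.0 - prod_t, pois + (u,))
                    else:
                        heapq.heappush(q_b, (l_t, next(tie), (pois + (u,), l_t, prod_t)))
            if h < 1.0:  # line 13: do not traverse past a perfect match (Lemma 5.5)
                for v, w in graph.get(u, []):
                    if d + w < dist.get(v, math.inf):
                        dist[v] = d + w
                        via_best[v] = max(via_best[u], h)
                        heapq.heappush(q_d, (d + w, v))

    m_dijkstra(((), 0.0, 1.0), seq[0], vq)  # Algorithm 1, line 4
    while q_b:  # Algorithm 1, lines 5-9
        l_r, _, r = heapq.heappop(q_b)
        if l_r >= threshold(1.0 - r[2]):  # re-check Lemma 5.3 at dequeue time
            continue
        m_dijkstra(r, seq[len(r[0])], r[0][-1])
    return sorted(skyline)


def undirected(edges: List[Tuple[str, str, float]]) -> Graph:
    g: Graph = {}
    for a, b, w in edges:
        g.setdefault(a, []).append((b, w))
        g.setdefault(b, []).append((a, w))
    return g


PARENT = {"Food": None, "Asian restaurant": "Food", "Italian restaurant": "Food",
          "Shop & Service": None, "Gift shop": "Shop & Service"}
G = undirected([("vq", "p1", 4.0), ("vq", "p2", 1.0), ("p1", "p3", 2.0), ("p2", "p3", 2.0)])
CAT = {"p1": "Asian restaurant", "p2": "Italian restaurant", "p3": "Gift shop"}
result = bssr(G, CAT, PARENT, ["Asian restaurant", "Gift shop"], "vq")
```

On this toy network the sketch returns ⟨p2, p3⟩ with scores (3.0, 0.5) and ⟨p1, p3⟩ with scores (6.0, 0.0): the Italian restaurant gives a shorter but semantically weaker route, and neither route dominates the other. Routes are stored with the running product of $h_i$, so the semantic score of a partial route is its minimum reachable score, which is what makes the threshold checks safe.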

Algorithm 2: Modified Dijkstra algorithm to find the next PoI vertices matching $c_d$ from $p_d$
1  procedure mDijkstra($R_d$, $c_d$, $p_d$, $Q_b$, $\mathcal{S}$)
2    $dist[u] = \infty$ for all $u \in V \cup P$; $dist[p_d] = 0$;
3    priority_queue $Q_d \leftarrow \{p_d\}$;
4    while $Q_d$ is not empty do
5      $u \leftarrow Q_d$.dequeue();
6      if $u$ is already visited then continue; else mark $u$ as visited;
7      $R_t \leftarrow R_d \oplus u$;
8      if $l(R_t) \geq \bar{l}(R_d)$ then break;
9      if $u \in P_{t_{c_d}}$ and $u$ is not reached through a PoI vertex whose category similarity is higher than or equal to that of $u$ then
10       if $l(R_t) < \bar{l}(R_t)$ then
11         if $R_t$ is a sequenced route then $\mathcal{S}$.update($R_t$);
12         else $Q_b$.enqueue($R_t$);
13     if $u \notin P_{c_d}$ then
14       for each $u'$ with $e(u, u') \in E$ do
15         if $dist[u] + w(u, u') < dist[u']$ then
16           $dist[u'] = dist[u] + w(u, u')$;
17           $Q_d$.enqueue($u'$);
18 end procedure

We prune unnecessary routes efficiently using the branch-and-bound algorithm. However, based on Equation (3), we cannot calculate the threshold of $R$ if $\mathcal{S}$ contains no sequenced route whose semantic score is at most $s(R)$. Therefore, we initially search for a sequenced route whose semantic score is 0, which yields a finite threshold for every route. However, the length score of such a sequenced route can be large. To tighten the threshold, we also search for sequenced routes whose semantic scores are greater than 0 but whose length scores are smaller than that of the sequenced route with a semantic score of 0. We initially find several sequenced routes in this way to tighten the upper bound. We propose NNinit, which searches for several such sequenced routes efficiently.

NNinit repeatedly performs a nearest neighbor search to find PoI vertices that perfectly match the given categories. With this process, we can find a sequenced route whose semantic score is 0. Moreover, NNinit can find PoI vertices that semantically match the given category during the nearest neighbor search: while searching for the last PoI vertex to visit, it may find PoI vertices that semantically match the last category in S_q. Therefore, we can obtain sequenced routes whose semantic scores are greater than 0 and whose length scores are small. As a result, NNinit can find several sequenced routes without incurring additional cost, and one of them has a semantic score of 0.

We present the pseudocode for NNinit in Algorithm 3. Here, priority queue Q is initialized to start point v_q (line 3). NNinit repeats the Dijkstra algorithm |S_q| times to find sequenced routes (line 4). The Dijkstra algorithm searches for the closest PoI vertex that perfectly matches c_{S_q[i]} from the initial vertex (the first initial vertex is v_q) (lines 5–19). Here, the vertex closest to the initial vertex is dequeued from Q (line 7). If the algorithm finds a PoI vertex that perfectly matches c_{S_q[i]}, this vertex is added to R and Q is reinitialized to that PoI vertex (lines 12–15). When it finds a candidate last PoI vertex that semantically matches c_{S_q[|S_q|]}, it inserts the sequenced route into S (lines 9–11). Finally, we obtain a set S of sequenced routes, one of which has a semantic score of 0.

Example 5.6. We show an example of NNinit using Example 1.1, which searches for an Asian restaurant, an A&E place, and a gift shop in this order from start point v_q. NNinit executes the Dijkstra algorithm three times because the size of the category sequence is three. First, NNinit searches for PoI vertices that perfectly match Asian restaurant from v_q and finds p, the closest PoI to v_q that perfectly matches Asian restaurant. Next, it searches for the closest PoI vertex to p that perfectly matches A&E and finds p. In the next search, NNinit inserts sequenced routes into S when it finds PoI vertices that semantically match gift shop. NNinit finds p, whose category is Shop&Service (i.e., it semantically matches), and thus inserts ⟨p, p, p⟩ into S. After finding p, it finds p, which perfectly matches gift shop, and inserts ⟨p, p, p⟩ into S. Finally, NNinit returns S = {⟨p, p, p⟩, ⟨p, p, p⟩}. The length score of ⟨p, p, p⟩ is 12, which is less than the length score of ⟨p, p, p⟩, 15.

We use the upper bound to prune unnecessary routes. The upper bound is computed from the obtained sequenced routes. To tighten the upper bound, it is important to efficiently find sequenced routes that have small length and semantic scores.

Algorithm 3: Initial search for finding sequenced routes with a small cost
1  procedure NNinit(v_q, S_q)
2    S ← ∅, R ← ∅;
3    priority_queue Q ← {v_q};
4    for i: 1 to |S_q| do   /* execute the Dijkstra algorithm |S_q| times */
5      dist[u] = ∞ for all u ∈ V ∪ P; dist[Q.top] = 0;
6      while Q is not empty do
7        u ← Q.dequeue;
8        if u is already visited then continue;
9        if i = |S_q| and u ∈ P^t_{c_{S_q[i]}} then
10         R′ ← R ⊕ u;
11         S.update(R′);
12       if u ∈ P_{c_{S_q[i]}} then
13         R ← R ⊕ u;
14         Q ← {u};
15         break;
16       for each u′ such that e(u, u′) ∈ E do
17         if dist[u] + w(u, u′) < dist[u′] then
18           dist[u′] = dist[u] + w(u, u′);
19           Q.enqueue(u′);
20   return S;
21 end procedure
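A minimal Python sketch of NNinit (Algorithm 3) follows. The names are hypothetical: `poi_cat` maps PoI vertices to categories, and `tree_of` maps each category to the root of its category tree, which we use to decide semantic matches (a PoI semantically matches a category when both lie in the same category tree, following the definition of P^t). Instead of maintaining the skyline, the sketch simply returns every sequenced route it encounters together with its length score; filtering them with S.update is left to the caller.

```python
import heapq
import math


def nn_init(graph, poi_cat, tree_of, seq, v_q):
    """Sketch of NNinit (Algorithm 3).

    graph:   dict vertex -> list of (neighbor, edge weight)
    poi_cat: dict PoI vertex -> category (road vertices are absent)
    tree_of: dict category -> root of its category tree
    seq:     category sequence S_q
    Returns (route, length score) for every sequenced route found.
    """
    found = []
    route, route_len, source = [], 0.0, v_q
    for i, required in enumerate(seq):
        is_last = i == len(seq) - 1
        dist = {source: 0.0}
        visited = set()
        queue = [(0.0, source)]
        next_poi = None
        while queue:
            d, u = heapq.heappop(queue)
            if u in visited:
                continue
            visited.add(u)
            cat = poi_cat.get(u) if u != source else None
            # Lines 9-11: on the last leg, any semantic match closes a route.
            if is_last and cat is not None and tree_of[cat] == tree_of[required]:
                found.append((tuple(route) + (u,), route_len + d))
            # Lines 12-15: the first perfect match ends this leg.
            if cat == required:
                next_poi = (u, d)
                break
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, math.inf):
                    dist[v] = nd
                    heapq.heappush(queue, (nd, v))
        if next_poi is None:
            break  # no PoI perfectly matches this category
        route.append(next_poi[0])
        route_len += next_poi[1]
        source = next_poi[0]
    return found
```

On the last leg, a semantically matching PoI met before the nearest perfect match yields a shorter route with a nonzero semantic score, which is exactly the extra route NNinit obtains at no additional cost.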

BSSR extends the route at the top of the priority queue to search for a sequenced route, as shown in Algorithm 1. Note that priority queues in existing algorithms conventionally consider only distances (i.e., a distance-based priority queue). With a distance-based priority queue, BSSR preferentially extends routes with small length scores. Although a route must be extended to size |S_q| to become a sequenced route, a route with a small length score is likely to have a small size. Therefore, it is difficult to find sequenced routes efficiently with a distance-based priority queue.

To find sequenced routes efficiently, we preferentially extend routes of large size. Since many routes in the priority queue may have the same size, we need an additional priority criterion, which is expected to affect performance. If multiple routes in the priority queue have the same size, we preferentially extend the route with the smallest semantic score; searching for sequenced routes in ascending order of semantic score reduces the search space. Moreover, if routes have the same size and the same semantic score, we preferentially extend the route with the smallest length score. As a result, we can efficiently obtain sequenced routes with small length and semantic scores.

As described in Section 5.2.1, we use the length scores of routes as the lower bound, i.e., we prune a route if its length score is not less than the threshold. Note that the length score of a route increases as the route size increases, so it is difficult to prune routes before their size increases. Our approach to tightening the lower bound is to estimate the future increase of the length score. However, if we estimate the future length score carelessly, we may sacrifice the exactness of the result.

The basic idea of this estimation is to calculate the possible minimum distance, i.e., the smallest distance between any pair of PoI vertices drawn from two sets of PoI vertices. We use the following two minimum distances: the semantic-match minimum distance l_s and the perfect-match minimum distance l_p.

Definition 5.7 (minimum distance). The semantic-match minimum distance l_s and the perfect-match minimum distance l_p are given by the following equations:

$$ l_s(R) = \sum_{i=|R|}^{|S_q|-1} l_s[i], \quad \text{where } l_s[i] = \min_{p_i \in P^t_i,\; p_{i+1} \in P^t_{i+1}} D(p_i, p_{i+1}) \tag{4} $$

$$ l_p(R) = \sum_{i=|R|}^{|S_q|-1} l_p[i], \quad \text{where } l_p[i] = \min_{p_i \in P^t_i,\; p_{i+1} \in P^c_{i+1}} D(p_i, p_{i+1}) \tag{5} $$

In Equations (4) and (5), P^t_i and P^c_i denote the set of PoI vertices associated with the category tree of c_{S_q[i]} and the set of PoI vertices whose category is c_{S_q[i]}, respectively.

We compute the semantic-match minimum distance from the distances to the PoI vertices that semantically match the next category. We can safely add the semantic-match minimum distance to the current length score without restriction. However, the semantic-match minimum distance is often much less than the threshold, so it may do little to improve pruning. We therefore also use the perfect-match minimum distance, which is computed from the distances to the PoI vertices that perfectly match the next category. Since P^c_{i+1} ⊆ P^t_{i+1}, the perfect-match minimum distance is never smaller than (and is often much greater than) the semantic-match minimum distance; therefore, it tightens the lower bound more and improves pruning performance. However, we can use the perfect-match minimum distance only in a special case, i.e., where a route must pass only through PoIs that perfectly match the given categories in order not to be dominated. The perfect-match minimum distance works well if the number of sequenced routes in S is large, because this constraint is usually satisfied as the number of sequenced routes in S increases.

Lemma 5.8. Let R′ and R′′ be sequenced routes in S and R be a route such that (i) l(R) ≥ l(R′) and s(R) < s(R′) and (ii) l(R) < l(R′′) and s(R) ≥ s(R′′). Let δ be the minimum increment of a semantic score. We can prune R if (a) l(R) ≥ l(R′) and s(R) + δ ≥ s(R′) and (b) l(R) + l_p(R) ≥ l(R′′) and s(R) ≥ s(R′′).

Proof: First, consider case (a). If l(R) ≥ l(R′) and s(R) + δ ≥ s(R′), R becomes dominated by or equivalent to R′ as soon as its semantic score increases. Therefore, R must pass only through PoI vertices that perfectly match the given categories in order not to be dominated. If R passes only through such PoI vertices, its length score increases by at least l_p(R). For case (b), if l(R) + l_p(R) ≥ l(R′′) and s(R) ≥ s(R′′), R is dominated by or equivalent to R′′ once its length score increases by l_p(R). As a result, if there are two routes R′ and R′′ satisfying (a) and (b), R is dominated by or equivalent to at least one of R′ and R′′. □

Footnote: The least increase δ of the semantic score is computed from the category tree. Specifically, we can compute it from the category that is most similar (but not equal) to the next category.

To estimate the lower bound, we compute the two types of possible minimum distances, l_s and l_p. A naive approach computes all minimum distances from the PoI vertices that semantically match c_{S_q[i]} to those matching c_{S_q[i+1]} for 1 ≤ i ≤ |S_q| − 1 [...] multi-source multi-destination Dijkstra algorithm. In this algorithm, all start points are inserted into the same priority queue. Then, the algorithm dequeues vertices in the same manner as the conventional Dijkstra algorithm.
Here, the process terminates when the top of the priority queue is one of the destinations. This approach only needs |S_q| − 1 executions of the Dijkstra algorithm.

Lemma 5.9. The multi-source multi-destination Dijkstra algorithm guarantees the minimum distance from the start points to the destinations.

Proof: We first insert multiple start points into the priority queue, and their distances are initialized to 0. When we reach a vertex, it is inserted into the queue, and its distance is updated with the distance from the closest start point. The vertex with the smallest distance from the start points is dequeued from the priority queue. If the top vertex in the priority queue is one of the destinations, no destination has a smaller distance than the top one. Therefore, we can guarantee the minimum distance from the start points to the destinations. □
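The multi-source multi-destination Dijkstra search and the sums in Equations (4) and (5) can be sketched in Python as below. This is an illustration under our own assumptions: the stage sets are passed in explicitly (the filtering by distance from v_q in Algorithm 4 is omitted), lists are 0-based so `l[j]` corresponds to l_s[j+1] or l_p[j+1], and an empty route is treated like a route of size 1 (ignoring the leg from v_q), which still gives a valid but looser bound.

```python
import heapq
import math


def min_pair_distance(graph, sources, destinations):
    """Multi-source multi-destination Dijkstra: min D(s, t) over s in sources, t in destinations."""
    dist = {s: 0.0 for s in sources}
    queue = [(0.0, s) for s in sources]
    heapq.heapify(queue)
    visited = set()
    while queue:
        d, u = heapq.heappop(queue)
        if u in visited:
            continue
        visited.add(u)
        if u in destinations:
            return d  # Lemma 5.9: no destination can be closer
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(queue, (nd, v))
    return math.inf


def possible_min_distances(graph, source_sets, dest_sets):
    """l[j] = min distance from source_sets[j] to dest_sets[j + 1] (0-based).

    Pass the semantic-match sets as dest_sets for l_s (Eq. 4),
    or the perfect-match sets for l_p (Eq. 5).
    """
    return [min_pair_distance(graph, source_sets[j], dest_sets[j + 1])
            for j in range(len(source_sets) - 1)]


def remaining_lower_bound(l, route_size):
    """Sum of l over the legs not yet travelled (i = |R| .. |S_q| - 1, 1-based)."""
    start = max(route_size - 1, 0)  # empty route treated like |R| = 1
    return sum(l[start:])
```

One run per consecutive pair of categories gives the |S_q| − 1 executions mentioned above; calling `possible_min_distances` with perfect-match destination sets instead yields l_p, which is never smaller than l_s.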

Algorithm 4 shows the pseudocode for computing the semantic-match minimum distance. The estimation of the lower bound is executed after line 4 in Algorithm 1. Here, we initialize P_i and P_{i+1} (lines 3–4), where l̄(∅) denotes the threshold for a route whose semantic score is 0. The only difference between computing the semantic-match and perfect-match minimum distances is whether the PoI vertices in P_{i+1} semantically or perfectly match the given category.

Example 5.10. We show an example of computing the semantic-match minimum distance using Example 1.1. P_1, P_2, and P_3 include {p, p, p, p, p}, {p, p, p}, and {p, p, p, p, p}, respectively. First, the PoI vertices in P_1 are inserted into priority queue Q, and the set of destinations is P_2. By running the Dijkstra algorithm, we compute the possible minimum distance l_s[1] (from p to p). Next, we search from the PoI vertices that semantically match A&E to those that semantically match gift shop, and compute l_s[2] (from p to p). Finally, we obtain the semantic-match minimum distance l_s = {l_s[1], l_s[2]}. We can compute the perfect-match minimum distance in the same way and obtain l_p = {l_p[1], l_p[2]}, which is greater than l_s.

Algorithm 4: Computing the possible minimum distance
1  procedure EstimationLowerbound(v_q, S_q)
2    for i: 1 to |S_q| − 1 do
3      P_i ← {p | p ∈ P^t_{c_{S_q[i]}} and D(v_q, p) < l̄(∅)};
4      P_{i+1} ← {p | p ∈ P^t_{c_{S_q[i+1]}} and D(v_q, p) < l̄(∅)};
5      dist[u] = ∞ for all u ∈ V ∪ P; dist[p] = 0 for all p ∈ P_i;
6      priority_queue Q ← {p | p ∈ P_i};
7      while Q is not empty do
8        u ← Q.dequeue;
9        if u is already visited then continue;
10       if u ∈ P_{i+1} then
11         l_s[i] = dist[u];
12         break;
13       for each u′ such that e(u, u′) ∈ E do
14         if dist[u] + w(u, u′) < dist[u′] then
15           dist[u′] = dist[u] + w(u, u′);
16           Q.enqueue(u′);
17   return l_s;
18 end procedure

Although BSSR efficiently prunes unnecessary routes, it may repeatedly execute the modified Dijkstra algorithm from the same vertex because, in Algorithm 1 (line 8), p_d may be the same as in former executions of the modified Dijkstra algorithm. Thus, we reuse the result starting at the same PoI vertex by materializing the result of the modified Dijkstra algorithm (i.e., keeping the PoI vertices matching c_d and their distances from p_d), which we refer to as on-the-fly caching. After the SkySRs are found, on-the-fly caching frees the results of the modified Dijkstra algorithm (hence the name on-the-fly), because the search space rarely overlaps across different inputs (i.e., different S_q and v_q).

In this section, we theoretically analyze the cost and correctness of the proposed BSSR.

Theorem 1. (Time complexity)

Let γ be the pruning ratio and α be the ratio of the portion of the graph searched to find the SkySRs to the whole graph. The time complexity of BSSR is O(γ(α|P|)^{|S_q|} · α(|E| + (|V| + |P|) log(α(|V| + |P|)))).

Proof: The time complexity of the Dijkstra algorithm is O(|E| + |V| log |V|) for a graph with |V| vertices. In our setting, we have |V| + |P| vertices because there are two types of vertices. In addition, we do not need to search the whole graph, because the threshold limits the searched portion. Therefore, the time complexity of the modified Dijkstra algorithm is O(α(|E| + (|V| + |P|) log(α(|V| + |P|)))). The time complexity of BSSR depends on the number of executions of the modified Dijkstra algorithm, which is at most the number of potential routes, |P|^{|S_q|}. Recall that we can prune routes using the branch-and-bound algorithm. Finally, the time complexity of BSSR is O(γ(α|P|)^{|S_q|} · α(|E| + (|V| + |P|) log(α(|V| + |P|)))). □

In our approach, γ and α depend on the upper and lower bounds. These are affected by the graph structure, the category trees, and the ratio of PoI vertices; hence, the time complexity of BSSR depends on these factors.

Theorem 2. (Space complexity)

Let γ be the pruning ratio and α be the ratio of the portion of the graph searched to find the SkySRs to the whole graph. The space complexity of BSSR is O(|E| + |V| + |P| + γ|S_q|(α|P|)^{|S_q|}).

Proof: We store the whole graph, of size O(|E| + |V| + |P|). We also store routes in the priority queue and in S, and the maximum number of routes is |P|^{|S_q|}. We can prune routes using the branch-and-bound algorithm. The size of each route is proportional to |S_q|. Therefore, the space complexity of BSSR is O(|E| + |V| + |P| + γ|S_q|(α|P|)^{|S_q|}). □

If the number of routes in the priority queue is small, the graph size is the main factor in memory usage. Otherwise, the number of routes in the priority queue is the main factor.

Theorem 3. (Correctness)

BSSR guarantees the exact result.

Proof: BSSR prunes routes based on the upper and lower bounds, and it safely prunes only routes that are dominated by or equivalent to the obtained sequenced routes. As a result, BSSR does not sacrifice the exactness of the search result. □
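Theorem 3 rests on BSSR discarding only routes that are dominated by or equivalent to sequenced routes already in S. The sketch below shows one way to maintain S under standard skyline dominance on (length score, semantic score), both minimized. The `threshold` helper encodes our reading of the pruning threshold (the smallest length score among routes in S whose semantic score is not greater than that of the route being checked); it is an assumption, since Equation (3) is not reproduced in this section.

```python
import math


def dominates(a, b):
    """a, b are (length score, semantic score); both scores are minimized."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def skyline_update(skyline, route, scores):
    """Insert route into skyline (dict route -> scores) unless it is dominated
    by or equivalent to a stored route; evict the routes it dominates."""
    for other in skyline.values():
        if other == scores or dominates(other, scores):
            return False
    for r in [r for r, sc in skyline.items() if dominates(scores, sc)]:
        del skyline[r]
    skyline[route] = scores
    return True


def threshold(skyline, semantic):
    """Assumed threshold: smallest length score among stored routes whose
    semantic score is not greater than `semantic` (inf if there is none)."""
    return min((l for l, s in skyline.values() if s <= semantic), default=math.inf)
```

A route whose length score reaches `threshold(S, s(R))` is dominated by or equivalent to a stored route, so discarding it cannot remove a SkySR; with no such route in S the threshold is infinite, which is why an initial search is needed.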

We demonstrate BSSR with the optimization techniques using Example 1.1. Table 4 shows the routes in priority queue Q_b and the sequenced routes in S. To compute category similarity and semantic score, we use Equations (6) and (7), respectively. First, we run NNinit, and S initially includes {⟨p, p, p⟩, ⟨p, p, p⟩}.

1st step: BSSR starts to find PoI vertices that semantically match Asian restaurant from v_q with a threshold of 15. It finds p, p, p, p, and p. Both p's and p's category similarities are 1, and their lengths are 6 and 8, respectively. Thus, p comes to the top of Q_b.
2nd step: BSSR searches for PoI vertices that semantically match Arts&Entertainment from p and finds p. Since ⟨p, p⟩ passes through p and l(⟨p, p⟩) is more than 15, neither of these routes is inserted into Q_b.
3rd step: As the top route is ⟨p, p⟩, BSSR searches for PoI vertices that semantically match gift shop from p. BSSR does not find any routes due to the threshold.
4th step: BSSR fetches ⟨p⟩ from Q_b and inserts two routes, ⟨p, p⟩ and ⟨p, p⟩, into Q_b.
5th step: BSSR fetches ⟨p, p⟩ and finds the sequenced route ⟨p, p, p⟩. Since ⟨p, p, p⟩ dominates ⟨p, p, p⟩, the latter is deleted from S.
6th step: The top route ⟨p, p⟩ is deleted from Q_b because its length score is not smaller than the threshold of 13.
7th step: BSSR fetches ⟨p⟩ and inserts ⟨p, p⟩ and ⟨p, p⟩.
8th step: BSSR fetches ⟨p, p⟩ and finds a sequenced route ⟨p, p, p⟩, which is inserted into S, and ⟨p, p, p⟩ is deleted from S.
9th step: ⟨p, p⟩ is deleted due to the threshold.
10th step: BSSR fetches ⟨p⟩ and finds a route ⟨p, p⟩.
11th step: BSSR finds a sequenced route ⟨p, p, p⟩, which dominates ⟨p, p, p⟩.
12th step: The distance from p to the PoI vertices that match A&E is larger than the threshold. Finally, BSSR returns the set of SkySRs S.

The SkySR query has a number of variations and extensions. We discuss some of them in the following.

Directed graphs:

The SkySR query can easily be applied to directed graphs. We only need to use the Dijkstra algorithm for directed graphs; no modification of the main idea is required.

PoI with multiple categories:

To treat PoIs with multiple categories, we can change the definitions of sequenced routes and category similarity. Specifically, we change condition (ii) in Definition 3.4 to state that at least one c_{p_i}[j] (1 ≤ j ≤ k_i) semantically matches c_{S[i]} for 1 ≤ i ≤ |S|, where c_{p_i}[j] is the j-th category of p_i and k_i is the number of categories associated with p_i. The category similarity is then either the highest or the average of these category similarities.

Table 4: Example of BSSR algorithm
Step | Q_b | S
Initial | (empty) | ⟨p, p, p⟩, ⟨p, p, p⟩
1 | ⟨p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
2 | ⟨p, p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
3 | ⟨p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
4 | ⟨p, p⟩, ⟨p, p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
5 | ⟨p, p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
6 | ⟨p⟩, ⟨p⟩, ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
7 | ⟨p, p⟩, ⟨p, p⟩, ⟨p⟩, ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
8 | ⟨p, p⟩, ⟨p⟩, ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
9 | ⟨p⟩, ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
10 | ⟨p, p⟩, ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
11 | ⟨p⟩ | ⟨p, p, p⟩, ⟨p, p, p⟩
12 | (empty) | ⟨p, p, p⟩, ⟨p, p, p⟩

Complex category requirement:

We can specify more detailed category requirements, such as conjunction, disjunction, and negation. For example, we can specify that a PoI category is “American restaurant” or “Mexican restaurant” (disjunction), but not “Taco Place” (negation). If PoI vertices are associated with multiple categories, we can specify a conjunction such as “Cafe” and “Bakery”. Note that the time complexity of our algorithm does not change if we specify a detailed requirement, because detailed requirements are equivalent to increasing the number of categories.

Skyline trip planning query:

The proposed algorithm can be applied to the trip planning query without category order. To search for routes without category order, the algorithm searches for PoI vertices that semantically match any category in a given set of categories. When it finds such PoI vertices, it removes the categories already included in the route before searching for the next PoI vertices. Note that some definitions and scoring functions must be modified for routes without category order. With this procedure, we can find skyline routes efficiently.
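For the multi-category and complex-category-requirement extensions described above, the hypothetical helpers below test a PoI's category set against a requirement built from disjunction, negation, and conjunction, and aggregate the category similarities of a multi-category PoI by the highest or the average value. Matching is done on exact category names for brevity; a faithful version would test semantic matches in the category trees instead.

```python
def matches_requirement(categories, any_of=(), none_of=(), all_of=()):
    """Hypothetical test of a PoI's categories against a complex requirement.

    any_of:  disjunction, at least one of these categories (ignored if empty)
    none_of: negation, none of these categories
    all_of:  conjunction, every one of these categories
    """
    cats = set(categories)
    if any_of and not cats & set(any_of):
        return False
    if cats & set(none_of):
        return False
    return set(all_of) <= cats


def multi_category_similarity(similarities, mode="max"):
    """Combine the similarities of a multi-category PoI: highest or average."""
    if mode == "max":
        return max(similarities)
    if mode == "mean":
        return sum(similarities) / len(similarities)
    raise ValueError(f"unknown mode: {mode}")
```

For example, `matches_requirement({"Mexican Restaurant"}, any_of={"American Restaurant", "Mexican Restaurant"}, none_of={"Taco Place"})` accepts the PoI, while adding "Taco Place" to its categories rejects it.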

SkySR with destination:

Note that we can also specify a destination. The simple way to compute SkySRs with a destination is to add the distance from the last visited PoI vertex to the destination to the length score after finding each sequenced route. To improve efficiency, we traverse PoI vertices from both the destination and the start point.
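A minimal sketch of the simple destination-aware variant follows. It assumes that `routes` lists every candidate sequenced route with its length and semantic scores, not only the SkySRs computed without a destination, because adding the final leg can make a previously dominated route non-dominated. `dist_to_dest` is a hypothetical precomputed map from the last PoI of each route to the destination.

```python
def skyline_with_destination(routes, dist_to_dest):
    """routes: list of (route, length score, semantic score) for all candidate
    sequenced routes; dist_to_dest: last PoI -> distance to the destination.
    Returns the non-dominated routes after adding the final leg."""
    scored = [(r, l + dist_to_dest[r[-1]], s) for r, l, s in routes]
    result = []
    for r, l, s in scored:
        dominated = any(l2 <= l and s2 <= s and (l2, s2) != (l, s)
                        for _, l2, s2 in scored)
        if not dominated:
            result.append((r, l, s))
    return result
```

The quadratic dominance check keeps the sketch short; in BSSR the extra leg would instead be added when each sequenced route is found, so that pruning uses the updated length scores.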

We perform experiments to evaluate the effectiveness of the proposed algorithm. All algorithms are implemented in C++ and run on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz with 32 GB of RAM.

Table 5: Summary of datasets

Dataset | Area | Vertices |V| | PoI vertices |P| | Edges |E|
Tokyo | Tokyo | 401,893 | 174,421 | 499,397
NYC | New York City | 1,150,744 | 451,051 | 1,722,350
Cal | California | 21,048 | 87,365 | 108,863

Algorithm.

We compare the proposed BSSR with algorithms that iteratively find OSRs using the Dijkstra-based solution and the PNE approach (denoted Dij and PNE, respectively), as described in Section 3. We evaluate performance with respect to (i) response time and (ii) maximum resident set size (RSS), which represents memory usage.

Dataset.

We conduct experiments using maps of Tokyo, New York City, and California. Table 5 summarizes each dataset. For the Tokyo and NYC datasets, the road network is extracted from OpenStreetMap and the PoI information is extracted from Foursquare. Each PoI is embedded on the closest edge in the same way as [10] and is associated with the Foursquare category trees. Note that Foursquare has 10 category trees. For the Cal dataset, the road network and PoI information are available online. The number of categories in the Cal dataset is 63. For each dataset, we use distances based on longitude and latitude as edge weights and treat the graphs as undirected. The graphs are implemented using adjacency lists.

For each dataset, we generate 100 queries, each with a category sequence of size |S_q|. The start points are selected randomly from the vertices in the maps. The categories of each sequence are selected randomly from the leaf nodes of the category trees, with the constraint that they belong to different category trees. Since the number of PoI vertices associated with each category is highly skewed, we select only categories that have a large number of PoI vertices.

Category similarity is calculated based on the Wu and Palmer similarity measure [19], and the semantic score is calculated from the product of the category similarities of the sequence members. Specifically, we calculate the category similarity and semantic score using the following equations:

$$ sim(c, c') = \max_{c_i \in a(c')} \frac{2 \cdot d(c_m)}{d(c) + d(c')} \tag{6} $$

$$ s(R) = 1 - \prod_{i=1}^{\min(|R|, |S_q|)} sim(c_{p_R[i]}, c_{S_q[i]}) \tag{7} $$

where a(c), d(c), and c_m denote the set of ancestor categories of c (including c), the depth of c, and the deepest common ancestor category of c and c_i, respectively.

First, we present an overview of the performance of all algorithms.
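Equations (6) and (7) from the dataset description can be checked on a toy category tree with the Python sketch below. The tree is hypothetical (loosely modeled on the Foursquare categories in the Tokyo example), and we assume that a root category has depth 1; the loop over a(c′) mirrors Equation (6) as written, with c the PoI's category and c′ the required category.

```python
from math import prod


def ancestors(c, parent):
    """a(c): c followed by its ancestors up to the root of its tree."""
    chain = [c]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def depth(c, parent):
    return len(ancestors(c, parent))  # assumption: a root has depth 1


def deepest_common_ancestor(c1, c2, parent):
    upper = set(ancestors(c2, parent))
    for c in ancestors(c1, parent):  # walks upward, so the first hit is deepest
        if c in upper:
            return c
    return None  # different category trees


def sim(c, c_prime, parent):
    """Equation (6)."""
    best = 0.0
    for c_i in ancestors(c_prime, parent):
        c_m = deepest_common_ancestor(c, c_i, parent)
        if c_m is not None:
            value = 2 * depth(c_m, parent) / (depth(c, parent) + depth(c_prime, parent))
            best = max(best, value)
    return best


def semantic_score(route_cats, seq, parent):
    """Equation (7): route_cats[i] is the category of the i-th PoI of R."""
    k = min(len(route_cats), len(seq))
    return 1 - prod(sim(route_cats[i], seq[i], parent) for i in range(k))
```

With this tree, visiting a "Bar" instead of a "Beer Garden" gives a similarity of 2·2/(2 + 3) = 0.8, so the route Bar → Sushi Restaurant → Sake Bar has a semantic score of 1 − 0.8 = 0.2, while the perfectly matching route scores 0.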
Figure 3 shows the response time for various category sequence sizes, and Table 6 shows the RSS for a category sequence of size four. Here, “BSSR w/o Opt” denotes BSSR without the optimization techniques. In Figure 3, some bars are missing for sequence size 5 because those executions did not finish within a month.

BSSR achieves the shortest response time on all datasets and reduces the search space by exploiting the branch-and-bound algorithm and the proposed optimization techniques. Comparing BSSR with BSSR w/o Opt confirms that the optimization techniques increase efficiency. When the category sequence is small, PNE finds SkySRs efficiently because it can search for sequenced routes efficiently for short category sequences. On the other hand, as the category sequence size increases, the response times of PNE and Dij increase significantly. If the category sequence is large, BSSR outperforms PNE and Dij even without the optimization techniques. Comparing Dij with PNE shows that their relative performance depends on the dataset and the category sequence size. Although the PNE approach was proposed as a more sophisticated algorithm than the Dijkstra-based solution [16], PNE requires more time than Dij for the NYC and Cal datasets, which implies that it is not robust across datasets.

In terms of RSS, BSSR and PNE achieve nearly the same performance. These two algorithms do not store many routes in the priority queue; therefore, their RSS depends highly on the graph size. On the other hand, because Dij stores many routes in the priority queue, its RSS is significantly larger than those of the other algorithms. Although we do not show the routes returned by each algorithm due to space limitations, all algorithms output the same routes. As a result, BSSR achieves the fastest response time with small memory usage without sacrificing the exactness of the result.

Figure 3: Results obtained for the datasets with various |S_q| (response time [sec] versus size of sequence |S_q| for BSSR, BSSR w/o Opt, PNE, and Dij; panels (a) Tokyo, (b) NYC, (c) Cal).

Table 6: RSS comparison
Dataset | BSSR | BSSR w/o Opt | PNE | Dij
Tokyo | 239.6 MB | 497.5 MB | 239.8 MB | 4.8 GB
NYC | 658.0 MB | 659.4 MB | 658.7 MB | 9.7 GB
Cal | 36.7 MB | 53.7 MB | 36.6 MB | 70.3 MB

Table 7: Effect of initial search for various |S_q|
Dataset | Approach | Metric | Size 2 | Size 3 | Size 4 | Size 5
Tokyo | Proposed | Weight sum | 0.009 | 0.013 | 0.017 | 0.021
Tokyo | Proposed | Response time [msec] | 3.5 | 5.1 | 6.9 | 8.6
Tokyo | Proposed | Number of routes | 1.49 | 1.33 | 1.36 | 1.49
Tokyo | Proposed | Ratio | 0.74 | 0.79 | 0.82 | 0.86
Tokyo | Existing | Weight sum | 0.32 (regardless of |S_q|)
NYC | Proposed | Weight sum | 0.044 | 0.066 | 0.073 | 0.078
NYC | Proposed | Response time [msec] | 10.7 | 16.5 | 19.5 | 24.1
NYC | Proposed | Number of routes | 1.76 | 1.79 | 1.81 | 1.82
NYC | Proposed | Ratio | 0.67 | 0.81 | 0.85 | 0.83
NYC | Existing | Weight sum | 1.31 (regardless of |S_q|)
Cal | Proposed | Weight sum | 0.79 | 1.28 | 1.57 | 1.85
Cal | Proposed | Response time [msec] | 1.4 | 2.3 | 2.9 | 3.9
Cal | Proposed | Number of routes | 2.27 | 2.37 | 2.28 | 2.25
Cal | Proposed | Ratio | 0.70 | 0.79 | 0.85 | 0.86
Cal | Existing | Weight sum | 12.14 (regardless of |S_q|)

Footnote (Cal dataset source): ∼lifeifei/SpatialDataset.htm
Footnote (Cal categories): Since the PoIs in the Cal dataset have no category tree information, we generate a category tree of height three in which each non-leaf node has three child nodes.

The optimization techniques improve the efficiency of BSSR. Here, we evaluate each optimization technique.

Initial Search:

To evaluate the effect of the initial search, we show the search spaces of the first modified Dijkstra algorithm with and without the initial search. Moreover, we evaluate NNinit in terms of response time. Table 7 shows the weight sum, which represents the search space, the response time of NNinit, and the number of sequenced routes found by NNinit for various category sequence sizes. In addition, Table 7 shows the ratio of the length score of the sequenced route with the largest semantic score among those found in the initial search to the length score of the sequenced route whose semantic score is 0. The weight sum with the initial search is significantly smaller than that without it. The initial search lets us avoid traversing the whole graph and thus significantly reduces the search space of BSSR. Moreover, since the response time of NNinit is significantly less than that of BSSR (Figure 3), we confirm that NNinit reduces the search space efficiently. Note that the number of sequenced routes found by the initial search is not large. On the other hand, the length score of the sequenced route with the largest semantic score is smaller than that of the sequenced route whose semantic score is 0 (the ratios in Table 7 range from 0.67 to 0.86). As a result, NNinit reduces the search space significantly without increasing the total response time.

Tightening Upper Bound:

The proposed priority queue aims to tighten the upper bound efficiently to reduce the search space. Here, we show the total number of vertices visited by BSSR, which is closely related to the response time. Table 8 shows the total number of vertices visited with the proposed priority queue and with the distance-based priority queue for various category sequence sizes. The proposed priority queue visits fewer vertices than the distance-based priority queue. In particular, the performance gap widens as the category sequence size increases, because the distance-based priority queue then cannot find sequenced routes efficiently, so the upper bound is rarely updated. In contrast, the proposed priority queue updates the upper bound efficiently because the route with the largest size is dequeued preferentially. Thus, the proposed priority queue is more suitable than the distance-based approach for finding SkySRs.
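The two queue orders compared in Table 8 can be expressed as sort keys for Python's heapq, as sketched below; the entry layout (route, length score, semantic score) is our assumption. The proposed key prefers larger routes, then smaller semantic scores, then smaller length scores, whereas the distance-based key uses the length score alone.

```python
import heapq


def proposed_key(route, length, semantic):
    """Larger routes first, then smaller semantic score, then smaller length score."""
    return (-len(route), semantic, length)


def distance_key(route, length, semantic):
    """Conventional distance-based order: smaller length score first."""
    return (length,)


def pop_order(entries, key):
    """Routes in the order a heap keyed by `key` would dequeue them.
    entries: list of (route, length score, semantic score)."""
    heap = [(key(r, l, s), i, r) for i, (r, l, s) in enumerate(entries)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

The insertion index acts as a tie-breaker so the heap never has to compare routes directly. With the proposed key, two-PoI routes are extended before a one-PoI route even when the latter is much shorter, which is what lets BSSR reach complete sequenced routes, and hence update the upper bound, early.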

Tightening Lower Bound:

To tighten the lower bound, we propose two types of possible minimum distances, i.e., the semantic-match and perfect-match minimum distances. If the possible minimum distance is large, we can prune routes even if they include only a small number of PoI vertices. Figure 4 shows the ratios of the possible minimum distances to the weight sums of the initial search when the category sequence size is five. The semantic-match and perfect-match minimum distances in the Tokyo dataset effectively reduce the search space by tightening the lower bound. However, unlike in the Tokyo dataset, the possible minimum distances in the NYC and Cal datasets are small. Since the PoI vertices in these two datasets are relatively concentrated in a small area, the possible minimum distances become small. The effect of the possible minimum distances therefore depends highly on the skew of the PoI vertex locations.

Figure 4: Effect of minimum possible distances (ratio of weight sum for the semantic-match and perfect-match minimum distances).

Figure 5: Effect of on-the-fly caching for various |S_q| (number of Dijkstra executions versus size of sequence |S_q|, with and without cache; panels (a) Tokyo, (b) NYC, (c) Cal).

Table 8: Effect of priority queue for various |S_q| (total number of visited vertices; rows: Proposed and Distance-based for Tokyo, NYC, and Cal; columns: |S_q| = 2, 3, 4, 5).

Figure 6: Number of SkySRs for various |S_q| (Tokyo, NYC, Cal).

On-the-fly Caching:

On-the-fly caching reuses the results of former executions of the modified Dijkstra algorithm; thus, the number of executions of the Dijkstra algorithm decreases. Figure 5 shows the numbers of executions of the modified Dijkstra algorithm by BSSR with all optimization techniques and with all except on-the-fly caching. On-the-fly caching reduces the number of executions of the Dijkstra algorithm. In particular, the performance gap widens as the category sequence size increases, because longer sequences offer more opportunities to reuse former results. Thus, we confirm that on-the-fly caching effectively reduces the number of executions of the Dijkstra algorithm.
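A hypothetical sketch of on-the-fly caching: results of the modified Dijkstra search are memoized per (p_d, c_d) within one query and cleared once the SkySRs are returned. Because the search in Algorithm 2 stops at a threshold, we assume here that each cached entry records the search radius it covered and is recomputed when a later call needs a larger radius; the paper does not spell out this detail, so treat it as one possible design rather than the authors' implementation.

```python
class OnTheFlyCache:
    """Per-query memo of modified-Dijkstra results keyed by (p_d, c_d).

    search(p_d, c_d, radius) is a hypothetical callable returning the
    (PoI, distance from p_d) pairs with distance < radius.
    """

    def __init__(self, search):
        self._search = search
        self._store = {}  # (p_d, c_d) -> (radius covered, results)
        self.searches = 0

    def get(self, p_d, c_d, radius):
        entry = self._store.get((p_d, c_d))
        if entry is None or entry[0] < radius:
            # Not cached, or cached with too small a radius: search again.
            self.searches += 1
            entry = (radius, self._search(p_d, c_d, radius))
            self._store[(p_d, c_d)] = entry
        return [(poi, d) for poi, d in entry[1] if d < radius]

    def clear(self):
        """Free all entries once the SkySRs of the current query are returned."""
        self._store.clear()
```

Since thresholds only shrink as S improves, most repeated calls for the same p_d ask for a radius no larger than the cached one and are served without a new search.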

Figure 6 shows the number of SkySRs obtained with each dataset for various |S_q|. As shown, the Cal dataset returns the largest number of SkySRs. The response time and RSS obtained with the Tokyo and NYC datasets are much greater than those of the Cal dataset, which implies that the number of SkySRs does not significantly affect response time and RSS. Moreover, if we use a complete real-world dataset, we may not require a ranking function because the number of SkySRs would be small.

Table 9: Example SkySRs in Tokyo
Distance | Sequenced route
7451 meters | Beer Garden → Sushi Restaurant → Sake Bar
1295 meters | Bar → Sushi Restaurant → Sake Bar

Figure 7: Visualization of routes in Tokyo: black circles (with 0 and 4) denote a start point and a destination, respectively. Blue and red circles denote sequences of PoIs for the first and second routes in Table 9, respectively, and their numbers indicate the order of PoIs to be visited.

We show an example of SkySRs in Tokyo. Suppose that we plan to go out for dinner and drinks. We want to visit a “Beer garden”, a “Sushi restaurant”, and a “Sake bar” from our current location and finally go to our hotel. Table 9 and Figure 7 show two representative SkySRs out of the four SkySRs found; each of the other two routes is similar to one of the representative routes. In the Foursquare category trees, “Bar” includes “Beer Garden” and “Sake bar”, and “Japanese restaurant” includes “Sushi restaurant”. Thus, we find routes using “Bar” and/or “Japanese restaurant”. The second route is much shorter than the first route, which perfectly matches the user requirement, and the only difference between them is whether they pass through a “Bar” or a “Beer garden”. The best route depends on the user and the situation (e.g., weather); thus, we confirm that SkySRs are useful for helping users make decisions.

Figure 8: Screenshot of the prototype system

Figure 9: Ratios of answers for each question

We developed a prototype SkySR query service using OpenStreetMap and the Santander Open Data platform from Santander, Spain. Figure 8 shows a screenshot of the prototype system, which outputs one of the SkySRs. We performed a test in July 2017. To gather users for this test, the Santander municipality arranged meetings with different groups of people to present the service: municipal staff (computing, convention, and tourism municipal services), students from vocational training departments who are developing webpages and apps, and citizens. We also provided a leaflet that explains the concept of the SkySR query and how to use the service. In this test, users freely used the service and answered a questionnaire (25 respondents). The questionnaire included the following three questions.

Q1 What do you think about this service?
Answer. 1. I love it, 2. I like it, 3. I do not like it.

Q2 Would you recommend it to anyone?
Answer. 1. Yes, 2. Maybe, 3. No.

Q3 Do you think that it is a good idea for the city: citizens, tourists, commercial sectors?
Answer. 1. Yes, 2. Maybe, 3. No.

We summarize the ratios of answers for each question in Figure 9. As shown, more than 80% of the users liked the service. In addition, the answers to Q3 indicate that the service is valuable for the city. From the user test, we confirm that the SkySR query is useful for users and cities.

Prototype service: https://ss.festival.ckp.jp/OuRouteSuggestion/dispSearchRoute/index (the default language is Spanish). Santander Open Data platform: http://datos.santander.es

In this paper, we first introduced a semantic hierarchy for trip planning. We then proposed the skyline sequenced route (SkySR) query, which finds all preferred routes from a start point according to a user’s PoI requirements. In addition, we proposed an efficient algorithm for the SkySR query, BSSR, which simultaneously searches for all SkySRs in a single traversal of the given graph. To optimize the performance of BSSR, we proposed four optimization techniques. We evaluated the proposed approach on real-world datasets and demonstrated that it comprehensively outperforms naive approaches in terms of response time without increasing memory usage or sacrificing the exactness of the result. Moreover, we developed a SkySR query service using open data and conducted a user test, which confirmed that SkySR queries are useful for both users and cities.

In future work, we would like to extend the proposed approach in several directions. First, because this paper assumes a forest structure for the category classification, a more complex classification may provide finer granularity. Second, because we have not used any preprocessing techniques such as indexing, we plan to propose a suitable preprocessing method for the SkySR query. Finally, although the SkySR query proposed in this paper considers two scores (length and category similarity), it could be extended to consider many attributes of a PoI (e.g., text, keywords, and ratings) and the cost/quality of a graph (e.g., route popularity, tolls, and the number of traffic lights).
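The last future-work direction, considering more attributes, amounts to computing a skyline over cost vectors of arbitrary dimension. The following Python sketch shows a simple block-nested-loop style skyline, assuming that every attribute is a cost to be minimized (a rating, which should be maximized, is negated). The attribute layout (length in km, semantic score, number of tolls, negated rating) and all values are hypothetical. This is not the BSSR algorithm, which prunes routes during the graph traversal instead of filtering a finished list of candidates.

```python
from typing import Dict, List, Sequence, Tuple


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q if it is no worse in every attribute and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Block-nested-loop style skyline with an unbounded window of non-dominated points."""
    window: List[str] = []
    for name, p in points.items():
        if any(dominates(points[w], p) for w in window):
            continue  # p is dominated by a point already in the window
        # Evict window points that p dominates, then keep p.
        window = [w for w in window if not dominates(p, points[w])]
        window.append(name)
    return window


# Hypothetical routes: (length in km, semantic score, number of tolls, -rating).
# All attributes are minimized; the rating is negated so that higher ratings win.
routes = {
    "r1": (5.0, 0.0, 2.0, -4.5),
    "r2": (1.5, 0.5, 0.0, -4.0),
    "r3": (2.0, 0.5, 1.0, -3.5),  # dominated by r2
    "r4": (3.0, 0.2, 0.0, -4.8),
}

print(skyline(routes))  # ['r1', 'r2', 'r4']
```

Because the window is unbounded, the result does not depend on the input order: every point that survives is dominated by nothing in the input.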

ACKNOWLEDGEMENT

This research is partially supported by the Grant-in-Aid for Scientific Research (A) (JP16H01722) and the Grant-in-Aid for Young Scientists (B) (JP15K21069).
