Sequenced Route Query with Semantic Hierarchy
Yuya Sasaki, Yoshiharu Ishikawa, Yasuhiro Fujiwara, Makoto Onizuka
Yuya Sasaki†, Yoshiharu Ishikawa‡, Yasuhiro Fujiwara§†, Makoto Onizuka†
† Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
‡ Graduate School of Information Science, Nagoya University, Nagoya, Japan
§ NTT Software Innovation Center, Tokyo
ABSTRACT
The trip planning query searches for preferred routes starting from a given point through multiple Points-of-Interest (PoIs) that match user requirements. Although previous studies have investigated trip planning queries, they lack flexibility for finding routes because all of them output routes that strictly match user requirements. We study trip planning queries that output multiple routes in a flexible manner. We propose a new type of query called the skyline sequenced route (SkySR) query, which searches for all preferred sequenced routes to users by extending the shortest route search with the semantic similarity of PoIs in the route. Flexibility is achieved by the semantic hierarchy of the PoI category. We propose an efficient algorithm for the SkySR query, the bulk SkySR algorithm, which simultaneously searches for sequenced routes and prunes unnecessary routes effectively. Experimental evaluations show that the proposed approach significantly outperforms the existing approaches in terms of response time (up to four orders of magnitude). Moreover, we develop a prototype service that uses the SkySR query, and conduct a user test to evaluate its usefulness.
Recently, technological advances in various devices, such as smartphones and automobile navigation systems, have allowed users to obtain real-time location information easily. This has triggered the development of location-based services such as Foursquare, which exploit rich location information to improve service quality. The users of location-based services often want to find short routes that pass through multiple Points-of-Interest (PoIs); consequently, developing trip planning queries that can find the shortest routes that pass through user-specified categories has attracted considerable attention [4, 10]. If multiple PoI categories, e.g., restaurant and shopping mall, are in an ordered list (i.e., a category sequence), the trip planning query searches for a sequenced route that passes PoIs that match the user-specified categories in order.
Example 1.1.
Figure 1 shows a road network with the following PoIs: “Asian restaurant”, “Italian restaurant”, “Gift shop”, “Hobby shop”, and “Arts & Entertainment (A&E)”. Assume that a user wants to go to an Asian restaurant, an A&E place, and a gift shop in this order from start point $v_q$. The sequenced route query outputs a route $R$ from $v_q$ that satisfies the user requirements ⟨Asian restaurant, A&E, gift shop⟩.
Existing approaches find the shortest route based on the user query. However, such approaches may find an unexpectedly long route because the found PoIs may be distant from the start point.
© 2018 Copyright held by the owner/author(s). Published in Proceedings of the 21st International Conference on Extending Database Technology (EDBT), March 26-29, 2018, ISBN 978-3-89318-078-3 on OpenProceedings.org. Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.
Figure 1: An example of a road network with PoIs. (Legend: user location $v_q$; category trees Food, Shop & Service, and Arts & Entertainment; PoI categories Asian Restaurant, Italian Restaurant, Gift Shop, and Hobby Shop; routes R1, R2, and R3.)
[Figure 2 node labels. Food: Asian, Italian, Japanese, Sushi, Bakery. Shop & Service: Gift shop, Hobby shop, Clothing store, Men's store.]
Figure 2: Examples of category trees in Foursquare
A major problem with the existing approaches is that they only output routes that perfectly match the given categories [5, 14, 16]. To overcome this problem, we introduce flexible similarity matching based on PoI category classification to find shorter routes in a flexible manner. In the real world, category classification often forms a semantic hierarchy, which we refer to as a category tree. For example, in Foursquare (https://developer.foursquare.com/categorytree), the “Food” category tree includes “Asian restaurant,” “Italian restaurant,” and “Bakery” as subcategories, and the “Shop & Service” category includes “Gift shop,” “Hobby shop,” and “Clothing store” as subcategories (Figure 2). We employ this semantic hierarchy to evaluate routes in terms of two aspects, i.e., route length and the semantic similarity between the categories of the PoIs in the route and those specified in the user query. As a result, we can find effective sequenced routes that semantically match the user requirement based on the semantic hierarchy. For example, in Figure 1, route R […]. We propose the skyline sequenced route (SkySR) query, which applies the skyline concept to the route length and semantic similarity (i.e., we consider route length and semantic similarity as route scores). Given a start point and a sequence of PoI categories, a SkySR query searches for sequenced routes that are no worse than any other routes in terms of length and semantic similarity.

Table 1: Example routes in New York City
Approach               Distance     Sequenced route
Existing (e.g., [16])  3239 meters  Cupcake Shop → Art Museum → Jazz Club
Proposed               3239 meters  Cupcake Shop → Art Museum → Jazz Club
                       1858 meters  Dessert Shop → Art Museum → Jazz Club
                       1392 meters  Dessert Shop → Museum → Jazz Club
                       823 meters   Dessert Shop → Museum → Music Venue
Example 1.2.
Table 1 shows real-world examples of sequenced routes in New York City where a user plans to go to a cupcake shop, an art museum, and then a jazz club in this order. The existing approaches output a single route that matches the user’s requirement perfectly. The proposed approach can output three additional routes that are shorter than the route found by the existing approach. Note that the additional routes also satisfy the user query semantically. The user can select a preferred route among all four routes depending on how far they are willing to walk or how much time they have.

The SkySR query can provide effective trip plans; however, it incurs significant computational cost because a large number of routes can match the user requirement. Therefore, the SkySR query requires an efficient algorithm. The challenge is to search for SkySRs efficiently by reducing the search space without sacrificing the exactness of the result. We propose the bulk SkySR algorithm (BSSR for short), which finds exact SkySRs efficiently. Recall that a feature of SkySRs is that their scores are no worse than those of other sequenced routes. BSSR exploits the branch-and-bound algorithm [9], which effectively prunes unnecessary routes based on the upper and lower bounds of route scores. In addition, to further improve efficiency, we employ four techniques to optimize BSSR. (1) We initially find sequenced routes to calculate the upper bound. (2) We tighten the upper bound by arranging the priority queue. (3) We tighten the lower bound by introducing minimum distances. (4) We keep intermediate results for later processing, which we refer to as on-the-fly caching. Our approach significantly outperforms existing approaches in terms of response time (up to four orders of magnitude) without increasing memory usage or sacrificing the exactness of the result.

The main contributions of this paper are as follows.
• We introduce a semantic hierarchy to the route search query, which allows us to search for routes flexibly.
• We propose the skyline sequenced route (SkySR) query, which finds all preferred routes related to a specified category sequence with a semantic hierarchy (Section 4).
• We propose an exact and efficient algorithm and its optimization techniques to process SkySR queries (Section 5).
• We discuss variations and extensions of the SkySR query. The SkySR query can be applied to various user requirements and environments (Section 6).
• We demonstrate that the proposed approach works well in terms of response time and memory usage by performing extensive experiments (Section 7).
• We develop a prototype service that employs the SkySR query and conduct a user test to evaluate the usefulness of the SkySR query (Section 8).

The remainder of this paper is organized as follows. Section 2 introduces related work. Section 3 describes the problem formulation, and Section 4 defines the SkySR query. Section 5 presents the proposed algorithm. In Section 6, we discuss variations and extensions of the SkySR query. Sections 7 and 8 present experiment and user test results, respectively, and Section 9 concludes the paper.
First, we review trip planning query studies related to the SkySR query. Then, we review some studies related to the skyline operator. To the best of our knowledge, no study has considered a skyline sequenced route; thus, our problem cannot be solved efficiently using existing approaches.
Trip planning: We categorize trip planning queries in Table 2. Note that all existing trip planning queries only output routes that perfectly match the user-specified category sequences. Moreover, since most trip planning queries assume Euclidean distance, they cannot find SkySRs, in which road network distance is assumed. Dai et al. [4] proposed a personalized sequenced route and assumed that PoIs have ratings as well as categories and that users assign weighting factors as preferences. Although this personalized sequenced route considers route lengths and ratings, it only outputs the route that perfectly matches the given categories and has the best score based on lengths, ratings, and preferences. Only the optimal sequenced route (OSR) is applicable to find SkySRs without modification because the OSR and SkySR are based on the same settings (except for scoring). Sharifzadeh et al. [16] proposed two algorithms to find OSRs in road networks: the Dijkstra-based solution and the Progressive Neighbor Exploration (PNE) approach. The main difference between these algorithms is that the Dijkstra-based solution employs the Dijkstra algorithm to search for PoIs and the PNE approach employs the nearest neighbor search. It has been reported that these algorithms are comparable in terms of performance [16]. Thus, we consider both algorithms to verify the performance of the proposed approach.
Skyline: The skyline operator was proposed previously [2]. Few studies have considered the skyline concept for route searches. Recently, the skyline route (or skyline path) has received considerable attention [1, 6, 8, 13, 17, 18, 20]. A skyline route assumes that edges on road networks are associated with multiple costs, such as distance, travel time, and tolls. Here, the objective is to find skyline routes from a start point to a destination considering these multiple costs. However, since we specify a category sequence rather than a destination, we cannot apply conventional algorithms to find SkySRs. The continuous skyline query in road networks (e.g., [7]) searches for the skyline PoIs for a moving object considering both the PoI category and the distances to the moving object. Because continuous skyline queries search for a single PoI category, these solutions are not applicable to SkySR queries, which obtain routes that pass through multiple PoIs.
Table 3 summarizes the notations used in this paper. We assume a connected graph $G = (V \cup P, E)$, where $V$, $P$, and $E \subseteq (V \cup P) \times (V \cup P)$ represent the sets of vertices, PoI vertices, and edges, respectively. This graph corresponds to a road network that contains PoIs. The numbers of vertices, PoI vertices, and edges are denoted $|V|$, $|P|$, and $|E|$, respectively. PoI vertex $p \in P$ is associated with category $c \in C$, where $C$ is the set of categories.

Table 2: Types of trip planning queries.
Type                                     Distance metrics      Order    Destination  Result       Scores
SkySR (proposed)                         Network               Total    Yes or No    Exact        Length and semantic
Optimal sequenced route (OSR) [16]       Euclidean or Network  Total    Yes or No    Exact        Length
Sequenced route [5, 14]                  Network               Total    Yes          Exact        Length
Personalized sequenced route [4]         Euclidean             Total    No           Approximate  Length and rating
Trip planning [10]                       Euclidean or Network  Non      Yes          Approximate  Length
Multi rule partial sequenced route [3]   Euclidean             Partial  No           Approximate  Length
Multi rule partial sequenced route [11]  Euclidean             Partial  No           Exact        Length
Multi-type nearest neighbor [12]         Euclidean             Non      No           Exact        Length
Table 3: Notations
Symbol            Meaning
$V$               Set of vertices
$P$               Set of PoI vertices
$E$               Set of edges
$p$               PoI vertex
$C$               Set of categories
$c$               Category
$t$               Category tree
$c_p$             Category of PoI vertex $p$
$t_c$             Category tree of $c$
$P_c$             Set of PoI vertices associated with $c$
$P_t$             Set of PoI vertices associated with $t$
$S$               Category sequence (sequence of categories)
$R$               Route (sequence of PoI vertices)
$S_R$             Sequential PoI categories in $R$
$l(R)$            Length score of $R$
$s(R)$            Semantic score of $R$
$\mathcal{R}$     Set of routes
$\mathcal{E}(R)$  Set of super-routes of $R$
$\mathcal{S}$     Minimal set of sequenced routes
$S_q$             Category sequence specified by user
$v_q$             Start point specified by user
We denote the category of PoI vertex $p$ as $c_p$, and assume that each PoI is associated with a single category. Each category is associated with a category tree $t$, and we denote the category tree of category $c$ as $t_c$. We denote the set of PoI vertices associated with $c$ and the set of PoI vertices associated with the category tree $t$ as $P_c$ and $P_t$, respectively. If a PoI vertex is associated with category $c$, it is also associated with all ancestor categories of $c$ in $t_c$. Each edge $e(u_i, u_j)$ in $E$ is associated with a weight $w(u_i, u_j) \geq 0$. The weight can represent either travel duration or distance. Next, we define several terms required to introduce the skyline sequenced route (SkySR).

Definition 3.1. (Category sequence) A category sequence $S = \langle c_{S[1]}, c_{S[2]}, \ldots, c_{S[|S|]} \rangle$ is a sequence of categories, where $|S|$ is the size of $S$. $c_{S[i]} \in C$ denotes the $i$-th category in $S$. A super-category sequence of $S$ is a category sequence where each $i$-th category is either $c_{S[i]}$ or an ancestor of $c_{S[i]}$ ($1 \leq i \leq |S|$) in the category tree.

Definition 3.2. (Route) A route $R = \langle p_{R[1]}, \ldots, p_{R[|R|]} \rangle$ is a sequence of PoI vertices in a road network, where $p_{R[i]} \in P$ and $|R|$ denote the $i$-th PoI vertex in $R$ and the size of $R$, respectively. $S_R$ denotes the category sequence of $R$ (i.e., $\langle c_{p_{R[1]}}, \ldots, c_{p_{R[|R|]}} \rangle$). In addition, we define a super-route of $R$ as an extended route of $R$, such as $\langle R, p_i, p_j, \ldots \rangle$. In other words, a super-route of $R$ is obtained by adding a sequence of PoI vertices to the end of $R$. $\mathcal{R}$ and $\mathcal{E}(R)$ denote a set of routes and the set of super-routes of $R$, respectively. Moreover, given a route $R = \langle p_{R[1]}, \ldots, p_{R[|R|]} \rangle$ and a PoI vertex $p$, we define $R \oplus p = \langle p_{R[1]}, \ldots, p_{R[|R|]}, p \rangle$.

Definition 3.3. (Category similarity) Given two categories $c$ and $c'$, the similarity $sim(c, c') \in [0, 1]$ is calculated by an arbitrary function such as the Wu and Palmer similarity or path length [15, 19]. We assume the following relations in the similarity.
• $c$ is irrelevant to $c'$ if both exist in different category trees; thus, we obtain $sim(c, c') = 0$.
• $c$ semantically matches $c'$ if $c$ and $c'$ are in the same category tree; thus, we obtain $0 < sim(c, c') \leq 1$.
• $c$ perfectly matches $c'$ if $c$ and $c'$ are the same; thus, we obtain $sim(c, c') = 1$.

Next, we define a sequenced route using the above definitions. The difference between our definition of sequenced route and the previous definition [16] is that we consider category similarity.

Definition 3.4. (Sequenced route) Given category sequence $S = \langle c_{S[1]}, \ldots, c_{S[|S|]} \rangle$, $R = \langle p_{R[1]}, \ldots, p_{R[|R|]} \rangle$ is a sequenced route of category sequence $S$ if and only if it satisfies (i) $|R| = |S|$, (ii) $c_{S[i]}$ semantically matches $c_{p_{R[i]}}$ for all $i$ such that $1 \leq i \leq |S|$, and (iii) all PoI vertices in $R$ differ from each other.

Definition 3.5. (Route scores) Given category sequence $S$ and vertex $v$ as a start point, we define two scores for route $R$: length score $l(R) \in [0, \infty]$ and semantic score $s(R) \in [0, 1]$. We define the length score $l(R)$ as follows:

$$ l(R) = D(v, p_{R[1]}) + \sum_{i=1}^{|R|-1} D(p_{R[i]}, p_{R[i+1]}), \tag{1} $$

where $D(u_i, u_j)$ denotes the smallest weight sum of the edges on the routes between vertices (or PoIs) $u_i$ and $u_j$. The semantic score $s(R)$ is calculated by an aggregation function $f$ as follows:

$$ s(R) = f(h_1, h_2, \ldots, h_{|R|}), \tag{2} $$

where $h_i$ denotes $sim(c_{S[i]}, c_{p_{R[i]}})$. We assume that, if all $h_i = 1$, then $s(R) = 0$, i.e., if all PoI vertices in a route perfectly match the categories, the semantic score of the given route is 0. We also assume that, for a route that is not yet a sequenced route, $s(R)$ is the minimum semantic score that $R$ can attain when it becomes a sequenced route. Without loss of generality, preferred routes have small length and semantic scores.

Here, we define the SkySR query. Intuitively, a SkySR is a potential route that may be the best route related to the user’s requirement. A potential route is a route that is not dominated by any other route; the notion of dominance is used in the skyline operator [2]. We define dominance for sequenced routes and the SkySR query in the following.
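As an illustration of Definitions 3.3 and 3.5, the following minimal Python sketch computes a Wu and Palmer style similarity on a toy category tree and the two route scores. It rests on stated assumptions rather than the authors' implementation: the tree contents, the function names, the convention that a root has depth 1, and the aggregation $f(h_1, \ldots, h_{|R|}) = 1 - \prod_i h_i$ are all chosen for illustration (the definition only requires that $s(R) = 0$ when every $h_i = 1$).

```python
from math import prod

# Hypothetical category trees as child -> parent maps (roots map to None).
PARENT = {
    "Food": None, "Asian restaurant": "Food", "Italian restaurant": "Food",
    "Shop & Service": None, "Gift shop": "Shop & Service",
    "Hobby shop": "Shop & Service",
}

def ancestors(c):
    """Return [c, parent(c), ..., root]."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path

def wu_palmer(c1, c2):
    """Wu and Palmer similarity with the root at depth 1; 0 across trees."""
    a1, a2 = ancestors(c1), ancestors(c2)
    if a1[-1] != a2[-1]:          # different category trees: irrelevant
        return 0.0
    common = next(c for c in a1 if c in a2)   # lowest common ancestor
    depth = lambda c: len(ancestors(c))
    return 2 * depth(common) / (depth(c1) + depth(c2))

def semantic_score(query_cats, route_cats):
    """Assumed aggregation f: s(R) = 1 - product of h_i (0 is a perfect match)."""
    return 1.0 - prod(wu_palmer(q, c) for q, c in zip(query_cats, route_cats))

def length_score(start, route, D):
    """Equation (1): distance from the start through the PoIs in order."""
    stops = [start, *route]
    return sum(D(a, b) for a, b in zip(stops, stops[1:]))
```

With this choice of $f$, replacing one perfectly matching PoI by a sibling category raises the semantic score from 0 to 0.5, while any PoI from a different category tree drives the product to 0 and the score to 1.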
Definition 4.1. (Dominance) Let $\mathcal{R}$ be the set of all sequenced routes starting from point $v$ for category sequence $S$. For two sequenced routes $R, R' \in \mathcal{R}$, we say that $R$ dominates $R'$ if we have (i) $l(R) < l(R')$ and $s(R) \leq s(R')$ or (ii) $s(R) < s(R')$ and $l(R) \leq l(R')$. If two sequenced routes have the same length and semantic scores, the routes are equivalent in the dominance, and a set of sequenced routes is minimal if it has no equivalent routes.

Definition 4.2. (SkySR query) Given vertex $v_q$ as a start point and category sequence $S_q$, a skyline sequenced route is a sequenced route not dominated by other routes. Let $\mathcal{R}$ be the set of all sequenced routes from start point $v_q$ for category sequence $S_q$, and let $\mathcal{S}$ be a minimal set of the sequenced routes. The SkySR query returns $\mathcal{S}$ that includes sequenced routes such that all $R \in \mathcal{S}$ are SkySRs and all $R' \in \mathcal{R} \setminus \mathcal{S}$ are dominated by or equivalent to some $R \in \mathcal{S}$.

A naive solution to find SkySRs is to first enumerate SkySR candidates by iteratively executing OSR queries for all super-category sequences of $S_q$ and then check the dominance among the routes. The number of super-category sequences of $S_q$ increases exponentially as the depth of the category in the category tree and the size of $S_q$ increase. Thus, although OSR algorithms can find a sequenced route efficiently, we must repeat many searches. As a result, the naive solution incurs a significantly high computational cost to find SkySRs.

In this section, we present the proposed approach, which we refer to as the bulk SkySR algorithm (BSSR), that finds SkySRs efficiently. Section 5.1 presents the BSSR design policy, and Section 5.2 explains the BSSR procedure. In Section 5.3, we propose optimization techniques for BSSR. We also theoretically analyze its performance in Section 5.4. Finally, we show a running example of BSSR in Section 5.5. In Section 5, we assume undirected graphs in which each PoI vertex is associated with only one category and that users give sequences of single PoI categories. However, in a real application, the graphs would be directed graphs, each PoI vertex would be associated with multiple categories, and users may specify complex categories. Section 6 describes how we handle the above conditions.
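The dominance relation of Definition 4.1 and the minimal result set of Definition 4.2 can be sketched in a few lines of Python. This is an illustrative sketch only: routes are reduced to hypothetical `(length, semantic)` score pairs, and the function names are not from the paper.

```python
def dominates(a, b):
    """a, b are (length, semantic) pairs; smaller is better (Definition 4.1)."""
    return (a[0] < b[0] and a[1] <= b[1]) or (a[1] < b[1] and a[0] <= b[0])

def update_skyline(skyline, cand):
    """Insert cand into a minimal skyline list and return the new list."""
    if any(dominates(r, cand) or r == cand for r in skyline):
        return skyline  # dominated or equivalent: the set stays minimal
    # Drop every route that the candidate dominates, then add the candidate.
    return [r for r in skyline if not dominates(cand, r)] + [cand]
```

For example, feeding the four distances of Table 1 paired with made-up semantic scores 0.0, 0.2, 0.4, and 0.6 keeps all four routes, whereas a hypothetical route with scores (2000, 0.5) is discarded as soon as (1858, 0.2) arrives.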
Our idea to improve efficiency is to find sequenced routes simultaneously (i.e., by searching sequenced routes in bulk) in order to reduce the search space. We have two choices as the basis for our approach: the Dijkstra-based or the nearest neighbor-based approach [16]. We use the Dijkstra-based approach as the basis of our algorithm. Recall that a SkySR query has two scores for a route, i.e., length and semantic scores. To find all SkySRs, we must find routes that have small semantic scores even if the routes have large length scores. However, PoIs that are included in the routes with small semantic scores could be distant from the start point. Although the nearest neighbor-based approach finds the closest PoIs, it cannot efficiently find such PoIs. On the other hand, the Dijkstra-based approach searches for all PoI vertices that match a PoI category. Therefore, the Dijkstra-based approach is more suitable for the SkySR query than the nearest neighbor-based approach.

Although our approach finds sequenced routes simultaneously, it entails a large number of executions of the Dijkstra algorithm. This is because the number of possible routes grows with the number of PoI candidates, so the search space is not reduced effectively. To effectively reduce the search space, we exploit the branch-and-bound algorithm, which uses the upper and lower bounds of a branch of the search space to solve an optimization problem effectively. With BSSR, each branch corresponds to a route. We compute the upper and lower bounds while finding the set of SkySRs. Specifically, we compute the upper bound of a route from the already found sequenced routes, and we compute the lower bound from the currently searched route (i.e., not a sequenced route yet). With the upper and lower bounds, we can safely prune unnecessary routes to improve efficiency.

To further increase efficiency, we propose optimization techniques for BSSR. In order to exploit the branch-and-bound algorithm, it is necessary to initialize the upper bound. Thus, we first search for a sequenced route to initialize the upper bound. However, finding a sequenced route may incur a high computational cost. Therefore, we propose a nearest neighbor-based initial search method (NNinit) that finds sequenced routes efficiently by greedily finding PoI vertices. In addition, to effectively update the upper bound, we assign a priority to each route and use the priority queue to efficiently find routes that are likely to give an effective upper bound. To compute the lower bound, we compute the possible minimum distance and add it to the length score of a route to safely prune unnecessary routes. Moreover, to avoid executing the Dijkstra algorithm iteratively from the same vertices, we materialize search results of the Dijkstra algorithm and reuse them to search the PoI vertices. By using BSSR with optimization techniques, we can perform the SkySR query efficiently.
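The idea of materializing Dijkstra results and reusing them can be illustrated with a simple memoized search. The sketch below is a simplification under stated assumptions: it caches the complete distance map per source vertex in a hypothetical module-level dictionary, whereas the technique described later in the paper may store partial results differently; the graph format `{u: [(v, w), ...]}` and all names are illustrative.

```python
import heapq

_cache = {}  # source vertex -> dict of shortest distances (hypothetical cache)

def dijkstra(graph, source):
    """Plain Dijkstra on graph {u: [(v, w), ...]} with non-negative weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def cached_dijkstra(graph, source):
    """Run Dijkstra from source at most once (assumes one fixed graph)."""
    if source not in _cache:
        _cache[source] = dijkstra(graph, source)
    return _cache[source]
```

Because many partial routes end at the same PoI vertex, a second search from that vertex becomes a dictionary lookup; the price is the memory held by the cached distance maps.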
The bulk SkySR algorithm (BSSR) finds all SkySRs by searching for sequenced routes simultaneously while checking dominance on demand. The naive solution must execute OSR queries for all super-category sequences of $S_q$ one by one because it only searches for the PoIs that perfectly match the given category. In contrast, BSSR searches for all PoIs that semantically match the given category.

The basic process of BSSR is simple, as shown in Algorithm 1: (i) start searching the PoI vertices that match the first category from start point $v_q$ and insert the routes found into priority queue $Q_b$, which stores all found routes (line 4), (ii) fetch a route from $Q_b$ (line 6), (iii) search for the next PoI vertices that semantically match the next category $c_d$ from PoI vertex $p_d$, which is the end of the fetched route, and insert the fetched route extended with each of the found PoI vertices into $Q_b$ (lines 7–9), and (iv) if $Q_b$ is not empty, return to (ii); otherwise, output the minimal set of sequenced routes $\mathcal{S}$ (line 10). In steps (i) and (iii), we find PoI vertices from the end of the fetched route using a Dijkstra algorithm modified for the SkySR query as described in Section 5.2.2.

Algorithm 1: Bulk SkySR algorithm
procedure BSSR($v_q$, $S_q$)
    $\mathcal{S} \leftarrow \emptyset$;
    priority_queue $Q_b \leftarrow \emptyset$;
    mDijkstra($\emptyset$, $c_{S_q[1]}$, $v_q$, $Q_b$, $\mathcal{S}$);
    while $Q_b$ is not empty do
        $R \leftarrow Q_b$.dequeue();
        $c_d \leftarrow c_{S_q[|R|+1]}$;
        $p_d \leftarrow p_{R[|R|]}$;
        mDijkstra($R$, $c_d$, $p_d$, $Q_b$, $\mathcal{S}$);
    return $\mathcal{S}$;
end procedure

We search for sequenced routes simultaneously to reduce the search space. Our idea to safely reduce the search space is to exploit the branch-and-bound algorithm, which can eliminate unnecessary parts of the search space. This section describes the theoretical background of using the branch-and-bound algorithm. We use the following three lemmas to reduce the search space:

Lemma 5.1.
Let $\mathcal{S}$ be a minimal set of sequenced routes while searching for SkySRs and $\mathcal{S}'$ be the minimal set of sequenced routes after finding SkySRs. If sequenced route $R$ is dominated by a sequenced route in $\mathcal{S}$, $R$ cannot be included in $\mathcal{S}'$.
Proof: From Definition 4.2, we search for a set of SkySRs, which are not dominated by the other sequenced routes. If we find a sequenced route not dominated by any sequenced route in $\mathcal{S}$, we update $\mathcal{S}$ by inserting the new sequenced route and deleting the sequenced routes dominated by the new one. Therefore, no sequenced route in $\mathcal{S}$ after the update is dominated by a sequenced route in $\mathcal{S}$ prior to the update. As a result, sequenced routes in $\mathcal{S}'$ are not dominated by any sequenced routes in $\mathcal{S}$. In other words, $R$ is not included in $\mathcal{S}'$ if we have a sequenced route $R'$ in $\mathcal{S}$ such that $l(R') \leq l(R)$ and $s(R') \leq s(R)$. □

Lemma 5.2. Let $\mathcal{E}(R)$ be the set of super-routes of $R$ starting from the same start point. For any route $R'$ in $\mathcal{E}(R)$, the length and semantic scores $l(R')$ and $s(R')$ cannot be less than $l(R)$ and $s(R)$, respectively.
Proof: Let $R'$ be a route included in $\mathcal{E}(R)$. Since we have $D(u_i, u_j) \geq 0$, the following property holds for a route $R$ from Equation (1) of Definition 3.5.

$$
\begin{aligned}
D(v_q, p_{R'[1]}) + \sum_{i=1}^{|R'|-1} D(p_{R'[i]}, p_{R'[i+1]})
&= D(v_q, p_{R[1]}) + \sum_{i=1}^{|R|-1} D(p_{R[i]}, p_{R[i+1]}) + \sum_{i=|R|}^{|R'|-1} D(p_{R'[i]}, p_{R'[i+1]}) \\
&\geq D(v_q, p_{R[1]}) + \sum_{i=1}^{|R|-1} D(p_{R[i]}, p_{R[i+1]}).
\end{aligned}
$$

Therefore, we have $l(R) \leq l(R')$. $s(R)$ is the possible minimum semantic score of $R$ when it becomes a sequenced route. Thus, even if PoI vertices are added to $R$, we have $s(R) \leq s(R')$. As a result, we have $l(R) \leq l(R')$ and $s(R) \leq s(R')$. □

In terms of the branch-and-bound algorithm, Lemmas 5.1 and 5.2 give us the upper and lower bounds of the scores of a route, respectively. We can prune routes according to the following lemma.

Lemma 5.3. (pruning condition) If (i) $R$ is a sequenced route included in the set $\mathcal{S}$ of sequenced routes and (ii) $l(R) \leq l(R')$ and $s(R) \leq s(R')$, any routes in $\mathcal{E}(R')$ cannot be included in $\mathcal{S}$.
Proof: If we have $l(R) \leq l(R')$ and $s(R) \leq s(R')$, $R'$ is not included in $\mathcal{S}$ (Lemma 5.1). From Lemma 5.2, the scores of $R'$ cannot become less than $l(R')$ and $s(R')$ even if we expand $R'$. Therefore, any routes in $\mathcal{E}(R')$ cannot be included in $\mathcal{S}$ because $R'$ is dominated by or equivalent to the sequenced route with $l(R)$ and $s(R)$. □

Lemma 5.3 gives us the length score threshold for a route, and, if the length score of a route is not less than this threshold, we can prune the given route. We define the length score threshold of a route as follows:
Definition 5.4. The threshold $\bar{l}(R)$ of the length score of route $R$ is given by the following equation:

$$ \bar{l}(R) = \min_{R' \in \mathcal{S}} \{\, l(R') \mid s(R) \geq s(R') \,\}. \tag{3} $$

If $\bar{l}(R) \leq l(R)$, we can safely prune $R$ because it cannot be included in the result. Thus, we can reduce the search space without sacrificing the exactness of the result. Equation (3) has a small computation cost because $\mathcal{S}$ includes only a small number of sequenced routes as shown in Section 7.

We search the next PoI vertices that semantically match the next PoI category using the modified Dijkstra algorithm. The modified Dijkstra algorithm can prune unnecessary routes based on Lemma 5.3. Moreover, based on the following lemma, it terminates unnecessary traversal of the graph and avoids inserting unnecessary routes.
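Equation (3) and the resulting pruning test translate almost directly into code. The sketch below is illustrative: the skyline is assumed to be a list of `(length, semantic)` pairs, the function names are hypothetical, and returning infinity when no comparable sequenced route exists yet is an assumption of this sketch (so that nothing is pruned in that case).

```python
import math

def length_threshold(skyline, s_R):
    """Equation (3): smallest l(R') over found routes R' with s(R') <= s(R)."""
    return min((l for l, s in skyline if s <= s_R), default=math.inf)

def can_prune(skyline, l_R, s_R):
    """A route is pruned when its length reaches its threshold."""
    return length_threshold(skyline, s_R) <= l_R
```

For instance, with found routes scored (3239, 0.0), (1858, 0.2), and (823, 0.6), a partial route with semantic score 0.3 has threshold 1858, so it is pruned once its length reaches 1858 but kept at length 1000.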
Lemma 5.5. Let $R = \langle p_{R[1]}, \ldots, p_{R[i]}, p_{R[i+1]}, p_{R[i+2]}, \ldots, p_{R[|R|]} \rangle$ be a route and $p_{i:i+1}$ be a PoI vertex on a path between $p_{R[i]}$ and $p_{R[i+1]}$. Route $R$ must be dominated by or equivalent to another route if we have $sim(c_{S[i+1]}, c_{p_{i:i+1}}) \geq sim(c_{S[i+1]}, c_{p_{R[i+1]}})$.
Proof: Let $R' = \langle p_{R[1]}, \ldots, p_{R[i]}, p_{i:i+1}, p_{R[i+2]}, \ldots, p_{R[|R|]} \rangle$ be a route such that the only difference between $R$ and $R'$ is in $p_{i:i+1}$ and $p_{R[i+1]}$. Since the PoI vertex $p_{i:i+1}$ is on the path between $p_{R[i]}$ and $p_{R[i+1]}$, we have $l(R) \geq l(R')$ based on the triangle inequality (i.e., $D(p_{i:i+1}, p_{R[i+1]}) + D(p_{R[i+1]}, p_{R[i+2]}) \geq D(p_{i:i+1}, p_{R[i+2]})$). Moreover, if $sim(c_{S[i+1]}, c_{p_{i:i+1}}) \geq sim(c_{S[i+1]}, c_{p_{R[i+1]}})$, we have $s(R) \geq s(R')$. Therefore, $R$ is dominated by or equivalent to $R'$ because $l(R) \geq l(R')$ and $s(R) \geq s(R')$. □

Lemma 5.5 gives us two properties for the SkySR query: (i) if a PoI vertex is reached through another PoI vertex that has an equal or better category similarity, we can ignore it, and (ii) if we find a PoI vertex that perfectly matches the given category, we do not need to traverse the graph through the PoI vertex. As a result, using Lemmas 5.3 and 5.5, we can efficiently find the next PoI vertices.

Algorithm 2 shows the pseudocode for the modified Dijkstra algorithm, which is used to find PoI vertices that semantically match $c_d$ from $p_d$. In priority queue $Q_d$ for the modified Dijkstra algorithm, the top vertex is the closest vertex to $p_d$. The queue is initialized to $p_d$ (line 3). The closest vertex to $p_d$ is dequeued from $Q_d$ (line 5). $R_t$ is the route obtained by appending the fetched vertex $u$ to $R_d$ (line 7). If the length score of $R_t$ is greater than or equal to the threshold of $R_d$, the modified Dijkstra algorithm terminates the process (Lemma 5.3) (line 8). We check whether (i) $u$ semantically matches $c_d$ and (ii) $u$ is not reached through another PoI vertex whose category similarity is greater than or equal to that of $u$ (line 9). If the above conditions are satisfied and the length score of $R_t$ is less than its threshold (line 10), we insert $R_t$ into the priority queue or the set of sequenced routes (lines 10–12). Otherwise, we skip the insertion of $R_t$ (Lemmas 5.3 and 5.5). The neighbor vertices of $u$ are inserted into $Q_d$ unless $u$ perfectly matches $c_d$ (Lemma 5.5) (lines 13–17).

In this section, we propose four optimization techniques for BSSR. Section 5.3.1 explains an initial search for sequenced routes and proposes NNinit. We then explain tightening the upper and the lower bounds in Section 5.3.2 and Section 5.3.3, respectively. Furthermore, in Section 5.3.4 we propose an on-the-fly caching technique to reuse previous search results of the modified Dijkstra algorithm.
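The basic procedure of Algorithms 1 and 2, without the optimizations introduced in this section, can be sketched in Python as follows. This is a hedged sketch rather than the authors' implementation, and it makes several assumptions: the graph is a dictionary `{u: [(v, w), ...]}` with mutually comparable vertex identifiers, `cat` maps each PoI vertex to its single category, the semantic score is $1 - \prod_i h_i$, the queue $Q_b$ is ordered by length, and a PoI already in the route is treated as a non-match to keep the PoIs of a route distinct. All function and variable names are illustrative.

```python
import heapq, math

def bssr(graph, cat, sim, v_q, S_q):
    """graph: {u: [(v, w)]}; cat: {poi: category}; sim(c, c') in [0, 1]."""
    skyline = []  # entries (length, semantic score, route)

    def threshold(s):  # Equation (3); inf while no comparable route is known
        return min((l for l, s2, _ in skyline if s2 <= s), default=math.inf)

    def update(l, s, route):  # keep the skyline minimal
        skyline[:] = [r for r in skyline if not (l <= r[0] and s <= r[1])]
        skyline.append((l, s, route))

    def m_dijkstra(l_d, h_d, route, c_d, p_d, Q_b):
        dist = {p_d: 0.0}
        heap = [(0.0, p_d, 0.0)]  # (distance, vertex, best similarity passed)
        done = set()
        while heap:
            d, u, passed = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            if l_d + d >= threshold(1.0 - h_d):  # Algorithm 2, line 8
                break
            h_u = sim(c_d, cat[u]) if u in cat and u not in route else 0.0
            if h_u > passed:  # line 9: a match not hidden behind a better PoI
                l_t, h_t, r_t = l_d + d, h_d * h_u, route + (u,)
                if l_t < threshold(1.0 - h_t):  # line 10
                    if len(r_t) == len(S_q):
                        update(l_t, 1.0 - h_t, r_t)
                    else:
                        heapq.heappush(Q_b, (l_t, h_t, r_t))
            if h_u < 1.0:  # lines 13-17: stop expanding at a perfect match
                for v, w in graph.get(u, []):
                    if d + w < dist.get(v, math.inf):
                        dist[v] = d + w
                        heapq.heappush(heap, (d + w, v, max(passed, h_u)))

    Q_b = []
    m_dijkstra(0.0, 1.0, (), S_q[0], v_q, Q_b)  # Algorithm 1, line 4
    while Q_b:
        l, h, route = heapq.heappop(Q_b)
        m_dijkstra(l, h, route, S_q[len(route)], route[-1], Q_b)
    return sorted(skyline)
```

Calling `bssr(graph, cat, sim, "q", ["Asian", "Gift"])` returns the skyline as a list of `(length, semantic score, route)` triples sorted by length. Carrying the product of similarities `h_d` instead of the score itself keeps the lower bound of Lemma 5.2 available for partial routes as `1.0 - h_d`.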
Algorithm 2: Modified Dijkstra algorithm to find the next PoI vertices matching $c_d$ from $p_d$
procedure mDijkstra($R_d$, $c_d$, $p_d$, $Q_b$, $\mathcal{S}$)
    $dist[u] = \infty$ for all $u \in V \cup P$, $dist[p_d] = 0$;
    priority_queue $Q_d \leftarrow \{p_d\}$;
    while $Q_d$ is not empty do
        $u \leftarrow Q_d$.dequeue();
        if $u$ is already visited then continue;
        $R_t \leftarrow R_d \oplus u$;
        if $l(R_t) \geq \bar{l}(R_d)$ then break;
        if $u \in P_{t_{c_d}}$ and $u$ is not reached through a PoI vertex whose category similarity is greater than or equal to that of $u$ then
            if $l(R_t) < \bar{l}(R_t)$ then
                if $R_t$ is a sequenced route then $\mathcal{S}$.update($R_t$);
                else $Q_b$.enqueue($R_t$);
        if $u \notin P_{c_d}$ then
            for each $u'$ such that $e(u, u') \in E$ do
                if $dist[u] + w(u, u') < dist[u']$ then
                    $dist[u'] = dist[u] + w(u, u')$;
                    $Q_d$.enqueue($u'$);
end procedure

We prune unnecessary routes efficiently using the branch-and-bound algorithm. However, we cannot calculate the threshold of $R$ if $\mathcal{S}$ contains no sequenced route whose semantic score is at most $s(R)$ (Equation (3)). Therefore, initially, we search for the sequenced route whose semantic score is 0. However, the length score of the sequenced route can be large if its semantic score is 0. To tighten the threshold, we also search for sequenced routes whose semantic scores are greater than 0 because their length scores can be smaller than that of the sequenced route with a semantic score of 0. We initially find several sequenced routes to tighten the upper bound.

We propose NNinit, which searches for several sequenced routes efficiently.
NNinit performs a nearest neighbor search re-peatedly to find PoI vertices that perfectly match the given cate-gories. With this process, we can find a sequenced route whosesemantic score is 0. Moreover,
NNinit can find the PoI vertexthat semantically matches the given category during the near-est neighbor search. When we find the last visited PoI vertex,we may find PoI vertices that semantically match the last cate-gory in S q . Therefore, we can obtain sequenced routes whosesemantic scores are greater than 0 and length scores are small.As a result, NNinit can find several sequenced routes withoutincurring additional cost, and one of the sequenced routes has asemantic score of 0.We present the pseudocode for
NNinit in Algorithm 3. Here,priority queue Q is initialized to start point v q (line 3). NNinit re-peats the Dijkstra algorithm | S q | times to find sequenced routes(line 4). The Dijkstra algorithm is executed to search for theclosest PoI vertex that perfectly matches c S q [ i ] from the initialvertex (the first initial vertex is v q ) (lines 5–19). Here, the clos-est vertex to the initial vertex is dequeued from Q (line 7). Ifthe algorithm finds a PoI vertex that perfectly matches c S q [ i ] ,this vertex is added to R and Q is initialized to the PoI vertex(lines 12–15). When it finds the last PoI vertex that semanticallymatches c S q [| S q |] , it inserts the sequenced route into S (lines9–11). Finally, we obtain a set of sequenced routes, and one ofthe sequenced routes in S has a semantic score of 0. Example 5.6.
We show an example of NNinit using Example 1.1, which searches for an Asian restaurant, an A&E place, and a gift shop in this order from start point $v_q$. NNinit executes the Dijkstra algorithm three times because the size of the category sequence is three. First, NNinit searches for PoI vertices that perfectly match Asian restaurant from $v_q$. Then, it finds p, which is the closest PoI to $v_q$ that perfectly matches Asian restaurant. Next, it searches for the closest PoI vertex to p that perfectly matches A&E and then finds p. In the next search, NNinit inserts sequenced routes into $S$ when it finds PoI vertices that semantically match gift shop. NNinit finds p, whose category is Shop&Service (i.e., a semantic match), and thus inserts ⟨p, p, p⟩ into $S$. After finding p, it finds p, which perfectly matches gift shop, and inserts ⟨p, p, p⟩ into $S$. Finally, NNinit returns $S$ including {⟨p, p, p⟩, ⟨p, p, p⟩}. The length score of ⟨p, p, p⟩ is 12, which is less than the length score of ⟨p, p, p⟩ of 15.

Algorithm 3: Initial search for finding sequenced routes with a small cost
 1  procedure NNinit(v_q, S_q)
 2    S ← ∅, R ← ∅;
 3    priority_queue Q ← {v_q};
      /* execute Dijkstra algorithm |S_q| times */
 4    for i : 1 to |S_q| do
 5      dist[u] = inf for all u ∈ V ∪ P, dist[Q.top] = 0;
 6      while Q is not empty do
 7        u ← Q.dequeue;
 8        if u is already visited then continue;
 9        if i = |S_q| and u ∈ P_{t_{c_{S_q[i]}}} then
10          R′ ← R ⊕ u;
11          S.update(R′);
12        if u ∈ P_{c_{S_q[i]}} then
13          R ← R ⊕ u;
14          Q ← {u};
15          break;
16        for each u′ for e(u, u′) ∈ E do
17          if dist[u] + w(u, u′) < dist[u′] then
18            dist[u′] = dist[u] + w(u, u′);
19            Q.enqueue(u′);
20    return S;
21  end procedure

We use the upper bound to prune unnecessary routes. The upper bound is computed from the obtained sequenced routes. To tighten the upper bound, it is important to efficiently find sequenced routes that have small length and semantic scores.
BSSR extends the route at the top of the priority queue to search for a sequenced route, as shown in Algorithm 1. Note that priority queues in existing algorithms conventionally consider only distances (i.e., a distance-based priority queue). If we use a distance-based priority queue, BSSR preferentially extends a route with a small length score. Although we must increase the size of a route to $|S_q|$ to find a sequenced route, a route that has a small length score likely has a small size. Therefore, it is difficult to search for sequenced routes efficiently using a distance-based priority queue.

To search for sequenced routes efficiently, we preferentially extend a route that has a large size. Here, since many routes in the priority queue could have the same size, we must consider an additional priority, which is expected to affect performance. If multiple routes in the priority queue have the same size, we preferentially extend the route with the smallest semantic score. We can reduce the search space by searching for sequenced routes in ascending order of semantic score. Moreover, if routes have the same size and the same semantic score, we preferentially extend the route with the smallest length score. As a result, we can efficiently obtain sequenced routes with small length and semantic scores.

As described in Section 5.2.1, we use the length scores of routes as the lower bound, i.e., we prune a route if its length score is not less than the threshold. Note that the length score of a route increases as the route size increases. This indicates that it is difficult to prune routes while their sizes are still small. Our approach to tightening the lower bound of a route is to estimate the future increase of its length score. However, if we estimate a future length score carelessly, we may sacrifice the exactness of the result.

The basic idea of this estimation is to calculate the possible minimum distance. Here, we compute the smallest distance among any pair of PoI vertices in sets of PoI vertices. We use the following two minimum distances, semantic-match minimum distance $l_s$ and perfect-match minimum distance $l_p$:

Definition 5.7.
(minimum distance) The semantic-match minimum distance $l_s$ and perfect-match minimum distance $l_p$ are given by the following equations:

$$l_s(R) = \sum_{i=|R|}^{|S_q|-1} l_s[i], \quad \text{where } l_s[i] = \min_{p_i \in P_{t_i},\; p_{i+1} \in P_{t_{i+1}}} D(p_i, p_{i+1}). \tag{4}$$

$$l_p(R) = \sum_{i=|R|}^{|S_q|-1} l_p[i], \quad \text{where } l_p[i] = \min_{p_i \in P_{t_i},\; p_{i+1} \in P_{c_{i+1}}} D(p_i, p_{i+1}). \tag{5}$$

In Equations (4) and (5), $P_{t_i}$ and $P_{c_i}$ denote the set of PoI vertices associated with the category tree of $c_{S_q[i]}$ and the set of PoI vertices whose category is $c_{S_q[i]}$, respectively.

We compute the semantic-match minimum distance based on the distance to the PoI vertices that semantically match the next category. We can safely add the semantic-match minimum distance to the current length score without restriction. However, the semantic-match minimum distance is much less than the threshold, so it may improve pruning performance only slightly. We therefore also use the perfect-match minimum distance, which is computed based on the distance to the PoI vertices that perfectly match the next category. Because the perfect-match minimum distance is much greater than the semantic-match minimum distance, it tightens the lower bound more and improves pruning performance. However, we can use the perfect-match minimum distance only in a special case, i.e., where a route must pass only through PoIs that perfectly match the given categories so as not to be dominated. The perfect-match minimum distance works well if the number of sequenced routes in $S$ is large, because this constraint is more often satisfied as the number of sequenced routes in $S$ increases.

Lemma 5.8.
Let $R'$ and $R''$ be sequenced routes in $S$ and $R$ be a route such that (i) $l(R) \ge l(R')$ and $s(R) < s(R')$ and (ii) $l(R) < l(R'')$ and $s(R) \ge s(R'')$. Let $\delta$ be the minimum increment of a semantic score (the least increase of the semantic score is computed from the category tree; specifically, we can compute it from the category that is most similar, but not equal, to the next category). We can prune $R$ if we have (a) $l(R) \ge l(R')$ and $s(R) + \delta \ge s(R')$ and (b) $l(R) + l_p(R) \ge l(R'')$ and $s(R) \ge s(R'')$.

proof: First, we consider case (a). If we have $l(R) \ge l(R')$ and $s(R) + \delta \ge s(R')$, $R$ is dominated by or equivalent to $R'$ if its semantic score increases. Therefore, $R$ must pass only through PoI vertices that perfectly match the given categories so as not to be dominated. If $R$ passes only through PoI vertices that perfectly match the given categories, the length score of $R$ increases by at least $l_p(R)$. For case (b), if we have $l(R) + l_p(R) \ge l(R'')$ and $s(R) \ge s(R'')$, $R$ is dominated by or equivalent to $R''$ if its length score increases by $l_p(R)$. As a result, if we have two routes $R'$ and $R''$ such that (i) $l(R) \ge l(R')$ and $s(R) + \delta \ge s(R')$ and (ii) $l(R) + l_p(R) \ge l(R'')$ and $s(R) \ge s(R'')$, $R$ is dominated by or equivalent to at least one of $R'$ and $R''$. □

To compute the estimation of the lower bound, we compute two types of possible minimum distances, $l_s$ and $l_p$. A naive approach computes all minimum distances from the PoI vertices that semantically match $c_{S_q[i]}$ to those that match $c_{S_q[i+1]}$ for $1 \le i \le |S_q| - 1$. Instead, we use a multi-source multi-destination Dijkstra algorithm. In this algorithm, all start points are inserted into the same priority queue. Then, the algorithm dequeues vertices in the same manner as the conventional Dijkstra algorithm. Here, the process is terminated if the top of the priority queue becomes one of the destinations. This approach needs only $|S_q| - 1$ executions of the Dijkstra algorithm.

Lemma 5.9.
The multi-source multi-destination Dijkstra algorithm guarantees the minimum distance from the start points to the destinations.

proof: We first insert multiple start points into the priority queue, with their distances initialized to 0. When we find a vertex, it is inserted into the queue and its distance is updated to the distance from the closest start point. The vertex with the smallest distance in the priority queue is dequeued. If the top vertex in the priority queue is one of the destinations, no destination has a smaller distance than the top one. Therefore, we can guarantee the minimum distance from the start points to the destinations. □
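The multi-source multi-destination search described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' C++ implementation: the function names, the adjacency-dictionary graph format, and the vertex labels in the usage note are assumptions made for the example. All sources start in one priority queue with distance 0, and the search stops as soon as a destination is dequeued, which is the termination condition used in Lemma 5.9.

```python
import heapq


def multi_source_min_distance(adj, sources, destinations):
    """Smallest network distance from any source to any destination.

    adj maps a vertex to a list of (neighbor, weight) pairs (assumed format).
    Returns float('inf') when no destination is reachable.
    """
    destinations = set(destinations)
    dist = {s: 0.0 for s in sources}       # every start point begins at 0
    heap = [(0.0, s) for s in dist]
    heapq.heapify(heap)
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u in destinations:              # first destination dequeued is the closest
            return d
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def consecutive_min_distances(adj, layers):
    """One search per consecutive pair of PoI sets, i.e. len(layers) - 1 searches."""
    return [
        multi_source_min_distance(adj, layers[i], layers[i + 1])
        for i in range(len(layers) - 1)
    ]
```

For a category sequence of size three, `consecutive_min_distances(adj, [P1, P2, P3])` runs two searches, where `P1`, `P2`, and `P3` stand for the (hypothetical) PoI sets of the three categories.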
Algorithm 4 shows the pseudocode to compute the semantic-match minimum distance. The estimation of the lower bound is executed after line 4 in Algorithm 1. Here, we initialize $P_i$ and $P_{i+1}$ (lines 3–4). $\bar{l}(\emptyset)$ denotes the threshold for a route whose semantic score is 0. The difference between computing the semantic-match and perfect-match minimum distances is whether the PoI vertices in $P_{i+1}$ semantically or perfectly match the given category.

Example 5.10.
We show an example of computing the semantic-match minimum distance using Example 1.1. P, P, and P include {p, p, p, p, p}, {p, p, p}, and {p, p, p, p, p}, respectively. First, PoI vertices in P are inserted into priority queue $Q$, and the set of destinations is P. By processing the Dijkstra algorithm, we compute possible minimum distance l_s[ ] = (p to p). Next, we search from PoI vertices that semantically match A&E to those that match gift shop. Then, we compute l_s[ ] = (p to p). Finally, we obtain semantic-match minimum distance l_s = { , }. We can compute the perfect-match minimum distance in the same way and obtain l_p = { , }, which is greater than l_s. Although
BSSR efficiently prunes unnecessary routes, it may iteratively execute the modified Dijkstra algorithm at the same vertex because, in Algorithm 1 (line 8), $p_d$ could be the same as in former executions of the modified Dijkstra algorithm. Thus, we reuse the result starting at the same PoI vertex by materializing the result of the modified Dijkstra algorithm (i.e., keeping the PoI vertices matching $c_d$ and the distances from $p_d$ to these PoI vertices), which we refer to as on-the-fly caching.

After finding SkySRs, on-the-fly caching frees the results of the modified Dijkstra algorithms (this is why we call it on-the-fly), because the search space rarely overlaps across different inputs (i.e., $S_q$ and $v_q$ differ).

Algorithm 4: Computing possible minimum distance
 1  procedure EstimationLowerbound(v_q, S_q)
 2    for i : 1 to |S_q| − 1 do
 3      P_i ← {p | p ∈ P_{t_{c_{S_q[i]}}} and D(v_q, p) < \bar{l}(∅)};
 4      P_{i+1} ← {p | p ∈ P_{t_{c_{S_q[i+1]}}} and D(v_q, p) < \bar{l}(∅)};
 5      dist[u] = inf for all u ∈ V ∪ P, dist[p] = 0 for all p ∈ P_i;
 6      priority_queue Q ← {p | p ∈ P_i};
 7      while Q is not empty do
 8        u ← Q.dequeue;
 9        if u is already visited then continue;
10        if u ∈ P_{i+1} then
11          l_s[i] = dist[u];
12          break;
13        for each u′ for e(u, u′) ∈ E do
14          if dist[u] + w(u, u′) < dist[u′] then
15            dist[u′] = dist[u] + w(u, u′);
16            Q.enqueue(u′);
17    return l_s;
18  end procedure

In this section, we theoretically analyze the cost and correctness of the proposed
BSSR.

Theorem 1. (Time complexity) Let $\gamma$ be the pruning ratio and $\alpha$ be the ratio of the size of the graph searched to find the SkySRs. The time complexity of BSSR is $O(\gamma (\alpha |P|)^{|S_q|} \alpha (|E| + (|V| + |P|) \log(\alpha (|V| + |P|))))$.

proof: The time complexity of the Dijkstra algorithm is $O(|E| + |V| \log |V|)$ if the number of vertices is $|V|$. In our setting, we have $|V| + |P|$ vertices because we have two types of vertices. In addition, we do not need to search the whole graph because the threshold reduces the portion of the graph that is searched. Therefore, the time complexity of the modified Dijkstra algorithm is $O(\alpha (|E| + (|V| + |P|) \log(\alpha (|V| + |P|))))$. The time complexity of BSSR depends on the number of times the modified Dijkstra algorithm is executed, which is at most the number of all potential routes, $|P|^{|S_q|}$. Recall that we can reduce the number of routes using the branch-and-bound algorithm. Finally, the time complexity of BSSR is $O(\gamma (\alpha |P|)^{|S_q|} \alpha (|E| + (|V| + |P|) \log(\alpha (|V| + |P|))))$. □

In our approach, $\gamma$ and $\alpha$ depend on the upper and lower bounds. These are affected by the graph structure, the category trees, and the ratio of PoI vertices, so the time complexity of BSSR depends on these factors.

Theorem 2. (Space complexity) Let $\gamma$ be the pruning ratio and $\alpha$ be the ratio of the size of the graph searched to find the SkySRs. The space complexity of BSSR is $O(|E| + |V| + |P| + \gamma |S_q| (\alpha |P|)^{|S_q|})$.

proof: We store the whole graph of size $O(|E| + |V| + |P|)$. We also store routes in the priority queue and $S$, and the maximum number of routes is $|P|^{|S_q|}$. We can reduce the number of routes using the branch-and-bound algorithm. The size of each route is proportional to $|S_q|$. Therefore, the space complexity of BSSR is $O(|E| + |V| + |P| + \gamma |S_q| (\alpha |P|)^{|S_q|})$. □

If the number of routes in the priority queue is small, the graph size becomes the main factor in memory usage. Otherwise, the number of routes in the priority queue is the main factor.

Theorem 3. (Correctness) BSSR guarantees the exact result.

proof: BSSR prunes routes based on the upper and lower bounds. BSSR safely prunes routes dominated by or equivalent to the obtained sequenced routes. As a result, BSSR does not sacrifice the exactness of the search result. □
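The notion of a route being "dominated by or equivalent to" an obtained sequenced route can be made concrete with a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' code: routes are reduced to (length score, semantic score) pairs, smaller is taken to be better for both scores, and the names `dominates`, `update_skyline`, and `threshold` are hypothetical. The `threshold` helper follows one reading of the text around Equation (3), namely that the threshold of a route is the smallest length score among the routes in S whose semantic score is not greater than that of the route.

```python
def dominates(a, b):
    """True if score pair a = (length, semantic) dominates b.

    Assumes smaller is better for both scores: a is no worse in both
    and the two pairs are not identical.
    """
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def update_skyline(skyline, cand):
    """Offer cand to the skyline and return the resulting list.

    cand is rejected if an existing entry dominates or equals it;
    otherwise entries dominated by cand are removed and cand is added.
    """
    if any(dominates(r, cand) or r == cand for r in skyline):
        return list(skyline)
    return [r for r in skyline if not dominates(cand, r)] + [cand]


def threshold(skyline, semantic_score):
    """Assumed reading of Equation (3): the smallest length score among
    skyline entries whose semantic score does not exceed semantic_score."""
    lengths = [l for l, s in skyline if s <= semantic_score]
    return min(lengths) if lengths else float("inf")
```

With these helpers, a partial route whose length score already reaches `threshold(skyline, s)` can be discarded, because any completion of it would be dominated by or equivalent to an entry already in the skyline.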
We demonstrate BSSR with optimization techniques using Example 1.1. Table 4 shows routes in priority queue $Q_b$ and sequenced routes in $S$. To compute category similarity and semantic score, we use Equations (6) and (7), respectively.

First, we process NNinit, and $S$ initially includes {⟨p, p, p⟩, ⟨p, p, p⟩}.
1st step: BSSR starts to find PoI vertices that semantically match Asian restaurant from $v_q$ with the threshold of 15. Then, it finds p, p, p, p, and p. Both p's and p's category similarities are 1, and their lengths are 6 and 8, respectively. Thus, p comes to the top of $Q_b$.
2nd step: BSSR searches for PoI vertices that semantically match Arts&Entertainment from p, and finds p. Since ⟨p, p⟩ passes through p and l(⟨p, p⟩) is more than 15, both routes are not inserted into $Q_b$.
3rd step: As the top route is ⟨p, p⟩, BSSR searches for PoI vertices that semantically match gift shop from p. BSSR does not find any routes due to the threshold.
4th step: BSSR fetches ⟨p⟩ from $Q_b$ and inserts two routes ⟨p, p⟩ and ⟨p, p⟩ into $Q_b$.
5th step: BSSR fetches ⟨p, p⟩ and finds sequenced route ⟨p, p, p⟩. Since ⟨p, p, p⟩ dominates ⟨p, p, p⟩, ⟨p, p, p⟩ is deleted from $S$.
6th step: The top route ⟨p, p⟩ is deleted from $Q_b$ because its length score is not smaller than the threshold of 13.
7th step: BSSR fetches ⟨p⟩ and inserts ⟨p, p⟩ and ⟨p, p⟩.
8th step: BSSR fetches ⟨p, p⟩ and finds a sequenced route ⟨p, p, p⟩. ⟨p, p, p⟩ is inserted into $S$, and ⟨p, p, p⟩ is deleted from $S$.
9th step: ⟨p, p⟩ is deleted due to the threshold.
10th step: BSSR fetches ⟨p⟩ and finds a route ⟨p, p⟩.
11th step: BSSR finds a sequenced route ⟨p, p, p⟩, and the route dominates ⟨p, p, p⟩.
12th step: The distance from p to the PoI vertices that match A&E is larger than the threshold.
Finally, BSSR returns the set of SkySRs $S$.

The SkySR query has a number of variations and extensions. We discuss some of these in the following.
Directed graphs: The SkySR query can be easily applied to directed graphs. We only need to use the Dijkstra algorithm for directed graphs. Here, no modification of the main idea is required.
Table 4: Example of BSSR algorithm
Q_b:                                        S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩                 S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p, p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩              S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩                      S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p, p⟩, ⟨p, p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩           S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p, p⟩, ⟨p⟩, ⟨p⟩, ⟨p⟩                   S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p⟩, ⟨p⟩, ⟨p⟩                           S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p, p⟩, ⟨p, p⟩, ⟨p⟩, ⟨p⟩                S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p, p⟩, ⟨p⟩, ⟨p⟩                        S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p⟩, ⟨p⟩                                S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p, p⟩, ⟨p⟩                             S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b: ⟨p⟩                                     S: ⟨p, p, p⟩, ⟨p, p, p⟩
Q_b:                                        S: ⟨p, p, p⟩, ⟨p, p, p⟩

PoI with multiple categories: To treat PoIs with multiple categories, we can change the definitions of sequenced routes and category similarity. Specifically, we change condition (ii) in Definition 3.4 to state that at least one $c_{p_i}[j]$ ($1 \le j \le k_i$) semantically matches $c_{S[i]}$ for $1 \le i \le |S|$, where $c_{p_i}[j]$ is the $j$-th category of $p_i$ and $k_i$ is the number of categories associated with $p_i$. The category similarity is either the highest or the average value among the category similarities.

Complex category requirement:
We can specify more detailed category requirements, such as conjunction, disjunction, and negation. For example, we can specify that a PoI category is “American restaurant” or “Mexican restaurant” (disjunction), but not “Taco Place” (negation). If PoI vertices are associated with more than one category, we can specify a conjunction such as “Cafe” and “Bakery”. Note that the time complexity of our algorithm does not change if we specify a detailed requirement because the detailed requirements are equivalent to increasing the number of categories.

Skyline trip planning query: The proposed algorithm can be applied to the trip planning query without category order. To search for routes without category order, the proposed algorithm searches for PoI vertices that semantically match any category in a given set of categories. Then, when the algorithm finds PoI vertices, it removes the categories that are already covered by the route before searching for the next PoI vertices. Note that we need to modify some definitions and scoring functions for routes without category order. By this procedure, we can find skyline routes efficiently.

SkySR with destination: We can also specify a destination. The simple way to calculate a SkySR with a destination is to add the distance from the last visited PoI vertex to the destination to the length score after finding the sequenced route. To improve efficiency, we traverse PoI vertices from both the destination and the start point.
We perform experiments to evaluate the effectiveness of the proposed algorithm. All algorithms are implemented in C++ and run on an Intel(R) Xeon(R) CPU E5620 @ 2.40GHz with 32 GB of RAM.

Table 5: Summary of datasets
Dataset  Area            |V|        |P|      |E|
Tokyo    Tokyo           401,893    174,421  499,397
NYC      New York City   1,150,744  451,051  1,722,350
Cal      California      21,048     87,365   108,863
Algorithm. We compare the proposed BSSR with algorithms that iteratively find OSRs using the Dijkstra-based solution and the PNE approach (denoted Dij and PNE, respectively), as described in Section 3. We evaluate performance with respect to (i) response time and (ii) maximum resident set size (RSS), which represents memory usage.

Dataset. We conduct experiments using various maps (Tokyo, New York City, and California). Table 5 summarizes each dataset. For the Tokyo and NYC datasets, the road network is extracted from OpenStreetMap and the PoI information is extracted from Foursquare. Each PoI is embedded on the closest edge in the same way as [10] and is associated with the Foursquare category trees. Note that the number of category trees in Foursquare is 10. For the Cal dataset, the road network and PoI information are available online. The number of categories in the Cal dataset is 63. For each dataset, we use distances based on longitude and latitude as edge weights and treat the graphs as undirected graphs. The graphs are implemented using adjacency lists.

For each dataset, we generate 100 searches, in which the size of a sequence is $|S_q|$. The start points are selected randomly from vertices in the maps. The categories of sequences are selected randomly from the leaf nodes in the category trees with the constraint that they belong to different category trees. Since the number of PoI vertices associated with each category is significantly biased, we select only categories that have a large number of PoI vertices.

Here, category similarity is calculated based on the Wu and Palmer similarity measure [19], and the semantic score is calculated from the product of the category similarities of the sequence members. Specifically, we calculate the category similarity and semantic score using the following equations:

$$\mathit{sim}(c, c') = \max_{c_i \in a(c')} \frac{2 \cdot d(c_m)}{d(c) + d(c')}, \tag{6}$$

$$s(R) = 1 - \prod_{i=1}^{\min(|R|, |S_q|)} \mathit{sim}(c_{p_{R[i]}}, c_{S_q[i]}), \tag{7}$$

where $a(c)$, $d(c)$, and $c_m$ denote the set of ancestor categories of $c$ (including $c$), the depth of $c$, and the deepest common ancestor category of $c$ and $c_i$, respectively.

First, we present an overview of the performance of all algorithms.
Figure 3 shows the response time for various category sequence sizes, and Table 6 shows the RSS for a category sequence of size four. Here, “BSSR w/o Opt” denotes BSSR without optimization techniques. In Figure 3, bars are missing for a sequence size of 5 because those executions did not finish within a month.

Footnotes:
∼ lifeifei/SpatialDataset.htm
Since the PoIs in the Cal dataset have no category tree information, we generate a category tree of height three where a non-leaf node has three child nodes.

Figure 3: Results obtained for the datasets with various |S_q|. Panels: (a) Tokyo, (b) NYC, (c) Cal. x-axis: size of sequence |S_q|; y-axis: response time [sec]; series: BSSR, BSSR w/o Opt, PNE, Dij.

Table 6: RSS Comparison
        BSSR      BSSR w/o Opt  PNE       Dij
Tokyo   239.6 MB  497.5 MB      239.8 MB  4.8 GB
NYC     658.0 MB  659.4 MB      658.7 MB  9.7 GB
Cal     36.7 MB   53.7 MB       36.6 MB   70.3 MB

Table 7: Effect of initial search for various |S_q|
Dataset  Approach  Metrics               2      3      4      5
Tokyo    Proposed  Weight sum            0.009  0.013  0.017  0.021
                   Response time [msec]  3.5    5.1    6.9    8.6
                   # of routes           1.49   1.33   1.36   1.49
                   Ratio                 0.74   0.79   0.82   0.86
         Existing  Weight sum            0.32 (regardless of |S_q|)
NYC      Proposed  Weight sum            0.044  0.066  0.073  0.078
                   Response time [msec]  10.7   16.5   19.5   24.1
                   # of routes           1.76   1.79   1.81   1.82
                   Ratio                 0.67   0.81   0.85   0.83
         Existing  Weight sum            1.31 (regardless of |S_q|)
Cal      Proposed  Weight sum            0.79   1.28   1.57   1.85
                   Response time [msec]  1.4    2.3    2.9    3.9
                   # of routes           2.27   2.37   2.28   2.25
                   Ratio                 0.70   0.79   0.85   0.86
         Existing  Weight sum            12.14 (regardless of |S_q|)

BSSR achieves the least response time with all datasets and reduces the search space by exploiting the branch-and-bound algorithm and the proposed optimization techniques. By comparing BSSR and BSSR w/o Opt, we confirm that the optimization techniques increase efficiency. When the category sequence is small, PNE finds SkySRs efficiently. On the other hand, as the category sequence size increases, the response times of PNE and Dij increase significantly. If the category sequence size is large, BSSR achieves better performance than PNE and Dij even if we do not use optimization techniques. By comparing Dij to PNE, it can be seen that their performance depends on the datasets and the category sequence size. Although the PNE approach was proposed as a more sophisticated algorithm than the Dijkstra-based solution [16], PNE requires more time than Dij for the NYC and Cal datasets, which implies that it is not robust across datasets.

In terms of RSS, BSSR and PNE achieve nearly the same performance. These two algorithms do not store many routes in the priority queue; therefore, RSS is highly dependent on the graph size. On the other hand, as Dij stores many routes in the priority queue, its RSS is significantly larger than those of the other algorithms. Although we do not show the routes returned by each algorithm due to space limitations, all algorithms output the same routes. As a result, BSSR achieves the fastest response time with small memory usage without sacrificing the exactness of the result.
The optimization techniques improve the efficiency of BSSR. Here, we evaluate each optimization technique.

Initial Search: We show the search spaces with and without an initial search for the first modified Dijkstra algorithm to evaluate the effect of the initial search. Moreover, we evaluate NNinit in terms of response time. Table 7 shows the weight sum, which represents the search space, the response time of NNinit, and the number of sequenced routes found by NNinit for various category sequence sizes. In addition, we show the ratio of the length score of the sequenced route with the largest semantic score among the sequenced routes found in the initial search to the length score of the sequenced route whose semantic score is 0 in the initial search. The weight sum with the initial search is significantly smaller than that without the initial search. We can avoid traversing the whole graph using the initial search; thus, it significantly reduces the search space of BSSR. Moreover, since the response time of NNinit is significantly less than that of BSSR (Figure 3), we confirm that NNinit can reduce the search space efficiently. Note that the number of sequenced routes found by the initial search is not large. On the other hand, the length score of the sequenced route with the largest semantic score is noticeably smaller than that of the sequenced route whose semantic score is 0 (ratios of 0.67 to 0.86 in Table 7). As a result, NNinit reduces the search space significantly without increasing total response time.
Tightening Upper Bound: The proposed priority queue aims at efficiently tightening the upper bound to reduce the search space. Here, we show the total number of vertices visited by BSSR, which is highly related to the response time. Table 8 shows the total number of vertices visited with the proposed priority queue and with the distance-based priority queue for various category sequence sizes. The number of vertices visited with the proposed priority queue is less than that with the distance-based priority queue. In particular, the performance gap widens as the category sequence size increases, because the distance-based priority queue cannot find sequenced routes efficiently and thus the upper bound is rarely updated. On the other hand, the proposed priority queue can update the upper bound efficiently because the route with the largest size is dequeued preferentially. Thus, the proposed priority queue is more suitable than the distance-based approach for finding SkySRs.
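The ordering used by the proposed priority queue (largest route size first, then smallest semantic score, then smallest length score) can be expressed as a composite sort key. The Python sketch below is one possible way to do this with the standard `heapq` module; the class name `RouteQueue`, the function `route_priority`, and the PoI labels in the usage note are hypothetical and not taken from the paper.

```python
import heapq


def route_priority(route, semantic_score, length_score):
    """Composite key: larger size first, then smaller semantic score,
    then smaller length score (heapq pops the smallest key)."""
    return (-len(route), semantic_score, length_score)


class RouteQueue:
    """Min-heap of partial routes ordered by route_priority."""

    def __init__(self):
        self._heap = []
        self._count = 0  # insertion counter breaks ties between equal keys

    def push(self, route, semantic_score, length_score):
        key = route_priority(route, semantic_score, length_score)
        heapq.heappush(
            self._heap, (key, self._count, route, semantic_score, length_score)
        )
        self._count += 1

    def pop(self):
        _, _, route, semantic_score, length_score = heapq.heappop(self._heap)
        return route, semantic_score, length_score

    def __len__(self):
        return len(self._heap)
```

For instance, after pushing `("p1",)` and `("p1", "p6")`, the two-PoI route is popped first regardless of its length score, which is what lets the search reach complete sequenced routes, and hence tighter upper bounds, early.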
Tightening Lower Bound: To tighten the lower bound, we propose two types of possible minimum distances, i.e., semantic-match and perfect-match minimum distances. If the possible minimum distance is large, we can prune routes even if the routes include a small number of PoI vertices. Figure 4 shows the ratios of the possible minimum distances to the weight sums of the initial search when we set the category sequence size to five. The semantic-match and perfect-match minimum distances in the Tokyo dataset effectively reduce the search space by tightening the lower bound. However, unlike in the Tokyo dataset, the possible minimum distances in the NYC and Cal datasets are small. Since the PoI vertices in these two datasets are relatively concentrated in a small area, the possible minimum distances become small. The effect of the possible minimum distances highly depends on the skew of the locations of the PoI vertices.

Figure 4: Effect of minimum possible distances. y-axis: ratio of weight sum; series: semantic-match, perfect-match.

Figure 5: Effect of on-the-fly caching for various |S_q|. Panels: (a) Tokyo, (b) NYC, (c) Cal. x-axis: size of sequence |S_q|; y-axis: # of Dijkstra; series: with cache, w/o cache.

Table 8: Effect of priority queue for various |S_q|
Dataset  Approach        2  3  4  5
Tokyo    Proposed
         Distance-based
NYC      Proposed
         Distance-based
Cal      Proposed
         Distance-based

Figure 6: Number of SkySRs for various |S_q|. x-axis: size of sequence |S_q|; y-axis: # of SkySRs; series: Tokyo, NYC, Cal.

On-the-fly Caching:
On-the-fly caching can reuse the results of former modified Dijkstra algorithm executions; thus, the number of executions of the Dijkstra algorithm decreases. Figure 5 shows the number of executions of the modified Dijkstra algorithm by BSSR with all optimization techniques and by BSSR with all techniques except on-the-fly caching. The number of executions decreases with on-the-fly caching. In particular, the performance gap widens as the category sequence size increases, because we have more opportunities to reuse former results. Thus, we confirm that on-the-fly caching is effective in reducing the number of executions of the Dijkstra algorithm.
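The idea of on-the-fly caching can be sketched as a small memoization wrapper. This Python sketch is an assumption-laden illustration rather than the authors' design: the class name `OnTheFlyCache`, the `(start_vertex, category)` cache key, and the shape of the cached value (a list of matching PoI vertices with their distances) are hypothetical, and the sketch ignores how a cached result interacts with a threshold that tightens during the search.

```python
class OnTheFlyCache:
    """Memoizes per-vertex search results for the duration of one query."""

    def __init__(self, search):
        # search: callable (start_vertex, category) -> list of (poi, distance);
        # it stands in for one run of the modified Dijkstra algorithm.
        self._search = search
        self._store = {}
        self.executions = 0  # number of real (non-cached) searches

    def lookup(self, start_vertex, category):
        key = (start_vertex, category)
        if key not in self._store:
            self.executions += 1
            self._store[key] = self._search(start_vertex, category)
        return self._store[key]

    def clear(self):
        """Free all cached results once the SkySRs of a query are found."""
        self._store.clear()
```

Calling `clear()` after each query corresponds to the "on-the-fly" part: results are kept only while one query is being answered, since different queries rarely share search space.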
Figure 6 shows the number of SkySRs obtained with each dataset for various |S_q|. As shown, the Cal dataset returns the largest number of SkySRs. The response time and RSS obtained with the Tokyo and NYC datasets are much greater than those of the Cal dataset, which implies that the number of SkySRs does not affect response time and RSS significantly. Moreover, if we use a complete real-world dataset, we may not require a ranking function because the number of SkySRs would be small.

Table 9: Example SkySRs in Tokyo
Distance     Sequenced route
7451 meters  Beer Garden → Sushi Restaurant → Sake Bar
1295 meters  Bar → Sushi Restaurant → Sake Bar

Figure 7: Visualization of routes in Tokyo: black circles (with 0 and 4) denote a start point and a destination, respectively. Blue and red circles denote sequences of PoIs for the first and second routes in Table 9, respectively, and their numbers indicate the order of PoIs to be visited.
We show an example of SkySRs in Tokyo. We assume that we plan to go to places for dinner and drinks. We want to visit a “Beer garden”, a “Sushi restaurant”, and a “Sake bar” from our current location and finally go to our hotel. Table 9 and Figure 7 show two representative SkySRs from the four identified SkySRs. Note that each of the other two routes is similar to one of the representative routes. In the Foursquare category trees, “Bar” includes “Beer Garden” and “Sake bar”, and “Japanese restaurant” includes “Sushi restaurant”. Thus, we find routes using “Bar” and/or “Japanese restaurant”. The second route is much shorter than the first route, which perfectly matches the user requirement, and the only difference between them is whether they pass a “Bar” or a “Beer garden”. The best route depends on the users and situations (e.g., weather); thus, we confirm that SkySRs are useful to help users make decisions.

Figure 8: Screenshot of the prototype system

Figure 9: Ratios of answers for each question
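The trade-off between the two routes in Table 9 can be sketched in code. The sketch is illustrative only: it assumes a simplified semantic score that counts the positions where a visited PoI category differs from the requested one while staying in the same category tree, which is not the similarity measure used by the SkySR query. The names `PARENT`, `mismatch`, `dominates`, and `skyline` are hypothetical; the category links and the two distances are taken from the text and Table 9.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Parent links stated in the text (Foursquare category trees).
PARENT: Dict[str, Optional[str]] = {
    "Bar": None,
    "Beer Garden": "Bar",
    "Sake Bar": "Bar",
    "Japanese Restaurant": None,
    "Sushi Restaurant": "Japanese Restaurant",
}


def tree_root(category: str) -> str:
    """Follow parent links up to the root of the category tree."""
    while PARENT[category] is not None:
        category = PARENT[category]
    return category


def mismatch(query: Sequence[str], route: Sequence[str]) -> int:
    """Assumed score: number of positions that are not exact matches."""
    score = 0
    for wanted, visited in zip(query, route):
        if tree_root(wanted) != tree_root(visited):
            raise ValueError(f"{visited} does not match {wanted}")
        if wanted != visited:
            score += 1
    return score


Score = Tuple[float, int]  # (distance in meters, mismatch count)


def dominates(a: Score, b: Score) -> bool:
    """a dominates b if it is no worse in both scores and not identical."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def skyline(scores: List[Score]) -> List[Score]:
    """Keep the scores that no other score dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]


QUERY = ["Beer Garden", "Sushi Restaurant", "Sake Bar"]
ROUTES = [
    (7451.0, ["Beer Garden", "Sushi Restaurant", "Sake Bar"]),  # first route
    (1295.0, ["Bar", "Sushi Restaurant", "Sake Bar"]),          # second route
]
SCORES = [(dist, mismatch(QUERY, cats)) for dist, cats in ROUTES]
print(skyline(SCORES))
```

Under this assumed score, the first route is better in category match and the second is better in distance, so neither dominates the other and both stay in the result. A route that is longer than the first one and also a weaker match would be removed.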
We developed a prototype SkySR query service using OpenStreetMap and the Santander Open Data platform from Santander, Spain. Figure 8 shows a screenshot of the prototype system, which outputs one of the SkySR routes. We performed a user test in July 2017. To gather users for this test, the Santander municipality arranged meetings with different groups of people to present the service: municipal staff (computing, convention, and tourism municipal services), students from vocational training departments who are developing webpages and apps, and citizens. We also provided a leaflet that explains the concept of the SkySR query and how to use the service. In this test, users freely used the service and answered a questionnaire (25 respondents). The questionnaire included the following three questions.

Q1 What do you think about this service?
Answer. 1. I love it, 2. I like it, 3. I do not like it.
Q2 Would you recommend it to anyone?
Answer. 1. Yes, 2. Maybe, 3. No.
Q3 Do you think that it is a good idea for the city: citizens, tourists, commercial sectors?
Answer. 1. Yes, 2. Maybe, 3. No.

We summarize the ratios of answers for each question in Figure 9. As shown, more than 80% of the users liked the service. In addition, the questionnaire shows that the service is valuable for the city. From the user experiment, we confirm that the SkySR query is useful for users and cities.
Prototype service: https://ss.festival.ckp.jp/OuRouteSuggestion/dispSearchRoute/index. The default language is Spanish.
Santander Open Data platform: http://datos.santander.es

In this paper, we have first introduced a semantic hierarchy for trip planning. We then proposed the skyline sequenced route (SkySR) query, which finds all preferred routes from a start point according to a user’s PoI requirements. In addition, we have proposed an efficient algorithm for the SkySR query, i.e., BSSR, which simultaneously searches for all SkySRs by a single traversal of a given graph. To optimize the performance of BSSR, we proposed four optimization techniques. We evaluated the proposed approach using real-world datasets and demonstrated that it comprehensively outperforms naive approaches in terms of response time without increasing memory usage or sacrificing the exactness of the result. Moreover, we developed a SkySR query service using open data, and conducted a user test, which confirmed that SkySR queries are useful for both users and cities.

In future work, we would like to extend the proposed approach in several directions. First, we assume a forest structure for the category classification in this paper; a more complex classification may provide better granularity. Second, because we have not used any preprocessing techniques such as indexing, we plan to propose a suitable preprocessing method for the SkySR query. Finally, although the SkySR query proposed in this paper considers two scores (length and category similarity), it could be extended to consider many attributes of a PoI (e.g., text, keywords, and ratings) and the cost/quality of a graph (e.g., route popularity, tolls, and the number of traffic lights).
ACKNOWLEDGEMENT
This research is partially supported by the Grant-in-Aid for Scientific Research (A) (JP16H01722) and Grant-in-Aid for Young Scientists (B) (JP15K21069).
REFERENCES
[1] Saad Aljubayrin, Zhen He, and Rui Zhang. 2015. Skyline Trips of Multiple POIs Categories. In DASFAA. 189–206.
[2] S Börzsöny, Donald Kossmann, and Konrad Stocker. 2001. The Skyline Operator. In ICDE. 421–430.
[3] Haiquan Chen, Wei-Shinn Ku, Min-Te Sun, and Roger Zimmermann. 2008. The Multi-rule Partial Sequenced Route Query. In ACM SIGSPATIAL GIS. 1–10.
[4] Jian Dai, Chengfei Liu, Jiajie Xu, and Zhiming Ding. 2016. On Personalized and Sequenced Route Planning. World Wide Web 19, 4 (2016), 679–705.
[5] Jochen Eisner and Stefan Funke. 2012. Sequenced route queries: Getting things done on the way back home. In ACM SIGSPATIAL. 502–505.
[6] Pierre Hansen. 1980. Bicriterion path problems. In Multiple criteria decision making theory and application. 109–127.
[7] Xuegang Huang and Christian S Jensen. 2005. In-route skyline querying for location-based services. In W2GIS. 120–135.
[8] H-P Kriegel, Matthias Renz, and Matthias Schubert. 2010. Route Skyline Queries: A Multi-preference Path Planning Approach. In ICDE. 261–272.
[9] Eugene L Lawler and David E Wood. 1966. Branch-and-bound Methods: A Survey. Operations research 14, 4 (1966), 699–719.
[10] Feifei Li, Dihan Cheng, Marios Hadjieleftheriou, George Kollios, and Shang-Hua Teng. 2005. On Trip Planning Queries in Spatial Databases. In SSTD. 273–290.
[11] Jing Li, Yin David Yang, and Nikos Mamoulis. 2013. Optimal Route Queries with Arbitrary Order Constraints. TKDE 25, 5 (2013), 1097–1110.
[12] Xiaobin Ma, Shashi Shekhar, Hui Xiong, and Pusheng Zhang. 2006. Exploiting a Page-level Upper Bound for Multi-type Nearest Neighbor Queries. In ACM GIS. 179–186.
[13] Ernesto Queiros Vieira Martins. 1984. On a multicriteria shortest path problem. European Journal of Operational Research 16, 2 (1984), 236–245.
[14] Yutaka Ohsawa, Htoo Htoo, Noboru Sonehara, and Masao Sakauchi. 2012. Sequenced Route Query in Road Network Distance based on Incremental Euclidean Restriction. In DEXA. 484–491.
[15] Philip Resnik. 1995. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In IJCAI. 448–453.
[16] Mehdi Sharifzadeh, Mohammad Kolahdouzan, and Cyrus Shahabi. 2008. The Optimal Sequenced Route Query. The VLDB Journal 17, 4 (2008), 765–787.
[17] Michael Shekelyan, Gregor Jossé, and Matthias Schubert. 2015. Linear Path Skylines in Multicriteria Networks. In ICDE. 459–470.
[18] Yuan Tian, Ken CK Lee, and Wang-Chien Lee. 2009. Finding Skyline Paths in Road Networks. In ACM SIGSPATIAL GIS. 444–447.
[19] Zhibiao Wu and Martha Palmer. 1994. Verbs Semantics and Lexical Selection. In ACL. 133–138.
[20] Bin Yang, Chenjuan Guo, Christian S Jensen, Manohar Kaul, and Shuo Shang. 2014. Stochastic Skyline Route Planning under Time-varying Uncertainty. In