An Efficient Index Method for the Optimal Route Query over Multi-Cost Networks

Abstract

Smart cities have been considered the wave of the future, and route recommendation in networks is a fundamental problem in them. Most existing approaches to the shortest route problem assume that there is only one kind of cost in a network. However, networks usually carry several kinds of cost, and users prefer to select an optimal route that takes all of these costs into account. In this paper, we study the problem of finding the optimal route in multi-cost networks. We prove that this problem is NP-hard and that existing index techniques cannot be applied to it. We propose a novel partition-based index with contour skyline techniques to find the optimal route. We also propose a vertex-filtering algorithm to facilitate query processing. We conduct extensive experiments on six real-life networks, and the experimental results show that our method improves efficiency by an order of magnitude over previous heuristic algorithms.


arXiv [cs.DB], April

Yajun Yang, Hang Zhang, Hong Gao, Qinghua Hu, Xin Wang
College of Intelligence and Computing, Tianjin University, Tianjin, China
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China


Index Terms: optimal path, multi-cost networks, index

I. INTRODUCTION

With the rapid development of information technology, smart technologies have been widely used to make life in the city more convenient. Smart cities have been attracting increasing attention from the academic and industrial communities. Intelligent route recommendation is a fundamental problem in smart cities. For example, in traffic networks, the shortest route query finds a shortest path between two locations. In social networks, the shortest route query finds the closest relationship, such as friendship, between two individuals.

Most existing work on the shortest route problem assumes that there is only one kind of cost in a network. However, the relationships among entities are usually examined from several distinct aspects. For example, in traffic networks, a route between two cities is evaluated with several kinds of cost, such as road length, toll fee, and traffic congestion. It is inadvisable to choose a shortest path by a single kind of cost, because the total toll fee of the route with the minimum length may be too expensive for some users. It is therefore important to find an optimal route that considers all costs together with the user's preference.

A network is called a multi-cost network if every edge in it has several kinds of cost. The shortest route under one kind of cost may not be the optimal route for some users in a multi-cost network. A score function is specified by the user. It calculates an overall score from all kinds of cost to measure the optimality of a route. Note that different users may give different score functions.
Given a score function $f(\cdot)$, a starting vertex $v_s$ and an ending vertex $v_e$, this paper aims to find a route from $v_s$ to $v_e$ with the minimum score. In the following, such a route is called an optimal path from $v_s$ to $v_e$ under $f(\cdot)$.

The traditional shortest path problem can be solved by polynomial-time algorithms such as Dijkstra's algorithm, and various index techniques have been proposed to improve efficiency. However, these index techniques cannot be used for the optimal path in multi-cost networks, because different users may give different score functions. An index built for a score function $f(\cdot)$ cannot cope with another score function $g(\cdot)$. In addition, we prove in this paper that the optimal path problem is NP-hard if the score function is non-linear, e.g., $f(x, y) = x^2 + y^2$, so existing algorithms cannot work under such functions. As discussed in previous studies of traffic networks [10], [21], non-linear score functions are widespread and reasonable in real life. For example, under special conditions such as a traffic jam, the traveling time and fuel consumption are non-linear (e.g., quadratic or convex) functions of the distance from source to destination [14].

In this paper, we develop a novel partition-based index to find the optimal path in multi-cost networks under various linear or non-linear score functions. The main contributions are summarized below. First, we study the problem of optimal path recommendation in multi-cost networks and prove that it is NP-hard. Second, we propose a partition-based index and the contour skyline used in the index. We prove that computing the contour skyline is NP-hard. We give a 2-approximate algorithm and show that there is no $(2 - \epsilon)$-approximate solution in polynomial time if P $\neq$ NP.
Third, we propose a vertex-filtering algorithm that filters out a large proportion of the vertices that the optimal path cannot pass through. Finally, we confirm the effectiveness and efficiency of our algorithms using real-life datasets.

The rest of this paper is organized as follows. Section II gives the problem statement. Section III introduces the partition-based index and how to construct it. Section IV proposes a vertex-filtering algorithm and discusses how to find the optimal path using the partition-based index. We conduct experiments on six real-life datasets in Section V, and the experimental results confirm the effectiveness and efficiency of our approach. Section VI discusses related work. We conclude the paper in Section VII.

II. PROBLEM STATEMENT

A. Multi-cost Networks and the Optimal Path

Definition 2.1: (multi-cost network) A multi-cost network is a simple directed graph, denoted as $G = (V, E, W)$, where $V$ and $E$ are the sets of vertices and edges respectively, and $W$ is a set of vectors. Every edge $e \in E$ is represented by $e = (v_i, v_j)$, $v_i, v_j \in V$, and $w(v_i, v_j) \in W$ is the cost vector of $(v_i, v_j)$, $w(v_i, v_j) = (w_1, w_2, \cdots, w_d)$, where $w_i$ is the $i$-th kind of cost value of edge $(v_i, v_j)$.

In this paper, we assume $w_i \geq 0$. This assumption is reasonable because a cost cannot be negative in real applications. Our work can easily be extended to undirected graphs, since an undirected edge is equivalent to two directed edges. For simplicity, we only discuss directed graphs in the following.

A path $p$ is a sequence of vertices $(v_0, v_1, \cdots, v_l)$, where $v_i \in V$ and $(v_{i-1}, v_i) \in E$. We use $w(p)$ to denote the cost vector of path $p$, i.e., $w(p) = (w_1(p), w_2(p), \cdots, w_d(p))$, where $w_x(p) = \sum_{i=1}^{l} w_x(v_{i-1}, v_i)$ for $1 \leq x \leq d$.

For a path $p$ in $G$, a score function is used to calculate an overall score $f(p)$ based on $w(p)$. The score function $f(\cdot)$ is always monotonically increasing, i.e., for two different paths $p$ and $p'$, if $(\forall i, w_i(p) \leq w_i(p')) \wedge (\exists i, w_i(p) < w_i(p'))$, then $f(p) < f(p')$. This is a common property. Intuitively, if no cost of $p$ exceeds the corresponding cost of $p'$ and at least one cost of $p$ is strictly smaller, then the overall score of $p$ must be less than that of $p'$. The definition of the optimal path over multi-cost networks is given below.

Definition 2.2: (optimal path) Given a multi-cost network $G$, a score function $f(\cdot)$, a starting vertex $v_s$ and an ending vertex $v_e$, the optimal path from $v_s$ to $v_e$, denoted as $p^*_{s,e}$, is a path in $G$ that has the minimum score among all paths from $v_s$ to $v_e$, i.e., $f(p^*_{s,e}) \leq f(p)$ for any $p \in P_{s,e}$, where $P_{s,e}$ is the set of all simple paths from $v_s$ to $v_e$.

Fig. 1 illustrates a concrete multi-cost network $G$.
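The following minimal Python sketch makes Definitions 2.1 and 2.2 concrete. It assumes a hypothetical adjacency-list encoding `graph[u] = [(v, w(u, v)), ...]` with cost vectors stored as tuples. The helper names (`path_cost`, `simple_paths`, `optimal_path_bruteforce`) and the toy graph `TOY` are illustrative only and do not come from the paper. The search simply enumerates $P_{s,e}$, so it is exponential and only suitable for tiny inputs.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Cost = Tuple[float, ...]
# Hypothetical encoding of G = (V, E, W): u -> [(v, w(u, v)), ...]
Graph = Dict[str, List[Tuple[str, Cost]]]


def path_cost(graph: Graph, path: Sequence[str]) -> Cost:
    """w(p) = (w_1(p), ..., w_d(p)): component-wise sum over the edges of p."""
    edge_costs = [dict(graph[u])[v] for u, v in zip(path, path[1:])]
    return tuple(sum(ws) for ws in zip(*edge_costs))


def simple_paths(graph: Graph, s: str, e: str) -> Iterator[List[str]]:
    """Enumerate P_{s,e}, the simple paths from s to e, by depth-first search."""
    stack = [[s]]
    while stack:
        path = stack.pop()
        if path[-1] == e:
            yield path
            continue
        for v, _ in graph.get(path[-1], []):
            if v not in path:  # never revisit a vertex: keep the path simple
                stack.append(path + [v])


def optimal_path_bruteforce(graph: Graph, s: str, e: str,
                            f: Callable[[Cost], float]) -> List[str]:
    """Definition 2.2 by exhaustive enumeration (exponential; tiny graphs only)."""
    return min(simple_paths(graph, s, e), key=lambda p: f(path_cost(graph, p)))


# Hypothetical toy network with d = 2 (not the network of Fig. 1).
TOY: Graph = {
    "s": [("a", (1, 5)), ("b", (4, 1)), ("c", (3, 3))],
    "a": [("t", (1, 5))],
    "b": [("t", (5, 1))],
    "c": [("t", (3, 3))],
    "t": [],
}


def linear(w: Cost) -> float:
    return w[0] + w[1]


def squares(w: Cost) -> float:
    return w[0] ** 2 + w[1] ** 2


print(path_cost(TOY, ["s", "c", "t"]))                 # (6, 6)
print(optimal_path_bruteforce(TOY, "s", "t", linear))   # ['s', 'b', 't']
print(optimal_path_bruteforce(TOY, "s", "t", squares))  # ['s', 'c', 't']
```

On this toy graph the two score functions select different optimal paths. This is the situation described in the introduction, where an index built for one score function cannot simply be reused for another.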
In the example of Fig. 1, the score function is $f(w_1, w_2) = w_1 + w_2$. Consider the path $p: v_s \to v \to v_e$ in $G$. Its cost vector is $w(p) = (10, 4)$ and its score is $f(p) = w_1(p) + w_2(p) = 10 + 4 = 14$. Since the score of $p$ is the minimum among all paths from $v_s$ to $v_e$, $p$ is the optimal path.

Fig. 1. An example of multi-cost graph $G(V, E)$

The following theorem shows that finding the optimal path in multi-cost networks under a non-linear score function is NP-hard.

Theorem 2.1: The problem of finding the optimal path under a non-linear score function in multi-cost networks is NP-hard.

Proof: We reduce the minimum sum of squares problem, which is NP-complete [7], to this problem. The minimum sum of squares problem is defined as follows. Given a number set $A = \{a_1, a_2, \cdots, a_n\}$ of size $n$ and an integer $k \leq |A|$, find a partition $A^* = \{A_1, A_2, \cdots, A_k\}$ of $A$ such that $\sum_{j=1}^{k} \big(\sum_{a_i \in A_j} a_i\big)^2$ is minimum. Note that no $A_j$ $(1 \leq j \leq k)$ can be empty in an optimal partition $A^*$.

Given an instance of the minimum sum of squares problem, it can be converted to an instance of the optimal path problem as follows. We create a graph $G$ with $n + 1 + kn$ vertices, $\{v_1, v_2, \cdots, v_{n+1}\} \cup \{v_{i,j} \mid 1 \leq i \leq n, 1 \leq j \leq k\}$. Here, $v_{i,j}$ $(1 \leq j \leq k)$ is placed between $v_i$ and $v_{i+1}$. We create the edges in $G$ as follows. For all $1 \leq i \leq n$ and all $1 \leq j \leq k$, we create an edge $e_{i,(i,j)}$ from $v_i$ to $v_{i,j}$ with cost $w(e_{i,(i,j)}) = (0, \cdots, 0, a_i, 0, \cdots, 0)$. That is, the $j$-th cost value of $w(e_{i,(i,j)})$ is $a_i$ and the others are zero. Similarly, we create an edge $e_{(i,j),i+1}$ from $v_{i,j}$ to $v_{i+1}$ with the same cost $w(e_{(i,j),i+1}) = (0, \cdots, 0, a_i, 0, \cdots, 0)$. Let $v_1 = v_s$ and $v_{n+1} = v_e$. The score function is $f(w_1, \cdots, w_k) = \sum_{i=1}^{k} (w_i)^2$, where $(w_1, \cdots, w_k)$ is the cost vector $w(p)$ of a path $p$.

If a path $p$ travels through an edge $e_{i,(i,j)}$, it must also travel through $e_{(i,j),i+1}$. We can therefore concatenate $e_{i,(i,j)}$ and $e_{(i,j),i+1}$ into a new edge $e^j_{i,i+1}$ from $v_i$ to $v_{i+1}$, called the $j$-th edge from $v_i$ to $v_{i+1}$ in $G$. The cost of $e^j_{i,i+1}$ is the sum of the costs of the two concatenated edges: its $j$-th cost value is $2a_i$ and the others are zero.

For any path $p$ from $v_s$ to $v_e$ in $G$, the $j$-th cost value $w_j(p)$ of $w(p)$ equals the sum of the $j$-th cost values of all the edges in $p$. Let $E^j_p$ be the set of all $j$-th edges that $p$ travels through, i.e., $E^j_p = \{e^j_{i,i+1} \mid e^j_{i,i+1} \in p, 1 \leq i \leq n\}$. For every $i$, $p$ uses exactly one of the $k$ edges from $v_i$ to $v_{i+1}$. Hence $\{E^j_p \mid 1 \leq j \leq k\}$ corresponds to a partition $\mathcal{A} = \{A_j \mid 1 \leq j \leq k\}$ of $A$, where $A_j = \{a_i \mid e^j_{i,i+1} \in E^j_p\}$. Conversely, every partition of $A$ into $k$ sets corresponds to such a path. Moreover, $w_j(p) = \sum_{e \in E^j_p} w_j(e) = 2\sum_{a_i \in A_j} a_i$, so $f(p) = 4\sum_{j=1}^{k} \big(\sum_{a_i \in A_j} a_i\big)^2$. Consequently, an optimal path $p^*$ with the minimum score corresponds to an optimal partition $A^*$ of $A$, one that minimizes $\sum_{j=1}^{k} \big(\sum_{a_i \in A_j} a_i\big)^2$.

This reduction takes polynomial time. If we could find an optimal path from $v_s$ to $v_e$ in $G$ in polynomial time, we could also find an optimal partition $A^*$ for the number set $A$ in polynomial time. Therefore, the problem of finding the optimal path over multi-cost graphs is NP-hard. ✷

B. Challenging Problem

Suppose the score function $f(\cdot)$ is linear, i.e., for any two consecutive edges $(v_x, v_y)$ and $(v_y, v_z)$, we have $f(w(v_x, v_y) + w(v_y, v_z)) = f(w(v_x, v_y)) + f(w(v_y, v_z))$. Then $f(w(v_x, v_y))$ can be regarded as a single weight of the edge $(v_x, v_y)$, for every edge in $G$. For example, $f(w_1, w_2) = w_1 + w_2$ is a linear function. In this case, the problem of finding the optimal path in multi-cost networks can be solved in polynomial time by existing shortest path algorithms, e.g., Dijkstra's algorithm. The shortest path $p$ under the weights $f(w(v_x, v_y))$ is exactly the optimal path in the multi-cost network. Otherwise, there is another path $p' = (v'_1, \cdots, v'_l)$ such that $f(p') < f(p)$. By the linearity of the score function, we have $f(p') = f\big(\sum_{i=1}^{l-1} w(v'_i, v'_{i+1})\big) = \sum_{i=1}^{l-1} f(w(v'_i, v'_{i+1}))$, which is the length of $p'$ under the single weights. The same holds for $p$, so $p'$ would be shorter than $p$ under these weights, contradicting the assumption that $p$ is the shortest path.
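For the linear case, the argument above amounts to running Dijkstra's algorithm on the scalar edge weight $f(w(v_x, v_y))$. The sketch below is stand-alone and uses the same hypothetical adjacency-list encoding and toy graph as a self-contained example. `optimal_path_linear` is an illustrative name, this is textbook Dijkstra rather than the paper's code, and it is only valid when $f$ is additive over edges and non-negative on every edge cost vector.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Cost = Tuple[float, ...]
# Hypothetical encoding: u -> [(v, w(u, v)), ...]
Graph = Dict[str, List[Tuple[str, Cost]]]


def optimal_path_linear(graph: Graph, s: str, e: str,
                        f: Callable[[Cost], float]) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra on the single weight f(w(u, v)) of every edge.

    Correct only when f is linear (additive over consecutive edges) and
    f(w(u, v)) >= 0 for every edge; returns (score, path) or None.
    """
    dist: Dict[str, float] = {s: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter distance was already found
        if u == e:
            path = [e]
            while path[-1] != s:  # walk the predecessor links back to s
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, []):
            nd = d + f(w)  # f(w(u, v)) acts as the single-one edge weight
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


TOY: Graph = {
    "s": [("a", (1, 5)), ("b", (4, 1)), ("c", (3, 3))],
    "a": [("t", (1, 5))],
    "b": [("t", (5, 1))],
    "c": [("t", (3, 3))],
    "t": [],
}

print(optimal_path_linear(TOY, "s", "t", lambda w: w[0] + w[1]))  # (11.0, ['s', 'b', 't'])
```

The priority queue uses lazy deletion, so stale entries are skipped when popped. This is a common implementation choice rather than something the paper prescribes.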

… 6) = 37, which is less than the score f(4, 4) = 32 of path p′: s → v → v. This example shows that a sub-path of an optimal path may not be optimal in multi-cost networks.

Enumeration is a straightforward method to compute the optimal path in multi-cost graphs. Given a starting vertex $v_s$ and an ending vertex $v_e$, we compute the score of every path from $v_s$ to $v_e$ and then select the path with the minimum score. Let the maximum out-degree of $G$ be $\lambda$, i.e., $\lambda = \max\{d^+(v) \mid v \in V\}$, where $d^+(v)$ is the out-degree of $v$. The search space of enumeration is $O(\lambda^{|V|})$, which is infeasible in real applications. Another alternative is to pre-compute the optimal path for every pair of vertices in $G$. Its critical shortcoming is that it cannot cope with distinct score functions. Because score functions vary, an optimal path under one function may not be optimal under another.

Only a small number of heuristic algorithms have been proposed to solve this problem [25]. In this paper, we develop a novel partition-based index to find the optimal path in multi-cost networks. The index supports both Dijkstra-based algorithms under linear functions and heuristic algorithms under non-linear functions.

III. PARTITION-BASED INDEX

A. What is the Partition-Based Index?

Given a graph $G(V, E)$, a $k$-partition of $G$ is a collection $\{V_1, \cdots, V_k\}$ satisfying the following conditions: (1) every $V_p$ is a subset of $V$; (2) for all $V_p, V_q$ $(p \neq q)$, $V_p \cap V_q = \emptyset$; and (3) $V = \bigcup_{1 \leq p \leq k} V_p$. A vertex $v_i$ is called an entry (or exit) of $V_p$ if (1) $v_i \in V_p$; and (2) $\exists v_j, v_j \notin V_p \wedge v_j \in N^-(v_i)$ (or $v_j \in N^+(v_i)$), where $N^-(v_i)$ and $N^+(v_i)$ are the incoming and outgoing neighbor sets of $v_i$, respectively. Entries and exits are also called border vertices. We use $V_p.entry$ and $V_p.exit$ to denote the entry set and exit set of $V_p$. We use $V.entry$ and $V.exit$ to denote the sets of all entries and all exits in $G$, respectively. Clearly, $V.entry = \bigcup_{1 \leq p \leq k} V_p.entry$ and $V.exit = \bigcup_{1 \leq p \leq k} V_p.exit$.

A partition-based index includes two parts: the inter-index and the inner-index. We first introduce the lower bound of the optimal path (LBOP) and the skyline path.

For a multi-cost network $G$ with $d$ kinds of cost, $G_x$ $(1 \leq x \leq d)$ is a weighted graph with the same structure as $G$. The weight of every edge $(v_i, v_j)$ in $G_x$ is the $x$-th cost $w_x(v_i, v_j)$ of $w(v_i, v_j)$. For any two vertices $v_i, v_j \in G$, $P_{i,j} = \{p^1_{i,j}, \cdots, p^d_{i,j}\}$ is the set of single-cost shortest paths from $v_i$ to $v_j$, where $p^x_{i,j}$ is the shortest path from $v_i$ to $v_j$ in $G_x$. We use $\phi^x_{i,j}$ to denote the weight of $p^x_{i,j}$. The cost vector $\Phi_{i,j} = (\phi^1_{i,j}, \cdots, \phi^d_{i,j})$ is called the lower bound of the optimal path (LBOP) from $v_i$ to $v_j$ in $G$.

Let $p$ and $p'$ be two different paths in a multi-cost graph $G$. We say that $p$ dominates $p'$, denoted as $p \prec p'$, iff $w_i(p) \leq w_i(p')$ for all $i$ $(1 \leq i \leq d)$, and $w_i(p) < w_i(p')$ for some $i$ $(1 \leq i \leq d)$. Here, $w_i(p)$ and $w_i(p')$ are the $i$-th cost values of $w(p)$ and $w(p')$, respectively. For two vertices $v_i, v_j \in G$, a path $p$ is a skyline path from $v_i$ to $v_j$ iff $p$ is not dominated by any other path $p'$ from $v_i$ to $v_j$.

For any path $p_{i,j}$ from $v_i$ to $v_j$ with cost vector $w(p_{i,j}) = (w_1(p_{i,j}), \cdots, w_d(p_{i,j}))$, we have $\Phi_{i,j} \preceq p_{i,j}$, i.e., $\phi^x_{i,j} \leq w_x(p_{i,j})$ for all $x$ $(1 \leq x \leq d)$.

Lemma 3.1 guarantees that $\Phi_{i,j}$ is the tightest lower bound for the optimal path from $v_i$ to $v_j$ in the multi-cost network $G$.

Lemma 3.1: $\Phi_{i,j}$ is the tightest lower bound for the optimal path from $v_i$ to $v_j$ in $G$. That is, there is no other lower bound $\Phi'_{i,j}$ such that $\Phi_{i,j} \prec \Phi'_{i,j}$ and $\Phi'_{i,j} \preceq p_{i,j}$ for every path $p_{i,j}$ from $v_i$ to $v_j$.

Proof: We prove it by contradiction.
Assume that there is a $\Phi'_{i,j}$ satisfying $\Phi_{i,j} \prec \Phi'_{i,j}$. Then there exists $x$ $(1 \leq x \leq d)$ such that $\phi'^x_{i,j} > \phi^x_{i,j}$. On the other hand, $p^x_{i,j}$ is a path from $v_i$ to $v_j$, so $\Phi'_{i,j} \preceq p^x_{i,j}$. This means $\phi'^x_{i,j} \leq w_x(p^x_{i,j}) = \phi^x_{i,j}$, which is a contradiction. ✷

Inter-index:

The inter-index is essentially a matrix $A$ that maintains the LBOP for every pair consisting of a border vertex and an entry in $G$. Each row represents a border vertex (entry or exit) $v_i$, and each column represents an entry $v_j$ in $G$. The size of $A$ is $(|V.exit| + |V.entry|) \times |V.entry|$. Each cell $A_{i,j}$ includes two elements: $\Phi_{i,j}$ and $P_{i,j}$.

Inner-index:

The inner-index consists of $k$ sub-indexes, and every sub-index $I_p$ is associated with a vertex subset $V_p$. $I_p$ includes two parts: (i) the Skyline-Path-Inner-Index $I^S_p$; and (ii) the LBOP-Inner-Index $I^L_p$.

The Skyline-Path-Inner-Index $I^S_p$ of $V_p$ is a collection of skyline path sets for all pairs of an entry and an exit in $V_p$, i.e., $I^S_p = \{SP_{(i,j);p} \mid v_i \in V_p.entry, v_j \in V_p.exit\}$. $SP_{(i,j);p}$ is the set of all skyline paths from $v_i$ to $v_j$ in $G_p$, where $G_p$ is the subgraph of $G$ induced by $V_p$. Note that the paths in $SP_{(i,j);p}$ only pass through vertices in $V_p$.

The LBOP-Inner-Index $I^L_p$ of $V_p$ is essentially a matrix $M_p$ of size $|V_p| \times |V_p|$ that maintains the LBOPs for all pairs of vertices $v_i$ and $v_j$ in $V_p$. In practice, we only need to keep a smaller matrix $M'_p$ in memory as $I^L_p$. $M'_p$ is a sub-matrix of $M_p$ that maintains all LBOPs from an entry to a vertex in $V_p$ and all LBOPs from a vertex to an exit in $V_p$. The remaining sub-matrix $M^-_p = M_p \setminus M'_p$ $(1 \leq p \leq k)$ is stored on disk. $M^-_s$ and $M^-_e$ are loaded into memory once the starting vertex $v_s$ and the ending vertex $v_e$ are given.

Using the inter-index and the LBOP-inner-index, $\Phi_{i,j}$ can be calculated easily for any pair of vertices $v_i$ and $v_j$ in $G$. Given a starting vertex $v_s$ and an ending vertex $v_e$, we use $V_s$ and $V_e$ to denote the vertex subsets containing $v_s$ and $v_e$, respectively. If $V_s = V_e$, we obtain $\Phi_{s,e}$ directly from the LBOP-inner-index. If $V_s \neq V_e$, we calculate $\Phi_{s,e}$ by Lemma 3.2.

Lemma 3.2:

Given two vertices $v_s$ and $v_e$ in a multi-cost network $G$, let $V_s$ and $V_e$ be the two distinct vertex subsets containing $v_s$ and $v_e$, respectively. Then, for all $x$ $(1 \leq x \leq d)$, we have $\phi^x_{s,e} = \min\{\phi^x_{s,i} + \phi^x_{i,e} \mid v_i \in V_e.entry\}$, where $\phi^x_{s,e}$, $\phi^x_{s,i}$ and $\phi^x_{i,e}$ are the $x$-th costs of the LBOPs $\Phi_{s,e}$, $\Phi_{s,i}$ and $\Phi_{i,e}$, respectively.

Proof: $\phi^x_{s,e}$ $(1 \leq x \leq d)$ is the weight of the shortest path $p^x_{s,e}$ in graph $G_x$. Since $v_s \notin V_e$, this path must pass through some entry $v_i \in V_e.entry$. Therefore, $p^x_{s,e}$ can be regarded as two parts: (i) the sub-path from $v_s$ to $v_i$; and (ii) the sub-path from $v_i$ to $v_e$. Because $\phi^x_{s,i}$ and $\phi^x_{i,e}$ are the weights of the shortest paths in $G_x$ from $v_s$ to $v_i$ and from $v_i$ to $v_e$, respectively, we have $\phi^x_{s,i} + \phi^x_{i,e} \leq \phi^x_{s,e}$. On the other hand, for every entry $v_j \in V_e.entry$, concatenating the shortest paths from $v_s$ to $v_j$ and from $v_j$ to $v_e$ gives a walk from $v_s$ to $v_e$ in $G_x$. Since all weights are non-negative, this gives $\phi^x_{s,e} \leq \phi^x_{s,j} + \phi^x_{j,e}$. Hence $v_i$ attains the minimum, and $\phi^x_{s,e} = \min\{\phi^x_{s,i} + \phi^x_{i,e} \mid v_i \in V_e.entry\}$. ✷

$\Phi_{s,e}$ can be calculated in two cases: (1) $v_s \in V_s.entry \cup V_s.exit$; and (2) $v_s \notin V_s.entry \cup V_s.exit$. In case (1), $\phi^x_{s,i}$ and $\phi^x_{i,e}$ can be retrieved directly from the inter-index and the LBOP-inner-index $I^L_e$, respectively. The minimum value of $\phi^x_{s,i} + \phi^x_{i,e}$ then gives $\phi^x_{s,e}$ by Lemma 3.2. In case (2), $\phi^x_{s,i}$ is not maintained in the inter-index. We first calculate $\phi^x_{s,i} = \min\{\phi^x_{s,j} + \phi^x_{j,i} \mid v_j \in V_s.exit\}$ and then calculate $\phi^x_{s,e}$ in the same way as in case (1). The algorithm to compute $\Phi_{s,e}$ for any two vertices $v_s$ and $v_e$ in $G$ is shown in Algorithm 1.
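Lemma 3.2 and Algorithms 1 and 2 can be sketched in Python as follows. Purely for illustration, the sketch merges the inter-index and LBOP-inner-index entries into one hypothetical dictionary `lbop[(u, v)] = Φ_{u,v}`. It also assumes that `part` maps vertices to subset labels and that `entries` and `exits` map subset labels to border-vertex sets. The function names mirror the paper's pseudocode, but this is not the authors' implementation.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Vec = Tuple[float, ...]
# Hypothetical flat table standing in for the inter-index and LBOP-inner-indexes:
# (u, v) -> Phi_{u,v}
LBOP = Dict[Tuple[str, str], Vec]


def procedure(lbop: LBOP, vi: str, vj: str, via: Iterable[str], d: int) -> Vec:
    """Algorithm 2: for each x, phi^x_{i,j} = min over v_r in `via` of
    phi^x_{i,r} + phi^x_{r,j}. Each dimension is minimised independently."""
    candidates = list(via)
    current = lbop.get((vi, vj), (math.inf,) * d)  # unknown entries start at +inf
    result = []
    for x in range(d):
        best = current[x]
        for vr in candidates:
            if (vi, vr) in lbop and (vr, vj) in lbop:
                best = min(best, lbop[(vi, vr)][x] + lbop[(vr, vj)][x])
        result.append(best)
    lbop[(vi, vj)] = tuple(result)
    return lbop[(vi, vj)]


def compute_lbop(lbop: LBOP, part: Dict[str, str], entries: Dict[str, Set[str]],
                 exits: Dict[str, Set[str]], s: str, e: str, d: int) -> Vec:
    """Algorithm 1 (COMPUTE-LBOP) over the flat table above."""
    ps, pe = part[s], part[e]
    if ps == pe:  # same subset: read the LBOP-inner-index directly
        return lbop[(s, e)]
    if s not in entries.get(ps, set()) | exits.get(ps, set()):
        for vi in entries[pe]:  # case (2): first derive Phi_{s,i} through the exits of V_s
            procedure(lbop, s, vi, exits[ps], d)
    return procedure(lbop, s, e, entries[pe], d)  # Lemma 3.2 over the entries of V_e


# Hypothetical two-subset example: s, x1, x2 in subset A; y1, y2, t in subset B.
part = {"s": "A", "x1": "A", "x2": "A", "y1": "B", "y2": "B", "t": "B"}
entries = {"A": set(), "B": {"y1", "y2"}}
exits = {"A": {"x1", "x2"}, "B": set()}
table: LBOP = {
    ("s", "x1"): (1, 5), ("s", "x2"): (4, 1),    # LBOP-inner-index of A (vertex -> exit)
    ("x1", "y1"): (1, 1), ("x1", "y2"): (3, 3),  # inter-index (exit -> entry)
    ("x2", "y1"): (2, 2), ("x2", "y2"): (1, 1),
    ("y1", "t"): (1, 4), ("y2", "t"): (2, 1),    # LBOP-inner-index of B (entry -> vertex)
}
print(compute_lbop(table, part, entries, exits, "s", "t", 2))  # (3, 3)
```

Each cost dimension is minimised independently, so the two coordinates of $\Phi_{s,t} = (3, 3)$ are reached through different exits and entries. Suppose each table entry were the cost of a single path. Then the four combined routes would cost (3, 10), (6, 9), (7, 7) and (7, 3), and none of them equals (3, 3). This is why $\Phi$ is a lower bound rather than the cost of an actual path.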
The set $P_{s,e}$ of single-cost shortest paths can be calculated in the same way as $\Phi_{s,e}$.

Algorithm 1 COMPUTE-LBOP($I$, $s$, $e$)
Input: index $I$, starting vertex $v_s$ and ending vertex $v_e$
Output: LBOP $\Phi_{s,e}$ from $v_s$ to $v_e$
if $V_s = V_e$ then
    return $\Phi_{s,e}$ from $I^L_s$ (or $I^L_e$);
else if $v_s \in V_s.entry \cup V_s.exit$ then
    PROCEDURE($v_s$, $v_e$, $V_e.entry$);
else
    for $v_i \in V_e.entry$ do
        PROCEDURE($v_s$, $v_i$, $V_s.exit$);
    PROCEDURE($v_s$, $v_e$, $V_e.entry$);
return $\Phi_{s,e}$;

Algorithm 2 PROCEDURE($v_i$, $v_j$, $V$)
// $\phi^x_{i,j}$ starts at $+\infty$ if it is not already known
for $x = 1$ to $d$ do
    for each $v_r \in V$ do
        $\phi^* \leftarrow \phi^x_{i,r} + \phi^x_{r,j}$;
        if $\phi^x_{i,j} > \phi^*$ then
            $\phi^x_{i,j} \leftarrow \phi^*$;

B. How to Construct Partition-Based Index?

1) Inter-index and LBOP-inner-index:

For the LBOP-inner-index $I^L_p$ of vertex subset $V_p$, shortest path algorithms can be used to calculate $\Phi_{i,j}$ for every pair of vertices $v_i$ and $v_j$ in $V_p$. For the inter-index, $\Phi_{i,j}$ can also be calculated by shortest path algorithms, for every pair of a border vertex $v_i \in V.entry \cup V.exit$ and an entry $v_j \in V.entry$. It is worth noting that $\Phi_{i,j}$ need not be maintained in the inter-index if $v_i$ and $v_j$ are in the same vertex subset $V_p$, because it is already maintained in the LBOP-inner-index.
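The construction step above can be illustrated with a small Python sketch. It first detects the entries and exits of each vertex subset, following the definition at the start of Section III-A. It then computes $\Phi_{src,v}$ by running one Dijkstra search per cost dimension, i.e., on each $G_x$. The adjacency-list encoding, the `part` map and the function names are hypothetical. The text does not say whether inner-index distances are restricted to the induced subgraph $G_p$, so the sketch simply runs on whatever graph it is given.

```python
import heapq
from typing import Dict, List, Set, Tuple

Cost = Tuple[float, ...]
# Hypothetical encoding: u -> [(v, w(u, v)), ...]
Graph = Dict[str, List[Tuple[str, Cost]]]


def border_vertices(graph: Graph, part: Dict[str, int]):
    """Entries (some in-neighbour outside the subset) and exits
    (some out-neighbour outside the subset) of every vertex subset."""
    entries: Dict[int, Set[str]] = {}
    exits: Dict[int, Set[str]] = {}
    for u, edges in graph.items():
        for v, _ in edges:
            if part[u] != part[v]:  # a cut edge makes u an exit and v an entry
                exits.setdefault(part[u], set()).add(u)
                entries.setdefault(part[v], set()).add(v)
    return entries, exits


def dijkstra_dim(graph: Graph, src: str, x: int) -> Dict[str, float]:
    """Single-source shortest distances in G_x (edge weight = x-th cost)."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w[x]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def lbop_from(graph: Graph, src: str, d: int) -> Dict[str, Cost]:
    """Phi_{src,v} for every reachable v: one Dijkstra run per cost dimension."""
    per_dim = [dijkstra_dim(graph, src, x) for x in range(d)]
    # Every G_x has the same structure, so the reachable set is the same in all dims.
    return {v: tuple(per_dim[x][v] for x in range(d)) for v in per_dim[0]}


demo: Graph = {
    "s": [("a", (1, 5)), ("b", (4, 1))],
    "a": [("t", (1, 5))],
    "b": [("t", (5, 1))],
    "t": [],
}
demo_part = {"s": 0, "a": 0, "b": 1, "t": 1}

print(border_vertices(demo, demo_part))
print(lbop_from(demo, "s", 2)["t"])  # (2.0, 2.0)
```

In this toy network, $\Phi_{s,t} = (2, 2)$ takes its first coordinate from the route through `a` and its second from the route through `b`. This matches the per-dimension definition of the LBOP.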

2) Skyline-path-inner-index:

For every $I^S_p$ in the Skyline-path-inner-index, $I^S_p = \{SP_{(i,j);p} \mid v_i \in V_p.entry, v_j \in V_p.exit\}$, we need to calculate $SP_{(i,j);p}$ for every pair of an entry $v_i$ and an exit $v_j$ in $V_p$. We use the heuristic algorithm proposed in [25] to calculate $SP_{(i,j);p}$. All possible skyline paths in $G_p$ are organized in a search tree $T$, where $G_p$ is the subgraph of $G$ induced by $V_p$. A priority queue $Q$ maintains the paths in $T$ that remain to be searched. In each iteration, a path $p$ is dequeued from $Q$. If the ending vertex of $p$ is not $v_j$, the algorithm checks whether $p$ is dominated by a path in $SP_{(i,j);p}$. If it is not, then for each outgoing neighbor $v_o$ of the ending vertex of $p$, $p$ is extended to a new path $p'$ by appending $v_o$, and $p'$ is inserted into $Q$. If the ending vertex of $p$ is $v_j$ and $p$ is not dominated by any path in $SP_{(i,j);p}$, then $p$ is inserted into $SP_{(i,j);p}$ and the paths dominated by $p$ are removed from $SP_{(i,j);p}$. Several pruning strategies can be used in this algorithm; more details are given in [25].

C. Contour skyline set

Given a skyline-path-inner-index $I^S_p$, each skyline path $p \in SP_{(i,j);p}$ can be regarded as a skyline point $p$ in the $d$-dimensional space according to $w(p)$. Some of these points lie close to each other in the space, and this property helps improve the efficiency of the optimal path query. In this section, we define the contour skyline set. All skyline points in $SP_{(i,j);p}$ can be partitioned into several groups by their spatial proximity. We compute a contour skyline point for every group, and the set of contour skyline points is called the contour skyline set of $SP_{(i,j);p}$.

Fig. 2. An example of contour skyline set

Fig. 2 shows an example of the contour skyline set in the cluster $V_p$. $p_1, \cdots, p_9$ are skyline points in a 2-dimensional space, and each point $p_i$ corresponds to a skyline path $p_i$. The points fall into three groups, $R_1$ (three points), $R_2$ (four points) and $R_3$ (two points), such that points in the same group are close to each other. $cp_1$, $cp_2$ and $cp_3$ are the contour skyline points corresponding to $R_1$, $R_2$ and $R_3$, respectively. Let $w(cp_i) = (w_1(cp_i), w_2(cp_i))$ be the cost vector of $cp_i$. By construction, $cp_i$ is the LBOP of the skyline paths in $R_i$, i.e., $w_x(cp_i) = \min\{w_x(p) \mid p \in R_i\}$, where $w_x(cp_i)$ and $w_x(p)$ are the $x$-th cost values of $w(cp_i)$ and $w(p)$, respectively. Computing the contour skyline points is therefore equivalent to partitioning the skyline points into groups such that the points in each group are close to each other. Given a specified $r$, our goal is to partition the skyline points into $r$ groups. To do so, we introduce the diameter of a group. For a group $R_i$, the diameter of $R_i$, denoted as $\mathcal{D}(R_i)$, is defined as the maximum Euclidean distance over all pairs of points in $R_i$.
Formally,

$$\mathcal{D}(R_i) = \max\{dist(p, p') \mid p, p' \in R_i\} \tag{1}$$

where $dist(p, p')$ is the Euclidean distance between $p$ and $p'$ in the multi-dimensional space. Given an $r$-partition $\mathcal{R} = \{R_1, \cdots, R_r\}$, we define the diameter $\mathcal{D}(\mathcal{R})$ of $\mathcal{R}$ as

$$\mathcal{D}(\mathcal{R}) = \max\{\mathcal{D}(R_i) \mid R_i \in \mathcal{R}\} \tag{2}$$

Intuitively, $\mathcal{D}(\mathcal{R})$ measures partition quality as the maximum distance between any two points in the same group. A partition $\mathcal{R}$ is good if every two points in the same group are close to each other.

Definition 3.1: (Contour skyline) Given two vertices $v_x$ and $v_y$ in vertex subset $V_p$, $SP_{(x,y);p}$ is the skyline path set from $v_x$ to $v_y$ in the induced subgraph $G_p$, and every path in $SP_{(x,y);p}$ is a skyline point in $d$-dimensional space. Given an integer $r$, an optimal $r$-partition $\mathcal{R}_{opt}$ is a partition that minimizes $\mathcal{D}(\mathcal{R})$. For every group $R_i$ in $\mathcal{R}_{opt}$, the contour skyline point $cp_i$ is the LBOP of the skyline paths in $R_i$. The set of all $cp_i$ is called the contour skyline set of $SP_{(x,y);p}$, denoted as $CS_{(x,y);p}$.

$CS_{(x,y);p}$ improves the efficiency of the optimal path query, as explained in Section IV-B. Next, we discuss how to compute the contour skyline points, which amounts to finding the optimal partition $\mathcal{R}_{opt}$ of all skyline points in $SP_{(x,y);p}$. In 2D space, we propose a dynamic programming method to compute the optimal partition of $SP_{(x,y);p}$. In 3D or higher-dimensional space, we prove that this problem is NP-hard. For that case we give a 2-approximate algorithm and show that there is no $(2 - \epsilon)$-approximate solution in polynomial time unless P = NP.

Case 1 (2D space): Assume that $SP_{(x,y);p}$ has already been computed, and let $m$ be its size. We use $S = \{p_1, \cdots, p_m\}$ to denote the set of all skyline points in $SP_{(x,y);p}$, sorted in ascending order of their x-coordinates. We use $S_i$ to denote $\{p_1, p_2, \cdots, p_i\}$; in particular, $S_0 = \emptyset$.
We use $opt(i, t)$ to denote the optimal $t$-partition of $S_i$. The optimal $r$-partition $\mathcal{R}_{opt}$ of $S$ is then exactly $opt(m, r)$. Let $S_{j,i}$ be the point set $\{p_j, \cdots, p_i\}$, where $1 \leq j \leq i \leq m$. Then we have the following recursive equation:

$$\mathcal{D}(opt(i, t)) = \min_{t \leq j \leq i} \max\{\mathcal{D}(opt(j - 1, t - 1)), \mathcal{D}(S_{j,i})\} \tag{3}$$

Eq. (3) can be read as follows. Without loss of generality, assume that the optimal $t$-partition of $S_i$ is $\{R_1, \cdots, R_t\}$, where the last group $R_t$ consists of $\{p_j, \cdots, p_i\}$. Then $\{R_1, \cdots, R_{t-1}\}$ can be taken to be the optimal $(t-1)$-partition of $S_{j-1}$. Let $j_{min}$ be the value of $j$ that minimizes Eq. (3). Then we have

$$opt(i, t) = opt(j_{min} - 1, t - 1) \cup \{S_{j_{min},i}\}, \qquad opt(i, 1) = \{S_i\} \tag{4}$$

Using Eq. (3) and Eq. (4), a dynamic programming method can compute the optimal $r$-partition of $SP_{(x,y);p}$ in 2D space.

Case 2 (3D and higher-dimensional space): In 3D and higher-dimensional space, we prove that the optimal $r$-partition problem is NP-hard by reducing the $r$-split problem in 2D space, which is NP-hard, to it. Given a set of points $\{p_1, \cdots, p_n\}$ in 2D space, the $r$-split problem is to find a set of $r$ groups $\{B_1, \cdots, B_r\}$ that minimizes

$$\max_{1 \leq x \leq r} \big\{\max\{dist(p_i, p_j) \mid p_i, p_j \in B_x\}\big\} \tag{5}$$

This problem is similar to the $r$-partition problem for skyline points. However, when the input points are restricted to skyline points, the complexity of the $r$-split problem is unknown. We therefore give Lemma 3.3.

Lemma 3.3:

For dimensionality $d \geq 3$, the $r$-partition problem is NP-hard.

Proof: Given a set of points $\{p_1, \cdots, p_n\}$ in 2D space, we map each of them to a skyline point in 3D space. A point $p_i$ with x-coordinate $p_i(x)$ and y-coordinate $p_i(y)$ is mapped to a point $p'_i$ in 3D space with coordinates $p'_i(x) = -\frac{\sqrt{2}}{2} p_i(x) + \frac{\sqrt{6}}{6} p_i(y)$, $p'_i(y) = \frac{\sqrt{2}}{2} p_i(x) + \frac{\sqrt{6}}{6} p_i(y)$, and $p'_i(z) = -\frac{\sqrt{6}}{3} p_i(y)$. Every mapped point satisfies $p'_i(x) + p'_i(y) + p'_i(z) = 0$. So for any two points $p'_1$ and $p'_2$, if $p'_1(x) > p'_2(x)$ and $p'_1(y) > p'_2(y)$, then $p'_1(z) < p'_2(z)$. More generally, no mapped point can be no larger than another in every coordinate and strictly smaller in one, so every mapped point is a skyline point. On the other hand, the mapping is linear with orthonormal columns, so $dist(p'_1, p'_2) = dist(p_1, p_2)$, where $dist(p_i, p_j)$ is the Euclidean distance between $p_i$ and $p_j$. This reduction takes polynomial time. If we could find the optimal $r$-partition in polynomial time, we could solve the $r$-split problem in polynomial time.

A set $S$ of points in 3D space can easily be converted to a $d$-dimensional point set $S'$ for any $d \geq 3$ by assigning $(d - 3)$ zeros to the additional coordinates of every point in $S$. This preserves both dominance and pairwise distances, so the optimal $r$-partition of $S'$ is the optimal $r$-partition of $S$ in 3D space. The reduction from 3D space to $d$-dimensional space also takes polynomial time. ✷

We give a greedy algorithm for the $r$-partition of a given $SP_{(x,y);p}$ in a vertex subset $V_p$. In the initialization phase, all points are assigned to a group $R_1$. One of these points, denoted as $bp_1$, is selected as the "base point" of $R_1$; the choice of $bp_1$ is arbitrary. In each iteration, some points in $R_1, \cdots, R_j$ are moved into a new group $R_{j+1}$, and one of them is selected as the base point $bp_{j+1}$ of the new group. To construct the new group, we first find a point $p_i$ in one of the previous $j$ groups $\{R_1, \cdots, R_j\}$ whose distance to the base point of its own group is maximal.
This point is moved into group $R_{j+1}$ and selected as the base point of $R_{j+1}$. A point in any of the previous groups is moved into group $R_{j+1}$ if its distance to $p_i$ is not larger than its distance to the base point of its current group. Once the $r$-partition is obtained, $CS_{(x,y);p}$ of $SP_{(x,y);p}$ can be computed easily from the definition of the contour skyline set.

This algorithm is guaranteed to give a 2-approximate solution. The ratio cannot be improved, because there is no $(2 - \epsilon)$-approximate solution in polynomial time unless P = NP, as analyzed in [9].

In summary, for each $SP_{(x,y);p}$ in vertex subset $V_p$, we compute the contour skyline set $CS_{(x,y);p}$, and we maintain every $CS_{(x,y);p}$ in $I^S_p$.

D. How to Partition a Graph into K Vertex Subsets

For the optimal path problem in multi-cost networks, fewer edges between different vertex subsets lead to fewer entries and exits, and therefore to a smaller partition-based index. The objective of the partition is to make edges dense within each vertex subset and sparse between different vertex subsets. This graph partitioning problem has been well studied over the past couple of decades [1], [6], [24]. In this paper, we use the classic multilevel graph partitioning algorithm METIS, proposed in [1], to partition the networks in our experiments.

IV. QUERY PROCESSING

Given a multi-cost network $G(V, E, W)$, a starting vertex $v_s$ and an ending vertex $v_e$, let $V_s$ and $V_e$ be the vertex subsets containing $v_s$ and $v_e$, respectively. A shrunk graph $\bar{G} = (\bar{V}, \bar{E})$ can be derived from the partition-based index. $\bar{V}$ consists of three sets: (1) $V_s$; (2) $V_e$; and (3) $\bigcup_{p \ne s,e} (V_p.entry \cup V_p.exit)$. The edges in $\bar{E}$ satisfy the following three conditions: (1) $(v_i, v_j) \in \bar{E}$ iff $((v_i, v_j) \in E) \wedge ((v_i, v_j \in V_s) \vee (v_i, v_j \in V_e))$; (2) $(v_i, v_j) \in \bar{E}$ iff $((v_i, v_j) \in E) \wedge (v_i \in V_p.exit) \wedge (v_j \in V_q.entry)$, where $V_p \ne V_q$; and (3) $m$ edges $\{(v_i, v_j)_1, \cdots, (v_i, v_j)_m\}$ are constructed for every pair of entry $v_i$ and exit $v_j$ in $V_p$, where $V_p \ne V_s$ and $V_p \ne V_e$. Here, $m$ is the size of $SP_{(i,j);p}$. In case (3), every edge $(v_i, v_j)_\alpha$ $(1 \le \alpha \le m)$ from $v_i$ to $v_j$ represents a skyline path in $SP_{(i,j);p}$. The following theorem guarantees that the optimal path problem on $G(V, E)$ is equivalent to that on $\bar{G}(\bar{V}, \bar{E})$.

Algorithm 3 VERTEX-FILTERING($\bar{G}(\bar{V}, \bar{E})$, $v_s$, $v_e$, $f(\cdot)$)
Input: $\bar{G}(\bar{V}, \bar{E})$, the score function $f(\cdot)$, the starting vertex $v_s$ and the ending vertex $v_e$;
Output: the optimal path $p^*_{s,e}$.
$\tau \leftarrow \min\{f(p^x_{s,e}) \mid p^x_{s,e} \in P_{s,e}\}$;
for each $v_i \in \bar{V}$ do
    if $\tau < f(\Phi_{s,i} + \Phi_{i,e})$ then
        $\bar{V} \leftarrow \bar{V} - \{v_i\}$;
$p^*_{s,e} \leftarrow$ OPTIMAL-PATH($\bar{G}(\bar{V})$, $v_s$, $v_e$, $f(\cdot)$);
return $p^*_{s,e}$, $\tau$;

Theorem 4.1: Given a multi-cost graph $G(V, E)$, a starting vertex $v_s$ and an ending vertex $v_e$ on $G$, a shrunk graph $\bar{G}(\bar{V}, \bar{E})$ regarding $v_s$ and $v_e$ can be constructed. Finding the optimal path from $v_s$ to $v_e$ in $G$ is equivalent to finding the optimal path from $v_s$ to $v_e$ in $\bar{G}$.

Proof: First, we prove that an optimal path $p$ from $v_s$ to $v_e$ in $G$ is also an optimal path in $\bar{G}$. $p$ must be a path from $v_s$ to $v_e$ in $\bar{G}$; otherwise, some part of $p$ is dominated by a skyline path in a cluster, and a new path can be constructed by replacing this part of $p$ with that skyline path. By the monotonicity of the score function $f(\cdot)$, the score of the new path is less than the score of $p$, which contradicts the fact that $p$ is the optimal path in $G$. Moreover, $p$ must be an optimal path from $v_s$ to $v_e$ in $\bar{G}$; otherwise, there exists another path $p'$ in $\bar{G}$ whose score is less than that of $p$. Obviously, $p'$ is also a path in $G$, which contradicts the fact that $p$ is the optimal path in $G$.

Next, we prove that an optimal path $p$ in $\bar{G}$ is also an optimal path in $G$. Assume that there exists another path $p'$ in $G$ whose score is less than that of $p$. We consider two cases. First, if $p'$ is also a path in $\bar{G}$, then $p$ is not the optimal path in $\bar{G}$, because the score of $p'$ is less than that of $p$. Second, if $p'$ is not a path in $\bar{G}$, then $p'$ must be dominated by another path $p''$ in $\bar{G}$, and the score of $p''$ is less than the score of $p$. This contradicts the fact that $p$ is the optimal path in $\bar{G}$. ✷

Based on Theorem 4.1, finding the optimal path from $v_s$ to $v_e$ on $G(V, E)$ is equivalent to finding the optimal path on $\bar{G}(\bar{V}, \bar{E})$. The process of finding the optimal path includes two steps: (1) vertex filtering; and (2) query processing.

A. Vertex-Filtering
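The filtering rule of Theorem 4.2 and Algorithm 3 can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes that the LBOP $\Phi_{i,j}$ is the vector of per-cost shortest distances (the component-wise minimum over all paths) and that $P_{s,e}$ holds one single-cost shortest path per cost type, and it runs on a plain adjacency list instead of the shrunk graph $\bar{G}$. The names `reverse`, `dijkstra` and `vertex_filter` are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Costs = Tuple[float, ...]
# Hypothetical representation: graph[u] = [(v, (w_1, ..., w_d)), ...], non-negative costs.
Graph = Dict[str, List[Tuple[str, Costs]]]


def reverse(graph: Graph) -> Graph:
    """Reverse every edge so that distances *to* a vertex can be computed."""
    rev: Graph = {}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev


def dijkstra(graph: Graph, src: str, k: int):
    """Shortest distances from src using only the k-th cost, plus parent edges."""
    dist: Dict[str, float] = {src: 0.0}
    parent: Dict[str, Tuple[str, Costs]] = {}
    heap = [(0.0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = du + w[k]
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, (u, w)
                heapq.heappush(heap, (nd, v))
    return dist, parent


def vertex_filter(graph: Graph, s: str, e: str, d: int,
                  f: Callable[[Sequence[float]], float]) -> Tuple[Optional[float], Set[str]]:
    """Return (tau, kept vertices); a vertex is dropped when tau < f(Phi_s,i + Phi_i,e)."""
    rev = reverse(graph)
    fwd = [dijkstra(graph, s, k) for k in range(d)]
    bwd = [dijkstra(rev, e, k)[0] for k in range(d)]
    if e not in fwd[0][0]:
        return None, set()  # P_{s,e} is empty: no path from s to e
    tau = float("inf")
    for _, parent in fwd:  # score the full cost vector of each single-cost shortest path
        vec, v = [0.0] * d, e
        while v != s:
            u, w = parent[v]
            vec = [a + b for a, b in zip(vec, w)]
            v = u
        tau = min(tau, f(vec))
    kept: Set[str] = set()
    for v in set(graph) | set(rev):
        if v in fwd[0][0] and v in bwd[0]:  # v lies on some s-e path
            phi = [fwd[k][0][v] + bwd[k][v] for k in range(d)]
            if f(phi) <= tau:  # Theorem 4.2 removes v only if tau < f(phi)
                kept.add(v)
    return tau, kept


if __name__ == "__main__":
    g = {
        "s": [("a", (1, 5)), ("b", (5, 1)), ("c", (3, 3)), ("x", (9, 9))],
        "a": [("e", (1, 5))], "b": [("e", (5, 1))],
        "c": [("e", (3, 3))], "x": [("e", (9, 9))],
    }
    print(vertex_filter(g, "s", "e", 2, max))  # tau = 10.0; only "x" is filtered
```

With the monotone non-linear score `max`, both single-cost shortest paths score 10, so $\tau = 10$; the detour through `x` has a lower bound of 18 and is filtered, while `c` (which in fact carries the best path, score 6) survives.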

We propose a vertex-filtering algorithm that can effectively filter vertices from $\bar{G}(\bar{V}, \bar{E})$. Given two vertices $v_i$ and $v_j$ in $\bar{G}$, $\Phi_{i,j}$ and $P_{i,j}$ can be calculated by Algorithm 1. Obviously, $\tau = \min\{f(p^x_{s,e}) \mid p^x_{s,e} \in P_{s,e}\}$ is an upper bound of the score of the optimal path from $v_s$ to $v_e$. If $P_{s,e} = \emptyset$, then there is no path from $v_s$ to $v_e$, and the algorithm immediately returns $p^*_{s,e} = \emptyset$. For any $v_i$ in $\bar{G}$, if $\tau < f(\Phi_{s,i} + \Phi_{i,e})$, then $v_i$ can be removed from $\bar{G}$; in other words, the optimal path from $v_s$ to $v_e$ cannot pass through $v_i$. Theorem 4.2 guarantees the correctness of the vertex filtering.

Theorem 4.2: Given a multi-cost graph $G(V, E)$, a score function $f(\cdot)$, a starting vertex $v_s$ and an ending vertex $v_e$, a shrunk graph $\bar{G}(\bar{V}, \bar{E})$ can be constructed. Let $P_{s,e} \ne \emptyset$ be the set of single-cost shortest paths from $v_s$ to $v_e$, and let $\tau = \min\{f(p^x_{s,e}) \mid p^x_{s,e} \in P_{s,e}\}$ be an upper bound of the score of the optimal path from $v_s$ to $v_e$. For any vertex $v_i$ in $\bar{G}$, if $\tau < f(\Phi_{s,i} + \Phi_{i,e})$, where $\Phi_{s,i}$ and $\Phi_{i,e}$ are the LBOP from $v_s$ to $v_i$ and the LBOP from $v_i$ to $v_e$, respectively, then the optimal path from $v_s$ to $v_e$ cannot travel through $v_i$.

Proof: We only need to prove that, for any path $p$ traveling through $v_i$, there exists a path $p'$ that does not travel through $v_i$ such that $f(p') < f(p)$. Obviously, $p$ consists of two segments: (i) the sub-path $p_{s,i}$ from $v_s$ to $v_i$; and (ii) the sub-path $p_{i,e}$ from $v_i$ to $v_e$. By the definition of the LBOP, we have $\Phi_{s,i} \preceq p_{s,i}$ and $\Phi_{i,e} \preceq p_{i,e}$. Thus, $\Phi_{s,i} + \Phi_{i,e} \preceq p$. By the monotonicity of the score function $f(\cdot)$, $f(\Phi_{s,i} + \Phi_{i,e}) \le f(p)$. Let $p'$ be the path in $P_{s,e}$ whose score is $\tau$, i.e., $f(p') = \tau$. Obviously, $p'$ is a path from $v_s$ to $v_e$, and it does not travel through $v_i$; otherwise, the same argument would give $\tau = f(p') \ge f(\Phi_{s,i} + \Phi_{i,e})$, which contradicts $\tau < f(\Phi_{s,i} + \Phi_{i,e})$. Then we have $f(p') < f(\Phi_{s,i} + \Phi_{i,e}) \le f(p)$. ✷

The vertex-filtering algorithm is shown in Algorithm 3. The algorithm performs one verification for every vertex in $\bar{V}$, so the time complexity of the vertex-filtering step is $O(|\bar{V}|)$. Let $\bar{V}_f$ be the set of vertices that cannot be filtered in the vertex-filtering step, and let $\bar{G}_f(\bar{V}_f, \bar{E}_f)$ be the subgraph of $\bar{G}$ induced by $\bar{V}_f$. By Theorem 4.2, we only need to compute the optimal path from $v_s$ to $v_e$ on $\bar{G}_f(\bar{V}_f, \bar{E}_f)$.

B. Query Processing
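The greedy $r$-partition of Section III-C and the virtual-node pruning described later in this subsection fit together as sketched below in Python. This is an illustrative sketch under stated assumptions, not the authors' code: skyline paths are represented only by their cost vectors, the contour skyline point $cp_x$ of a group $R_x$ is taken to be the component-wise minimum of the group (its LBOP), and a node is pruned when its score bound is not below the best complete score found so far. The names `r_partition`, `lbop` and `surviving_groups` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, ...]  # cost vector of one skyline path


def r_partition(points: Sequence[Vec], r: int) -> List[List[Vec]]:
    """Greedy farthest-point grouping of skyline cost vectors into at most r groups."""
    if not points:
        return []
    base = [0]                   # base point index of each group; points[0] is bp_1
    owner = [0] * len(points)    # group index of every point; all start in R_1
    for _ in range(1, r):
        # the point farthest from the base point of its own group founds a new group
        far = max(range(len(points)),
                  key=lambda i: math.dist(points[i], points[base[owner[i]]]))
        if math.dist(points[far], points[base[owner[far]]]) == 0:
            break                # every point already coincides with its base point
        base.append(far)
        new = len(base) - 1
        for i, p in enumerate(points):
            # move p if it is not farther from the new base than from its current base
            if math.dist(p, points[far]) <= math.dist(p, points[base[owner[i]]]):
                owner[i] = new
    groups: List[List[Vec]] = [[] for _ in base]
    for i, g in enumerate(owner):
        groups[g].append(points[i])
    return groups


def lbop(group: Sequence[Vec]) -> Vec:
    """Component-wise minimum: a lower bound on every cost vector in the group."""
    return tuple(min(col) for col in zip(*group))


def surviving_groups(prefix: Vec, groups: List[List[Vec]],
                     f: Callable[[Sequence[float]], float], best: float) -> List[List[Vec]]:
    """Keep a group only if its virtual child (prefix + LBOP) could still beat `best`.

    For a monotone f, f(prefix + lbop) <= f(prefix + member) for every member,
    so pruning the virtual child prunes the whole group with one test.
    """
    kept = []
    for g in groups:
        bound = [a + b for a, b in zip(prefix, lbop(g))]
        if f(bound) < best:
            kept.append(g)
    return kept


if __name__ == "__main__":
    sky = [(1, 9), (2, 8), (8, 2), (9, 1)]
    groups = r_partition(sky, 2)                      # [[(1, 9), (2, 8)], [(8, 2), (9, 1)]]
    print(surviving_groups((5, 0), groups, max, 9))   # [[(1, 9), (2, 8)]]
```

In the usage example, the partial path $C$ has cost $(5, 0)$ and the best complete score so far is 9; the second group's virtual child has bound $\max(13, 1) = 13 \ge 9$, so both of its real children are discarded without being generated.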

We discuss query processing for two cases: (1) the score function is linear; and (2) the score function is non-linear.

For case (1), a score can be calculated for every pair of border vertex $v_i$ and entry $v_j$ according to $\Phi_{i,j}$, and this score can be regarded as a lower bound of the distance from one vertex subset to another. In addition, for every $SP_{(i,j);p}$ in the Skyline-Path-Inner-Index $I_{Sp}$, the minimum score of the skyline paths in $SP_{(i,j);p}$ is exactly the shortest distance from an entry $v_i$ to an exit $v_j$ in $V_p$. With these scores, the partition-based index becomes the G-Tree index proposed in [26], and the optimal path problem can then be solved.

For case (2), the optimal path problem is NP-hard. A best-first branch-and-bound search algorithm can be utilized to compute the optimal path on $\bar{G}_f(\bar{V}_f, \bar{E}_f)$ in a similar way to the algorithm proposed in [25]. Note that $\bar{G}$ is not a simple graph, because there are several edges from an entry $v_i$ to an exit $v_j$ in a vertex subset $V_p$. Given a graph $\bar{G}_f$, a starting vertex $v_s$ and an ending vertex $v_e$, all the possible paths starting from $v_s$ in $\bar{G}_f$ can be organized in a search tree. Here, the root node represents the starting vertex set $\{v_s\}$. Any non-root node $C = \{v_s, (v_s, v_1), v_1, \cdots, (v_{l-1}, v_l), v_l\}$ represents a path starting from $v_s$. $|C|$ is the number of vertices in $C$, i.e., $|C| = |\{v \mid v \in C\}|$. For two different nodes $C$ and $C'$ in the search tree, $C$ is the parent of $C'$ if they satisfy the following two conditions: (i) $C \subset C'$ and $|C'| = |C| + 1$; and (ii) $C' \setminus C$ is an edge-node set $\{(v_i, v_j), v_j\}$, where $v_i$ and $v_j$ are the ending vertices of paths $C$ and $C'$, respectively. In each iteration, a node $C$ is dequeued from the min-heap $H$, and the algorithm extends $C$ by processing the children of $C$. Assume that the ending vertex of $C$ is $v_i$.
For each edge $(v_i, v_j)$ in $\bar{G}_f$, the algorithm adds the edge-node set $\{(v_i, v_j), v_j\}$ into $C$ to get a child $C'$ of $C$. Note that there may exist several edges from $v_i$ to $v_j$ when $v_i \in V_p.entry$ and $v_j \in V_p.exit$, and every such edge represents a skyline path from $v_i$ to $v_j$ in $G_p$. Pruning strategies similar to those in [25] can be used to decide whether $C'$ can be pruned. If $C'$ cannot be pruned, it is inserted into the min-heap $H$. The algorithm terminates when $H$ is empty or when, for the top element $C$ of $H$, $f(C)$ is not less than the minimum score of the paths from $v_s$ to $v_e$ found so far.

The contour skyline set can be used to improve the query efficiency. For an entry $v_i$ and an exit $v_j$ in a cluster $V_p$, we use $e_{i,j} = \{(v_i, v_j)_1, \cdots, (v_i, v_j)_m\}$ to denote the multiple edges from $v_i$ to $v_j$. Each $(v_i, v_j)_\alpha \in e_{i,j}$ represents a skyline path in $SP_{(i,j);p}$. In each iteration, a node $C$ is to be expanded; let $v_i$ be the ending vertex of $C$. If $v_i$ is an entry of a cluster $V_p$ ($V_p \ne V_s$ and $V_p \ne V_e$), then for each $v_j \in V_p.exit$, we do not need to add every edge-node set $\{(v_i, v_j)_\alpha, v_j\}$ $(1 \le \alpha \le m)$ into $C$ to get a child $C'$ of $C$. Let $CS_{(i,j);p} = \{cp_1, \cdots, cp_r\}$ be the contour skyline set of $SP_{(i,j);p}$. Each $cp_x \in CS_{(i,j);p}$ corresponds to a group $R_x$ of the skyline paths in $SP_{(i,j);p}$ (recall the $r$-partition), so $cp_x$ corresponds to a group $e^x_{i,j}$ of edges in $e_{i,j}$, where $e^x_{i,j} = \{(v_i, v_j)_{x_1}, \cdots, (v_i, v_j)_{x_t}\} \subset e_{i,j}$. Each $(v_i, v_j)_{x_\beta} \in e^x_{i,j}$ represents a skyline path in $R_x$. $cp_x$ can be considered as an edge from $v_i$ to $v_j$, and then $\{cp_x, v_j\}$ can be added into $C$ to get a virtual child $C'$ of $C$. $C'$ corresponds to a group of children $C'_x = \{C'_{x_1}, \cdots, C'_{x_t}\}$ of $C$, where each $C'_{x_\beta}$ $(1 \le \beta \le t)$ is obtained by adding the edge-node set $\{(v_i, v_j)_{x_\beta}, v_j\}$ into $C$. Because $cp_x$ is the LBOP of $R_x$, $cp_x$ is also the LBOP of $e^x_{i,j}$. Thus, we have $C' \preceq C'_{x_\beta}$ for any $\beta$, $1 \le \beta \le t$. If the virtual node $C'$ can be pruned, then all $C'_{x_\beta}$ in $C'_x$ can be pruned.

TABLE I
DATASET CHARACTERISTICS
Dataset    Category            Number of vertices   Number of edges
CAITN      IP network          4,837                17,426
EuAll      email network       11,521               32,389
Slashdot   social network      20,639               187,672
HepPh      citation network    34,546               421,578
CARN       road network        21,047               21,692
EURN       road network        3,598,623            4,354,029

V. PERFORMANCE STUDY

In this section, we evaluate the partition-based index on six real-life networks, including road networks, a social network, etc. All experiments were done on a 3.0 GHz Intel Pentium Core i5 CPU PC with 32 GB main memory running Windows 7. All algorithms are implemented in Visual C++. The details of the real-life networks used in the experiments are shown in Table I, where CAITN is the Chicago anonymized internet trace network, CARN and EURN are two road networks of California and the Eastern USA, respectively, EuAll is an email communication network, Slashdot is a social network about technology-related news, and HepPh is a citation network from the e-print arXiv. For each network, we randomly assigned $d$ kinds of cost to every edge ($d \in \{\ldots\}$). We randomly generated 1,000 pairs of vertices and queried the optimal path for every pair. The reported querying time is the average time on each dataset. The score function is $f(w_1, \cdots, w_d) = \sum_{i=1}^{d} w_i$.

We compare our method with the A* algorithm [12], the genetic algorithm (GA) [4] and the LEXGO* algorithm [16], which are three state-of-the-art heuristic algorithms for querying skyline paths over multi-cost graphs. Note that skyline paths are essentially a candidate set for an optimal path query, so these methods need additional time to select the optimal path from the skyline paths. The experimental results show that the time these heuristic methods spend computing skyline paths is always much larger than the time our method spends computing the optimal path, even though the time for selecting an optimal path from all the skyline paths is not counted. We also compare our method with BF-Search in [25], which uses a naive index to find the optimal path in multi-cost networks under non-linear functions.
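Because the experiments use the linear score $f(w_1, \cdots, w_d) = \sum_{i=1}^{d} w_i$, the score of a path equals the sum of the scores of its edges, so (for non-negative costs) an optimal path is simply a single-cost shortest path on the aggregated edge weight $f(w(e))$; this is the observation behind case (1) in Section IV-B. The Python sketch below shows that reduction with plain Dijkstra on a hypothetical adjacency list; it does not reproduce the G-Tree index of [26], and the names `f` and `optimal_path_linear` are illustrative.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

# Hypothetical adjacency list: graph[u] = [(v, (w_1, ..., w_d)), ...], non-negative costs.
Graph = Dict[str, List[Tuple[str, Tuple[float, ...]]]]


def f(costs: Sequence[float]) -> float:
    """The experiments' linear score f(w_1, ..., w_d) = w_1 + ... + w_d."""
    return sum(costs)


def optimal_path_linear(graph: Graph, s: str, e: str) -> Tuple[float, List[str]]:
    """Dijkstra on the scalarized edge weight f(w(edge)).

    Correct only because f is linear: f(cost of a path) equals the sum of
    f over its edges, so the usual sub-path optimality holds again.
    """
    dist: Dict[str, float] = {s: 0.0}
    parent: Dict[str, str] = {}
    heap = [(0.0, s)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == e:
            break  # the first time e is popped, its score is final
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = du + f(w)
            if nd < dist.get(v, float("inf")):
                dist[v], parent[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if e not in dist:
        return float("inf"), []  # e is unreachable from s
    path = [e]
    while path[-1] != s:
        path.append(parent[path[-1]])
    return dist[e], path[::-1]


if __name__ == "__main__":
    g = {"s": [("a", (1, 4)), ("b", (2, 2))], "a": [("e", (1, 4))], "b": [("e", (2, 2))]}
    print(optimal_path_linear(g, "s", "e"))  # (8.0, ['s', 'b', 'e'])
```

For a non-linear score this shortcut no longer applies, which is why the search of Section IV-B and the BF-Search baseline are needed in that case.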

Fig. 3. Impact of k and r: (a) querying time (ms) versus k; (b) querying time (ms) versus r; datasets CAITN, CARN, EuAll, Slashdot and HepPh.

Exp-1: Querying time: As shown in Table II, we investigate the querying time on five datasets by comparing the partition-based index with the A* algorithm, the genetic algorithm, the LEXGO* algorithm and BF-Search for $d = 2$ and $d = 3$. In this experiment, the number of vertex subsets is $k = 50$. On all networks, the partition-based index has the lowest querying time: it is more than three orders of magnitude faster than A*, GA and LEXGO*, and about 4.5 to 80 times faster than BF-Search. The reason is that the partition-based index pre-computes the LBOP, the skyline paths and the contour skyline set for every pair of entry and exit in every vertex subset, and a large proportion of the vertices are filtered in the vertex-filtering phase.

Exp-2: Index size: The index size is shown in Table III. We compare the size of the partition-based index with that of BF-Search for $d = 2$ and $d = 3$. The A* algorithm, the genetic algorithm and the LEXGO* algorithm are not listed because they do not use an index. The number $k$ is also 50. The partition-based index is about 15 to 38 times smaller than the index of BF-Search. These results indicate that the partition-based index is space efficient and more suitable for large networks.

Exp-3: Impact of vertex-filtering: We investigate the effectiveness of the vertex-filtering algorithm in Table IV. In this experiment, $k = 50$ and $d = 2$. From Table IV, the vertex-filtering algorithm filters roughly half or more of the vertices of the shrunk graph on every dataset (from about 47% on Slashdot to about 66% on HepPh). We also find that $|\bar{E}|$ may be larger than $|E|$, where $|\bar{E}|$ and $|E|$ are the numbers of edges in the shrunk graph $\bar{G}$ and the original graph $G$, respectively. This is because there are multiple edges between every pair of entry $v_i$ and exit $v_j$ in each $V_p$ ($V_p \ne V_s$ and $V_p \ne V_e$) in $\bar{G}$. Avg. $|SP_{(i,j);p}|$ in Table IV is the average number of edges between any pair of entry $v_i$ and exit $v_j$ in the same vertex subset.
In fact, for each pair of entry $v_i$ and exit $v_j$, $|SP_{(i,j);p}| \ll |P_{(i,j);p}|$, where $|P_{(i,j);p}|$ is the number of all the possible paths from $v_i$ to $v_j$ in $G_p$. Therefore, even though $|\bar{E}| > |E|$, our algorithm is more efficient on $\bar{G}$ than on $G$, because many paths from an entry to an exit have already been filtered out by $SP_{(i,j);p}$. In addition, each edge $(v_i, v_j)_\alpha$ from an entry $v_i$ to an exit $v_j$ in $\bar{G}$ represents a skyline path from $v_i$ to $v_j$. When the algorithm expands a node $C$ whose ending vertex is $v_i$, the children of $C$ in $\bar{G}$ are more likely to be pruned than those in $G$.

Exp-4: Impact of k and r: We investigate the impact of the number $k$ of vertex subsets and the size $r$ of the contour skyline set. The experimental results are shown in Fig. 3. For $k$, an appropriate value reduces the number of entries and exits in $\bar{G}$ and thus the querying time; a larger or smaller $k$ increases the querying time. In Fig. 3(a), the optimal $k$ differs across datasets; for example, the optimal $k$ is 50 for the EuAll dataset but 80 for the Slashdot dataset. For $r$, the skyline points in a group are closer to each other under a larger $r$, so the algorithm is more effective at pruning a virtual node $C'$, as discussed in Section IV-B. On the other hand, a larger $r$ results in more contour skyline points, which increases the querying time. In the two extreme cases, when $r = 1$ the only contour skyline point is the LBOP of $SP_{(i,j);p}$, and when $r = |SP_{(i,j);p}|$ the contour skyline set is exactly $SP_{(i,j);p}$; in neither case does the contour skyline set work well. The optimal $r$ also differs across datasets: it is 5 for the EuAll dataset and 8 for the Slashdot and HepPh datasets.

Exp-6: Scalability: We evaluate the scalability of our method in Fig. 4. We investigate the querying time by varying the number of vertices from one million to three million on the EURN dataset for $d = 2$ and $d = 3$.
For each graph, $k$ is set proportionally to $n$, the number of vertices in the graph. We compare our method with BF-Search, the GA algorithm and the LEXGO* algorithm. The experimental results show that our method is always about an order of magnitude faster than the others and remains efficient even when the number of vertices exceeds three million. This indicates that our method is also suitable for large multi-cost graphs.

Fig. 4. Adaptivity to large graphs: running time (sec) versus number of nodes (× million) for PB Index, BF Search, LEXGO* and GA; (a) $d = 2$; (b) $d = 3$.
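The speedups behind Exp-1 can be recomputed directly from Table II. The short Python script below hard-codes the querying times (in seconds) from that table and reports the smallest speedup of PB-Index over the fastest of the three heuristics, plus the range of speedups over BF-Search; the dictionary layout and the function name `speedups` are illustrative only.

```python
# Querying times in seconds, copied from Table II: (A*, GA, LEXGO*, BF-Search, PB-Index).
TABLE_II = {
    2: {
        "CAITN": (28.37, 8.76, 10.13, 0.0374, 0.0041),
        "CARN": (121.25, 36.87, 32.71, 0.0733, 0.0115),
        "EuAll": (211.76, 92.28, 79.27, 0.1471, 0.0062),
        "Slashdot": (879.98, 193.91, 201.36, 4.8139, 0.0871),
        "HepPh": (1934.52, 303.64, 288.71, 17.653, 0.2194),
    },
    3: {
        "CAITN": (47.26, 12.42, 16.52, 0.0515, 0.0071),
        "CARN": (219.38, 68.73, 79.83, 0.0851, 0.0189),
        "EuAll": (336.52, 155.34, 132.46, 0.2019, 0.0113),
        "Slashdot": (1127.62, 316.77, 289.71, 6.2506, 0.1027),
        "HepPh": (3253.43, 589.32, 573.13, 21.467, 0.2938),
    },
}


def speedups(table=TABLE_II):
    """Return (smallest speedup over the best heuristic, smallest and largest over BF-Search)."""
    vs_heuristic, vs_bf = [], []
    for rows in table.values():  # one entry per value of d
        for a_star, ga, lexgo, bf, pb in rows.values():
            vs_heuristic.append(min(a_star, ga, lexgo) / pb)
            vs_bf.append(bf / pb)
    return min(vs_heuristic), min(vs_bf), max(vs_bf)


if __name__ == "__main__":
    h, lo, hi = speedups()
    print(f"vs. best heuristic: >= {h:.0f}x; vs. BF-Search: {lo:.1f}x to {hi:.1f}x")
```

With the values of Table II, the smallest speedup over the best heuristic is roughly 1,300× (HepPh, $d = 2$), and the speedup over BF-Search ranges from about 4.5× (CARN, $d = 3$) to about 80× (HepPh, $d = 2$).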

VI. RELATED WORK

Existing works on the shortest path problem propose various index techniques to improve the efficiency of shortest path queries on large graphs.

TABLE II
ONLINE QUERYING TIME IN SECONDS
d = 2:
Dataset    A*        GA       LEXGO*   BF-Search   PB-Index
CAITN      28.37     8.76     10.13    0.0374      0.0041
CARN       121.25    36.87    32.71    0.0733      0.0115
EuAll      211.76    92.28    79.27    0.1471      0.0062
Slashdot   879.98    193.91   201.36   4.8139      0.0871
HepPh      1934.52   303.64   288.71   17.653      0.2194
d = 3:
Dataset    A*        GA       LEXGO*   BF-Search   PB-Index
CAITN      47.26     12.42    16.52    0.0515      0.0071
CARN       219.38    68.73    79.83    0.0851      0.0189
EuAll      336.52    155.34   132.46   0.2019      0.0113
Slashdot   1127.62   316.77   289.71   6.2506      0.1027
HepPh      3253.43   589.32   573.13   21.467      0.2938

TABLE III
INDEX SIZE IN MB
Dataset    BF-Search (d = 2)   PB-Index (d = 2)   BF-Search (d = 3)   PB-Index (d = 3)
CAITN      115.99              6.21               203.78              13.52
CARN       2600.68             93.85              4398.95             163.98
EuAll      796.33              20.83              1333.86             39.23
Slashdot   1746.39             47.21              3136.24             81.75
HepPh      4124.96             138.74             6460.35             224.02

TABLE IV
IMPACT OF VERTEX-FILTERING
Dataset    $|\bar{V}|$   $|\bar{E}|$   $|\bar{V}_f|$   $|\bar{E}_f|$   Avg. $|SP_{(i,j);p}|$
CAITN      746           19,132        368             9,560           11.17
CARN       1,268         27,338        539             12,057          6.02
Enron      1,073         29,418        471             13,715          14.78
Slashdot   1,782         293,877       936             198,429         43.16
HepPh      3,832         1,718,753     1,297           646,396         55.31

The shortest path quad tree scheme is proposed in [20]; it pre-computes the shortest paths between every two vertices in a graph and organizes them in a quad tree. This method is not applicable to the optimal path problem in multi-cost graphs: because the score functions given by different users may differ, a quad tree constructed for one score function cannot answer optimal path queries under other functions. Xiao et al. [23] propose the concept of compact BFS-trees, where the BFS-trees are compressed by exploiting the symmetry property of graphs. Wei [22] proposes a novel method named TEDI, which utilizes tree decomposition theory to build an index and process shortest path queries. Cheng et al. [3] propose a disk-based index for single-source shortest path or distance queries. This index is a tree-structured index constructed based on the concept of vertex cover, and it is I/O-efficient when the input graph is too large to fit in main memory. Rice and Tsotras [18] introduce a new shortest path query type in which dynamic constraints may be placed on the allowable set of edges that can appear on a valid shortest path. They formalize this problem as a specific variant of formal language constrained shortest path problems, and they propose generalized shortest path queries in follow-up work [19]. Zhu et al. [27] present the AH index to narrow the gap between theory and practice. Landmark-based techniques have been widely used to estimate the distance between two vertices in a graph in many applications [8], [17], [2]. Goldberg and Harrelson [8] choose some anchor vertices called landmarks and pre-compute, for each vertex, its graph distance to all anchor vertices. A distance vector is created from these distances, and a lower bound derived from the distance vector can be used by the A* algorithm to guide the shortest path search. Qiao et al.
in [17] propose a query-dependent local landmark scheme, which identifies a local landmark close to the specific query nodes and provides a more accurate distance estimation than traditional global landmark approaches. The latest work [2] proposes a new exact method based on distance-aware 2-hop cover for distance queries. All the above methods rely on the following property of shortest paths: any sub-path of a shortest path is also a shortest path. Therefore, they only need to maintain the shortest paths among the vertices in the index and compute a shortest path by concatenating sub-shortest paths from the index. However, this property does not hold in multi-cost graphs, so these methods cannot solve the optimal path problem in multi-cost graphs. In recent years, several works [13], [5], [11], [4], [16], [12] have studied the multi-criteria shortest path (MCSP) problem on multi-cost graphs: given a starting vertex and an ending vertex, find all the skyline paths from the starting vertex to the ending vertex. Most existing works on MCSP are heuristic algorithms based on the following property: any sub-path of a skyline path is also a skyline path. To compute a skyline path $p$, these methods need to expand all the skyline paths from the starting vertex to $v$ for every vertex $v \in p$. The difference between MCSP and our problem is that MCSP finds all skyline paths, whereas our problem is to find only the one path that is optimal under the score function. Obviously, the skyline paths form a candidate set for the optimal path; however, finding an optimal path by exhausting all skyline paths is too expensive. Moreover, these works do not develop any index technique to facilitate skyline path querying. Mouratidis et al. [15] study skyline queries and top-k queries on multi-cost transportation networks. For any vertex $v$ in the graph, all the distances on the different dimensions between $v$ and the query point form the cost vector of $v$.
The definition of the cost vector in this work is different from ours, and its query results are points rather than paths. Therefore, the methods in this work cannot be applied to the optimal path problem in this paper.

VII. CONCLUSION

In this paper, we study the problem of finding the optimal route in multi-cost networks. We prove that this problem is NP-hard and propose a novel partition-based index with contour skyline techniques. We also propose a vertex-filtering algorithm to facilitate query processing. We conduct extensive experiments, and the experimental results validate the efficiency of our method.

REFERENCES
[1] A. Abou-Rjeili and G. Karypis. Multilevel algorithms for partitioning power-law graphs. In IPDPS, 2006.
[2] T. Akiba, Y. Iwata, and Y. Yoshida. Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In SIGMOD Conference, pages 349–360, 2013.
[3] J. Cheng, Y. Ke, S. Chu, and C. Cheng. Efficient processing of distance queries in large graphs: A vertex cover approach. In SIGMOD, 2012.
[4] L. Chomatek. Genetic diversity in the multiobjective optimization of paths in graphs. In Information Systems Architecture and Technology: Proceedings of 36th International Conference on Information Systems Architecture and Technology - ISAT 2015 - Part IV, Karpacz, Poland, September 20-22, 2015, pages 123–136, 2015.
[5] D. Delling and D. Wagner. Pareto paths with SHARC. In Proceedings of the 8th International Symposium on Experimental Algorithms (SEA'09), pages 125–136, Dortmund, Germany, 2009. Springer Verlag.
[6] I. S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell., 29(11):1944–1957, 2007.
[7] M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
[8] A. V. Goldberg and C. Harrelson. Computing the shortest path: A* search meets graph theory. In SODA, pages 156–165, 2005.
[9] T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci., 38:293–306, 1985.
[10] N. Ilich and S. P. Simonovic. An evolution program for non-linear transportation problems. Journal of Heuristics, 7:145–168, 2001.
[11] L. Mandow and D. J. Perez. A new approach to multiobjective A* search. In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI'05), pages 218–223, Edinburgh, Scotland, 2005. Morgan Kaufmann Publishers.
[12] L. Mandow and J. Pérez-de-la-Cruz. Multiobjective A* search with consistent heuristics. J. ACM, 57(5):27:1–27:25, 2010.
[13] E. Q. V. Martins. On a multicriteria shortest path problem. European Journal of Operational Research, 16(2):236–245, 1984.
[14] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. Wiley-Interscience, 2006.
[15] K. Mouratidis, Y. Lin, and M. L. Yiu. Preference queries in large multi-cost transportation networks. In ICDE, pages 533–544, 2010.
[16] F. J. Pulido, L. Mandow, and J. Pérez-de-la-Cruz. Multiobjective shortest path problems with lexicographic goal-based preferences. European Journal of Operational Research, 239(1):89–101, 2014.
[17] M. Qiao, H. Cheng, L. Chang, and J. X. Yu. Approximate shortest distance computing: A query-dependent local landmark scheme. In ICDE, 2012.
[18] M. N. Rice and V. J. Tsotras. Graph indexing of road networks for shortest path queries with label restrictions. PVLDB, 4(2):69–80, 2010.
[19] M. N. Rice and V. J. Tsotras. Engineering generalized shortest path queries. In ICDE, pages 949–960, 2013.
[20] H. Samet, J. Sankaranarayanan, and H. Alborzi. Scalable network distance browsing in spatial databases. In SIGMOD, pages 43–54, 2008.
[21] C. M. Shetty. A solution to the transportation problem with nonlinear costs. Operations Research, 7(5):571–580, 1959.
[22] F. Wei. TEDI: Efficient shortest path query answering on graphs. In SIGMOD, pages 99–110, 2010.
[23] Y. Xiao, W. Wu, J. Pei, W. Wang, and Z. He. Efficiently indexing shortest paths by exploiting symmetry in graphs. In EDBT, pages 493–504, 2009.
[24] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger. SCAN: A structural clustering algorithm for networks. In KDD, pages 824–833, 2007.
[25] Y. Yang, J. X. Yu, H. Gao, and J. Li. Finding the optimal path over multi-cost graphs. In CIKM, pages 2124–2128. ACM, 2012.
[26] R. Zhong, G. Li, K. Tan, and L. Zhou. G-tree: An efficient index for KNN search on road networks. In CIKM, pages 39–48, 2013.
[27] A. D. Zhu, H. Ma, X. Xiao, S. Luo, Y. Tang, and S. Zhou. Shortest path and distance queries on road networks: Towards bridging theory and practice. In