Optimal Trajectories of a UAV Base Station Using Hamilton-Jacobi Equations
Marceau Coupechoux, Jérôme Darbon, Jean-Marc Kélif, Marc Sigelle
Marceau Coupechoux*, Jérôme Darbon†, Jean-Marc Kélif‡, and Marc Sigelle§
*LTCI, Telecom Paris, Institut Polytechnique de Paris, France; †Brown University, US; ‡Orange Labs, France; §On leave from Telecom Paris, France
Email: [email protected], jerome [email protected], [email protected], [email protected]
Abstract
We consider the problem of optimizing the trajectory of an Unmanned Aerial Vehicle (UAV). Assuming a traffic intensity map of users to be served, the UAV must travel from a given initial location to a final position within a given duration and serves the traffic on its way. The problem consists in finding the optimal trajectory that minimizes a certain cost depending on the velocity and on the amount of served traffic. We formulate the problem using the framework of Lagrangian mechanics. We derive closed-form formulas for the optimal trajectory when the traffic intensity is quadratic (single-phase) using Hamilton-Jacobi equations. When the traffic intensity is bi-phase, i.e., made of two quadratics, we provide necessary conditions of optimality that allow us to propose a gradient-based algorithm and a new algorithm based on the linear control properties of the quadratic model. These two solutions are of very low complexity because they rely on fast convergence numerical schemes and closed-form formulas. These two approaches return a trajectory satisfying the necessary conditions of optimality. At last, we propose a data processing procedure based on a modified K-means algorithm to derive a bi-phase model and an optimal trajectory simulation from real traffic data.
I. INTRODUCTION
Unmanned Aerial Vehicles (UAV) are expected to play an increasing role in future wireless networks [1]. UAVs may be deployed in an ad hoc manner when the traditional cellular infrastructure is missing. They can serve as relays to reach distant users outside the coverage of wireless networks. They also may be used to disseminate data to ground stations or collect information from sensors. In this paper, we address one of the envisioned use cases for UAV-aided wireless communications, which relates to cellular network offloading in highly crowded areas [1]. More specifically, we focus on the path planning problem, or trajectory optimization problem, that consists in finding an optimal path for a UAV Base Station (BS) that minimizes a certain cost depending on the velocity and on the amount of served traffic. Our approach is based on the Lagrangian mechanics framework and the use of Hamilton-Jacobi (HJ) equations.

J. Darbon is supported by NSF DMS-1820821. M. Coupechoux has performed his work at LINCS laboratory.

A. Related Work
UAV trajectory optimization for networks has been tackled maybe for the first time in [2]. The model consists in a UAV flying over a sensor network from which it has to collect some data. The UAV can learn from previous experience, which is not assumed in our study. The problem of optimally deploying UAV BSs to serve traffic demand has been addressed in the literature by considering static UAV BSs or relays, see e.g. [3], [4]. The goal is to optimally position the UAV so as to maximize the data rate with ground stations or the number of served users. In these works, the notion of trajectory is either ignored or restricted to be circular or linear. In robotics and autonomous systems, trajectory optimization is known as path planning. In this field, there are classical methods like Cell Decomposition, the Potential Field Method, Probabilistic Road Maps, or heuristic approaches, e.g. bio-inspired algorithms [5]. The authors of [6] have capitalized on this literature and proposed a path planning algorithm for drone BSs based on the A* algorithm. The main goal of these papers is to reach a destination while avoiding obstacles. In our work, we intend to minimize a certain cost function along the trajectory by controlling the velocity of the UAV. This goal is studied in optimal control theory [7] and is applied for example in aircraft trajectory planning [8]. Most numerical methods in control theory can be classified into direct and indirect methods. In direct methods, the problem is transformed into a non-linear programming problem using discretized times, locations and controls. Direct methods are heavily applied in a series of recent publications in the field of UAV-aided communications, see e.g. [9]–[12]. The formulated problems are usually non-convex. The standard approach is hence to rely on Successive Convex Approximation (SCA), which iteratively minimizes a sequence of approximate convex functions.
SCA is known to converge to a Karush-Kuhn-Tucker solution under mild conditions [13], but the quality of the solution may heavily depend on the initial guess. Here, simple heuristics or solutions to the Travelling Salesman Problem (TSP) or the Pickup-and-Delivery Problem (PDP) can be used for finding an initial trajectory [14]. With direct methods, because of the discretization, the differential equations and the constraints of the system are satisfied only at discrete points. This can lead to less accurate solutions than indirect methods, and the quality of the solution depends on the quantization step [15]. Although every iteration of SCA has a polynomial time complexity, the practical resolution time may dramatically increase with the quantization grid and the dimension of the problem. On the other hand, indirect approaches rely on considering the Hamilton-Jacobi Partial Differential Equation associated with the optimal control problem (see e.g. [16], [17, Chap. 10]). Several recent methods have been proposed to solve HJ Partial Differential Equations (PDEs) in high dimensions. These include max-plus algebra methods [18], [19], dynamic programming and reinforcement learning [20], tensor decomposition techniques [21], sparse grids [22], model order reduction [23], polynomial approximation [24], optimization methods [25]–[28] and neural networks [29]–[32]. In this paper, we consider certain indirect methods that provide analytical solutions for certain classes of optimal control problems, as we have shown in a preliminary study [33].
B. Contributions
Our contributions are the following:

• Problem Formulation: To the best of our knowledge, this is the first time, after our preliminary study [33], that the UAV BS trajectory problem is formulated using the formalism of Lagrangian mechanics and solved using Hamilton-Jacobi equations. This approach provides closed-form equations when the potential is quadratic and thus very low complexity solutions compared to existing solutions in the literature.

• Closed-form expression of the optimal trajectory with single-phase traffic intensity: When the traffic intensity map is made of a single hot spot or traffic hole, has a quadratic form (single phase), and is time-independent, closed-form expressions for the optimal trajectory are derived. It follows a hyperbola for a hot spot and corresponds to a repulsor in mechanics. For a traffic hole, the trajectory is on an ellipse and corresponds to the case of an attractor in mechanics.

• Characterization of the optimal solution in multi-phase traffic intensity: When the traffic map has several hot spots or traffic holes (multi-phase) whose regions are separated by interfaces and is time-independent, we derive necessary conditions to be fulfilled by the position and the instant at which the optimal trajectory crosses an interface (see Theorem 2).

• A gradient algorithm for bi-phase traffic: An in-depth analysis of convexity vs. non-convexity issues allows us to derive a gradient algorithm to solve the bi-phase problem (Algorithm 1). This algorithm finds a stationary point of the cost function. It has a complexity O(1) at every iteration, whereas iterations of the sequential convex optimization technique have polynomial time complexity.

• A new algorithm for the bi-phase optimization problem: A new algorithm, called the B-algorithm (Algorithm 2), is proposed based on the linear control properties of the quadratic model. This algorithm relies on a bisection scheme, the complexity of which is proportional to the logarithm of the desired precision, and on closed-form formulas.

• A data processing procedure: We propose a method to pre-process real measured traffic data in order to derive a bi-phase quadratic model. This procedure is based on smoothing steps followed by a modified K-means algorithm adapted to our quadratic model (Algorithm 3). In our numerical experiments, the optimal trajectory is computed in a region where real traffic data is available [34].

The paper is structured as follows. In Section II, we give the system model and its interpretation in terms of Lagrangian mechanics, formulate the problem, and give preliminary results. Section III is devoted to the characterization of the optimal trajectories for both the single- and bi-phase cases. Section IV presents our algorithms, Section V our data processing procedure and numerical experiments. Section VI concludes the paper.
Notations:
The usual Euclidean scalar product between x ∈ Rⁿ and y ∈ Rⁿ is denoted by x·y. The Euclidean norm ‖x‖ of x ∈ Rⁿ is defined by ‖x‖ := √(x·x). The set of matrices with m rows, n columns and real entries is denoted by M_{m,n}(R). The transpose of A ∈ M_{m,n}(R) is denoted by A† ∈ M_{n,m}(R). We classically identify M_{m,1}(R) and M_{1,n}(R) with column vectors of Rᵐ and row vectors of Rⁿ, respectively. Let f : Rⁿ × Rᵐ → R be defined by f(x, y), where x = (x₁, ..., x_n) ∈ Rⁿ and y = (y₁, ..., y_m) ∈ Rᵐ. Let a ∈ Rⁿ and b ∈ Rᵐ. We denote by ∂f/∂x_i(a, b) the partial derivative of f with respect to the variable x_i at (a, b) ∈ Rⁿ × Rᵐ. We also introduce the notations ∇_x f(a, b) = (∂f/∂x₁(a, b), ..., ∂f/∂x_n(a, b)) ∈ Rⁿ and ∇_y f(a, b) = (∂f/∂y₁(a, b), ..., ∂f/∂y_m(a, b)) ∈ Rᵐ. We also consider the following notation for partial Hessian matrices:

M_{m+n,m+n}(R) ∋ ∇²f(a, b) = ( ∇²_{x,x} f(a, b)  ∇²_{x,y} f(a, b) ; ∇²_{y,x} f(a, b)  ∇²_{y,y} f(a, b) )  (1)

where ∇²_{x,x} f(a, b) ∈ M_{n,n}(R) is the matrix with entries ∂²f/∂x_i∂x_j(a, b), and Id_n denotes the identity matrix of M_{n,n}(R). We shall see that the value function S : R × R² × R × R² → R will play a fundamental role in this paper. We use the notations (T₀, X₀, T₁, X₁) for the arguments of S, and therefore the partial derivatives of S at (t₀, x₀, t₁, x₁) ∈ R × R² × R × R² are denoted as follows: ∂S/∂T₀(t₀, x₀, t₁, x₁), ∇_{X₀}S(t₀, x₀, t₁, x₁), ∂S/∂T₁(t₀, x₀, t₁, x₁) and ∇_{X₁}S(t₀, x₀, t₁, x₁).

Fig. 1: A UAV Base Station travels from z₀ at t₀ to z_T at T and serves a user traffic characterized by its intensity.

II. SYSTEM MODEL AND LAGRANGIAN MECHANICS INTERPRETATION
A. System Model
We consider a network area characterized by a traffic density at position z and time t. We intend to control the trajectory and the velocity of a UAV base station, which is located at z₀ ≜ z(t₀) at t₀ and shall reach a destination z_T ≜ z(T) at T, with the aim of minimizing a cost determined by the velocity and the traffic, defined hereafter by (2). At (t, z), we assume that the UAV BS is able to cover an area from which it can serve users (see Figure 1). The velocity of the UAV BS induces an energy cost. In this model, we control the velocity vector a of the UAV BS. The general form of the cost function is as follows:

L(t, z, a) = ½ K‖a‖² − u(t, z)  (2)

where the first term is a cost related to the velocity of the vehicle (K is a positive constant) and ‖·‖ denotes the usual Euclidean norm. The higher the speed, the higher the energy cost. The second term is a user traffic intensity, i.e., the amount of traffic served by the UAV BS at (t, z). Note that a non-zero energy at null speed can be incorporated in the model by adding a constant. Without loss of generality, we assume that this constant is null.

Let S(t₀, z₀, T, z_T) be the minimal total cost along any trajectory between z₀ at t₀ and z_T at T (also called the action in mechanics or the value function in control theory). Let us define Ω(t₀, T) as the space of absolutely continuous functions from [t₀; T] to R². Our problem can now be formulated as follows:

S(t₀, z₀, T, z_T) = min_{a ∈ Ω(t₀,T)} ∫_{t₀}^{T} L(s, z(s), a(s)) ds + J(z(T))  (3)

where dz/dt(t) = a(t), z(t₀) = z₀, and J is the terminal cost defined by J(z) = 0 if z = z_T and J(z) = +∞ otherwise. For simplicity reasons, we assume the existence and uniqueness of the optimal control a*(t) in (3) and denote by z*(t) the associated optimal trajectory. Note that in a traffic map symmetric with respect to z₀ and z_T, uniqueness is not guaranteed.
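The action (3) can also be approximated numerically by discretizing time. The sketch below is illustrative only: the mass K = 1 and the quadratic hot-spot profile u are hypothetical choices, not taken from the paper's data; it evaluates the Lagrangian (2) along a sampled trajectory with a midpoint rule.

```python
import numpy as np

# Hypothetical parameters for illustration: unit mass and a quadratic
# hot-spot traffic profile u(z) = u2 * ||z||^2 with u2 = -0.5.
K = 1.0

def u(z):
    return -0.5 * float(z @ z)

def trajectory_cost(zs, t0, T):
    """Midpoint-rule approximation of the action integral (3) for a sampled
    trajectory zs[k] = z(t_k) on a uniform time grid over [t0, T]."""
    n = len(zs) - 1
    dt = (T - t0) / n
    cost = 0.0
    for k in range(n):
        a = (zs[k + 1] - zs[k]) / dt                 # finite-difference velocity
        z_mid = 0.5 * (zs[k] + zs[k + 1])            # midpoint of the segment
        cost += (0.5 * K * float(a @ a) - u(z_mid)) * dt   # Lagrangian (2)
    return cost
```

For instance, for the straight line from (0, 0) to (1, 0) over a unit duration, the exact action under these assumed parameters is 1/2 (velocity term) + 1/6 (traffic term), which the sum above recovers up to discretization error.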
B. Preliminary Results From Lagrangian Mechanics
We provide in this section important results from Lagrangian mechanics for the convenience of the reader.
Definition 1 (Impulsion). The impulsion function is defined as

p(t, z, a) := ∇_a L(t, z, a).  (4)

In the Newtonian classical framework that is used here (see (2)), the impulsion is the product of the particle mass by its velocity (hence the standard term "impulsion").

Definition 2. The Hamiltonian function is defined as

H(t, z, p) := max_{a ∈ R²} { p·a − L(t, z, a) }.  (5)

Lemma 1 (Euler-Lagrange Equations). Along the optimal trajectory z*(t) that starts from z₀ at t₀ and ends at z_T at T, we have

d/dt ∇_a L(t, z*(t), a*(t)) = ∇_z L(t, z*(t), a*(t))  (6)

or equivalently

dp/dt(t, z*(t), a*(t)) = ∇_z L(t, z*(t), a*(t)).  (7)
See Appendix A.

The Euler-Lagrange equation is the first-order necessary condition for optimality and holds at every point of the optimal trajectory.
Lemma 2.
If the Lagrangian L(t, z, a) is time-independent and α-homogeneous in z and a for some α > 0, i.e., L(λz, λa) = |λ|^α L(z, a) for all λ ∈ R, then S given by (3) reads

S(t₀, z₀, T, z_T) = (1/α) [z·p]_{t₀}^{T} + J(z_T).  (8)
See Appendix B.
Lemma 3 (Hamilton-Jacobi) . Along the optimal trajectory, we have for t ∈ ( t ; T ) using previous notations ∂S∂T ( t, z ∗ ( t ) , T, z T ) = H ( t, z ∗ ( t ) , − p ∗ ( t )) , (9) ∂S∂T ( t , z , t, z ∗ ( t )) = − H ( t, z ∗ ( t ) , p ∗ ( t )) , (10) where p ∗ ( t ) = ∇ a L ( t, z ∗ ( t ) , a ∗ ( t )) = ∇ X S ( t , z ∗ ( t ) , T, z T ) . (11) Proof.
See Appendix C.

From now on, we assume that the Lagrangian is time-independent, i.e., L(t, z, a) = L(z, a), and is an even function in a, i.e., L(z, −a) = L(z, a). A direct consequence is that H is time-independent and is an even function in p, i.e., we write H(t, z, p) = H(z, p) and H(z, −p) = H(z, p).

III. OPTIMAL TRAJECTORY
In this section, we characterize the optimal trajectory when the traffic intensity is a quadratic form and also when it is made of two regions of quadratic form separated by an interface. We call these two cases single-phase and bi-phase intensities, respectively. Both cases satisfy our assumptions on the Lagrangian with α = 2.

S(t₀, z₀, T, z_T) = Kω / (2 sinh(ω(T − t₀))) · ( (‖z₀‖² + ‖z_T‖²) cosh(ω(T − t₀)) − 2 z₀·z_T ) + J(z_T)  (14)

S(t₀, z₀, T, z_T) = Kω / (2 sin(ω(T − t₀))) · ( (‖z₀‖² + ‖z_T‖²) cos(ω(T − t₀)) − 2 z₀·z_T ) + J(z_T)  (17)

A. Single-Phase Optimal Trajectory
Assume that the traffic intensity is of the form u(z) = u₂‖z‖². When u₂ > 0, this function models a traffic hole at z = 0. When u₂ < 0, it models a traffic hot spot at z = 0. We disregard the case u₂ = 0 because it corresponds to a constant traffic intensity, which is not of interest in this paper. Thus the cost function has the following form:

L(z, a) = ½ K‖a‖² − u₂‖z‖².  (12)

Note that

p(z, a) = ∇_a L(z, a) = Ka.  (13)
1) Trajectory Equation:
In the single-phase case, we have a closed-form expression of the trajectory.
Theorem 1. If u₂ < 0, the cost function is given by (14), the optimal trajectory is

z*(t) = [ z_T sinh(ω(t − t₀)) + z₀ sinh(ω(T − t)) ] / sinh(ω(T − t₀))  (15)

and the control is given by

a*(t) = ω [ z_T cosh(ω(t − t₀)) − z₀ cosh(ω(T − t)) ] / sinh(ω(T − t₀))  (16)

where ω = √(−2u₂/K).

If u₂ > 0, the cost function is given by (17), the optimal trajectory is

z*(t) = [ z_T sin(ω(t − t₀)) + z₀ sin(ω(T − t)) ] / sin(ω(T − t₀))  (18)

and the control is given by

a*(t) = ω [ z_T cos(ω(t − t₀)) − z₀ cos(ω(T − t)) ] / sin(ω(T − t₀))  (19)

where ω = √(2u₂/K).

Proof. See Appendix D.
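For the hot-spot case (u₂ < 0), formulas (15)-(16) can be evaluated directly. The sketch below is a minimal implementation assuming a hot spot centered at the origin; the parameter values used in any test are hypothetical.

```python
import numpy as np

def hotspot_trajectory(t, t0, T, z0, zT, K, u2):
    """Closed-form optimal trajectory (15) and control (16) for a single
    quadratic hot spot u(z) = u2 * ||z||^2 with u2 < 0, centered at the origin.
    Here omega = sqrt(-2*u2/K), consistent with the Euler-Lagrange
    equation K z'' = -2 u2 z derived from the Lagrangian (12)."""
    w = np.sqrt(-2.0 * u2 / K)
    s = np.sinh(w * (T - t0))
    z = (zT * np.sinh(w * (t - t0)) + z0 * np.sinh(w * (T - t))) / s   # (15)
    a = w * (zT * np.cosh(w * (t - t0)) - z0 * np.cosh(w * (T - t))) / s  # (16)
    return z, a
```

A quick sanity check is that z*(t₀) = z₀ and z*(T) = z_T, and that a*(t) is the time derivative of z*(t), which can be verified by finite differences.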
Corollary 1.
If the user traffic intensity is of the form u(t, z) = u₂‖z‖² + u₁ z·e + u₀ with u₀, u₁ ∈ R and e ∈ R², then define z̃ = z + (u₁/(2u₂)) e, z̃₀ = z₀ + (u₁/(2u₂)) e, z̃_T = z_T + (u₁/(2u₂)) e, and the trajectories given in Theorem 1 are valid by replacing z, z₀, z_T by z̃, z̃₀, z̃_T, respectively. The cost function becomes S(t₀, z₀, T, z_T) = (1/α) [z̃·p]_{t₀}^{T} + J(z_T) − ũ₀(T − t₀), where ũ₀ = u₀ − u₁²‖e‖²/(4u₂) is the constant term obtained after completing the square.
If the user traffic intensity is of the form u(t, z) = Σᵢ uᵢ‖z − zᵢ‖² with Σᵢ uᵢ ≠ 0, then u(t, z) = (Σᵢ uᵢ)‖z − z_b‖² + Σᵢ uᵢ‖zᵢ − z_b‖² with z_b = Σᵢ uᵢ zᵢ / Σᵢ uᵢ. Define z̃ = z − z_b, z̃₀ = z₀ − z_b, z̃_T = z_T − z_b, ũ₂ = Σᵢ uᵢ, and the trajectories given in Theorem 1 are valid by replacing z, z₀, z_T, u₂ by z̃, z̃₀, z̃_T, ũ₂, respectively. The system is thus equivalent to the one assumed in Theorem 1 by changing the origin of the locations to the barycentre z_b of the zᵢ.
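The reduction of Corollary 2 is a one-liner in practice. The sketch below (illustrative, with hypothetical weights and centers in the test) computes the barycentre z_b, the aggregated coefficient ũ₂, and the constant term, and the algebraic identity can be checked at any point z.

```python
import numpy as np

def barycentre_reduction(us, zs):
    """Reduce a sum of quadratics sum_i u_i ||z - z_i||^2 (Corollary 2) to the
    single-phase form u_tilde * ||z - z_b||^2 + const, with z_b the weighted
    barycentre of the centers z_i. Requires sum_i u_i != 0."""
    us = np.asarray(us, dtype=float)
    zs = np.asarray(zs, dtype=float)
    u_tilde = us.sum()
    assert u_tilde != 0.0
    z_b = (us[:, None] * zs).sum(axis=0) / u_tilde          # barycentre of the z_i
    const = float((us * ((zs - z_b) ** 2).sum(axis=1)).sum())
    return u_tilde, z_b, const
```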
2) Traffic Hot Spot, Traffic Hole:
We assume that there is a hot spot or a traffic hole located at z_h and that the traffic intensity is of the form u(t, z) = u₂‖z − z_h‖² + u₀ = u₂‖z‖² − 2u₂ z·z_h + u₂‖z_h‖² + u₀. We can apply Corollary 1 with u₁ e = −2u₂ z_h. Figure 2 shows optimal trajectories when z_h is a hot spot, i.e., for u₂ < 0, and different values of K. The starting point is z₀ and the destination is z_T. When K increases, the velocity cost increases and the trajectories tend to the straight line between z₀ and z_T, which minimizes the speed. When K is small, the UAV can go fast to z_h, reduce its speed in the vicinity of the hot spot, and then go fast to the destination (in order to decrease the cost function (2) by increasing its traffic contribution). Figure 3 shows optimal trajectories when z_h is a traffic hole, i.e., for u₂ > 0. In Figure 3a, T is smaller than the period of the ellipse, i.e., 2π/ω > T. When K decreases, the UAV can spend more time in the areas of higher traffic intensity. In Figure 3b, T is larger than the period. In this case, the trajectory follows one period of the ellipse whose equation is given by (18), plus a part of the same ellipse from z₀ to z_T.

B. Multi-Phase Trajectory Characterization
We now consider a traffic intensity (or potential) consisting of two quadratic functions separated by an interface I of equal potentials delimiting two regions 1 and 2. The interface is defined by an equation f(z) = C, where C is a constant and f is a differentiable function.

Fig. 2: Traffic hot spot (u₂ < 0). Circles are iso-traffic levels.

Fig. 3: Traffic hole (u₂ > 0). Circles are iso-traffic levels. (a) T is smaller than the ellipse period. (b) T is larger than the ellipse period.

We assume that the optimal trajectory crosses the interface only once, at position ξ and time τ. The impulsion p* is defined everywhere on the optimal trajectory between (t₀, z₀) and (T, z_T). The following notations will be used in the sequel:

p₋ = p*(τ⁻) = lim_{s→τ, s<τ} p*(s),  p₊ = p*(τ⁺) = lim_{s→τ, s>τ} p*(s),
H₋ = H(ξ, p*(τ⁻)),  H₊ = H(ξ, p*(τ⁺)).  (20)
Theorem 2. The location and time (ξ, τ) of interface crossing are characterized by the following equations:

p₋ − p₊ − μ ∇_z f(ξ) = 0  (21)

H₊ − H₋ = 0  (22)

f(ξ) = C  (23)

for some Lagrange multiplier μ ∈ R.
See Appendix E.

Equation (22) expresses the fact that the energy is conserved when crossing the interface. One can show that the energy is actually conserved along the whole trajectory. Equation (21) is related to the conservation of the tangential component of the impulsion at the interface. Equation (23) is the interface equation at ξ. One can show that under the assumption of equal potentials on the interface, the kinetic energy, the impulsion, and the velocity vector are conserved across the interface.

C. Multi-phase optimal trajectory uniqueness and value function convexity issues
Looking for uniqueness or non-uniqueness of optimal multi-phase trajectories leads naturally to the study of the convexity of the multi-phase total cost. As stated in Appendix E, this cost is additive and satisfies the dynamic programming principle, which reads

S̄(Θ = τ, Ξ = ξ | t₀, z₀, T, z_T) = S₁(t₀, z₀, τ, ξ) + S₂(τ, ξ, T, z_T)  (24)

where S₁ and S₂ are themselves minimal. Hamilton-Jacobi equations applied to each cost component allow us to study their first- and second-order differential properties, such as convexity. For instance, in the chosen quadratic model, each single-phase (minimal) total cost S₁(t₀, z₀, τ, ξ), S₂(τ, ξ, T, z_T) given by Corollary 1 and (14) is convex with respect to the spatial position ξ (since it is a positive quadratic form with respect to the spatial coordinates), but not necessarily so with respect to the interface crossing time τ.

We consider in what follows a general form of the single-phase value function between (t₁, x₁) and (t₂, x₂), denoted by S(t₁, x₁, t₂, x₂). We also denote by ω the pulsation, by φ = ω(t₂ − t₁) the temporal phase, and by p₁, p₂ the initial and final impulsions as derived from formula (16).

Fig. 4: Physical interpretation of local convexity vs. non-convexity of the single-phase cost: the initial and final impulsions (p₁, p₂) form an obtuse angle ⇒ the value function is non-convex.

Theorem 3. i) The Hessian of the single-phase cost with respect to any joint variable ψᵢ = (Tᵢ, Xᵢ), i = 1, 2, is

H(ψᵢ) = ∇²_{Tᵢ,Xᵢ} S(t₁, x₁, t₂, x₂) = ( α  Πᵢ† ; Πᵢ  Kg Id₂ )

with α = ∂²S/∂Tᵢ² = ω p₁·p₂ / (K sinh φ),  g = ω coth φ > 0,  Πᵢ = −ω pⱼ / sinh φ ∈ R²  (25)

where pᵢ is the impulsion at time Tᵢ, i.e., at extremity Xᵢ (i = 1, 2 and j = 3 − i).

ii) At most one eigenvalue of H(ψᵢ) can be negative, and α < 0 is a sufficient condition for this to hold, namely implying local non-convexity of the (single-phase) value function.

iii) The double-phase Hessian enjoys a similar structure as (25) and writes, with respect to the variable Ψ = (Θ, Ξ),

H̄(Θ = τ, Ξ = ξ) = H₁(χ₁) + H₂(χ₂) = ( ᾱ  Π̄† ; Π̄  Kh Id₂ )  (26a)

with ᾱ = ∂²S₁/∂τ²(t₀, z₀, τ, ξ) + ∂²S₂/∂τ²(τ, ξ, T, z_T) (cf. (25)),
h = ω₁ coth φ₁ + ω₂ coth φ₂,
Π̄ = −[ ω₁ p*(t₀)/sinh φ₁ + ω₂ p*(T)/sinh φ₂ ],
φ₁ = ω₁(τ − t₀),  φ₂ = ω₂(T − τ),  (26b)

and may thus be non-convex as well.

Proof. See Appendix F, subsections F1 and F2.

In the single-phase case, the sufficient condition ii), i.e., α < 0, supports first a physical interpretation (see Fig. 4) and also a geometrical one (see next Theorem 4 and Fig. 5).

Fig. 5: Geometrical interpretation of local non-convexity: long time and long distance to the hotspot.
Theorem 4.
Let u = x₁ + x₂ and v = x₁ − x₂. (Note that ‖u‖/2 is the distance from the hotspot to the midpoint of x₁ and x₂, and ‖v‖/2 is the half-distance between x₁ and x₂.) Then the following sufficient condition holds:

‖v‖/‖u‖ < (cosh φ − 1)/(1 + cosh φ) = tanh²(φ/2)  ⇒  α = ∂²S/∂Tᵢ² < 0.

Proof: We combine the formulas (53) and (54) in (58) to obtain

α = (ω/K) · p₁·p₂ / sinh φ = (Kω³ / sinh³φ) [ (‖x₁‖² + ‖x₂‖²) cosh φ − x₁·x₂ (1 + cosh²φ) ]  (27)

which can be rewritten as

α = (Kω³ / (4 sinh³φ)) [ −‖u‖² (1 − cosh φ)² + ‖v‖² (1 + cosh φ)² ].

This non-convexity condition for the single-phase value function interprets as: long phase (tanh²(φ/2) close to 1) and long distances from the hotspot z_h to both the initial and final positions x₁, x₂, relative to their mutual distance (‖u‖/‖v‖ ≫ 1). Several preliminary simulations show that when this case happens for two phases, the total value function is indeed non-convex and several (local) optimal solutions may indeed exist.

IV. ALGORITHMS
The previous propositions allow us to propose several numerical algorithms seeking optimal trajectories. In this section, we present two algorithms with this goal: a gradient descent algorithm and a bisection search method based on the linear control property (16).
A. A Gradient Descent Algorithm: GRAD-ALGO

In this section, we propose GRAD-ALGO (Algorithm 1), which is an alternated optimization-based algorithm relying on the following procedures.

a) Procedure for seeking an optimal ξ given a fixed τ: From the Hamilton-Jacobi equations (Lemma 3 and Appendix C), the gradient of the total cost function with respect to ξ is p*(τ⁻) − p*(τ⁺). Equation (13) says that in the Newtonian framework the impulsion is proportional to the control variable a. Equation (16) says that in the quadratic model the velocity vector is, at any time, a linear combination of the centered initial and final positions. The searched gradient then appears to be an affine function of ξ, which reads

p*(τ⁻) − p*(τ⁺) = ∇_{X₁} S₁(t₀, z₀, τ, ξ) + ∇_{X₀} S₂(τ, ξ, T, z_T) = Kh (ξ − B).  (28)

The scalar h and the point B (where the spatial gradient cancels at fixed τ, i.e., p*(τ⁻) = p*(τ⁺)) verify

h = ω₁ coth(ω₁(τ − t₀)) + ω₂ coth(ω₂(T − τ)),  (29)

B = (1/h) [ ω₁ z_h1 coth(ω₁(τ − t₀)) + ω₂ z_h2 coth(ω₂(T − τ)) + ω₁ (z₀ − z_h1)/sinh(ω₁(τ − t₀)) + ω₂ (z_T − z_h2)/sinh(ω₂(T − τ)) ].  (30)

The equation involving the Lagrange multiplier (21) now reads

Kh (ξ − B) − μ ∇_ξ f(ξ) = 0  (31)

and shows that the optimal location ξ* is the orthogonal projection of B on the interface I.

b) Procedure for seeking an optimal τ given a fixed ξ: We use the result of Theorem 2. As also shown by the Hamilton-Jacobi equations (Lemma 3 and Appendix C), the gradient of S̄ with respect to τ is given by H(ξ, p*(τ⁺)) − H(ξ, p*(τ⁻)). The Hamiltonians are easily computed in each phase by applying the classical Newton formula at a given location (z, t):

H(p, z, t) = ‖p‖²/(2K) + V(z).

We then update τ by using a simple gradient descent scheme (see Algorithm 1).

c) Stop criteria: The search procedure is performed until the impulsion and the Hamiltonian have converged with a given accuracy.
For this purpose, we consider the function g : (Rⁿ\{0}) × (Rⁿ\{0}) → R defined as follows:

g(x, y) = ‖x − y‖ / inf(‖x‖, ‖y‖)

which measures a relative "discrepancy" between x and y.
Algorithm 1 GRAD-ALGO: an uncoupled projected gradient descent algorithm
Input: precisions ε_p, ε_H; M_τ = number of gradient descent iterations on τ
Init: starting position ξ ← ξ₀, e.g., ξ₀ = (z₀ + z_T)/2; τ ← τ₀, e.g., τ₀ = (t₀ + T)/2
Output: (τ, ξ)
procedure TWO_PHASE(τ, ξ)
  compute the current bi-phase trajectory given by (t₀, z₀) → (τ, ξ) → (T, z_T)
  return (B, H₊, H₋, H̄) = (the B-point (30), the two phase Hamiltonians, and the total Hessian at the current location (τ, ξ), cf. (24), (26a), (26b))
end procedure
N_step ← 0
(B, H₊, H₋, H̄) ← TWO_PHASE(τ, ξ)
repeat
  ξ ← Proj_I(B)  (31)
  for m = 1, ..., M_τ do
    τ ← τ − (H₊ − H₋)/ᾱ with ᾱ = ∂²S̄/∂Θ²(τ, ξ)  (Newton descent wrt τ)
    (B, H₊, H₋, H̄) ← TWO_PHASE(τ, ξ)
  end for
  N_step ← N_step + 1
until STOP (32)

Now let us give ε_p and ε_H and define the stop criterion as follows:

STOP = ( g(p₊, p₋)/ε_p < 1 ) ∧ ( g(H₊, H₋)/ε_H < 1 ).  (32)
Algorithm 1: if ᾱ* = ∂²S̄/∂Θ²(τ*, ξ*) ≠ 0, then the convergence of the Newton descent on τ is quadratic (see [35] for instance). If ᾱ* = 0, say H(ξ, p*(τ⁺)) − H(ξ, p*(τ⁻)) ∼ (τ − τ*)^ν with ν ≥ 2, the convergence is linear (see [35] for instance). For safety we take M_τ = 10. The global complexity is O(N_step M_τ) and usually N_step = 2.
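The core of the ξ-update can be sketched in a few lines. The code below is an illustrative implementation of (29)-(31), assuming an affine interface f(z) = n·z = C (my simplifying assumption; the paper allows any differentiable f) and hypothetical hot-spot parameters in the test.

```python
import numpy as np

def B_point(tau, t0, T, z0, zT, zh1, zh2, w1, w2):
    """B point of (30) and scalar h of (29): for a fixed crossing time tau,
    the spatial gradient (28) of the bi-phase cost is K*h*(xi - B)."""
    c1 = 1.0 / np.tanh(w1 * (tau - t0))
    c2 = 1.0 / np.tanh(w2 * (T - tau))
    s1, s2 = np.sinh(w1 * (tau - t0)), np.sinh(w2 * (T - tau))
    h = w1 * c1 + w2 * c2
    B = (w1 * zh1 * c1 + w2 * zh2 * c2
         + w1 * (z0 - zh1) / s1 + w2 * (zT - zh2) / s2) / h
    return B, h

def project_on_line(z, n, C):
    """Orthogonal projection of z on a linear interface {x : n . x = C},
    i.e., the xi-update (31) when f is affine."""
    n = np.asarray(n, dtype=float) / np.linalg.norm(n)
    return z - (float(z @ n) - C) * n
```

A useful check is the limit property used later by the B-curve algorithm: B(τ) → z₀ as τ → t₀⁺.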
B-A
LGO
The B -curve algorithm aims to overcome the non-convexity issues developed previously. It proceedsas follows. Consider the B point defined in formula (30). We notice that for each τ ∈ [ t , T ] this point isdefined univocally knowing all parameters ( t , T, z , z T , z h ) and all (quadratic) traffic profiles so thatwe can see B in (30) as a function of τ , denoted B ( τ ) . It can be shown that this function τ (cid:55)→ B ( τ ) is continuous in the interval [ t , T ] and that lim τ → t +0 B ( τ ) = z , lim τ → T − B ( τ ) = z T . Assume now thatthe two optimal sub-trajectories are such that crossing time τ ∈ [ t , T ] and interface position ξ verify Algorithm 2 the B-curve bisection algorithm
Algorithm 2 The B-curve bisection algorithm
Input: initial/final time search interval [a, b] ⊆ [t₀, T]; precision ε_B fixed by the user (ε_B = 2⁻¹²)
Output: (ξ, τ)
Init: t₁ ← a, t₂ ← b; the algorithm only starts if IN_ZONE2(B(a)) ≠ IN_ZONE2(B(b))
procedure IN_ZONE2(z)
  uᵢ(z) = (time-stationary) traffic from hotspot z_hi at point z (i = 1, 2)
  return (u₂(z) > u₁(z))
end procedure
repeat
  x₁ ← B(t₁), x₂ ← B(t₂)
  τ ← (t₁ + t₂)/2, ξ ← B(τ)
  if IN_ZONE2(ξ) == IN_ZONE2(x₁) then
    t₁ ← τ
  else
    t₂ ← τ
  end if
  compute stop criterion STOP_B (34)
until STOP_B
ξ ← B(τ)

ξ = B(τ). Then by formula (28) the spatial gradient of S̄ at the point ξ is

p*(τ⁻) − p*(τ⁺) = 0.  (33)

This implies that the kinetic components of both Hamiltonians are equal at the interface. Now, since ξ belongs to the interface, both phase potentials (traffics) are equal by definition, so that the total Hamiltonian is also conserved at the interface: this is the optimality condition required with respect to the time τ. To summarize, in these conditions, both the Hamiltonian and the total impulsion are conserved at the interface I, i.e., the local optimality conditions hold for the total value function. It is also worth noticing that the related Lagrange multiplier appearing in (31) now simply vanishes: μ = 0. The proposed B-algorithm then consists in seeking the intersection of the B-curve B = {B(τ), τ ∈ [t₀, T]} with the interface I (Algorithm 2). For this, we first select a precision ε_B, then proceed by bisection and check the stopping criterion

STOP_B = ( |t₂ − t₁| / |b − a| < ε_B ).  (34)
Algorithm 2: if ε_B = 1/2^m, then the bisection algorithm converges in m iterations. Its complexity is thus O(log(1/ε_B)). In our experiments we chose ε_B = 2⁻¹² and 12 iterations are indeed sufficient to provide a trajectory with the optimality conditions holding at this (relative) precision.
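Algorithm 2 can be sketched directly from the B(τ) formula (30). The code below is an illustrative implementation, assuming two time-stationary quadratic hot-spot profiles uᵢ(z) = u2ᵢ ‖z − z_hi‖² (with negative coefficients for hot spots, so that IN_ZONE2 selects the region closer to z_h2); the configuration in the test is a hypothetical symmetric one, for which the crossing should occur at the mid-time and on the perpendicular bisector of the hotspots.

```python
import numpy as np

def b_algo(t0, T, z0, zT, zh1, zh2, w1, w2, u21, u22, eps_B=2 ** -12):
    """Bisection on the B-curve (Algorithm 2): find the crossing time tau at
    which B(tau) switches zone, i.e., crosses the equal-traffic interface.
    u21, u22 are the quadratic coefficients (negative for hot spots)."""
    def B(tau):                                   # the B point of (30)
        c1 = 1.0 / np.tanh(w1 * (tau - t0))
        c2 = 1.0 / np.tanh(w2 * (T - tau))
        s1, s2 = np.sinh(w1 * (tau - t0)), np.sinh(w2 * (T - tau))
        return (w1 * zh1 * c1 + w2 * zh2 * c2
                + w1 * (z0 - zh1) / s1 + w2 * (zT - zh2) / s2) / (w1 * c1 + w2 * c2)

    def in_zone2(z):                              # profile-2 traffic dominates at z
        return u22 * float((z - zh2) @ (z - zh2)) > u21 * float((z - zh1) @ (z - zh1))

    a, b = t0 + 1e-9, T - 1e-9                    # open interval: avoid sinh(0)
    t1, t2 = a, b
    assert in_zone2(B(a)) != in_zone2(B(b))       # bracketing condition
    while (t2 - t1) / (b - a) >= eps_B:           # stop criterion (34)
        tau = 0.5 * (t1 + t2)
        if in_zone2(B(tau)) == in_zone2(B(t1)):
            t1 = tau
        else:
            t2 = tau
    tau = 0.5 * (t1 + t2)
    return tau, B(tau)
```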
Algorithm 3 Data preprocessing
1: First smoothing: data aggregation
2: Second smoothing: LOWESS
3: Normalization
4: K-means with quadratic models

V. NUMERICAL EXPERIMENTS
A. From Measurements to Quadratic Profile
In this section, we explain how, from measured or estimated traffic load, we can derive a quadratic model that allows us to apply our framework. To illustrate the procedure, we extract data from the open data set presented in [34]. Traffic data (in number of bytes) has been collected from an operational cellular network in a medium-size city in China and is recorded for every base station and every hour. For our experiment, we extract a rectangular region [X_min, X_max] × [Y_min, Y_max], where X_min = 111, X_max = 111., Y_min = 13., Y_max = 13. are the minimum and maximum longitude and latitude, respectively (real figures have been anonymized). This corresponds approximately to a rectangle of km × km, with base stations having an average cell range of m. The traffic of the 22nd of August 2012 between 5 and 6 pm is illustrated in Figure 6a. We assume that this traffic is representative of the traffic intensity when the drone is launched. In order to fit this raw data to our model, we follow the pre-processing steps shown in Algorithm 3. The first smoothing consists in aggregating the traffic data on a grid with steps in both the longitude and latitude directions. The resulting elementary regions should correspond approximately to the drone coverage. The result is shown in Figure 6b. The data exhibits a very high variability, with very high peaks around a few locations. The second smoothing is a Locally Weighted Scatterplot Smoothing (LOWESS) [36]. We use here the Matlab function fit with the option "Lowess". The choice of the smoothing parameter α, i.e., the proportion of data points used for every local regression, has a decisive impact on the result. Increasing α has the effect of averaging out the different peaks. In our specific scenario, α = 0. yields Figure 7a with two local maxima. With α = 0., we obtain a single maximum. In step 3 of the pre-processing, the traffic is normalized, with no influence on the optimal trajectory.

The final pre-processing step is an adaptation of the classical K-means algorithm (see Algorithm 4) to fit quadratic models. Inputs are the data points obtained after the normalization, K_c the number of clusters (or hot spots), K_n the number of nearest neighbors, and M the number of iterations. Every cluster is associated with a quadratic function (in our case, we have K_c = 2). Every data point j is associated with a cluster and has a related label L_j in {1, ..., K_c}. For every data point, a list of nearest neighbors is built (step 4). An arbitrary initial labelization is chosen (step 6). The algorithm then proceeds by iterations (steps 7-16). At every iteration, if a point j has some neighbor with a different label (step 9), a best new label is found for j (steps 10-13) in terms of the quadratic error e_k. The error e_k measures the difference between the data points and the K_c-quadratic model, which fits a quadratic function to every cluster assuming that j has label k (steps 18-31). In Algorithm 4, KNN(K_n, X, Y) is a procedure that finds the K_n nearest neighbors of (X, Y) with respect to the Euclidean distance. We use the Matlab implementation knnsearch. NLLS(L) is a non-linear least squares method that fits the data points in L to a quadratic function of the form u₂‖z − z_h‖² + u₀. The Matlab implementation is based on a trust-region approach of the Levenberg-Marquardt algorithm.
Algorithm 4: Searching the K_n nearest neighbors of a data point using k-d trees takes O(K_n log J) on average and O(K_n J) in the worst case. Steps 3-5 thus have a complexity of O(K_n J log J) on average and O(K_n J^2) in the worst case. The initial labelization is a simple linear partitioning of the 2D space into K_c zones and is thus performed in O(J). In the main loop, there are at most O(M J K_c) calls to the function FIT. The Levenberg-Marquardt algorithm requires O(ε^{-2}) iterations to reach an ε-approximation of a stationary point of the objective function [35], [37]. The overall complexity of Algorithm 4 is thus O(K_n J^2 + M J K_c ε^{-2}).

The K-means partition obtained after M = 12 iterations is shown in Figure 7b. The final fits are shown in Figures 9a and 9b for K_c = 2 and K_c = 1, respectively (with the corresponding values of the smoothing parameter α). Figure 8 shows the quadratic error as a function of the number of iterations of the K-means algorithm for K_c = 1 and K_c = 2. The error is constant for K_c = 1, as there is only one iteration, which performs the non-linear least squares fitting for the single cluster. For K_c = 2, we distinguish two cases: K_n = 5 and K_n = ∞. In the former case, only the nearest neighbors of a data point are inspected to decide whether a relabelization should be performed. In the latter case, relabelization is systematically considered. Increasing K_n increases the complexity of every iteration but provides a faster convergence.

Algorithm 4: K-means with quadratic models
1: Input: K_c (number of clusters), J (number of measurement points), (X_j, Y_j, Z_j), j = 1, ..., J (coordinates and estimated traffic load Z_j at (X_j, Y_j)), K_n (number of nearest neighbors), M (number of iterations)
2: Output: z_hk, u_0k, u_1k, k = 1, ..., K_c (hot spot characteristics), e (quadratic error)
3: for j = 1, ..., J do
4:   K_j ← KNN(K_n, X_j, Y_j)
5: end for
6: L ← initial labelization
7: for m = 1, ..., M do
8:   for j = 1, ..., J do
9:     if ∃ j' ∈ K_j s.t. L_{j'} ≠ L_j then
10:      for k = 1, ..., K_c do
11:        e_k, z_hl, u_0l, u_1l, l = 1, ..., K_c ← FIT((X, Y, Z), j, L, k, K_c)
12:      end for
13:      L_j ← arg min_k e_k
14:    end if
15:  end for
16: end for
17: return z_hk, u_0k, u_1k, k = 1, ..., K_c, global quadratic error
18: procedure FIT((X, Y, Z), j, L, k, K_c)
19:   L_j ← k
20:   for l = 1, ..., K_c do
21:     L_l ← {(X_i, Y_i, Z_i) | L_i = l}
22:     z_hl, u_0l, u_1l ← NLLS(L_l)
23:   end for
24:   e_k ← 0
25:   for i = 1, ..., J do
26:     Z̃_i ← max_{l=1,...,K_c} u_1l |z_i − z_hl|^2 + u_0l
27:     e_k ← e_k + |Z̃_i − Z_i|^2
28:   end for
29:   e_k ← e_k / J
30:   return e_k, z_hl, u_0l, u_1l, l = 1, ..., K_c
31: end procedure

B. Trajectory optimization results

1) Estimation procedure of parameters K and T: as in most similar algorithms, a prior estimation of the parameters is necessary, since the optimal results strongly depend on them. The traffic parameters have been estimated just above, so the required parameters are, first, the mass K, representing the trade-off between the kinetic term and the traffic term in the Lagrangian (2), and then the total duration T, which also has a significant influence on the optimal trajectory (by convention t_0 = 0). This procedure is developed in Appendix G and yields in our case an estimate of the order of K = 50 × r; we select T = 1200, 1800 s and K = 30, 60 × r.
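The fitting machinery of Algorithm 4 can be sketched compactly. The snippet below is our own simplified Python illustration, not the paper's Matlab code: the NLLS step is replaced by an exact linearization of the quadratic model (which is linear in a reparametrization, so no iterative solver is needed), and points are greedily relabeled by per-cluster residual instead of the neighbor-triggered rule of steps 7-16.

```python
import numpy as np

def fit_quadratic(pts):
    """Least-squares fit of Z ~ u1*||(x, y) - zh||^2 + u0 (the NLLS step).

    Expanding the square gives Z = a*(x^2 + y^2) + b1*x + b2*y + c with
    a = u1, (b1, b2) = -2*u1*zh and c = u1*||zh||^2 + u0, so the fit is
    linear in (a, b1, b2, c).
    """
    x, y, Z = pts[:, 0], pts[:, 1], pts[:, 2]
    A = np.column_stack([x * x + y * y, x, y, np.ones_like(x)])
    a, b1, b2, c = np.linalg.lstsq(A, Z, rcond=None)[0]
    zh = np.array([-b1, -b2]) / (2 * a)
    return zh, a, c - a * (zh @ zh)        # z_h, u1, u0

def quadratic_kmeans(pts, Kc=2, M=10, init_labels=None):
    """Alternate per-cluster quadratic fits and residual-based relabeling."""
    labels = (np.arange(len(pts)) % Kc if init_labels is None
              else np.asarray(init_labels))
    for _ in range(M):
        models = []
        for k in range(Kc):
            sub = pts[labels == k]
            # degenerate/empty cluster: fall back to a flat model
            models.append(fit_quadratic(sub) if len(sub) >= 4
                          else (np.zeros(2), 0.0, 0.0))
        res = np.stack([np.abs(u1 * np.sum((pts[:, :2] - zh) ** 2, axis=1)
                               + u0 - pts[:, 2])
                        for zh, u1, u0 in models])
        labels = res.argmin(axis=0)
    return labels, models
```

On points lying exactly on a quadratic bump, fit_quadratic recovers (z_h, u_1, u_0) up to numerical precision; the loop then alternates fits and relabelings in the spirit of the FIT/arg-min steps, at the cost of ignoring the K_n-neighborhood speed-up.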
2) Results: optimal trajectories are shown in Figs. 9 and 10 and the related numerical results in Table I. The lengths of the various trajectories and the related average velocities are given in Table II. The following main comments can be made on these results:
• First, the trajectories obtained for GRAD-ALGO and B-ALGO both satisfy the optimality conditions, as the "relative" discrepancies between impulsions and between Hamiltonians at the interface are indeed below the selected precision, numerically yielding a full trajectory.
• Then, GRAD-ALGO is found to be stable due to our choice of initial and terminal drone positions (not too close to the interface and not too far from their related hotspots) and of time intervals (not too large temporal phases). Thus, we are far from the non-convexity conditions expressed in Theorems 3 and 4. The positivity of the total Hessian is indeed constantly checked at each iteration. Hence, a large (spatio-temporal) convergence basin results in the exact convergence of GRAD-ALGO (as well as B-ALGO) towards the unique optimal solution.

Fig. 6: Data preprocessing: raw data and first smoothing. (a) Raw traffic data on the 22nd of Aug. 2012, 6 pm [34]. (b) First smoothing: data aggregation at drone coverage level.

Fig. 7: Data preprocessing: second smoothing and partitions in 2 phases. (a) Second smoothing: LOWESS. (b) K-means partition.

Fig. 8: Quadratic error between normalized smoothed data and quadratic models, as a function of the K-means iterations.
Fig. 9: Influence of T on the optimal trajectory. (a) K = 60, T = 1200 and 1800 s, 2 phases (interface, iso-traffic levels, hot spots, B curve algorithm, gradient algorithm). (b) K = 60, T = 1200 and 1800 s, 1 phase (iso-traffic levels, hot spot, single-phase algorithm).

• As already noted for the single-phase case (paragraph III-A2), decreasing the mass K enables the drone to collect more traffic. In fact, during the allowed time interval, the drone gets closer to the hotspot with maximal traffic in order to decrease the total value function (i.e., zone 2 here).
• Increasing the time interval (T ↗) produces the same tendency, allowing the drone to spend more time near the hotspot with the higher traffic z_h, see Figs. 9a and 10a.
• Since the drone has enough time to pick up the traffic located close to the hotspots, this is reflected in an average drone velocity of about half its nominal value (see Table II).
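The single-phase trajectories discussed above come from the closed forms of Theorem 1 (Appendix D), which are easy to sanity-check numerically. The sketch below is our own Python illustration (the paper's experiments use Matlab), for the hyperbolic case u_1 < 0, the hot spot at the origin, and the Lagrangian normalized, as an assumption, to L = (K/2)|ż|^2 − (u_1/2)|z|^2 so that the Euler-Lagrange equation reads z̈ = −(u_1/K) z. It computes the initial velocity from (41), evaluates the closed-form trajectory, and checks both the prescribed terminal position and the Lemma 2 identity ∫ 2L dt = p(T)·z_T − p(t_0)·z_0:

```python
import numpy as np

# Assumed normalization (illustrative): L = (K/2)|v|^2 - (u1/2)|z|^2, u1 < 0,
# hot spot at the origin, so z'' = -(u1/K) z and omega = sqrt(-u1/K); p = K*v.
K, u1 = 2.0, -0.5
omega = np.sqrt(-u1 / K)
t0, T = 0.0, 3.0
z0 = np.array([1.0, 0.0])
zT = np.array([0.2, 0.8])

phi = omega * (T - t0)
a0 = omega * (zT - z0 * np.cosh(phi)) / np.sinh(phi)   # Eq. (41)

t = np.linspace(t0, T, 4001)
c, s = np.cosh(omega * (t - t0)), np.sinh(omega * (t - t0))
z = np.outer(c, z0) + np.outer(s, a0 / omega)          # closed-form trajectory
v = np.outer(omega * s, z0) + np.outer(c, a0)          # its velocity
endpoint = z[-1]                                       # should equal zT

# Lemma 2 (alpha = 2): integral of 2L equals p(T).z(T) - p(t0).z(t0)
L = 0.5 * K * np.sum(v**2, axis=1) - 0.5 * u1 * np.sum(z**2, axis=1)
action = np.sum((2 * L[1:] + 2 * L[:-1]) / 2) * (t[1] - t[0])  # trapezoid rule
boundary = K * (v[-1] @ z[-1] - v[0] @ z[0])
```

The terminal position is recovered by construction, and the quadrature of 2L matches the boundary term p·z up to the trapezoid-rule error, which numerically confirms the 2-homogeneity argument of Lemma 2 for this normalization.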
Fig. 10: Influence of K on the optimal trajectory. (a) K = 30 and 60, T = 1200 s, 2 phases (interface, iso-traffic levels, hot spots, B curve algorithm). (b) K = 30 and 60, T = 1200 s, 1 phase (iso-traffic levels, hot spot, single-phase algorithm).

TABLE I: Table of results with B-ALGO and GRAD-ALGO for K = 30, 60 × r and T = 1200, 1800 s, with r as defined in (59) (corresponding to a scaled spatial unit of 100 m). For each case, the table reports the crossing point ξ, the cost S, the Hamiltonian H, the phases φ− and φ+ and the crossing time τ; S_ph denotes the total single-phase cost (i.e., assuming only one, effective hotspot). The cases are:
a) K = 60 × r, T = 1200: B-ALGO (N_iter = 12) and GRAD-ALGO (N_iter = 5);
b) K = 60 × r, T = 1800: B-ALGO (N_iter = 12) and GRAD-ALGO (N_iter = 2);
c) K = 30 × r, T = 1200: B-ALGO (N_iter = 12) and GRAD-ALGO (N_iter = 2).

VI. CONCLUSION
In this paper, we propose a Lagrangian approach to solve the UAV base station optimal trajectory problem. When the traffic intensity exhibits a single phase, closed-form expressions for the trajectory and speed are derived from Hamilton-Jacobi equations. When the traffic intensity exhibits multiple phases, we characterize the crossing time and location at the interface. We propose two low-complexity algorithms for the bi-phase time-stationary traffic case that provide the optimal crossing time and location on the interface and fulfill the necessary conditions of optimality. At last, we present a data processing procedure based on a modified K-means algorithm that derives a single-phase or bi-phase quadratic model from real traffic data. Further extensions of this work are envisioned to generalize the approach to three or more hot spots and to consider multi-drone coordinated trajectories.

         bi-phase               single-phase
case     L (km)    V̄ (m/s)     L (km)    V̄ (m/s)
a)       12.41     10.34        11.71     9.
b)       13.80     7.66         12.53     6.
c)       13.64     11.37        12.37     10.

TABLE II: Table of trajectory lengths and related average speeds.

APPENDIX
A. Proof of Lemma 1
In a neighborhood of the optimal trajectory, the first-order variation of S is null:

δS = ∫_{t_0}^{T} δL(t, z, a) dt = ∫_{t_0}^{T} [∇_z L(t, z, a) · δz(t) + ∇_a L(t, z, a) · δa(t)] dt.   (35)

We now note that δa = δ(dz/dt) = d(δz)/dt. Integrating by parts the second term in the integral of δS, we obtain

∫_{t_0}^{T} ∇_a L(t, z, a) · d(δz)/dt dt = [δz(t) · ∇_a L(t, z, a)]_{t_0}^{T} − ∫_{t_0}^{T} δz(t) · (d/dt) ∇_a L(t, z, a) dt.   (36)

Note that [δz · ∇_a L]_{t_0}^{T} = 0 because z_0 and z_T are fixed. Equating δS to zero gives

∫_{t_0}^{T} [∇_z L(t, z, a) − (d/dt) ∇_a L(t, z, a)] · δz(t) dt = 0.   (37)

As this should be true for every δz, L, z_0 and z_T, we obtain the first result. Now assume that we have the optimal a(t); the condition for z(T) to be the optimal final position is

δS = [δz(t) · ∇_a L(t, z, a)]_{t_0}^{T} + ∇J(z(T)) · δz(T) = ∇_a L(z(T), T, a(T)) · δz(T) + ∇J(z(T)) · δz(T) = 0.   (38)

Note that z_0 is fixed, so δz at t_0 is null. We thus obtain the second result of the lemma.

B. Proof of Lemma 2

As L(z, a) is a homogeneous function of z and a, we have L(λz, λa) = |λ|^α L(z, a) for all λ (in our case with α = 2). Differentiating this expression with respect to λ, setting λ = 1, and noting that a = ż, we obtain

z · ∂L(z, ż)/∂z + ż · ∂L(z, ż)/∂ż = α L(z, ż).   (39)

Using (7) and (39), we have z · dp/dt + ż · p = αL, or equivalently d(p · z)/dt = αL. We can now integrate the cost function (3) along the optimal trajectory as follows:

S(t_0, z_0, T, z_T) = (1/α) ∫_{t_0}^{T} (d(p · z)/dt)(t) dt + J(z_T) = (1/α) (p(T) · z_T − p(t_0) · z_0) + J(z_T).   (40)

C. Proof of Lemma 3
We assume that an optimal trajectory exists and we apply the principle of optimality on the optimal trajectory between (t, z*(t)) and (t + h, z*(t) + ah), where h > 0, to obtain

S(t, z*(t), T, z_T) = min_a [h L(z*(t), a) + S(t + h, z*(t) + ah, T, z_T)]
                    = min_a [h L(z*(t), a) + S(t, z*(t), T, z_T) + h a · ∇_X S(t, z*(t), T, z_T) + h (∂S/∂t)(t, z*(t), T, z_T)].

This implies that

(∂S/∂t)(t, z*(t), T, z_T) = − min_a [a · ∇_X S(t, z*(t), T, z_T) + L(z*(t), a)]
                          = max_a [− a · ∇_X S(t, z*(t), T, z_T) − L(z*(t), a)]
                          = H(t, z*(t), −∇_X S(t, z*(t), T, z_T)).

By using the same approach between t − h and t, we deduce in the same way equation (10) when the final time T is varying.

D. Proof of Theorem 1
From (6) and (12), we obtain the following second-order ordinary differential equation: z̈ = −(u_1/K) z. If u_1/K > 0, we define ω = √(u_1/K) and we look for an optimal trajectory of the form z(t) = A cos(ωt) + B sin(ωt). If u_1/K < 0, we look for an optimal trajectory of the form z(t) = A cosh(ωt) + B sinh(ωt) with ω = √(−u_1/K). Let us denote z_0 = z(t_0) and a_0 = a(t_0) the initial conditions for z and ż.

Take the case u_1/K < 0. Using the derivative of z(t) and identifying terms, we obtain z(t) = z_0 cosh ω(t − t_0) + (a_0/ω) sinh ω(t − t_0). At t = T, we also have z_T = z_0 cosh ω(T − t_0) + (a_0/ω) sinh ω(T − t_0), from which we deduce

a(t_0) = ω (z_T − z_0 cosh ω(T − t_0)) / sinh ω(T − t_0),   (41)
a(T) = ω (−z_0 + z_T cosh ω(T − t_0)) / sinh ω(T − t_0),   (42)

when u_1 < 0. In a similar way, we have

a(t_0) = ω (z_T − z_0 cos ω(T − t_0)) / sin ω(T − t_0),   (43)
a(T) = ω (−z_0 + z_T cos ω(T − t_0)) / sin ω(T − t_0),   (44)

when u_1 > 0. Injecting a(t_0) = a_0 in the equation of the trajectory provides the result.

For the computation of S, we now use the result of Lemma 2, as our cost function is 2-homogeneous. From equation (8), we see that only the initial and final conditions are required to compute the cost function. Recall now that p = Ka. Injecting the equations of a(t_0) and a(T) in (8), we obtain the result for the cost function.

E. Proof of Theorem 2
We assume that the location and time (ξ, τ) of the interface crossing is known and unique. The optimal trajectory between (z_0, t_0) and (z_T, T) can be decomposed into two sub-trajectories that are themselves optimal, between (z_0, t_0) and (ξ, τ) on the one hand and between (ξ, τ) and (z_T, T) on the other hand, by the principle of optimality. In region 1, the optimal cost up to τ is

S_1(t_0, z_0, τ, ξ) = ∫_{t_0}^{τ} L_1(z*(s), a*(s)) ds.   (45)

Using Hamilton-Jacobi, we obtain

(∂S_1/∂τ)(t_0, z_0, τ, ξ) = −H_1(ξ, p*(τ−)).   (46)

In the same way, the optimal cost in region 2 is

S_2(τ, ξ, T, z_T) = ∫_{τ}^{T} L_2(z*(s), a*(s)) ds.   (47)

Using again Hamilton-Jacobi, we obtain

(∂S_2/∂τ)(τ, ξ, T, z_T) = H_2(ξ, p*(τ+)).   (48)

A necessary condition for the optimality of τ is thus

(∂S_1/∂τ)(t_0, z_0, τ, ξ) + (∂S_2/∂τ)(τ, ξ, T, z_T) = 0,   (49)

that is

H_1(ξ, p*(τ−)) = H_2(ξ, p*(τ+)).   (50)

A necessary condition for the optimality of ξ in the total cost under the constraint f(ξ) = C is also

µ ∇_z f(ξ) = ∇_X S_1(t_0, z_0, τ, ξ) + ∇_X S_2(τ, ξ, T, z_T) = p*(τ−) − p*(τ+),

where µ is a Lagrange multiplier associated with the constraint and where the second equality comes from equation (11) of Hamilton-Jacobi. Thus we obtain precisely equation (21).

F. The structure and the diagonalization of the single- and two-phase Hessian of the value function

1) Proof of Theorem 3:
Recall that we study the second-order differentiability properties of the single-phase value function S(t_1, x_1, t_2, x_2). Recall that from Eq. (1) we study Hessians of the form

∇²σ = ( ∂²σ/∂T_i²   (∇_{X_i} ∂σ/∂T_i)ᵀ ; ∇_{X_i} ∂σ/∂T_i   ∇²_{X_i,X_i} σ ) ∈ M_{3,3}(R),

where σ stands for S(t_1, x_1, t_2, x_2) and i = 1, 2. It is clear by inspecting the symmetries of (14) with respect to the spatial coordinates that

∇²_{X_1,X_1} S = ∇²_{X_2,X_2} S = K g Id, where g = ω coth φ and φ = ω(t_2 − t_1) is the temporal phase.   (51)

Concerning the symmetry with respect to the time variables, one also finds easily that ∂²S/∂T_1² = ∂²S/∂T_2². The Hessian with respect to the variable χ_i = (T_i, X_i) ∈ R × R² has thus the following structure:

H(χ_i) = ( α   Π_iᵀ ; Π_i   K g Id ),   with α = ∂²S/∂T_1² = ∂²S/∂T_2², g = ω coth φ, Π_i = ∇_{X_i} ∂S/∂T_i,   (52)

where the vector Π_i and the real scalar α remain to be determined. Now recall that the initial and final impulsions p_1 and p_2, as given by (41), write

p_1 = K ω (−x_1 cosh φ + x_2) / sinh φ,   (53)
p_2 = K ω (−x_1 + x_2 cosh φ) / sinh φ.   (54)

In fact, the expressions of Π_i and α appearing in the Hessian (52) both result from the computation of ∂p_i/∂T_i for i = 1, 2. Using (51), the differentiation of (53) and (54) wrt. T_1 resp. T_2 leads to the following simple result:

Π_1 = ∇_{X_1} ∂S/∂T_1 = −ω p_2 / sinh φ,   resp.   Π_2 = ∇_{X_2} ∂S/∂T_2 = −ω p_1 / sinh φ.   (55)

Now, as far as the second partial derivative of the value function S with respect to time (α) is concerned,

α = ∂²S/∂T_2² = −∂H/∂T_2 (Hamilton-Jacobi).   (56)

Recalling that the Hamiltonian for any Newtonian model has the form

H(z, p, t) = ‖p‖²/(2K) + V(z)   (57)

and differentiating it with respect to the temporal phase φ at fixed extremities x_1 and x_2 leads to

δH = p_1 · δp_1 / K = p_2 · δp_2 / K = −(p_1 · p_2)/(K sinh φ) δφ,

that is

α = ∂²S/∂T_1² = ∂²S/∂T_2² = −ω ∂H/∂φ = ω (p_1 · p_2)/(K sinh φ).   (58)

Now from (24) the two-phase Hessian is simply the sum of the two single-phase Hessians.
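The identities (51), (55) and (58) can be checked by finite differences on the closed-form value function S = (p_2 · x_2 − p_1 · x_1)/2 of Lemma 2 (with α = 2 and J = 0). The sketch below is our own numerical illustration; the constants K, ω and the evaluation points are arbitrary:

```python
import numpy as np

# Illustrative constants (not from the paper); S below is the closed-form
# single-phase value function S = (p2.x2 - p1.x1)/2, with impulsions p1, p2
# given by Eqs. (53)-(54).
K, omega = 2.0, 0.7

def impulsions(t1, x1, t2, x2):
    phi = omega * (t2 - t1)
    p1 = K * omega * (-x1 * np.cosh(phi) + x2) / np.sinh(phi)
    p2 = K * omega * (-x1 + x2 * np.cosh(phi)) / np.sinh(phi)
    return p1, p2, phi

def S(t1, x1, t2, x2):
    p1, p2, _ = impulsions(t1, x1, t2, x2)
    return 0.5 * (p2 @ x2 - p1 @ x1)

t1, t2 = 0.0, 1.3
x1, x2 = np.array([1.0, -0.5]), np.array([0.4, 0.9])
p1, p2, phi = impulsions(t1, x1, t2, x2)
h = 1e-4
e0 = np.array([1.0, 0.0])

# (51): second spatial derivative should equal K*g with g = omega*coth(phi)
d2x = (S(t1, x1 + h * e0, t2, x2) - 2 * S(t1, x1, t2, x2)
       + S(t1, x1 - h * e0, t2, x2)) / h**2

# (55): Pi_1 = d/dT1 (dS/dX1) should equal -omega*p2/sinh(phi) (1st component)
def dX1S(t):
    return (S(t, x1 + h * e0, t2, x2) - S(t, x1 - h * e0, t2, x2)) / (2 * h)
Pi1 = (dX1S(t1 + h) - dX1S(t1 - h)) / (2 * h)

# (58): alpha = d^2 S / dT1^2 should equal omega*(p1.p2)/(K*sinh(phi))
alpha = (S(t1 + h, x1, t2, x2) - 2 * S(t1, x1, t2, x2)
         + S(t1 - h, x1, t2, x2)) / h**2
```

Since S is exactly quadratic in the spatial variables, the spatial second difference reproduces K g to machine precision; the time derivatives match the closed forms up to the finite-difference error.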
2) Diagonalizing the Hessian of the single- and two-phase value function: since the Hessians of the single- and two-phase value functions have the same structure, we address a general Hessian of the form (52). Its characteristic polynomial is easily developed as

det(H − ν Id) = (Kg − ν)² (α − ν) − (Kg − ν) ‖Π‖².

Since from (52) one has, for all (θ, x) ∈ R × R²,

H(θ, x) = (α θ + Π · x, Π θ + Kg x),

the three real eigenvalues (ν_i)_{i=0:2} and the related eigenvectors Y_i = (θ_i, x_i) thus satisfy:
i) ν_0 = Kg > 0 with Y_0 = (θ_0 = 0, x_0), x_0 ⊥ Π: a "pure space" eigenvector Y_0 ∈ {0} × R²;
ii) ν_1, ν_2 such that (Kg − ν)(α − ν) − ‖Π‖² = 0, with Y_i = (θ_i, x_i ∝ Π), i = 1, 2: two (mutually orthogonal) eigenvectors orthogonal to Y_0 (since Π ⊥ x_0), with

ν_1 ν_2 = α Kg − ‖Π‖² = det(H)/(Kg)   and   ν_1 + ν_2 = α + Kg.

Thus ν_1 and ν_2 cannot be simultaneously negative, since this would imply α < −Kg < 0 and hence det(H) = ν_0 ν_1 ν_2 < 0, i.e., ν_1 ν_2 < 0, a contradiction. However, the condition α Kg < ‖Π‖² (e.g., induced by the sufficient condition α < 0) implies that one of the eigenvalues ν_1, ν_2 is negative, namely the local non-convexity of the Hessian matrix H. This property thus holds for each single-phase value function as well as for the total two-phase case.

G. Estimation of parameters
It is based on the following remarks. First, the interface location I is independent of K and T (since it only depends on the traffic variables). Then, the final time T always remains estimated in seconds (t_0 = 0). Also, according to [38], a typical drone cell flies at a speed V̄ = 20 m/s and has an autonomy of a few tens of minutes. Last, we test our procedure on spatially-scaled data, where the convenient spatial unit is 100 m, which defines the scaling ratio r.   (59)

The following numerical estimations are also considered:
a) for estimating T: the total length L of the trajectory is approximated by L ≈ c × ‖z_0 − z_T‖ with c ≳ 1, which gives T = L/V̄ (recall that T is un-scaled and expressed in seconds);
b) for estimating K: the phase of each sub-trajectory should satisfy φ_i = ω_i T_i < φ_max, i = 1, 2, where φ_max = 10 (also an un-scaled constant). Then, spatial scaling by the quantity r implies u_i ← u_i r, i = 1, 2, and thus K ← K r, since K ω_i² = u_i at any scale. Passing back into the original frame therefore implies the inverse scaling of K.

REFERENCES

[1] Y. Zeng, Q. Wu, and R. Zhang, "Accessing from the sky: A tutorial on UAV communications for 5G and beyond,"
Proceedings of the IEEE, vol. 107, no. 12, pp. 2327–2375, 2019.
[2] B. Pearre and T. X. Brown, "Model-free trajectory optimization for wireless data ferries among multiple sources," in IEEE Globecom Workshops, Dec. 2010, pp. 1793–1798.
[3] V. Sharma, M. Bennis, and R. Kumar, "UAV-assisted heterogeneous networks for capacity enhancement," IEEE Communications Letters, vol. 20, no. 6, pp. 1207–1210, June 2016.
[4] D. Yang, Q. Wu, Y. Zeng, and R. Zhang, "Energy tradeoff in ground-to-UAV communication via trajectory design," IEEE Transactions on Vehicular Technology, vol. 67, no. 7, pp. 6721–6726, 2018.
[5] T. T. Mac, C. Copot, D. T. Tran, and R. De Keyser, "Heuristic approaches in robot path planning: A survey," Robotics and Autonomous Systems, vol. 86, pp. 13–28, 2016.
[6] T.-Y. Chi, Y. Ming, S.-Y. Kuo, C.-C. Liao et al., "Civil UAV path planning algorithm for considering connection with cellular data network," in IEEE Intl. Conf. on Computer and Information Technology (CIT), June 2012, pp. 327–331.
[7] D. Liberzon, Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press, 2011.
[8] D. Delahaye, S. Puechmorel, P. Tsiotras, and E. Féron, "Mathematical models for aircraft trajectory design: A survey," in Air Traffic Management and Systems. Springer, 2014, pp. 205–247.
[9] Y. Zeng, R. Zhang, and T. J. Lim, "Throughput maximization for UAV-enabled mobile relaying systems," IEEE Transactions on Communications, vol. 64, no. 12, pp. 4983–4996, Dec. 2016.
[10] Y. Zeng and R. Zhang, "Energy-efficient UAV communication with trajectory optimization," IEEE Transactions on Wireless Communications, vol. 16, no. 6, pp. 3747–3760, June 2017.
[11] Q. Wu, Y. Zeng, and R. Zhang, "Joint trajectory and communication design for multi-UAV enabled wireless networks," IEEE Transactions on Wireless Communications, vol. 17, no. 3, pp. 2109–2121, Mar. 2018.
[12] Y. Zeng, J. Xu, and R. Zhang, "Energy minimization for wireless communication with rotary-wing UAV," IEEE Transactions on Wireless Communications, vol. 18, no. 4, pp. 2329–2345, 2019.
[13] B. R. Marks and G. P. Wright, "A general inner approximation algorithm for nonconvex mathematical programs," Operations Research, vol. 26, no. 4, pp. 681–683, 1978.
[14] J. Zhang, Y. Zeng, and R. Zhang, "UAV-enabled radio access network: Multi-mode communication and trajectory design," IEEE Transactions on Signal Processing, vol. 66, no. 20, pp. 5269–5284, 2018.
[15] O. von Stryk and R. Bulirsch, "Direct and indirect methods for trajectory optimization," Annals of Operations Research, vol. 37, no. 1, pp. 357–373, Dec. 1992.
[16] G. Barles, A. Briani, and E. Trélat, "Value function and optimal trajectories for regional control problems via dynamic programming and Pontryagin maximum principles," Math. Cont. Related Fields.
[17] L. C. Evans, Partial Differential Equations, 2nd ed., ser. Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2010, vol. 19.
[18] M. Akian, S. Gaubert, and A. Lakhoua, "The max-plus finite element method for solving deterministic optimal control problems: basic properties and convergence analysis," SIAM Journal on Control and Optimization, vol. 47, no. 2, pp. 817–848, 2008.
[19] W. McEneaney, Max-Plus Methods for Nonlinear Control and Estimation. Springer Science & Business Media, 2006.
[20] A. Alla, M. Falcone, and L. Saluzzi, "An efficient DP algorithm on a tree-structure for finite horizon optimal control problems," SIAM Journal on Scientific Computing, vol. 41, no. 4, pp. A2384–A2406, 2019.
[21] S. Dolgov, D. Kalise, and K. Kunisch, "A tensor decomposition approach for high-dimensional Hamilton-Jacobi-Bellman equations," arXiv preprint arXiv:1908.01533, 2019.
[22] W. Kang and L. C. Wilcox, "Mitigating the curse of dimensionality: sparse grid characteristics method for optimal feedback control and HJB equations," Computational Optimization and Applications, vol. 68, no. 2, pp. 289–315, 2017.
[23] A. Alla, M. Falcone, and S. Volkwein, "Error analysis for POD approximations of infinite horizon problems via the dynamic programming approach," SIAM Journal on Control and Optimization, vol. 55, no. 5, pp. 3091–3115, 2017.
[24] D. Kalise, S. Kundu, and K. Kunisch, "Robust feedback control of nonlinear PDEs by numerical approximation of high-dimensional Hamilton-Jacobi-Isaacs equations," arXiv preprint arXiv:1905.06276, 2019.
[25] Y. T. Chow, J. Darbon, S. Osher, and W. Yin, "Algorithm for overcoming the curse of dimensionality for state-dependent Hamilton-Jacobi equations," Journal of Computational Physics.
[26] J. Darbon and S. Osher, "Algorithms for overcoming the curse of dimensionality for certain Hamilton-Jacobi equations arising in control theory and elsewhere," Research in the Mathematical Sciences, vol. 3, no. 1, p. 19, Sep. 2016.
[27] I. Yegorov and P. M. Dower, "Perspectives on characteristics based curse-of-dimensionality-free numerical approaches for solving Hamilton-Jacobi equations," Applied Mathematics & Optimization, pp. 1–49, 2017.
[28] Y. T. Chow, J. Darbon, S. Osher, and W. Yin, "Algorithm for overcoming the curse of dimensionality for certain non-convex Hamilton-Jacobi equations, projections and differential games," Annals of Mathematical Sciences and Applications, vol. 3, no. 2, pp. 369–403, 2018.
[29] T. Nakamura-Zimmerer, Q. Gong, and W. Kang, "QRnet: Optimal regulator design with LQR-augmented neural networks," IEEE Control Systems Letters, vol. 5, no. 4, pp. 1303–1308, 2021.
[30] J. Han, A. Jentzen, and W. E, "Solving high-dimensional partial differential equations using deep learning," Proceedings of the National Academy of Sciences, vol. 115, no. 34, pp. 8505–8510, 2018.
[31] J. Darbon, G. P. Langlois, and T. Meng, "Overcoming the curse of dimensionality for some Hamilton-Jacobi partial differential equations via neural network architectures," Research in the Mathematical Sciences, vol. 7, no. 3, Jul. 2020. [Online]. Available: https://doi.org/10.1007/s40687-020-00215-6
[32] J. Darbon and T. Meng, "On some neural network architectures that can represent viscosity solutions of certain high dimensional Hamilton-Jacobi partial differential equations," Journal of Computational Physics.
[33] M. Coupechoux, J. Darbon, J.-M. Kélif, and M. Sigelle, "Optimal trajectories of a UAV base station using Lagrangian mechanics," in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2019, pp. 626–631.
[34] X. Chen, Y. Jin, S. Qiang, W. Hu, and K. Jiang, "Analyzing and modeling spatio-temporal dependence of cellular traffic at city scale," in Communications (ICC), 2015 IEEE International Conference on, 2015.
[35] M. Bierlaire, Optimization: Principles and Algorithms. EPFL Press, 2015.
[36] W. S. Cleveland, "Robust locally weighted regression and smoothing scatterplots," Journal of the American Statistical Association, vol. 74, no. 368, pp. 829–836, 1979.
[37] K. Ueda and N. Yamashita, "On a global complexity bound of the Levenberg-Marquardt method," Journal of Optimization Theory and Applications, vol. 147, no. 3, pp. 443–453, 2010.
[38] A. Fotouhi, M. Ding, and M. Hassan, "DroneCells: Improving 5G spectral efficiency using drone-mounted flying base stations," arXiv preprint arXiv:1707.02041.