Optimal Trajectories of a UAV Base Station Using Hamilton-Jacobi Equations
Marceau Coupechoux, Jérôme Darbon, Jean-Marc Kélif, Marc Sigelle
Marceau Coupechoux*, Jérôme Darbon†, Jean-Marc Kélif‡, and Marc Sigelle§
*LTCI, Telecom Paris, Institut Polytechnique de Paris, France; †Brown University, US; ‡Orange Labs, France; §On leave from Telecom Paris, France
Email: [email protected], jerome [email protected], [email protected], [email protected]
Abstract
We consider the problem of optimizing the trajectory of an Unmanned Aerial Vehicle (UAV). Assuming a traffic intensity map of users to be served, the UAV must travel from a given initial location to a final position within a given duration and serves the traffic on its way. The problem consists in finding the optimal trajectory that minimizes a certain cost depending on the velocity and on the amount of served traffic. We formulate the problem using the framework of Lagrangian mechanics. We derive closed-form formulas for the optimal trajectory when the traffic intensity is quadratic (single-phase) using Hamilton-Jacobi equations. When the traffic intensity is bi-phase, i.e., made of two quadratics, we provide necessary conditions of optimality that allow us to propose a gradient-based algorithm and a new algorithm based on the linear control properties of the quadratic model. These two solutions are of very low complexity because they rely on fast convergence numerical schemes and closed-form formulas. These two approaches return a trajectory satisfying the necessary conditions of optimality. At last, we propose a data processing procedure based on a modified K-means algorithm to derive a bi-phase model and an optimal trajectory simulation from real traffic data.
I. INTRODUCTION
Unmanned Aerial Vehicles (UAV) are expected to play an increasing role in future wireless networks [1]. UAVs may be deployed in an ad hoc manner when the traditional cellular infrastructure is missing. They can serve as relays to reach distant users outside the coverage of wireless networks. They also may be used to disseminate data to ground stations or collect information from sensors. In this paper, we address one of the envisioned use cases for UAV-aided wireless communications, which relates to cellular network offloading in highly crowded areas [1]. More specifically, we focus on the path planning problem, or trajectory optimization problem, that consists in finding an optimal path for a UAV Base Station (BS) that minimizes a certain cost depending on the velocity and on the amount of served traffic. Our approach is based on the Lagrangian mechanics framework and the use of Hamilton-Jacobi (HJ) equations.

J. Darbon is supported by NSF DMS-1820821. M. Coupechoux has performed his work at LINCS laboratory.

A. Related Work
UAV trajectory optimization for networks has been tackled maybe for the first time in [2]. The model consists in a UAV flying over a sensor network from which it has to collect some data. The UAV can learn from previous experience, which is not assumed in our study. The problem of optimally deploying UAV BSs to serve traffic demand has been addressed in the literature by considering static UAV BSs or relays, see e.g. [3], [4]. The goal is to optimally position the UAV so as to maximize the data rate with ground stations or the number of served users. In these works, the notion of trajectory is either ignored or restricted to be circular or linear. In robotics and autonomous systems, trajectory optimization is known as path planning. In this field, there are classical methods like Cell Decomposition, the Potential Field Method, Probabilistic Road Maps, or heuristic approaches, e.g. bio-inspired algorithms [5]. The authors of [6] have capitalized on this literature and proposed a path planning algorithm for drone BSs based on the A* algorithm. The main goal of these papers is to reach a destination while avoiding obstacles. In our work, we intend to minimize a certain cost function along the trajectory by controlling the velocity of the UAV. This goal is studied in optimal control theory [7] and is applied for example in aircraft trajectory planning [8]. Most numerical methods in control theory can be classified into direct and indirect methods. In direct methods, the problem is transformed into a non-linear programming problem using discretized times, locations and controls. Direct methods are heavily applied in a series of recent publications in the field of UAV-aided communications, see e.g. [9]–[12]. The formulated problems are usually non-convex. The standard approach is hence to rely on Successive Convex Approximation (SCA), which iteratively minimizes a sequence of approximate convex functions.
SCA is known to converge to a Karush-Kuhn-Tucker solution under mild conditions [13], but the quality of the solution may heavily depend on the initial guess. Here, simple heuristics or solutions to the Travelling Salesman Problem (TSP) or the Pickup-and-Delivery Problem (PDP) can be used for finding an initial trajectory [14]. With direct methods, because of the discretization, the differential equations and the constraints of the system are satisfied only at discrete points. This can lead to less accurate solutions than indirect methods, and the quality of the solution depends on the quantization step [15]. Although every iteration of SCA has a polynomial time complexity, the practical resolution time may dramatically increase with the quantization grid and the dimension of the problem. On the other hand, indirect approaches rely on considering the Hamilton-Jacobi Partial Differential Equation associated with the optimal control problem (see e.g. [16], [17, Chap. 10]). Several recent methods have been proposed to solve HJ Partial Differential Equations (PDEs) in high dimensions. These include max-plus algebra methods [18], [19], dynamic programming and reinforcement learning [20], tensor decomposition techniques [21], sparse grids [22], model order reduction [23], polynomial approximation [24], optimization methods [25]–[28] and neural networks [29]–[32]. In this paper, we consider certain indirect methods that provide analytical solutions for certain classes of optimal control problems, as we have shown in a preliminary study [33].
B. Contributions
Our contributions are the following:

• Problem Formulation: To the best of our knowledge, this is the first time, after our preliminary study [33], that the UAV BS trajectory problem is formulated using the formalism of Lagrangian mechanics and solved using Hamilton-Jacobi equations. This approach provides closed-form equations when the potential is quadratic and thus very low complexity solutions compared to existing solutions in the literature.

• Closed-form expression of the optimal trajectory with single-phase traffic intensity: When the traffic intensity map is made of a single hot spot or traffic hole, has a quadratic form (single phase), and is time-independent, closed-form expressions for the optimal trajectory are derived. It follows a hyperbola for a hot spot and corresponds to a repulsor in mechanics. For a traffic hole, the trajectory is on an ellipse and corresponds to the case of an attractor in mechanics.

• Characterization of the optimal solution in multi-phase traffic intensity: When the traffic map has several hot spots or traffic holes (multi-phase) whose regions are separated by interfaces and is time-independent, we derive necessary conditions to be fulfilled by the position and the instant at which the optimal trajectory crosses an interface (see Theorem 2).

• A gradient algorithm for bi-phase traffic: An in-depth analysis of convexity vs. non-convexity issues allows us to derive a gradient algorithm to solve the bi-phase problem (Algorithm 1). This algorithm finds a stationary point of the cost function. It has a complexity O(1) at every iteration, whereas iterations of the sequential convex optimization technique have polynomial time complexity.

• A new algorithm for the bi-phase optimization problem: A new algorithm, called the B-algorithm (Algorithm 2), is proposed based on the linear control properties of the quadratic model. This algorithm relies on a bisection scheme, the complexity of which is proportional to the logarithm of the desired precision, and on closed-form formulas.

• A data processing procedure: We propose a method to pre-process real measured traffic data in order to derive a bi-phase quadratic model. This procedure is based on smoothing steps followed by a modified K-means algorithm adapted to our quadratic model (Algorithm 3). In our numerical experiments, the optimal trajectory is computed in a region where real traffic data is available [34].

The paper is structured as follows. In Section II, we give the system model and its interpretation in terms of Lagrangian mechanics, formulate the problem, and give preliminary results. Section III is devoted to the characterization of the optimal trajectories for both the single- and bi-phase cases. Section IV presents our algorithms, Section V our data processing procedure and numerical experiments. Section VI concludes the paper.
Notations:
The usual Euclidean scalar product between x ∈ Rⁿ and y ∈ Rⁿ is denoted by x·y. The Euclidean norm ‖x‖ of x ∈ Rⁿ is defined by ‖x‖ := √(x·x). The set of matrices with m rows, n columns and real entries is denoted by M_{m,n}(R). The transpose of A ∈ M_{m,n}(R) is denoted by A† ∈ M_{n,m}(R). We classically identify M_{m,1}(R) and M_{1,n}(R) with column vectors of Rᵐ and row vectors of Rⁿ, respectively. Let f : Rⁿ × Rᵐ → R be defined by f(x, y), where x = (x₁, ..., x_n) ∈ Rⁿ and y = (y₁, ..., y_m) ∈ Rᵐ. Let a ∈ Rⁿ and b ∈ Rᵐ. We denote by ∂f/∂x_i(a, b) the partial derivative of f with respect to the variable x_i at (a, b) ∈ Rⁿ × Rᵐ. We also introduce the notations ∇_x f(a, b) = (∂f/∂x₁(a, b), ..., ∂f/∂x_n(a, b)) ∈ Rⁿ and ∇_y f(a, b) = (∂f/∂y₁(a, b), ..., ∂f/∂y_m(a, b)) ∈ Rᵐ. We also consider the following notation for partial Hessian matrices:

M_{m+n,m+n}(R) ∋ ∇²f(a, b) = ( ∇²_{x,x} f(a, b)  ∇²_{x,y} f(a, b) ; ∇²_{y,x} f(a, b)  ∇²_{y,y} f(a, b) )  (1)

where ∇²_{x,x} f(a, b) ∈ M_{n,n}(R) is the matrix with entries ∂²f/∂x_i∂x_j(a, b), and Id_n denotes the identity matrix of M_{n,n}(R). We shall see that the value function S : R × R² × R × R² → R will play a fundamental role in this paper. We use the notations (T₀, X₀, T₁, X₁) for the arguments of S, and therefore the partial derivatives of S at (t₀, x₀, t₁, x₁) ∈ R × R² × R × R² are denoted as follows: ∂S/∂T₀(t₀, x₀, t₁, x₁), ∇_{X₀}S(t₀, x₀, t₁, x₁), ∂S/∂T₁(t₀, x₀, t₁, x₁) and ∇_{X₁}S(t₀, x₀, t₁, x₁).

Fig. 1: A UAV Base Station travels from z₀ at t₀ to z_T at T and serves a user traffic characterized by its intensity.

II. SYSTEM MODEL AND LAGRANGIAN MECHANICS INTERPRETATION
A. System Model
We consider a network area characterized by a traffic density at position z and time t. We intend to control the trajectory and the velocity of a UAV base station, which is located at z₀ ≜ z(t₀) at t₀ and shall reach a destination z_T ≜ z(T) at T, with the aim of minimizing a cost determined by the velocity and the traffic, defined hereafter by (2). At (t, z), we assume that the UAV BS is able to cover an area from which it can serve users (see Figure 1). The velocity of the UAV BS induces an energy cost. In this model, we control the velocity vector a of the UAV BS. The general form of the cost function is as follows:

L(t, z, a) = ½ K‖a‖² − u(t, z)  (2)

where the first term is a cost related to the velocity of the vehicle (K is a positive constant) and ‖·‖ denotes the usual Euclidean norm. The higher the speed, the higher the energy cost. The second term is a user traffic intensity, i.e., the amount of traffic served by the UAV BS at (t, z). Note that a non-zero energy at null speed can be incorporated in the model by adding a constant. Without loss of generality, we assume that this constant is null.

Let S(t₀, z₀, T, z_T) be the minimal total cost along any trajectory between z₀ at t₀ and z_T at T (also called the action in mechanics or the value function in control theory). Let us define Ω(t₀, T) as the space of absolutely continuous functions from [t₀; T] to R². Our problem can now be formulated as follows:

S(t₀, z₀, T, z_T) = min_{a ∈ Ω(t₀,T)} ∫_{t₀}^{T} L(s, z(s), a(s)) ds + J(z(T))  (3)

where dz/dt(t) = a(t), z(t₀) = z₀, and J is the terminal cost defined by J(z) = 0 if z = z_T and J(z) = +∞ otherwise. For simplicity reasons, we assume the existence and uniqueness of the optimal control a*(t) in (3) and denote by z*(t) the associated optimal trajectory. Note that in a traffic map symmetric with respect to z₀ and z_T, uniqueness is not guaranteed.
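The action (3) can also be approximated numerically by discretizing time. The sketch below is illustrative only: the mass K = 1 and the quadratic hot-spot profile u are hypothetical choices, not taken from the paper's data; it evaluates the Lagrangian (2) along a sampled trajectory with a midpoint rule.

```python
import numpy as np

# Hypothetical parameters for illustration: unit mass and a quadratic
# hot-spot traffic profile u(z) = u2 * ||z||^2 with u2 = -0.5.
K = 1.0

def u(z):
    return -0.5 * float(z @ z)

def trajectory_cost(zs, t0, T):
    """Midpoint-rule approximation of the action integral (3) for a sampled
    trajectory zs[k] = z(t_k) on a uniform time grid over [t0, T]."""
    n = len(zs) - 1
    dt = (T - t0) / n
    cost = 0.0
    for k in range(n):
        a = (zs[k + 1] - zs[k]) / dt                 # finite-difference velocity
        z_mid = 0.5 * (zs[k] + zs[k + 1])            # midpoint of the segment
        cost += (0.5 * K * float(a @ a) - u(z_mid)) * dt   # Lagrangian (2)
    return cost
```

For instance, for the straight line from (0, 0) to (1, 0) over a unit duration, the exact action under these assumed parameters is 1/2 (velocity term) + 1/6 (traffic term), which the sum above recovers up to discretization error.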
B. Preliminary Results From Lagrangian Mechanics
We provide in this section important results from Lagrangian mechanics for the convenience of the reader.
Definition 1 (Impulsion). The impulsion function is defined as

p(t, z, a) := ∇_a L(t, z, a).  (4)

In the Newtonian classical framework that is used here (see (2)), the impulsion is the product of the particle mass by its velocity (hence the standard term "impulsion").

Definition 2. The Hamiltonian function is defined as

H(t, z, p) := max_{a ∈ R²} { p·a − L(t, z, a) }.  (5)

Lemma 1 (Euler-Lagrange Equations). Along the optimal trajectory z*(t) that starts from z₀ at t₀ and ends at z_T at T, we have

d/dt ∇_a L(t, z*(t), a*(t)) = ∇_z L(t, z*(t), a*(t))  (6)

or equivalently

dp/dt(t, z*(t), a*(t)) = ∇_z L(t, z*(t), a*(t)).  (7)
See Appendix A.

The Euler-Lagrange equation is the first-order necessary condition for optimality and holds at every point of the optimal trajectory.
Lemma 2.
If the Lagrangian L(t, z, a) is time-independent and α-homogeneous in z and a for some α > 0, i.e., L(λz, λa) = |λ|^α L(z, a) for all λ ∈ R, then S given by (3) reads

S(t₀, z₀, T, z_T) = (1/α) [z·p]_{t₀}^{T} + J(z_T).  (8)
See Appendix B.
Lemma 3 (Hamilton-Jacobi) . Along the optimal trajectory, we have for t ∈ ( t ; T ) using previous notations ∂S∂T ( t, z ∗ ( t ) , T, z T ) = H ( t, z ∗ ( t ) , − p ∗ ( t )) , (9) ∂S∂T ( t , z , t, z ∗ ( t )) = − H ( t, z ∗ ( t ) , p ∗ ( t )) , (10) where p ∗ ( t ) = ∇ a L ( t, z ∗ ( t ) , a ∗ ( t )) = ∇ X S ( t , z ∗ ( t ) , T, z T ) . (11) Proof.
See Appendix C.

From now on, we assume that the Lagrangian is time-independent, i.e., L(t, z, a) = L(z, a), and is an even function in a, i.e., L(z, −a) = L(z, a). A direct consequence is that H is time-independent and is an even function in p, i.e., we write H(t, z, p) = H(z, p) and H(z, −p) = H(z, p).

III. OPTIMAL TRAJECTORY
In this section, we characterize the optimal trajectory when the traffic intensity is a quadratic form and also when it is made of two regions of quadratic form separated by an interface. We call these two cases single-phase and bi-phase intensities, respectively. Both cases satisfy our assumptions on the Lagrangian with α = 2.

S(t₀, z₀, T, z_T) = Kω / (2 sinh(ω(T − t₀))) · ( (‖z₀‖² + ‖z_T‖²) cosh(ω(T − t₀)) − 2 z₀·z_T ) + J(z_T)  (14)

S(t₀, z₀, T, z_T) = Kω / (2 sin(ω(T − t₀))) · ( (‖z₀‖² + ‖z_T‖²) cos(ω(T − t₀)) − 2 z₀·z_T ) + J(z_T)  (17)

A. Single-Phase Optimal Trajectory
Assume that the traffic intensity is of the form u(z) = u₂‖z‖². When u₂ > 0, this function models a traffic hole at z = 0. When u₂ < 0, it models a traffic hot spot at z = 0. We disregard the case u₂ = 0 because it corresponds to a constant traffic intensity, which is not of interest in this paper. Thus the cost function has the following form:

L(z, a) = ½ K‖a‖² − u₂‖z‖².  (12)

Note that

p(z, a) = ∇_a L(z, a) = Ka.  (13)
1) Trajectory Equation:
In the single-phase case, we have a closed-form expression of the trajectory.
Theorem 1. If u₂ < 0, the cost function is given by (14), the optimal trajectory is

z*(t) = [ z_T sinh(ω(t − t₀)) + z₀ sinh(ω(T − t)) ] / sinh(ω(T − t₀))  (15)

and the control is given by

a*(t) = ω [ z_T cosh(ω(t − t₀)) − z₀ cosh(ω(T − t)) ] / sinh(ω(T − t₀))  (16)

where ω = √(−2u₂/K).

If u₂ > 0, the cost function is given by (17), the optimal trajectory is

z*(t) = [ z_T sin(ω(t − t₀)) + z₀ sin(ω(T − t)) ] / sin(ω(T − t₀))  (18)

and the control is given by

a*(t) = ω [ z_T cos(ω(t − t₀)) − z₀ cos(ω(T − t)) ] / sin(ω(T − t₀))  (19)

where ω = √(2u₂/K).

Proof. See Appendix D.
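For the hot-spot case (u₂ < 0), formulas (15)-(16) can be evaluated directly. The sketch below is a minimal implementation assuming a hot spot centered at the origin; the parameter values used in any test are hypothetical.

```python
import numpy as np

def hotspot_trajectory(t, t0, T, z0, zT, K, u2):
    """Closed-form optimal trajectory (15) and control (16) for a single
    quadratic hot spot u(z) = u2 * ||z||^2 with u2 < 0, centered at the origin.
    Here omega = sqrt(-2*u2/K), consistent with the Euler-Lagrange
    equation K z'' = -2 u2 z derived from the Lagrangian (12)."""
    w = np.sqrt(-2.0 * u2 / K)
    s = np.sinh(w * (T - t0))
    z = (zT * np.sinh(w * (t - t0)) + z0 * np.sinh(w * (T - t))) / s   # (15)
    a = w * (zT * np.cosh(w * (t - t0)) - z0 * np.cosh(w * (T - t))) / s  # (16)
    return z, a
```

A quick sanity check is that z*(t₀) = z₀ and z*(T) = z_T, and that a*(t) is the time derivative of z*(t), which can be verified by finite differences.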
Corollary 1.
If the user traffic intensity is of the form u(t, z) = u₂‖z‖² + u₁ z·e + u₀ with u₀, u₁ ∈ R and e ∈ R², then define z̃ = z + (u₁/(2u₂)) e, z̃₀ = z₀ + (u₁/(2u₂)) e, z̃_T = z_T + (u₁/(2u₂)) e, and the trajectories given in Theorem 1 are valid by replacing z, z₀, z_T by z̃, z̃₀, z̃_T, respectively. The cost function becomes S(t₀, z₀, T, z_T) = (1/α) [z̃·p]_{t₀}^{T} + J(z_T) − ũ₀(T − t₀), where ũ₀ = u₀ − u₁²‖e‖²/(4u₂) is the constant term obtained after completing the square.
If the user traffic intensity is of the form u(t, z) = Σᵢ uᵢ‖z − zᵢ‖² with Σᵢ uᵢ ≠ 0, then u(t, z) = (Σᵢ uᵢ)‖z − z_b‖² + Σᵢ uᵢ‖zᵢ − z_b‖² with z_b = Σᵢ uᵢ zᵢ / Σᵢ uᵢ. Define z̃ = z − z_b, z̃₀ = z₀ − z_b, z̃_T = z_T − z_b, ũ₂ = Σᵢ uᵢ, and the trajectories given in Theorem 1 are valid by replacing z, z₀, z_T, u₂ by z̃, z̃₀, z̃_T, ũ₂, respectively. The system is thus equivalent to the one assumed in Theorem 1 by changing the origin of the locations to the barycentre z_b of the zᵢ.
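The reduction of Corollary 2 is a one-liner in practice. The sketch below (illustrative, with hypothetical weights and centers in the test) computes the barycentre z_b, the aggregated coefficient ũ₂, and the constant term, and the algebraic identity can be checked at any point z.

```python
import numpy as np

def barycentre_reduction(us, zs):
    """Reduce a sum of quadratics sum_i u_i ||z - z_i||^2 (Corollary 2) to the
    single-phase form u_tilde * ||z - z_b||^2 + const, with z_b the weighted
    barycentre of the centers z_i. Requires sum_i u_i != 0."""
    us = np.asarray(us, dtype=float)
    zs = np.asarray(zs, dtype=float)
    u_tilde = us.sum()
    assert u_tilde != 0.0
    z_b = (us[:, None] * zs).sum(axis=0) / u_tilde          # barycentre of the z_i
    const = float((us * ((zs - z_b) ** 2).sum(axis=1)).sum())
    return u_tilde, z_b, const
```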
2) Traffic Hot Spot, Traffic Hole:
We assume that there is a hot spot or a traffic hole located at z_h and that the traffic intensity is of the form u(t, z) = u₂‖z − z_h‖² + u₀ = u₂‖z‖² − 2u₂ z·z_h + u₂‖z_h‖² + u₀. We can apply Corollary 1 with u₁ e = −2u₂ z_h. Figure 2 shows optimal trajectories when z_h is a hot spot, i.e., for u₂ < 0, and different values of K. The starting point is z₀ and the destination is z_T. When K increases, the velocity cost increases and the trajectories tend to the straight line between z₀ and z_T, which minimizes the speed. When K is small, the UAV can go fast to z_h, reduce its speed in the vicinity of the hot spot, and then go fast to the destination (in order to decrease the cost function (2) by increasing its traffic contribution). Figure 3 shows optimal trajectories when z_h is a traffic hole, i.e., for u₂ > 0. In Figure 3a, T is smaller than the period of the ellipse, i.e., 2π/ω > T. When K decreases, the UAV can spend more time in the areas of higher traffic intensity. In Figure 3b, T is larger than the period. In this case, the trajectory follows one period of the ellipse whose equation is given by (18), plus a part of the same ellipse from z₀ to z_T.

B. Multi-Phase Trajectory Characterization
We now consider a traffic intensity (or potential) consisting of two quadratic functions separated by an interface I of equal potentials delimiting two regions 1 and 2. The interface is defined by an equation f(z) = C, where C is a constant and f is a differentiable function.

Fig. 2: Traffic hot spot (u₂ < 0). Circles are iso-traffic levels.

Fig. 3: Traffic hole (u₂ > 0). Circles are iso-traffic levels. (a) T is smaller than the ellipse period. (b) T is larger than the ellipse period.

We assume that the optimal trajectory crosses the interface only once, at position ξ and time τ. The impulsion p* is defined everywhere on the optimal trajectory between (t₀, z₀) and (T, z_T). The following notations will be used in the sequel:

p₋ = p*(τ⁻) = lim_{s→τ, s<τ} p*(s),  p₊ = p*(τ⁺) = lim_{s→τ, s>τ} p*(s),
H₋ = H(ξ, p*(τ⁻)),  H₊ = H(ξ, p*(τ⁺)).  (20)
Theorem 2. The location and time (ξ, τ) of interface crossing are characterized by the following equations:

p₋ − p₊ − μ ∇_z f(ξ) = 0  (21)

H₊ − H₋ = 0  (22)

f(ξ) = C  (23)

for some Lagrange multiplier μ ∈ R.
See Appendix E.

Equation (22) expresses the fact that the energy is conserved when crossing the interface. One can show that the energy is actually conserved along the whole trajectory. Equation (21) is related to the conservation of the tangential component of the impulsion at the interface. Equation (23) is the interface equation at ξ. One can show that under the assumption of equal potentials on the interface, the kinetic energy, the impulsion, and the velocity vector are conserved across the interface.

C. Multi-phase optimal trajectory uniqueness and value function convexity issues
Looking for uniqueness or non-uniqueness of optimal multi-phase trajectories leads naturally to the study of the convexity of the multi-phase total cost. As stated in Appendix E, this cost is additive and satisfies the dynamic programming principle, which reads

S̄(Θ = τ, Ξ = ξ | t₀, z₀, T, z_T) = S₁(t₀, z₀, τ, ξ) + S₂(τ, ξ, T, z_T)  (24)

where S₁ and S₂ are themselves minimal. Hamilton-Jacobi equations applied to each cost component allow us to study their first- and second-order differential properties, such as convexity. For instance, in the chosen quadratic model, each single-phase (minimal) total cost S₁(t₀, z₀, τ, ξ), S₂(τ, ξ, T, z_T) given by Corollary 1 and (14) is convex with respect to the spatial position ξ (since it is a positive quadratic form with respect to the spatial coordinates), but not necessarily so with respect to the interface crossing time τ.

We consider in what follows a general form of the single-phase value function between (t₁, x₁) and (t₂, x₂), denoted by S(t₁, x₁, t₂, x₂). We also denote by ω the pulsation, by φ = ω(t₂ − t₁) the temporal phase, and by p₁, p₂ the initial and final impulsions as derived from formula (16).

Fig. 4: Physical interpretation of local convexity vs. non-convexity of the single-phase cost: the initial and final impulsions (p₁, p₂) form an obtuse angle ⇒ the value function is non-convex.

Theorem 3. i) The Hessian of the single-phase cost with respect to any joint variable ψᵢ = (Tᵢ, Xᵢ), i = 1, 2, is

H(ψᵢ) = ∇²_{Tᵢ,Xᵢ} S(t₁, x₁, t₂, x₂) = ( α  Πᵢ† ; Πᵢ  Kg Id₂ )

with α = ∂²S/∂Tᵢ² = ω p₁·p₂ / (K sinh φ),  g = ω coth φ > 0,  Πᵢ = −ω pⱼ / sinh φ ∈ R²  (25)

where pᵢ is the impulsion at time Tᵢ, i.e., at extremity Xᵢ (i = 1, 2 and j = 3 − i).

ii) At most one eigenvalue of H(ψᵢ) can be negative, and α < 0 is a sufficient condition for this to hold, namely implying local non-convexity of the (single-phase) value function.

iii) The double-phase Hessian enjoys a similar structure as (25) and writes, with respect to the variable Ψ = (Θ, Ξ),

H̄(Θ = τ, Ξ = ξ) = H₁(χ₁) + H₂(χ₂) = ( ᾱ  Π̄† ; Π̄  Kh Id₂ )  (26a)

with ᾱ = ∂²S₁/∂τ²(t₀, z₀, τ, ξ) + ∂²S₂/∂τ²(τ, ξ, T, z_T) (cf. (25)),
h = ω₁ coth φ₁ + ω₂ coth φ₂,
Π̄ = −[ ω₁ p*(t₀)/sinh φ₁ + ω₂ p*(T)/sinh φ₂ ],
φ₁ = ω₁(τ − t₀),  φ₂ = ω₂(T − τ),  (26b)

and may thus be non-convex as well.

Proof. See Appendix F, subsections F1 and F2.

In the single-phase case, the sufficient condition ii), i.e., α < 0, supports first a physical interpretation (see Fig. 4) and also a geometrical one (see next Theorem 4 and Fig. 5).

Fig. 5: Geometrical interpretation of local non-convexity: long time and long distance to the hotspot.
Theorem 4.
Let u = x₁ + x₂ and v = x₁ − x₂. (Note that ‖u‖/2 is the distance from the hotspot to the midpoint of x₁ and x₂, and ‖v‖/2 is the half-distance between x₁ and x₂.) Then the following sufficient condition holds:

‖v‖/‖u‖ < (cosh φ − 1)/(1 + cosh φ) = tanh²(φ/2)  ⇒  α = ∂²S/∂Tᵢ² < 0.

Proof: We combine the formulas (53) and (54) in (58) to obtain

α = (ω/K) · p₁·p₂ / sinh φ = (Kω³ / sinh³φ) [ (‖x₁‖² + ‖x₂‖²) cosh φ − x₁·x₂ (1 + cosh²φ) ]  (27)

which can be rewritten as

α = (Kω³ / (4 sinh³φ)) [ −‖u‖² (1 − cosh φ)² + ‖v‖² (1 + cosh φ)² ].

This non-convexity condition for the single-phase value function interprets as: long phase (tanh²(φ/2) close to 1) and long distances from the hotspot z_h to both the initial and final positions x₁, x₂, relative to their mutual distance (‖u‖/‖v‖ ≫ 1). Several preliminary simulations show that when this case happens for two phases, the total value function is indeed non-convex and several (local) optimal solutions may indeed exist.

IV. ALGORITHMS
The previous propositions allow us to propose several numerical algorithms seeking optimal trajectories. In this section, we present two algorithms with this goal: a gradient descent algorithm and a bisection search method based on the linear control property (16).
A. A Gradient Descent Algorithm: GRAD-ALGO

In this section, we propose GRAD-ALGO (Algorithm 1), which is an alternated optimization-based algorithm relying on the following procedures.

a) Procedure for seeking an optimal ξ given a fixed τ: From the Hamilton-Jacobi equations (Lemma 3 and Appendix C), the gradient of the total cost function with respect to ξ is p*(τ⁻) − p*(τ⁺). Equation (13) says that in the Newtonian framework the impulsion is proportional to the control variable a. Equation (16) says that in the quadratic model the velocity vector is, at any time, a linear combination of the centered initial and final positions. The searched gradient then appears to be an affine function of ξ, which reads

p*(τ⁻) − p*(τ⁺) = ∇_{X₁} S₁(t₀, z₀, τ, ξ) + ∇_{X₀} S₂(τ, ξ, T, z_T) = Kh (ξ − B).  (28)

The scalar h and the point B (where the spatial gradient cancels at fixed τ, i.e., p*(τ⁻) = p*(τ⁺)) verify

h = ω₁ coth(ω₁(τ − t₀)) + ω₂ coth(ω₂(T − τ)),  (29)

B = (1/h) [ ω₁ z_h1 coth(ω₁(τ − t₀)) + ω₂ z_h2 coth(ω₂(T − τ)) + ω₁ (z₀ − z_h1)/sinh(ω₁(τ − t₀)) + ω₂ (z_T − z_h2)/sinh(ω₂(T − τ)) ].  (30)

The equation involving the Lagrange multiplier (21) now reads

Kh (ξ − B) − μ ∇_ξ f(ξ) = 0  (31)

and shows that the optimal location ξ* is the orthogonal projection of B on the interface I.

b) Procedure for seeking an optimal τ given a fixed ξ: We use the result of Theorem 2. As also shown by the Hamilton-Jacobi equations (Lemma 3 and Appendix C), the gradient of S̄ with respect to τ is given by H(ξ, p*(τ⁺)) − H(ξ, p*(τ⁻)). The Hamiltonians are easily computed in each phase by applying the classical Newton formula at a given location (z, t):

H(p, z, t) = ‖p‖²/(2K) + V(z).

We then update τ by using a simple gradient descent scheme (see Algorithm 1).

c) Stop criteria: The search procedure is performed until the impulsion and the Hamiltonian have converged with a given accuracy.
For this purpose, we consider the function g : (Rⁿ\{0}) × (Rⁿ\{0}) → R defined as follows:

g(x, y) = ‖x − y‖ / inf(‖x‖, ‖y‖)

which measures a relative "discrepancy" between x and y.
Algorithm 1 GRAD-ALGO: an uncoupled projected gradient descent algorithm
Input: precisions ε_p, ε_H; M_τ = number of gradient descent iterations on τ
Init: starting position ξ ← ξ₀, e.g., ξ₀ = (z₀ + z_T)/2; τ ← τ₀, e.g., τ₀ = (t₀ + T)/2
Output: (τ, ξ)
procedure TWO_PHASE(τ, ξ)
  compute the current bi-phase trajectory given by (t₀, z₀) → (τ, ξ) → (T, z_T)
  return (B, H₊, H₋, H̄) = (the B-point (30), the two phase Hamiltonians, and the total Hessian at the current location (τ, ξ), cf. (24), (26a), (26b))
end procedure
N_step ← 0
(B, H₊, H₋, H̄) ← TWO_PHASE(τ, ξ)
repeat
  ξ ← Proj_I(B)  (31)
  for m = 1, ..., M_τ do
    τ ← τ − (H₊ − H₋)/ᾱ with ᾱ = ∂²S̄/∂Θ²(τ, ξ)  (Newton descent wrt τ)
    (B, H₊, H₋, H̄) ← TWO_PHASE(τ, ξ)
  end for
  N_step ← N_step + 1
until STOP (32)

Now let us give ε_p and ε_H and define the stop criterion as follows:

STOP = ( g(p₊, p₋)/ε_p < 1 ) ∧ ( g(H₊, H₋)/ε_H < 1 ).  (32)
Algorithm 1: if ᾱ* = ∂²S̄/∂Θ²(τ*, ξ*) ≠ 0, then the convergence of the Newton descent on τ is quadratic (see [35] for instance). If ᾱ* = 0, say H(ξ, p*(τ⁺)) − H(ξ, p*(τ⁻)) ∼ (τ − τ*)^ν with ν ≥ 2, the convergence is linear (see [35] for instance). For safety we take M_τ = 10. The global complexity is O(N_step M_τ) and usually N_step = 2.
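The core of the ξ-update can be sketched in a few lines. The code below is an illustrative implementation of (29)-(31), assuming an affine interface f(z) = n·z = C (my simplifying assumption; the paper allows any differentiable f) and hypothetical hot-spot parameters in the test.

```python
import numpy as np

def B_point(tau, t0, T, z0, zT, zh1, zh2, w1, w2):
    """B point of (30) and scalar h of (29): for a fixed crossing time tau,
    the spatial gradient (28) of the bi-phase cost is K*h*(xi - B)."""
    c1 = 1.0 / np.tanh(w1 * (tau - t0))
    c2 = 1.0 / np.tanh(w2 * (T - tau))
    s1, s2 = np.sinh(w1 * (tau - t0)), np.sinh(w2 * (T - tau))
    h = w1 * c1 + w2 * c2
    B = (w1 * zh1 * c1 + w2 * zh2 * c2
         + w1 * (z0 - zh1) / s1 + w2 * (zT - zh2) / s2) / h
    return B, h

def project_on_line(z, n, C):
    """Orthogonal projection of z on a linear interface {x : n . x = C},
    i.e., the xi-update (31) when f is affine."""
    n = np.asarray(n, dtype=float) / np.linalg.norm(n)
    return z - (float(z @ n) - C) * n
```

A useful check is the limit property used later by the B-curve algorithm: B(τ) → z₀ as τ → t₀⁺.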
B-A
LGO
The B -curve algorithm aims to overcome the non-convexity issues developed previously. It proceedsas follows. Consider the B point defined in formula (30). We notice that for each τ ∈ [ t , T ] this point isdefined univocally knowing all parameters ( t , T, z , z T , z h ) and all (quadratic) traffic profiles so thatwe can see B in (30) as a function of τ , denoted B ( τ ) . It can be shown that this function τ (cid:55)→ B ( τ ) is continuous in the interval [ t , T ] and that lim τ → t +0 B ( τ ) = z , lim τ → T − B ( τ ) = z T . Assume now thatthe two optimal sub-trajectories are such that crossing time τ ∈ [ t , T ] and interface position ξ verify Algorithm 2 the B-curve bisection algorithm
Algorithm 2 The B-curve bisection algorithm
Input: initial/final time search interval [a, b] ⊆ [t₀, T]; precision ε_B fixed by the user (ε_B = 2⁻¹²)
Output: (ξ, τ)
Init: t₁ ← a, t₂ ← b; the algorithm only starts if IN_ZONE2(B(a)) ≠ IN_ZONE2(B(b))
procedure IN_ZONE2(z)
  uᵢ(z) = (time-stationary) traffic from hotspot z_hi at point z (i = 1, 2)
  return (u₂(z) > u₁(z))
end procedure
repeat
  x₁ ← B(t₁), x₂ ← B(t₂)
  τ ← (t₁ + t₂)/2, ξ ← B(τ)
  if IN_ZONE2(ξ) == IN_ZONE2(x₁) then
    t₁ ← τ
  else
    t₂ ← τ
  end if
  compute stop criterion STOP_B (34)
until STOP_B
ξ ← B(τ)

ξ = B(τ). Then by formula (28) the spatial gradient of S̄ at the point ξ is

p*(τ⁻) − p*(τ⁺) = 0.  (33)

This implies that the kinetic components of both Hamiltonians are equal at the interface. Now, since ξ belongs to the interface, both phase potentials (traffics) are equal by definition, so that the total Hamiltonian is also conserved at the interface: this is the optimality condition required with respect to the time τ. To summarize, in these conditions, both the Hamiltonian and the total impulsion are conserved at the interface I, i.e., the local optimality conditions hold for the total value function. It is also worth noticing that the related Lagrange multiplier appearing in (31) now simply vanishes: μ = 0. The proposed B-algorithm then consists in seeking the intersection of the B-curve B = {B(τ), τ ∈ [t₀, T]} with the interface I (Algorithm 2). For this, we first select a precision ε_B, then proceed by bisection and check the stopping criterion

STOP_B = ( |t₂ − t₁| / |b − a| < ε_B ).  (34)
Algorithm 2: if ε_B = 1/2^m, then the bisection algorithm converges in m iterations. Its complexity is thus O(log(1/ε_B)). In our experiments we chose ε_B = 2⁻¹² and 12 iterations are indeed sufficient to provide a trajectory with the optimality conditions holding at this (relative) precision.
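Algorithm 2 can be sketched directly from the B(τ) formula (30). The code below is an illustrative implementation, assuming two time-stationary quadratic hot-spot profiles uᵢ(z) = u2ᵢ ‖z − z_hi‖² (with negative coefficients for hot spots, so that IN_ZONE2 selects the region closer to z_h2); the configuration in the test is a hypothetical symmetric one, for which the crossing should occur at the mid-time and on the perpendicular bisector of the hotspots.

```python
import numpy as np

def b_algo(t0, T, z0, zT, zh1, zh2, w1, w2, u21, u22, eps_B=2 ** -12):
    """Bisection on the B-curve (Algorithm 2): find the crossing time tau at
    which B(tau) switches zone, i.e., crosses the equal-traffic interface.
    u21, u22 are the quadratic coefficients (negative for hot spots)."""
    def B(tau):                                   # the B point of (30)
        c1 = 1.0 / np.tanh(w1 * (tau - t0))
        c2 = 1.0 / np.tanh(w2 * (T - tau))
        s1, s2 = np.sinh(w1 * (tau - t0)), np.sinh(w2 * (T - tau))
        return (w1 * zh1 * c1 + w2 * zh2 * c2
                + w1 * (z0 - zh1) / s1 + w2 * (zT - zh2) / s2) / (w1 * c1 + w2 * c2)

    def in_zone2(z):                              # profile-2 traffic dominates at z
        return u22 * float((z - zh2) @ (z - zh2)) > u21 * float((z - zh1) @ (z - zh1))

    a, b = t0 + 1e-9, T - 1e-9                    # open interval: avoid sinh(0)
    t1, t2 = a, b
    assert in_zone2(B(a)) != in_zone2(B(b))       # bracketing condition
    while (t2 - t1) / (b - a) >= eps_B:           # stop criterion (34)
        tau = 0.5 * (t1 + t2)
        if in_zone2(B(tau)) == in_zone2(B(t1)):
            t1 = tau
        else:
            t2 = tau
    tau = 0.5 * (t1 + t2)
    return tau, B(tau)
```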
Algorithm 3 Data preprocessing
1: First smoothing: data aggregation
2: Second smoothing: LOWESS
3: Normalization
4: K-means with quadratic models

V. NUMERICAL EXPERIMENTS
A. From Measurements to Quadratic Profile
In this section, we explain how, from measured or estimated traffic load, we can derive a quadratic model that allows us to apply our framework. To illustrate the procedure, we extract data from the open data set presented in [34]. Traffic data (in number of bytes) has been collected from an operational cellular network in a medium-size city in China and is recorded for every base station and every hour. For our experiment, we extract a rectangular region [X_min, X_max] × [Y_min, Y_max], where X_min = 111, X_max = 111., Y_min = 13., Y_max = 13. are the minimum and maximum longitude and latitude, respectively (real figures have been anonymized). This corresponds approximately to a rectangle of km × km, with base stations having an average cell range of m. The traffic of the 22nd of August 2012 between 5 and 6 pm is illustrated in Figure 6a. We assume that this traffic is representative of the traffic intensity when the drone is launched. In order to fit this raw data to our model, we follow the pre-processing steps shown in Algorithm 3. The first smoothing consists in aggregating the traffic data on a grid with steps in both the longitude and latitude directions. The resulting elementary regions should correspond approximately to the drone coverage. The result is shown in Figure 6b. The data exhibits a very high variability, with very high peaks around a few locations. The second smoothing is a Locally Weighted Scatterplot Smoothing (LOWESS) [36]. We use here the Matlab function fit with the option "Lowess". The choice of the smoothing parameter α, i.e., the proportion of data points used for every local regression, has a decisive impact on the result. Increasing α has the effect of averaging out the different peaks. In our specific scenario, α = 0. yields Figure 7a with two local maxima. With α = 0., we obtain a single maximum. In step 3 of the pre-processing, the traffic is normalized, with no influence on the optimal trajectory.

The final pre-processing step is an adaptation of the classical K-means algorithm (see Algorithm 4) to fit quadratic models. Inputs are the data points obtained after the normalization, K_c the number of clusters (or hot spots), K_n the number of nearest neighbors, and M the number of iterations. Every cluster is associated with a quadratic function (in our case, we have K_c = 2). Every data point j is associated with a cluster and has a related label L_j in {1, ..., K_c}. For every data point, a list of nearest neighbors is built (step 4). An arbitrary initial labelization is chosen (step 6). The algorithm then proceeds by iterations (steps 7-16). At every iteration, if a point j has some neighbor with a different label (step 9), a best new label is found for j (steps 10-13) in terms of the quadratic error e_k. The error e_k measures the difference between the data points and the K_c-quadratic model, which fits a quadratic function to every cluster assuming that j has label k (steps 18-31). In Algorithm 4, KNN(K_n, X, Y) is a procedure that finds the K_n nearest neighbors of (X, Y) with respect to the Euclidean distance. We use the Matlab implementation knnsearch. NLLS(L) is a non-linear least squares method that fits the data points in L to a quadratic function of the form u₂‖z − z_h‖² + u₀. The Matlab implementation is based on a trust-region approach of the Levenberg-Marquardt algorithm.
Algorithm 4: Searching the K_n nearest neighbors of a data point using k-d trees takes O(K_n log J) on average and O(K_n J) in the worst case. Steps 3-5 thus have a complexity of O(K_n J log J) on average and O(K_n J^2) in the worst case. The initial labelization is a simple linear partitioning of the 2D space into K_c zones and is thus performed in O(J). In the main loop, there are at most O(M J K_c) calls to the function FIT. The Levenberg-Marquardt algorithm requires O(ε^{-2}) iterations to reach an ε-approximation of a stationary point of the objective function [35], [37]. The overall complexity of Algorithm 4 is thus O(K_n J^2 + M J K_c ε^{-2}).

The K-means partition obtained after M = 12 iterations is shown in Figure 7b. The final fits are shown in Figures 9a and 9b for K_c = 2 and K_c = 1, respectively (with the corresponding values of the smoothing parameter α). Figure 8 shows the quadratic error as a function of the number of iterations of the K-means algorithm for K_c = 1 and K_c = 2. The error is constant for K_c = 1, as there is only one iteration, which performs the non-linear least squares fitting for the single cluster. For K_c = 2, we distinguish two cases: K_n = 5 and K_n = ∞. In the former case, only the nearest neighbors of a data point are inspected to decide whether a relabelization should be performed. In the latter case, relabelization is systematically considered. Increasing K_n increases the complexity of every iteration but provides a faster convergence.

Algorithm 4: K-means with quadratic models
1: Input: K_c (number of clusters), J (number of measurement points), (X_j, Y_j, Z_j), j = 1, ..., J (coordinates and estimated traffic load Z_j at (X_j, Y_j)), K_n (number of nearest neighbors), M (number of iterations)
2: Output: z_hk, u_0k, u_1k, k = 1, ..., K_c (hot spot characteristics), e (quadratic error)
3: for j = 1, ..., J do
4:   K_j ← KNN(K_n, X_j, Y_j)
5: end for
6: L ← initial labelization
7: for m = 1, ..., M do
8:   for j = 1, ..., J do
9:     if ∃ j' ∈ K_j s.t. L_{j'} ≠ L_j then
10:      for k = 1, ..., K_c do
11:        e_k, z_hl, u_0l, u_1l, l = 1, ..., K_c ← FIT((X, Y, Z), j, L, k, K_c)
12:      end for
13:      L_j ← arg min_k e_k
14:    end if
15:  end for
16: end for
17: return z_hk, u_0k, u_1k, k = 1, ..., K_c, global quadratic error
18: procedure FIT((X, Y, Z), j, L, k, K_c)
19:   L_j ← k
20:   for l = 1, ..., K_c do
21:     L_l ← {(X_i, Y_i, Z_i) | L_i = l}
22:     z_hl, u_0l, u_1l ← NLLS(L_l)
23:   end for
24:   e_k ← 0
25:   for i = 1, ..., J do
26:     Z̃_i ← max_{l=1,...,K_c} u_1l |z_i − z_hl|^2 + u_0l
27:     e_k ← e_k + |Z̃_i − Z_i|^2
28:   end for
29:   e_k ← e_k / J
30:   return e_k, z_hl, u_0l, u_1l, l = 1, ..., K_c
31: end procedure

B. Trajectory optimization results

1) Estimation procedure of parameters K and T: as in most similar algorithms, a prior estimation of the parameters is necessary, since the optimal results strongly depend on them. The traffic parameters have been estimated just above, so the required parameters are, first, the mass K, representing the trade-off between the kinetic term and the traffic term in the Lagrangian (2), and then the total duration T, which also has a significant influence on the optimal trajectory (by convention t_0 = 0). This procedure is developed in Appendix G and yields in our case an estimate of the order of K = 50 × r; we select T = 1200, 1800 s and K = 30, 60 × r.
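The fitting machinery of Algorithm 4 can be sketched compactly. The snippet below is our own simplified Python illustration, not the paper's Matlab code: the NLLS step is replaced by an exact linearization of the quadratic model (which is linear in a reparametrization, so no iterative solver is needed), and points are greedily relabeled by per-cluster residual instead of the neighbor-triggered rule of steps 7-16.

```python
import numpy as np

def fit_quadratic(pts):
    """Least-squares fit of Z ~ u1*||(x, y) - zh||^2 + u0 (the NLLS step).

    Expanding the square gives Z = a*(x^2 + y^2) + b1*x + b2*y + c with
    a = u1, (b1, b2) = -2*u1*zh and c = u1*||zh||^2 + u0, so the fit is
    linear in (a, b1, b2, c).
    """
    x, y, Z = pts[:, 0], pts[:, 1], pts[:, 2]
    A = np.column_stack([x * x + y * y, x, y, np.ones_like(x)])
    a, b1, b2, c = np.linalg.lstsq(A, Z, rcond=None)[0]
    zh = np.array([-b1, -b2]) / (2 * a)
    return zh, a, c - a * (zh @ zh)        # z_h, u1, u0

def quadratic_kmeans(pts, Kc=2, M=10, init_labels=None):
    """Alternate per-cluster quadratic fits and residual-based relabeling."""
    labels = (np.arange(len(pts)) % Kc if init_labels is None
              else np.asarray(init_labels))
    for _ in range(M):
        models = []
        for k in range(Kc):
            sub = pts[labels == k]
            # degenerate/empty cluster: fall back to a flat model
            models.append(fit_quadratic(sub) if len(sub) >= 4
                          else (np.zeros(2), 0.0, 0.0))
        res = np.stack([np.abs(u1 * np.sum((pts[:, :2] - zh) ** 2, axis=1)
                               + u0 - pts[:, 2])
                        for zh, u1, u0 in models])
        labels = res.argmin(axis=0)
    return labels, models
```

On points lying exactly on a quadratic bump, fit_quadratic recovers (z_h, u_1, u_0) up to numerical precision; the loop then alternates fits and relabelings in the spirit of the FIT/arg-min steps, at the cost of ignoring the K_n-neighborhood speed-up.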
2) Results: optimal trajectories are shown in Figs. 9 and 10 and the related numerical results in Table I. The lengths of the various trajectories and the related average velocities are given in Table II. The following main comments can be made on these results:
• First, the trajectories obtained for GRAD-ALGO and B-ALGO both satisfy the optimality conditions, as the "relative" discrepancies between impulsions and between Hamiltonians at the interface are indeed below the selected precision, numerically yielding a full trajectory.
• Then, GRAD-ALGO is found to be stable due to our choice of initial and terminal drone positions (not too close to the interface and not too far from their related hotspots) and of time intervals (not too large temporal phases). Thus, we are far from the non-convexity conditions expressed in Theorems 3 and 4. The positivity of the total Hessian is indeed constantly checked at each iteration. Hence, a large (spatio-temporal) convergence basin results in the exact convergence of GRAD-ALGO (as well as B-ALGO) towards the unique optimal solution.

Fig. 6: Data preprocessing: raw data and first smoothing. (a) Raw traffic data on the 22nd of Aug. 2012, 6 pm [34]. (b) First smoothing: data aggregation at drone coverage level.

Fig. 7: Data preprocessing: second smoothing and partitions in 2 phases. (a) Second smoothing: LOWESS. (b) K-means partition.

Fig. 8: Quadratic error between normalized smoothed data and quadratic models, as a function of the K-means iterations.
Fig. 9: Influence of T on the optimal trajectory. (a) K = 60, T = 1200 and 1800 s, 2 phases (interface, iso-traffic levels, hot spots, B curve algorithm, gradient algorithm). (b) K = 60, T = 1200 and 1800 s, 1 phase (iso-traffic levels, hot spot, single-phase algorithm).

• As already noted for the single-phase case (paragraph III-A2), decreasing the mass K enables the drone to collect more traffic. In fact, during the allowed time interval, the drone gets closer to the hotspot with maximal traffic in order to decrease the total value function (i.e., zone 2 here).
• Increasing the time interval (T ↗) produces the same tendency, allowing the drone to spend more time near the hotspot with the higher traffic z_h, see Figs. 9a and 10a.
• Since the drone has enough time to pick up the traffic located close to the hotspots, this is reflected in an average drone velocity of about half its nominal value (see Table II).
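The single-phase trajectories discussed above come from the closed forms of Theorem 1 (Appendix D), which are easy to sanity-check numerically. The sketch below is our own Python illustration (the paper's experiments use Matlab), for the hyperbolic case u_1 < 0, the hot spot at the origin, and the Lagrangian normalized, as an assumption, to L = (K/2)|ż|^2 − (u_1/2)|z|^2 so that the Euler-Lagrange equation reads z̈ = −(u_1/K) z. It computes the initial velocity from (41), evaluates the closed-form trajectory, and checks both the prescribed terminal position and the Lemma 2 identity ∫ 2L dt = p(T)·z_T − p(t_0)·z_0:

```python
import numpy as np

# Assumed normalization (illustrative): L = (K/2)|v|^2 - (u1/2)|z|^2, u1 < 0,
# hot spot at the origin, so z'' = -(u1/K) z and omega = sqrt(-u1/K); p = K*v.
K, u1 = 2.0, -0.5
omega = np.sqrt(-u1 / K)
t0, T = 0.0, 3.0
z0 = np.array([1.0, 0.0])
zT = np.array([0.2, 0.8])

phi = omega * (T - t0)
a0 = omega * (zT - z0 * np.cosh(phi)) / np.sinh(phi)   # Eq. (41)

t = np.linspace(t0, T, 4001)
c, s = np.cosh(omega * (t - t0)), np.sinh(omega * (t - t0))
z = np.outer(c, z0) + np.outer(s, a0 / omega)          # closed-form trajectory
v = np.outer(omega * s, z0) + np.outer(c, a0)          # its velocity
endpoint = z[-1]                                       # should equal zT

# Lemma 2 (alpha = 2): integral of 2L equals p(T).z(T) - p(t0).z(t0)
L = 0.5 * K * np.sum(v**2, axis=1) - 0.5 * u1 * np.sum(z**2, axis=1)
action = np.sum((2 * L[1:] + 2 * L[:-1]) / 2) * (t[1] - t[0])  # trapezoid rule
boundary = K * (v[-1] @ z[-1] - v[0] @ z[0])
```

The terminal position is recovered by construction, and the quadrature of 2L matches the boundary term p·z up to the trapezoid-rule error, which numerically confirms the 2-homogeneity argument of Lemma 2 for this normalization.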
Fig. 10: Influence of K on the optimal trajectory. (a) K = 30 and 60, T = 1200 s, 2 phases (interface, iso-traffic levels, hot spots, B curve algorithm). (b) K = 30 and 60, T = 1200 s, 1 phase (iso-traffic levels, hot spot, single-phase algorithm).

TABLE I: Table of results with B-ALGO and GRAD-ALGO for K = 30, 60 × r and T = 1200, 1800 s, with r as defined in (59) (corresponding to a scaled spatial unit of 100 m). For each case, the table reports the crossing point ξ, the cost S, the Hamiltonian H, the phases φ− and φ+ and the crossing time τ; S_ph denotes the total single-phase cost (i.e., assuming only one, effective hotspot). The cases are:
a) K = 60 × r, T = 1200: B-ALGO (N_iter = 12) and GRAD-ALGO (N_iter = 5);
b) K = 60 × r, T = 1800: B-ALGO (N_iter = 12) and GRAD-ALGO (N_iter = 2);
c) K = 30 × r, T = 1200: B-ALGO (N_iter = 12) and GRAD-ALGO (N_iter = 2).

VI. CONCLUSION
In this paper, we propose a Lagrangian approach to solve the UAV base station optimal trajectory problem. When the traffic intensity exhibits a single phase, closed-form expressions for the trajectory and speed are derived from Hamilton-Jacobi equations. When the traffic intensity exhibits multiple phases, we characterize the crossing time and location at the interface. We propose two low-complexity algorithms for the bi-phase time-stationary traffic case that provide the optimal crossing time and location on the interface and fulfill the necessary conditions of optimality. At last, we present a data processing procedure based on a modified K-means algorithm that derives a single-phase or bi-phase quadratic model from real traffic data. Further extensions of this work are envisioned to generalize the approach to three or more hot spots and to consider multi-drone coordinated trajectories.

         bi-phase               single-phase
case     L (km)    V̄ (m/s)     L (km)    V̄ (m/s)
a)       12.41     10.34        11.71     9.
b)       13.80     7.66         12.53     6.
c)       13.64     11.37        12.37     10.

TABLE II: Table of trajectory lengths and related average speeds.

APPENDIX
A. Proof of Lemma 1
In a neighborhood of the optimal trajectory, the first-order variation of S is null:

δS = ∫_{t_0}^{T} δL(t, z, a) dt = ∫_{t_0}^{T} [∇_z L(t, z, a) · δz(t) + ∇_a L(t, z, a) · δa(t)] dt.   (35)

We now note that δa = δ(dz/dt) = d(δz)/dt. Integrating by parts the second term in the integral of δS, we obtain

∫_{t_0}^{T} ∇_a L(t, z, a) · d(δz)/dt dt = [δz(t) · ∇_a L(t, z, a)]_{t_0}^{T} − ∫_{t_0}^{T} δz(t) · (d/dt) ∇_a L(t, z, a) dt.   (36)

Note that [δz · ∇_a L]_{t_0}^{T} = 0 because z_0 and z_T are fixed. Equating δS to zero gives

∫_{t_0}^{T} [∇_z L(t, z, a) − (d/dt) ∇_a L(t, z, a)] · δz(t) dt = 0.   (37)

As this should be true for every δz, L, z_0 and z_T, we obtain the first result. Now assume that we have the optimal a(t); the condition for z(T) to be the optimal final position is

δS = [δz(t) · ∇_a L(t, z, a)]_{t_0}^{T} + ∇J(z(T)) · δz(T) = ∇_a L(z(T), T, a(T)) · δz(T) + ∇J(z(T)) · δz(T) = 0.   (38)

Note that z_0 is fixed, so δz at t_0 is null. We thus obtain the second result of the lemma.

B. Proof of Lemma 2

As L(z, a) is a homogeneous function of z and a, we have L(λz, λa) = |λ|^α L(z, a) for all λ (in our case with α = 2). Differentiating this expression with respect to λ, setting λ = 1, and noting that a = ż, we obtain

z · ∂L(z, ż)/∂z + ż · ∂L(z, ż)/∂ż = α L(z, ż).   (39)

Using (7) and (39), we have z · dp/dt + ż · p = αL, or equivalently d(p · z)/dt = αL. We can now integrate the cost function (3) along the optimal trajectory as follows:

S(t_0, z_0, T, z_T) = (1/α) ∫_{t_0}^{T} (d(p · z)/dt)(t) dt + J(z_T) = (1/α) (p(T) · z_T − p(t_0) · z_0) + J(z_T).   (40)

C. Proof of Lemma 3
We assume that an optimal trajectory exists and we apply the principle of optimality on the optimal trajectory between (t, z*(t)) and (t + h, z*(t) + ah), where h > 0, to obtain

S(t, z*(t), T, z_T) = min_a [h L(z*(t), a) + S(t + h, z*(t) + ah, T, z_T)]
                    = min_a [h L(z*(t), a) + S(t, z*(t), T, z_T) + h a · ∇_X S(t, z*(t), T, z_T) + h (∂S/∂t)(t, z*(t), T, z_T)].

This implies that

(∂S/∂t)(t, z*(t), T, z_T) = − min_a [a · ∇_X S(t, z*(t), T, z_T) + L(z*(t), a)]
                          = max_a [− a · ∇_X S(t, z*(t), T, z_T) − L(z*(t), a)]
                          = H(t, z*(t), −∇_X S(t, z*(t), T, z_T)).

By using the same approach between t − h and t, we deduce in the same way equation (10) when the final time T is varying.

D. Proof of Theorem 1
From (6) and (12), we obtain the following second-order ordinary differential equation: z̈ = −(u_1/K) z. If u_1/K > 0, we define ω = √(u_1/K) and we look for an optimal trajectory of the form z(t) = A cos(ωt) + B sin(ωt). If u_1/K < 0, we look for an optimal trajectory of the form z(t) = A cosh(ωt) + B sinh(ωt) with ω = √(−u_1/K). Let us denote z_0 = z(t_0) and a_0 = a(t_0) the initial conditions for z and ż.

Take the case u_1/K < 0. Using the derivative of z(t) and identifying terms, we obtain z(t) = z_0 cosh ω(t − t_0) + (a_0/ω) sinh ω(t − t_0). At t = T, we also have z_T = z_0 cosh ω(T − t_0) + (a_0/ω) sinh ω(T − t_0), from which we deduce

a(t_0) = ω (z_T − z_0 cosh ω(T − t_0)) / sinh ω(T − t_0),   (41)
a(T) = ω (−z_0 + z_T cosh ω(T − t_0)) / sinh ω(T − t_0),   (42)

when u_1 < 0. In a similar way, we have

a(t_0) = ω (z_T − z_0 cos ω(T − t_0)) / sin ω(T − t_0),   (43)
a(T) = ω (−z_0 + z_T cos ω(T − t_0)) / sin ω(T − t_0),   (44)

when u_1 > 0. Injecting a(t_0) = a_0 in the equation of the trajectory provides the result.

For the computation of S, we now use the result of Lemma 2, as our cost function is 2-homogeneous. From equation (8), we see that only the initial and final conditions are required to compute the cost function. Recall now that p = Ka. Injecting the equations of a(t_0) and a(T) in (8), we obtain the result for the cost function.

E. Proof of Theorem 2
We assume that the location and time (ξ, τ) of the interface crossing is known and unique. The optimal trajectory between (z_0, t_0) and (z_T, T) can be decomposed into two sub-trajectories that are themselves optimal, between (z_0, t_0) and (ξ, τ) on the one hand and between (ξ, τ) and (z_T, T) on the other hand, by the principle of optimality. In region 1, the optimal cost up to τ is

S_1(t_0, z_0, τ, ξ) = ∫_{t_0}^{τ} L_1(z*(s), a*(s)) ds.   (45)

Using Hamilton-Jacobi, we obtain

(∂S_1/∂τ)(t_0, z_0, τ, ξ) = −H_1(ξ, p*(τ−)).   (46)

In the same way, the optimal cost in region 2 is

S_2(τ, ξ, T, z_T) = ∫_{τ}^{T} L_2(z*(s), a*(s)) ds.   (47)

Using again Hamilton-Jacobi, we obtain

(∂S_2/∂τ)(τ, ξ, T, z_T) = H_2(ξ, p*(τ+)).   (48)

A necessary condition for the optimality of τ is thus

(∂S_1/∂τ)(t_0, z_0, τ, ξ) + (∂S_2/∂τ)(τ, ξ, T, z_T) = 0,   (49)

that is

H_1(ξ, p*(τ−)) = H_2(ξ, p*(τ+)).   (50)

A necessary condition for the optimality of ξ in the total cost under the constraint f(ξ) = C is also

µ ∇_z f(ξ) = ∇_X S_1(t_0, z_0, τ, ξ) + ∇_X S_2(τ, ξ, T, z_T) = p*(τ−) − p*(τ+),

where µ is a Lagrange multiplier associated with the constraint and where the second equality comes from equation (11) of Hamilton-Jacobi. Thus we obtain precisely equation (21).

F. The structure and the diagonalization of the single- and two-phase Hessian of the value function

1) Proof of Theorem 3:
Recall that we study the second-order differentiability properties of the single-phase value function S(t_1, x_1, t_2, x_2). Recall that from Eq. (1) we study Hessians of the form

∇²σ = ( ∂²σ/∂T_i²   (∇_{X_i} ∂σ/∂T_i)ᵀ ; ∇_{X_i} ∂σ/∂T_i   ∇²_{X_i,X_i} σ ) ∈ M_{3,3}(R),

where σ stands for S(t_1, x_1, t_2, x_2) and i = 1, 2. It is clear by inspecting the symmetries of (14) with respect to the spatial coordinates that

∇²_{X_1,X_1} S = ∇²_{X_2,X_2} S = K g Id, where g = ω coth φ and φ = ω(t_2 − t_1) is the temporal phase.   (51)

Concerning the symmetry with respect to the time variables, one also finds easily that ∂²S/∂T_1² = ∂²S/∂T_2². The Hessian with respect to the variable χ_i = (T_i, X_i) ∈ R × R² has thus the following structure:

H(χ_i) = ( α   Π_iᵀ ; Π_i   K g Id ),   with α = ∂²S/∂T_1² = ∂²S/∂T_2², g = ω coth φ, Π_i = ∇_{X_i} ∂S/∂T_i,   (52)

where the vector Π_i and the real scalar α remain to be determined. Now recall that the initial and final impulsions p_1 and p_2, as given by (41), write

p_1 = K ω (−x_1 cosh φ + x_2) / sinh φ,   (53)
p_2 = K ω (−x_1 + x_2 cosh φ) / sinh φ.   (54)

In fact, the expressions of Π_i and α appearing in the Hessian (52) both result from the computation of ∂p_i/∂T_i for i = 1, 2. Using (51), the differentiation of (53) and (54) wrt. T_1 resp. T_2 leads to the following simple result:

Π_1 = ∇_{X_1} ∂S/∂T_1 = −ω p_2 / sinh φ,   resp.   Π_2 = ∇_{X_2} ∂S/∂T_2 = −ω p_1 / sinh φ.   (55)

Now, as far as the second partial derivative of the value function S with respect to time (α) is concerned,

α = ∂²S/∂T_2² = −∂H/∂T_2 (Hamilton-Jacobi).   (56)

Recalling that the Hamiltonian for any Newtonian model has the form

H(z, p, t) = ‖p‖²/(2K) + V(z)   (57)

and differentiating it with respect to the temporal phase φ at fixed extremities x_1 and x_2 leads to

δH = p_1 · δp_1 / K = p_2 · δp_2 / K = −(p_1 · p_2)/(K sinh φ) δφ,

that is

α = ∂²S/∂T_1² = ∂²S/∂T_2² = −ω ∂H/∂φ = ω (p_1 · p_2)/(K sinh φ).   (58)

Now from (24) the two-phase Hessian is simply the sum of the two single-phase Hessians.
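The identities (51), (55) and (58) can be checked by finite differences on the closed-form value function S = (p_2 · x_2 − p_1 · x_1)/2 of Lemma 2 (with α = 2 and J = 0). The sketch below is our own numerical illustration; the constants K, ω and the evaluation points are arbitrary:

```python
import numpy as np

# Illustrative constants (not from the paper); S below is the closed-form
# single-phase value function S = (p2.x2 - p1.x1)/2, with impulsions p1, p2
# given by Eqs. (53)-(54).
K, omega = 2.0, 0.7

def impulsions(t1, x1, t2, x2):
    phi = omega * (t2 - t1)
    p1 = K * omega * (-x1 * np.cosh(phi) + x2) / np.sinh(phi)
    p2 = K * omega * (-x1 + x2 * np.cosh(phi)) / np.sinh(phi)
    return p1, p2, phi

def S(t1, x1, t2, x2):
    p1, p2, _ = impulsions(t1, x1, t2, x2)
    return 0.5 * (p2 @ x2 - p1 @ x1)

t1, t2 = 0.0, 1.3
x1, x2 = np.array([1.0, -0.5]), np.array([0.4, 0.9])
p1, p2, phi = impulsions(t1, x1, t2, x2)
h = 1e-4
e0 = np.array([1.0, 0.0])

# (51): second spatial derivative should equal K*g with g = omega*coth(phi)
d2x = (S(t1, x1 + h * e0, t2, x2) - 2 * S(t1, x1, t2, x2)
       + S(t1, x1 - h * e0, t2, x2)) / h**2

# (55): Pi_1 = d/dT1 (dS/dX1) should equal -omega*p2/sinh(phi) (1st component)
def dX1S(t):
    return (S(t, x1 + h * e0, t2, x2) - S(t, x1 - h * e0, t2, x2)) / (2 * h)
Pi1 = (dX1S(t1 + h) - dX1S(t1 - h)) / (2 * h)

# (58): alpha = d^2 S / dT1^2 should equal omega*(p1.p2)/(K*sinh(phi))
alpha = (S(t1 + h, x1, t2, x2) - 2 * S(t1, x1, t2, x2)
         + S(t1 - h, x1, t2, x2)) / h**2
```

Since S is exactly quadratic in the spatial variables, the spatial second difference reproduces K g to machine precision; the time derivatives match the closed forms up to the finite-difference error.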
2) Diagonalizing the Hessian of the single- and two-phase value function: since the Hessians of the single- and two-phase value functions have the same structure, we address a general Hessian of the form (52). Its characteristic polynomial is easily developed as

det(H − ν Id) = (Kg − ν)² (α − ν) − (Kg − ν) ‖Π‖².

Since from (52) one has, for all (θ, x) ∈ R × R²,

H(θ, x) = (α θ + Π · x, Π θ + Kg x),

the three real eigenvalues (ν_i)_{i=0:2} and the related eigenvectors Y_i = (θ_i, x_i) thus satisfy:
i) ν_0 = Kg > 0 with Y_0 = (θ_0 = 0, x_0), x_0 ⊥ Π: a "pure space" eigenvector Y_0 ∈ {0} × R²;
ii) ν_1, ν_2 such that (Kg − ν)(α − ν) − ‖Π‖² = 0, with Y_i = (θ_i, x_i ∝ Π), i = 1, 2: two (mutually orthogonal) eigenvectors orthogonal to Y_0 (since Π ⊥ x_0), with

ν_1 ν_2 = α Kg − ‖Π‖² = det(H)/(Kg)   and   ν_1 + ν_2 = α + Kg.

Thus ν_1 and ν_2 cannot be simultaneously negative, since this would imply α < −Kg < 0 and hence det(H) = ν_0 ν_1 ν_2 < 0, i.e., ν_1 ν_2 < 0, a contradiction. However, the condition α Kg < ‖Π‖² (e.g., induced by the sufficient condition α < 0) implies that one of the eigenvalues ν_1, ν_2 is negative, namely the local non-convexity of the Hessian matrix H. This property thus holds for each single-phase value function as well as for the total two-phase case.

G. Estimation of parameters
It is based on the following remarks. First, the interface location I is independent of K and T (since it only depends on the traffic variables). Then, the final time T always remains estimated in seconds (t_0 = 0). Also, according to [38], a typical drone cell flies at a speed V̄ = 20 m/s and has an autonomy of a few tens of minutes. Last, we test our procedure on spatially-scaled data, where the convenient spatial unit is 100 m, which defines the scaling ratio r.   (59)

The following numerical estimations are also considered:
a) for estimating T: the total length L of the trajectory is approximated by L ≈ c × ‖z_0 − z_T‖ with c ≳ 1, which gives T = L/V̄ (recall that T is un-scaled and expressed in seconds);
b) for estimating K: the phase of each sub-trajectory should satisfy φ_i = ω_i T_i < φ_max, i = 1, 2, where φ_max = 10 (also an un-scaled constant). Then, spatial scaling by the quantity r implies u_i ← u_i r, i = 1, 2, and thus K ← K r, since K ω_i² = u_i at any scale. Passing back into the original frame therefore implies the inverse scaling of K.

REFERENCES

[1] Y. Zeng, Q. Wu, and R. Zhang, "Accessing from the sky: A tutorial on UAV communications for 5G and beyond,"
Proceedings of the IEEE, vol. 107, no. 12, pp. 2327–2375, 2019.
[2] B. Pearre and T. X. Brown, "Model-free trajectory optimization for wireless data ferries among multiple sources," in IEEE Globecom Workshops, Dec. 2010, pp. 1793–1798.
[3] V. Sharma, M. Bennis, and R. Kumar, "UAV-assisted heterogeneous networks for capacity enhancement," IEEE Communications Letters, vol. 20, no. 6, pp. 1207–1210, June 2016.
[4] D. Yang, Q. Wu, Y. Zeng, and R. Zhang, "Energy tradeoff in ground-to-UAV communication via trajectory design," IEEE Transactions on Vehicular Technology, vol. 67, no. 7, pp. 6721–6726, 2018.
[5] T. T. Mac, C. Copot, D. T. Tran, and R. De Keyser, "Heuristic approaches in robot path planning: A survey," Robotics and Autonomous Systems, vol. 86, pp. 13–28, 2016.
[6] T.-Y. Chi, Y. Ming, S.-Y. Kuo, C.-C. Liao et al., "Civil UAV path planning algorithm for considering connection with cellular data network," in IEEE Intl. Conf. on Computer and Information Technology (CIT), June 2012, pp. 327–331.
[7] D. Liberzon, Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton University Press, 2011.
[8] D. Delahaye, S. Puechmorel, P. Tsiotras, and E. Féron, "Mathematical models for aircraft trajectory design: A survey," in Air Traffic Management and Systems. Springer, 2014, pp. 205–247.
[9] Y. Zeng, R. Zhang, and T. J. Lim, "Throughput maximization for UAV-enabled mobile relaying systems," IEEE Transactions on Communications, vol. 64, no. 12, pp. 4983–4996, Dec. 2016.
[10] Y. Zeng and R. Zhang, "Energy-efficient UAV communication with trajectory optimization," IEEE Transactions on Wireless Communications, vol. 16, no. 6, pp. 3747–3760, June 2017.
[11] Q. Wu, Y. Zeng, and R. Zhang, "Joint trajectory and communication design for multi-UAV enabled wireless networks," IEEE Transactions on Wireless Communications, vol. 17, no. 3, pp. 2109–2121, Mar. 2018.
[12] Y. Zeng, J. Xu, and R. Zhang, "Energy minimization for wireless communication with rotary-wing UAV," IEEE Transactions on Wireless Communications, vol. 18, no. 4, pp. 2329–2345, 2019.
[13] B. R. Marks and G. P. Wright, "A general inner approximation algorithm for nonconvex mathematical programs," Operations Research, vol. 26, no. 4, pp. 681–683, 1978.
[14] J. Zhang, Y. Zeng, and R. Zhang, "UAV-enabled radio access network: Multi-mode communication and trajectory design," IEEE Transactions on Signal Processing, vol. 66, no. 20, pp. 5269–5284, 2018.
[15] O. von Stryk and R. Bulirsch, "Direct and indirect methods for trajectory optimization," Annals of Operations Research, vol. 37, no. 1, pp. 357–373, Dec. 1992.
[16] G. Barles, A. Briani, and E. Trélat, "Value function and optimal trajectories for regional control problems via dynamic programming and Pontryagin maximum principles," Math. Cont. Related Fields.
[17] L. C. Evans, Partial Differential Equations, 2nd ed., ser. Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2010, vol. 19.
[18] M. Akian, S. Gaubert, and A. Lakhoua, "The max-plus finite element method for solving deterministic optimal control problems: basic properties and convergence analysis," SIAM Journal on Control and Optimization, vol. 47, no. 2, pp. 817–848, 2008.
[19] W. McEneaney, Max-Plus Methods for Nonlinear Control and Estimation. Springer Science & Business Media, 2006.
[20] A. Alla, M. Falcone, and L. Saluzzi, "An efficient DP algorithm on a tree-structure for finite horizon optimal control problems," SIAM Journal on Scientific Computing, vol. 41, no. 4, pp. A2384–A2406, 2019.
[21] S. Dolgov, D. Kalise, and K. Kunisch, "A tensor decomposition approach for high-dimensional Hamilton-Jacobi-Bellman equations," arXiv preprint arXiv:1908.01533, 2019.
[22] W. Kang and L. C. Wilcox, "Mitigating the curse of dimensionality: sparse grid characteristics method for optimal feedback control and HJB equations," Computational Optimization and Applications, vol. 68, no. 2, pp. 289–315, 2017.
[23] A. Alla, M. Falcone, and S. Volkwein, "Error analysis for POD approximations of infinite horizon problems via the dynamic programming approach," SIAM Journal on Control and Optimization, vol. 55, no. 5, pp. 3091–3115, 2017.
[24] D. Kalise, S. Kundu, and K. Kunisch, "Robust feedback control of nonlinear PDEs by numerical approximation of high-dimensional Hamilton-Jacobi-Isaacs equations," arXiv preprint arXiv:1905.06276, 2019.
[25] Y. T. Chow, J. Darbon, S. Osher, and W. Yin, "Algorithm for overcoming the curse of dimensionality for state-dependent Hamilton-Jacobi equations," Journal of Computational Physics.
[26] J. Darbon and S. Osher, "Algorithms for overcoming the curse of dimensionality for certain Hamilton-Jacobi equations arising in control theory and elsewhere," Research in the Mathematical Sciences, vol. 3, no. 1, p. 19, Sep. 2016.
[27] I. Yegorov and P. M. Dower, "Perspectives on characteristics based curse-of-dimensionality-free numerical approaches for solving Hamilton-Jacobi equations," Applied Mathematics & Optimization, pp. 1–49, 2017.
[28] Y. T. Chow, J. Darbon, S. Osher, and W. Yin, "Algorithm for overcoming the curse of dimensionality for certain non-convex Hamilton-Jacobi equations, projections and differential games," Annals of Mathematical Sciences and Applications, vol. 3, no. 2, pp. 369–403, 2018.
[29] T. Nakamura-Zimmerer, Q. Gong, and W. Kang, "QRnet: Optimal regulator design with LQR-augmented neural networks," IEEE Control Systems Letters, vol. 5, no. 4, pp. 1303–1308, 2021.
[30] J. Han, A. Jentzen, and W. E, "Solving high-dimensional partial differential equations using deep learning," Proceedings of the National Academy of Sciences, vol. 115, no. 34, pp. 8505–8510, 2018.
[31] J. Darbon, G. P. Langlois, and T. Meng, "Overcoming the curse of dimensionality for some Hamilton-Jacobi partial differential equations via neural network architectures," Research in the Mathematical Sciences, vol. 7, no. 3, Jul. 2020. [Online]. Available: https://doi.org/10.1007/s40687-020-00215-6
[32] J. Darbon and T. Meng, "On some neural network architectures that can represent viscosity solutions of certain high dimensional Hamilton-Jacobi partial differential equations," Journal of Computational Physics.
[33] M. Coupechoux, J. Darbon, J.-M. Kélif, and M. Sigelle, "Optimal trajectories of a UAV base station using Lagrangian mechanics," in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2019, pp. 626–631.
[34] X. Chen, Y. Jin, S. Qiang, W. Hu, and K. Jiang, "Analyzing and modeling spatio-temporal dependence of cellular traffic at city scale," in Communications (ICC), 2015 IEEE International Conference on, 2015.
[35] M. Bierlaire, Optimization: Principles and Algorithms. EPFL Press, 2015.
[36] W. S. Cleveland, "Robust locally weighted regression and smoothing scatterplots," Journal of the American Statistical Association, vol. 74, no. 368, pp. 829–836, 1979.
[37] K. Ueda and N. Yamashita, "On a global complexity bound of the Levenberg-Marquardt method," Journal of Optimization Theory and Applications, vol. 147, no. 3, pp. 443–453, 2010.
[38] A. Fotouhi, M. Ding, and M. Hassan, "DroneCells: Improving 5G spectral efficiency using drone-mounted flying base stations," arXiv preprint arXiv:1707.02041.