Fresh, Fair and Energy-Efficient Content Provision in a Private and Cache-Enabled UAV Network

Abstract

In this paper, we investigate a private and cache-enabled unmanned aerial vehicle (UAV) network for content provision. Aiming to deliver fresh, fair, and energy-efficient content files to terrestrial users, we formulate a joint UAV caching, UAV trajectory, and UAV transmit power optimization problem. This problem is shown to be a sequential decision problem with mixed-integer non-convex constraints, which cannot be solved directly. To this end, we propose a novel algorithm based on subproblem decomposition and convex approximation. In particular, we first decompose the sequential decision problem into multiple repeated optimization subproblems via a Lyapunov technique. Next, an iterative optimization scheme incorporating a successive convex approximation (SCA) technique is developed to tackle the challenging mixed-integer non-convex subproblems. In addition, we analyze the convergence and computational complexity of the proposed algorithm and derive the theoretical value of the expected peak age of information (PAoI) to estimate the content freshness. Simulation results demonstrate that the proposed algorithm achieves an expected PAoI close to the theoretical value and is 22.11% more energy-efficient and 70.51% fairer than the benchmark algorithms.



Peng Yang, Member, IEEE, Kun Guo, Member, IEEE, Xing Xi, Graduate Student Member, IEEE, Tony Q. S. Quek, Fellow, IEEE, Xianbin Cao, Senior Member, IEEE, and Chenxi Liu, Member, IEEE

Index Terms: Fresh content, private UAV network, UAV caching, trajectory design, power control

P. Yang, K. Guo, and T. Q. S. Quek are with the Information Systems Technology and Design, Singapore University of Technology and Design, 487372 Singapore. X. Xi and X. Cao are with the School of Electronic and Information Engineering, Beihang University, Beijing 100083, China. C. Liu is with the State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China. This paper was presented in part at the IEEE Global Communications Conference 2020 [1].

I. INTRODUCTION

The data traffic requested by terrestrial mobile users will increase dramatically in terrestrial wireless mobile communication networks [2]. It is predicted that monthly data traffic in the global mobile networks will reach 77 exabytes by 2022 [2]. It is also foreseeable that, with the advancement of manufacturing, chip, and sensor technologies, the global data traffic in future wireless networks will increase exponentially [3]. However, the flexibility and resilience of terrestrial network services are insufficient [4]. It is challenging for terrestrial networks to guarantee satisfactory network performance at any time, especially during peak traffic time [5]. Owing to its agile and resilient deployment, the unmanned aerial vehicle (UAV) network has been widely considered as a significant complement to terrestrial networks in 5G and beyond, boosting the capacity of terrestrial networks and extending the network coverage [4]. Moreover, by deploying a private UAV network with complete control over some aspects (e.g., network resources and storage resources) of the network, the private UAV network can provide further optimized services over the service area [6]. Recently, research on the private UAV network has attracted much attention from academia and industry [6]–[9].

On the other hand, UAV caching is a promising paradigm to assist terrestrial networks [10]. By proactively caching popular and repetitively requested content files of large size (e.g., high-resolution maps, football match videos), UAV caching can significantly alleviate the traffic burden and backhaul congestion of terrestrial networks in the peak hours of some hotspots [3], [5]. Besides, when content requests are hit by the cache, content files can be transmitted directly without traversing the wireless backhaul, which reduces the response delay significantly [11]. On-demand UAV communications can also be dispatched when terrestrial networks are overloaded, which is flexible and cost-effective. As a result, the issue of UAV caching has been studied extensively during the past few years [3], [5], [12]–[14].

A. Related work

In terms of the research on the private UAV network, the work in [6] deployed a private blockchain-enabled UAV 5G network to meet dynamic user demands in a reliable and secure manner. In [7], the UAV-aided interference assessment for private 5G new radio (NR) deployments was investigated. The utilization of dedicated portions of cellular spectrum to provide a highly reliable command and control link for UAVs was evaluated in [8]. Besides, the work in [9] designed optimal trajectories of UAVs in private UAV networks to always maintain connections between UAVs and a ground station under the constraint that the total distance travelled by all UAVs is minimum.

In terms of the research on UAV caching, the work in [3] explored a cache-enabled UAV-assisted wireless network to maximize the minimum throughput among UAV-served users by jointly optimizing the cache placement, UAV trajectory, and UAV transmit power in a finite flight period. In [5], UAV-aided edge caching to assist terrestrial vehicular networks in delivering high-bandwidth content files was investigated. Besides, the issue of UAV caching for decreasing the transmission latency and alleviating backhaul congestion in a UAV network with limited wireless backhaul capacity was studied in [14].

Although the UAV caching issue was extensively discussed in [3], [5], [12]–[14] via optimizing cache placement, UAV trajectory, and so on, few of these works discuss the problem of maintaining the “freshness” of cached contents. It is crucial to deliver fresh content files to the destination nodes in some applications, especially delay-sensitive applications such as intelligent transportation, environmental monitoring, and health monitoring [15]. Outdated information may result in degraded user experience, erroneous control, and even catastrophes [16].
The “freshness” is an important metric, referred to as the age of information (AoI) or status age, which is defined as the amount of time elapsed since the instant at which the most recently delivered update took place [17]. Due to the significance of delivering fresh content files, AoI-aware UAV-assisted wireless network design has attracted increasing interest from the research community [15], [16], [18]. For example, the work in [15] proposed to jointly optimize the UAV trajectory, the time required for energy harvesting, and data collection for each sensor node to minimize the average AoI of collected data in a UAV-assisted wireless network. The work in [18] investigated the problem of jointly optimizing the UAV trajectory, sensing time, transmission time, and task scheduling to guarantee the freshness of the UAV-sensed data in a UAV network.
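To make the AoI definition above concrete, the following minimal sketch tracks the age seen by one destination node and the peak values reached just before each delivery. The function names, the convention that the age starts from zero at time 0, and the assumption of in-order delivery are illustrative choices, not taken from the cited works.

```python
from typing import List


def age_at(q: float, gen_times: List[float], recv_times: List[float]) -> float:
    """Age at time q: time elapsed since the generation of the most
    recently delivered update (assumption: age counts from time 0
    before the first delivery)."""
    delivered = [g for g, r in zip(gen_times, recv_times) if r <= q]
    return q - (max(delivered) if delivered else 0.0)


def peak_ages(gen_times: List[float], recv_times: List[float]) -> List[float]:
    """Age reached right before each delivery (in-order delivery assumed)."""
    peaks = []
    last_gen = 0.0  # generation time of the previously delivered update
    for q_gen, q_recv in zip(gen_times, recv_times):
        if q_recv < q_gen:
            raise ValueError("an update cannot arrive before it is generated")
        peaks.append(q_recv - last_gen)
        last_gen = q_gen
    return peaks


# Hypothetical example: three updates generated at 1, 4, 6 and received at 3, 5, 9.
gen = [1.0, 4.0, 6.0]
recv = [3.0, 5.0, 9.0]
print(age_at(4.5, gen, recv))   # age at time 4.5
print(peak_ages(gen, recv))     # peaks just before each delivery
```

The peaks grow whenever either the gap between generations or the delivery delay grows, which is why both quantities appear in the freshness analysis.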

B. Motivation and contributions

Despite the extensive existing work on fresh data provision, all of it [15], [16], [18] focused on maintaining the freshness of data delivered to a single destination node from one or multiple source nodes, without discussing the significant problem of minimizing the AoI of data destined to multiple destination nodes. In fact, investigating the delivery of fresh data to multiple destination nodes is significant. On one hand, multiple UAVs instead of a single UAV should be deployed to deliver data to destination nodes owing to the limited service ability and coverage range of a single UAV. On the other hand, for delay-sensitive information (e.g., traffic information and live news), it is important to guarantee the freshness of delivered data.

The goal of minimizing the AoI of data delivered to multiple destination nodes, however, poses novel and greater challenges to the optimization and the theoretical analysis of AoI in a UAV-assisted wireless network. First, the minimization of the AoI of data depends on simultaneous decision making on data placement, data delivery, UAV trajectory, and UAV transmit power [15], [16], [18]. Multiple UAVs rather than a single UAV should be deployed to deliver fresh data to destination nodes. Nevertheless, the joint decision making on data placement, data delivery, UAV trajectory, and UAV transmit power in a multi-UAV network is much more challenging than that in the single-UAV network studied in [15], [16], [18]. Besides, this joint decision making problem can be shown to be a sequential decision problem with a high-dimensional and mixed discrete and continuous decision space. Therefore, it is difficult to apply standard optimization methods or the popular deep reinforcement learning methods, which are designed either for a low-dimensional discrete decision space or for a high-dimensional continuous decision space, to solve this problem.
Although we considered multiple destination nodes for joint decision making on UAV trajectory and UAV transmit power in our previous work [1], the issue of ensuring fresh data delivery was not addressed. Second, owing to the complex interaction between multiple decision variables and the freshness of data, the theoretical analysis of the freshness of data destined to multiple destination nodes has not been well investigated.

To tackle the above challenges and provide further optimized services, we focus on the joint design of UAV caching (including content placement and content delivery), UAV trajectory, and UAV transmit power in a private and cache-enabled UAV network in this paper. The main contributions of this paper can be summarized as follows:

1) A time-varying UAV network needs to be deployed to deliver fresh content to terrestrial users due to the limited UAV service ability and communication coverage range. In this regard, we formulate the problem of content provision by deploying a private and cache-enabled UAV network as a sequential decision problem. The goal of this problem is to maintain the freshness of data arriving at all users and provide fair and energy-efficient content delivery for all users, subject to UAVs’ transmit power and trajectory constraints.

2) To effectively solve the formulated problem, a Lyapunov-based optimization framework and a novel algorithm with provable performance guarantees are proposed. In particular, the framework solves the complicated sequential decision problem by decomposing it into repeatedly optimized subproblems of multi-tier structure rather than solving it as a whole. The decomposed subproblems, however, are shown to be mixed-integer non-convex and are therefore still intractable directly. To make them tractable, the proposed algorithm first uses an iterative optimization scheme to handle the mixed-integer issue. Then, a successive convex approximation (SCA) technique is leveraged to tackle the non-convexity.

3) Besides, we analyze the convergence of the proposed algorithm. The theoretical value of the expected peak AoI (PAoI), used to estimate the freshness of the content, is also obtained via probability theory.

4) Finally, the performance of the proposed algorithm is compared with different benchmark algorithms, and the impact of different design parameters is discussed. Simulation results verify that the proposed algorithm can achieve an expected PAoI close to the theoretical value. Further, the proposed algorithm is 22.11% more energy-efficient and 70.51% fairer than the benchmark algorithms.

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. Scenario description

This paper considers a content server and a private and cache-enabled UAV network, which includes one ground base station (BS), multiple UAVs, and many terrestrial mobile users, as shown in Fig. 1. The content server proactively transmits the content files requested by users, each of which consists of many data packets, to the BS such that the latency for users to obtain content files can be significantly reduced. However, terrestrial users, the set of which is $\mathcal{I} = \{1, 2, \ldots, N\}$ with $N$ being the number of users, may be in a poor communication environment due to serious signal occlusion or a long distance from the BS. Then, the BS-user transmission links may be interrupted, and users will have poor quality-of-experience (QoE). Hence, a set $\mathcal{J}$ of $J$ energy-constrained rotary-wing UAVs acting as aerial relays is deployed to perform the communication task of providing fresh, fair, and energy-efficient content delivery for terrestrial users. In addition, to provide better direct content delivery services for terrestrial users, each UAV is equipped with a capacity-limited storage to dynamically cache content files from the BS and then deliver cached files to users.

Fig. 1. The communication scenario and network architecture of a private and cache-enabled UAV network. (Figure labels: edge router, content server, storage, AUSF, AMF, backhaul, control, EAP-based network access authentication, private and cache-enabled UAV network, public network.)

To theoretically model the communication task, the time domain in the private and cache-enabled UAV network is assumed to be discretized. We consider the general case in which the UAV communication task of delivering content files may last long enough, i.e., $t = \{1, 2, \ldots\}$, and $\Delta t$ is the duration of a time slot. Owing to the limited number of UAVs and a UAV’s restricted communication range, the locations of UAVs should be continuously adjusted when executing the task such that UAVs can deliver content files to all terrestrial users.

B. Network architecture

In the above scenario, terrestrial users served by the UAVs need to access the content server residing in the public network. To this aim, we design the network architecture shown in Fig. 1. This network architecture includes a public network, a private and cache-enabled UAV network, and a BS shared by the public network and the private and cache-enabled UAV network (referred to as the UAV network for brevity). The BS is split into higher radio access layers (i.e., radio resource control (RRC), packet data convergence protocol (PDCP)) and a lower-layer radio interface (i.e., radio link control (RLC), medium access control (MAC), physical layer (PHY)). The higher layers can be configured to operate in a user-specific mode. For example, for a user requiring low-latency services, RRC can be configured and tailored to disable the lower-layer Internet protocol (IP) stack and related header compression. Besides, RLC can be configured in transparent mode by the RRC. In contrast, for a user requiring high quality of experience (QoE), IP and acknowledged RLC should be initiated [19]. The UAV network and the public network share the lower-layer radio interface of the BS. The higher radio access layers of the BS are split to the public network. Besides, the public network provides core network functions.

To access the content server, a roaming agreement should be established between the public network and users in the UAV network [20]. This can be done by using a communication terminal or simply by using the subscriber identity in the public network. Yet, when sharing the BS, network security is one of the major concerns [20]. In 3GPP-specified 5G networks, an extensible authentication protocol (EAP) based method (see [21]) can be introduced to ensure security. In the considered scenario, the authentication is done between users and an authentication server function (AUSF) in the UAV network. This leads to keys shared between the AUSF and users.
The keys are derived during authentication and are utilized to protect user traffic. According to [22], one of these keys is passed on towards the access and mobility management function (AMF) in the public network as a basis for establishing security between the mobile device and the public network. Another key agreed with the mobile device during the same EAP run can, however, be retained by the private network and used to establish user plane security between users and the UAV network. Besides, network security is terminated by the PDCP. Making the split between the UAV network and the public network below the PDCP allows all security-related functions to be kept within the UAV network.

To instantiate the network architecture, the network function virtualization (NFV) and software-defined networking (SDN) techniques should be explored. NFV and SDN are complementary technologies that achieve the level of abstraction and flexibility required to satisfy stringent application requirements while maximizing network infrastructure reutilization [23]. Specifically, NFV can decouple physical network functions (PNFs) (e.g., firewalls, routers, load balancers) from dedicated hardware by implementing the same functionality in software, called virtualized network functions (VNFs) [23]. VNFs may then be instantiated in data centers at backend clouds, or on top of devices equipped with compute and storage resources at the edge [24]. For example, in the considered architecture, candidates for VNF instances include the AUSF and AMF.

SDN can decouple the user plane from the control plane and centralize network management in an SDN controller. The network management can then be facilitated via a softwarization approach. With a global view of the network resources, SDN controller applications can take advantage of numerous southbound interfaces (e.g., OpenFlow, NETCONF) to gather network state information and act upon each forwarding device (PNF or VNF) accordingly.
In the considered architecture, owing to the use of the SDN technique, the control plane network functions (e.g., home subscriber server (HSS), AMF) can be deployed on the public network and shared by the UAV network. The public network and the UAV network can maintain their own user plane network functions, which are responsible for handling user-specific and bearer traffic.

Based on the above scenario and network architecture, we next mathematically model the communication task.

C. UAV caching model

As terrestrial users in general submit their content requests earlier than the expected time at which content files are received, their content requests may be known in advance [25]. For any user $i \in \mathcal{I}$, denote by $\Pr_i$ the probability that a content request of user $i$ is received by the content server, with $\sum_{i=1}^{N} \Pr_i = 1$. According to the known content request information, decisions on UAV caching, including the content placement and delivery modelled below, can then be made.

1) Content placement model:

In the model, any terrestrial user may obtain its requested content files from one of the following two links, i.e., the UAV cache-user link (if content files are cached in UAVs) and the BS cache-UAV-user link. Denote by $\mathcal{F} = \{f_1, f_2, \ldots, f_i, \ldots, f_N\}$ a finite content library, where $f_i$ represents the content file requested by user $i \in \mathcal{I}$. The size of each content file is assumed to be the same [5]. For any UAV $j \in \mathcal{J}$, denote by $b_{j,f_i}(t)$ a content placement decision variable at time slot $t$. $b_{j,f_i}(t) = 1$ if the content file $f_i$ destined to user $i$ is cached in UAV $j$, in which case user $i$ can obtain the content file $f_i$ from UAV $j$ directly; otherwise, $b_{j,f_i}(t) = 0$. When $b_{j,f_i}(t) = 0$, user $i$ has to obtain the content file $f_i$ via the BS cache-UAV-user link. As in [26], we assume that one user requests at most one content file and each UAV can cache at most one content file at time slot $t$. Then, we have the following content placement constraint

$$\sum_{f_i \in \mathcal{F}} b_{j,f_i}(t) \le 1, \quad \forall j, t. \tag{1}$$

2) Content delivery model:

Denote by $s_{ij}(t)$, $\forall i, j, t$, the content delivery status of UAV $j$ at time slot $t$. $s_{ij}(t) = 1$ represents that UAV $j$ directly delivers the cached content file, or forwards the content file from the BS cache, to user $i$ at $t$. $s_{ij}(t) = 0$ indicates that UAV $j$ does not deliver a content file to user $i$ at $t$. Besides, each UAV can deliver a content file to at most one user, and one user can obtain the requested content file from at most one UAV at each time slot. Formally, we have

$$0 \le \sum_{j} s_{ij}(t) \le 1, \ \forall i, t, \qquad 0 \le \sum_{i} s_{ij}(t) \le 1, \ \forall j, t, \tag{2}$$

where we lighten $\sum_{j \in \mathcal{J}} s_{ij}(t)$ and $\sum_{i \in \mathcal{I}} s_{ij}(t)$ to $\sum_{j} s_{ij}(t)$ and $\sum_{i} s_{ij}(t)$, respectively. Similar lightened notation is adopted for brevity throughout the rest of this paper.

D. UAV power consumption and movement model

By referring to (2), we know that a terrestrial user cannot receive content files from a UAV at every time slot. Therefore, we investigate the time average communication behaviors of terrestrial users and UAVs in this paper.
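The placement constraint (1) and the delivery constraint (2) can be checked mechanically. The sketch below is illustrative only: the matrix layouts (`b[j][i]` for placement, `s[i][j]` for delivery) and the function names are assumptions made for this example. It also shows why, with fewer UAVs than users, some users are necessarily unserved in any given slot.

```python
from typing import List

Matrix = List[List[int]]


def placement_ok(b: Matrix) -> bool:
    """Constraint (1): each UAV j caches at most one file (b[j][i] in {0, 1})."""
    return all(sum(row) <= 1 for row in b)


def delivery_ok(s: Matrix) -> bool:
    """Constraint (2): each user is served by at most one UAV and
    each UAV serves at most one user (s[i][j] in {0, 1})."""
    n_users, n_uavs = len(s), len(s[0])
    per_user = all(sum(s[i]) <= 1 for i in range(n_users))
    per_uav = all(sum(s[i][j] for i in range(n_users)) <= 1 for j in range(n_uavs))
    return per_user and per_uav


def served_users(s: Matrix) -> int:
    """Number of users receiving a file in this slot (at most the number of UAVs)."""
    return sum(sum(row) for row in s)


# Hypothetical slot with N = 3 users and J = 2 UAVs.
s_slot = [[1, 0], [0, 1], [0, 0]]   # user 2 is not served in this slot
b_slot = [[1, 0, 0], [0, 0, 1]]     # UAV 0 caches f_0, UAV 1 caches f_2
print(placement_ok(b_slot), delivery_ok(s_slot), served_users(s_slot))
```

Because at most $J$ users can be served per slot, per-user requirements are naturally stated on time averages rather than slot by slot.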

1) UAV power consumption model:

Given UAV $j$, define its time average transmit power during the first $t$ time slots as $\bar{p}_j(t) = \frac{1}{t} \sum_{\tau=1}^{t} p_j(\tau)$, with $p_j(\tau)$ being the instantaneous transmit power of UAV $j$ at time slot $\tau$. In addition to the transmit power, UAVs are subject to inherent circuit power consumption, mainly including the power consumption of mixers, frequency synthesizers, and digital-to-analog converters. Denoting by $p^{c}_j$ the circuit power of UAV $j$ during a time slot, we model the power consumption of UAV $j$ at time slot $t$ as

$$p^{tot}_j(t) = p_j(t) + p^{c}_j, \quad \forall j, t, \tag{3}$$

which is upper-bounded by a constant $\hat{p}_j$, i.e., $p^{tot}_j(t) \le \hat{p}_j$. Accordingly, the time average power consumption of UAV $j$ during the first $t$ time slots can be written as

$$\bar{p}^{tot}_j(t) = \bar{p}_j(t) + p^{c}_j, \quad \forall j, t, \tag{4}$$

which is constrained by $\bar{p}^{tot}_j(t) \le \tilde{p}_j$, where $\tilde{p}_j$ is a constant.

Footnote: As the case of obtaining content files via the BS cache-user link was well studied in [3], [13], [14], [26], we did not investigate this type of link here.

Footnote: Note that different users may request the same content file. For files with diverse sizes, the analysis can be extended by dividing each file into chunks of equal size.
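As a small numerical illustration of (3) and (4), the sketch below evaluates the instantaneous and time average power budgets of one UAV over a finite window. The function names and sample values are hypothetical, and checking the average over a finite horizon is a simplification of the long-term constraint used later in the problem formulation.

```python
from typing import List, Tuple


def total_power(p_tx: float, p_circuit: float) -> float:
    """Eq. (3): per-slot power consumption = transmit power + circuit power."""
    return p_tx + p_circuit


def avg_total_power(p_tx_history: List[float], p_circuit: float) -> float:
    """Eq. (4): time average transmit power plus the constant circuit power."""
    return sum(p_tx_history) / len(p_tx_history) + p_circuit


def power_feasible(p_tx_history: List[float], p_circuit: float,
                   p_hat: float, p_tilde: float) -> Tuple[bool, bool]:
    """Return (instantaneous bound holds in every slot, average bound holds)."""
    inst_ok = all(total_power(p, p_circuit) <= p_hat for p in p_tx_history)
    avg_ok = avg_total_power(p_tx_history, p_circuit) <= p_tilde
    return inst_ok, avg_ok


# Hypothetical values in watts: three slots of transmit power for one UAV.
history = [0.5, 1.0, 1.5]
print(power_feasible(history, p_circuit=0.5, p_hat=2.0, p_tilde=1.4))
```

In this example every slot respects the peak bound while the average bound is violated, showing that the two constraints restrict different aspects of the power schedule.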

2) UAV movement model:

At each time slot, the movement of all UAVs is controlled to execute the communication task efficiently. Denote the horizontal location of UAV $j \in \mathcal{J}$ at time slot $t$ as $\mathbf{x}_j(t) = [x_j(t), y_j(t)]^{\mathrm{T}}$. Like [3], [5], we consider a scenario in which all UAVs fly horizontally at a constant altitude $g$ to achieve a lower level of energy consumption. During the flight, the distance between two consecutive waypoints on a UAV trajectory is constrained by the UAV’s maximum speed. As such, the waypoint distance constraint is given by

$$\| \mathbf{x}_j(t) - \mathbf{x}_j(t-1) \| \le e_{\max}, \quad \forall j, t, \tag{5}$$

where $e_{\max}$ is the UAV’s maximum flight distance during a slot, and $\mathbf{x}_j(0)$ represents the initial location of UAV $j$.

Additionally, for collision avoidance, the distance between any two UAVs at each slot should not be less than a safety distance. Mathematically, this can be written as

$$\| \mathbf{x}_j(t) - \mathbf{x}_k(t) \| \ge d_{\min}, \quad \forall j, k \ne j, t, \tag{6}$$

where $d_{\min}$ is the minimum safety distance.

E. Air-to-ground communications

For air-to-ground (AtG) communications, each terrestrial user may have a line-of-sight (LoS) view towards a UAV with a certain probability. A widely adopted expression of the LoS probability is [27]

$$\Pr(r_{ij}(t)) = \left[1 + a \exp\left(-b\left(\phi_{ij}(t) - a\right)\right)\right]^{-1}, \quad \forall i, j, t, \tag{7}$$

where $a$ and $b$ are constants depending on the type of environment, such as rural and dense urban, $\phi_{ij}(t) = \frac{180}{\pi} \times \arctan\left(\frac{g}{r_{ij}(t)}\right)$ is the elevation angle of user $i$ towards UAV $j$, $r_{ij}(t)$ denotes the horizontal distance between user $i \in \mathcal{I}$ and UAV $j \in \mathcal{J}$, i.e., $r_{ij}(t) = \| \mathbf{x}_j(t) - \mathbf{x}^{u}_i(t) \|$, and $\mathbf{x}^{u}_i(t) = [x^{u}_i(t), y^{u}_i(t)]^{\mathrm{T}}$ represents the location of user $i$ at time slot $t$, which can be known via a global positioning system (GPS).

Consider the setting in which the user altitude and the antenna heights of both the user and the UAV are neglected. With $h_{ij}(t)$ denoting the channel power gain between user $i$ and UAV $j$, the path loss in dB can be given by [27]

$$-10 \log_{10}(h_{ij}(t)) = 20 \log_{10}\left(\frac{4\pi}{\varsigma}\right) + 20 \log_{10}\left(\sqrt{g^2 + r_{ij}^2(t)}\right) + \Pr(r_{ij}(t))\, \eta_{LoS} + \left(1 - \Pr(r_{ij}(t))\right) \eta_{NLoS}, \quad \forall i, j, t, \tag{8}$$

where $\varsigma = c / f_c$ is the carrier wavelength, $c$ (in m/s) is the speed of light, $f_c$ (in Hz) is the carrier frequency, and $\eta_{LoS}$ (in dB) and $\eta_{NLoS}$ (in dB) are the losses corresponding to LoS and non-line-of-sight (NLoS) connections.

In the considered UAV network, UAVs hover during a time slot to deliver content files to terrestrial users. In this case, it is important for UAVs to establish LoS links towards terrestrial users, because UAVs are energy-constrained and consume much less power for content provision via LoS links than via NLoS links [3], [12]. Therefore, we focus on LoS AtG communications and next discuss the condition for establishing LoS connections between UAVs and users.

According to the statistical analysis results in [27], under the worst-case AtG propagation environment (i.e., dense urban), the probability of a LoS AtG propagation link can be over 90% when the elevation angle between a UAV and a user is not less than a threshold $\theta_{th}$. Thus, we have the following condition for approximately establishing LoS AtG connections

$$\| \mathbf{x}_j(t) - \mathbf{x}^{u}_i(t) \| \le \frac{g}{\tan \theta_{th}}, \quad \forall i, j, t. \tag{9}$$

Under the approximated LoS AtG connection condition, we can approximate (8) as

$$h_{ij}(t) \approx \frac{G_{LoS}\, \varsigma^2}{16 \pi^2 \left(D_{ij}(t)\right)^2}, \quad \forall i, j, t, \tag{10}$$

where $G_{LoS} = 10^{-\eta_{LoS}/10}$, and $D_{ij}(t) = \sqrt{g^2 + r_{ij}^2(t)}$ is the distance between UAV $j$ and user $i$ at time slot $t$.

F. Quality-of-experience model

In this subsection, the concept of QoE is introduced to understand and improve the end user’s subjective perception of the quality of a network service as a whole [28]. We leverage the widely used mean opinion score (MOS) [28] to model the QoE of a user.

1) Mean opinion score:

For any user $i$ at time slot $t$, its MOS can take the following form [28]

$$\bar{D}_{i,f_i}(t) = \frac{\hat{D} - D_{i,f_i}(t)}{\hat{D} - L / (W u^{\max}_{dl})}, \quad \forall i, t, \tag{11}$$

where $\hat{D}$ is configured based on the desired system requirement [26], [28], $D_{i,f_i}(t)$ represents the edge transmission latency of user $i$, which is defined as the time required to transmit a content file from the BS or a UAV to user $i$ at time slot $t$, $u^{\max}_{dl} = \log_2\left(1 + \frac{p_{\max} G_{LoS}\, \varsigma^2}{16 \pi^2 g^2 \sigma^2 W}\right)$ (in bps/Hz), the constant $p_{\max} = \hat{p}_j - p^{c}_j$, $\forall j$, is the maximum instantaneous transmit power of a UAV during each time slot, and $\sigma^2 W$ is the noise power with $W$ (in MHz) being the total bandwidth. User $i$ will have a “very good” QoE state if $\bar{D}_{i,f_i}(t) \ge D_{th}$ [26], [28]. Here, $D_{th}$ is the MOS threshold corresponding to the maximum edge transmission latency at which user $i$ still enjoys the desired “very good” QoE. Equation (11) shows the interplay between the MOS and the edge transmission latency of user $i$ at time slot $t$. We next model the edge transmission latency of user $i$, $\forall i \in \mathcal{I}$.

2) Edge transmission latency model:

Recall that content files can be transmitted to terrestrial users via a BS cache-UAV-user link or a UAV cache-user link. Thus, during time slot $t$, the edge transmission latency for user $i$ to receive a content file $f_i$ can be given by

$$D_{i,f_i}(t) = \begin{cases} L / (W u^{dl}_i(t)), & b_{j,f_i}(t) = 1 \\ D_{ul} + L / (W u^{dl}_i(t)), & b_{j,f_i}(t) = 0 \end{cases} \tag{12}$$

where $L$ (in Mbits) denotes the size of the transmitted content file, $u^{dl}_i(t)$ (in bps/Hz) is the achievable data rate of user $i$, and the constant $D_{ul}$ represents the transmission latency of delivering the content file from the BS to a UAV.

From (11), we know that improving users’ QoE means reducing the edge transmission latency of delivering content files. There is an inherent interplay between the edge transmission latency and the data rate. We next model the data rate.

Footnote: The constant transmission latency can be achieved by exploring a BS power control strategy, which has been investigated in our previous paper [29].

Fig. 2. An example of the PAoI evolution model of packets towards user $i$ [30]. (Horizontal axis: time $q$.)

Fig. 3. Management strategy of the BS for newly arrived packets. (Figure labels: preprocessing queue, server, transmission buffers.)
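The interplay between the edge transmission latency in (12) and the MOS in (11) can be illustrated numerically. The following sketch is hedged: the function names and all numerical values are hypothetical, `noise_power` stands for $\sigma^2 W$ in watts, and the bandwidth-normalized latency term $L/(W u^{\max}_{dl})$ is assumed in the MOS denominator so that it matches the latency expression in (12).

```python
import math


def u_max_dl(p_max: float, g_los: float, wavelength: float,
             altitude: float, noise_power: float) -> float:
    """Best-case rate (bps/Hz): user directly below the UAV, no interference."""
    gain = g_los * wavelength ** 2 / (16 * math.pi ** 2 * altitude ** 2)
    return math.log2(1 + p_max * gain / noise_power)


def edge_latency(file_mbits: float, bandwidth_mhz: float, rate_bps_hz: float,
                 cached: bool, d_ul: float) -> float:
    """Eq. (12): a cache hit avoids the BS-to-UAV latency d_ul (seconds)."""
    downlink = file_mbits / (bandwidth_mhz * rate_bps_hz)
    return downlink if cached else d_ul + downlink


def mos(latency: float, d_hat: float, file_mbits: float,
        bandwidth_mhz: float, u_max: float) -> float:
    """Eq. (11): normalized score; equals 1 at the best-case latency."""
    return (d_hat - latency) / (d_hat - file_mbits / (bandwidth_mhz * u_max))


# Hypothetical numbers: 10 Mbit file, 10 MHz bandwidth, 2 bps/Hz achieved rate.
hit = edge_latency(10.0, 10.0, 2.0, cached=True, d_ul=0.2)
miss = edge_latency(10.0, 10.0, 2.0, cached=False, d_ul=0.2)
print(hit, miss)
print(mos(hit, 1.0, 10.0, 10.0, 5.0), mos(miss, 1.0, 10.0, 10.0, 5.0))
```

A cache miss adds $D_{ul}$ to the latency and therefore lowers the MOS, which is why the rate needed to reach a given QoE state depends on the placement decision.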

3) AtG data rate model:

By referring to (11) and (12), for any UAV $j$ and user $i$, the required data rate (denoted by $C^{th}_{i,f_i}(t)$, in bps/Hz) for user $i$ to achieve the desired QoE state when receiving the content file $f_i$ can be written as

$$C^{th}_{i,f_i}(t) = \frac{L \sum_{j \in \mathcal{J}_i(t)} b_{j,f_i}(t)}{W \left( \hat{D} - D_{th} \left( \hat{D} - \frac{L}{W u^{\max}_{dl}} \right) \right)} + \frac{L \left( 1 - \sum_{j \in \mathcal{J}_i(t)} b_{j,f_i}(t) \right)}{W \left( \hat{D} - D_{th} \left( \hat{D} - \frac{L}{W u^{\max}_{dl}} \right) - D_{ul} \right)}, \quad \forall i, t, \tag{13}$$

where $\mathcal{J}_i(t)$ denotes the set of UAVs whose horizontal distances towards user $i$ are smaller than $e_{\max}$ at time slot $t$.

Next, we use the Shannon capacity to quantify the receiving data rate (in bps/Hz) of user $i$ from UAVs at time slot $t$, i.e.,

$$u^{dl}_i(t) = \sum_{j} s_{ij}(t) \log_2\left( 1 + \frac{p_j(t) h_{ij}(t)}{\sigma^2 W + I_{ij}(t)} \right), \quad \forall i, t, \tag{14}$$

where $I_{ij}(t) = \sum_{k \in \mathcal{J} \setminus \{j\}} p_k(t) h_{ik}(t)$ denotes the interference experienced at user $i$ when all UAVs share the spectrum. Then, the condition $u^{dl}_i(t) \ge C^{th}_{i,f_i}(t)$ should be satisfied for user $i$’s achievable data rate to enable its desired QoE state.

G. PAoI evolution model of packets

Different from AoI, PAoI provides information on the maximum value of AoI and can capture the extent to which update information is stale [30]. Therefore, as in [30], we adopt the PAoI as the metric to estimate the freshness of information. The above models are described from the perspective of content files, each of which includes a batch of packets. However, as most PAoI-related work [30], [31] discusses the PAoI evolution from the viewpoint of data packets, we next model the PAoI evolution of packets.

For any user $i \in \mathcal{I}$, define the AoI of a packet $m$ destined to user $i$ as $\Gamma_{i,m}(q) = q - q_{i,m-1}$, where $q_{i,m-1}$ is the generation time of the most recently received packet $m-1$ from the data server until time $q$ [17]. When user $i$ does not receive packet $m$, the value of $\Gamma_{i,m}(q)$ increases linearly with $q$, which reflects the fact that the information held by user $i$ is getting older. In other words, the $m$-th peak value of $\Gamma_{i,m}(q)$ is obtained right before the $m$-th newly generated packet arrives at user $i$. The $m$-th peak value of $\Gamma_{i,m}(q)$ is defined as the PAoI [30], denoted by $\Delta_{i,m}(\hat{q}; q)$, where $\hat{q}$ denotes the time at which a packet generated at time $q$ arrives at its destination user. Fig. 2 shows an example of the PAoI evolution model $\Delta_{i,m}(\hat{q}; q)$ for user $i$ and packet $m$. Formally, for any user $i$ and packet $m$, the PAoI of packet $m$ destined to user $i$ evolves as follows [30]

$$\Delta_{i,m}(\hat{q}; q) = \begin{cases} \hat{q}_{i,m}, & m = 1 \\ X_{i,m} + Y_{i,m}, & m > 1 \end{cases} \tag{15}$$

where $\hat{q}_{i,m}$ is the time at which packet $m$ reaches user $i$, $X_{i,m} = q_{i,m} - q_{i,m-1}$, $Y_{i,m} = \hat{q}_{i,m} - q_{i,m}$, and $q_{i,m}$ is the generation time of packet $m$ towards user $i$, as shown in Fig. 2.

Footnote: The timescale of the PAoI model differs from that of the above models, because the inter-arrival time of packets is different from the time needed to transmit content files to users. For clarity, we use the term ‘time’ and the corresponding notation $q$ when analyzing the PAoI of packets.

The inter-arrival time $X_{i,m}$ is related to the arrival rate of packets sent to user $i$. The value of $Y_{i,m}$ is determined by many factors, such as the time cost of preprocessing data packets in the BS and the edge arrival duration. Specifically, the BS maintains a queue with infinite buffer space to manage arrived data packets, as shown in Fig. 3. Upon arriving at the BS, data packets enter the BS queue and wait to be preprocessed (e.g., packet classification, packet header update) according to a first-come-first-served (FCFS) principle. We call this queue the preprocessing queue. After being preprocessed, packets destined to different users are cached at separate transmission buffers maintained by the BS, or forwarded to and cached in UAVs. Therefore, we can rewrite $Y_{i,m}$ as $Y_{i,m} = Y^{Q}_{i,m} + Y^{S}_{i,m} + Y^{A}_{i,m}$, where $Y^{Q}_{i,m}$ denotes the queueing delay of packet $m$, $Y^{S}_{i,m}$ is the preprocessing time of packet $m$ in the content file $f_i$, and $Y^{A}_{i,m} \le D_{i,f_i}(t)$ is the edge arrival duration of packet $m$. It is noteworthy that the values of $X_{i,m}$, $Y^{Q}_{i,m}$, and $Y^{S}_{i,m}$ are determined by network parameters such as backhaul capacity and CPU computing speed and cannot be reduced by optimizing the deployment and resource allocation (e.g., UAV transmit power) of the UAV network [30]; yet, $Y^{A}_{i,m}$ can be optimized. In the following subsection, we formulate an optimization problem of reducing the value of $Y^{A}_{i,m}$ while providing fair and energy-efficient content file delivery for all terrestrial users via the joint design of UAV caching, UAV trajectory, and UAV transmit power.

H. Problem formulation

Improving users' achievable data rates reduces the edge arrival duration and hence yields fresher content file transmission. Thus, a goal of the problem is to maximize users' achievable data rates. During the first $t$ time slots, the time average achievable data rate of user $i$, $\forall i$, is written as $\bar{u}^{\mathrm{dl}}_i(t) = \frac{1}{t}\sum_{\tau=1}^{t} u^{\mathrm{dl}}_i(\tau)$. Define $\phi(\{\bar{u}^{\mathrm{dl}}_i(t)\}) = \sum_i \log_2(1 + \bar{u}^{\mathrm{dl}}_i(t))$ as a proportional fairness function of time average achievable data rates across all terrestrial users. Then, maximizing $\phi(\{\bar{u}^{\mathrm{dl}}_i(t)\})$ will result in fresh and fair content provision for all users. Besides, to implement energy-efficient content file delivery, the power consumption of UAVs should be minimized. To achieve the above goals, the joint optimization of UAV caching, UAV trajectory, and UAV transmit power should be investigated. Mathematically, we can formulate the joint optimization problem as below

$$\underset{\mathcal{B}(t), \mathcal{S}(t), \mathcal{P}(t), \mathcal{X}(t)}{\text{Maximize}} \ \liminf_{t\to\infty} \Big( \phi(\{\bar{u}^{\mathrm{dl}}_i(t)\}) - \rho \sum_j \bar{p}^{\mathrm{tot}}_j(t) \Big) \tag{16a}$$
$$\text{s.t.} \ \liminf_{t\to\infty} \big[\bar{u}^{\mathrm{dl}}_i(t) - \bar{C}^{\mathrm{th}}_{i,f_i}(t)\big] \ge 0, \ \forall i \tag{16b}$$
$$\limsup_{t\to\infty} \bar{p}^{\mathrm{tot}}_j(t) \le \tilde{p}_j, \ \forall j \tag{16c}$$
$$p^{\mathrm{tot}}_j(t) \le \hat{p}_j, \ \forall j, t \tag{16d}$$
$$s_{ij}(t) \in \{0, 1\}, \ \forall i, j, t \tag{16e}$$
$$b_{j,f_i}(t) \in \{0, 1\}, \ \forall j, t \tag{16f}$$
$$p_j(t) \ge p^{\min}_j, \ \forall j, t \tag{16g}$$
$$(1), (2), (5), (6), (9). \tag{16h}$$

where $\mathcal{B}(t)$, $\mathcal{S}(t)$, $\mathcal{P}(t)$, and $\mathcal{X}(t)$ represent the sets of content placement decision variables, content delivery decision variables, UAV transmit power, and UAV locations at time slot $t$, respectively, $\rho$ is a non-negative coefficient that weighs the trade-off between fresh and fair content delivery and power consumption, $p^{\min}_j$ is a small constant, and $\bar{C}^{\mathrm{th}}_{i,f_i}(t) = \frac{1}{t}\sum_{\tau=1}^{t} \varphi C^{\mathrm{th}}_{i,f_i}(\tau)$ with $\varphi = J/N$.

Footnote: Like [30], [31], we discuss the PAoI evolution of packets in the discrete time domain.
The constant $\varphi$ is introduced because the condition $u^{\mathrm{dl}}_i(t) \ge C^{\mathrm{th}}_{i,f_i}(t)$ should be satisfied if user $i$'s "very good" QoE state is to be achieved at time slot $t$. However, user $i$ cannot receive a content file from a UAV at each time slot. Besides, each user has a probability of $J/N$ of receiving the content file from a UAV due to the goal of achieving fair content delivery.

Solving (16) is quite challenging, mainly for three reasons: i) time-coupled objective function: the objective function is a logarithmic function of time average achievable data rates. Evaluating it requires all users' achievable data rates over the first $t$ time slots, which implies the optimization of a great number of decision variables. Besides, the number of decision variables in the problem will exponentially increase with an increasing $t$, which seriously hinders the solution to the problem; ii) sequential decision problem: the UAV cache placement scheme, UAV cache delivery strategy, UAV transmit power, and UAV trajectories must be optimized during the first $t$ time slots; iii) thorny optimization problem: (16) includes a logarithmic-quadratic objective function, non-convex constraints (explained in detail in Section III), and continuous and integer variables. Therefore, (16) is a mixed-integer non-convex optimization problem that may be NP-hard or even undecidable.

To solve this highly challenging problem, we propose a Lyapunov-based optimization framework. In this framework, we first decouple the objective function of (16) in terms of time slots. Next, we leverage a Lyapunov drift-plus-penalty technique [32] to further decompose the problem with a time-decoupled objective function into multiple repeatedly optimized subproblems. Finally, an iterative optimization scheme is designed to tackle the mixed-integer non-convex characteristic of the subproblems.

III. LYAPUNOV-BASED OPTIMIZATION FRAMEWORK

Observing that the objective function is a logarithmic function of time average achievable data rates, we refer to it as a time-coupled objective function. We first leverage the Jensen inequality to decouple the time-coupled objective function. A sequential decision problem with a time-decoupled objective function can then be obtained, which is still difficult to solve effectively. Reinforcement learning (RL) approaches, such as Q-learning [33] and deep deterministic policy gradient (DDPG) [34], can be explored to solve sequential decision problems. However, Q-learning-based approaches can only handle discrete and low-dimensional action spaces, and DDPG-based methods are designed for continuous (real-valued) action spaces [34]. Problem (16) simultaneously involves discrete and continuous action spaces, which indicates that it will be highly difficult to design RL approaches to solve (16). Moreover, RL approaches lack a complete theoretical basis.

To solve the sequential decision problem effectively, we propose a Lyapunov-based optimization framework, which decomposes the problem into multiple repeatedly optimized subproblems rather than solving it as a whole. The procedure of the framework is as follows.

A. Decoupling of the objective function

Let $\gamma(t) = (\gamma_1(t), \ldots, \gamma_N(t))$ be an auxiliary vector with $0 \le \gamma_i(t) \le u^{\max}_{\mathrm{dl}}$, $\forall i, t$. Define $g(t) = \phi(\gamma(t)) = \sum_i \log_2(1 + \gamma_i(t))$. Then, according to Jensen's inequality, we have $\bar{g}(t) \le \phi(\bar{\gamma}(t))$. With this important inequality, the following Proposition shows that we can equivalently transform the original problem into a new one with a time-decoupled objective function.

Proposition 1.

The original problem (16) can be equivalently transformed into the following sequential decision problem.

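$$\underset{\mathcal{B}(t), \mathcal{S}(t), \mathcal{P}(t), \mathcal{X}(t), \gamma(t)}{\text{Maximize}} \ \liminf_{t\to\infty} \Big( \bar{g}(t) - \rho \sum_j \bar{p}^{\mathrm{tot}}_j(t) \Big) \tag{17a}$$
$$\text{s.t.} \ \liminf_{t\to\infty} \big[\bar{u}^{\mathrm{dl}}_i(t) - \bar{\gamma}_i(t)\big] = 0, \ \forall i \tag{17b}$$
$$\liminf_{t\to\infty} \big[\bar{u}^{\mathrm{dl}}_i(t) - \bar{C}^{\mathrm{th}}_{i,f_i}(t)\big] \ge 0, \ \forall i \tag{17c}$$
$$\liminf_{t\to\infty} \big[\tilde{p}_j - \bar{p}^{\mathrm{tot}}_j(t)\big] \ge 0, \ \forall j \tag{17d}$$
$$0 \le \gamma_i(t) \le u^{\max}_i, \ \forall i, t \tag{17e}$$
$$(16\mathrm{d})\text{-}(16\mathrm{h}). \tag{17f}$$

Proof.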

Please refer to Appendix A.
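For intuition, the inequality behind Proposition 1 can be written out term by term. This is only a sketch of the standard Jensen argument for the concave function $\phi$, not a substitute for the proof in Appendix A; it assumes that the time averages $\bar{g}(t)$ and $\bar{\gamma}_i(t)$ are defined in the same way as $\bar{u}^{\mathrm{dl}}_i(t)$.

```latex
\bar{g}(t)
  = \frac{1}{t}\sum_{\tau=1}^{t}\phi\bigl(\gamma(\tau)\bigr)
  = \sum_{i}\frac{1}{t}\sum_{\tau=1}^{t}\log_2\bigl(1+\gamma_i(\tau)\bigr)
  \le \sum_{i}\log_2\Bigl(1+\frac{1}{t}\sum_{\tau=1}^{t}\gamma_i(\tau)\Bigr)
  = \phi\bigl(\bar{\gamma}(t)\bigr).
```

The inequality holds because $\log_2(1+x)$ is concave in $x$, so the average of the function values cannot exceed the function evaluated at the average.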

B. Lyapunov drift-plus-penalty

In this subsection, we leverage a Lyapunov drift-plus-penalty technique [32] to tackle the time average constraints in (17). Specifically, to enforce the constraint (17c), we introduce a family of virtual queues $\{Q_i(t)\}$ as follows

$$Q_i(t) = Q_i(t-1) + \varphi C^{\mathrm{th}}_{i,f_i}(t) - u^{\mathrm{dl}}_i(t-1), \quad \forall i, t. \tag{18}$$

It can be concluded that the constraint (17c) is satisfied if the following mean-rate stability condition holds [32]

$$\lim_{t\to\infty} \mathbb{E}\{[Q_i(t)]^+\}/t = 0, \quad \forall i, \tag{19}$$

where the non-negative operation $[x]^+ = \max\{x, 0\}$.

Likewise, to enforce the time average constraints (17b) and (17d), we define the virtual queues $Z_i(t)$ and $H_j(t)$, respectively, as

$$Z_i(t) = Z_i(t-1) + \gamma_i(t-1) - u^{\mathrm{dl}}_i(t-1), \quad \forall i, t, \tag{20}$$
$$H_j(t) = H_j(t-1) + p^{\mathrm{tot}}_j(t-1) - \tilde{p}_j, \quad \forall j, t. \tag{21}$$

Constraints (17b) and (17d) are satisfied if the following mean-rate stability conditions hold

$$\lim_{t\to\infty} \mathbb{E}\{[Z_i(t)]^+\}/t = 0, \quad \forall i, \tag{22}$$
$$\lim_{t\to\infty} \mathbb{E}\{[H_j(t)]^+\}/t = 0, \quad \forall j. \tag{23}$$

With the definitions of the virtual queues $[Q_i(t)]^+$, $[Z_i(t)]^+$, and $[H_j(t)]^+$, we can define a Lyapunov function $L(t)$ as a sum of squares of these virtual queues at time slot $t$, i.e., $L(t) \triangleq \frac{1}{2}\sum_i ([Q_i(t)]^+)^2 + \frac{1}{2}\sum_i ([Z_i(t)]^+)^2 + \frac{1}{2}\sum_j ([H_j(t)]^+)^2$. $L(t)$ is a scalar measure of constraint violations. Intuitively, if the value of $L(t)$ is small, the absolute values of all queues are small; otherwise, the absolute value of at least one queue is large. Additionally, we define a drift-plus-penalty function as $\Delta(t) - V\big(g(t) - \rho \sum_j p^{\mathrm{tot}}_j(t)\big)$, where $\Delta(t) = L(t+1) - L(t)$ represents a Lyapunov drift, $-\big(g(t) - \rho \sum_j p^{\mathrm{tot}}_j(t)\big)$ is a penalty, and $V$ is a non-negative penalty coefficient that weighs the trade-off between constraint violations and optimality. Lemma 1 presents the upper bound of the function value.

Lemma 1.

At each time slot $t$, the upper bound of the value of the drift-plus-penalty function $\Delta(t) - V\big(g(t) - \rho \sum_j p^{\mathrm{tot}}_j(t)\big)$ can be expressed as (24) with $B \triangleq \sum_i (u^{\max}_{\mathrm{dl}})^2 + \sum_j (\hat{p}_j)^2/2$:

$$\Delta(t) - V\Big(g(t) - \rho \sum_j p^{\mathrm{tot}}_j(t)\Big) \le B - \sum_j [H_j(t)]^+ \big(\tilde{p}_j - p^{c}_j\big) + V\rho \sum_j p^{c}_j + \sum_i [Q_i(t)]^+ \varphi C^{\mathrm{th}}_{i,f_i}(t) - V\phi(\gamma(t)) + \sum_i [Z_i(t)]^+ \gamma_i(t) + \sum_j \{V\rho + [H_j(t)]^+\} p_j(t) - \sum_i \{[Q_i(t)]^+ + [Z_i(t)]^+\} u^{\mathrm{dl}}_i(t). \tag{24}$$

Proof.

Please refer to Appendix B.

In (24), the right-hand-side expression constitutes the upper bound of the drift-plus-penalty. As such, the minimization of the drift-plus-penalty can be approximated by minimizing its upper bound. We therefore approximately solve (16) by greedily minimizing the upper bound of the drift-plus-penalty function at each $t$. Meanwhile, at each $t$, the upper bound can be decomposed into four independent terms: a constant term, an auxiliary variable term, a term related to content caching, and a term consisting of content delivery strategies, UAV transmit power, and UAV trajectories. As a result, the Lyapunov-based optimization framework for (16) can be summarized as the following repeated optimization subproblems with a three-tier structure.

• At each time slot $t$, observe $Q_i(t)$, $Z_i(t)$, $H_j(t)$ for any user $i \in \mathcal{I}$ and UAV $j \in \mathcal{J}$.

• AUxiliary-Tier (AUT) optimization: Choose $\gamma_i(t)$ for each user $i$ to solve (25)

$$\underset{\gamma(t)}{\text{Minimize}} \ -V\phi(\gamma(t)) + \sum_i [Z_i(t)]^+ \gamma_i(t) \tag{25a}$$
$$\text{s.t.} \ 0 \le \gamma_i(t) \le u^{\max}_{\mathrm{dl}} \tag{25b}$$

• Content-Placement-Tier (CPT) optimization: Determine the content placement decision variable $b_{j,f_i}(t)$ for each UAV $j$ to solve (26)

$$\underset{\mathcal{B}(t)}{\text{Minimize}} \ \sum_i [Q_i(t)]^+ C^{\mathrm{th}}_{i,f_i}(t) \tag{26a}$$
$$\text{s.t.} \ \text{constraint (1)}. \tag{26b}$$

• Delivery-Power-and-Trajectory-Tier (DPT) optimization: Given UAV trajectories $\mathcal{X}(t-1)$, choose $\mathcal{S}(t)$, $\mathcal{P}(t)$, and $\mathcal{X}(t)$ to solve (27)

$$\underset{\mathcal{S}(t), \mathcal{P}(t), \mathcal{X}(t)}{\text{Minimize}} \ \sum_j \{V\rho + [H_j(t)]^+\} p_j(t) - \sum_i \{[Q_i(t)]^+ + [Z_i(t)]^+\} u^{\mathrm{dl}}_i(t) \tag{27a}$$
$$\text{s.t.} \ (2), (5), (6), (9), (16\mathrm{d})\text{-}(16\mathrm{g}). \tag{27b}$$

• Compute $u^{\mathrm{dl}}_i(t)$ using (14). Update the virtual queues using (18), (20), and (21).

As shown in the above framework, solving (16) reduces to solving a set of subproblems. In the remainder of this section, we present the detailed procedure for solving them.

C. AUT optimization

As the proportional fairness function $\phi(\gamma(t))$ is a separable sum of individual logarithmic functions, solving (25) is equivalent to a separate selection of the individual auxiliary variable $\gamma_i(t) \in [0, u^{\max}_{\mathrm{dl}}]$ for each user $i \in \mathcal{I}$ that minimizes the convex function $-V\log_2(1 + \gamma_i(t)) + [Z_i(t)]^+ \gamma_i(t)$ with respect to (w.r.t.) $\gamma_i(t)$. Thus, the closed-form solution to (25) can be written as

$$\gamma_i(t) = \begin{cases} u^{\max}_{\mathrm{dl}}, & [Z_i(t)]^+ = 0 \\ \min\left\{ \left[ \dfrac{V}{[Z_i(t)]^+ \ln 2} - 1 \right]^+, \ u^{\max}_{\mathrm{dl}} \right\}, & \text{else} \end{cases} \tag{28}$$

D. CPT optimization

The goal of (26) is to reduce the data rate requirements of all terrestrial users. To this aim, content files should be cached in UAVs. As a result, the total power of all UAVs can be reduced when delivering content files to terrestrial users. Minimizing the total power of all UAVs is equivalent to maximizing the reduction of the transmit power of each UAV brought by content caching [26]. Thus, we can design the following content placement scheme to solve (26)

$$b_{j,f_i}(t) = \begin{cases} 1, & i = i^\star \\ 0, & \text{otherwise} \end{cases} \tag{29}$$

where

$$i^\star = \arg\max_{i \in \mathcal{N}_j} \ [Q_i(t)]^+ \big(p_j(\beta) - p_j(\alpha)\big), \tag{30}$$

$\mathcal{N}_j$ is the set of terrestrial users whose horizontal distance to UAV $j$ at time slot $t$ is smaller than $e_{\max}$, and $p_j(\beta) - p_j(\alpha)$ represents the transmit power reduction of UAV $j$ due to the content caching. Besides, UAV $j$ will cache the content for its nearest user if $\mathcal{N}_j = \emptyset$. Here, $\beta = \frac{L}{\hat{D} - D_{\mathrm{th}}(\hat{D} - L/u^{\max}_{\mathrm{dl}}) - D_{\mathrm{ul}}}$, $\alpha = \frac{L}{\hat{D} - D_{\mathrm{th}}(\hat{D} - L/u^{\max}_{\mathrm{dl}})}$, and $p_j(\varpi) = \frac{(2^{\varpi/W} - 1)(\sigma^2 W + I_{ij}(t))}{h_{ij}(t)}$.

Remark 2: From (29) and (30), we can see that the content placement decision is made based on the states (e.g., UAV locations) of the UAV network at the current time slot. Besides, the content placement depends on prior knowledge of the content request of each terrestrial user and users' locations, which corresponds to the result given in [26].

E. DPT optimization

It can be observed that (27) includes logarithmic-quadratic terms and both continuous and integer variables. Besides, the constraint (16f) is non-convex; thus, (27) is a mixed-integer non-convex programming problem that is difficult to address directly.

To address this challenge, we first propose to tackle the mixed-integer issue of (27) by leveraging an iterative optimization scheme. Particularly, the solution to (27) includes the iterative optimization of content delivery decision variables, UAV trajectory, and transmit power. Second, we explore an SCA technique [35] to approximately convert the non-convex optimization problems generated during the iterative optimization into convex ones.

1) Content delivery decision variable optimization:

For given UAV trajectories $\mathcal{X}(t)$ and transmit power $\mathcal{P}(t)$, the content delivery strategy of (27) can be developed by solving the following problem

$$\underset{\mathcal{S}(t)}{\text{Maximize}} \ \sum_i \sum_j c_{ij}(t) s_{ij}(t) \tag{31a}$$
$$\text{s.t.} \ (2), (16\mathrm{e}). \tag{31b}$$

where $c_{ij}(t) = \{[Q_i(t)]^+ + [Z_i(t)]^+\} \log_2\left(1 + \frac{p_j(t) h_{ij}(t)}{\sigma^2 W + I_{ij}(t)}\right)$. Problem (31) is an integer linear programming problem and can be efficiently solved by optimization tools such as MOSEK.
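To illustrate the structure of (31), the sketch below solves a tiny instance by exhaustive search. Constraint (2) is not reproduced in this section, so the code assumes, purely for illustration, that each UAV serves exactly one user per time slot and that each user is served by at most one UAV; the function name `best_delivery` and the weight matrix `c` are hypothetical. Because the weights $c_{ij}(t)$ are non-negative, serving a user never lowers the objective. A real implementation would hand the ILP to a solver rather than enumerate.

```python
from itertools import permutations


def best_delivery(c):
    """Exhaustively maximize sum_ij c[i][j] * s[i][j].

    c[i][j] is the weight of serving user i from UAV j.
    Assumed constraint (illustrative only): every UAV serves exactly one
    user and no user is served by more than one UAV.
    Requires at least as many users as UAVs.
    """
    n_users, n_uavs = len(c), len(c[0])
    best_value, best_users = float("-inf"), None
    # Each permutation picks a distinct user for UAV 0, 1, ..., n_uavs - 1.
    for users in permutations(range(n_users), n_uavs):
        value = sum(c[i][j] for j, i in enumerate(users))
        if value > best_value:
            best_value, best_users = value, users
    # Convert the chosen assignment into the binary matrix s[i][j].
    s = [[0] * n_uavs for _ in range(n_users)]
    for j, i in enumerate(best_users):
        s[i][j] = 1
    return best_value, s


if __name__ == "__main__":
    weights = [[3, 1], [2, 4], [5, 0]]  # 3 users, 2 UAVs (made-up numbers)
    print(best_delivery(weights))
```

The enumeration grows factorially with the number of UAVs, which is why the text relies on an ILP solver for realistic problem sizes.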

2) UAV trajectory optimization:

Given the UAV transmit power $\mathcal{P}(t)$, the UAV trajectories at time slot $t-1$, $\mathcal{X}(t-1)$, and the content delivery decision variable set $\mathcal{S}(t)$, (27) is still difficult to solve with standard optimization methods due to the non-convex objective function and constraint (6). To solve this problem effectively, an SCA technique [35] is exploited to tackle the non-convexity and approximately transform the non-convex optimization problem into a convex one. The key idea of SCA is to solve a sequence of convex optimization problems with different initial points to obtain an approximate solution to a non-convex optimization problem instead of solving the hard non-convex problem directly. The following Proposition presents the approximately transformed convex UAV trajectory optimization problem.

Proposition 2.

By exploring the SCA technique, UAV trajec-tories at time slot t can be obtained by mitigating the followingconvex optimization problem. Maximize X ( t ) , { η i ( t ) } , { B ik ( t ) } X i { [ Q i ( t )] + + [ Z i ( t )] + } η i ( t ) (32a) s . t . X j s ij ( t )( D ( r ) i ( t ) − X k ∈J E ( r ) ik ( t )( || x k ( t ) − x u i ( t ) || −|| x ( r ) k ( t ) − x u i ( t ) || )) + X j s ij ( t ) ˜ R ij ( t ) ≥ η i ( t ) , ∀ i, t (32b) B ik ( t ) ≤ || x ( r ) k ( t ) − x u i ( t ) || +2( x ( r ) k ( t ) − x u i ( t )) T ( x k ( t ) − x u i ( t )) , ∀ i, k = j, t (32c) − || x ( r ) j ( t ) − x ( r ) k ( t ) || + 2( x ( r ) j ( t ) − x ( r ) k ( t )) T × ( x j ( t ) − x k ( t )) ≥ d , ∀ j, k = j, t (32d) (5) , (9) (32e)where η i ( t ) and B ik ( t ) are slack variables, D ( r ) i ( t ) = log ( σ W + P k ∈J p k ( t ) θ ij g + || x ( r ) k ( t ) − x u i ( t ) || ) , E ( r ) ik ( t ) = p k ( t ) θ ij (cid:16) g + || x ( r ) k ( t ) − x u i ( t ) || (cid:17) D ( r ) i ( t ) ln 2 , ˜ R ij ( t ) = − log ( σ W + P k ∈J \{ j } p k ( t ) θ ij g + B ik ( t ) ) , θ ij = G LoS ς π , x ( r ) j ( t ) ,and x ( r ) k ( t ) are given locations of UAV j and UAV k at the r -th iteration of the SCA technique. Proof.

Please refer to Appendix C.

Remark 3: The objective function (32a) is linear. As the left-hand side (LHS) of (32b) is concave w.r.t. both $x_k(t)$ and $B_{ik}(t)$, it is a convex constraint. (32c) and (32d) are linear constraints. Besides, both (9) and (5) are convex quadratic constraints. Thus, (32) is convex and can be efficiently solved by MOSEK. It is noteworthy that, because of the lower-bound approximations used in (32b)-(32d), the feasible domain of (32) is smaller than that of (27). Hence, the negative of the optimal value of (32a) is an upper bound on the optimal value of (27).
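The lower-bound approximation mentioned in Remark 3 can be checked numerically for the collision-avoidance constraint (32d). The sketch below evaluates the first-order expansion of $\|x_j - x_k\|^2$ around a reference point and compares it with the true squared distance; the helper names and the sample coordinates are illustrative assumptions, not part of the proposed algorithm.

```python
def sq_norm(v):
    """Squared Euclidean norm of a vector given as a sequence of floats."""
    return sum(a * a for a in v)


def linearized_sq_dist(xj, xk, xj_ref, xk_ref):
    """First-order surrogate of ||xj - xk||^2 around (xj_ref, xk_ref).

    This is the left-hand side of (32d):
        -||d_ref||^2 + 2 * d_ref^T d,
    with d = xj - xk and d_ref = xj_ref - xk_ref.
    Since ||d||^2 - surrogate = ||d - d_ref||^2 >= 0, the surrogate never
    exceeds the true squared distance.
    """
    d_ref = [a - b for a, b in zip(xj_ref, xk_ref)]
    d = [a - b for a, b in zip(xj, xk)]
    return -sq_norm(d_ref) + 2.0 * sum(a * b for a, b in zip(d_ref, d))


if __name__ == "__main__":
    xj, xk = (120.0, 40.0), (60.0, 90.0)          # candidate UAV positions (m)
    xj_ref, xk_ref = (100.0, 50.0), (50.0, 100.0)  # previous SCA iterate
    true_value = sq_norm([a - b for a, b in zip(xj, xk)])
    print(true_value, linearized_sq_dist(xj, xk, xj_ref, xk_ref))
```

Requiring the surrogate to be at least $d_{\min}^2$ therefore guarantees the original separation constraint, at the cost of a smaller feasible set.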

3) UAV transmit power optimization:

Given the content delivery decision variable set $\mathcal{S}(t)$ and UAV trajectories $\mathcal{X}(t)$, it is still hard to solve (27) owing to the non-convex objective function. Likewise, the SCA technique is explored to tackle the non-convexity. The following Proposition shows a method of optimizing UAV transmit power.

Proposition 3.

By exploring an SCA technique, the UAV transmit power at time slot $t$ can be configured by solving the following convex optimization problem.

$$\underset{\mathcal{P}(t), \{\eta_i(t)\}}{\text{Maximize}} \ -V\rho \sum_j p_j(t) - \sum_j [H_j(t)]^+ p_j(t) + \sum_i \{[Q_i(t)]^+ + [Z_i(t)]^+\} \eta_i(t) \tag{33a}$$
$$\text{s.t.} \ \sum_j \Big( s_{ij}(t) \hat{R}_{ij}(t) - s_{ij}(t) F^{(r)}_{ij}(t) \Big) - \sum_j \Big( s_{ij}(t) \sum_{k \in \mathcal{J} \setminus \{j\}} G^{(r)}_{ik}(t) \big(p_k(t) - p^{(r)}_k(t)\big) \Big) \ge \eta_i(t), \ \forall i, t \tag{33b}$$
$$\eta_i(t) \ge s_{ij}(t) C^{\mathrm{th}}_{i,f_i}(t), \ \forall i, j, t \tag{33c}$$
$$(16\mathrm{d}), (16\mathrm{g}) \tag{33d}$$

where $\hat{R}_{ij}(t) = \log_2\Big(\sigma^2 W + \sum_{k \in \mathcal{J}} \frac{p_k(t) \theta_{ij}}{g^2 + \|x_k(t) - x^{u}_i(t)\|^2}\Big)$, $F^{(r)}_{ij}(t) = \log_2\Big(\sigma^2 W + \sum_{k \in \mathcal{J} \setminus \{j\}} p^{(r)}_k(t) h_{ik}(t)\Big)$, $G^{(r)}_{ik}(t) = \frac{h_{ik}(t)}{2^{F^{(r)}_{ij}(t)} \ln 2}$, and $p^{(r)}_k(t)$ is the given transmit power of UAV $k$ at the $r$-th iteration of the SCA technique. (33c) is enforced due to the data rate requirement of enabling a user's desired QoE state.

Proof.

Please refer to Appendix D.

Remark 4: The objective function (33a) is linear. As the LHS of (33b) is concave w.r.t. $p_k(t)$, it is a convex constraint. Then, we can conclude that (33) is convex and can be efficiently solved by MOSEK. Similarly, because of the approximation, the feasible set of (33) is a subset of that of (27). Therefore, the negative of the optimal value of (33a) is an upper bound on the optimal value of (27).

Based on the above derivation, the main steps of the iterative optimization scheme for solving (27) can be summarized in Algorithm 1.

Algorithm 1

Iterative UAV content delivery, trajectory, and transmit power optimization

Initialization: Randomly initialize $\mathcal{X}^{(0)}(t)$ and $\mathcal{P}^{(0)}(t)$, let $r = 0$.
repeat
  Given $\mathcal{X}^{(r)}(t)$, $\mathcal{P}^{(r)}(t)$, solve (31) to obtain the optimal solution $\mathcal{S}^{(r+1)}(t)$.
  Given $\mathcal{S}^{(r+1)}(t)$, $\mathcal{X}^{(r)}(t)$, $\mathcal{P}^{(r)}(t)$, solve (32) to generate the optimal solution $\mathcal{X}^{(r+1)}(t)$.
  Given $\mathcal{S}^{(r+1)}(t)$, $\mathcal{X}^{(r+1)}(t)$, $\mathcal{P}^{(r)}(t)$, solve (33) to obtain the optimal solution $\mathcal{P}^{(r+1)}(t)$.
  Update $r = r + 1$.
until convergence or $r = r_{\max}$.

Finally, we can summarize the main steps of the Lyapunov-based optimization framework for solving the original problem (16) in Algorithm 2.

Algorithm 2

Fresh, Fair, and Energy-Efficient Content Provision (F E CP)

Initialization: Initialize $Q_i(1) \in [0, \cdot]$, $Z_i(1) \in [0, \cdot]$, $H_j(1) \in [0, \cdot]$ for all users $i \in \mathcal{I}$ and UAVs $j \in \mathcal{J}$.
for each time slot $t = 1, 2, \ldots, T$ do
  Observe the virtual queues $Q_i(t)$, $Z_i(t)$, and $H_j(t)$.
  Compute $\gamma_i(t)$ using (28) for all users $i$.
  Compute $b_{j,f_i}(t)$ using (29) for all UAVs $j$.
  Obtain the content delivery decision set $\mathcal{S}(t)$, UAV trajectories $\mathcal{X}(t)$, and UAV transmit power $\mathcal{P}(t)$ using Algorithm 1.
  Calculate $p^{\mathrm{tot}}_j(t)$ for all UAVs $j$ using (3).
  Calculate $u^{\mathrm{dl}}_i(t)$ for all users $i$ using (14).
  Update $Q_i(t+1)$, $Z_i(t+1)$, and $H_j(t+1)$ for all users $i$ and UAVs $j$ using (18), (20), and (21), respectively.
end for

IV. PERFORMANCE ANALYSIS

In this section, the convergence performance of the proposed algorithms is analyzed. Since the PAoI of a data packet varies with many time-varying factors (e.g., the inter-arrival time of the packet and the packet queueing delay), the PAoI of a data packet is analyzed in the average sense.

A. Convergence analysis

The following Lemma shows that the convergence and validity of Algorithms 1 and 2 can be guaranteed.

Lemma 2.

Algorithm 1 is convergent, and Algorithm 2 canmake all virtual queues mean-rate stable.

Proof.

Please refer to Appendix E.
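The bookkeeping that Lemma 2 refers to can be sketched as follows. The code implements the closed form (28), one step of the virtual-queue recursions (18), (20), (21), and the ratio $\max_k [\cdot]^+/t$ whose decay indicates mean-rate stability. It is a minimal illustration under stated assumptions: the updates are written in the $t \to t+1$ form used in Algorithm 2 with all quantities taken from slot $t$, and every function and argument name (for example `p_avg_cap` for $\tilde{p}_j$) is hypothetical.

```python
import math


def aut_gamma(z_plus, V, u_max):
    """Closed form (28): minimizer of -V*log2(1+g) + z_plus*g over [0, u_max]."""
    if z_plus == 0:
        return u_max
    unconstrained = V / (z_plus * math.log(2)) - 1.0
    return min(max(unconstrained, 0.0), u_max)


def update_queues(Q, Z, H, C_th, u_dl, gamma, p_tot, p_avg_cap, phi):
    """One step of the virtual queues (18), (20), (21).

    Q, Z, C_th, u_dl, gamma are per-user lists; H, p_tot, p_avg_cap are
    per-UAV lists; phi is the constant J/N.
    """
    Q_next = [q + phi * c - u for q, c, u in zip(Q, C_th, u_dl)]
    Z_next = [z + g - u for z, g, u in zip(Z, gamma, u_dl)]
    H_next = [h + p - cap for h, p, cap in zip(H, p_tot, p_avg_cap)]
    return Q_next, Z_next, H_next


def stability_value(queue, t):
    """max_k [queue_k]^+ / t, which should tend to zero under mean-rate stability."""
    return max(max(x, 0.0) for x in queue) / t
```

In a simulation loop, `aut_gamma` would be called once per user before the delivery, trajectory, and power step, and `update_queues` once at the end of each slot.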

B. Analysis of expected PAoI

Recall that many factors, such as the inter-arrival time of a packet, the packet queueing delay in the preprocessing queue, the packet preprocessing time, and the edge arrival duration, contribute to the PAoI of a packet. We next analyze these factors in detail.

1) Packet queueing delay analysis:

The analysis of the packet queueing delay requires the study of packet queueing behavior. To study the packet queueing behavior, the queue evolution process that involves the arrival, accumulation, and departure of packets should be investigated.

During time $q$, once it has received the content requests of users, the content server generates and sends out the packets required by the users. To facilitate the analysis, a Poisson distribution with intensity (or the average number of new packets) $\vartheta_w(q)$ is explored to model the random and independent packet arrivals in the BS. As mentioned in the system model, upon arrival, new packets are not processed immediately and enter the preprocessing queue to wait to be preprocessed. If the packet preprocessing rate, which is determined by the CPU computing speed [30], is slower than the packet arrival rate, some packets will accumulate in the preprocessing queue. After being preprocessed, packets depart from the preprocessing queue. In this regard, the packet departure rate is equal to the packet preprocessing rate. Denote by $N_a(q)$ the accumulated number of packets in the preprocessing queue at time $q$. The value of $N_a(q)$ is simultaneously determined by the following three factors: a) the accumulated number of packets; b) the number of new arrivals during $q-1$, which will be counted at time $q$; c) the packet departure rate. Thus, we can present the evolution model of $N_a(q)$ as follows

$$N^{q}_a = \begin{cases} 0, & q = 1 \\ [N^{q-1}_w - n_c]^+, & q = 2 \\ [N^{q-1}_a + N^{q-1}_w - n_c]^+, & q \ge 3 \end{cases} \tag{34}$$

where we write $x^q$ instead of $x(q)$ to lighten the notation. $N^{q-1}_w$ is the number of new arrivals in time $q-1$; the subtraction of $n_c$ is performed because $n_c = \frac{r}{\kappa l}$ packets can be preprocessed during each time interval, where $l$ is the packet size (in bits), $r$ is the CPU computing speed of the preprocessor in CPU cycles per second, and $\kappa$ is a scaling parameter depending on the specific operation conducted on the packet, in CPU cycles per bit [30]. As new arrivals in the 1st time interval will be counted in the 2nd time interval and there are no accumulated packets in the 1st time interval, we have $N^{1}_a = 0$.

Based on the above evolution model, we have the following Lemma that derives the closed-form expressions of the accumulated number of packets in the preprocessing queue and the packet queueing delay of a newly arrived packet at time $q$.

Lemma 3.

The accumulated number of packets in the queue can be approximated as a Poisson distribution. As such, the average number of accumulated packets in the queue at time $q > 1$ can be derived as

$$\vartheta^{q}_a = \Big[ \vartheta^{q-1}_w + \vartheta^{q-1}_a - n_c \big(1 - e^{-\vartheta^{q-1}_w - \vartheta^{q-1}_a}\big) \Big]^+. \tag{35}$$

Besides, the average packet queueing delay of a newly arrived packet $m$ sent to user $i$ at time $q > 1$ is $\vartheta^{q}_a Y^{S}_{i,m}$.

Proof.

Please refer to Appendix F.
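The recursion (35) is straightforward to iterate numerically. The sketch below is a minimal illustration, assuming the initial value $\vartheta^{1}_a = 0$ (consistent with $N^{1}_a = 0$ in (34)) and that the recursion applies from the second time interval onwards; the function names and the list-based indexing are hypothetical.

```python
import math


def mean_backlog(theta_w, n_c):
    """Iterate recursion (35) for the mean number of accumulated packets.

    theta_w[k] is the mean number of new arrivals in time interval k + 1.
    Returns a list theta_a of the same length, with theta_a[0] = 0
    (assumed initial condition for the first interval).
    """
    theta_a = [0.0]
    for q in range(1, len(theta_w)):
        load = theta_w[q - 1] + theta_a[q - 1]
        # Mean departures are n_c times the probability that the queue is non-empty.
        theta_a.append(max(load - n_c * (1.0 - math.exp(-load)), 0.0))
    return theta_a


def mean_queueing_delay(theta_a_q, y_s):
    """Average queueing delay of a newly arrived packet: backlog times preprocessing time."""
    return theta_a_q * y_s


if __name__ == "__main__":
    backlog = mean_backlog([10.0, 10.0, 10.0, 10.0], n_c=5.0)
    print(backlog, mean_queueing_delay(backlog[-1], y_s=0.2))
```

When the arrival intensity exceeds $n_c$, the backlog grows from one interval to the next; when $n_c$ is much larger than the load, the clipping $[\cdot]^+$ keeps the backlog at zero.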

2) Analysis of the expected edge arrival duration:

We know that a content file including a batch of packets will be delivered from the BS or a UAV to a destination user in each time slot $t$. Then, we attempt to tackle the following issue: What is the expected edge arrival duration for transmitting a packet? The following Proposition gives the answer.

Proposition 4.

For any user $i$ and packet $m$, the expected edge arrival duration of packet $m$ destined to user $i$ can be given by

$$\mathbb{E}[Y^{A}_{i,m}] = \frac{(l + L) N \Delta_t}{2JL} \tag{36}$$

where $\mathbb{E}[\cdot]$ represents an expectation operation.

Proof.

Please refer to Appendix G.
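As a quick numerical illustration of Proposition 4, the sketch below evaluates the expected edge arrival duration and the simulation-side estimate that the evaluation section uses for the same quantity. It assumes the denominator of (36) is $2JL$, which agrees with the $2L\sum_t\sum_i\sum_j s_{ij}(t)$ form of the estimate when every UAV delivers to one user per slot; the function names and the sample numbers are illustrative only.

```python
def expected_edge_arrival(l_bits, L_bits, N, J, delta_t):
    """Expected edge arrival duration (36): (l + L) * N * delta_t / (2 * J * L)."""
    return (l_bits + L_bits) * N * delta_t / (2.0 * J * L_bits)


def empirical_edge_arrival(l_bits, L_bits, N, T, delta_t, deliveries):
    """Simulation-side estimate.

    deliveries is the total number of (slot, user, UAV) deliveries, i.e. the
    sum of s_ij(t) over t, i and j, observed over T time slots.
    """
    return N * (l_bits + L_bits) * T * delta_t / (2.0 * L_bits * deliveries)


if __name__ == "__main__":
    l_bits = 5000 * 8   # packet size in bits
    L_bits = 150e6      # content file size in bits
    N, J, T = 50, 4, 500
    delta_t = 10.0      # placeholder slot duration in seconds (assumed value)
    print(expected_edge_arrival(l_bits, L_bits, N, J, delta_t))
    print(empirical_edge_arrival(l_bits, L_bits, N, T, delta_t, deliveries=J * T))
```

The two functions return the same value whenever the number of deliveries equals $J \cdot T$, and the estimate grows if some UAVs fail to deliver in some slots.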

3) Closed-form expression of expected PAoI:

Based on the results obtained in Lemma 3 and Proposition 4, we can derive the closed-form expression of the expected PAoI of a packet in the following Lemma.

Lemma 4.

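The expected PAoI of packet $m$, which is generated at time $q$ and sent to user $i$, $\forall i$, can be expressed as

$$\mathbb{E}[\Delta_{i,m}(\hat{q}; q)] = \begin{cases} \dfrac{1}{n_c} + \dfrac{(l + L) N \Delta_t}{2JL}, & q = 1, m = 1 \\ \dfrac{1}{\lambda^{q}_i} + \dfrac{1}{n_c} + \dfrac{\vartheta^{q}_a}{n_c} + \dfrac{(l + L) N \Delta_t}{2JL}, & q > 1, m > 1 \end{cases} \tag{37}$$

where $\lambda^{q}_i = \vartheta^{q}_w \Pr_i$ is the intensity of newly arrived packets destined to user $i$ at time $q$.

Proof.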

Please refer to Appendix H.

V. SIMULATION RESULTS

A. Comparison algorithms and parameter setting

To verify the effectiveness of the proposed Algorithm 2, we compare it with five benchmark algorithms: 1) Static UAV caching with power control (SUWPC) algorithm: It randomly generates horizontal locations for N hovering UAVs with the same deployment altitude g. The proposed UAV transmit power optimization method is adopted in the algorithm. Besides, each UAV randomly delivers a cached or forwarded content file to terrestrial users within its LoS coverage region at each time slot after receiving users' content requests; 2) Static UAV caching without power control (SUPC) algorithm: The only difference between SUPC and SUWPC is that SUPC adopts the maximum UAV transmit power to deliver content files to terrestrial users; 3) Circular trajectory-based joint optimization (CTJO) algorithm:

Each UAV ﬂies in acircular trajectory with a speed of 10 m/s. At the beginningof the simulation, UAVs are deployed in line with an equalinterval. The distance between two adjacent UAVs is / J km. The horizontal locations of the ﬁrst and the last UAVsare (1 / / ⌊ J ⌋ , / km and (1 / − / ⌊ J ⌋ , / km,respectively, and turning radiuses of them are / ⌊ N ⌋ kmand / − / ⌊ J ⌋ km. In CTJO, all UAVs are deployedat the same altitude g . Besides, the proposed UAV cachingand UAV transmit power optimization methods are utilizedin this algorithm. 4) Circular trajectory-based UAV caching(CTUC) algorithm:

The unique difference between CTUCand CTJO is that CTUC adopts the maximum UAV transmitpower to deliver content ﬁles to terrestrial users. 5)

Circulartrajectory without UAV caching (CTWUC) algorithm:

Theunique difference between CTWUC and CTJO is that UAVsdo not cache content ﬁles and will randomly forward contentﬁles from the BS to users in CTWUC.We consider an urban area of size . × . km with high-rise buildings, where terrestrial users are randomly distributedand moving. This scenario corresponds to the AtG channelenvironment of the worst case [27]. The radio frequencypropagation parameters are: η LoS = 2 . , carrier frequency f c = 4 . GHz, speed of light c = 3 . × m/s, θ th = 70 ◦ , σ = − dBm/Hz, total bandwidth W = 100 MHz, ˆ D = 24 s, D ul = 5 s, L = 150 Mbits, D th = 0 . , ∆ t = ˆ D − D th ( ˆ D − L/ ( W u maxdl )) s [26], [27]. The defaultvalues of parameters related to UAVs and terrestrial users are: ˜ p j = 450 mW, ˆ p j = 500 mW, p cj = 20 mW, p min j = 1 mW, ∀ j , p max = 480 mW, e max = 250 m, g = 200 m, d min = 50 m, J = 4 , and N = 50 . Expected PAoI analysisrelated parameters are: l = 5000 Bytes, γ/κ = 1

Mbits/s, ϑ qw ∼ Pois( γτl ) , and Pr i = 1 /N , ∀ i, q [30]. Other systemparameters are listed as follow: r max = 200 , V = 0 . , T = 500 , and ρ = 0 . [1]. B. Performance evaluation

In this subsection, we conduct simulations to evaluate the developed F E CP algorithm comprehensively. Besides, we run all comparison algorithms fifteen times to eliminate the influence of randomly initialized parameters, such as UAV transmit power and locations, on the performance evaluation. The final simulation result is the average of the fifteen results obtained.

First, we design a simulation to verify the effectiveness of the F E CP algorithm under the default parameter setting. Particularly, we plot the tendency of the virtual queue stability values, defined as $S_Q = \max_i [Q_i(t)]^+/t$, $S_Z = \max_i [Z_i(t)]^+/t$, and $S_H = \max_j [H_j(t)]^+/t$, in Fig. 4. Besides, the two-dimensional (2D) trajectories of four UAVs in the first 150 time slots and their final 2D positions are plotted in Fig. 5.

The following observations can be made from these figures: 1) the obtained queue stability values are bounded during the whole content provision period; 2) the obtained queue stability values tend to zero as the time slot index increases; as a result, all introduced virtual queues are mean-rate stable according to (19), i.e., all time average constraints in (17) are enforced. This result verifies the effectiveness of the proposed F E CP algorithm; 3) the movement constraints of each UAV are satisfied at each time slot.

[Fig. 4. Trend of virtual queue stability values ($S_Z$, $S_Q$, $S_H$) vs. time slot.]

[Fig. 5. Trajectories of four UAVs projected in a 2D space (x-coordinate (m) vs. y-coordinate (m)), with the final coordinate of each UAV marked.]

[Fig. 6. Comparison of the obtained network profit. (a) Network profit vs. the number of users. (b) Network profit vs. the number of UAVs.]

[Fig. 7. Comparison of the obtained total UAV power consumption (mW) of all comparison algorithms. (a) Total UAV power consumption vs. N. (b) Total UAV power consumption vs. J.]

Next, we conduct a simulation to evaluate the performance of F E CP by comparing it with five other benchmark algorithms. To quantify the algorithm performance, the following key performance indicators are introduced: the network profit, defined as $\phi(\{\bar{u}_i^{\rm dl}(T)\})$; the total UAV transmit power consumption, calculated by $\sum_j \bar{p}_j^{\rm tot}(T)$; the energy efficiency, computed by (16a); Jain's fairness index, defined as $\left(\sum_i \bar{u}_i^{\rm dl}\right)^2 / \left(N \sum_i (\bar{u}_i^{\rm dl})^2\right)$ with $\bar{u}_i^{\rm dl} = \frac{1}{T}\sum_{t=1}^{T} u_i^{\rm dl}(t)$; and the expected PAoI of a packet, computed by (37), where the obtained $E[Y^A_{i,m}] = N(l+L)T\Delta_t / \left(2L\sum_{t=1}^{T}\sum_i\sum_j s_{ij}(t)\right)$.

In Figs. 6 and 7, the influence of the number of users and UAVs on the obtained network profit and the total UAV power consumption is plotted. The trends of the obtained energy efficiency and Jain's fairness index of all comparison algorithms are depicted in Figs. 8 and 9, respectively. In Fig. 8, the energy efficiency results of two comparison algorithms (i.e., CTUC and SUPC) are not plotted, because they transmit content files with the maximum UAV transmit power and therefore achieve smaller energy efficiency values than the other UAV power control-based algorithms. In Fig. 9, the fairness index results of CTJO and SUWPC are not plotted. As CTJO and SUWPC achieve smaller network profit than CTUC and SUPC, CTJO and SUWPC will obtain smaller Jain's fairness indexes.

Fig. 8. Energy efficiency vs. N and J.

Fig. 9. Jain's fairness index vs. N and J.

From these figures, we can observe that: 1) for all algorithms, the achieved network profit increases with the number of users because content files can be successfully delivered to more users; 2) CTUC obtains the greatest network profit when N ≤ , followed by the proposed F E CP algorithm. This is mainly because CTUC adopts the maximum UAV transmit power, which results in high AtG data rates. Although SUPC also transmits content files with the maximum UAV transmit power, it obtains small network profit because many terrestrial users experience outage throughout the content provision period. These observations show that the design of suitable UAV trajectories can help improve network profit; 3) except for F E CP, the obtained network profit of the comparison algorithms increases with the number of UAVs. For F E CP, a large number of UAVs does not lead to large network profit. This is due to the complex impact of signal interference. For the other comparison algorithms, the distance between any two UAVs is large. As a result, they can increase the UAV transmit power to achieve greater network profit. For F E CP, however, the nearest distance between two UAVs can be m, which may result in large signal interference. To reduce signal interference, F E CP decreases the UAV transmit power when J ≥ ; 4) the total power consumption of all algorithms hardly varies with the number of users because a UAV can deliver a content file to only one user at each time slot. When J < , F E CP consumes greater UAV transmit power than the other three comparison algorithms that adopt the proposed UAV transmit power optimization method, so that greater network profit can be gained; 5) when N ≤ and J ≤ , CTJO may be more energy-efficient than F E CP. For example, when N = 50 and J = 4, the obtained energy efficiency of the proposed F E CP algorithm is . % of that of CTJO. However, from Fig. 6, we can find that F E CP may obtain higher network profit than CTJO under a similar parameter setting. This observation shows that F E CP tends to deliver content files to fewer users continuously when users are sparsely distributed, which results from a small N. When the number of users and UAVs becomes large, F E CP can achieve greater energy efficiency than the CTJO algorithm. For instance, when N = 130 and J = 7, F E CP is more energy-efficient than CTJO by . %. The above result indicates that a simple circular UAV trajectory is preferable when terrestrial users are sparsely distributed in the considered communication area. However, when users are densely distributed, optimizing the UAV trajectory will greatly improve the energy efficiency of content provision. Besides, CTJO may be more energy-efficient than

Fig. 10. Comparison of expected PAoI of all comparison algorithms.

CTWUC, which means that UAV caching can improve the energy efficiency of content provision. SUWPC provides inefficient content delivery due to the static UAV deployment; 6) similarly, when the number of users is small (i.e., N ≤ ), F E CP may be outperformed by CTUC and CTWUC in terms of fair content provision. For example, when N = 30 and J = 2, the obtained fairness index of F E CP is . % and . % of that of CTUC and CTWUC, respectively. However, when N > , F E CP can provide fairer content delivery services for terrestrial users than CTUC, CTWUC, and SUPC. For instance, the obtained Jain's fairness index of F E CP is . , . , and . times that of CTWUC, CTUC, and SUPC, respectively, when N = 130 and J = 7.

We plot the trend of the expected PAoI obtained by all comparison algorithms in Fig. 10 to show whether the expected PAoI can be reduced by the joint design of UAV caching, UAV trajectory, and UAV transmit power. In Fig. 10, the expected PAoI is obtained with N = 30 and J = 5.

From this figure, we can observe that: 1) the proposed F E CP algorithm achieves a small value of expected PAoI that is close to the theoretical value. The obtained values of $E[\Delta_{i,m}(\hat{q}; q)]$ of the other comparison algorithms are much greater than the theoretical value. Besides, the two circular UAV trajectory-based algorithms with UAV caching achieve smaller $E[\Delta_{i,m}(\hat{q}; q)]$ than the two static UAV trajectory-based algorithms. This observation shows the importance of UAV trajectory optimization in delivering fresh packets to terrestrial users; 2) CTWUC obtains the largest $E[\Delta_{i,m}(\hat{q}; q)]$ and takes at least . seconds more than CTJO and CTUC to deliver packets to users. This indicates that UAV caching can significantly reduce the latency of delivering packets to users; 3) it is interesting to find that the UAV power control method alone cannot effectively reduce the expected edge arrival duration. For example, CTUC achieves smaller $E[\Delta_{i,m}(\hat{q}; q)]$ than CTJO, yet SUPC obtains greater $E[\Delta_{i,m}(\hat{q}; q)]$ than SUWPC.

In summary, the above simulation results indicate that the joint design of UAV caching, UAV trajectory, and UAV transmit power can guarantee the provision of fresh, fair, and energy-efficient content files for terrestrial users.

VI. CONCLUSION AND FUTURE WORK

This paper investigated a private and cache-enabled UAV network for providing fresh, fair, and energy-efficient content delivery services for terrestrial users. To achieve this goal, we formulated a joint UAV caching, UAV trajectory, and UAV transmit power optimization problem. To solve the formulated problem, we proposed a novel algorithm that leverages a Lyapunov-based optimization framework integrating an iterative optimization scheme and an SCA technique. Besides, we discussed the convergence behaviour of the proposed algorithm and derived the theoretical value of the expected PAoI of data packets. Simulation results verified that the proposed algorithm could provide fresh content files for users and was . % and . % more energy-efficient and fairer than benchmark algorithms, respectively. This paper explored an optimization method to design UAV trajectories in a 2D space. How to incorporate reinforcement learning with optimization to solve the joint UAV caching, three-dimensional UAV trajectory, and UAV transmit power optimization problem is a topic worthy of research in the near future.

APPENDIX

A. Proof of Proposition 1

Suppose all limits in (17) exist. The constraint (17b) is then equivalent to $\bar{u}_i^{\rm dl}(t) = \bar{\gamma}_i(t)$, and $\bar{g}(t) \le \phi(\bar{u}_1^{\rm dl}(t), \ldots, \bar{u}_N^{\rm dl}(t))$ can then be achieved. This means that the maximum value of the objective function of (17) is no greater than that of (16). Besides, the maximum value of the objective function of (16) can be obtained by letting $\bar{\gamma}_i(t) = \bar{u}_i^{{\rm dl};\star}(t)$ for all $i \in \mathcal{I}$ and $t \in \{1, 2, \ldots\}$, with $(\bar{u}_1^{{\rm dl};\star}(t), \ldots, \bar{u}_N^{{\rm dl};\star}(t))$ being the optimal time average achievable data rates of all users for (16) [32]. This indicates that the feasible domain of (16) is smaller than that of (17). Therefore, (17) and (16) are equivalent. This completes the proof.

B. Proof of Lemma 1

We discuss the upper bound of $([Q_i(t+1)]^+)^2$ in three cases. According to (18) and the non-negative operation:

Case I: when $Q_i(t+1) \ge 0$ and $Q_i(t) \ge 0$, we can obtain

$$([Q_i(t+1)]^+)^2 = ([Q_i(t)]^+)^2 + 2[Q_i(t)]^+\big(\varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t)\big) + \big(\varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t)\big)^2 \tag{38}$$

Case II: when $Q_i(t+1) \ge 0$ and $Q_i(t) < 0$, we have $\varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t) > Q_i(t+1) \ge 0$, $[Q_i(t)]^+ = 0$, and

$$([Q_i(t+1)]^+)^2 < \big(\varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t)\big)^2 = ([Q_i(t)]^+)^2 + 2[Q_i(t)]^+\big(\varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t)\big) + \big(\varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t)\big)^2 \tag{39}$$

Case III: when $Q_i(t+1) < 0$, we can obtain

$$([Q_i(t+1)]^+)^2 = 0 \le \big([Q_i(t)]^+ + \varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t)\big)^2 = ([Q_i(t)]^+)^2 + 2[Q_i(t)]^+\big(\varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t)\big) + \big(\varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t)\big)^2 \tag{40}$$

Therefore, we have

$$([Q_i(t+1)]^+)^2 \le ([Q_i(t)]^+)^2 + 2[Q_i(t)]^+\big(\varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t)\big) + \big(\varphi C^{\rm th}_{i,f_i}(t) - u_i^{\rm dl}(t)\big)^2 \tag{41}$$

Similarly, according to (20), (21), and the non-negative operation, we have

$$([Z_i(t+1)]^+)^2 = ([Z_i(t)]^+)^2 + 2[Z_i(t)]^+\big(\gamma_i(t) - u_i^{\rm dl}(t)\big) + \big(\gamma_i(t) - u_i^{\rm dl}(t)\big)^2 \tag{42}$$

and

$$([H_j(t+1)]^+)^2 \le ([H_j(t)]^+)^2 + 2[H_j(t)]^+\big(p_j(t) - \tilde{p}_j + p_j^c\big) + \big(p_j(t) - \tilde{p}_j + p_j^c\big)^2 \tag{43}$$

With inequalities (41)-(43), we can obtain a new inequality by utilizing the definition of Lyapunov drift. Next, we can achieve (24) by adding $-V\big(g(t) - \sum_{j=1}^{N} p_j(t)\big)$ to both sides of the new inequality. This completes the proof.

C. Proof of Proposition 2

For any given UAV transmit power $P(t)$, UAV trajectories $X(t-1)$ at time slot $t-1$, and the content delivery decision set $S(t)$, we can optimize the variables $X(t)$ in (27) by solving the following problem

$$\underset{X(t)}{\text{Maximize}} \ \sum_i \{[Q_i(t)]^+ + [Z_i(t)]^+\}\, u_i^{\rm dl}(t) \tag{44a}$$
$$\text{s.t.:} \ (5), (6), (9). \tag{44b}$$

To simplify (44a), we introduce slack variables $\{\eta_i(t)\}$, with which (44) can be reformulated as

$$\underset{X(t), \{\eta_i(t)\}}{\text{Maximize}} \ \sum_i \{[Q_i(t)]^+ + [Z_i(t)]^+\}\, \eta_i(t) \tag{45a}$$
$$\text{s.t.:} \ u_i^{\rm dl}(t) \ge \eta_i(t), \ \forall i, t \tag{45b}$$
$$(5), (6), (9). \tag{45c}$$

Denote by $\eta_i^\star$ the optimal solution to (45). If $\eta_i^\star$ satisfies (45b) with strict inequality, we can then decrease $u_i^{\rm dl}(t)$ to make (45b) active without changing the value of (45a). Therefore, (45) is equivalent to (44).

As $h_{ij}(t)$ can be rewritten as $h_{ij}(t) = \frac{\theta_{ij}}{g + \|x_j(t) - x_i^u(t)\|^2}$, where θ_ij = G LoS ς π, the achievable data rate of user $i$ can be expressed as $u_i^{\rm dl}(t) = \sum_j s_{ij}(t) R_{ij}(t)$ with

$$R_{ij}(t) = \hat{R}_{ij}(t) - \log_2\Big(\sigma W + \sum_{k \in \mathcal{J}\setminus\{j\}} \frac{p_k(t)\theta_{ij}}{g + \|x_k(t) - x_i^u(t)\|^2}\Big),$$

and

$$\hat{R}_{ij}(t) = \log_2\Big(\sigma W + \sum_{k \in \mathcal{J}} \frac{p_k(t)\theta_{ij}}{g + \|x_k(t) - x_i^u(t)\|^2}\Big).$$

Problem (45) is not convex due to the non-convex constraints (6) and (45b). Therefore, we may not find efficient methods to obtain the optimal solution to (45). Although (45b) is not concave w.r.t. $x_j(t)$, we can observe that $\hat{R}_{ij}(t)$ is convex w.r.t. $\|x_k(t) - x_i^u(t)\|^2$. Accordingly, a slack variable $B_{ik}(t) = \|x_k(t) - x_i^u(t)\|^2$, $\forall i, k \ne j$, is introduced to transform (45) into the following new problem

$$\underset{X(t), \{\eta_i(t)\}, \{B_{ik}(t)\}}{\text{Maximize}} \ \sum_i \{[Q_i(t)]^+ + [Z_i(t)]^+\}\, \eta_i(t) \tag{46a}$$
$$\text{s.t.:} \ \sum_j s_{ij}(t)\big(\hat{R}_{ij}(t) + \tilde{R}_{ij}(t)\big) \ge \eta_i(t), \ \forall i, t \tag{46b}$$
$$B_{ik}(t) \le \|x_k(t) - x_i^u(t)\|^2, \ \forall i, k \ne j, t \tag{46c}$$
$$(5), (6), (9). \tag{46d}$$

where $\tilde{R}_{ij}(t) = -\log_2\Big(\sigma W + \sum_{k \in \mathcal{J}\setminus\{j\}} \frac{p_k(t)\theta_{ij}}{g + B_{ik}(t)}\Big)$.
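As a rough numerical illustration of the rate model above, the following sketch evaluates $R_{ij}(t)$ as the difference of two logarithms for a single user. It is only a sketch under stated assumptions: the function names, the 2D coordinates, and all numeric values are hypothetical, `noise` stands for the term σW, and `g` is used exactly as the additive constant that appears in the channel gain expression.

```python
import math


def channel_gain(theta, g, uav_xy, user_xy):
    """Gain h = theta / (g + squared horizontal distance); g is taken as written in the text."""
    dx = uav_xy[0] - user_xy[0]
    dy = uav_xy[1] - user_xy[1]
    return theta / (g + dx * dx + dy * dy)


def rate(j, powers, uav_positions, user_xy, theta, g, noise):
    """R_ij = log2(noise + total received power) - log2(noise + interference power)."""
    received = [
        p * channel_gain(theta, g, xy, user_xy)
        for p, xy in zip(powers, uav_positions)
    ]
    total = noise + sum(received)
    # Interference: every UAV except the serving UAV j.
    interference = noise + sum(r for k, r in enumerate(received) if k != j)
    return math.log2(total) - math.log2(interference)


# Hypothetical setting: one serving UAV directly above the user, then one extra interferer.
single = rate(0, [3.0], [(0.0, 0.0)], (0.0, 0.0), theta=1.0, g=1.0, noise=1.0)
with_interferer = rate(
    0, [3.0, 3.0], [(0.0, 0.0), (10.0, 0.0)], (0.0, 0.0), theta=1.0, g=1.0, noise=1.0
)
```

With a single UAV the expression reduces to $\log_2(1 + p h / \text{noise})$; adding a second transmitting UAV can only lower the rate of the served user, which is the interference effect the trajectory design has to manage.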
Similar to (45), although a slack variable $B_{ik}(t)$ is introduced, (46) is equivalent to (45). Unfortunately, (46) is still non-convex as (6), (46b), and (46c) are non-convex.

To handle the non-convexity of (46), an SCA technique is explored. It can be observed that $\hat{R}_{ij}(t)$, $\forall i, j$, is convex w.r.t. $\|x_k(t) - x_i^u(t)\|^2$ and is therefore globally lower-bounded by its first-order Taylor expansion at any local point [36]. Hence, for a given local point at the $(r+1)$-th iteration ($r \ge 0$), denoted by $x_k^{(r)}(t)$, $\hat{R}_{ij}(t)$ is lower-bounded by

$$\hat{R}_{ij}(t) \ge \log_2\Big(\sigma W + \sum_{k \in \mathcal{J}} \frac{p_k(t)\theta_{ij}}{g + \|x_k^{(r)}(t) - x_i^u(t)\|^2}\Big) - \sum_{k \in \mathcal{J}} \frac{\frac{p_k(t)\theta_{ij}}{\big(g + \|x_k^{(r)}(t) - x_i^u(t)\|^2\big)^2}\Big(\|x_k(t) - x_i^u(t)\|^2 - \|x_k^{(r)}(t) - x_i^u(t)\|^2\Big)}{\Big(\sigma W + \sum_{k' \in \mathcal{J}} \frac{p_{k'}(t)\theta_{ij}}{g + \|x_{k'}^{(r)}(t) - x_i^u(t)\|^2}\Big)\ln 2}$$
$$= D_i^{(r)}(t) - \sum_{k \in \mathcal{J}} E_{ik}^{(r)}(t)\Big(\|x_k(t) - x_i^u(t)\|^2 - \|x_k^{(r)}(t) - x_i^u(t)\|^2\Big) \tag{47}$$

where $D_i^{(r)}(t) = \log_2\Big(\sigma W + \sum_{k \in \mathcal{J}} \frac{p_k(t)\theta_{ij}}{g + \|x_k^{(r)}(t) - x_i^u(t)\|^2}\Big)$ and $E_{ik}^{(r)}(t) = \frac{p_k(t)\theta_{ij}}{\big(g + \|x_k^{(r)}(t) - x_i^u(t)\|^2\big)^2\, 2^{D_i^{(r)}(t)} \ln 2}$.

Besides, for a given location point $(x_j^{(r)}(t), x_k^{(r)}(t))$, we can obtain the lower bound of $\|x_j(t) - x_k(t)\|^2$ via the first-order Taylor expansion as below

$$\|x_j(t) - x_k(t)\|^2 \ge -\|x_j^{(r)}(t) - x_k^{(r)}(t)\|^2 + 2\big(x_j^{(r)}(t) - x_k^{(r)}(t)\big)^T\big(x_j(t) - x_k(t)\big). \tag{48}$$

Similarly, for a given location point $x_k^{(r)}(t)$, $\|x_k(t) - x_i^u(t)\|^2$ is lower-bounded by

$$\|x_k(t) - x_i^u(t)\|^2 \ge -\|x_k^{(r)}(t) - x_i^u(t)\|^2 + 2\big(x_k^{(r)}(t) - x_i^u(t)\big)^T\big(x_k(t) - x_i^u(t)\big). \tag{49}$$

For any local point $X^{(r)}(t) = \{x_k^{(r)}(t)\}$, by referring to (47)-(49), (46) can be approximated as (32). This completes the proof.

D. Proof of Proposition 3

For any given content delivery decision set $S(t)$ as well as UAV trajectories $X(t)$, the UAV transmit power of (16) can be optimized by solving the following problem

$$\underset{P(t), \{\eta_i(t)\}}{\text{Maximize}} \ -V\rho \sum_j p_j(t) - \sum_j [H_j(t)]^+ p_j(t) + \sum_i \{[Q_i(t)]^+ + [Z_i(t)]^+\}\, \eta_i(t) \tag{50a}$$
$$\text{s.t.:} \ \sum_j s_{ij}(t) \log_2\Big(1 + \frac{p_j(t) h_{ij}(t)}{\sigma W + \sum_{k \in \mathcal{J}\setminus\{j\}} p_k(t) h_{ik}(t)}\Big) \ge \eta_i(t), \ \forall i, t \tag{50b}$$
$$\eta_i(t) \ge s_{ij}(t)\, C^{\rm th}_{i,f_i}(t), \ \forall i, j, t \tag{50c}$$
$$(16{\rm d}), (16{\rm g}) \tag{50d}$$

where (50c) is enforced due to the data rate requirement of enabling a user's desired QoE state.

Owing to the non-convex constraint (50b), (50) is non-convex; as a result, it is challenging to achieve its optimal solution. However, we observe that (50b) is a difference of two concave functions w.r.t. $p_k(t)$. Accordingly, we adopt the SCA technique again to approximate (50b). Specifically, $R_{ij}(t)$ can be rewritten as $R_{ij}(t) = \hat{R}_{ij}(t) - \breve{R}_{ij}(t)$, where $\breve{R}_{ij}(t) = \log_2\big(\sigma W + \sum_{k \in \mathcal{J}\setminus\{j\}} p_k(t) h_{ik}(t)\big)$. For any local point $P^{(r)}(t) = \{p_j^{(r)}(t)\}$, via the first-order Taylor expansion, $\breve{R}_{ij}(t)$ is upper-bounded by

$$\breve{R}_{ij}(t) \le \log_2\Big(\sigma W + \sum_{k \in \mathcal{J}\setminus\{j\}} p_k^{(r)}(t) h_{ik}(t)\Big) + \sum_{k \in \mathcal{J}\setminus\{j\}} \frac{h_{ik}(t)}{\Big(\sigma W + \sum_{k' \in \mathcal{J}\setminus\{j\}} p_{k'}^{(r)}(t) h_{ik'}(t)\Big)\ln 2}\big(p_k(t) - p_k^{(r)}(t)\big)$$
$$= F_{ij}^{(r)}(t) + \sum_{k \in \mathcal{J}\setminus\{j\}} G_{ik}^{(r)}(t)\big(p_k(t) - p_k^{(r)}(t)\big) \tag{51}$$

where $F_{ij}^{(r)}(t) = \log_2\big(\sigma W + \sum_{k \in \mathcal{J}\setminus\{j\}} p_k^{(r)}(t) h_{ik}(t)\big)$ and $G_{ik}^{(r)}(t) = \frac{h_{ik}(t)}{2^{F_{ij}^{(r)}(t)} \ln 2}$.

We can thus write the lower bound of $R_{ij}(t)$ as $R_{ij}(t) \ge \hat{R}_{ij}(t) - F_{ij}^{(r)}(t) - \sum_{k \in \mathcal{J}\setminus\{j\}} G_{ik}^{(r)}(t)\big(p_k(t) - p_k^{(r)}(t)\big)$.

In summary, for any local point $P^{(r)}(t)$, (50) can be approximated as (33). This completes the proof.

E. Proof of Lemma 2

Given a local point $(X^{(r)}(t), P^{(r)}(t))$, the obtained value of (31a) at the $(r+1)$-th iteration, denoted by $\Gamma(S^{(r+1)}(t), X^{(r)}(t), P^{(r)}(t))$, is no greater than $\Gamma(S^{(r)}(t), X^{(r)}(t), P^{(r)}(t))$ via optimizing (31). Given a point $(S^{(r+1)}(t), P^{(r)}(t))$, we have $\Gamma(S^{(r+1)}(t), X^{(r)}(t), P^{(r)}(t)) \ge \Gamma(S^{(r+1)}(t), X^{(r+1)}(t), P^{(r)}(t))$ due to the minimization of the upper-bounded problem of (45). Likewise, the inequality $\Gamma(S^{(r+1)}(t), X^{(r+1)}(t), P^{(r)}(t)) \ge \Gamma(S^{(r+1)}(t), X^{(r+1)}(t), P^{(r+1)}(t))$ can be obtained at $(S^{(r+1)}(t), X^{(r+1)}(t))$. Besides, $\Gamma(S^{(r)}(t), X^{(r)}(t), P^{(r)}(t))$ is bounded at each iteration. Therefore, Algorithm 1 is convergent.

Lemma 1 points out that $\Delta(t) - V\big(g(t) - \rho \sum_j p_j^{\rm tot}(t)\big)$ is upper-bounded at each time slot $t$. The time average of $L(t)$ then tends to zero when $t \to \infty$. Therefore, Algorithm 2 can make all virtual queues mean-rate stable. This completes the proof.

F. Proof of Lemma 3

According to the evolution model in (34), $N_a^1 = 0$. Therefore, we focus on deriving the closed-form expression of $N_a^q$ with $q > 1$. During the 2-nd time interval, $n = 0$ indicates that there are no new arrivals or $n_c$ new arrivals during the 1-st time interval; $n > 0$ means that the number of new arrivals in the 1-st time interval is $n + n_c$. Therefore, the probability mass function (PMF) of the accumulated packets, denoted as $f_{N_a^2}(n)$, can be expressed as

$$f_{N_a^2}(n) = \begin{cases} e^{-\vartheta_w^1} + \vartheta_w^1 e^{-\vartheta_w^1}, & n = 0 \\ \dfrac{(\vartheta_w^1)^{n+n_c} e^{-\vartheta_w^1}}{(n+n_c)!}, & n > 0 \end{cases} \tag{52}$$

Similarly, in the 3-rd time interval, three cases will result in $n = 0$: Case I, both the accumulated number of packets and the number of new arrivals in the 2-nd time interval are zero; Case II, the accumulated number of packets is $n_c$ and the number of new arrivals is zero in the 2-nd time interval; Case III, the accumulated number of packets is zero while the number of new arrivals is $n_c$ in the 2-nd time interval. $n > 0$ indicates that the sum of the accumulated number of packets and the number of new arrivals in the 2-nd time interval is $n + n_c$. Therefore, we can obtain the PMF of the accumulated number of packets, denoted by $f_{N_a^3}(n)$, in the 3-rd time interval as follows

$$f_{N_a^3}(n) = \begin{cases} e^{-\vartheta_w^2} f_{N_a^2}(0) + \vartheta_w^2 e^{-\vartheta_w^2} f_{N_a^2}(0) + e^{-\vartheta_w^2} f_{N_a^2}(n_c), & n = 0 \\ \sum\limits_{z=0}^{n+n_c} \dfrac{(\vartheta_w^2)^z}{z!} e^{-\vartheta_w^2} f_{N_a^2}(n + n_c - z), & n > 0 \end{cases} \tag{53}$$

Likewise, in the $q$-th time interval, we can obtain the PMF of the accumulated number of packets, denoted by $f_{N_a^q}(n)$, as follows

$$f_{N_a^q}(n) = \begin{cases} e^{-\vartheta_w^{q-1}} f_{N_a^{q-1}}(0) + \vartheta_w^{q-1} e^{-\vartheta_w^{q-1}} f_{N_a^{q-1}}(0) + e^{-\vartheta_w^{q-1}} f_{N_a^{q-1}}(n_c), & n = 0 \\ \sum\limits_{z=0}^{n+n_c} \dfrac{(\vartheta_w^{q-1})^z}{z!} e^{-\vartheta_w^{q-1}} f_{N_a^{q-1}}(n + n_c - z), & n > 0 \end{cases} \tag{54}$$

In (54), $f_{N_a^q}(n)$ depends on $f_{N_a^{q-1}}(n + n_c - z)$ in a sophisticated recursive way, which significantly hinders the theoretical derivation of the closed-form expression of $f_{N_a^q}(n)$. Moreover, the complexity of the theoretical derivation increases exponentially with $q$. To tackle this problem, we propose to derive an approximated expression of $f_{N_a^q}(n)$. As new packets arrive in a Poisson process, the packet departure can be considered as an approximated packet thinning of the arrived packets [37], [38]. After this packet thinning in a specific time interval, the number of accumulated packets in time $q$ ($q > 1$) can be approximated as a Poisson distribution [38]. Then, denote by $\vartheta_a^q$ the average number of accumulated packets in time $q$. In the 2-nd time interval, we can calculate $\vartheta_a^2$ as

$$\vartheta_a^2 \overset{(a)}{=} \sum_{n=1}^{\infty} (n - n_c) \frac{(\vartheta_w^1)^n e^{-\vartheta_w^1}}{n!} = \sum_{n=0}^{\infty} n \frac{(\vartheta_w^1)^n e^{-\vartheta_w^1}}{n!} - n_c\Big(\sum_{n=0}^{\infty} \frac{(\vartheta_w^1)^n e^{-\vartheta_w^1}}{n!} - e^{-\vartheta_w^1}\Big) \overset{(b)}{=} \Big[\vartheta_w^1 - n_c\big(1 - e^{-\vartheta_w^1}\big)\Big]^+ \tag{55}$$

where (a) holds because $\Pr\{N_a^2 = n - n_c\} = \Pr\{N_w^1 = n\}$ if $n \ne 0$, with $\Pr\{x\}$ denoting the probability of event $x$, and (b) holds because $\vartheta_a^2$ is non-negative at each time slot $t$.

In the 3-rd time interval, the intensity of accumulated data packets in the preprocessing queue can be derived as follows

$$\vartheta_a^3 = \sum_{n=1}^{\infty} (n - n_c) \sum_{z=0}^{n} \frac{(\vartheta_w^2)^z e^{-\vartheta_w^2}}{z!} \frac{(\vartheta_a^2)^{n-z} e^{-\vartheta_a^2}}{(n-z)!}$$
$$= \sum_{n=1}^{\infty} n \sum_{z=0}^{n} \frac{(\vartheta_w^2)^z e^{-\vartheta_w^2}}{z!} \frac{(\vartheta_a^2)^{n-z} e^{-\vartheta_a^2}}{(n-z)!} - n_c\Big(\sum_{n=0}^{\infty} \sum_{z=0}^{n} \frac{(\vartheta_w^2)^z e^{-\vartheta_w^2}}{z!} \frac{(\vartheta_a^2)^{n-z} e^{-\vartheta_a^2}}{(n-z)!} - e^{-\vartheta_w^2 - \vartheta_a^2}\Big)$$
$$= \Big[\vartheta_w^2 + \vartheta_a^2 - n_c\big(1 - e^{-\vartheta_w^2 - \vartheta_a^2}\big)\Big]^+ \tag{56}$$

When $q > 3$, since the accumulated packet evolution model of the preprocessing queue is similar to that at $q = 3$, we can extend the conclusion obtained at $q = 3$ to that at $q > 3$. Therefore, we can obtain the closed-form expression of $\vartheta_a^q$ at $q > 1$ with

$$\vartheta_a^q = \Big[\vartheta_w^{q-1} + \vartheta_a^{q-1} - n_c\big(1 - e^{-\vartheta_w^{q-1} - \vartheta_a^{q-1}}\big)\Big]^+ \tag{57}$$

Besides, at time $q$, packet $m$ has to wait until the completion of the preprocessing of the accumulated packets in the preprocessing queue according to the FCFS principle.
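The recursion in (57) is straightforward to iterate numerically. The sketch below does so starting from an empty queue in the first interval, as stated at the beginning of the proof. The function name, the list-based interface, and the example intensities are illustrative assumptions rather than part of the original derivation.

```python
import math


def accumulated_intensity(theta_w, n_c):
    """Iterate (57).

    theta_w[k] is the (assumed known) arrival intensity of interval k + 1.
    Returns the accumulated-packet intensities of intervals 1 .. len(theta_w) + 1.
    """
    theta_a = [0.0]  # interval 1: no accumulated packets
    for arrivals in theta_w:
        load = arrivals + theta_a[-1]
        # [x]^+ keeps the intensity non-negative, as in (55)-(57).
        theta_a.append(max(load - n_c * (1.0 - math.exp(-load)), 0.0))
    return theta_a


# Hypothetical numbers: constant arrival intensity 2.0, n_c = 1.
growing = accumulated_intensity([2.0, 2.0], 1.0)
# Hypothetical lightly loaded case: the positive-part operator clips the value to zero.
drained = accumulated_intensity([0.5], 10.0)
```

In the first example the queue intensity grows from one interval to the next because arrivals exceed what $n_c$ removes; in the second the bracket is negative and the intensity is clipped to zero.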
The preprocessing time of each packet is $Y^S_{i,m}$ due to the same packet size and preprocessing operation. Thus, the average packet queueing delay of packet $m$ is $\vartheta_a^q Y^S_{i,m}$. This completes the proof.

G. Proof of Proposition 4

Packet $m$ will be delivered to its receiving user $i$ if and only if a) $m$ has been processed by the BS; b) $\sum_j s_{ij}(t) = 1$ at time slot $t$. As there are $N$ users and $J$ UAVs and the goal of the communication problem is to provide fair content delivery for all users, the probability that any user can connect to a UAV is $J/N$ at each time slot. The delivery of a content file including $L/l$ packets from the BS or a UAV to its corresponding user should be completed in a time slot of duration $\Delta_t$. Further, one packet will be sent to user $i$ at a time, and $L/l$ packets will be sequentially delivered in duration $\Delta_t$. Then, we have $E[Y^A_{i,1}] = \frac{N l \Delta_t}{J L}$, and the expected edge arrival duration of the last packet of the content file is $\frac{N \Delta_t}{J}$. Thus, for any packet $m$ in the content file transmitted to user $i$, the expected edge arrival duration of packet $m$ is $\frac{N l \Delta_t}{J L} + \frac{N \Delta_t}{J}$. This completes the proof.

H. Proof of Lemma 4

By referring to (15), we can write the expectation of PAoI of packet $m$ sent to user $i$ as follows

$$\Delta_{i,m}(\hat{q}; q) = E[X_{i,m}] + E[Y_{i,m}] = E[X_{i,m}] + E[Y^Q_{i,m}] + E[Y^S_{i,m}] + E[Y^A_{i,m}] \tag{58}$$

Next, we discuss the value of $\Delta_{i,m}(\hat{q}; q)$ in the following two cases:

Case I: $q = 1$ and $m = 1$. The value of $\Pr\{M(q) = 0\}$ will become smaller as $q$ increases, where $M(q)$ denotes the number of newly arrived packets before time $q$. For example, according to (55), the average number of accumulated packets in the preprocessing queue is $[\vartheta_w^1 - n_c(1 - e^{-\vartheta_w^1})]^+$ when $q = 2$. Besides, the content server will generate packets required by a user after receiving its content request. This indicates that the probability that the content server generates packets destined to user $i$ during each time interval is equal to $\Pr_i$, $\forall i$. Then, the expected number of new packets is $\vartheta_w^1 + n_c e^{-\vartheta_w^1}$, and the number of packets sent to user $i$ is $M_i(q) = (\vartheta_w^1 + n_c e^{-\vartheta_w^1})\Pr_i$, where $M_i(q)$ is the expected number of packets destined to user $i$. We can envision that the expected number of generated packets sent to user $i$ will be greater than $(\vartheta_w^1 + n_c e^{-\vartheta_w^1})\Pr_i$ when $q > 2$. Recall that the content server generates packets according to a Poisson process with rate $\vartheta_w^q$ at time $q$ and sends them to the BS. The generated packets can be considered as belonging to $N$ different data streams for the $N$ terrestrial users, respectively. Each data stream $i$ is chosen independently at time $q$ with probability $\Pr_i$. This setup is equivalent to having $N$ independent Poisson sources with rates $\lambda_i^q = \vartheta_w^q \Pr_i$, $\forall i$, and $\vartheta_w^q = \lambda_1^q + \cdots + \lambda_N^q$ (see [39]). Thus, we have $\Pr\{N_i(q) = m\} = \frac{(\vartheta_w^q q \Pr_i)^m}{m!} e^{-\vartheta_w^q q \Pr_i}$ and $\Pr\{N_i(1) = 0\} = 1/e^{\vartheta_w^1 \Pr_i}$, which will be a small value. This indicates that the event that the content server does not generate the first packet for user $i$ in the 1-st time interval is a small probability event. We therefore consider the case of $q = 1$ and $m = 1$.
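The Poisson splitting argument above can be checked with a few lines of code. The sketch below computes the per-user rates $\lambda_i^q = \vartheta_w^q \Pr_i$ and the Poisson probability $\Pr\{N_i(q) = m\}$ as written in the text. The function names, the aggregate rate of 12 packets per interval, and the request probabilities are hypothetical values chosen only for illustration.

```python
import math


def split_rates(theta_w, request_probs):
    """Per-user Poisson rates lambda_i = theta_w * Pr_i; they sum back to theta_w."""
    return [theta_w * p for p in request_probs]


def prob_packets(m, theta_w, q, prob_i):
    """Pr{N_i(q) = m}: Poisson PMF with mean theta_w * q * Pr_i."""
    mean = theta_w * q * prob_i
    return mean ** m * math.exp(-mean) / math.factorial(m)


# Hypothetical example: 3 users with request probabilities 0.5, 0.25, 0.25.
rates = split_rates(12.0, [0.5, 0.25, 0.25])
# Probability that no packet for user 1 is generated in the first interval.
p_none = prob_packets(0, 12.0, 1, 0.5)
```

For these illustrative numbers, the probability of generating no packet for the first user in the first interval equals $e^{-6}$, which is well below one percent and is consistent with treating that event as a small-probability event.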
On the other hand, according to (34), the number of accumulated packets in the preprocessing queue is zero when $q = 1$. Therefore, the expected time for the first packet destined to user $i$ includes the preprocessing time and the expected edge arrival duration, i.e., we can obtain

$$E[\Delta_{i,1}(\hat{1}; 1)] = E[Y^S_{i,1}] + E[Y^A_{i,m}] = \frac{1}{n_c} + \frac{(l+L) N \Delta_t}{J L} \tag{59}$$

Case II: $q > 1$ and $m > 1$. As the inter-arrival time of packets destined to user $i$ from the content server follows an exponential distribution with intensity $\lambda_i^q$, we have $E[X_{i,m}] = 1/\lambda_i^q$. Besides, since each packet has the same size, the expected preprocessing time for each packet will be the same, i.e., $E[Y^S_{i,m}] = 1/n_c$. From Lemma 3, we know that the average packet queueing delay of packet $m$ is $\vartheta_a^q Y^S_{i,m}$. Thus, we have $E[Y^Q_{i,m}] = \vartheta_a^q / n_c$. Proposition 4 also shows that $E[Y^A_{i,m}] = \frac{(l+L) N \Delta_t}{J L}$, $\forall i, m$. Therefore, (37) is obtained. This completes the proof.

REFERENCES

[1] P. Yang, X. Xi, T. Q. S. Quek, J. Chen, X. Cao, and D. Wu, “Repeatedly energy-efficient and fair service coverage: UAV slicing,” in

GLOBECOM 2020 - 2020 IEEE Global Communications Conference. IEEE, 2020, pp. 1–6.
[2] Cisco, “Cisco visual networking index: global mobile data traffic forecast update, 2017–2022,” https://davidellis.ca/wp-content/uploads/2019/12/cisco-vni-mobile-data-traffic-feb-2019.pdf, 2019.
[3] J. Ji, K. Zhu, D. Niyato, and R. Wang, “Joint cache placement, flight trajectory and transmission power optimization for multi-UAV assisted wireless networks,” IEEE Transactions on Wireless Communications, vol. 19, no. 8, pp. 5389–5403, 2020.
[4] X. Cao, P. Yang, M. Alzenad, X. Xi, D. Wu, and H. Yanikomeroglu, “Airborne communication networks: A survey,” IEEE Journal on Selected Areas in Communications, vol. 36, no. 9, pp. 1907–1926, 2018.
[5] H. Wu, F. Lyu, C. Zhou, J. Chen, L. Wang, and X. Shen, “Optimal UAV caching and trajectory in aerial-assisted vehicular networks: A learning-based approach,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 12, pp. 2783–2797, 2020.
[6] M. Aloqaily, O. Bouachir, A. Boukerche, and I. Al Ridhawi, “Design guidelines for blockchain-assisted 5G-UAV networks,” IEEE Network, vol. 35, no. 1, pp. 64–71, 2021.
[7] J. Urama, R. Wiren, O. Galinina, J. Kauppi, K. Hiltunen, J. Erkkila, F. Chernogorov, P. Etelaaho, M. Heikkila, J. Torsner et al., “UAV-aided interference assessment for private 5G NR deployments: Challenges and solutions,” IEEE Communications Magazine, vol. 58, no. 8, pp. 89–95, 2020.
[8] R. Amorim, I. Z. Kovács, J. Wigard, T. B. Sorensen, and P. Mogensen, “Forecasting spectrum demand for UAVs served by dedicated allocation in cellular networks,” in . IEEE, 2019, pp. 1–6.
[9] R. de Silva and S. Rajasinghege, “Optimal desired trajectories of UAVs in private UAV networks,” in . IEEE, 2018, pp. 310–314.
[10] B. Jiang, J. Yang, H. Xu, H. Song, and G. Zheng, “Multimedia data throughput maximization in Internet-of-Things system based on optimization of cache-enabled UAV,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 3525–3532, 2018.
[11] L. Wang, H. Wu, Z. Han, P. Zhang, and H. V. Poor, “Multi-hop cooperative caching in social IoT using matching theory,” IEEE Transactions on Wireless Communications, vol. 17, no. 4, pp. 2127–2145, 2017.
[12] J. Ji, K. Zhu, D. Niyato, and R. Wang, “Joint trajectory design and resource allocation for secure transmission in cache-enabled UAV-relaying networks with D2D communications,” IEEE Internet of Things Journal, 2020, in Press. DOI: 10.1109/JIOT.2020.3013647.
[13] S. Chai and V. K. Lau, “Online trajectory and radio resource optimization of cache-enabled UAV wireless networks with content and energy recharging,” IEEE Transactions on Signal Processing, vol. 68, pp. 1286–1299, 2020.
[14] E. Kalantari, H. Yanikomeroglu, and A. Yongacoglu, “Wireless networks with cache-enabled and backhaul-limited aerial base stations,” IEEE Transactions on Wireless Communications, vol. 19, no. 11, pp. 7363–7376, 2020.
[15] H. Hu, K. Xiong, G. Qu, Q. Ni, P. Fan, and K. B. Letaief, “AoI-minimal trajectory planning and data collection in UAV-assisted wireless powered IoT networks,” IEEE Internet of Things Journal, vol. 8, no. 2, pp. 1211–1223, 2021.
[16] M. Samir, C. Assi, S. Sharafeddine, and A. Ghrayeb, “Online altitude control and scheduling policy for minimizing AoI in UAV-assisted IoT wireless networks,” IEEE Transactions on Mobile Computing, 2020, in Press. DOI: 10.1109/TMC.2020.3042925.
[17] S. Kaul, R. Yates, and M. Gruteser, “Real-time status: How often should one update?” in . IEEE, 2012, pp. 2731–2735.
[18] S. Zhang, H. Zhang, Z. Han, H. V. Poor, and L. Song, “Age of information in a cellular Internet of UAVs: Sensing and communication trade-off design,” IEEE Transactions on Wireless Communications, vol. 19, no. 10, pp. 6578–6592, 2020.
[19] P. Rost, C. Mannweiler, D. S. Michalopoulos, C. Sartori, V. Sciancalepore, N. Sastry, O. Holland, S. Tayade, B. Han, D. Bega et al., “Network slicing to enable scalability and flexibility in 5G mobile networks,” IEEE Communications Magazine, vol. 55, no. 5, pp. 72–79, 2017.
[20] P. Schneider, C. Mannweiler, and S. Kerboeuf, “Providing strong 5G mobile network slice isolation for highly sensitive third-party services,” in . IEEE, 2018, pp. 1–6.
[21] B. Aboba, L. Blunk, J. Vollbrecht, J. Carlson, H. Levkowetz et al.
IEEE Communications Surveys & Tutorials, vol. 19, no. 3, pp. 1657–1681, 2017.
[25] M. Ma and V. W. Wong, “Age of information driven cache content update scheduling for dynamic contents in heterogeneous networks,” IEEE Transactions on Wireless Communications, vol. 19, no. 2, pp. 8427–8441, 2020.
[26] M. Chen, M. Mozaffari, W. Saad, C. Yin, M. Debbah, and C. S. Hong, “Caching in the sky: Proactive deployment of cache-enabled unmanned aerial vehicles for optimized quality-of-experience,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 5, pp. 1046–1061, 2017.
[27] A. Al-Hourani, S. Kandeepan, and S. Lardner, “Optimal LAP altitude for maximum coverage,” IEEE Wireless Communications Letters, vol. 3, no. 6, pp. 569–572, 2014.
[28] K. Mitra, A. Zaslavsky, and C. Åhlund, “Context-aware QoE modelling, measurement, and prediction in mobile computing systems,” IEEE Transactions on Mobile Computing, vol. 14, no. 5, pp. 920–936, 2013.
[29] X. Xi, X. Cao, P. Yang, J. Chen, T. Q. Quek, and D. Wu, “Network resource allocation for eMBB payload and URLLC control information communication multiplexing in a multi-UAV relay network,” IEEE Transactions on Communications, 2020, in Press. DOI: 10.1109/TCOMM.2020.3042970.
[30] C. Xu, H. H. Yang, X. Wang, and T. Q. Quek, “Optimizing information freshness in computing-enabled IoT networks,” IEEE Internet of Things Journal, vol. 7, no. 2, pp. 971–985, 2019.
[31] R. Talak, S. Karaman, and E. Modiano, “Optimizing information freshness in wireless networks under general interference constraints,” IEEE/ACM Transactions on Networking, vol. 28, no. 1, pp. 15–28, 2019.
[32] M. J. Neely, “A Lyapunov optimization approach to repeated stochastic games,” in . IEEE, 2013, pp. 1082–1089.
[33] C. J. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[34] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” Computer Science, vol. 8, no. 6, p. A187, 2015.
[35] G. Scutari, F. Facchinei, and L. Lampariello, “Parallel and distributed methods for constrained nonconvex optimization-part I: Theory,” IEEE Transactions on Signal Processing, vol. 65, no. 8, pp. 1929–1944, 2016.
[36] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[37] N. Hohn and D. Veitch, “Inverting sampled traffic,” IEEE/ACM Transactions on Networking, vol. 14, no. 1, pp. 68–80, 2006.
[38] N. Jiang, Y. Deng, X. Kang, and A. Nallanathan, “Random access analysis for massive IoT networks under a new spatio-temporal model: A stochastic geometry approach,” IEEE Transactions on Communications, vol. 66, no. 11, pp. 5788–5803, 2018.
[39] S. M. Ross, J. J. Kelly, R. J. Sullivan, W. J. Perry, D. Mercer, R. M. Davis, T. D. Washburn, E. V. Sager, J. B. Boyce, and V. L. Bristow,