Resource Reservation in Backhaul and Radio Access Network with Uncertain User Demands

Abstract

Resource reservation is an essential step to enable wireless data networks to support a wide range of user demands. In this paper, we consider the problem of joint resource reservation in the backhaul and Radio Access Network (RAN) based on the statistics of user demands and channel states, and also network availability. The goal is to maximize the sum of expected traffic flow rates, subject to link and access point budget constraints, while minimizing the expected outage of downlinks. The formulated problem turns out to be non-convex and difficult to solve to global optimality. We propose an efficient Block Coordinate Descent (BCD) algorithm to approximately solve the problem. The proposed BCD algorithm optimizes the link capacity reservation in the backhaul using a novel multipath routing algorithm that decomposes the problem down to link-level and parallelizes the computation across backhaul links, while the reservation of transmission resources in RAN is carried out via a novel scalable and distributed algorithm based on Block Successive Upper-bound Minimization (BSUM). We prove that the proposed BCD algorithm converges to a Karush-Kuhn-Tucker solution. Simulation results verify the efficiency and the efficacy of our BCD approach against two heuristic algorithms.


Navid Reyhanian, Hamid Farmanbar, and Zhi-Quan Luo, Fellow, IEEE


Index Terms: Resource reservation, multi-path routing, traffic maximization, outage minimization, parallel computation.

I. INTRODUCTION

Resource reservation is an important step in network planning and management because of its significant effect on user quality of service. For wireless data networks operating in random and dynamic environments, finding resource reservation protocols that remain robust under uncertain user demands is challenging. Resource reservation, which balances network performance against hardware costs, involves traffic forecasting and resource allocation for the predicted traffic [2]–[4]. Resource reservation in the backhaul and Radio Access Network (RAN) should satisfy a wide range of applicable traffic demands. In particular, both the link capacity in the backhaul and the transmission resources in RAN should be sliced and reserved for users such that, upon the arrival of a new demand, the network is able to support it.

N. Reyhanian is with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA. H. Farmanbar is with Huawei Canada Research Center, Ottawa, Canada. Z.-Q. Luo is with Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China. This paper was presented in part at the IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Atlanta, GA, USA, May 26–29, 2020 [1].

Resource reservation for uncertain demand was first studied by Gomory and Hu in [5], which reserved link capacities using a single-commodity routing problem with a finite number of sources. For communication networks, where both link budget and node budget are to be reserved, different approaches have been proposed for resource reservation. In traffic-oblivious approaches, user demand and its statistics are not considered in the problem formulation when making reservations and slicing the network resources [6]–[8]. The drawback of traffic-oblivious approaches is that they limit the ability of a network to adapt to any given demand. To reserve link capacities in flow networks, a collection of predicted demand scenarios is considered in [9], [10]. The algorithms proposed in [9], [10] reserve link capacities such that the predicted demand scenarios are supported as much as possible. The accuracy of the reservations in [9], [10] depends on the number of predicted scenarios. However, as the number of scenarios increases, the complexity of solving the problem increases. Short-term user demands are predicted by Long Short-Term Memory (LSTM) neural networks in [11]–[13]. Recurring resource reservations based on short-term traffic variations incur reconfiguration costs, service interruptions, and overhead in networks [14]. The mean of user demands is used in [15] to balance the workload among a set of data centers in a network that consists of the backhaul and RAN such that the utilization of resources is maximized.
The joint reservation of computational and radio resources is studied in [16], where different ranges are considered for uncertain user demands. A linear program is formulated in [16] to support the uncertain user demands, which vary in given ranges, as much as the network allows. In [17], the transmission resource reservation in RAN is considered where the minimum requirements of users are known and deterministic. The authors of [17] proposed a matching-based algorithm to solve an optimization problem with the goal of minimizing the consumption of network resources while meeting the requirements of users.

Optimal routing has been studied widely for many settings, e.g., [18]–[20], while optimal resource allocation in RAN has also been studied for different wireless channels, e.g., [12], [17], [21]–[26]. The joint routing in the backhaul network and resource allocation in RAN is studied in a number of more recent papers [27]–[34]. In [27] and [31], the user demand requirements are deterministic and known. On the other hand, in [28], [30], [32]–[34], the traffic of users is maximized as much as the network is able to support, regardless of user demand statistics. To find a robust resource reservation, network resources should be reserved based on demand statistics. In [28]–[30] and [32]–[34], the wireless channel capacity is a deterministic function of input power. Moreover, the convexity of the problem is assumed in [28]–[30], [33]. Neither of these assumptions holds in practice, where the wireless channel capacity is random and its distribution is a function of the supplied transmission resources [35]–[37].

In addition to the different formulations proposed in the existing literature for resource allocation and network planning with certain and uncertain user demands, several algorithms have been used to solve the resulting optimization problems. Among them, the Alternating Direction Method of Multipliers (ADMM) has been used widely [10], [27], [38]–[40].
ADMM enables flow decoupling in the network optimization process. The efficiency of ADMM depends on the number of auxiliary link variables introduced to make the optimization subproblems separable. For networks with a large number of links, ADMM can be slow, i.e., it can require a large number of iterations. A dual decomposition method for path-based routing is used in [41], where a gradient ascent approach has been proposed to solve the dual problem. Since in most problems the dual function is non-smooth, the gradient ascent approach has to take small steps, resulting in slow convergence. A distributed approach for large-scale revenue management problems in airline networks is proposed by Kemmer et al. in [42]. The single-path dynamic programming approach in [42] has shown great success in practice despite the absence of convergence or solution enhancement guarantees.

In this paper, we propose a resource management scheme for end-to-end resource reservation, i.e., from data centers to users, based on user demand and downlink achievable rate statistics for a data network consisting of the backhaul and RAN. We consider multi-path routing in our formulation, where a user can be served by several Access Points (APs) through multiple paths from a data center. We formulate the problem of jointly reserving the transmission resources in RAN and link capacities in the backhaul based on user demand and downlink achievable rate statistics so as to maximize the total expected supportable user traffic, while minimizing the expected outage of downlinks. Since the formulated problem is non-convex and hard to solve, we propose an efficient Block Coordinate Descent (BCD) algorithm, which converges to a Karush–Kuhn–Tucker (KKT) solution of the resource reservation problem.

In the proposed BCD approach, one block of variables determines the link capacity reservation in the backhaul and the other block of variables specifies the transmission resource reservation in RAN.
We alternately optimize the two blocks of variables in the BCD algorithm. Fixing the transmission resources in RAN, we update the link capacity reservation in the backhaul via a novel multi-path routing algorithm. Inspired by the resource level decomposition ideas in [42], the proposed multi-path routing decomposes the problem down to link-level and parallelizes the computation across backhaul links. Based on the convergence theory for Block Successive Upper-bound Minimization (BSUM) methods in [43], we prove that the proposed multi-path routing converges to the global minimum of an arbitrary convex cost function with Lipschitz continuous gradient. The computation time required for each iteration of the proposed multi-path routing is equal to that for one link, regardless of the network size. After updating the link capacity reservations, we update the transmission resource reservation in RAN. Since the resource reservation problem in RAN is possibly non-convex, we propose a distributed algorithm based on the BSUM techniques to iteratively solve a sequence of convex approximations of the original problem. We prove that the proposed BCD algorithm converges to a KKT solution. To verify the performance of the proposed algorithm, two heuristic algorithms are also developed and used as benchmarks to evaluate the efficiency and the efficacy of the proposed approach via simulations.

Figure 1: A network comprised of APs and backhaul parts.

The rest of this paper is organized as follows. The system model and problem formulation are given in Section II. Section III describes a general scalable and distributed algorithm for multi-path flow routing. In Section IV, we propose a BCD algorithm for the network resource reservation problem. The simulation results are given in Section V, and concluding remarks are given in Section VI.

II. SYSTEM MODEL AND PROBLEM FORMULATION

Consider a typical scenario whereby user data is transmitted via backhaul network links from data centers to APs in RAN, which in turn relay the data to the desired users as depicted in Fig. 1. Suppose $\mathcal{B}$ denotes the set of APs and $\mathcal{K}$ denotes the set of mobile users. The set of directed wired links of the backhaul is denoted by $\mathcal{L}$. A path connects a data center and an AP through a sequence of wired links in the backhaul and finally goes through one downlink to reach the end user. The downlinks between APs and users are predetermined according to channel quality, interference levels, and path loss.

We assume that each user demands one commodity, so there are $K = |\mathcal{K}|$ datastreams in the backhaul network. The proposed scheme can be easily extended to the scenario in which each user demands multiple commodities. To serve each user, several candidate paths are selected between the origin and destination, and traffic reservation for the corresponding commodities is implemented over those paths. The candidate paths can go through different APs, and the joint transmission of APs to a user (coordinated multi-point mode) is considered in this paper. Only the last hop on each path is wireless.

Each path is denoted by $p$, and the set of all paths is represented by $\mathcal{P}$. The set of paths that carry the data of user $k$ is denoted by $\mathcal{P}_k$. The backhaul network links comprising path $p$ for serving user $k$ are represented by the set $\mathcal{L}_k^p$. Similarly, the network nodes on path $p \in \mathcal{P}_k$ are denoted by the set $\mathcal{U}_k^p$. The demand of user $k$ is a random variable represented by $d_k$. It follows a certain Probability Density Function (PDF) denoted by $f_k(d_k)$. The corresponding Cumulative Distribution Function (CDF) is represented by $F_k(d_k)$. Let $r_k$ denote the traffic rate reserved for user $k$. The actual traffic flow of user $k$ supported by the network is a random variable given by

$$\min(d_k, r_k) = \begin{cases} r_k, & \text{if } r_k \le d_k, \\ d_k, & \text{otherwise}. \end{cases}$$
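The supported traffic $\min(d_k, r_k)$ defined above can be illustrated numerically. The following is a minimal sketch, not part of the paper: it assumes an exponentially distributed demand (the paper allows an arbitrary PDF $f_k$), and the function names `supported_traffic`, `mean_supported_traffic`, and `exponential_closed_form` are hypothetical. For this assumed distribution with mean $m$, integrating the tail probability gives $m(1 - e^{-r_k/m})$, which the Monte Carlo estimate should approach.

```python
import math
import random


def supported_traffic(demand, reserved):
    """Traffic actually carried for one user: min(d_k, r_k)."""
    return min(demand, reserved)


def mean_supported_traffic(reserved, mean_demand, samples=100_000, seed=0):
    """Monte Carlo estimate of E[min(d_k, r_k)].

    Assumption (not from the paper): d_k is exponential with the given mean.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        demand = rng.expovariate(1.0 / mean_demand)
        total += supported_traffic(demand, reserved)
    return total / samples


def exponential_closed_form(reserved, mean_demand):
    """E[min(d_k, r_k)] for exponential demand: m * (1 - exp(-r / m))."""
    return mean_demand * (1.0 - math.exp(-reserved / mean_demand))


if __name__ == "__main__":
    r_k, m = 2.0, 1.0
    print("Monte Carlo :", mean_supported_traffic(r_k, m))
    print("Closed form :", exponential_closed_form(r_k, m))
```

Raising the reserved rate increases the expected supported traffic with diminishing returns, since large demands become increasingly unlikely.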
We calculate the expected supportable traffic rate for user $k$ as follows:

$$\mathbb{E}\big(\min(d_k, r_k)\big) = \int_0^{r_k} y_k f_k(y_k)\, dy_k + r_k \int_{r_k}^{\infty} f_k(y_k)\, dy_k.$$

Since the network is not able to support the demand when it exceeds the reserved rate, we have the minimum in the above expectation. In the first integral, the random demand of user $k$ falls below the reserved rate. In the second integral, the random demand exceeds $r_k$.

Since a user receives their data from multiple APs, transmission resources should be reserved in multiple APs for the paths available to the user. The resource reservation in the backhaul and RAN is limited by two physical constraints:

• The aggregate reserved traffic rate for paths that share a link must not exceed the link capacity. Therefore, we have the following constraint:

$$\sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p \le C_l, \quad \forall l \in \mathcal{L}, \tag{1}$$

where $r_k^p$ is the reserved traffic rate for path $p$ (for serving user $k$) and $C_l$ denotes the capacity of link $l$. Flows on different paths available to one user are treated as separate flows. Thus, we have the inner summation in the above constraint.

• The total reserved transmission resources for different paths must not exceed the AP capacity. Hence, we have

$$\sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, b \in \mathcal{U}_k^p\}} t_k^p \le C_b, \quad \forall b \in \mathcal{B}, \tag{2}$$

where $t_k^p$ is the transmission resource reserved in AP $b$ to transmit incoming data from path $p \in \mathcal{P}_k$ to user $k$, and $C_b$ denotes the capacity of AP $b$.

In addition to the above physical constraints, our multi-path model enforces another constraint. Since each datastream originating from a data center splits into a number of sub-flows, we have the following constraint:

• The aggregate reserved traffic rate for the different paths that carry data to one user is equal to the reserved rate for that user. Hence, we have the following constraint:

$$\sum_{p \in \mathcal{P}_k} r_k^p = r_k, \quad \forall k. \tag{3}$$

In the considered model, we do not make any assumption about the type of the transmission resource. It can be bandwidth, transmission power, or time-slot fraction. Based on the allocated resources, the distribution of the achievable rate of a downlink follows a particular PDF. As only the last hop on each path is wireless, path $p$ uniquely identifies the downlink of the last hop. The achievable rate (i.e., instantaneous capacity) of the downlink of path $p$ is random and follows an arbitrary distribution with a PDF represented by $z_k^p(v_k^p, t_k^p)$ and a CDF denoted by $Z_k^p(v_k^p, t_k^p)$. The PDF is a function of two variables: the achievable rate of the downlink, denoted by $v_k^p$, and the allocated transmission resource, denoted by $t_k^p$. When the achievable rate of a downlink falls below the reserved rate $r_k^p$, some outage is experienced and its amount is $r_k^p - v_k^p$, given that the amount of transmission resources allocated to the downlink is $t_k^p$. The probability density of this amount of outage is $z_k^p(v_k^p, t_k^p)$. In light of the above arguments, the expected outage of the downlink of path $p$ is obtained as follows:

$$\int_0^{r_k^p} z_k^p(v_k^p, t_k^p)\,(r_k^p - v_k^p)\, dv_k^p. \tag{4}$$

Since the achievable rate is a continuous random variable, we have the above integral.

In this paper, we aim to maximize the expected traffic of users as much as the network is able to support, while minimizing the expected outage of downlinks. We formulate the following optimization problem to find resource reservations in the backhaul and RAN:

$$\begin{aligned} \max_{\mathbf{r},\, \mathbf{t}} \quad & \sum_{k=1}^{K} \Big[ \mathbb{E}[\min(r_k, d_k)] - \theta_k \sum_{p \in \mathcal{P}_k} \int_0^{r_k^p} z_k^p(v_k^p, t_k^p)\,(r_k^p - v_k^p)\, dv_k^p \Big] \\ \text{s.t.} \quad & (1),\ (2),\ (3), \quad r_k,\ r_k^p,\ t_k^p \ge 0, \quad p \in \mathcal{P}_k,\ \forall k, \end{aligned} \tag{5}$$

where $\theta_k \ge 0$ is a coefficient chosen by the system designer that adjusts the priorities of maximizing the expected supportable traffic of user $k$ and minimizing the aggregate outage of the downlinks that serve user $k$.
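As a rough illustration of how the expected outage and the per-user objective above behave, the sketch below evaluates them for one user with a single path. It is not taken from the paper: it assumes that the achievable rate is exponential with mean `gain * t` and that the demand is exponential as well, whereas the paper leaves both distributions arbitrary. The names `expected_outage`, `expected_supported`, and `objective` are hypothetical. Under these assumptions the outage integral has the closed form $r - m(1 - e^{-r/m})$ with $m = \text{gain} \cdot t$, which the midpoint rule should reproduce.

```python
import math


def exp_pdf(x, mean):
    """PDF of an exponential random variable with the given mean."""
    return math.exp(-x / mean) / mean


def expected_outage(r, t, gain=1.0, steps=20_000):
    """Midpoint-rule estimate of int_0^r z(v, t) (r - v) dv.

    Assumption (not from the paper): the achievable rate v is exponential
    with mean gain * t, so more transmission resource raises the mean rate.
    """
    if r <= 0.0:
        return 0.0
    mean_rate = gain * t
    h = r / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += exp_pdf(v, mean_rate) * (r - v)
    return total * h


def expected_supported(r, mean_demand):
    """E[min(d, r)] for an (assumed) exponential demand."""
    return mean_demand * (1.0 - math.exp(-r / mean_demand))


def objective(r, t, mean_demand, theta, gain=1.0):
    """Single-user, single-path objective: supported traffic minus weighted outage."""
    return expected_supported(r, mean_demand) - theta * expected_outage(r, t, gain)


if __name__ == "__main__":
    for t in (1.0, 2.0, 4.0):
        print(t, expected_outage(1.0, t), objective(1.0, t, mean_demand=1.0, theta=0.5))
```

In this toy setting, increasing $t$ lowers the expected outage for a fixed reserved rate, while a larger $\theta_k$ makes aggressive reservations less attractive.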
The two blocksof variables in the above problem are r = { r k , r pk } p ∈P k ,k =1: K and t = { t pk } p ∈P k ,k =1: K . Remark 1.

Suppose that multiple paths available to user k share a downlink (the last hop). The aggregate outage ofdownlinks for serving user k is calculated as follows: (cid:88) w ∈W k (cid:90) (cid:80) p : { p ∈P k,w ∈ p } r pk z wk ( v wk , t wk ) × ( (cid:88) p : { p ∈P k ,w ∈ p } r pk − v wk ) dv wk , (6) where W k is the set of downlinks, each denoted by w , forserving user k . When multiple paths available to user k share adownlink, the above outage is placed in the objective functionof (5) instead of its second term, which includes (4) . The maximization problem (5) is not easy to solve to globaloptimality. The objective function of (5) is in general notnecessarily jointly concave in r and t for an arbitrary PDF z pk ( v pk , t pk ) . The reason is that (cid:82) r pk ∂ z pk ( v pk , t pk ) / ( ∂t pk ) ( r pk − v pk ) dv pk is not always non-negative. Proposition 1.

Given t , the optimization in (5) becomesconcave in r .Proof. Fixing t , the objective function is separable in k . Weﬁnd the Hessian with respect to r k and { r pk } p ∈P k for thoseobjective function terms which are associated with user k asfollows: H k =  − f k ( r k ) 0 . . . − z k ( r k , t k ) . . . ... ... . . . ... . . . − z |P k | k ( r |P k | k , t |P k | k )  . The overall Hessian matrix is H =  H . . . K  . It is observed that the above matrix is negative semideﬁnite.Since the constraints of problem (5) are all afﬁne, it followsthat the maximization (5) is concave with ﬁxed t . (cid:4) Separable constraints on r and t in (5) motivate the BCDalgorithm. It is straightforward to show that with (6) instead of(4) in the objective function, the optimization in (5) remainsconcave in r .III. D ISTRIBUTED M ULTI -P ATH R OUTING IN THE B ACKHAUL

This section is concerned with solving (5) when $\mathbf{t}$ is kept fixed, and (5) is converted to the minimization format after multiplying the objective function by $-1$. In particular, we study a general multi-path routing problem to minimize any convex cost function with a Lipschitz continuous gradient. We develop an algorithm that is dual-based, decomposes the problem down to link-level, and parallelizes computations across the links of the network. The computation time required for each iteration of the proposed multi-path routing algorithm is equal to that for one link, regardless of the network size. This property makes the proposed algorithm appropriate for the online optimization of large networks.

For each datastream in the network, several candidate paths are selected. We assume that each flow can be split into multiple sub-flows. To formulate the multi-path routing problem, we first assume that the cost function is separable in variables, i.e., $\psi(\mathbf{r}) = \sum_{k=1}^{K} \sum_{p \in \mathcal{P}_k} \psi_k^p(r_k^p)$, where each $\psi_k^p(r_k^p)$ is strictly convex.

The optimization problem for multi-path flow routing can be written as follows:

$$\min_{\mathbf{r}} \ \sum_{k=1}^{K} \sum_{p \in \mathcal{P}_k} \psi_k^p(r_k^p) \quad \text{s.t.} \quad (1), \quad r_k^p \ge 0, \quad p \in \mathcal{P}_k,\ \forall k. \tag{7}$$

Since the number of variables is typically greater than the number of constraints in the above optimization, solving the problem is easier in the dual domain. The Lagrangian function for the above problem is

$$L_c(\mathbf{r}, \boldsymbol{\mu}, \boldsymbol{\phi}) = \sum_{k=1}^{K} \sum_{p \in \mathcal{P}_k} \psi_k^p(r_k^p) + \sum_{l \in \mathcal{L}} \mu_l \Big( \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p - C_l \Big) - \sum_{k=1}^{K} \sum_{p \in \mathcal{P}_k} \phi_k^p r_k^p, \tag{8}$$

where $\mu_l \ge 0$ is the Lagrange multiplier for the capacity constraint of link $l$, and $\phi_k^p \ge 0$ is the Lagrange multiplier for the constraint $r_k^p \ge 0$. Furthermore, $\boldsymbol{\mu} = \{\mu_l\}_{l \in \mathcal{L}}$ and $\boldsymbol{\phi} = \{\phi_k^p\}_{p \in \mathcal{P}_k,\, k=1:K}$. We find the dual problem of (7) as follows:

$$\max_{\boldsymbol{\mu},\, \boldsymbol{\phi}} \ \min_{\mathbf{r}} \ L_c(\mathbf{r}, \boldsymbol{\mu}, \boldsymbol{\phi}) \quad \text{s.t.} \quad \boldsymbol{\mu} \ge 0, \quad \boldsymbol{\phi} \ge 0. \tag{9}$$

For many cost functions, no closed-form solution for $\mathbf{r} = \arg\min_{\mathbf{r}} L_c(\mathbf{r}, \boldsymbol{\mu}, \boldsymbol{\phi})$ exists. Therefore, the above problem is commonly solved via a primal-dual method such as ADMM [10], [38]–[40]. However, the auxiliary link variables introduced to make the per-flow subproblems of the optimization in (7) separable can slow down ADMM in practice.

Resource level decomposition for large-scale single-path applications was first proposed in [42] to solve the revenue management problems in airline networks. The decomposition proposed in [42] does not involve any auxiliary variables. In spite of the absence of convergence or solution enhancement guarantees, the resource level decomposition has been rather successful in practice. We leverage resource level decomposition ideas to develop a distributed algorithm that solves the general multi-path routing problem (7) in a parallel fashion, such that the traffic passing on each link can be obtained independently from the other links. Unlike the dynamic programming approach in [42], an optimization-based approach is proposed here to solve the subproblems. In each iteration, the proposed dual algorithm decomposes the problem in (9) and solves the subproblems globally and in parallel. The optimized $\mu_l$ in the $j$th iteration of the proposed algorithm is denoted by $\mu_l^j$.

Here, we explain the decomposition. The dualized link capacity constraints $\sum_{l \in \mathcal{L}} \mu_l \big( \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p - C_l \big)$ in the Lagrangian (8) are separable across links. Each link $l$ receives $\mu_l \big( \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p - C_l \big)$. In each iteration, based on $\boldsymbol{\mu}^{j-1} = \{\mu_l^{j-1}\}_{l \in \mathcal{L}}$ from the previous iteration, we decompose the non-separable terms in the Lagrangian (8), which include $r_k^p$, across the links on path $p$.
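To make the Lagrangian and the dual problem above concrete, the following sketch evaluates the dual function on a toy network. It is an illustration under stated assumptions rather than the paper's method: the cost is taken to be the quadratic $\psi_k^p(r) = \tfrac{1}{2}(r - a)^2$, a special case in which the inner minimization does have a closed form, and the constraint $r \ge 0$ is kept explicitly in the inner minimization instead of being dualized through $\boldsymbol{\phi}$. The topology, the targets `a`, and the function names are all hypothetical. The example checks weak duality: for any $\boldsymbol{\mu} \ge 0$, the dual value does not exceed the cost of any feasible reservation.

```python
def dual_value(paths, targets, capacity, mu):
    """Dual function of the routing problem for psi(r) = 0.5 * (r - a)^2.

    paths[i] lists the link indices used by path i; mu[l] >= 0 is the price
    of link l. Returns the dual value and the minimizing rates.
    """
    value = -sum(mu[l] * capacity[l] for l in range(len(capacity)))
    rates = []
    for path, a in zip(paths, targets):
        price = sum(mu[l] for l in path)   # total multiplier along the path
        r = max(0.0, a - price)            # argmin over r >= 0 of psi(r) + price * r
        rates.append(r)
        value += 0.5 * (r - a) ** 2 + price * r
    return value, rates


def primal_cost(rates, targets):
    """Separable quadratic cost of a reservation."""
    return sum(0.5 * (r - a) ** 2 for r, a in zip(rates, targets))


def is_feasible(paths, rates, capacity):
    """Check the link capacity constraints and non-negativity."""
    load = [0.0] * len(capacity)
    for path, r in zip(paths, rates):
        if r < 0.0:
            return False
        for l in path:
            load[l] += r
    return all(x <= c + 1e-12 for x, c in zip(load, capacity))


if __name__ == "__main__":
    paths = [[0], [0, 1], [1]]      # hypothetical topology: two links, three paths
    targets = [1.0, 1.0, 1.5]
    capacity = [1.0, 2.0]
    feasible = [0.5, 0.5, 1.0]
    g, _ = dual_value(paths, targets, capacity, [0.3, 0.1])
    print("dual value:", g, "<= primal cost:", primal_cost(feasible, targets))
```

Maximizing this dual value over $\boldsymbol{\mu} \ge 0$ is what couples the links: a path's rate depends on the sum of multipliers along it, which is the non-separability that the per-link decomposition addresses.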
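Each link $l$ of path $p$ receives a portion of

$$\alpha_{k,l}^{p,j} = \mu_l^{j-1} \Big/ \sum_{l' \in \mathcal{L}_k^p} \mu_{l'}^{j-1}. \tag{10}$$

In the $j$th iteration, the decomposed per-link Lagrangian function is as follows:

$$L_l(\mathbf{r}_l, \mu_l, \boldsymbol{\phi}_l, \boldsymbol{\mu}^{j-1}) = \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} \alpha_{k,l}^{p,j}\, \psi_k^p(r_k^p) + \mu_l \Big( \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p - C_l \Big) - \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} \alpha_{k,l}^{p,j}\, \phi_k^p r_k^p, \tag{11}$$

where $\mathbf{r}_l = \{r_k^p\}_{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p,\, k=1:K}$ and $\boldsymbol{\phi}_l = \{\phi_k^p\}_{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p,\, k=1:K}$. We notice that, based on (10), $\{\alpha_{k,l}^{p,j}\}_{l \in \mathcal{L}_k^p}$ in (11) is calculated using $\boldsymbol{\mu}^{j-1}$. Based on the above decomposition, we obtain

$$L_c(\mathbf{r}, \boldsymbol{\mu}, \boldsymbol{\phi}) = \sum_{l \in \mathcal{L}} L_l(\mathbf{r}_l, \mu_l, \boldsymbol{\phi}_l, \boldsymbol{\mu}^{j-1}). \tag{12}$$

Instead of solving the problem in (9), we solve

$$\max_{\{\mu_l,\, \boldsymbol{\phi}_l\}_{l \in \mathcal{L}}} \ \sum_{l \in \mathcal{L}} \min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \mu_l, \boldsymbol{\phi}_l, \boldsymbol{\mu}^{j-1}) \quad \text{s.t.} \quad \mu_l \ge 0, \quad \boldsymbol{\phi}_l \ge 0, \quad l \in \mathcal{L}, \tag{13}$$

iteratively and then update $\alpha_{k,l}^{p,j+1}$ for iteration $j+1$. The above problem is decomposable in $\{\mu_l, \boldsymbol{\phi}_l\}$ and can be solved in parallel for all links. Due to strong duality [44, pp. 226–227], each subproblem of (13) is equivalent to the following per-link problem in the primal domain:

$$\begin{aligned} \min_{\mathbf{r}_l} \quad & \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} \alpha_{k,l}^{p,j}\, \psi_k^p(r_k^p) \\ \text{s.t.} \quad & \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p \le C_l, \\ & \alpha_{k,l}^{p,j}\, r_k^p \ge 0, \quad p \in \mathcal{P}_k,\ \forall k,\ l \in \mathcal{L}_k^p. \end{aligned} \tag{14}$$

The optimal $r_k^p$ and $\mu_l$ can be obtained using the first-order optimality conditions for the per-link subproblem in (14). Here, we list the KKT conditions as follows:

$$\frac{\partial L_l(\mathbf{r}_l, \mu_l, \boldsymbol{\phi}_l, \boldsymbol{\mu}^{j-1})}{\partial r_k^p} = \alpha_{k,l}^{p,j} \frac{\partial \psi_k^p(r_k^p)}{\partial r_k^p} + \mu_l - \alpha_{k,l}^{p,j}\, \phi_k^p = 0, \tag{15a}$$

$$\sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p \le C_l, \tag{15b}$$

$$\mu_l \Big( \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p - C_l \Big) = 0, \quad \mu_l \ge 0, \tag{15c}$$

$$\alpha_{k,l}^{p,j}\, r_k^p\, \phi_k^p = 0, \quad r_k^p \ge 0, \quad \phi_k^p \ge 0. \tag{15d}$$

First, we consider the case $r_k^p > 0$ and $\phi_k^p = 0$. Due to the strict convexity of $\psi_k^p(r_k^p)$, $\partial \psi_k^p(r_k^p) / \partial r_k^p$ is strictly increasing.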
Thus, given $\mu_l$, there is a unique $r_k^p$ that solves $\alpha_{k,l}^{p,j}\, \partial \psi_k^p(r_k^p) / \partial r_k^p + \mu_l = 0$. Since $\partial \psi_k^p(r_k^p) / \partial r_k^p$ is strictly increasing, we implement a bisection search on $r_k^p$ in the non-negative orthant $r_k^p \ge 0$ to find $r_k^p$ from $\alpha_{k,l}^{p,j}\, \partial \psi_k^p(r_k^p) / \partial r_k^p + \mu_l = 0$. If the obtained $r_k^p$ is positive, we keep $\phi_k^p = 0$. Otherwise, we set $r_k^p = 0$ and find $\phi_k^p = \partial \psi_k^p(r_k^p) / \partial r_k^p \big|_{r_k^p = 0} + \mu_l / \alpha_{k,l}^{p,j}$. For a given $\mu_l$, we obtain each $r_k^p$ variable associated with link $l$, i.e., $r_k^p : p \in \mathcal{P}_k,\ l \in \mathcal{L}_k^p,\ k = 1:K$. The dual approach for solving the optimization in (14) works as follows: implement a bisection search on the Lagrange multiplier $\mu_l$ in the positive orthant and numerically find each $r_k^p$ variable from (15a) and (15d) for each $\mu_l$ until we have $\sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p = C_l$. If there is no such positive $\mu_l$, we drop the first constraint from optimization (14) and solve (14) by setting the gradient of the cost function to zero. Then, we project the solution to the positive orthant.

Due to the strict convexity of each subproblem, the optimal primal variables are unique. The optimal $\mu_l$ for each per-link subproblem is also unique. We justify this claim. If the link capacity constraint is not tight, then due to (15c), $\mu_l$ has to be zero. If the link capacity constraint is tight, then at least one $r_k^p : p \in \mathcal{P}_k,\ l \in \mathcal{L}_k^p,\ k = 1:K$ is non-zero and $\phi_k^p = 0$. Due to a) the strict convexity of $\psi_k^p(r_k^p)$ and the monotone variation of $\partial \psi_k^p(r_k^p) / \partial r_k^p$; and b) the uniqueness of the optimal $r_k^p$, the $\mu_l$ obtained from (15a) is unique. We justify the bisection search on $\mu_l$ as follows: if the unique optimal $\mu_l$ is positive, then from (15c) we must have $\sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p = C_l$, where each $r_k^p$ is found from (15a) and (15d).

Algorithm 1: Dual algorithm to solve the per-link optimization in (14)
0. Initialization: $s_1 = 0$, $s_3 =$ large number, $q_1 = 0$, $q_2 = 0$, $q_3 = 0$;
repeat
  1. $s_2 = (s_1 + s_3)/2$;
  2. Implement a bisection search to solve (15a) with $\phi_k^p = 0$ and find $r_k^p : r_k^p \ge 0$, where $\mu_l = s_1$;
  3. if there is no positive solution for $r_k^p$ then $r_k^p = 0$ and $\phi_k^p = \partial \psi_k^p(r_k^p) / \partial r_k^p \big|_{r_k^p = 0} + \mu_l / \alpha_{k,l}^{p,j}$;
  4. $q_1 = \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p - C_l$;
  5. Implement a bisection search to solve (15a) with $\phi_k^p = 0$ and find $r_k^p : r_k^p \ge 0$, where $\mu_l = s_2$;
  6. if there is no positive solution for $r_k^p$ then $r_k^p = 0$ and $\phi_k^p = \partial \psi_k^p(r_k^p) / \partial r_k^p \big|_{r_k^p = 0} + \mu_l / \alpha_{k,l}^{p,j}$;
  7. $q_2 = \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p - C_l$;
  8. Implement a bisection search to solve (15a) with $\phi_k^p = 0$ and find $r_k^p : r_k^p \ge 0$, where $\mu_l = s_3$;
  9. if there is no positive solution for $r_k^p$ then $r_k^p = 0$ and $\phi_k^p = \partial \psi_k^p(r_k^p) / \partial r_k^p \big|_{r_k^p = 0} + \mu_l / \alpha_{k,l}^{p,j}$;
  10. $q_3 = \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p - C_l$;
  11. if $q_1 q_2 < 0$ then $s_3 = s_2$;
  12. if $q_2 q_3 < 0$ then $s_1 = s_2$;
  13. if $q_1 < 0$, $q_2 < 0$, $q_3 < 0$ then
    13.1. $\mu_l = 0$;
    13.2. Solve $\partial \psi_k^p(r_k^p) / \partial r_k^p = 0$ to find $r_k^p$;
    13.3. Project the obtained $r_k^p$ variable to the positive orthant;
    13.4. $s_1 = s_3$;
until $s_3 - s_1$ is small enough;
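The nested search described above can be sketched in a few lines. This is a simplified illustration, not a transcription of Algorithm 1: it uses a plain two-endpoint bisection on $\mu_l$ rather than the three-point bookkeeping of the algorithm, it returns only the rates and the link multiplier (not $\phi_k^p$), and it assumes the caller supplies an upper bound `r_hi` at which every cost derivative is non-negative. The names `solve_rate` and `solve_link`, and the quadratic costs in the usage example, are hypothetical.

```python
def solve_rate(alpha, dpsi, mu, r_hi, tol=1e-10):
    """Inner bisection: find r >= 0 with alpha * dpsi(r) + mu = 0.

    dpsi is the (strictly increasing) derivative of a strictly convex cost.
    If the stationarity residual is already non-negative at r = 0, the
    non-negativity constraint is active and r = 0 is returned.
    Assumes dpsi(r_hi) >= 0, so that the root lies in [0, r_hi].
    """
    def residual(r):
        return alpha * dpsi(r) + mu

    if residual(0.0) >= 0.0:
        return 0.0
    lo, hi = 0.0, r_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def solve_link(capacity, alphas, dpsis, r_hi, tol=1e-10):
    """Outer bisection on the link multiplier mu for one per-link problem."""
    def rates_at(mu):
        return [solve_rate(a, d, mu, r_hi, tol) for a, d in zip(alphas, dpsis)]

    rates = rates_at(0.0)
    if sum(rates) <= capacity:          # capacity is slack, so mu = 0
        return 0.0, rates
    lo = 0.0
    # At this multiplier every residual is non-negative at r = 0 (load is zero).
    hi = max(-a * d(0.0) for a, d in zip(alphas, dpsis))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(rates_at(mid)) > capacity:
            lo = mid                    # link overloaded: raise the price
        else:
            hi = mid
    return hi, rates_at(hi)


if __name__ == "__main__":
    # Two paths on one link with assumed costs 0.5 * (r - 2)^2 and 0.5 * (r - 3)^2.
    dpsis = [lambda r: r - 2.0, lambda r: r - 3.0]
    print(solve_link(2.0, [1.0, 0.5], dpsis, r_hi=10.0))
```

The outer loop relies on the aggregate rate being non-increasing in $\mu_l$, which follows from the strict convexity of each cost term.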
When the optimal $\mu_l$ is positive, it can be found uniquely using a bisection search due to the strictly monotone variation of $\sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p$ with $\mu_l$ (a consequence of the strict convexity of $\psi_k^p(r_k^p)$, as explained above). If the optimal $\mu_l$ is zero, then (15b) and (15c) are already satisfied and it is enough to find the unique non-negative minimizer of each $\psi_k^p(r_k^p)$ from (15a) and (15d). In light of the above arguments, two nested bisection methods are required to solve (14): the inner bisection works on $r_k^p$ and the outer one works on $\mu_l$. The summary of the proposed bisection approach to solve the per-link optimization in (14) is given in Algorithm 1.

Suppose that the optimization in (14) is iteratively solved in parallel for all links of the network. For a link with a large capacity, the link capacity constraint is not tight, Algorithm 1 finds $\mu_l^j = 0$, and we have $\alpha_{k,l}^{p,j+1} = 0$. For those links, we do not need to continue computation as the KKT conditions listed in (15a)–(15d) remain satisfied. In the following iterations, we ignore those links and consider only links with $\mu_l^j > 0$. We alternate between solving the optimization in (14) in parallel for all links and updating $\alpha_{k,l}^{p,j+1}$ until all $\{\mu_l^j\}_{l \in \mathcal{L}}$ variables converge, i.e., $\|\boldsymbol{\mu}^j - \boldsymbol{\mu}^{j-1}\| < \epsilon$. Once $\boldsymbol{\mu}^j$ converges, for each $r_k^p$ variable, we use the $r_k^p$ computed in the last iteration of Algorithm 1 from a subproblem with $\mu_l^j > 0$, $l \in \mathcal{L}_k^p$. A brief description of the proposed dual algorithm for solving the optimization in (7) is given in Algorithm 2.

Algorithm 2: Multi-path routing algorithm to solve the optimization in (7)
0. Initialization: Assign some small positive number to each $\mu_l$, $j = 0$;
repeat
  for all links do
    1. Find $\alpha_{k,l}^{p,j+1} = \mu_l^j \big/ \sum_{l' \in \mathcal{L}_k^p} \mu_{l'}^j$;
    if $\mu_l^j > 0$ then
      2. Apply Algorithm 1 to find $\mu_l^{j+1}$;
  3. $j = j + 1$;
until $\boldsymbol{\mu}^j$ converge;
for all $\{r_k^p\}_{p \in \mathcal{P}_k,\, k=1:K}$ variables do
  4. Use the latest $r_k^p$ computed by Algorithm 1 from a per-link subproblem, where $l \in \mathcal{L}_k^p$ and $\mu_l^j > 0$;

After Algorithm 2 converges, we use the $r_k^p$ obtained from a per-link problem with a tight link capacity constraint, i.e., $l : \mu_l^j > 0$, for the other links on that path for which Algorithm 2 finds $\mu_l^j = 0$. The key property of Algorithm 2 is that, after convergence, the $r_k^p$ values obtained on different links of one path are identical.

Proposition 2. Upon convergence of Algorithm 2, the flow rates across links on each path are identical.

Proof.

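Algorithm 2 finds $r_k^p$ from the per-link subproblem for link $l \in \mathcal{L}_k^p$, $\mu_l^j > 0$, using the following equation:

$$\frac{\partial L_l(\mathbf{r}_l, \mu_l^j, \boldsymbol{\phi}_l^j, \boldsymbol{\mu}^{j-1})}{\partial r_k^p} = \alpha_{k,l}^{p,j} \frac{\partial \psi_k^p(r_k^p)}{\partial r_k^p} + \mu_l^j - \alpha_{k,l}^{p,j}\, \phi_k^{p,j} = \frac{\mu_l^{j-1}}{\sum_{l' \in \mathcal{L}_k^p} \mu_{l'}^{j-1}} \frac{\partial \psi_k^p(r_k^p)}{\partial r_k^p} + \mu_l^j - \frac{\mu_l^{j-1}}{\sum_{l' \in \mathcal{L}_k^p} \mu_{l'}^{j-1}}\, \phi_k^{p,j} = 0. \tag{16}$$

Suppose Algorithm 2 has converged in the $j$th iteration, so that $\|\boldsymbol{\mu}^j - \boldsymbol{\mu}^{j-1}\| < \epsilon$. Then, we have $\boldsymbol{\mu}^j - \boldsymbol{\mu}^{j-1} = \boldsymbol{\vartheta}$, where $\|\boldsymbol{\vartheta}\| < \epsilon$. We have

$$\frac{1}{\alpha_{k,l}^{p,j}} = \frac{\sum_{l' \in \mathcal{L}_k^p} \mu_{l'}^{j-1}}{\mu_l^{j-1}} = \frac{\sum_{l' \in \mathcal{L}_k^p} (\mu_{l'}^j - \vartheta_{l'})}{\mu_l^j - \vartheta_l}.$$

We multiply (16) by $1/\alpha_{k,l}^{p,j}$ and obtain the following:

$$\begin{aligned} & \frac{\partial \psi_k^p(r_k^p)}{\partial r_k^p} + \frac{\sum_{l' \in \mathcal{L}_k^p} (\mu_{l'}^j - \vartheta_{l'})}{\mu_l^j - \vartheta_l}\, \mu_l^j - \phi_k^{p,j} \\ &= \frac{\partial \psi_k^p(r_k^p)}{\partial r_k^p} + \sum_{l' \in \mathcal{L}_k^p} (\mu_{l'}^j - \vartheta_{l'}) \Big( 1 + \frac{\vartheta_l}{\mu_l^j - \vartheta_l} \Big) - \phi_k^{p,j} \\ &= \underbrace{\frac{\partial \psi_k^p(r_k^p)}{\partial r_k^p} + \sum_{l' \in \mathcal{L}_k^p} \mu_{l'}^j - \phi_k^{p,j}}_{\partial L_c(\mathbf{r}, \boldsymbol{\mu}^j, \boldsymbol{\phi}^j) / \partial r_k^p} - \sum_{l' \in \mathcal{L}_k^p} \vartheta_{l'} \Big( 1 + \frac{\vartheta_l}{\mu_l^j - \vartheta_l} \Big) + \frac{\vartheta_l \big( \sum_{l' \in \mathcal{L}_k^p} \mu_{l'}^j \big)}{\mu_l^j - \vartheta_l} \\ &= \frac{\partial L_c(\mathbf{r}, \boldsymbol{\mu}^j, \boldsymbol{\phi}^j)}{\partial r_k^p} - \sum_{l' \in \mathcal{L}_k^p} \vartheta_{l'} \Big( 1 + \frac{\vartheta_l}{\mu_l^j - \vartheta_l} \Big) + \frac{\vartheta_l \big( \sum_{l' \in \mathcal{L}_k^p} \mu_{l'}^j \big)}{\mu_l^j - \vartheta_l} = 0. \end{aligned} \tag{17}$$

When $\epsilon$ tends to zero, $\boldsymbol{\vartheta} \to 0$ and from (17) we find $\partial L_c(\mathbf{r}, \boldsymbol{\mu}^j, \boldsymbol{\phi}^j) / \partial r_k^p = 0$. Moreover, we observe that $\partial L_c(\mathbf{r}, \boldsymbol{\mu}^j, \boldsymbol{\phi}^j) / \partial r_k^p$ is independent of the link index on path $p$. This means that the $\{r_k^p\}_{p \in \mathcal{P}_k}$ variables obtained by solving the link subproblems are identical for all links along each path $p$ for which $\mu_l^j > 0$. They are also equal to the minimizer of the Lagrangian function $L_c(\mathbf{r}, \boldsymbol{\mu}^j, \boldsymbol{\phi}^j)$ in (8). ∎

Theorem 1. If $\psi(\mathbf{r})$ is strictly convex and separable, then the primal and dual iterates of Algorithm 2 will converge to the optimal primal and dual solutions of (7).

Proof.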
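Notice that, based on the definition of $\alpha_{k,l}^{p,j}$, given identical feasible variables $r_k^p$, $\hat{\mu}_l$ and $\hat{\boldsymbol{\phi}}_l$ to both Lagrangian functions in (8) and (11), from (12), we have $\sum_{l \in \mathcal{L}} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1}) = L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})$. First, we show that $\sum_{l \in \mathcal{L}} \min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1})$ is a lower bound for $\min_{\mathbf{r}} L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})$. Since the minimum of $L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1})$ is less than or equal to the other values of $L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1})$, we have $\min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1}) \le L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1})$. Thus, we obtain

\[
\sum_{l \in \mathcal{L}} \min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1}) \le \sum_{l \in \mathcal{L}} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1}) = L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}}),
\]

where the equality is due to (12). In $L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})$, we choose $\mathbf{r}$ to be the minimizer of $L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})$. Thus, we obtain

\[
\sum_{l \in \mathcal{L}} \min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1}) \le \min_{\mathbf{r}} L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}}). \tag{18}
\]

From (18), we observe that solving the problem in (13) iteratively is a successive lower-bound maximization (upper-bound minimization if we rewrite problems (9) and (13) as minimizations).

We justify the claim that the primal and dual solutions obtained from solving (13) successively converge to the primal and dual solutions of (7). We build our proof on the convergence theory for BSUM given in [43]. We show, in the same order given in the Appendix, that the lower bound satisfies all four convergence conditions given in [43, Assumption 2]:

1) At feasible points $\hat{\boldsymbol{\mu}} \ge \mathbf{0}$ and $\hat{\boldsymbol{\phi}} \ge \mathbf{0}$, we show that $\min_{\mathbf{r}} L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}}) = \sum_{l \in \mathcal{L}} \min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$. From the KKT conditions for each subproblem, we obtain

\[
\frac{\partial L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})}{\partial r_k^p} = \frac{\hat{\mu}_l}{\sum_{l' \in \mathcal{L}_k^p} \hat{\mu}_{l'}} \frac{\partial \psi_k^p(r_k^p)}{\partial r_k^p} + \hat{\mu}_l - \frac{\hat{\mu}_l}{\sum_{l' \in \mathcal{L}_k^p} \hat{\mu}_{l'}} \hat{\phi}_k^p = 0. \tag{19}
\]

Assuming $\hat{\mu}_l > 0$, after multiplication by $\sum_{l' \in \mathcal{L}_k^p} \hat{\mu}_{l'} / \hat{\mu}_l$, we obtain

\[
\frac{\partial \psi_k^p(r_k^p)}{\partial r_k^p} + \sum_{l \in \mathcal{L}_k^p} \hat{\mu}_l - \hat{\phi}_k^p = \frac{\partial L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})}{\partial r_k^p} = 0. \tag{20}
\]

When $\psi(\mathbf{r})$ is strictly convex, there is a unique minimizer for each of $L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})$ and $L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$. We observe from (19) and (20) that, at feasible points $\hat{\boldsymbol{\mu}} \ge \mathbf{0}$ and $\hat{\boldsymbol{\phi}} \ge \mathbf{0}$, the minimizer of $L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})$ is equal to that of $L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$. From (12), applying identical variables $r_k^p$, $\hat{\mu}_l$ and $\hat{\boldsymbol{\phi}}_l$ to both $L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})$ and $L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$, we have $L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}}) = \sum_{l \in \mathcal{L}} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$. We choose each $r_k^p$ to be the minimizer, and we find $\min_{\mathbf{r}} L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}}) = \sum_{l \in \mathcal{L}} \min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$.

2) From (18), we observe that $\sum_{l \in \mathcal{L}} \min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1})$ is a lower bound.

3) We deploy [45, Proposition 7.1.1] to find the derivative of $\min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$ with respect to $\mu_l$. Three conditions are satisfied that ensure the existence of the derivative: a) the feasible set of (14) is compact; b) $L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$ is continuous in $\mu_l$; and c) for each $\hat{\mu}_l$, the equation $\partial L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})/\partial r_k^p = 0$ has a unique solution for $r_k^p$ due to the strict convexity of $\psi(\mathbf{r})$. Given identical $\hat{\mu}_l$ and $\hat{\boldsymbol{\phi}}_l$ to both Lagrangian functions (8) and (11), the derivative of $\sum_{l \in \mathcal{L}} \min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$ with respect to $\mu_l$ is

\[
\sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p - C_l,
\]

where $r_k^p = \arg\min_{r_k^p} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$. The derivative of $\min_{\mathbf{r}} L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})$ with respect to $\mu_l$ is

\[
\sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, l \in \mathcal{L}_k^p\}} r_k^p - C_l,
\]

where $r_k^p = \arg\min_{r_k^p} L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})$. As the minimizers of $L_c(\mathbf{r}, \hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\phi}})$ and $L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \hat{\boldsymbol{\mu}})$ are equal at point $\hat{\boldsymbol{\mu}}$ due to (19) and (20), we observe that the two derivatives above are equal.

4) $\sum_{l \in \mathcal{L}} \min_{\mathbf{r}_l} L_l(\mathbf{r}_l, \hat{\mu}_l, \hat{\boldsymbol{\phi}}_l, \boldsymbol{\mu}^{j-1})$ is a piecewise linear function of $\hat{\boldsymbol{\mu}}$, and thus, it is a continuous function of $\hat{\boldsymbol{\mu}}$.

Algorithm 3: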

Multi-path routing algorithm for non-separable cost functions
0. Initialization: choose a feasible vector $\mathbf{r}^0$, $m = 0$;
repeat
  1. Find the upper bound (21) using $\mathbf{r}^m$;
  2. Apply Algorithm 2 to find $\mathbf{r}$;
  3. $m = m + 1$ and $\mathbf{r}^m = \mathbf{r}$;
until variables in $\mathbf{r}^m$ converge;

Building on the above arguments, Algorithm 2 is a block successive lower-bound maximization method, which satisfies all four convergence conditions given in [43, Assumption 2]. Algorithm 2 converges to the global optimal solution of the concave problem (9) [43, Theorem 2], which has an identical objective function to (7) at the optimal point as a result of strong duality [44, p. 226–p. 227]. Once Algorithm 2 converges, $\boldsymbol{\mu}^j$ and $r_k^p = \arg\min_{r_k^p} L_l(\mathbf{r}_l, \mu_l^j, \boldsymbol{\phi}_l^j, \boldsymbol{\mu}^{j-1})$ obtained by Algorithm 2 satisfy (15a)–(15d). The KKT conditions for (7) are (15b)–(15d) in addition to $\partial L_c(\mathbf{r}, \boldsymbol{\mu}^j, \boldsymbol{\phi}^j)/\partial r_k^p = 0$. Due to (16) and (17), when Algorithm 2 converges, the minimizers of $L_l(\mathbf{r}_l, \mu_l^j, \boldsymbol{\phi}_l^j, \boldsymbol{\mu}^{j-1})$ and $L_c(\mathbf{r}, \boldsymbol{\mu}^j, \boldsymbol{\phi}^j)$ are identical, and thus, (15a) ensures $\partial L_c(\mathbf{r}, \boldsymbol{\mu}^j, \boldsymbol{\phi}^j)/\partial r_k^p = 0$. Hence, the primal and dual variables obtained by Algorithm 2 satisfy the KKT conditions for (7). $\blacksquare$

Remark 2.

If the cost function is convex and has a gradient that is Lipschitz continuous, but the function is not separable in $r_k^p$, i.e., $\psi(\mathbf{r})$ cannot be written as $\psi(\mathbf{r}) = \sum_{k=1}^{K} \sum_{p \in \mathcal{P}_k} \psi_k^p(r_k^p)$, we use the quadratic upper bound given in [46, eq. (12)], which is separable in the variables. For an arbitrary convex cost function $\psi(\mathbf{r})$ with a Lipschitz continuous gradient, we have the following upper bound:

\[
\psi(\mathbf{r}) \le \psi(\mathbf{r}^m) + \nabla \psi(\mathbf{r}^m)^T (\mathbf{r} - \mathbf{r}^m) + \frac{\gamma}{2} \|\mathbf{r} - \mathbf{r}^m\|^2, \tag{21}
\]

where $\gamma$ is the Lipschitz constant, and $\mathbf{r}^m = \{r_k^{p,m}\}_{p \in \mathcal{P}_k,\, k=1:K}$ is the $m$th iterate in the successive upper-bound minimization. We start from an initial point $\mathbf{r}^0$ in the feasible set and find the upper bound (21). Then, we apply Algorithm 2 to solve the problem with the upper bound (21) to the global optimal solution in a parallel fashion. When the upper bound (21) is substituted for the cost function, the first KKT condition is

\[
\frac{\partial L_l(\mathbf{r}_l, \mu_l, \boldsymbol{\phi}_l, \boldsymbol{\mu}^{j-1})}{\partial r_k^p} = \alpha_{k,l}^{p,j} \left( \left. \frac{\partial \psi_k^p(r_k^p)}{\partial r_k^p} \right|_{r_k^p = r_k^{p,m}} + \gamma (r_k^p - r_k^{p,m}) \right) + \mu_l - \alpha_{k,l}^{p,j} \phi_k^p = 0, \tag{22}
\]

instead of (15a). Once the problem with the upper bound (21) is solved, we use the obtained solution to update $\mathbf{r}^m$ in the upper bound (21). We repeat this approach until $\mathbf{r}^m$ converges. We summarize this approach in Algorithm 3.

In iteration $m$, the value of the upper bound (21) and its gradient at $\mathbf{r} = \mathbf{r}^m$ are $\psi(\mathbf{r}^m)$ and $\nabla \psi(\mathbf{r}^m)$, respectively, which are equal to the value and the gradient of the non-separable cost function $\psi(\mathbf{r})$ at that point. Furthermore, the upper bound in (21) is continuous, and thus, all four convergence conditions given in [43, Assumption 2] and listed in the Appendix are satisfied. Due to [43, Theorem 2] and the convexity of the non-separable cost function, the solution obtained by Algorithm 3, which implements BSUM, is identical to the solution of the original problem with the non-separable cost function.

Remark 3.

When the cost function is convex and separable, but not strictly convex, we add a proximal term to the cost function and make it locally strongly convex as follows:

\[
\min_{\mathbf{r}} \; \sum_{k=1}^{K} \sum_{p \in \mathcal{P}_k} \psi_k^p(r_k^p) + \frac{\kappa}{2} \|\mathbf{r} - \mathbf{r}^m\|^2 \quad \text{s.t.} \;\; (1), \; r_k^p \ge 0, \; p \in \mathcal{P}_k, \; \forall k, \tag{23}
\]

where $\kappa$ is a small positive constant. We use Algorithm 2 to solve the above problem, with the equation

\[
\frac{\partial L_l(\mathbf{r}_l, \mu_l, \boldsymbol{\phi}_l, \boldsymbol{\mu}^{j-1})}{\partial r_k^p} = \alpha_{k,l}^{p,j} \frac{\partial \psi_k^p(r_k^p)}{\partial r_k^p} + \alpha_{k,l}^{p,j} \kappa (r_k^p - r_k^{p,m}) + \mu_l - \alpha_{k,l}^{p,j} \phi_k^p = 0
\]

used instead of (15a) to find $r_k^p$, where $r_k^{p,m}$ is the value of $r_k^p$ in the $m$th iteration of solving (23). We successively solve (23) with Algorithm 2 and update $\mathbf{r}^m$ until $\mathbf{r}^m$ converges. Similar to Remark 2, one can show that the cost function with the proximal term in (23) satisfies the four convergence conditions in [43, Assumption 2] and the global minimum is obtained after successive minimizations, since each local minimum is also global for a convex function.

IV. SIMULTANEOUS RESOURCE RESERVATIONS IN THE BACKHAUL AND RAN

In this section, we study the joint link capacity and AP transmission resource reservation based on the user demand and downlink statistics. Prior to the observation of user demands, based on the formulated model in (5), the network operator finds the optimal amount of reserved resources in the backhaul and APs such that neither the link capacity nor the AP capacity is exceeded.
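The successive upper-bound minimization of Remark 2 (summarized in Algorithm 3) can be illustrated on a toy problem. The Python sketch below is only an illustration under stated assumptions: the cost gradient `grad`, the box bound `upper`, and the name `sum_minimize` are hypothetical, and the inner solve, which the paper delegates to Algorithm 2 under the link constraints (1), is replaced here by a closed-form minimization over a simple box.

```python
def sum_minimize(grad, gamma, r0, upper, tol=1e-12, max_iter=10000):
    """Successive upper-bound minimization with the quadratic bound (21).

    Assumption: the feasible set is the box 0 <= r_i <= upper_i, so the
    separable upper bound has a closed-form minimizer per coordinate.
    (In the paper, this inner step is carried out by Algorithm 2.)
    """
    r = list(r0)
    for _ in range(max_iter):
        g = grad(r)
        # Per-coordinate minimizer of
        #   g_i * (x - r_i) + (gamma / 2) * (x - r_i)^2  over [0, upper_i].
        r_next = [min(max(ri - gi / gamma, 0.0), ui)
                  for ri, gi, ui in zip(r, g, upper)]
        if max(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


# Hypothetical non-separable convex cost:
#   psi(r) = 0.5 * (sum(r) - d)^2 + 0.05 * ||r||^2
# Its Hessian is (all-ones matrix) + 0.1 * I, so gamma = n + 0.1.
d, n = 3.0, 3


def grad(r):
    s = sum(r) - d
    return [s + 0.1 * ri for ri in r]


r_star = sum_minimize(grad, gamma=n + 0.1, r0=[0.0] * n, upper=[2.0] * n)
```

Because the bound is separable, every coordinate is updated independently of the others, which is the property that lets the paper solve each iteration in parallel.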

A. Resource Reservation in the Backhaul

Let us drop the equality constraint (3) from (5) and substitute $\sum_{p \in \mathcal{P}_k} r_k^p$ for $r_k$. Then, we have

\[
\min_{\mathbf{r}, \mathbf{t}} \; \sum_{k=1}^{K} \Big[ -\mathbb{E}\Big[\min\Big(\sum_{p \in \mathcal{P}_k} r_k^p, d_k\Big)\Big] + \theta_k \sum_{p \in \mathcal{P}_k} \int_0^{r_k^p} z_k^p(v_k^p, t_k^p)\,(r_k^p - v_k^p)\, dv_k^p \Big] \quad \text{s.t.} \;\; (1), (2), \; r_k^p, t_k^p \ge 0, \; p \in \mathcal{P}_k, \; \forall k. \tag{24}
\]

We solve the above problem using the proposed BCD algorithm. With $\mathbf{t}$ fixed, we minimize (24) with respect to $\mathbf{r}$ and update it. With the updated $\mathbf{r}$, we minimize (24) with respect to $\mathbf{t}$ and update it. We underline the iterates of the BCD algorithm. In the $(i+1)$th iteration of the BCD algorithm, fixing $\mathbf{t}^i$, we minimize with respect to $\mathbf{r}$. Then, the minimization problem in (24) reduces to the following convex one:

\[
\min_{\mathbf{r}} \; \sum_{k=1}^{K} \Big[ -\mathbb{E}\Big[\min\Big(\sum_{p \in \mathcal{P}_k} r_k^p, d_k\Big)\Big] + \theta_k \sum_{p \in \mathcal{P}_k} \int_0^{r_k^p} z_k^p(v_k^p, t_k^{p,i})\,(r_k^p - v_k^p)\, dv_k^p \Big] \quad \text{s.t.} \;\; (1), \; r_k^p \ge 0, \; p \in \mathcal{P}_k, \; \forall k.
\]

It is observed that although the expected outage is separable in $r_k^p$, the variables are coupled in the first term of the objective function. Therefore, we substitute the global quadratic upper bound given in (21) for the expected supportable traffic demand. First, let us calculate the Lipschitz constant for the gradient of $-\mathbb{E}[\min(\sum_{p \in \mathcal{P}_k} r_k^p, d_k)]$. The second derivative of $-\mathbb{E}[\min(\sum_{p \in \mathcal{P}_k} r_k^p, d_k)]$ is

\[
-\frac{\partial^2\, \mathbb{E}[\min(\sum_{p \in \mathcal{P}_k} r_k^p, d_k)]}{\partial r_k^p\, \partial r_k^{p'}} = f_k\Big(\sum_{p \in \mathcal{P}_k} r_k^p\Big).
\]

The Hessian matrix of $-\mathbb{E}[\min(\sum_{p \in \mathcal{P}_k} r_k^p, d_k)]$ is $|\mathcal{P}_k| \times |\mathcal{P}_k|$ dimensional, where all entries are $f_k(\sum_{p \in \mathcal{P}_k} r_k^p)$. The eigenvalues of the Hessian matrix are all zero except one, which is $|\mathcal{P}_k|\, f_k(\sum_{p \in \mathcal{P}_k} r_k^p)$. Therefore, the Lipschitz constant is $|\mathcal{P}_k|$.
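The eigenvalue claim above can be checked directly, since a matrix whose entries all equal $f$ has the all-ones vector as an eigenvector with eigenvalue $|\mathcal{P}_k| f$ and annihilates every vector whose entries sum to zero. The following Python sketch uses hypothetical values for the number of paths and for the density value; it also suggests, as an observation on our part rather than a statement from the paper, that $|\mathcal{P}_k|$ serves as a valid gradient Lipschitz constant whenever the demand density $f_k$ does not exceed one.

```python
def all_f_hessian(n_paths, f_value):
    """Hessian with every entry equal to f_value (rank one)."""
    return [[f_value] * n_paths for _ in range(n_paths)]


def matvec(matrix, vector):
    """Plain matrix-vector product for lists of floats."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical values: |P_k| = 4 paths, density value f_k(sum r) = 0.25.
n_paths, f_value = 4, 0.25
hessian = all_f_hessian(n_paths, f_value)

# The all-ones vector is scaled by |P_k| * f = 1.0 ...
top = matvec(hessian, [1.0] * n_paths)
# ... while any vector whose entries sum to zero is mapped to zero.
null = matvec(hessian, [1.0, -1.0, 0.0, 0.0])
```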
We now place the Lipschitz constant in (21) and find the upper bound, which is separable in $r_k^p$, as follows:

\[
-\mathbb{E}\Big[\min\Big(\sum_{p \in \mathcal{P}_k} r_k^p, d_k\Big)\Big] \le -\mathbb{E}\Big[\min\Big(\sum_{p \in \mathcal{P}_k} r_k^{p,m}, d_k\Big)\Big] + \Big(F_k\Big(\sum_{p \in \mathcal{P}_k} r_k^{p,m}\Big) - 1\Big)\Big(\sum_{p \in \mathcal{P}_k} r_k^p - \sum_{p \in \mathcal{P}_k} r_k^{p,m}\Big) + \frac{|\mathcal{P}_k|}{2} \sum_{p \in \mathcal{P}_k} (r_k^{p,m} - r_k^p)^2, \tag{25}
\]

where $r_k^{p,m}$ is the $m$th iterate. We substitute the upper bound (25) for the expected supportable demand and the optimization problem in each iteration becomes:

\[
\min_{\mathbf{r}} \; \sum_{k=1}^{K} \Big[ \Big(F_k\Big(\sum_{p \in \mathcal{P}_k} r_k^{p,m}\Big) - 1\Big)\Big(\sum_{p \in \mathcal{P}_k} r_k^p - \sum_{p \in \mathcal{P}_k} r_k^{p,m}\Big) + \frac{|\mathcal{P}_k|}{2} \sum_{p \in \mathcal{P}_k} (r_k^p - r_k^{p,m})^2 + \theta_k \sum_{p \in \mathcal{P}_k} \int_0^{r_k^p} z_k^p(v_k^p, t_k^{p,i})\,(r_k^p - v_k^p)\, dv_k^p \Big] \quad \text{s.t.} \;\; (1), \; r_k^p \ge 0, \; p \in \mathcal{P}_k, \; \forall k. \tag{26}
\]

We leverage Algorithm 3 to solve (25) in a parallel fashion. In each iteration of Algorithm 3, Algorithm 2 is called to solve the problem in (26). Moreover, Algorithm 1 is called within Algorithm 2 and it needs to solve $\partial L_l(\mathbf{r}_l, \mu_l, \boldsymbol{\phi}_l, \boldsymbol{\mu}^{j-1})/\partial r_k^p = 0$. We rewrite (15a) for the above optimization problem in the $j$th iteration of Algorithm 2 as follows:

\[
\frac{\partial L_l(\mathbf{r}_l, \mu_l, \boldsymbol{\phi}_l, \boldsymbol{\mu}^{j-1})}{\partial r_k^p} = \alpha_{k,l}^{p,j} \Big(F_k\Big(\sum_{p \in \mathcal{P}_k} r_k^{p,m}\Big) - 1\Big) + \mu_l + \alpha_{k,l}^{p,j} |\mathcal{P}_k| (r_k^p - r_k^{p,m}) + \theta_k \alpha_{k,l}^{p,j} Z_k^p(r_k^p, t_k^{p,i}) - \alpha_{k,l}^{p,j} \phi_k^p = 0.
\]

We observe that for each $\mu_l$, we are able to obtain $r_k^p$ numerically using $r_k^{p,m}$, independent of the other variables. The solution obtained by Algorithm 3 is unique due to the strong convexity of (26) and is a global minimum, as explained in Remark 2. After Algorithm 3 converges, we set $\mathbf{r}^{i+1} = \mathbf{r}^m$.

Remark 4.

Suppose that the number of paths that are available to user $k$ and share downlink $w \in \mathcal{W}_k$ is $\varphi_k^w$. When multiple paths for serving a user share one downlink, we substitute the quadratic upper bound (21) for the outage (6) as follows:

\[
\begin{aligned}
(6) \le {} & \sum_{w \in \mathcal{W}_k} \int_0^{\sum_{p:\{p \in \mathcal{P}_k,\, w \in p\}} r_k^{p,m}} z_k^w(v_k^p, t_k^{w,i}) \Big( \sum_{p:\{p \in \mathcal{P}_k,\, w \in p\}} r_k^{p,m} - v_k^p \Big)\, dv_k^p \\
& + \sum_{w \in \mathcal{W}_k} Z_k^w\Big( \sum_{p:\{p \in \mathcal{P}_k,\, w \in p\}} r_k^{p,m}, t_k^{w,i} \Big) \Big( \sum_{p:\{p \in \mathcal{P}_k,\, w \in p\}} (r_k^p - r_k^{p,m}) \Big) \\
& + \sum_{w \in \mathcal{W}_k} \sum_{p:\{p \in \mathcal{P}_k,\, w \in p\}} \frac{\varphi_k^w}{2} (r_k^p - r_k^{p,m})^2,
\end{aligned} \tag{27}
\]

where $\varphi_k^w$ is the Lipschitz constant. In this case, the objective function of (26) is obtained by adding (25) and the RHS of (27). We rewrite (15a) in the $j$th iteration of Algorithm 2 for this case as follows:

\[
\frac{\partial L_l(\mathbf{r}_l, \mu_l, \boldsymbol{\phi}_l, \boldsymbol{\mu}^{j-1})}{\partial r_k^p} = \alpha_{k,l}^{p,j} \Big(F_k\Big(\sum_{p \in \mathcal{P}_k} r_k^{p,m}\Big) - 1\Big) + \alpha_{k,l}^{p,j} |\mathcal{P}_k| (r_k^p - r_k^{p,m}) + \theta_k \alpha_{k,l}^{p,j} Z_k^w\Big( \sum_{p:\{p \in \mathcal{P}_k,\, w \in p\}} r_k^{p,m}, t_k^{w,i} \Big) + \theta_k \varphi_k^w \alpha_{k,l}^{p,j} (r_k^p - r_k^{p,m}) + \mu_l - \alpha_{k,l}^{p,j} \phi_k^p = 0.
\]

B. Resource Reservation in RAN

When we minimize (24) with respect to $\mathbf{t}$ in the BCD algorithm, we use $\mathbf{r}^{i+1}$ obtained by Algorithm 3. We propose a dual approach to minimize with respect to $\mathbf{t}$. The objective function of (24) is separable in $t_k^p$. We are able to parallelize the algorithm across APs since each AP has a separate transmission resource capacity constraint. However, the problem in (24) is not necessarily convex in $t_k^p$ for an arbitrary $z_k^p(v_k^p, t_k^p)$.

To tackle the potential non-convexity of the problem, we use the BSUM method and convexify the problem locally. We iteratively solve a sequence of convex approximations. Suppose that for each outage term in the objective function of (24), we add a proximal term $\frac{\zeta_k^{p,j}}{2} \|t_k^p - t_k^{p,j}\|^2$, $\zeta_k^{p,j} > 0$, to make it locally strongly convex. In the proximal term, $t_k^{p,j}$ is the value of $t_k^p$ in the $j$th iteration of successively minimizing (24) with respect to $\mathbf{t}$. The objective function with the proximal terms is an upper bound of the original objective function. We find the Lagrangian for (24) with respect to $\mathbf{t}$, with proximal terms in the objective function, as follows:

\[
L_t(\mathbf{t}, \boldsymbol{\lambda}, \boldsymbol{\beta}) = \sum_{k=1}^{K} \sum_{p \in \mathcal{P}_k} \Big( \theta_k \int_0^{r_k^{p,i+1}} z_k^p(v_k^p, t_k^p)\,(r_k^{p,i+1} - v_k^p)\, dv_k^p + \theta_k \frac{\zeta_k^{p,j}}{2} \|t_k^p - t_k^{p,j}\|^2 \Big) + \sum_{b \in \mathcal{B}} \lambda_b \Big( \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, b \in \mathcal{U}_k^p\}} t_k^p - C_b \Big) - \sum_{k=1}^{K} \sum_{p \in \mathcal{P}_k} \beta_k^p t_k^p,
\]

where the $\mathbf{r}^{i+1}$ block is kept fixed.
We can decompose the above Lagrangian across APs as follows:

\[
L_{t,b}(\mathbf{t}_b, \lambda_b, \boldsymbol{\beta}_b) = \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, b \in \mathcal{U}_k^p\}} \theta_k \Big( \int_0^{r_k^{p,i+1}} z_k^p(v_k^p, t_k^p)\,(r_k^{p,i+1} - v_k^p)\, dv_k^p + \frac{\zeta_k^{p,j}}{2} \|t_k^p - t_k^{p,j}\|^2 \Big) + \lambda_b \Big( \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, b \in \mathcal{U}_k^p\}} t_k^p - C_b \Big) - \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, b \in \mathcal{U}_k^p\}} \beta_k^p t_k^p, \tag{28}
\]

where $\mathbf{t}_b = \{t_k^p\}_{p \in \mathcal{P}_k,\, b \in \mathcal{U}_k^p,\, k=1:K}$ and $\boldsymbol{\beta}_b = \{\beta_k^p\}_{p \in \mathcal{P}_k,\, b \in \mathcal{U}_k^p,\, k=1:K} \ge \mathbf{0}$. To develop an algorithm to solve each subproblem with respect to $\mathbf{t}_b$, we use the KKT conditions. We write the first-order optimality conditions with respect to $\mathbf{t}$ as follows:

\[
\frac{\partial L_{t,b}(\mathbf{t}_b, \lambda_b, \boldsymbol{\beta}_b)}{\partial t_k^p} = \theta_k \int_0^{r_k^{p,i+1}} \frac{\partial z_k^p(v_k^p, t_k^p)}{\partial t_k^p}\,(r_k^{p,i+1} - v_k^p)\, dv_k^p + \lambda_b + \theta_k \zeta_k^{p,j} (t_k^p - t_k^{p,j}) - \beta_k^p = 0, \tag{29a}
\]

\[
\sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, b \in \mathcal{U}_k^p\}} t_k^p \le C_b, \tag{29b}
\]

\[
\lambda_b \Big( \sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, b \in \mathcal{U}_k^p\}} t_k^p - C_b \Big) = 0, \quad \lambda_b \ge 0, \tag{29c}
\]

\[
\beta_k^p t_k^p = 0, \quad t_k^p \ge 0, \quad \beta_k^p \ge 0. \tag{29d}
\]

From (29a), we observe that a given dual variable $\lambda_b$, which corresponds to AP $b$, identifies the reserved resource $t_k^p$ for all downlinks created by that AP. The proposed dual algorithm works as follows: implement a bisection search on $\lambda_b$ in the non-negative orthant and find each $t_k^p \ge 0$ associated with AP $b$ from (29a) with $\beta_k^p = 0$. Continue the bisection search until a $\lambda_b$ is obtained for which $\sum_{k=1}^{K} \sum_{p:\{p \in \mathcal{P}_k,\, b \in \mathcal{U}_k^p\}} t_k^p = C_b$. If there is no such $\lambda_b$, we set $\lambda_b = 0$ and solve (29a) and (29d) without (29b)–(29c). Once the optimized variables are obtained, we update $t_k^{p,j}$ and set $j = j + 1$. We repeat the same process until $\mathbf{t}^j = \{t_k^{p,j}\}_{p \in \mathcal{P}_k,\, k=1:K}$ converges.
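The dual search just described can be sketched for a single AP as follows. This Python sketch is illustrative only and rests on assumptions that are not in the paper: the expected-outage term of each downlink is replaced by a hypothetical convex stand-in $a/t$ (so its derivative in (29a) becomes $-a/t^2$), the weight $\theta_k$ is absorbed into `a` and `zeta`, and the names `solve_t` and `reserve_for_ap` are invented for the example.

```python
def solve_t(a, lam, zeta, t_prev, t_hi=1e6, iters=200):
    """Solve the stationarity condition (29a) with beta = 0 for one downlink.

    Assumed stand-in: -a / t^2 + lam + zeta * (t - t_prev) = 0, whose
    left-hand side is increasing in t > 0, so bisection on t applies.
    """
    lo, hi = 1e-12, t_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if -a / mid ** 2 + lam + zeta * (mid - t_prev) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def reserve_for_ap(a_list, t_prev, zeta, budget, lam_hi=1e6, iters=200):
    """Bisection on the AP price lam >= 0 so that the budget (29b) is met."""
    def allocate(lam):
        return [solve_t(a, lam, zeta, tp) for a, tp in zip(a_list, t_prev)]

    t = allocate(0.0)
    if sum(t) <= budget:       # budget is slack: lam = 0 by (29c)
        return 0.0, t
    lo, hi = 0.0, lam_hi
    for _ in range(iters):     # reserved total decreases as lam grows
        lam = 0.5 * (lo + hi)
        if sum(allocate(lam)) > budget:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return lam, allocate(lam)


# Hypothetical AP with three downlinks and a budget of 3 resource units.
lam_tight, t_tight = reserve_for_ap([1.0, 4.0, 9.0], [1.0, 1.0, 1.0], 0.1, 3.0)
# Same AP with a large budget: the constraint is inactive.
lam_slack, t_slack = reserve_for_ap([1.0, 4.0, 9.0], [1.0, 1.0, 1.0], 0.1, 100.0)
```

In the full procedure, the returned allocation would replace `t_prev` and the search would be repeated until the iterates stop changing, with one such search running independently per AP.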
As explained in Remark 3, after a sequence of upper-bound minimizations and updates of the proximal terms in the objective function, a KKT (local stationary) solution to the original problem is obtained. If the expected outage terms for downlinks are non-increasing in $\mathbf{t}$, one can show that the successive upper-bound minimization converges to a global minimum with respect to $\mathbf{t}$. After $\mathbf{t}^j$ converges, we set $\mathbf{t}^{i+1} = \mathbf{t}^j$.

C. The Proposed BCD Algorithm

To solve the problem in (24) to a KKT point, we optimize with respect to two blocks of variables, $\mathbf{r}$ and $\mathbf{t}$, alternately with the Gauss-Seidel update style. Therefore, if we choose $\mathbf{r}$ to update first, with $\mathbf{r}^{i+1}$, we optimize with respect to $\mathbf{t}$, and then, we update $\mathbf{t}^{i+1}$. We keep optimizing with respect to $\mathbf{r}$ and $\mathbf{t}$ alternately until both blocks converge. The summary of the overall BCD approach is given in Algorithm 4.

Algorithm 4: The proposed BCD algorithm to solve (24)
0. Initialization: feasible initializations for $\mathbf{r}$ and $\mathbf{t}$, $i = 0$;
repeat
  1. Apply Algorithm 3 to solve (24) and find $\mathbf{r}^{i+1}$;
  2. Solve (24) with respect to $\mathbf{t}$ and find $\mathbf{t}^{i+1}$;
  3. $i = i + 1$;
until $\|\mathbf{r}^i - \mathbf{r}^{i-1}\| + \|\mathbf{t}^i - \mathbf{t}^{i-1}\|$ is small enough;

Proposition 3.

Algorithm 4 converges to a KKT solution to (24).

Proof. First, the objective function of (24) is continuously differentiable. Second, the feasible sets of the two blocks of variables are separate in (24). Hence, updating one block of variables does not change the other block. Third, in each iteration of Algorithm 4, a KKT solution is obtained. Therefore, according to [45, Proposition 3.7.1], the proposed BCD algorithm (Algorithm 4) converges to a KKT solution. $\blacksquare$
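The alternating structure of Algorithm 4 can be sketched as a generic two-block Gauss-Seidel loop. The Python sketch below is a toy illustration under our own assumptions: scalar blocks, a hypothetical convex objective $f(r, t) = (r-1)^2 + (t-2)^2 + rt$ over $r, t \ge 0$ with closed-form block updates, and the invented names `bcd`, `update_r`, and `update_t`. In the paper the two updates are carried out by Algorithm 3 and by the dual method of Section IV-B, and the objective is not convex in general.

```python
def bcd(update_r, update_t, r0, t0, tol=1e-9, max_iter=1000):
    """Two-block coordinate descent with Gauss-Seidel updates."""
    r, t = r0, t0
    for _ in range(max_iter):
        r_new = update_r(t)        # step 1: optimize r with t fixed
        t_new = update_t(r_new)    # step 2: optimize t with the new r
        done = abs(r_new - r) + abs(t_new - t) < tol
        r, t = r_new, t_new
        if done:                   # stopping rule of Algorithm 4
            break
    return r, t


# Exact block minimizers of the toy objective over the non-negative reals.
def update_r(t):
    return max(0.0, 1.0 - t / 2.0)


def update_t(r):
    return max(0.0, 2.0 - r / 2.0)


r_star, t_star = bcd(update_r, update_t, r0=1.0, t0=1.0)
```

For this toy objective the iterates approach $(r, t) = (0, 2)$, where both partial derivatives vanish, which mirrors the statement of Proposition 3 in a setting that is far simpler than (24).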

V. NUMERICAL TESTS

In this section, we demonstrate the performance of our proposed approach against two heuristic algorithms.

A. Simulation Setup

The considered network for evaluations is shown in Fig. 2, which includes both the backhaul and radio access parts. A data center is connected to routers of the network through three gateway routers, $\mathrm{GW}_1$, $\mathrm{GW}_2$, and $\mathrm{GW}_3$. The network includes APs and network routers. APs are distributed on the X-Y plane and they are connected to each other and routers via wired links. The backhaul network has links. Wired link capacities are identical in both directions. Backhaul link capacities are determined as
• Links between the data center and routers: Gnats/s;
• Links between routers: 2 Gnats/s;
• Links between routers and APs: Gnats/s;
• Mnats/s;
• Mnats/s;
• Mnats/s.

The considered paths originate from the data center and are extended toward users. We consider users are distributed randomly in the same plane as the APs; however, they are not shown in Fig. 2. User-AP associations are determined by the highest long-term received power. We consider three wireless connections, which have the highest received power, to serve each user. There are three paths for carrying data from a data center to APs. The distribution of the demand is log-normal:

\[
d_k \sim \frac{1}{d_k \sigma_k \sqrt{2\pi}} \exp\Big( -\frac{(\ln d_k - \eta_k)^2}{2 \sigma_k^2} \Big). \tag{30}
\]

Figure 2: A wireless data network consists of APs and routers.

In addition, it is assumed that $\eta_k$ is realized randomly from a normal distribution for each user. The power allocations in APs are fixed. The dispensed resource in an AP is bandwidth. The channel between each user and an AP is a Rayleigh fading channel. The CDF of the wireless channel capacity, which is parameterized by the allocated bandwidth $t_k^p$, is given as follows [35]:

\[
Z_k^p(v_k^p, t_k^p) = 1 - \exp\Big( \frac{1 - 2^{v_k^p / t_k^p}}{\mathrm{SNR}_k^p} \Big),
\]

where $\mathrm{SNR}_k^p$ is the average SNR. The PDF of the wireless channel capacity is

\[
z_k^p(v_k^p, t_k^p) = \frac{\ln(2)\, 2^{v_k^p / t_k^p}}{\mathrm{SNR}_k^p\, t_k^p} \exp\Big( \frac{1 - 2^{v_k^p / t_k^p}}{\mathrm{SNR}_k^p} \Big). \tag{31}
\]

Benchmark heuristic algorithms are the single-path and the average-based approaches. In the single-path approach, each user is served through one path from a data center to a user. Moreover, the average-based algorithm only considers the mean of the user demand and the average achievable rate of a downlink. To compare algorithms, with an identical network, we measure the objective function of (5), the sum of user expected supportable rates, the aggregate expected outage of downlinks and the amount of traffic that each algorithm can reserve for users. One datastream is associated with each user. In total, we have paths in the backhaul. We use C to implement algorithms.

B. Learning Probability Density Functions

The optimization problem in (5) takes into account PDFs of user demands and achievable rates of downlinks. When PDFs are not given, one can use a data-driven approach to learn the PDFs used in (5) based on collected observations. Upon the collection of user demands and achievable rates of downlinks, one can estimate the PDFs using a recursive non-parametric estimator. In order to estimate PDFs in an online streaming fashion, one can use efficient recursive kernel estimators, such as the Wolverton and Wagner estimator [47]. Suppose that independent random variables $X_1, X_2, \ldots, X_n$ are observations that are collected from an identical PDF $\chi$ with respect to Lebesgue's measure. The estimated PDF is

\[
\hat{\chi}_{n, \mathbf{h}_n} = \frac{1}{n} \sum_{k=1}^{n} \frac{1}{h_k} K\Big( \frac{X_k - x}{h_k} \Big),
\]

where $\mathbf{h}_n = (h_1, h_2, \ldots, h_n)$, $h_1 > \cdots > h_n$ and $K(\cdot)$ is a kernel function. The advantage of the above estimator is that it can be written in a recursive form as follows:

\[
\hat{\chi}_{n+1, \mathbf{h}_{n+1}} = \frac{n}{n+1} \hat{\chi}_{n, \mathbf{h}_n} + \frac{1}{(n+1) h_{n+1}} K\Big( \frac{X_{n+1} - x}{h_{n+1}} \Big),
\]

which makes it suitable for real-time applications. The bandwidth selection in [48] can be used for the above estimator. The bandwidth $h_k$ is selected in [48] as $h_k = k^{-\gamma}$, $k \in \{1, \ldots, n\}$, where the exponent $\gamma$ is set from a constant $\beta$ as specified in [48].

Figure 3: The convergence of Algorithm 2. (Horizontal axis: iteration number; one curve per mean demand parameter, from 1 to 6 Mnats/s.)

Table I: CPU time for iterations of Algorithm. (Rows: mean of $\eta_k$ in Mnats/s; CPU time in s.)

C. Simulation Results

Before demonstrating the performance of Algorithm 4, we depict the convergence of Algorithm 2 for different means of the user demand in Fig. 3. It is observed that Algorithm 2 has a fast convergence for the large network of Fig. 2 with paths. Numerical results show that the number of required iterations for Algorithm 3 to converge for the simulation setting described above is at most . The CPU time for Algorithm 3 is measured and is given in Table I.

First, let us assume that transmission rates on downlinks are deterministic functions of bandwidth in APs. Therefore, no outage (rate loss) is considered. For each downlink, the transmission rate and the allocated bandwidth are connected to each other as $r_k^p = \delta_k^p t_k^p$, where $\delta_k^p$ is the spectral efficiency of the downlink of path $p$ to serve user $k$. Furthermore, suppose that $\sigma_k = 3.$ and the capacity of each backhaul link listed previously is divided by . When the bandwidth budget of each AP increases from MHz to MHz, the aggregate reserved rates for users by Algorithm 4 (multi-path) and the single-path approach are shown in Fig. 4a. The aggregate expected supportable rates of users with both approaches are depicted in Fig. 4b. It is observed that Algorithm 4 outperforms the single-path approach. Both approaches utilize all available bandwidth in APs.

Figure 4: (a) Reserved rates by Algorithm 4 (multi-path) and the single-path approach when wireless channels are deterministic. (b) The expected supportable rates for users by Algorithm 4 and the single-path approach when wireless channels are deterministic. (c) The objective function of problem (5) with the single-path approach and Algorithm 4 when wireless channels are stochastic. (Horizontal axis in all panels: access point bandwidth budget in MHz.)

Consider the case where the distribution of each wireless channel (downlink) achievable rate follows (31) and backhaul link capacities are as listed previously. Suppose that the available bandwidth in each AP increases by a step size of MHz, where $\theta_k = 1/$ and $\sigma_k = 0.$ . The objective function values of the problem in (5) obtained by Algorithm 4 and the single-path approach are compared in Fig. 4c. Our proposed Algorithm 4 outperforms the single-path approach. It is observed that as the mean of $\eta_k$ and the AP bandwidth budget increase, the objective function increases.

The expected supportable demands of users, depicted in Fig. 5a, increase when the mean of $\eta_k$ and the AP bandwidth budget increase. It is observed from Fig. 5a that the aggregate expected supportable traffic for users obtained by Algorithm 4 is greater than that obtained by the single-path approach. In Fig. 5b, we observe that the aggregate reserved rates for users increase as the mean of $\eta_k$ increases. Furthermore, they increase when the bandwidth budgets of APs increase. From Fig. 6a, we observe that the aggregate expected outage increases as the mean of $\eta_k$ increases and decreases when the AP bandwidth budget increases. We observe from Fig. 6b that the bandwidth reservation by Algorithm 4 is almost equal to that by the single-path approach. Numerical results show that iterations are sufficient for the convergence of Algorithm 4.

Figure 5: Stochastic wireless channels: performance of the single-path approach and Algorithm 4 in terms of (a) the aggregate expected supportable traffic; and (b) aggregate reserved rates. (Horizontal axis: access point bandwidth budget in MHz.)

Figure 6: Stochastic wireless channels: performance of the single-path approach and Algorithm 4 in terms of (a) expected outage of downlinks; and (b) reserved bandwidth. (Horizontal axis: access point bandwidth budget in MHz.)

Next, we evaluate the performance of Algorithm 4 against the average-based approach when both the demand and downlink achievable rates are stochastic. The average-based algorithm is oblivious to the user demand and the downlink achievable rate distributions. It only considers the average of each user demand and the average achievable rate of a downlink. The average-based approach uses the same set of paths used by Algorithm 4. The bandwidth budget in each AP is 40 MHz. Furthermore, $\sigma_k = 0.$ and $\theta_k = 1/$ . The demand and downlink achievable rate distributions are as given in (30) and (31), respectively. Both approaches are set to make reservations for users assuming the mean of $\eta_k$ is Mnats/s. We generate scenarios in which user demands and downlink capacities are random. For each scenario, we measure how much of the user demands is satisfied using the reserved resources in the network by both approaches. After collecting results for scenarios, we plot the empirical CDF for the supply demand ratio in Fig. 7. It is observed that when the mean of demand exceeds what it was supposed to be, the resource reservation made by Algorithm 4 is more robust and supports random demands better. The total reserved link capacity in the backhaul by Algorithm 4 is . × Mnats/s and is . × Mnats/s by the average-based approach. Furthermore, the total reserved bandwidth in RAN by Algorithm 4 is . × MHz and is . × MHz by the average-based approach.

Figure 7: The probability of being able to support the user demands up to a certain percentage. (Horizontal axis: supply demand ratio; vertical axis: probability; curves: Algorithm 4 and the average-based approach for mean demand parameters of 2, 2.2, 2.4, and 2.6 Mnats/s.)

VI. CONCLUDING REMARKS AND FUTURE DIRECTIONS

In this paper, we studied link capacity and transmissionresource reservation in wireless data networks prior to theobservation of user demands. Using the statistics of userdemands and achievable rates of downlinks, we formulated anoptimization problem to maximize the sum of user expectedsupportable trafﬁc while minimizing the expected outage ofdownlinks. We demonstrated that this problem is non-convex in general. To solve the problem approximately, an efﬁcientBCD approach is proposed which beneﬁts from distributedand parallel computation when each block of variables ischosen to be updated. We demonstrated that despite the non-convexity of the problem, our proposed approach converges toa KKT solution to the problem. We veriﬁed the efﬁciency andthe efﬁcacy of our proposed approach against two heuristicalgorithms developed for joint resource reservation in thebackhaul and RAN.In future work, we consider multi-tenant networks andreservation-based network slicing. In addition to users, ten-ants have different requirements [49], and maximum isolationbetween sliced resources should be enforced [50]. The demanddistribution of users may change over time and the networkresources should be sliced for tenants accordingly. However,the slice reconﬁguration for each tenant involves cost andoverhead. Based on the cost of reconﬁguration and newlyarrived statistics, we formulate the problem from a sparse op-timization perspective and propose an efﬁcient approach basedon iteratively solving a sequence of group Least AbsoluteShrinkage and Selection Operator (LASSO) problems [49].R

EFERENCES[1] N. Reyhanian, H. Farmanbar, and Z.-Q. Luo, “Resource reservationin backhaul and radio access network with uncertain user demands,”in

Proc. IEEE Signal Process. Adv. Wireless Commun. (SPAWC), May 2020, pp. 1–5.
[2] S. Albasheir and M. Kadoch, “Enhanced control for adaptive resource reservation of guaranteed services in LTE networks,” IEEE Internet Things J., vol. 3, no. 2, pp. 179–189, Apr. 2015.
[3] K. Kaur, A. Dua, A. Jindal, N. Kumar, M. Singh, and A. Vinel, “A novel resource reservation scheme for mobile PHEVs in V2G environment using game theoretical approach,” IEEE Trans. Veh. Technol., vol. 64, no. 12, pp. 5653–5666, Dec. 2015.
[4] E. Van Den Berg, T. Zhang, J. Chennikara, P. Agrawal, and T. Kodama, “Time series-based localized predictive resource reservation for handoff in multimedia wireless networks,” in Proc. IEEE Int. Conf. Commun., 2001, vol. 2, pp. 346–350.
[5] R. E. Gomory and T. C. Hu, “An application of generalized linear programming to network flows,” J. Soc. Ind. Appl. Math., vol. 10, no. 2, pp. 260–283, 1962.
[6] R. Dai, L. Li, S. Wang, and X. Zhang, “Planning traffic-oblivious survivable WDM networks using differentiated reliable partial SRLG-disjoint protection,” in Proc. Sym. Photon. Optoelectronics, Jun. 2010, pp. 1–7.
[7] P. Kumar, Y. Yuan, C. Yu, N. Foster, R. Kleinberg, and R. Soulé, “Kulfi: Robust traffic engineering using semi-oblivious routing,” arXiv preprint arXiv:1603.01203, 2016.
[8] C. Cicconetti, V. Gardellin, L. Lenzini, E. Mingozzi, and A. Erta, “End-to-end bandwidth reservation in IEEE 802.16 mesh networks,” in Proc. IEEE Int. Conf. Mobile Adhoc and Sensor Syst., Oct. 2007, pp. 1–6.
[9] D. Applegate and E. Cohen, “Making routing robust to changing traffic demands: algorithms and evaluation,” IEEE/ACM Trans. Netw., vol. 14, no. 6, pp. 1193–1206, Dec. 2006.
[10] N. Moehle, X. Shen, Z.-Q. Luo, and S. Boyd, “A distributed method for optimal capacity reservation,” J. Optim. Theory Appl., vol. 182, no. 3, pp. 1130–1149, May 2019.
[11] D. Ma, B. Sheng, S. Jin, X. Ma, and P. Gao, “Short-term traffic flow forecasting by selecting appropriate predictions based on pattern matching,” IEEE Access, vol. 6, pp. 75629–75638, Nov. 2018.
[12] M. Yan, G. Feng, J. Zhou, Y. Sun, and Y.-C. Liang, “Intelligent resource scheduling for 5G radio access network slicing,” IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 7691–7703, Jun. 2019.
[13] Q. He, A. Moayyedi, G. Dán, G. P. Koudouridis, and P. Tengkvist, “A meta-learning scheme for adaptive short-term network traffic prediction,” IEEE J. Sel. Areas Commun., Jun. 2020.
[14] L. U. Khan, I. Yaqoob, N.-H. Tran, Z. Han, and C.-S. Hong, “Network slicing: Recent advances, taxonomy, requirements, and open research challenges,” IEEE Access, vol. 8, pp. 36009–36028, Feb. 2020.
[15] J. Prados-Garzon, A. Laghrissi, M. Bagaa, T. Taleb, and J. M. Lopez-Soler, “A complete LTE mathematical framework for the network slice planning of the EPC,” IEEE Trans. Mobile Comput., vol. 19, no. 1, pp. 1–14, Jan. 2020.
[16] Y. Li, J. Liu, B. Cao, and C. Wang, “Joint optimization of radio and virtual machine resources with uncertain user demands in mobile cloud computing,” IEEE Trans. Multimedia, vol. 20, no. 9, pp. 2427–2438, Sep. 2018.
[17] T. Hößler, P. Schulz, E. A. Jorswieck, M. Simsek, and G. P. Fettweis, “Stable matching for wireless URLLC in multi-cellular, multi-user systems,” IEEE Trans. Commun., vol. 68, no. 8, pp. 5228–5241, Aug. 2020.
[18] D. P. Bertsekas, Linear network optimization: algorithms and codes, MIT press, 1991.
[19] J. Tsitsiklis and D. Bertsekas, “Distributed asynchronous optimal routing in data networks,” IEEE Trans. Autom. Control, vol. 31, no. 4, pp. 325–332, Apr. 1986.
[20] D. P. Bertsekas, Network optimization: continuous and discrete models, Athena Scientific, Belmont, MA, 1998.
[21] H. Zhang and V. W. S. Wong, “A two-timescale approach for network slicing in C-RAN,” IEEE Trans. Veh. Technol., vol. 69, no. 6, pp. 6656–6669, Jun. 2020.
[22] L. Li and A. J. Goldsmith, “Capacity and optimal resource allocation for fading broadcast channels. II. outage capacity,” IEEE Trans. Inf. Theory, vol. 47, no. 3, pp. 1103–1127, Mar. 2001.
[23] X. Liao, J. Shi, Z. Li, L. Zhang, and B. Xia, “A model-driven deep reinforcement learning heuristic algorithm for resource allocation in ultra-dense cellular networks,” IEEE Trans. Veh. Technol., vol. 69, no. 1, pp. 983–997, Jan. 2019.
[24] V. Sciancalepore, X. Costa-Perez, and A. Banchs, “RL-NSB: Reinforcement learning-based 5G network slice broker,” IEEE/ACM Trans. Netw., vol. 27, no. 4, pp. 1543–1557, Aug. 2019.
[25] Y. Liang and V. V. Veeravalli, “Gaussian orthogonal relay channels: Optimal resource allocation and capacity,” IEEE Trans. Inf. Theory, vol. 51, no. 9, pp. 3284–3289, Sep. 2005.
[26] S.-J. Kim and G. B. Giannakis, “Optimal resource allocation for MIMO ad-hoc cognitive radio networks,” IEEE Trans. Inf. Theory, vol. 57, no. 5, pp. 3117–3131, May 2011.
[27] W.-C. Liao, M. Hong, Y.-F. Liu, and Z.-Q. Luo, “Base station activation and linear transceiver design for optimal resource management in heterogeneous networks,” IEEE Trans. Signal Process., vol. 62, no. 15, pp. 3939–3952, Jul. 2014.
[28] L. Xiao, M. Johansson, and S. P. Boyd, “Simultaneous routing and resource allocation via dual decomposition,” IEEE Trans. Commun., vol. 52, no. 7, pp. 1136–1144, Jul. 2004.
[29] A. A. El-Sherif and A. Mohamed, “Joint routing and resource allocation for delay minimization in cognitive radio based mesh networks,” IEEE Trans. Wireless Commun., vol. 13, no. 1, pp. 186–197, Jan. 2014.
[30] K. Wang, K. Yang, and C. S. Magurawalage, “Joint energy minimization and resource allocation in C-RAN with mobile cloud,” IEEE Trans. Cloud Comput., vol. 6, no. 3, pp. 760–770, Sep. 2018.
[31] S. Matoussi, I. Fajjari, S. Costanzo, N. Aitsaadi, and R. Langar, “5G RAN: Functional split orchestration optimization,” IEEE J. Sel. Areas Commun., vol. 38, no. 7, pp. 1448–1463, Jul. 2020.
[32] J. Liu, Y. Pang, H. Ding, L. Cai, H. Zhang, and Y. Fang, “Optimizing IoT Energy Efficiency on Edge (EEE): a cross-layer design in a cognitive mesh network,” arXiv preprint arXiv:1901.05494, 2019.
[33] H. Kordbacheh, H. Dalili Oskouei, and N. Mokari, “Robust cross-layer routing and radio resource allocation in massive multiple antenna and OFDMA-based wireless ad-hoc networks,” IEEE Access, vol. 7, pp. 36527–36539, Mar. 2019.
[34] K. Karakayali, J. H. Kang, M. Kodialam, and K. Balachandran, “Joint resource allocation and routing for OFDMA-based broadband wireless mesh networks,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2007, pp. 5088–5092.
[35] S. Choudhury and J. D. Gibson, “Information transmission over fading channels,” in Proc. IEEE Global Commun. Conf., Nov. 2007, pp. 3316–3321.
[36] D. Wu and R. Negi, “Effective capacity: a wireless link model for support of quality of service,” IEEE Trans. Wireless Commun., vol. 2, no. 4, pp. 630–643, Jul. 2003.
[37] O. Ertug, “Asymptotic ergodic capacity of multidimensional vector-sensor array MIMO channels,” IEEE Trans. Wireless Commun., vol. 7, no. 9, pp. 3297–3300, Sep. 2008.
[38] W.-C. Liao, M. Hong, H. Farmanbar, and Z.-Q. Luo, “A distributed semiasynchronous algorithm for network traffic engineering,” IEEE Trans. Signal Inf. Process. Netw., vol. 4, no. 3, pp. 436–450, Sep. 2018.
[39] H. K. Nguyen, Y. Zhang, Z. Chang, and Z. Han, “Parallel and distributed resource allocation with minimum traffic disruption for network virtualization,” IEEE Trans. Commun., vol. 65, no. 3, pp. 1162–1175, Mar. 2017.
[40] Z. Wu, Z. Fei, Y. Yu, and Z. Han, “Toward optimal remote radio head activation, user association, and power allocation in C-RANs using benders decomposition and ADMM,” IEEE Trans. Commun., vol. 67, no. 7, pp. 5008–5023, Jul. 2019.
[41] D. P. Palomar and M. Chiang, “A tutorial on decomposition methods for network utility maximization,” IEEE J. Sel. Areas Commun., vol. 24, no. 8, pp. 1439–1451, Aug. 2006.
[42] P. Kemmer, A. K. Strauss, and T. Winter, “Dynamic simultaneous fare proration for large-scale network revenue management,” J. Oper. Res. Soc., vol. 63, no. 10, pp. 1336–1350, 2012.
[43] M. Razaviyayn, M. Hong, and Z.-Q. Luo, “A unified convergence analysis of block successive minimization methods for non-smooth optimization,” SIAM J. Optim., vol. 23, no. 2, pp. 1126–1153, 2013.
[44] S. Boyd and L. Vandenberghe, Convex optimization, Cambridge university press, 2004.
[45] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 3rd ed., 2016.
[46] M. Hong, M. Razaviyayn, Z.-Q. Luo, and J.-S. Pang, “A unified algorithmic framework for block-structured optimization involving big data: With applications in machine learning and signal processing,” IEEE Signal Process. Mag., vol. 33, no. 1, pp. 57–77, Dec. 2015.
[47] C. Wolverton and T. V. Wagner, “Asymptotically optimal discriminant functions for pattern classification,” IEEE Trans. Inf. Theory, vol. 15, no. 2, pp. 258–265, Mar. 1969.
[48] F. Comte and N. Marie, “Bandwidth selection for the Wolverton–Wagner estimator,” J. Stat. Planning Inference, vol. 207, pp. 198–214, 2020.
[49] N. Reyhanian, H. Farmanbar, and Z.-Q. Luo, “Data-driven adaptive network resource slicing for multi-tenant networks,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process (ICASSP), Jun. 2021.
[50] N. Reyhanian and B. Maham, “Statistical slice selection in multi-tenant networks with maximum isolation of reserved resources,” in Proc. 54th Asilomar Conf. Signals, Syst. Comput., Pacific Grove, CA, Nov. 2020.

APPENDIX
BLOCK SUCCESSIVE UPPER-BOUND MINIMIZATION

The notation in this Appendix follows [43] and is unrelated to the notation defined in the rest of the paper. According to [43, Theorem 2], when the upper bound satisfies four conditions, the sequence generated by the BSUM algorithm converges to a stationary point of the problem. Here, we give a brief description of the BSUM approach. Suppose that $u_i(x_i, x^{t-1})$ is an upper bound for an arbitrary objective function $f(x)$ at the point $x^{t-1}$. In iteration $t$, one selected block (say, block $i$) is optimized by solving the following subproblem:

$$\min_{x_i} \; u_i(x_i, x^{t-1}) \quad \text{s.t.} \quad x_i \in \mathcal{X}_i, \tag{32}$$

where $\mathcal{X}_i$ is the feasible set of block $x_i$. The conditions on the upper bound are listed in [43, Assumption 2] as follows:

1) $u_i(y_i, y) = f(y), \ \forall y \in \mathcal{X}, \ \forall i$,
2) $u_i(x_i, y) \geq f(y_1, \ldots, y_{i-1}, x_i, y_{i+1}, \ldots, y_n), \ \forall x_i \in \mathcal{X}_i, \ \forall y \in \mathcal{X}, \ \forall i$,
3) $u_i'(x_i, y; d_i)\big|_{x_i = y_i} = f'(y; d), \ \forall d = (0, \ldots, d_i, \ldots, 0) \ \text{s.t.} \ y_i + d_i \in \mathcal{X}_i, \ \forall i$,
4) $u_i(x_i, y)$ is continuous in $(x_i, y), \ \forall i$.

When problem (32) is solved sequentially for different $i$ and each subproblem has a unique solution, $x^t$ converges to a KKT point of $f(x)$.
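To make the block update in (32) concrete, the following minimal Python sketch applies cyclic BSUM to a toy problem. Everything in it is an illustrative assumption rather than part of the paper: the objective $f(x) = \frac{1}{2}\|Ax - b\|^2$ with box constraints, the scalar blocks, the function name `bsum_box_least_squares`, and the parameters `slack` and `sweeps`. The surrogate is the quadratic upper bound $u_i(x_i, y) = f(y) + \nabla_i f(y)(x_i - y_i) + \frac{L_i}{2}(x_i - y_i)^2$ with $L_i = \texttt{slack} \cdot \|a_i\|^2$ and $\texttt{slack} \geq 1$. It satisfies the four conditions above and has a unique minimizer over each interval.

```python
from typing import List, Sequence, Tuple


def bsum_box_least_squares(
    A: Sequence[Sequence[float]],
    b: Sequence[float],
    lo: float = 0.0,
    hi: float = 1.0,
    sweeps: int = 200,
    slack: float = 1.5,
) -> Tuple[List[float], List[float]]:
    """Cyclic BSUM sketch for min 0.5*||A x - b||^2 s.t. lo <= x_i <= hi.

    Each block is one coordinate. The surrogate for block i at point y is
    u_i(x_i, y) = f(y) + g_i*(x_i - y_i) + 0.5*L_i*(x_i - y_i)**2,
    with g_i the i-th partial derivative of f at y (toy assumption).
    """
    if slack < 1.0:
        raise ValueError("slack must be >= 1 so that u_i upper-bounds f")
    m, n = len(A), len(A[0])
    x = [lo] * n  # feasible starting point
    # Residual r = A x - b, updated incrementally after every block update.
    r = [sum(A[k][j] * x[j] for j in range(n)) - b[k] for k in range(m)]
    # L_i is at least the curvature of f along coordinate i.
    L = [slack * sum(A[k][i] ** 2 for k in range(m)) for i in range(n)]
    history = [0.5 * sum(v * v for v in r)]  # objective after each sweep

    for _ in range(sweeps):
        for i in range(n):
            if L[i] == 0.0:
                continue  # f does not depend on x_i; leave it unchanged
            g = sum(A[k][i] * r[k] for k in range(m))
            # Unique minimizer of the surrogate over [lo, hi].
            new_xi = min(hi, max(lo, x[i] - g / L[i]))
            delta = new_xi - x[i]
            if delta != 0.0:
                for k in range(m):
                    r[k] += A[k][i] * delta
                x[i] = new_xi
        history.append(0.5 * sum(v * v for v in r))
    return x, history


if __name__ == "__main__":
    x_star, hist = bsum_box_least_squares([[1.0, 1.0], [0.0, 1.0]], [1.0, 3.0])
    print("solution:", x_star)
    print("first and last objective:", hist[0], hist[-1])
```

Because each surrogate touches the objective at the current point and lies above it elsewhere, the objective values in `history` never increase from one sweep to the next, up to floating-point rounding. This toy problem is convex, so the KKT point reached is also a global minimizer; for a non-convex objective such as the one studied in the paper, only convergence to a KKT point can be expected.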