Blind Optimal User Association in Small-Cell Networks
Livia Elena Chatzieleftheriou, Apostolos Destounis, Georgios Paschos, Iordanis Koutsopoulos
Livia Elena Chatzieleftheriou
Athens University of Economics and Business
Georgios Paschos
Amazon
Apostolos Destounis
Huawei Technologies
Iordanis Koutsopoulos
Athens University of Economics and Business
Abstract
We learn optimal user association policies for traffic from different locations to Access Points (APs), in the presence of unknown dynamic traffic demand. We aim at minimizing a broad family of $\alpha$-fair cost functions that express various objectives in load assignment in the wireless downlink, such as total load or total delay minimization. Finding an optimal user association policy in dynamic environments is challenging because traffic demand fluctuations over time are non-stationary and difficult to characterize statistically, which obstructs the computation of cost-efficient associations. Assuming arbitrary traffic patterns over time, we formulate the problem of online learning of optimal user association policies using the Online Convex Optimization (OCO) framework. We introduce a periodic benchmark for OCO problems that generalizes state-of-the-art benchmarks. We exploit inherent properties of the online user association problem and propose PerOnE, a simple online learning scheme that dynamically adapts the association policy to arbitrary traffic demand variations. We compare PerOnE against our periodic benchmark and prove that it enjoys the no-regret property, with an additional sublinear dependence on the network size. To the best of our knowledge, this is the first work that introduces a periodic benchmark for OCO problems and a no-regret algorithm for the online user association problem. Our theoretical findings are validated through results on a real-trace dataset.

To appear in IEEE International Conference on Computer Communications (INFOCOM), 10-13 May 2021, Virtual Conference.

I. INTRODUCTION
Communication networks in the Beyond 5G (B5G)/6G era are envisioned to support ultra-low-latency and bandwidth-demanding services, such as those enabled by the Internet of Things (IoT) or autonomous vehicles. Two key technological enablers of such services in future communication networks are novel network architectures and the embedded use of Artificial Intelligence (AI) [1]. The new architectures will generalize Coordinated MultiPoint transmission (CoMP), where APs cooperate to jointly serve requests within their coverage area, and each user's traffic may be served by more than one AP. The pervasive introduction of AI at the network edge, including distributed algorithms for proactive learning and prediction of unknown dynamic processes in the system, will enable the self-optimization of network resource allocation.

In the envisioned ultra-dense wireless networks, devices will be in range of multiple Access Points (APs). These enhanced association possibilities will bring more degrees of freedom and additional possibilities for optimization. The numerous devices and association alternatives call for a fast and agile user-to-AP association scheme. This is of vital importance for the upcoming bandwidth-demanding services, especially in the downlink, which carries the majority of traffic. Moreover, traffic demand at different locations fluctuates heavily during the day. This could happen, for example, due to sudden changes in the existing sources, or due to new, unpredictable sources of traffic. Thus, traffic is generally non-stationary during the day, which complicates its accurate statistical characterization and precludes the use of approaches that operate under stationary regimes, such as Lyapunov optimization.

In this work we perform blind user associations on the fly, without any assumption or information about the actual traffic demand.
We use the Online Convex Optimization (OCO) framework to produce updated solutions incrementally, by readjusting existing ones as new samples are observed. We consider arbitrarily time-varying traffic demand for different locations and allocate it to APs, which we model as queues that capture their own load. The association policy determines the load of these queues, and its cost belongs to a broad family of $\alpha$-fair functions of the AP loads, which includes several objectives as special cases, such as delay or load minimization. We aim at producing association policies that minimize regret, i.e., the deviation of the cost of our online association from that of the optimal offline association that knows the traffic variations in hindsight. We introduce OPS, a novel periodic benchmark that generalizes the state of the art, and then propose PerOnE: an online algorithm that quickly adapts to unpredictable traffic variations and learns scalable and asymptotically optimal user association policies for downlink traffic routing to different locations.

A. Contributions

The contributions of our work to the literature are as follows:
• We provide a model and an OCO formulation for the problem of Online Learning (OL) of how to dynamically associate the traffic of geographic locations (and therefore users) to APs. Our objective cost function models various targets for communication networks, such as AP load or delay minimization.
• We introduce Optimal Periodic Static (OPS), a novel periodic benchmark for OL problems that generalizes the state of the art. When traffic is periodic, a benchmark that uses the same association throughout the day is not suitable. OPS is an appropriate benchmark to compare against, because the optimal policy will most likely be periodic as well.
• We identify and exploit inherent properties of the online problem and design PerOnE, an efficient online algorithm that produces cost-effective association policies under arbitrary changes in traffic demand, without information about the actual generated traffic or its statistical properties. PerOnE stems from Online Mirror Descent.
• We prove PerOnE's asymptotic optimality: its regret against OPS, which knows the traffic variations in hindsight, is sublinear in the time horizon. Further, PerOnE's regret also scales sublinearly with the network size, which renders it a valid association scheme for the upcoming large wireless networks.
• Our evaluation with publicly available traffic traces confirms the derived analytical results, showing that the average regret of our algorithm vanishes asymptotically. In fact, its performance appears to be near-optimal with respect to a dynamic algorithm that chooses the optimal user association in each time slot.

In Section II we present the state of the art. In Section III we describe the model and the static user association problem. In Section IV we introduce and analyse OPS. In Section V we transform the static formulation into an OCO formulation. We then design PerOnE and prove that its regret against OPS is sublinear both in the time horizon and in the problem dimension. Finally, in Section VI we evaluate our scheme on a real traffic dataset.

II. RELATED WORK
User association (UA).
A widely adopted optimization framework is Network Utility Maximization (NUM) [2], which has also been exemplified for AP association. It considers a broad family of convex utility functions of the APs' load, capturing a variety of objectives, such as load balancing. The following works also consider convex cost functions. In [3] an iterative, distributed and deterministic UA policy that is asymptotically optimal for NUM is presented. The authors in [4] propose an exponentiated gradient algorithm for NUM, proving its convergence rate to the optimal UA. In [5] load balancing across APs is considered, and iterative and combinatorial algorithms that perform local adjustments are presented. In [6] dynamic load balancing is studied by capturing the system state with fluid equations, and a simple, asymptotically optimal myopic strategy is presented. The authors in [7] predict future traffic from the traffic history using robust optimization tools, and propose an iterative UA technique that minimizes costs.

UA is considered jointly with channel assignment in [8] to minimize the number of channels needed to serve users. After applying an iterative load balancing algorithm, the problem reduces to a simple channel allocation problem. The work [9] additionally considers transmission power, quantifying the limits of the achievable gains. In [10] and [11] UA is considered jointly with content caching for cache hit ratio maximization, and low-complexity practical schemes are presented. The fast-converging scheme of [10] iterates between UA and content caching, while in [11] users are initially clustered based on their content preferences, and then clusters are assigned to APs. The work [12] additionally considers content recommendation.
A simple three-step scheme that sequentially performs a preference-aware UA with service guarantees, a recommendation-aware cache placement, and an adjustment of content recommendations reveals the gains that can be achieved when UA is considered jointly with content caching and recommendations (as introduced in [13]). Despite their interesting results, works [2]–[5], [8]–[12] consider only static UA instances, work [6] focuses on load balancing, and work [7] performs complex computations on the historical traffic.
OCO theory.
The goal in OCO is the minimization of regret against a static benchmark, where regret is the worst-case deviation of the performance of an online algorithm from that of the optimal algorithm that knows all data in hindsight, but is restricted to a single action for the entire time horizon $T$. The following works consider convex and Lipschitz-continuous objective functions, adversarial constraints, and decisions taken over a convex set. In [14] a general class of Online Gradient Ascent (OGA) algorithms with $O(\sqrt{T})$ regret is introduced. The authors in [15] substitute OGA's projection with a Frank-Wolfe linear optimization step, achieving $O(\sqrt{T})$ regret for stochastic and adversarial costs. In [16] time-varying stochastic constraints under a stochastic Slater assumption are studied, and a drift-plus-penalty algorithm with $O(\sqrt{T})$ expected regret is presented. In [17] regret is systematically balanced with constraint violation: combining stochastic optimization [16] and standard OCO [14] methods, $O(KT/V + \sqrt{T})$ regret for $O(\sqrt{VT})$ constraint violation is achieved, where $K = T^k$, $k \in [0, 1)$, and $V \in [K, T)$. These works do not consider the dimension of the problem in their solutions, which in our case is the size of the network, and either consider no constraints [14], or rely on heavier assumptions on the input [16], [17].

Fig. 1. At each time slot $t$, location $i \in \mathcal{I}$ requests traffic with intensity $\lambda_i(t)$, which can be split into portions $\pi_{ji}(t)\lambda_i(t)$ and served by different APs $j \in \mathcal{N}_i$ in its neighbourhood.

OCO in network resource allocation.
The authors in [18] study online content caching under unknown file popularity. Their no-regret algorithm adapts caching and routing decisions to any file request pattern. In [19] an asymptotically optimal online learning algorithm for video rate adaptation in HTTP Adaptive Streaming under no channel model assumptions is presented. The work [20] studies network power and bandwidth allocation under adversarial costs with bounded variations in consecutive slots. Constraints are satisfied on average, tolerating instantaneous violations. Under an additional Slater assumption, their algorithm achieves sublinear regret against a benchmark that takes the optimal decision in each time slot.

Our work is the first one that applies OCO to the minimum-cost UA problem. Our scheme achieves no regret in UA decisions, with sublinear dependence both on the time horizon and on the network size, under no assumptions on the input. Our work also introduces a novel periodic benchmark that generalizes the state of the art.

III. SYSTEM MODEL AND PROBLEM FORMULATION
Basic definitions.
We start by providing some definitions and function properties that are needed throughout the paper. Although we later consider differentiable cost functions, the results of this paper also hold for non-differentiable convex cost functions, with $\nabla f(x)$ standing for a subgradient of $f(\cdot)$ at point $x$.
• Convexity. A function $f(x): A \to B$ is convex iff $\forall x_1, x_2 \in A$,
$$f(x_1) - f(x_2) \le \langle \nabla f(x_1), x_1 - x_2 \rangle,$$
where $\langle a, b \rangle$ is the inner product of $a$ and $b$. For a twice-differentiable function on an open convex set, convexity is equivalent to its Hessian matrix being positive semidefinite everywhere.
• p-norm and its dual norm. Let $x \in \mathbb{R}^d$. Its $p$-norm is defined as
$$\|x\|_p := \Big( \sum_{i=1}^{d} |x_i|^p \Big)^{1/p}.$$
A $q$-norm is said to be the dual of the $p$-norm iff $1/p + 1/q = 1$.
• Lipschitz-continuity. A function $f(x): A \to B$ is Lipschitz-continuous iff the $p$-norm of its gradient is bounded, i.e., iff $\exists L < +\infty$ such that $\|\nabla f(x)\|_p \le L$, $\forall x \in A$; then $|f(x_1) - f(x_2)| \le L \|x_1 - x_2\|_q$ for the dual $q$-norm.
• Strong convexity. A function $f(x): A \to B$ is $\sigma$-strongly convex w.r.t. a $p$-norm iff $\forall x_1, x_2 \in A$,
$$f(x_1) \ge f(x_2) + \langle \nabla f(x_2), x_1 - x_2 \rangle + \frac{\sigma}{2} \|x_1 - x_2\|_p^2.$$

Model components.
We consider downlink transmissions in a geographical area that is partitioned into locations $i \in \mathcal{I}$ and is covered by a set $\mathcal{J}$ of APs. We define the "neighbourhood" set $\mathcal{N}_j$, $j \in \mathcal{J}$, as the subset of locations that can be served by AP $j$. Similarly, $\mathcal{N}_i$, $i \in \mathcal{I}$, is the subset of APs that can serve the traffic of location $i$. An overview of our model and the relevant notation are given in Fig. 1 and Table I, respectively.

Location traffic.
We denote as $\lambda = (\lambda_i)_{i \in \mathcal{I}}$ the traffic intensity vector. Each element $\lambda_i \ge 0$ is the aggregate amount (in packets/second) of requested traffic of all users in location $i$, modeled as a random variable from a general distribution.

Access Point (AP) load.
The traffic requested by a location $i$ can be served by multiple APs, namely those in $\mathcal{N}_i$. An association policy determines the association control variables $\pi_{ji} \in [0, 1]$, denoting the fraction of traffic $\lambda_i$ which is routed from AP $j$ to location $i$. Each location's demand must be entirely served, so its association variables are constrained to lie in the probability simplex. Thus, $\forall i \in \mathcal{I}$, $(\pi_{ji})_{j \in \mathcal{J}} \in \Pi$, where:
$$\Pi = \Big\{ x \in [0, 1]^{|\mathcal{J}|} : \sum_{j \in \mathcal{J}} x_j = 1 \Big\}. \tag{1}$$
Following an association decision $\pi = (\pi_{ji})_{j \in \mathcal{J}, i \in \mathcal{I}}$, AP $j$ transmits an aggregate demand intensity $\sum_{i \in \mathcal{N}_j} \lambda_i \pi_{ji}$. The packet transmission process at each AP is modeled as a queuing process. Prior work [3] has shown that statistical multiplexing effects can be captured by modeling this queue with processor sharing service. Assuming the packets have exponentially distributed sizes with mean $1/\omega$, and denoting as $C_{ji}$ the average transmission rate from AP $j$ to location $i$ (averaged over the channel statistics), the load $\rho_j$ of AP $j$ is
$$\rho_j(\pi, \lambda) = \sum_{i \in \mathcal{N}_j} \frac{\lambda_i \pi_{ji}}{\omega C_{ji}}.$$
Let $\rho = (\rho_j)_{j \in \mathcal{J}}$. The AP traffic load measures the fraction of time the AP is busy with packet transmission. When $\rho_j < 1$, AP $j$ is stable in the sense that its packet transmission queue does not grow unbounded. Values close to 1 indicate large delays. If $\rho_j > 1$, the AP queue is unstable and grows without limit, which results in infinite delays and bad user experience, and therefore must be avoided. To ensure stability and a high-quality service in terms of delay for the end-users, association decisions $\pi$ are constrained so that:
$$\rho_j(\pi, \lambda) \le \bar{\rho}, \quad \forall j \in \mathcal{J}, \tag{2}$$
where $\bar{\rho} \in (0, 1)$ is a load threshold. Combining (1) and (2), the feasible set for the association variables is:
$$\Omega = \Big\{ \pi \in \Pi^{|\mathcal{I}|} : \sum_{i \in \mathcal{N}_j} \frac{\lambda_i \pi_{ji}}{\omega C_{ji}} \le \bar{\rho}, \ \forall j \in \mathcal{J} \Big\}. \tag{3}$$

TABLE I
NOTATION TABLE
Model
$\mathcal{I}$: Set of locations $i$
$\mathcal{J}$: Set of APs $j$
$\mathcal{N}_i$: Set of neighbour APs for location $i$
$\mathcal{N}_j$: Set of neighbour locations for AP $j$
$\lambda_i$: Intensity of traffic requested at $i$
$\pi_{ji}$: Fraction of $\lambda_i$ routed to $i$ by $j$
$\rho_j$: Total load at AP $j$
$\bar{\rho}$: Load threshold
$\Pi$: Probability simplex
$\Omega$: Feasibility set
$\phi_\alpha(\cdot)$: Cost function
$T$: Time horizon
$t$: Time slot
$K$: Number of zones in each period
$\mathcal{W}_k$: Time window: set of slots in zone $k$
Equivalent problem formulation and association algorithm
$V(\cdot)$: Penalty-featured costs
$\Omega'$: Extended feasibility set
$L$: Lipschitz constant for $V$
$h(\cdot)$: Regularization function
$g(\cdot)$: Mirror function
$\Theta$: Matrix with gradient information
$t^\tau_k$: $\tau$-th time slot in window $\mathcal{W}_k$

Cost function.
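Let $\phi(\pi, \lambda)$ be the system cost as a result of association policy $\pi$ under traffic $\lambda$. Our cost functions belong to the following family of functions [2], which are convex and Lipschitz-continuous in $\pi$ for $\rho_j \le \bar{\rho}$, $\forall j$, and $\alpha \ge 0$:
$$\phi_\alpha(\pi, \lambda) = \sum_{j \in \mathcal{J}} \phi^j_\alpha(\pi, \lambda), \tag{4}$$
where
$$\phi^j_\alpha(\pi, \lambda) = \begin{cases} \dfrac{1}{\alpha - 1} \big(1 - \rho_j(\pi, \lambda)\big)^{1 - \alpha}, & \alpha \ne 1, \\ -\log\big(1 - \rho_j(\pi, \lambda)\big), & \alpha = 1. \end{cases} \tag{5}$$
To confirm convexity in $\pi$ when $\rho_j \le \bar{\rho}$, observe that each $\phi^j_\alpha$ is a twice-differentiable function of $\rho_j$ with nonnegative second derivative $\alpha (1 - \rho_j)^{-\alpha - 1}$, and that $\rho_j$ is linear in $\pi$; hence the Hessian matrix of $\phi_\alpha$ with respect to $\pi$ is positive semidefinite, which implies convexity. To confirm Lipschitz-continuity in $\pi$, observe that
$$\frac{\partial \phi_\alpha(\pi, \lambda)}{\partial \pi_{ji}} = \frac{\lambda_i}{\omega C_{ji}} \big(1 - \rho_j(\pi, \lambda)\big)^{-\alpha},$$
which, for bounded traffic intensities, is bounded for all $j, i$ when $\rho_j \le \bar{\rho} < 1$; so is the $p$-norm $\|\nabla \phi_\alpha(\pi, \lambda)\|_p$. We will rely on both the convexity and the Lipschitz-continuity of the cost function to design our online user association algorithm and prove its performance guarantees.

Different values of $\alpha$ lead to different cost functions. For example, for $\alpha = 0$, (4) equals $\sum_{j \in \mathcal{J}} \rho_j(\pi, \lambda) - |\mathcal{J}|$, i.e., the total system load up to an additive constant. For $\alpha = 2$, it is $\phi_2(\pi, \lambda) = \sum_{j \in \mathcal{J}} (1 - \rho_j(\pi, \lambda))^{-1}$, and (4) is equivalent to the average delay experienced by a typical demand flow in a stationary system under a temporal fair scheduler, e.g., round robin [3].

Optimal user association for known demand.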
If the traffic demand vector $\lambda$ is known, the association policy that minimizes the system cost is found by solving the following problem.

Problem 1 (Optimal user association for known demand).
$$\begin{aligned} \min_{\pi} \quad & \phi_\alpha(\pi, \lambda), \\ \text{s.t.} \quad & \sum_{j \in \mathcal{N}_i} \pi_{ji} = 1, \ \pi_{ji} \ge 0, \quad \forall i \in \mathcal{I}, \\ & \rho_j(\pi, \lambda) \le \bar{\rho}, \quad \forall j \in \mathcal{J}. \end{aligned}$$
Problem 1 is the minimization of a convex, Lipschitz-continuous cost function over the intersection of simplex constraints and linear halfspace constraints. Therefore, it can be solved through convex optimization methods [21]. Such instances are already studied in the literature. In this work we focus on online instances, where the traffic demand is unknown at the time of the decision.
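To make the load model (3) and the cost family (4)-(5) concrete, the following minimal Python sketch computes the AP loads $\rho_j$, checks feasibility for Problem 1, and evaluates $\phi_\alpha$ together with its partial derivatives. It is an illustration under assumptions, not the authors' code: the function names, the nested-list layout `pi[j][i]`, and the use of `None` in `C[j][i]` to mark a location outside AP $j$'s neighbourhood are hypothetical choices. The last helper anticipates the penalized per-AP cost $V_j$ of (10), introduced in Section V, for the linear-extension case $\psi = 1$.

```python
import math


def ap_loads(pi, lam, C, omega):
    """rho_j = sum_{i in N_j} lam_i * pi_ji / (omega * C_ji) for every AP j.

    pi[j][i]: fraction of location i's traffic served by AP j.
    C[j][i]:  average rate from AP j to location i, or None when i is not in N_j
              (a hypothetical convention; such entries are skipped).
    """
    loads = []
    for j, row in enumerate(pi):
        rho = 0.0
        for i, frac in enumerate(row):
            if C[j][i] is not None:
                rho += lam[i] * frac / (omega * C[j][i])
        loads.append(rho)
    return loads


def is_feasible(pi, lam, C, omega, rho_bar, tol=1e-9):
    """Check the simplex constraints (1) of every location and the load constraints (2)."""
    for i in range(len(lam)):
        column = [pi[j][i] for j in range(len(pi))]
        if any(x < -tol for x in column) or abs(sum(column) - 1.0) > tol:
            return False
        if any(pi[j][i] > tol and C[j][i] is None for j in range(len(pi))):
            return False  # traffic routed to an AP outside the neighbourhood N_i
    return all(rho <= rho_bar + tol for rho in ap_loads(pi, lam, C, omega))


def phi_ap(rho, alpha):
    """Per-AP alpha-fair cost (5); assumes rho < 1."""
    if alpha == 1:
        return -math.log(1.0 - rho)
    return (1.0 - rho) ** (1.0 - alpha) / (alpha - 1.0)


def alpha_fair_cost(loads, alpha):
    """phi_alpha in (4); returns inf if some AP is unstable (rho_j >= 1)."""
    if any(rho >= 1.0 for rho in loads):
        return math.inf
    return sum(phi_ap(rho, alpha) for rho in loads)


def alpha_fair_grad(pi, lam, C, omega, alpha):
    """d phi_alpha / d pi_ji = lam_i / (omega * C_ji) * (1 - rho_j)^(-alpha); 0 outside N_j."""
    loads = ap_loads(pi, lam, C, omega)
    return [[lam[i] / (omega * C[j][i]) * (1.0 - loads[j]) ** (-alpha)
             if C[j][i] is not None else 0.0
             for i in range(len(lam))]
            for j in range(len(pi))]


def penalized_ap_cost(rho, alpha, rho_bar):
    """Penalized per-AP cost V_j of (10) with psi = 1: (5) up to rho_bar, linear beyond it."""
    if rho <= rho_bar:
        return phi_ap(rho, alpha)
    slope = (1.0 - rho_bar) ** (-alpha)  # derivative of (5) at rho_bar
    return phi_ap(rho_bar, alpha) + slope * (rho - rho_bar)


# Toy instance: AP 0 reaches location 0 only, AP 1 reaches both locations.
lam = [2.0, 4.0]
C = [[10.0, None], [5.0, 10.0]]
pi = [[0.5, 0.0], [0.5, 1.0]]
loads = ap_loads(pi, lam, C, omega=1.0)  # approximately [0.1, 0.6]
print(loads, is_feasible(pi, lam, C, 1.0, rho_bar=0.9))
print(alpha_fair_cost(loads, alpha=0), alpha_fair_cost(loads, alpha=2))
```

In this toy instance, the $\alpha = 0$ value equals the total load minus $|\mathcal{J}| = 2$, and the $\alpha = 2$ value equals $\sum_j (1 - \rho_j)^{-1}$, matching the special cases discussed after (5).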
Time dynamics.
We capture time dynamics by denoting as $\lambda(t)$, $\pi(t)$, and $\phi_\alpha(\pi(t), \lambda(t))$ the traffic vector, association decision, and resulting system cost during time slot $t \in \{1, 2, \dots, T\}$, where $T$ is the time horizon.

IV. A NOVEL PERIODIC BENCHMARK AND REGRET
Adversarial Online Learning.
In realistic conditions, the traffic $\lambda(t)$ for the next time slot is unknown. Therefore, the association decision $\pi(t)$ for slot $t$ is computed based on knowledge of the traffic demands up to $\lambda(t-1)$. After the decision $\pi(t)$ is taken, the actual demand $\lambda(t)$ emerges, and the actual value $\phi_\alpha(\pi(t), \lambda(t))$ of the cost function is revealed. This lack of information when deciding $\pi(t)$ at time slot $t$ may incur additional costs, or even instability of the AP packet transmission queues, due to violation of the load threshold.

Fig. 2. Toy example to demonstrate periodicity, with $T = 18$ time slots in the time horizon, $P = 3$ periods, $K = 2$ time zones in each period, and $Z = 3$ time slots in each time zone during each period. The time windows are: $\mathcal{W}_1 = \{1, 2, 3, 7, 8, 9, 13, 14, 15\}$, $\mathcal{W}_2 = \{4, 5, 6, 10, 11, 12, 16, 17, 18\}$.

An appropriate setting for such online optimization problems is Online Convex Optimization (OCO) [22], [23]. We assume the traffic demand vectors $\lambda(t)$, and therefore the cost functions $\phi_\alpha(\pi(t), \lambda(t))$, to be arbitrarily selected by an adversary who tweaks them without adhering to any probability distribution, aiming at obstructing our decisions. While in reality the traffic vectors are not changed by such an adversary, this framework offers a convenient way to design algorithms with provable worst-case guarantees under arbitrary variations of system parameters.

Traffic periodicity.
The inherent nature of human activity results in traffic periodicity. For example, daily and weekly patterns can be observed due to people going to work or returning home. Motivated by this, we introduce our periodic benchmark. It generalizes the state of the art and characterizes the performance of online algorithms, lying in between the two extremes: the static benchmark [14] and the dynamic one.

We divide the time horizon $T$ into $P$ periods, and each period $p$ into $K$ time zones. Without loss of generality, let each time zone $k$ contain $Z$ time slots in each period. Then $P = T/(KZ)$, and each period $p$ contains $KZ$ slots. This naturally defines a time window $\mathcal{W}_k$ for each time zone $k$, which includes all time slots that belong to time zone $k$, across all periods:
$$\mathcal{W}_k = \big\{ t : t = KZ(p - 1) + Z(k - 1) + \tau, \ \tau \in \{1, \dots, Z\}, \ p \in \{1, \dots, P\} \big\}, \quad k = 1, \dots, K.$$
The $K$ time zones define the manner in which each period is partitioned, while time windows include all time slots of a time zone across the time horizon. This partitioning of the time horizon captures any type of periodicity, e.g., daily, weekly, or any underlying combination. Assuming daily periodicity in our toy example of Fig. 2, the time horizon $T$ is divided into $P = 3$ days, each having $K = 2$ zones. During each period, each zone contains $Z = 3$ time slots, each of 4 hours duration.

We want to stress that we do not assume periodicity of the traffic demands: the traffic vectors $\lambda$ are considered to vary arbitrarily over time. Rather, we aim to capture possible (approximate) periodic-like "patterns" or "trends" that may exist. In fact, a key contribution of this work is the introduction of the following periodic benchmark.

Algorithm 1 Optimal Periodic Static (OPS) benchmark policy
Input: Traffic vectors $\lambda(t)$, $t \in \{1, \dots, T\}$, partition of the time horizon into time windows $\mathcal{W}_k$, $k \in \{1, \dots, K\}$.
Output: $K$ optimal periodic static policies $\pi^* = \{\pi^*[k]\}_k$, one for each time window $\mathcal{W}_k$.
for $k = 1$ to $K$ do
  Compute the optimal static association policy in zone $k$,
  $$\pi^*[k] = \arg\min_{\pi \in \Omega} \sum_{t \in \mathcal{W}_k} \phi_\alpha(\pi, \lambda(t)) \tag{6}$$
end for

Regret against the Optimal Periodic Static (OPS) algorithm: a novel periodic benchmark.
Given a sequence of traffic vectors $\lambda(1), \lambda(2), \dots, \lambda(T)$ over a time horizon $T$, OPS consists in finding $K$ user association policies $\pi^*[1], \pi^*[2], \dots, \pi^*[K]$, one for each time zone $k$. Each static policy $\pi^*[k]$ is optimal with respect to the traffic loads in the respective time window $\mathcal{W}_k$ only, and is defined as in (6). This novel periodic benchmark exploits possible approximate traffic periodicity and allows the comparison of dynamic online policies to static association rules that change according to the general traffic characteristics in each window. For example, it is possible to consider two different association policies, one for peak hours and one for hours with low traffic, and compare our dynamic policy against these. In the toy example of Fig. 2, OPS would find two static association policies: one for $\mathcal{W}_1$ and one for $\mathcal{W}_2$. We provide its pseudocode in Algorithm 1.

A performance metric that characterizes the learning performance of an online algorithm is regret: the difference in performance, which in our case is the experienced cost, between an online policy and a benchmark. Let $\pi_A(t)$ be the decision taken by an online algorithm $A$ at slot $t$. The regret $\mathrm{Reg}_A(T, K)$ of $A$ with respect to OPS, for $K$ time zones in each period over a time horizon $T$, is:
$$\mathrm{Reg}_A(T, K) := \sum_{t=1}^{T} \phi_\alpha(\pi_A(t), \lambda(t)) - \sum_{k=1}^{K} \sum_{t \in \mathcal{W}_k} \phi_\alpha(\pi^*[k], \lambda(t)). \tag{7}$$

OPS vs. existing benchmarks and regrets. We remind the reader of the optimal static and optimal dynamic benchmark policies as defined in [14], which we denote as $\pi^*_S$ and $\pi^*_D(t)$, respectively. The optimal static benchmark knows all traffic changes in hindsight and finds one user association policy $\pi^*_S$ that minimizes the cost over the entire time horizon, i.e.,
$$\pi^*_S := \arg\min_{\pi \in \Omega} \sum_{t=1}^{T} \phi_\alpha(\pi, \lambda(t)). \tag{8}$$
In contrast, the optimal dynamic benchmark knows all traffic changes but minimizes the cost function in each time slot $t$ separately, finding one user association $\pi^*_D(t)$ per slot:
$$\pi^*_D(t) := \arg\min_{\pi \in \Omega} \phi_\alpha(\pi, \lambda(t)), \quad \forall t = 1, \dots, T. \tag{9}$$
OPS generalizes the state-of-the-art benchmarks, and the respective regrets against them. It is easily verifiable that:
• For $K = 1$, $\mathcal{W}_1 = \{1, \dots, T\}$, $\pi^*[1] \equiv \pi^*_S$, and (7) reduces to the static regret.
• For $K = T$ (i.e., $Z = 1$ and $P = 1$), $\mathcal{W}_k = \{k\}$, $\pi^*[k] \equiv \pi^*_D(k)$, and (7) reduces to the dynamic regret.

Online learning with "no regret".
A desirable property for the regret is to scale sublinearly with the time horizon $T$, i.e., $\mathrm{Reg}_A(T, K) = o(T)$. In this case,
$$\lim_{T \to +\infty} \frac{\mathrm{Reg}_A(T, K)}{T} = 0,$$
and the online algorithm $A$ is said to have "no regret", which means that it learns to perform as well as the benchmark asymptotically as the time horizon $T \to +\infty$.

Another desirable feature for online algorithms is to have scalable regret. This happens when regret is also sublinear in the problem dimension $d$, which in our case equals $|\mathcal{J}| \cdot |\mathcal{I}|$. It means that increasing the size of the network by one unit implies a sublinear increase in the regret. At the moment, most regret results arrive at $O(\sqrt{dT})$. In low dimensions, this is a good result, implying the ability to learn quickly. However, as $d$ grows and becomes $d \sim T$, the above expression results in a regret $O(T)$, which means that learning is not attainable in the long run. Indeed, large systems may require a very large horizon $T$ to learn, unless we are able to decrease the dependence of the regret expression on $d$. This property is thus of vital importance for the envisioned large-scale future communication networks.

V. ONLINE USER ASSOCIATION ALGORITHM WITH NO REGRET
A. Augmented penalty function
In order to avoid overloading cells, we reformulate the user association problem by adding a penalty function to the total cost, while removing the load constraints. The penalty is active and adds to the cost when constraints are violated, i.e., when $\exists j : \rho_j > \bar{\rho}$. The set of optimal solutions remains the same, because the structure of the problem and the coupling with the load constraints now appear in the objective.

Our penalty function $B_j(\pi(t), \lambda(t))$ for overloading AP $j$ could be any function that is convex and Lipschitz-continuous in $\rho$, such as
$$B_j(\pi(t), \lambda(t)) = \psi \, \nabla \phi^j_\alpha(\bar{\rho}) \cdot (\rho_j - \bar{\rho}),$$
where $\psi$ is a positive penalty factor for AP overloading. For $\psi = 1$, this captures the cost of each overloaded AP as the linear extension of the cost function at the overloading point $\bar{\rho}$. Then, the cost function becomes:
$$V(\pi(t), \lambda(t)) = \sum_{j \in \mathcal{J}} V_j(\pi(t), \lambda(t)), \quad \text{where} \tag{10}$$
$$V_j(\pi(t), \lambda(t)) = \begin{cases} \phi^j_\alpha(\rho_j), & \rho_j \le \bar{\rho}, \\ \phi^j_\alpha(\bar{\rho}) + \nabla \phi^j_\alpha(\bar{\rho}) \cdot (\rho_j - \bar{\rho}), & \rho_j > \bar{\rho}. \end{cases}$$
With this penalty, the optimal user association problem reduces to:
Problem 2 (Online user association for unknown demand).
$$\min_{\pi(t) \in \Omega'} \ \sum_{t=1}^{T} V(\pi(t), \lambda(t)) - \sum_{k=1}^{K} \sum_{t \in \mathcal{W}_k} V(\pi^*[k], \lambda(t)),$$
where
$$\Omega' = \{ \pi : \pi \in \Pi^{|\mathcal{I}|} \}. \tag{11}$$
This is a typical formulation for online learning problems. It aims at finding online a sequence of association policies $\pi(t)$, $t = 1, \dots, T$, that minimize regret, i.e., the deviation of the online decisions from those of an offline benchmark, which in this work is OPS. Compared to Problem 1, the feasible set is expanded to the product of one simplex per location $i$, and it remains convex. This penalty formulation will enable us to perform a customized modification of a traditional algorithm, based on the specific characteristics of the new feasible space. The objective function remains convex and Lipschitz-continuous, as a sum of such functions. Both convexity and Lipschitz-continuity are crucial properties for proving that the online algorithm we will design has no regret against OPS.

We will analyze the regret with respect to this augmented cost. Since the linear part comes into play only when $\rho_j > \bar{\rho}$ (which does not happen in the benchmark), a sublinear regret here implies sublinear regret for the $\phi_\alpha(\cdot)$ functions as well.

B. PerOnE: Online user association with no regret

Algorithm 2 Online Mirror Descent (OMD)
Input: Mirror function $g: \mathbb{R}^{|\mathcal{J}||\mathcal{I}|} \to \Omega'$, stepsize $\eta$, objective function $V(\cdot)$
Output: User association $\pi(t)$, $\forall t = 1, \dots, T$
Initialize: $\Theta(1) = \mathbf{0}$
for $t = 1, 2, \dots, T$ do
  decide association $\pi(t) = g(\Theta(t))$
  update $\Theta(t+1) = \Theta(t) - \nabla V(\pi(t), \lambda(t))$
end for
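Algorithm 2 can be sketched in Python as follows, instantiated with the per-location softmax link that is derived below in (15) from the entropic regularizer (14). This is a minimal illustration under assumptions, not the authors' implementation: the names `entropic_link` and `online_mirror_descent`, the nested-list layout `theta[j][i]`, and the callback `grad_fn(t, pi)`, which reveals $\nabla V(\pi(t), \lambda(t))$ only after $\pi(t)$ is chosen, are hypothetical. Restricting each softmax to $\mathcal{N}_i$ is also an assumption of the sketch, so that $\pi_{ji} = 0$ for APs out of range.

```python
import math


def entropic_link(theta, eta, neighbours):
    """Link function g of (15): softmax of eta * Theta over each location's neighbour APs.

    theta[j][i] holds accumulated negative gradients; neighbours[i] lists the APs in N_i.
    APs outside N_i keep pi[j][i] = 0 (a hypothetical restriction for this sketch).
    """
    n_aps, n_locs = len(theta), len(neighbours)
    pi = [[0.0] * n_locs for _ in range(n_aps)]
    for i, aps in enumerate(neighbours):
        # Shifting by the largest exponent avoids overflow and leaves the ratios unchanged.
        top = max(eta * theta[j][i] for j in aps)
        weights = {j: math.exp(eta * theta[j][i] - top) for j in aps}
        total = sum(weights.values())
        for j, w in weights.items():
            pi[j][i] = w / total
    return pi


def online_mirror_descent(grad_fn, T, eta, n_aps, neighbours):
    """Algorithm 2: pi(t) = g(Theta(t)); then Theta(t+1) = Theta(t) - grad V(pi(t), lambda(t)).

    grad_fn(t, pi) is a hypothetical callback returning the |J| x |I| gradient of the
    slot-t cost; it is called only after pi(t) has been decided.
    """
    n_locs = len(neighbours)
    theta = [[0.0] * n_locs for _ in range(n_aps)]  # Theta(1) = 0: even split over N_i
    decisions = []
    for t in range(1, T + 1):
        pi = entropic_link(theta, eta, neighbours)
        decisions.append(pi)
        grad = grad_fn(t, pi)
        for j in range(n_aps):
            for i in range(n_locs):
                theta[j][i] -= grad[j][i]
    return decisions


# One location served by two APs; a constant gradient makes AP 0 the costlier choice.
decisions = online_mirror_descent(lambda t, pi: [[1.0], [0.0]], T=3, eta=0.5,
                                  n_aps=2, neighbours=[[0, 1]])
print([round(d[0][0], 4) for d in decisions])  # [0.5, 0.3775, 0.2689]
```

Accumulating gradients in $\Theta$ and re-applying the softmax gives the same decisions as the multiplicative form (16), in which $\pi(t)$ is multiplied entrywise by $e^{-\eta \nabla V}$ and renormalized per location; the max-shift only guards against overflow once $\Theta$ grows large.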
Online Mirror Descent.
A general class of online schemes with no regret against the static benchmark is Online Mirror Descent (OMD) [22], presented in Algorithm 2. It makes it possible to exploit the feasible set of our problem, and leads to decision updates that lie in the feasible set without the need for expensive projections. OMD computes the current decision from the previous one using a simple gradient update rule. Let $\Theta(t)$ be a matrix of dimension $|\mathcal{J}| \times |\mathcal{I}|$, initialized as $\Theta(1) = \mathbf{0}$, and updated as
$$\Theta(t) = \Theta(t-1) - \nabla V(\pi(t-1), \lambda(t-1)), \quad t > 1. \tag{12}$$
During slot $t$, it is given as input to a "link" function $g(\cdot)$ that combines it with the previous decision $\pi(t-1)$ and "mirrors" it to a feasible association decision $\pi(t)$. More specifically, the updated user association is $\pi(t) = g(\Theta(t))$, where
$$g(\Theta) := \arg\min_{\pi \in \Omega'} \{ h(\pi) - \langle \eta \Theta, \pi \rangle \}, \tag{13}$$
with $\eta$ a stepsize, and $h(\cdot)$ a "regularization" function that is strongly convex with respect to a norm over the feasible set $\Omega'$, with $\Omega'$ as in (11).

Regularization function.
The regularization function ensures stability of the decisions and, if chosen appropriately, leads to solutions that exploit the geometry of the problem, do not need expensive projections onto the feasible space, and enjoy the no-regret property.

In our setting, we aim at finding associations that lie in the unit simplex for each location. Thus, each association policy is essentially a set of probability distributions, one for each location. Since the feasible set regarding location $i$ is the probability simplex, the most natural regularization function would be the negative Gibbs-Shannon entropy,
$$h_i(\pi) = \sum_{j \in \mathcal{J}} \pi_{ji} \log \pi_{ji},$$
which would give the well-known Exponentiated Gradient Descent (EGD). Here we consider the regularization function
$$h(\pi) := \sum_{i \in \mathcal{I}} h_i(\pi) = \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} \pi_{ji} \log \pi_{ji}, \tag{14}$$
which, for a given user association policy, equals the aggregate negative entropy of the associations of all locations.

¹ Since the first available traffic vector is $\lambda(1)$, the first update in (12) cannot be performed before $t = 2$. Thus, the initialization is performed for $t = 1$, instead of the common choice $t = 0$.

In Appendix A we prove that:

Lemma 1.
The modified entropic regularization function in (14) is $|\mathcal{I}|$-strongly convex w.r.t. the 1-norm.

Normalized exponentiated gradient.
Combining (14) and (13), we get
$$g(\Theta) = \arg\min_{\pi \in \Omega'} \Big\{ \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} \pi_{ji} \log \pi_{ji} - \langle \eta \Theta, \pi \rangle \Big\}.$$
Differentiating the objective with respect to $\pi_{ji}$, we get
$$\frac{\partial}{\partial \pi_{ji}} \Big( h(\pi) - \langle \eta \Theta, \pi \rangle \Big) = \log \pi_{ji} - \eta \Theta_{ji} + 1,$$
where $\Theta_{ji}$ is the element of matrix $\Theta$ related to AP $j$ and location $i$. This becomes zero at $\pi_{ji} = e^{\eta \Theta_{ji} - 1}$. In order to ensure that the updated association variables $\pi_{ji}$ lie in the unit simplex for each location $i$, we need to normalize the association of each location. Each element $\Theta_{ji}$ is thus "mirrored" through the exponentiated mirror function to:
$$g_{ji}(\Theta) = \frac{e^{\eta \Theta_{ji}}}{\sum_{j' \in \mathcal{J}} e^{\eta \Theta_{j'i}}}. \tag{15}$$
Each association variable $\pi_{ji}$ is then updated through this mirroring as:
$$\begin{aligned} \pi_{ji}(t+1) &= g_{ji}(\Theta(t+1)) \overset{(15)}{=} \frac{e^{\eta \Theta_{ji}(t+1)}}{\sum_{j' \in \mathcal{J}} e^{\eta \Theta_{j'i}(t+1)}} \\ &\overset{(12)}{=} \frac{e^{\eta \Theta_{ji}(t)}\, e^{-\eta \nabla_{ji} V(\pi(t), \lambda(t))}}{\sum_{j' \in \mathcal{J}} e^{\eta \Theta_{j'i}(t)}\, e^{-\eta \nabla_{j'i} V(\pi(t), \lambda(t))}} \cdot \frac{\sum_{j' \in \mathcal{J}} e^{\eta \Theta_{j'i}(t)}}{\sum_{j' \in \mathcal{J}} e^{\eta \Theta_{j'i}(t)}} \\ &\overset{(15)}{=} \frac{\pi_{ji}(t)\, e^{-\eta \nabla_{ji} V(\pi(t), \lambda(t))}}{\sum_{j' \in \mathcal{J}} \pi_{j'i}(t)\, e^{-\eta \nabla_{j'i} V(\pi(t), \lambda(t))}}, \end{aligned} \tag{16}$$
where $\nabla_{ji} V$ denotes the partial derivative of $V$ with respect to $\pi_{ji}$. This mapping is a simple normalization of the previous association multiplied by a negative exponentiation of the gradient of the objective function in the previous step. The controller thus needs only the value of $\nabla V(\cdot)$ in order to decide the association of all locations $i$ to their neighbourhood APs $j \in \mathcal{N}_i$. Using a normalization function that is adequate for the geometry of the problem leads to decision updates that always lie in the feasible set, avoiding expensive projections that would otherwise be necessary but are prohibitive for large-scale networks.

PerOnE: Our PERiodic, ONline, Exponentiated gradient association algorithm with "no regret".
We design it based on the normalized exponentiated-gradient association update (16). We refer to it as PerOnE and provide its pseudocode in Algorithm 3. PerOnE exploits possible traffic periodicity and operates in each time window $\mathcal{W}_k$, $k = 1, \dots, K$, separately. Let $t_k$, $t_k^{\tau}$ and $t_k^{|\mathcal{W}_k|}$ be the first, the $\tau$-th, and the last time slot in time window $\mathcal{W}_k$, respectively. For the first slot $t_k$ in each window $\mathcal{W}_k$, PerOnE has neither a previous user association $\pi$ to rely on nor any prior information about the traffic vectors $\lambda(t)$ for $t \in \mathcal{W}_k$. Therefore, it simply splits the requested traffic evenly across the neighbouring APs. At time $t = t_k^{\tau+1}$, it updates the association variables $\pi_{ji}(t_k^{\tau+1})$ as in (16). This update relies on slot $t = t_k^{\tau}$, which precedes $t_k^{\tau+1}$ within the same time window $\mathcal{W}_k$, $\forall k$, as shown in (18).

PerOnE is a simple, projection-free and cost-efficient modification of OMD. It also achieves a regret bound against the OPS benchmark that is sublinear in the time horizon $T$ and in the total number $|\mathcal{J}||\mathcal{I}|$ of decision variables. Let
$$M_{\mathcal{I}} = \max_j |\mathcal{N}_j| \quad\text{and}\quad M_{\mathcal{J}} = \max_i |\mathcal{N}_i| \tag{19}$$
be the maximum number of locations that are in range of an AP $j$ in the system, and the maximum number of APs that a location $i$ is in range of, respectively. Then the following holds.

Theorem 1 (PerOnE No-regret). For a Lipschitz-continuous and convex objective function $V(\cdot)$ with Lipschitz constant $L$, $M_{\mathcal{I}}$ and $M_{\mathcal{J}}$ as in (19), step size $\eta$ and time horizon $T$, PerOnE achieves the regret bound
$$Reg(T,K) \le \frac{K M_{\mathcal{I}} \log(M_{\mathcal{J}})}{\eta M_{\mathcal{J}}} + \frac{\eta T L^2}{|\mathcal{I}|}.$$
In particular, for step size $\eta = \sqrt{\frac{K M_{\mathcal{I}} |\mathcal{I}| \log(M_{\mathcal{J}})}{T L^2 M_{\mathcal{J}}}}$, and since $M_{\mathcal{I}} \le |\mathcal{I}|$ and $\log(M_{\mathcal{J}}) \le M_{\mathcal{J}}$, we get
$$Reg(T,K) \le 2\sqrt{\frac{K M_{\mathcal{I}} T L^2 \log(M_{\mathcal{J}})}{|\mathcal{I}| M_{\mathcal{J}}}} \le 2L\sqrt{KT}.$$

Algorithm 3
Periodic Online Exponentiated (PerOnE)
Input: Set of locations $\mathcal{I}$, APs $\mathcal{J}$ and neighbouring APs $\mathcal{N}_i$, $\forall i$; penalty-featured cost functions $V(\cdot)$; partition of the time horizon $T$ into time windows $\mathcal{W}_k$, $k = 1, \dots, K$; step size $\eta$.
Output: User association $\pi(t)$, $\forall t = 1, \dots, T$.
for $t = 1, 2, \dots, T$ do
    Identify the time window $\mathcal{W}_k \ni t$
    if $t = t_k$ for $\mathcal{W}_k$ then
        Initialize association as
        $$\pi_{ji}(t_k) = \begin{cases} \frac{1}{|\mathcal{N}_i|}, & j \in \mathcal{N}_i \\ 0, & j \notin \mathcal{N}_i \end{cases} \tag{17}$$
    else ($t = t_k^{\tau+1}$ for some $\tau \ge 1$)
        Update association as
        $$\pi_{ji}(t_k^{\tau+1}) = \frac{\pi_{ji}(t_k^{\tau})\, e^{-\eta\nabla V(\pi_{ji}(t_k^{\tau}),\, \lambda_i(t_k^{\tau}))}}{\sum_{j'\in\mathcal{J}} \pi_{j'i}(t_k^{\tau})\, e^{-\eta\nabla V(\pi_{j'i}(t_k^{\tau}),\, \lambda_i(t_k^{\tau}))}} \tag{18}$$
    end if
    Observe actual traffic $\lambda(t)$
    Compute gradient $\nabla V(\pi(t), \lambda(t))$
end for

Please refer to Appendix B for a proof of Theorem 1.
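As an illustration only, the following is a minimal Python sketch of Algorithm 3 under stated assumptions; it is not the authors' implementation. It stores associations as a matrix `pi[j][i]`, applies the uniform initialization (17) at the first slot of each window and the multiplicative update (16)/(18) afterwards, and keeps per-window state so that windows may be contiguous or periodic (e.g., the same hour on every day). All function names are hypothetical. The cost behind `linear_load_gradient` (the sum over APs and locations of $\lambda_i \pi_{ji} / C_{ji}$, with no penalty term) is our assumption for the example, not the paper's penalty-featured $V(\cdot)$. The helpers `theorem1_step_size` and `theorem1_bound` evaluate the step size and the right-hand side of Theorem 1.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # pi[j][i]: share of location i's traffic served by AP j


def initial_association(neighbours: Sequence[Sequence[int]], num_aps: int) -> Matrix:
    """Uniform split as in (17): 1/|N_i| for j in N_i, 0 for j not in N_i."""
    pi = [[0.0] * len(neighbours) for _ in range(num_aps)]
    for i, n_i in enumerate(neighbours):
        for j in n_i:
            pi[j][i] = 1.0 / len(n_i)
    return pi


def exponentiated_update(pi: Matrix, grad: Matrix, eta: float) -> Matrix:
    """Normalized exponentiated-gradient step (16)/(18), applied per location i."""
    num_aps, num_locs = len(pi), len(pi[0])
    new = [[0.0] * num_locs for _ in range(num_aps)]
    for i in range(num_locs):
        support = [j for j in range(num_aps) if pi[j][i] > 0.0]
        # Shift by the smallest gradient on the support to avoid overflow/underflow;
        # the common factor exp(eta * g_min) cancels in the normalization.
        g_min = min(grad[j][i] for j in support)
        weights = [
            pi[j][i] * math.exp(-eta * (grad[j][i] - g_min)) if pi[j][i] > 0.0 else 0.0
            for j in range(num_aps)
        ]
        total = sum(weights)
        for j in range(num_aps):
            new[j][i] = weights[j] / total
    return new


def perone(
    T: int,
    window_of: Callable[[int], int],
    neighbours: Sequence[Sequence[int]],
    num_aps: int,
    eta: float,
    observe_traffic: Callable[[int], List[float]],
    gradient: Callable[[Matrix, List[float]], Matrix],
) -> List[Matrix]:
    """Sketch of Algorithm 3; window_of(t) returns the index k of the window W_k containing t."""
    latest: Dict[int, Tuple[Matrix, Matrix]] = {}  # k -> (pi, gradient) at the latest slot of W_k
    decisions: List[Matrix] = []
    for t in range(1, T + 1):
        k = window_of(t)
        if k not in latest:  # t = t_k: no earlier slot in W_k, so use (17)
            pi = initial_association(neighbours, num_aps)
        else:  # t = t_k^{tau+1}: update from slot t_k^tau as in (18)
            prev_pi, prev_grad = latest[k]
            pi = exponentiated_update(prev_pi, prev_grad, eta)
        decisions.append(pi)
        lam = observe_traffic(t)  # lambda(t) is revealed only after the decision
        latest[k] = (pi, gradient(pi, lam))
    return decisions


def theorem1_step_size(K: int, T: int, L: float, m_i: int, m_j: int, num_locations: int) -> float:
    """Step size of Theorem 1 (requires m_j >= 2 so that log(m_j) > 0)."""
    return math.sqrt(K * m_i * num_locations * math.log(m_j) / (T * L**2 * m_j))


def theorem1_bound(eta: float, K: int, T: int, L: float, m_i: int, m_j: int, num_locations: int) -> float:
    """Right-hand side of the regret bound in Theorem 1 for a given eta."""
    return K * m_i * math.log(m_j) / (eta * m_j) + eta * T * L**2 / num_locations


def linear_load_gradient(rates: Matrix) -> Callable[[Matrix, List[float]], Matrix]:
    """Hypothetical cost sum_j sum_i lam_i * pi_ji / C_ji (no penalty); its gradient is lam_i / C_ji."""
    def grad(pi: Matrix, lam: List[float]) -> Matrix:
        return [[lam[i] / rates[j][i] for i in range(len(lam))] for j in range(len(rates))]
    return grad
```

Because the update is multiplicative, every $\pi_{ji}$ with $j \notin \mathcal{N}_i$ stays exactly zero, so the neighbourhood structure set by (17) is preserved without any projection. A periodic partition can be expressed through `window_of`, e.g. `lambda t: ((t - 1) % 144) // 6` for 144 ten-minute slots per day split into 24 hourly zones (an illustrative choice).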
Remark 1.
The EGD, obtained as OMD with the entropic regularization function $h_i(\pi)$ for only one location, has a regret that grows as $\sqrt{T}$ over the horizon, with a $\log(|\mathcal{J}||\mathcal{I}|)$ dependence on the number of association variables [22].
Remark 2.
Our entropic function $h(\pi)$, which considers multiple locations, and the initialization step in (17) imply a dependence of the regret on topological characteristics, such as the maximum number $M_{\mathcal{I}}$ of locations that an AP has in its range and the maximum number $M_{\mathcal{J}}$ of APs in whose range a location lies. Overall, the regret is sublinear in the total number $|\mathcal{J}||\mathcal{I}|$ of association decision variables. Moreover, in realistic systems, the linear dependence on $M_{\mathcal{I}}$ and the logarithmic dependence on $M_{\mathcal{J}}$ have a very limited impact on the regret. In fact, their values can be considered constant compared to the system dimension $|\mathcal{J}||\mathcal{I}|$, because the range of APs progressively decreases as technology evolves, which makes $\mathcal{N}_i$ and $\mathcal{N}_j$ progressively smaller sets.

PerOnE's regret follows the $\sqrt{T}$ dependence of EGD, and it also depends on the number $K$ of time zones. For $K = o(T)$, the regret is sublinear in the time horizon $T$, i.e., $\lim_{T\to+\infty} Reg(T,K)/T = 0$, which means that PerOnE learns association policies that are asymptotically optimal. The standard [14] static regret is obtained for $K = 1$ and aligns with the above. For $K$ proportional to $T$ (i.e., $K = \Theta(T)$), the regret scales linearly with time, which is aligned with the impossibility result stated in [24]: when the adversary can change its decision in each time slot, no regret is attainable without further assumptions on the input. Indeed, for $K = T$ PerOnE just plays the initialization (17) in every slot, and more generally it plays it $K$ times, since it has very few rounds in each part of the horizon. The state-of-the-art dynamic regret, obtained for $K = T$, aligns with the above.

TABLE II
Aggregate constraint violations during the time horizon, for different values of zones K and load thresholds ρ.

Scheme   | ρ = 1, K = 24 | ρ = 1, K = 12 | ρ = 1, K = 2 | ρ = 0.…, K = 24 | ρ = 0.…, K = 12 | ρ = 0.…, K = 2
OPS      | 234           | 936           | 2808         | 4212            | 3744            | 2808
PerOnE   | 127           | 57            | 9            | 145             | 72              | 11

Scheme                 | ρ = 1 | ρ = 0.…
Optimum (OPS, K = T)   | 0     | 0
PerOnE (K = 1)         | 1     | 3

VI. NUMERICAL EVALUATION
A. System architecture and traffic demand
We perform our evaluation on the internet traffic activity of the publicly available dataset [25]. It provides the demand of Telecom Italia's customers in Milano, Italy, from 1/11/2013 to 1/1/2014. The spatial distribution $\lambda_i$ of telecommunication events is aggregated in a $100 \times 100$ grid of locations $i \in \mathcal{I}$. The temporal distribution of events is aggregated over 10-minute time intervals. For our analysis we consider only working days, in order to evaluate the system under high traffic and under the periodicity created by people's work-cycle behaviour. The dataset used consists of $P = 39$ days, each containing 144 time slots, giving a horizon of $T = 5616$ traffic observations.

Our network architecture consists of 40 BSs, most of them close to the city center, where the load is higher. We follow the setup of [7] and consider Macro- and Micro-BSs transmitting at $P_M = 43$ dBm and $P_m = 33$ dBm, respectively. The system bandwidth is $W = 10$ MHz, while the noise density is $N_0 = -\ldots$ dBm/MHz. The path loss exponent is $P_{lo} = 3$, and $G_{ji}$ is the resulting coefficient for the signal degradation from AP $j$ to location $i$. Then, the transmission rates $C_{ji}$ between location $i$ and AP $j$ are given by the Shannon formula:
$$C_{ji} = W \log_2\left(1 + \frac{G_{ji} P_j}{W N_0 + \sum_{k \ne j} G_{ki} P_k}\right).$$
We are interested in evaluating the total cell loads that arise from user association policies produced by PerOnE, and in comparing them to those of policies produced by OPS.

Fig. 3. Total cell loads vs. time, for $\alpha = 0$ and cost function $\phi_a(\rho) = \sum_{j\in\mathcal{J}} \rho_j$, for load thresholds $\rho = 1$ and $\rho = 0.\ldots$, and for $K = 24$.
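To make the rate model concrete, the sketch below evaluates the Shannon formula for $C_{ji}$ with all powers converted from dBm to mW. It is an illustration under assumptions: the function names are hypothetical, $W$ is taken in MHz so that $C_{ji}$ comes out in Mbit/s, and because the noise density value is not available here, the usage example assumes $N_0 = -114$ dBm/MHz (thermal noise of $-174$ dBm/Hz) purely for illustration. The gains $G_{ji}$ are inputs; the example derives them as $d^{-3}$ from hypothetical distances, which reflects only the path loss exponent $P_{lo} = 3$ and not the full propagation model of [7].

```python
import math
from typing import Sequence


def dbm_to_mw(p_dbm: float) -> float:
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def shannon_rate(
    j: int,
    i: int,
    gains: Sequence[Sequence[float]],  # gains[k][i]: coefficient G_ki from AP k to location i
    powers_dbm: Sequence[float],       # powers_dbm[k]: transmit power P_k of AP k, in dBm
    bandwidth_mhz: float,              # W, in MHz
    noise_dbm_per_mhz: float,          # N_0, in dBm/MHz
) -> float:
    """C_ji = W log2(1 + G_ji P_j / (W N_0 + sum_{k != j} G_ki P_k)), in Mbit/s."""
    powers_mw = [dbm_to_mw(p) for p in powers_dbm]
    noise_mw = bandwidth_mhz * dbm_to_mw(noise_dbm_per_mhz)  # W * N_0
    interference_mw = sum(
        gains[k][i] * powers_mw[k] for k in range(len(powers_mw)) if k != j
    )
    sinr = gains[j][i] * powers_mw[j] / (noise_mw + interference_mw)
    return bandwidth_mhz * math.log2(1.0 + sinr)


if __name__ == "__main__":
    # Hypothetical 2-AP, 2-location example: a Macro-BS (43 dBm) and a Micro-BS (33 dBm).
    distances_m = [[100.0, 400.0], [300.0, 50.0]]  # assumed distances d[j][i], in metres
    gains = [[d ** -3 for d in row] for row in distances_m]  # assumed G = d^-3
    for j in range(2):
        for i in range(2):
            rate = shannon_rate(j, i, gains, [43.0, 33.0], 10.0, -114.0)
            print(f"C[{j}][{i}] = {rate:.1f} Mbit/s")
```

Every AP other than $j$ is treated as an interferer at location $i$, exactly as in the denominator of the formula; removing an interferer can therefore only increase $C_{ji}$.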
B. Results
We conduct a sensitivity analysis on the number $K$ of time zones and the load threshold $\rho$, to capture both the scenario where the maximum available resources are considered ($\rho = 1$) and a scenario with more limited resources ($\rho = 0.\ldots$). The "Optimum" is OPS with $K = T$ and $\rho = 1$, i.e., the optimal association decision for each individual time slot $t$ under the maximum amount of resources that could be considered. PerOnE with $K = 1$ considers only one time zone, i.e., it takes as input the association policy of the previous slot, without any division of the time horizon. For convenience, our plots include enlargements of the first time slots and of some slots that are indicative of how close PerOnE performs to the Optimum, at the top-right and bottom-right corner of the subfigures, respectively.

Fig. 4. Total cell loads vs. time, for $\alpha = 0$ and cost function $\phi_a(\rho) = \sum_{j\in\mathcal{J}} \rho_j$, for load thresholds $\rho = 1$ and $\rho = 0.\ldots$, and for $K = 12$.

We list some of our observations.

PerOnE quickly learns the optimal user association.
PerOnE exploits the geometry of the problem and rapidly learns the optimal user association, despite the lack of actual traffic information. From Figs. 3-5 and Table II, we see that as the number $K$ of time zones increases, PerOnE needs more slots to learn not to violate constraints, to converge to optimal solutions, and to produce more cost-efficient associations. This interesting feature allows PerOnE's solution updates to adapt to any traffic fluctuation. This is because, as the number of time zones decreases, so does the number of slots in which PerOnE initializes its decisions to the expensive uniform solution (17), which lowers the total cell loads accordingly. The opposite holds for OPS, whose static solutions benefit from a partition of the time horizon into more zones.

Fig. 5. Total cell loads vs. time, for $\alpha = 0$ and cost function $\phi_a(\rho) = \sum_{j\in\mathcal{J}} \rho_j$, for load thresholds $\rho = 1$ and $\rho = 0.\ldots$, and for $K = 2$.

Observe that OPS produces the minimum-cost static policies for a given partition of the time horizon, which does not necessarily imply that it will have no constraint violations. In fact, under the more limited resources ($\rho = 0.\ldots$), OPS violates constraints during more time slots as $K$ grows (Table II). However, observe from Figs. 3-5 that the actual cost of the produced associations grows as $K$ decreases.

PerOnE effectively adapts to traffic changes.
Despite the arbitrary and large traffic variations, PerOnE manages to adapt its solutions and decide cost-effective, near-optimal policies under both high and low load thresholds, as seen in Figs. 3-5. From these figures and Table II, it can be observed that OPS fails to adapt, resulting in association policies that lead to a higher system load and to constraint violations.
PerOnE has no regret against OPS.
Despite the large fluctuations over the course of the day and the lack of actual traffic information at decision time, PerOnE manages to produce asymptotically optimal solutions under different partitions of each period into zones and under different load thresholds. As Fig. 6 suggests, PerOnE's advantage over OPS grows as the load threshold $\rho$ increases and as the number $K$ of time zones decreases. Intuitively, for larger $\rho$, PerOnE has greater "margins" to adapt to the upcoming actual traffic. Also, for smaller $K$, OPS is more restricted in its decisions, which increases PerOnE's advantage of adjusting its decisions dynamically.

Fig. 6. PerOnE's regret over OPS (average regret vs. time), for $\alpha = 0$ and cost function $\phi_a(\rho) = \sum_{j\in\mathcal{J}} \rho_j$, and for different values of $\rho$ and $K$.

VII. CONCLUSIONS
We assume arbitrary traffic variations over time. We introduce OPS, a novel periodic benchmark for online learning problems, which generalizes the state of the art and is a meaningful point of comparison when traffic is conjectured to be periodic. We propose PerOnE, an asymptotically optimal online algorithm that produces association policies through a simple update. PerOnE learns to adapt to traffic fluctuations even without actual traffic information. We demonstrate its no-regret property against OPS both analytically and through simulations on a real-trace dataset. Moreover, PerOnE makes no assumptions about the traffic, which makes it a strong user-association option for the highly dynamic environments envisioned for large-scale 5G and B5G/6G networks. In future work, we plan to explore algorithms that jointly learn several dynamic parameters, for example user association and power control.

VIII. ACKNOWLEDGMENTS
This work was supported by the CHIST-ERA LeadingEdge project, call on "Smart Distribution of Computing in Dynamic Networks" (SDCDN).

APPENDIX A
PROOF OF LEMMA 1

The function $h_i(\pi) = \sum_{j\in\mathcal{J}} \pi_{ji}\log\pi_{ji}$ is 1-strongly convex with respect to the 1-norm [22], i.e.,
$$h_i(\pi') \ge h_i(\pi) + \langle \nabla h_i(\pi), \pi' - \pi\rangle + \tfrac{1}{2}\|\pi' - \pi\|_1^2, \quad \forall i \in \mathcal{I}.$$
Summing over all locations $i \in \mathcal{I}$, we obtain
$$h(\pi') \ge h(\pi) + \sum_{i\in\mathcal{I}} \langle \nabla h_i(\pi), \pi' - \pi\rangle + \tfrac{|\mathcal{I}|}{2}\|\pi' - \pi\|_1^2,$$
which, due to the interchangeability of the sum and the dot product, and due to (14), becomes
$$h(\pi') \ge h(\pi) + \langle \nabla h(\pi), \pi' - \pi\rangle + \tfrac{|\mathcal{I}|}{2}\|\pi' - \pi\|_1^2.$$

APPENDIX B
PROOF OF THEOREM 1

Let $z(t) := \nabla V(\pi(t), \lambda(t))$. Theorem 2.21 in [22] states that when the regularization function $h(\cdot)$ is $|\mathcal{I}|$-strongly convex w.r.t. a $p$-norm and OMD is run with the mirror function as in (13), then
$$\sum_{t=1}^{T} \langle z(t), \pi(t) - \pi^*\rangle \le \frac{h(\pi^*)}{\eta} - \frac{h(\pi(1))}{\eta} + \sum_{t=1}^{T} \frac{\eta\|z(t)\|_q^2}{|\mathcal{I}|}, \tag{20}$$
where $\pi^*$ is an optimal static decision over $T$ time slots as in (8), and the $q$-norm is the dual norm of the $p$-norm. We adopt this result and adapt it to our periodic benchmark by applying it to each time window $\mathcal{W}_k$ separately. Thus:
$$\sum_{t\in\mathcal{W}_k} \langle z(t), \pi(t) - \pi^*[k]\rangle \le \frac{h(\pi^*[k])}{\eta} - \frac{h(\pi(t_k))}{\eta} + \sum_{t\in\mathcal{W}_k} \frac{\eta\|z(t)\|_q^2}{|\mathcal{I}|}.$$
From the Lipschitz continuity of the objective function, there exists a positive constant $L \ge \|z(t)\|_q$ for all $q$-norms and time windows $\mathcal{W}_k$. Thus, for $k = 1, \dots, K$:
$$\sum_{t\in\mathcal{W}_k} \langle z(t), \pi(t) - \pi^*[k]\rangle \le \frac{h(\pi^*[k])}{\eta} - \frac{h(\pi(t_k))}{\eta} + \sum_{t\in\mathcal{W}_k} \frac{\eta L^2}{|\mathcal{I}|}. \tag{21}$$
Since the association variables of each location belong to $[0,1]$, for each location $i$ it holds that $h_i(\pi) = \sum_{j\in\mathcal{J}} \pi_{ji}\log\pi_{ji} \le 0$, which implies that
$$h(\pi^*[k]) = \sum_{i\in\mathcal{I}} h_i(\pi^*[k]) \le 0. \tag{22}$$
Moreover:
$$h(\pi(t_k)) \overset{(14),(17)}{=} \sum_{j\in\mathcal{J}}\sum_{i\in\mathcal{N}_j} \frac{1}{|\mathcal{N}_i|}\log\Big(\frac{1}{|\mathcal{N}_i|}\Big) = -\sum_{j\in\mathcal{J}}\sum_{i\in\mathcal{N}_j} \frac{\log(|\mathcal{N}_i|)}{|\mathcal{N}_i|} \ge -\frac{M_{\mathcal{I}}\log(M_{\mathcal{J}})}{M_{\mathcal{J}}}, \quad \forall k, \tag{23}$$
where the inequality is due to (19), because $M_{\mathcal{I}} \le |\mathcal{I}|$ and $M_{\mathcal{J}} \le |\mathcal{J}|$. Then, (21) together with (22) and (23) leads to:
$$\sum_{t\in\mathcal{W}_k} \langle z(t), \pi(t) - \pi^*[k]\rangle \le \frac{M_{\mathcal{I}}\log(M_{\mathcal{J}})}{\eta M_{\mathcal{J}}} + \frac{|\mathcal{W}_k|\eta L^2}{|\mathcal{I}|}, \quad \forall k. \tag{24}$$
Convexity of the objective function implies:
$$V(\pi(t),\lambda(t)) - V(\pi^*[k],\lambda(t)) \le \langle z(t), \pi(t) - \pi^*[k]\rangle, \tag{25}$$
for all $t \in \mathcal{W}_k$, $k = 1, \dots, K$. Then:
$$\begin{aligned}
Reg(T,K) &\overset{(7)}{=} \sum_{t=1}^{T}\phi_a(\pi(t),\lambda(t)) - \sum_{k=1}^{K}\sum_{t\in\mathcal{W}_k}\phi_a(\pi^*[k],\lambda(t)) \\
&\overset{(10),(6)}{\le} \sum_{k=1}^{K}\sum_{t\in\mathcal{W}_k}\Big(V(\pi(t),\lambda(t)) - V(\pi^*[k],\lambda(t))\Big) \\
&\overset{(25)}{\le} \sum_{k=1}^{K}\sum_{t\in\mathcal{W}_k}\langle z(t), \pi(t) - \pi^*[k]\rangle \\
&\overset{(24)}{\le} \sum_{k=1}^{K}\left(\frac{M_{\mathcal{I}}\log(M_{\mathcal{J}})}{\eta M_{\mathcal{J}}} + \frac{|\mathcal{W}_k|\eta L^2}{|\mathcal{I}|}\right) = \frac{K M_{\mathcal{I}}\log(M_{\mathcal{J}})}{\eta M_{\mathcal{J}}} + \frac{\eta T L^2}{|\mathcal{I}|},
\end{aligned} \tag{26}$$
which concludes the first part of the proof. The first equality is the definition of regret. From (6), $\pi^*[k]$ minimizes costs for $t \in \mathcal{W}_k$, and from (10) the objective $V(\cdot)$ is at least equal to the costs. The first inequality comes from the observation that the penalty paid by the optimal benchmark $\pi^*[k]$ cannot be greater than that paid by the online algorithm, which takes decisions without actual traffic information. The second inequality follows from the convexity of $V(\cdot)$, and the next one from substituting the RHS of (24) for each window $\mathcal{W}_k$. The last equality holds because $\sum_{k=1}^{K}|\mathcal{W}_k| = T$, since summing all time slots over all the time windows is equivalent to summing over the entire time horizon.

For the second part of the Theorem, it is easily verifiable that (26) is minimized for $\eta = \sqrt{\frac{K M_{\mathcal{I}} |\mathcal{I}| \log(M_{\mathcal{J}})}{T L^2 M_{\mathcal{J}}}}$. Substituting in (26), and since $\log(M_{\mathcal{J}}) \le M_{\mathcal{J}}$ and $M_{\mathcal{I}} \le |\mathcal{I}|$, we get:
$$Reg(T,K) \le 2\sqrt{\frac{K M_{\mathcal{I}}\log(M_{\mathcal{J}})\, T L^2}{M_{\mathcal{J}}|\mathcal{I}|}} \le 2L\sqrt{KT}.$$

REFERENCES
[1] E. Calvanese Strinati, S. Barbarossa, J. L. Gonzalez-Jimenez, D. Ktenas, N. Cassiau, L. Maret, and C. Dehos, "6G: The Next Frontier: From Holographic Messaging to Artificial Intelligence Using Subterahertz and Visible Light Communication," IEEE Vehicular Technology Magazine, vol. 14, no. 3, pp. 42–50, 2019.
[2] F. Kelly, A. Maulloo, and D. Tan, "Rate control for communication networks: Shadow prices, proportional fairness and stability," Journal of the Operational Research Society, vol. 49, pp. 237–252, 1998.
[3] H. Kim, G. de Veciana, X. Yang, and M. Venkatachalam, "Distributed α-optimal user association and cell load balancing in wireless networks," IEEE/ACM Transactions on Networking, vol. 20, no. 1, pp. 177–190, Feb. 2012.
[4] L. Vigneri, G. Paschos, and P. Mertikopoulos, "Large-scale network utility maximization: Countering exponential growth with exponentiated gradients," in IEEE Conference on Computer Communications - IEEE INFOCOM, 2019.
[5] B. Hajek, "Performance of global load balancing by local adjustment," IEEE Transactions on Information Theory, vol. 36, no. 6, pp. 1398–1414, 1990.
[6] M. Alanyali and B. Hajek, "On simple algorithms for dynamic load balancing," in IEEE Conference on Computer Communications - IEEE INFOCOM, vol. 1, 1995, pp. 230–238.
[7] N. Liakopoulos, G. S. Paschos, and T. Spyropoulos, "Robust user association for ultra dense networks," in IEEE Conference on Computer Communications - IEEE INFOCOM, 2018, pp. 2690–2698.
[8] I. Koutsopoulos and L. Tassiulas, "Joint optimal access point selection and channel assignment in wireless networks," IEEE/ACM Transactions on Networking, vol. 15, no. 3, pp. 521–532, 2007.
[9] S. Papavassiliou and L. Tassiulas, "Improving the capacity in wireless networks through integrated channel base station and power assignment," IEEE Transactions on Vehicular Technology, vol. 47, no. 2, pp. 417–427, 1998.
[10] M. Karaliopoulos, L. E. Chatzieleftheriou, G. Darzanos, and I. Koutsopoulos, "On the joint content caching and user association problem in small cell networks," in , 2020, pp. 1–6.
[11] G. Darzanos, L. E. Chatzieleftheriou, M. Karaliopoulos, and I. Koutsopoulos, "Content preference-aware user association and caching in cellular networks," in International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT) Workshops, 2020, pp. 1–8.
[12] L. E. Chatzieleftheriou, G. Darzanos, M. Karaliopoulos, and I. Koutsopoulos, "Joint user association, content caching and recommendations in wireless edge networks," ACM SIGMETRICS Performance Evaluation Review, vol. 46, no. 3, pp. 12–17, 2018.
[13] L. E. Chatzieleftheriou, M. Karaliopoulos, and I. Koutsopoulos, "Caching-Aware Recommendations: Nudging User Preferences towards better Caching Performance," in IEEE Conference on Computer Communications - IEEE INFOCOM, 2017, pp. 784–792.
[14] M. Zinkevich, "Online convex programming and generalized infinitesimal gradient ascent," in International Conference on Machine Learning - ICML, 2003, pp. 928–935.
[15] E. Hazan and S. Kale, "Projection-free online learning," in International Conference on Machine Learning - ICML, 2012.
[16] H. Yu, M. Neely, and X. Wei, "Online convex optimization with stochastic constraints," in Conference on Neural Information Processing Systems - NIPS, 2017, pp. 1428–1438.
[17] N. Liakopoulos, A. Destounis, G. Paschos, T. Spyropoulos, and P. Mertikopoulos, "Cautious regret minimization: Online optimization with long-term budget constraints," in International Conference on Machine Learning - ICML, 2019.
[18] G. Paschos, A. Destounis, and G. Iosifidis, "Online convex optimization for caching networks," IEEE/ACM Transactions on Networking, 2020.
[19] T. Karagkioules, G. S. Paschos, N. Liakopoulos, A. Fiandrotti, D. Tsilimantos, and M. Cagnazzo, "Online learning for robust adaptive video streaming in mobile networks," arXiv preprint arXiv:1905.11705, 2019.
[20] T. Chen, Q. Ling, and G. Giannakis, "An online convex optimization approach to proactive network resource allocation," IEEE Transactions on Signal Processing, 2017.
[21] D. Bertsekas, Convex Optimization Algorithms. Athena Scientific, 2015.
[22] S. Shalev-Shwartz, "Online Learning and Online Convex Optimization," Foundations and Trends® in Machine Learning, 2012.
[23] E. V. Belmega, P. Mertikopoulos, R. Negrel, and L. Sanguinetti, "Online Convex Optimization and No-Regret Learning: Algorithms, Guarantees and Applications," arXiv:1804.04529, 2018.
[24] T. M. Cover, "Behavior of sequential predictors of binary sequences," in