Blind Optimal User Association in Small-Cell Networks

Abstract

We learn optimal user association policies for traffic from different locations to Access Points (APs), in the presence of unknown dynamic traffic demand. We aim at minimizing a broad family of $\alpha$-fair cost functions that express various objectives in load assignment in the wireless downlink, such as total load or total delay minimization. Finding an optimal user association policy in dynamic environments is challenging because traffic demand fluctuations over time are non-stationary and difficult to characterize statistically, which obstructs the computation of cost-efficient associations. Assuming arbitrary traffic patterns over time, we formulate the problem of online learning of optimal user association policies using the Online Convex Optimization (OCO) framework. We introduce a periodic benchmark for OCO problems that generalizes state-of-the-art benchmarks. We exploit inherent properties of the online user association problem and propose PerOnE, a simple online learning scheme that dynamically adapts the association policy to arbitrary traffic demand variations. We compare PerOnE against our periodic benchmark and prove that it enjoys the no-regret property, with additional sublinear dependence on the network size. To the best of our knowledge, this is the first work that introduces a periodic benchmark for OCO problems and a no-regret algorithm for the online user association problem. Our theoretical findings are validated through results on a real-trace dataset.


Livia Elena Chatzieleftheriou

Athens University of Economics and Business

Georgios Paschos

Amazon

Apostolos Destounis

Huawei Technologies

Iordanis Koutsopoulos

Athens University of Economics and Business

To appear in IEEE International Conference on Computer Communications (INFOCOM), 10-13 May 2021, Virtual Conference.

I. INTRODUCTION

Communication networks in the Beyond 5G (B5G)/6G era are envisioned to support ultra-low latency and bandwidth-demanding services, like those enabled by the Internet of Things (IoT) or autonomous vehicles. Two key technological enablers of such services in future communication networks are novel network architectures and the embedded use of Artificial Intelligence (AI) [1]. The new architectures will generalize Coordinated MultiPoint transmission (CoMP), where Access Points (APs) cooperate to jointly serve requests within their coverage area, and each user's traffic may be served by more than one AP. The pervasive introduction of AI at the network edge, including distributed algorithms for proactive learning and prediction of unknown dynamic processes in the system, will enable the self-optimization of network resource allocation.

In the envisioned ultra-dense wireless networks, devices will be in range of multiple APs. These enhanced association possibilities will bring more degrees of freedom and additional possibilities for optimization. The numerous devices and association alternatives call for a fast and agile user-to-AP association scheme. This is of vital importance for the upcoming bandwidth-demanding services, especially for the downlink, which carries the majority of traffic. Moreover, traffic demand at different locations fluctuates heavily during the day. This could happen, for example, due to sudden changes in the existing sources, or due to new unpredictable sources of traffic. Thus traffic is generally non-stationary during the day, which complicates its accurate statistical characterization and precludes the use of approaches that operate under stationary regimes, such as Lyapunov optimization.

In this work we perform blind user associations on-the-fly, without any assumption or information about the actual traffic demand. We use the Online Convex Optimization (OCO) framework to produce updated solutions incrementally, by readjusting existing ones as new samples are observed.
We consider arbitrarily time-varying traffic demand for different locations and allocate it to APs, which we model as queues that capture their own load. The allocation defines an association policy whose cost belongs to a broad family of $\alpha$-fair functions of the load at APs, including as special cases several objectives, such as delay or load minimization. We aim at producing association policies that minimize regret, i.e., the deviation of the cost of our online association from that of the optimal offline association that knows in hindsight the traffic variations. We introduce OPS, a novel periodic benchmark that generalizes state-of-the-art, and then propose PerOnE: an online algorithm that quickly adapts to unpredictable traffic variations, and that learns scalable and asymptotically optimal user association policies for downlink traffic routing to different locations.

A. Contributions

The contributions of our work to the literature are as follows:

• We provide a model and an OCO formulation for the problem of Online Learning (OL) of how to dynamically associate traffic of geographic locations (and therefore users) to APs. Our objective cost function models various targets for communication networks, such as AP load or delay minimization.
• We introduce Optimal Periodic Static (OPS), a novel periodic benchmark for OL problems that generalizes state-of-the-art. In cases of traffic periodicity, a benchmark where the association is the same during the day is not suitable. OPS is appropriate to compare against, because the optimal policy will most likely be periodic as well.
• We identify and exploit inherent properties of the online problem and design PerOnE, an efficient online algorithm that produces cost-effective association policies under arbitrary changes in traffic demand, with no information about the actual generated traffic or its statistical properties. PerOnE stems from Online Mirror Descent.
• We prove PerOnE's asymptotic optimality, as it achieves regret sublinear in the time horizon against OPS, which knows traffic variations in hindsight. Further, PerOnE's regret also scales sublinearly with the network size, which renders it a valid association scheme for the upcoming large wireless networks.
• Our evaluation with publicly available traffic traces confirms the derived analytical results, showing that our algorithm achieves zero regret asymptotically. In fact, its performance appears to be near-optimal with respect to a dynamic algorithm that chooses the optimum user association in each time slot.

In section II we present the state-of-the-art. In section III we describe the model and the static user association problem. In section IV we introduce and analyse OPS. In section V we transform the static formulation into an OCO formulation. We then design PerOnE, proving that its regret against OPS is sublinear both in the time horizon and in the problem dimension. Finally, in section VI we evaluate our scheme on a real traffic dataset.

II. RELATED WORK

User association (UA).

A widely adopted optimization framework is Network Utility Maximization (NUM) [2], which is exemplified further for AP association. It considers a broad family of convex utility functions of the APs' load, capturing a variety of objectives, such as load balancing. The following works also consider convex cost functions. In [3] an iterative, distributed and deterministic UA policy that is asymptotically optimal for NUM is presented. The authors in [4] propose an exponentiated gradient algorithm for NUM, proving its convergence rate to the optimum UA. In [5] load balancing across APs is considered. Iterative and combinatorial algorithms that perform local adjustments are presented. In [6] dynamic load balancing is studied by capturing the system state with fluid equations, and an asymptotically optimal simple myopic strategy is presented. The authors in [7] predict future traffic based on the traffic history by using robust optimization tools and propose an iterative UA technique that minimizes costs.

UA is seen jointly with channel assignment in [8] for minimizing the number of channels needed to serve users. After applying an iterative load balancing algorithm, the problem reduces to a simple channel allocation problem. The work [9] additionally considers transmission power, quantifying limits of the achievable gains. In [10] and [11] UA is seen jointly with content caching for cache hit ratio maximization, and low-complexity practical schemes are presented. The fast-converging scheme of [10] iterates between UA and content caching, while in [11] users are initially clustered based on their content preferences, and then clusters are assigned to APs. The work [12] additionally considers content recommendation. A simple three-step scheme that sequentially performs a preference-aware UA with service guarantees, a recommendation-aware cache placement, and an adjustment of content recommendations, reveals the gains that can be achieved when UA is considered jointly with content caching and recommendations (as introduced in [13]). Despite their interesting results, works [2]–[5], [8]–[12] consider only static UA instances, work [6] focuses on load balancing, and work [7] performs complex computations on the historical traffic.

OCO theory.

The goal in OCO is the minimization of regret against a static benchmark, where regret is the worst-case deviation of the performance of an online algorithm from that of the optimal algorithm that knows all data in hindsight, but is restricted to a single action for the entire time horizon $T$. The following works consider convex and Lipschitz-continuous objective functions, adversarial constraints and decisions taken over a convex set. In [14] a general class of Online Gradient Ascent (OGA) algorithms with $O(\sqrt{T})$ regret is introduced. The authors in [15] substitute OGA's projection with a Frank-Wolfe linear optimization step, achieving $O(\sqrt{T})$ regret for stochastic and adversarial costs. In [16] time-varying stochastic constraints under a stochastic Slater assumption are studied, and a drift-plus-penalty algorithm with $O(\sqrt{T})$ expected regret is presented. In [17] regret is systematically balanced with constraint violation. Combining stochastic optimization [16] and standard OCO [14] methods, $O(KT/V + \sqrt{T})$ regret for $O(\sqrt{VT})$ constraint violation is achieved, where $K = T^k$, $k \in [0, 1)$ and $V \in [K, T)$. These works do not consider the dimension of the problem in their solutions, which in our case is the size of the network, and either consider no constraints [14], or rely on heavier assumptions on the input [16], [17].

Fig. 1. At each time slot $t$ location $i \in \mathcal{I}$ requests traffic with intensity $\lambda_i(t)$, which can be split into portions $\pi_{ji}(t)\lambda_i$ and served by different APs $j \in \mathcal{N}_i$ in its neighbourhood.

OCO in network resource allocation.

The authors in [18] study online content caching under unknown file popularity. Their no-regret algorithm adapts caching and routing decisions to any file request pattern. In [19] an asymptotically optimal online learning algorithm for video rate adaptation in HTTP Adaptive Streaming under no channel model assumptions is presented. The work [20] studies network power and bandwidth allocation under adversarial costs with bounded variations in consecutive slots. Constraints are satisfied on average, tolerating instantaneous violations. Under an additional Slater assumption, their algorithm achieves sublinear regret against a benchmark that takes the optimal decision in each time slot.

Our work is the first one that applies OCO to the minimum-cost UA problem. Our scheme achieves no regret in UA decisions, with sublinear dependence both on the time horizon and on the network size, under no assumptions on the input. Our work also introduces a novel periodic benchmark that generalizes state-of-the-art.

III. SYSTEM MODEL AND PROBLEM FORMULATION

Basic deﬁnitions.

We start by providing some definitions and function properties that are needed throughout the paper. Although we later consider differentiable cost functions, the results of this paper are valid for any other cost function, considering $\nabla f(\mathbf{x})$ to also stand for a subgradient of $f(\cdot)$ at point $\mathbf{x}$.

• Convexity. A function $f(\mathbf{x}) : A \to B$ is convex iff $\forall \mathbf{x}_1, \mathbf{x}_2 \in A$,
$$f(\mathbf{x}_1) - f(\mathbf{x}_2) \le \langle \nabla f(\mathbf{x}_1), \mathbf{x}_1 - \mathbf{x}_2 \rangle,$$
for $\langle \mathbf{a}, \mathbf{b} \rangle$ the inner product of $\mathbf{a}$ and $\mathbf{b}$. If it exists, the Hessian matrix of a convex function is positive semi-definite, and vice versa.
• $p$-norm and its dual norm. Let $\mathbf{x} \in \mathbb{R}^d$. Its $p$-norm is defined as
$$\|\mathbf{x}\|_p := \Big( \sum_{i=1}^{d} |x_i|^p \Big)^{1/p}.$$
A $q$-norm is said to be the dual of the $p$-norm iff $\frac{1}{p} + \frac{1}{q} = 1$.
• Lipschitz-continuity. A function $f(\mathbf{x}) : A \to B$ is Lipschitz-continuous iff the $p$-norm of the gradient is bounded, i.e., if $\exists L : \|\nabla f(\mathbf{x})\|_p \le L < +\infty$.
• Strong convexity. A function $f(\mathbf{x}) : A \to B$ is $\sigma$-strongly-convex w.r.t. a $p$-norm iff $\forall \mathbf{x}_1, \mathbf{x}_2 \in A$,
$$f(\mathbf{x}_1) \ge f(\mathbf{x}_2) + \langle \nabla f(\mathbf{x}_2), \mathbf{x}_1 - \mathbf{x}_2 \rangle + \frac{\sigma}{2} \|\mathbf{x}_1 - \mathbf{x}_2\|_p^2.$$

Model components.

We consider downlink transmissions in a geographical area that is partitioned into locations $i \in \mathcal{I}$ and is covered by a set $\mathcal{J}$ of APs. We define the "neighbourhood" set $\mathcal{N}_j$, $j \in \mathcal{J}$, as the subset of locations that can be served by AP $j$. Similarly, $\mathcal{N}_i$, $i \in \mathcal{I}$, is the subset of APs that can serve traffic of location $i$. An overview of our model and relevant notation are given in Fig. 1 and Table I, respectively.

Location traffic.

We denote as $\boldsymbol{\lambda} = (\lambda_i)_{i \in \mathcal{I}}$ the traffic intensity vector. Each element $\lambda_i \ge 0$ is the aggregate amount (in packets/second) of requested traffic of all users in location $i$, modeled as a random variable from a general distribution.

Access Point (AP) load.

The traffic requested by a location $i$ can be served by multiple APs, those in $\mathcal{N}_i$. An association policy determines the association control variables $\pi_{ji} \in [0, 1]$ denoting the fraction of traffic $\lambda_i$ which is routed from AP $j$ to location $i$. Each location's demand must be entirely served, so its association variables are constrained to lie in the probability simplex. Thus, $\forall i \in \mathcal{I}$, $(\pi_{ji})_{j \in \mathcal{J}} \in \Pi$, where:
$$\Pi = \Big\{ \mathbf{x} \in [0, 1]^{|\mathcal{J}|} : \sum_{j \in \mathcal{J}} x_j = 1 \Big\}. \tag{1}$$
Following an association decision $\boldsymbol{\pi} = (\pi_{ji})_{j \in \mathcal{J}, i \in \mathcal{I}}$, AP $j$ transmits an aggregate demand intensity $\sum_{i \in \mathcal{N}_j} \lambda_i \pi_{ji}$. The packet transmission process at each AP is modeled as a queuing process. Prior work [3] has shown that statistical multiplexing effects can be captured by modeling this queue with processor sharing service. Assuming the packets have exponentially distributed sizes with mean $1/\omega$, and denoting as $C_{ji}$ the average transmission rate from AP $j$ to location $i$ (averaged over the channel statistics), the load $\rho_j$ of AP $j$ is
$$\rho_j(\boldsymbol{\pi}, \boldsymbol{\lambda}) = \sum_{i \in \mathcal{N}_j} \frac{\lambda_i \pi_{ji}}{\omega C_{ji}}.$$
Let $\boldsymbol{\rho} = (\rho_j)_{j \in \mathcal{J}}$. The AP traffic load is a measure of the percentage of time the AP is busy with packet transmission. When $\rho_j < 1$, AP $j$ is stable in the sense that its packet transmission queue does not grow unbounded. Values close to 1 indicate large delays. If $\rho_j > 1$, the AP queue is unstable and grows without limit. This results in infinite delays and bad user experience, and therefore must be avoided. To ensure stability and a high-quality service in terms of delay for the end-users, association decisions $\boldsymbol{\pi}$ are constrained so that:
$$\rho_j(\boldsymbol{\pi}, \boldsymbol{\lambda}) \le \rho_0, \quad \forall j \in \mathcal{J}, \tag{2}$$
where $\rho_0 \in (0, 1)$ is a load threshold. Combining (1) and (2), the feasible set for association variables is:
$$\Omega = \Big\{ \boldsymbol{\pi} \in \Pi^{|\mathcal{I}|} : \sum_{i \in \mathcal{N}_j} \frac{\lambda_i \pi_{ji}}{\omega C_{ji}} \le \rho_0, \ \forall j \in \mathcal{J} \Big\}. \tag{3}$$

TABLE I. NOTATION TABLE

| Symbol | Meaning |
|---|---|
| Model | |
| $\mathcal{I}$ | Set of locations $i$ |
| $\mathcal{J}$ | Set of APs $j$ |
| $\mathcal{N}_i$ | Set of neighbour APs for location $i$ |
| $\mathcal{N}_j$ | Set of neighbour locations for AP $j$ |
| $\lambda_i$ | Intensity of traffic requested at $i$ |
| $\pi_{ji}$ | Fraction of $\lambda_i$ routed to $i$ by $j$ |
| $\rho_j$ | Total load at AP $j$ |
| $\rho_0$ | Load threshold |
| $\Pi$ | Probability simplex |
| $\Omega$ | Feasibility set |
| $\phi_\alpha(\cdot)$ | Cost function |
| $T$ | Time horizon |
| $t$ | Time slot |
| $K$ | Number of zones in each period |
| $\mathcal{W}_k$ | Time window: set of slots in zone $k$ |
| Equivalent problem formulation and Association Algorithm | |
| $V(\cdot)$ | Penalty-featured costs |
| $\Omega'$ | Extended feasibility set |
| $L$ | Lipschitz constant for $V$ |
| $h(\cdot)$ | Regularization function |
| $g(\cdot)$ | Mirror function |
| $\boldsymbol{\Theta}$ | Matrix with gradient information |
| $t_k^\tau$ | $\tau$-th time slot in window $\mathcal{W}_k$ |

Cost function.

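Let $\phi(\boldsymbol{\pi}, \boldsymbol{\lambda})$ be the system cost as a result of association policy $\boldsymbol{\pi}$ under traffic $\boldsymbol{\lambda}$. Our cost functions belong to the following family of convex and Lipschitz-continuous in $\boldsymbol{\pi}$ functions [2], for $\rho_j \le \rho_0$ and $\alpha \ge 0$:
$$\phi_\alpha(\boldsymbol{\pi}, \boldsymbol{\lambda}) = \sum_{j \in \mathcal{J}} \phi_\alpha^j(\boldsymbol{\pi}, \boldsymbol{\lambda}), \tag{4}$$
where
$$\phi_\alpha^j(\boldsymbol{\pi}, \boldsymbol{\lambda}) = \begin{cases} \dfrac{(1 - \rho_j(\boldsymbol{\pi}, \boldsymbol{\lambda}))^{1-\alpha}}{\alpha - 1}, & \alpha \ne 1 \\[2mm] -\log(1 - \rho_j(\boldsymbol{\pi}, \boldsymbol{\lambda})), & \alpha = 1. \end{cases} \tag{5}$$
To confirm convexity in $\boldsymbol{\pi}$ when $\rho_j < \rho_0$, observe that the cost functions are twice differentiable with positive second derivative, hence their Hessian matrix is positive semidefinite, which implies convexity. To confirm Lipschitz-continuity in $\boldsymbol{\pi}$, observe that when $\rho_j < \rho_0$, then $\forall j, i$, $\nabla \phi_\alpha(\pi_{ji}, \lambda_i)$ is bounded, and so is the $p$-norm $\|\nabla \phi_\alpha(\boldsymbol{\pi}, \boldsymbol{\lambda})\|_p$. We will rely on both the convexity and the Lipschitz-continuity of the cost function to design our online user association algorithm and prove its performance guarantees.

Different values of $\alpha$ lead to different cost functions. For example, for $\alpha = 0$, (4) reduces, up to an additive constant, to the total system load, $\phi_0(\boldsymbol{\pi}, \boldsymbol{\lambda}) = \sum_{j \in \mathcal{J}} \rho_j(\boldsymbol{\pi}, \boldsymbol{\lambda})$. For $\alpha = 2$, it is $\phi_2(\boldsymbol{\pi}, \boldsymbol{\lambda}) = \sum_{j \in \mathcal{J}} (1 - \rho_j(\boldsymbol{\pi}, \boldsymbol{\lambda}))^{-1}$, and (4) is equivalent to the average delay experienced by a typical demand flow in a stationary system under a temporal fair scheduler, e.g., round robin [3].

Optimal user association for known demand.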

If the traffic demand vector $\boldsymbol{\lambda}$ is known, the association policy that minimizes the system costs is found by solving the following problem:

Problem 1 (Optimal user association for known demand).
$$\min_{\boldsymbol{\pi}} \ \phi_\alpha(\boldsymbol{\pi}, \boldsymbol{\lambda}), \quad \text{s.t.} \quad \sum_{j \in \mathcal{N}_i} \pi_{ji} = 1, \ \forall i \in \mathcal{I}, \qquad \rho_j(\boldsymbol{\pi}, \boldsymbol{\lambda}) \le \rho_0, \ \forall j \in \mathcal{J}.$$

Problem 1 is a convex minimization problem of a Lipschitz-continuous cost function, on the intersection of simplex and hyperplane constraints. Therefore, it can be solved through convex optimization methods [21]. Such instances are already studied in the literature. In this work we focus on online instances, where the traffic demand is unknown at the time of the decision.
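The load, feasibility and cost definitions above can be illustrated with a short numerical sketch. It is only an illustration under stated assumptions: the function names, the argument `rho_max` (standing for the load threshold in (2)), and the toy rates and traffic values are hypothetical, and every AP is assumed to be in range of every location.

```python
import numpy as np

def ap_loads(pi, lam, C, omega=1.0):
    """Load of each AP: rho_j = sum_i lam_i * pi_ji / (omega * C_ji).
    pi and C have shape (J, I); lam has shape (I,)."""
    return (pi * lam[None, :] / (omega * C)).sum(axis=1)

def alpha_fair_cost(rho, alpha):
    """Alpha-fair cost (4)-(5) summed over APs; assumes every rho_j < 1."""
    if np.isclose(alpha, 1.0):
        return float(-np.log(1.0 - rho).sum())
    return float(((1.0 - rho) ** (1.0 - alpha) / (alpha - 1.0)).sum())

def is_feasible(pi, rho, rho_max, tol=1e-9):
    """Simplex constraint (1) per location and load threshold (2) per AP."""
    on_simplex = np.all(pi >= -tol) and np.allclose(pi.sum(axis=0), 1.0)
    return bool(on_simplex and np.all(rho <= rho_max + tol))

# Hypothetical toy instance: 2 APs, 3 locations, made-up rates and traffic.
lam = np.array([2.0, 1.0, 3.0])            # packets/second per location
C = np.array([[10.0, 10.0, 10.0],          # C_ji: rate from AP j to location i
              [20.0, 20.0, 20.0]])
pi = np.full((2, 3), 0.5)                  # split every location evenly
rho = ap_loads(pi, lam, C)

print("loads:", rho)
print("delay-type cost (alpha = 2):", alpha_fair_cost(rho, 2.0))
print("feasible for rho_max = 0.9:", is_feasible(pi, rho, rho_max=0.9))
```

Solving Problem 1 itself would additionally require a convex solver over this feasible set, which the sketch does not attempt.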

Time dynamics.

We capture time dynamics by denoting as $\boldsymbol{\lambda}(t)$, $\boldsymbol{\pi}(t)$, and $\phi_\alpha\big(\boldsymbol{\pi}(t), \boldsymbol{\lambda}(t)\big)$ the traffic vector, association decision, and resulting system cost during time slot $t \in \{1, 2, \ldots, T\}$, where $T$ is a time horizon.

IV. A NOVEL PERIODIC BENCHMARK AND REGRET

Adversarial Online Learning.

In realistic conditions, the traffic $\boldsymbol{\lambda}(t)$ for the next time slot is unknown. Therefore, the association decisions $\boldsymbol{\pi}(t)$ for slot $t$ will be computed based on knowledge of traffic demands $\boldsymbol{\lambda}(t-1)$. After the decision $\boldsymbol{\pi}(t)$ is taken, the actual demand $\boldsymbol{\lambda}(t)$ emerges, and the actual value $\phi_\alpha\big(\boldsymbol{\pi}(t), \boldsymbol{\lambda}(t)\big)$ of the cost function is revealed. This lack of information during the decision $\boldsymbol{\pi}(t)$ at time slot $t$ may imply additional costs, or even instability of AP packet transmission queues, due to violation of the load threshold.

An appropriate setting for such online optimization problems is Online Convex Optimization (OCO) [22], [23]. We assume traffic demand vectors $\boldsymbol{\lambda}(t)$, and therefore cost functions $\phi_\alpha\big(\boldsymbol{\pi}(t), \boldsymbol{\lambda}(t)\big)$, to be arbitrarily selected by an adversary who tweaks them without adhering to any probability distribution, aiming at obstructing our decisions. While in reality the traffic vectors are not changed by such an adversary, this framework offers a convenient way to design algorithms with provable worst-case guarantees under arbitrary variations of system parameters.

Fig. 2. Toy example to demonstrate periodicity, with $T = 18$ time slots in the time horizon, $P = 3$ periods, $K = 2$ time zones in each period, $Z = 3$ time slots in each time zone during each period. The time windows are: $\mathcal{W}_1 = \{1, 2, 3, 7, 8, 9, 13, 14, 15\}$, $\mathcal{W}_2 = \{4, 5, 6, 10, 11, 12, 16, 17, 18\}$.

Traffic periodicity.

The inherent nature of human activity results in traffic periodicity. For example, daily and weekly patterns can be observed due to people going to work or returning home. Motivated by this we introduce our periodic benchmark. It generalizes state-of-the-art and characterizes the performance of online algorithms, lying in between the two extremes: the static benchmark [14] and the dynamic one.

We divide the time horizon $T$ in $P$ periods, and each period $p$ in $K$ time zones. Without loss of generality, let each time zone $k$ contain $Z$ time slots in each period. Then $P = T/(KZ)$, and each period $p$ contains $KZ$ slots. This naturally defines a time window $\mathcal{W}_k$ for each time zone $k$, which includes all time slots that belong to time zone $k$, across all periods. It is:
$$\mathcal{W}_k = \big\{ t : t = KZ(p-1) + Z(k-1) + \tau, \ \text{where } \tau \in \{1, \ldots, Z\}, \ p \in \{1, \ldots, P\} \big\}, \quad k = 1, \ldots, K.$$
The $K$ time zones define the manner in which each period is partitioned, while time windows include all time slots of a time zone across the time horizon. This partitioning of the time horizon captures any type of periodicity, e.g., daily, weekly, or any underlying combination. Assuming daily periodicity in our toy example of Fig. 2, the time horizon $T$ is divided in $P = 3$ days, each having $K = 2$ zones. During each period, each zone contains $Z = 3$ time slots, each of 4 hours duration.

Algorithm 1 Optimal Periodic Static (OPS) benchmark policy
Input: Traffic vectors $\boldsymbol{\lambda}(t)$, $t \in \{1, \ldots, T\}$, partition of time horizon in time windows $\mathcal{W}_k$, $k \in \{1, \ldots, K\}$.
Output: $K$ optimal periodic static policies $\boldsymbol{\pi}^* = \{\boldsymbol{\pi}^*[k]\}_k$, one for each time window $\mathcal{W}_k$.
for $k = 1$ to $K$ do
    Compute optimal static association policy in zone $k$,
    $$\boldsymbol{\pi}^*[k] = \arg\min_{\boldsymbol{\pi} \in \Omega} \sum_{t \in \mathcal{W}_k} \phi_\alpha\big(\boldsymbol{\pi}, \boldsymbol{\lambda}(t)\big) \tag{6}$$
end for

We want to stress that we do not assume periodicity of the traffic demands: traffic vectors $\boldsymbol{\lambda}$ are considered to have arbitrary variations over time. Rather, we aim to capture possible (approximate) periodic-like "patterns" or "trends" that may exist. In fact, a key contribution of this work is the introduction of the following periodic benchmark.

Regret against the Optimal Periodic Static (OPS) algorithm: a novel periodic benchmark.

Given a sequence of traffic vectors $\boldsymbol{\lambda}(1), \boldsymbol{\lambda}(2), \ldots, \boldsymbol{\lambda}(T)$ over a time horizon $T$, OPS consists in finding $K$ user association policies $\boldsymbol{\pi}^*[1], \boldsymbol{\pi}^*[2], \ldots, \boldsymbol{\pi}^*[K]$, one for each time zone $k$. Each static policy $\boldsymbol{\pi}^*[k]$ is optimal regarding only traffic loads in the respective time window $\mathcal{W}_k$, and is defined as in (6). This novel periodic benchmark exploits possible approximate traffic periodicity and allows the comparison of dynamic online policies to static association rules that change according to the general traffic characteristics in each window. For example, it is possible to consider two different association policies, one for peak hours and one for hours with low traffic, and compare our dynamic policy against these. In the toy example of Fig. 2, OPS would find two static association policies: one for $\mathcal{W}_1$ and one for $\mathcal{W}_2$. We provide its pseudocode in Algorithm 1.

A performance metric that characterizes the learning performance of an online algorithm is regret: the difference in performance, which in our case is the experienced cost, between an online policy and a benchmark. Let $\boldsymbol{\pi}_A(t)$ be the decision taken by an online algorithm $A$ at slot $t$. The regret $Reg_A(T, K)$ of $A$ with respect to OPS, for $K$ time zones in each period over a time horizon $T$, is:
$$Reg_A(T, K) := \sum_{t=1}^{T} \phi_\alpha\big(\boldsymbol{\pi}_A(t), \boldsymbol{\lambda}(t)\big) - \sum_{k=1}^{K} \sum_{t \in \mathcal{W}_k} \phi_\alpha\big(\boldsymbol{\pi}^*[k], \boldsymbol{\lambda}(t)\big). \tag{7}$$

OPS vs. existing benchmarks and regrets. We remind the reader of the optimal static and optimal dynamic benchmark policies as defined in [14], which we denote as $\boldsymbol{\pi}^*_S$ and $\boldsymbol{\pi}^*_D(t)$, respectively. The optimal static benchmark knows all traffic changes in hindsight and finds one user association policy $\boldsymbol{\pi}^*_S$ that minimizes the costs over the entire time horizon, i.e.,
$$\boldsymbol{\pi}^*_S := \arg\min_{\boldsymbol{\pi} \in \Omega} \sum_{t=1}^{T} \phi_\alpha\big(\boldsymbol{\pi}, \boldsymbol{\lambda}(t)\big). \tag{8}$$
In contrast, the optimal dynamic benchmark knows all traffic changes but aims at minimizing the cost function for each time slot $t$ separately, and finds one user association $\boldsymbol{\pi}^*_D(t)$ per slot:
$$\boldsymbol{\pi}^*_D(t) := \arg\min_{\boldsymbol{\pi} \in \Omega} \phi_\alpha\big(\boldsymbol{\pi}, \boldsymbol{\lambda}(t)\big), \quad \forall t = 1, \ldots, T. \tag{9}$$
OPS generalizes the state-of-the-art benchmarks, and the respective regrets against them. It is easily verifiable that:

• For $K = 1$, then $\mathcal{W}_k = \{1, \ldots, T\}$, $\boldsymbol{\pi}^*[k] \equiv \boldsymbol{\pi}^*_S$, and (7) reduces to the static regret.
• For $K = T$, then $\mathcal{W}_k = \{t = k\}$, $\boldsymbol{\pi}^*[k] \equiv \boldsymbol{\pi}^*_D(t)$, and (7) reduces to the dynamic regret.

Online learning with "no regret".
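As a concrete companion to the regret notion treated in this part, the sketch below builds the time windows $\mathcal{W}_k$ and evaluates the regret (7) from per-slot costs. It is a minimal illustration, not the authors' code: the function names are hypothetical, the cost values are made up, and the benchmark costs are taken as given inputs (computing the OPS policies in (6) would require a convex solver, which is omitted here).

```python
import numpy as np

def time_windows(T, K, Z):
    """Return [W_1, ..., W_K] as lists of 1-indexed slots,
    using t = K*Z*(p-1) + Z*(k-1) + tau."""
    assert T % (K * Z) == 0, "T must be a multiple of K*Z"
    P = T // (K * Z)
    return [[K * Z * (p - 1) + Z * (k - 1) + tau
             for p in range(1, P + 1) for tau in range(1, Z + 1)]
            for k in range(1, K + 1)]

def ops_regret(online_costs, benchmark_costs, windows):
    """Regret (7). online_costs[t-1] is the online cost in slot t;
    benchmark_costs[k][t-1] is the cost of the static policy of zone k+1
    under the traffic of slot t (only slots in its own window are used)."""
    online = float(np.sum(online_costs))
    bench = sum(float(benchmark_costs[k][t - 1])
                for k, W in enumerate(windows) for t in W)
    return online - bench

# Toy example of Fig. 2: T = 18 slots, K = 2 zones, Z = 3 slots per zone.
windows = time_windows(T=18, K=2, Z=3)
print(windows)

# Made-up costs: online policy pays 2.0 per slot, benchmarks pay 1.5 and 1.0.
online_costs = np.full(18, 2.0)
benchmark_costs = [np.full(18, 1.5), np.full(18, 1.0)]
regret = ops_regret(online_costs, benchmark_costs, windows)
print("regret:", regret, "average regret:", regret / 18)
```

Calling `time_windows(T, 1, T)` gives a single window covering the whole horizon (static regret), while `time_windows(T, T, 1)` gives one window per slot (dynamic regret), matching the two special cases listed above.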

A desirable property for the regret is to scale sublinearly with the time horizon $T$, i.e., $Reg_A(T, K) = o(T)$. In this case,
$$\lim_{T \to +\infty} \frac{Reg_A(T, K)}{T} = 0,$$
and the online algorithm $A$ is said to have "no regret", which means that it learns to perform as well as the benchmark asymptotically as the time horizon $T \to +\infty$.

Another desirable feature for online algorithms is to have scalable regret. This happens when regret is also sublinear in the problem dimension $d$, which in our case equals $|\mathcal{J}| \cdot |\mathcal{I}|$. It means that increasing the size of the network by a unit implies only a sublinear increase in the regret. At the moment, most regret results arrive at $O(\sqrt{dT})$. In low dimensions, this is a good result, implying the ability to learn quickly. However, as $d$ starts to grow and becomes $d \sim T$, the above expression results in a regret $O(T)$, which means that learning is not attainable in the long run. Indeed, large systems may require a very large horizon $T$ to learn, unless we are able to decrease the dependence of the regret expression on $d$. The above is thus a property of vital importance for the envisioned future large-scale communication networks.

V. ONLINE USER ASSOCIATION ALGORITHM WITH NO REGRET

A. Augmented penalty function

In order to avoid overloading cells, we reformulate the user association problem with the use of a penalty function that is added to the total cost, while removing the constraints. The penalty is active and adds to the cost when constraints are violated, i.e., when $\exists j : \rho_j > \rho_0$. The set of optimal solutions remains the same, because the structure of the problem and the coupling with the load constraints now appear in the objective.

Our penalty function $B_j(\boldsymbol{\pi}(t), \boldsymbol{\lambda}(t))$ for overloading AP $j$ could be any convex and Lipschitz-continuous in $\rho$ function, such as
$$B_j(\boldsymbol{\pi}(t), \boldsymbol{\lambda}(t)) = \psi \nabla \phi_\alpha^j(\rho_0) \cdot (\rho_j - \rho_0),$$
where $\psi > 0$ is a penalty factor for AP overloading. This captures the cost for each overloaded AP as the linear extension of the cost function at the overloading point $\rho_0$. Then, the cost function becomes:
$$V\big(\boldsymbol{\pi}(t), \boldsymbol{\lambda}(t)\big) = \sum_{j \in \mathcal{J}} V_j\big(\boldsymbol{\pi}(t), \boldsymbol{\lambda}(t)\big), \quad \text{where} \tag{10}$$
$$V_j\big(\boldsymbol{\pi}(t), \boldsymbol{\lambda}(t)\big) = \begin{cases} \phi_\alpha^j(\rho_j), & \rho_j \le \rho_0 \\ \phi_\alpha^j(\rho_0) + \nabla \phi_\alpha^j(\rho_0) \cdot (\rho_j - \rho_0), & \rho_j > \rho_0. \end{cases}$$
The optimal user association problem reduces to:

Problem 2 (Online user association for unknown demand).
$$\min_{\boldsymbol{\pi}(t) \in \Omega'} \ \sum_{t=1}^{T} V\big(\boldsymbol{\pi}(t), \boldsymbol{\lambda}(t)\big) - \sum_{k=1}^{K} \sum_{t \in \mathcal{W}_k} V\big(\boldsymbol{\pi}^*[k], \boldsymbol{\lambda}(t)\big),$$
where
$$\Omega' = \big\{ \boldsymbol{\pi} : \boldsymbol{\pi} \in \Pi^{|\mathcal{I}|} \big\}. \tag{11}$$
This is a typical formulation for online learning problems. It aims at finding online a sequence of association policies $\boldsymbol{\pi}(t)$, $t = 1, \ldots, T$ that minimize regret, i.e., the deviation of online decisions from those of an offline benchmark, which in this work is OPS. Compared to Problem 1, the feasibility set is expanded to a simplex for each location $i$ and it remains convex. This penalty formulation will enable us to perform a customised modification of a traditional algorithm, based on the specific characteristics of the new feasible space. The objective function remains convex and Lipschitz-continuous, as the sum of such functions. Both convexity and Lipschitz-continuity are crucial properties for proving that the online algorithm we will design has no regret against OPS.

We will analyze the regret with respect to this augmented cost. Since the linear part comes into play only when $\rho_j > \rho_0$ (which does not happen in the benchmark), a sublinear regret here implies sublinear regret for the $\phi_\alpha(\cdot)$ functions as well.

Algorithm 2 Online Mirror Descent (OMD)
Input: Mirror function $g : \mathbb{R}^{|\mathcal{J}||\mathcal{I}|} \to \Omega'$, stepsize $\eta$, objective function $V(\cdot)$
Output: User association $\boldsymbol{\pi}(t)$, $\forall t = 1, \ldots, T$
Initialize: $\boldsymbol{\Theta}(1) = \mathbf{0}$
for $t = 1, 2, \ldots, T$ do
    decide association $\boldsymbol{\pi}(t) = g\big(\boldsymbol{\Theta}(t)\big)$
    update $\boldsymbol{\Theta}(t+1) = \boldsymbol{\Theta}(t) - \nabla V\big(\boldsymbol{\pi}(t), \boldsymbol{\lambda}(t)\big)$
end for

B. PerOnE: Online user association with no regret
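The loop of Algorithm 2 can be sketched in a few lines. The sketch below is an illustration under assumptions, not the authors' implementation: it plugs in a per-location normalized exponentiation as the mirror function $g$ (the form this section arrives at in (15)), and it uses a hypothetical total-load cost with made-up rates and traffic so that the gradient has a closed form.

```python
import numpy as np

def softmax_mirror(theta, eta):
    """Mirror map: for each location (column), turn eta * Theta into a
    probability vector over APs (rows)."""
    z = eta * theta
    z = z - z.max(axis=0, keepdims=True)      # numerical stabilisation only
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def online_mirror_descent(grad_fn, traffic, n_aps, eta):
    """Algorithm 2. grad_fn(pi, lam) returns the gradient of the cost with
    respect to pi, shape (J, I); traffic has shape (T, I)."""
    n_loc = traffic.shape[1]
    theta = np.zeros((n_aps, n_loc))           # Theta(1) = 0
    decisions = []
    for lam in traffic:                         # slots t = 1, ..., T
        pi = softmax_mirror(theta, eta)         # decide pi(t) = g(Theta(t))
        decisions.append(pi)
        theta = theta - grad_fn(pi, lam)        # Theta(t+1) = Theta(t) - grad
    return decisions

# Hypothetical example: total-load cost, whose partial derivative with
# respect to pi_ji is lam_i / (omega * C_ji). All numbers are made up.
omega = 1.0
C = np.array([[10.0, 10.0],                    # AP 1: slower links
              [20.0, 20.0]])                   # AP 2: faster links
traffic = np.tile(np.array([2.0, 4.0]), (50, 1))   # T = 50 slots, 2 locations
grad_fn = lambda pi, lam: lam[None, :] / (omega * C)

decisions = online_mirror_descent(grad_fn, traffic, n_aps=2, eta=0.5)
print("first decision:\n", decisions[0])
print("last decision:\n", decisions[-1])
```

Because the mirror map already returns one probability vector per location, every decision stays in $\Omega'$ without an explicit projection step; in this toy run the traffic shifts gradually towards the faster AP.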

Online Mirror Descent.

A general class of online schemes with no regret against the static benchmark is Online Mirror Descent (OMD) [22], presented in Algorithm 2. It gives the opportunity to exploit the feasibility set of our problem, and leads to decision updates that lie in the feasible set without the need for expensive projections. OMD computes the current decision from the previous one using a simple gradient update rule. Let $\boldsymbol{\Theta}(t)$ be a matrix of dimension $|\mathcal{J}| \times |\mathcal{I}|$, initialized as $\boldsymbol{\Theta}(1) = \mathbf{0}$, and updated as
$$\boldsymbol{\Theta}(t) = \boldsymbol{\Theta}(t-1) - \nabla V\big(\boldsymbol{\pi}(t-1), \boldsymbol{\lambda}(t-1)\big), \quad t > 1. \tag{12}$$
During slot $t$ it is given as input to a "link" function $g(\cdot)$, that combines it with the previous decision $\boldsymbol{\pi}(t-1)$ and "mirrors" it to a feasible association decision $\boldsymbol{\pi}(t)$. More specifically, the updated user association is $\boldsymbol{\pi}(t) = g(\boldsymbol{\Theta}(t))$, where
$$g(\boldsymbol{\Theta}) := \arg\min_{\boldsymbol{\pi} \in \Omega'} \big\{ h(\boldsymbol{\pi}) - \langle \eta \boldsymbol{\Theta}, \boldsymbol{\pi} \rangle \big\}, \tag{13}$$
with $\eta$ a stepsize, and $h(\cdot)$ a "regularization" function that is strongly convex with respect to a norm over the feasible set $\Omega'$, with $\Omega'$ as in (11).

Regularization function.

The regularization function ensures stability of the decision and, if chosen appropriately, it leads to solutions that exploit the geometry of the problem, do not need expensive projections to the feasible space, and enjoy the no-regret property.

In our setting, we aim at finding associations that lie in the unit simplex for each location. Thus, each association policy is basically a set of probability distributions, one for each location. Since the feasibility set regarding location $i$ is the probability simplex, the most natural regularization function would be the (negative) Gibbs-Shannon entropy,
$$h_i(\boldsymbol{\pi}) = \sum_{j \in \mathcal{J}} \pi_{ji} \log \pi_{ji},$$
which would give the well-known Exponentiated Gradient Descent (EGD). Here we consider the regularization function
$$h(\boldsymbol{\pi}) := \sum_{i \in \mathcal{I}} h_i(\boldsymbol{\pi}) = \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} \pi_{ji} \log \pi_{ji}, \tag{14}$$
which for a given user association policy equals the aggregate entropy of the associations for all locations.

Footnote: Since the first available traffic vector is $\boldsymbol{\lambda}(1)$, the first update in (12) cannot be performed for $t < 2$. Thus, the initialization is performed for $t = 1$, instead of the common choice $t = 0$.

In Appendix A we prove that:

Lemma 1.

The modified entropic regularization function in (14) is $|\mathcal{I}|$-strongly convex w.r.t. the 1-norm.

Normalized exponentiated gradient.

Combining (14) and (13), we get

$$g(\Theta) = \arg\min_{\pi \in \Omega'} \Big\{ \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} \pi_{ji} \log \pi_{ji} - \langle \eta\Theta, \pi \rangle \Big\}.$$

Differentiating the objective inside the braces with respect to $\pi_{ji}$, we get

$$\frac{\partial}{\partial \pi_{ji}} \big( h(\pi) - \langle \eta\Theta, \pi \rangle \big) = \log(\pi_{ji}) - \eta\Theta_{ji} + 1,$$

where $\Theta_{ji}$ is the element of matrix $\Theta$ related to AP $j$ and location $i$. This becomes zero at $\pi_{ji} = e^{\eta\Theta_{ji} - 1}$. To ensure that the updated association variables $\pi_{ji}$ lie in the unit simplex for each location $i$, we need to normalize the association of each location. Each element $\Theta_{ji}$ is thus "mirrored" through the exponentiated mirror function to

$$g_{ji}(\Theta) = \frac{e^{\eta\Theta_{ji}}}{\sum_{j' \in \mathcal{J}} e^{\eta\Theta_{j'i}}}. \tag{15}$$

Each association variable $\pi_{ji}$ is then updated through this mirroring as

$$\begin{aligned}
\pi_{ji}(t+1) &= g_{ji}\big(\Theta(t+1)\big) \overset{(15)}{=} \frac{e^{\eta\Theta_{ji}(t+1)}}{\sum_{j' \in \mathcal{J}} e^{\eta\Theta_{j'i}(t+1)}} \\
&\overset{(12)}{=} \frac{e^{\eta\Theta_{ji}(t)}\, e^{-\eta\nabla V(\pi_{ji}(t), \lambda_i(t))}}{\sum_{j' \in \mathcal{J}} e^{\eta\Theta_{j'i}(t)}\, e^{-\eta\nabla V(\pi_{j'i}(t), \lambda_i(t))}} \cdot \frac{1 / \sum_{j' \in \mathcal{J}} e^{\eta\Theta_{j'i}(t)}}{1 / \sum_{j' \in \mathcal{J}} e^{\eta\Theta_{j'i}(t)}} \\
&\overset{(15)}{=} \frac{\pi_{ji}(t)\, e^{-\eta\nabla V(\pi_{ji}(t), \lambda_i(t))}}{\sum_{j' \in \mathcal{J}} \pi_{j'i}(t)\, e^{-\eta\nabla V(\pi_{j'i}(t), \lambda_i(t))}}.
\end{aligned} \tag{16}$$

This mapping simply normalizes the product of the previous association and a negative exponentiation of the gradient of the objective function in the previous step. The controller thus needs only the value of $\nabla V(\cdot)$ in order to decide the association of all locations $i$ to their neighbourhood APs $j \in \mathcal{N}_i$. Using a normalization function that suits the geometry of the problem leads to decision updates that are always in the feasible set, avoiding expensive projections that would otherwise be necessary but prohibitive for large-scale networks.

PerOnE: Our PERiodic, ONline, Exponentiated gradient association algorithm with "no regret".

We design it based on the normalized exponentiated gradient-based association update (16). We refer to it as PerOnE and provide its pseudocode in Algorithm 3. PerOnE exploits possible traffic periodicity and operates in each time window $\mathcal{W}_k$, $k = 1, \ldots, K$, separately.

Let $t_k$, $t_k^{\tau}$ and $t_k^{|\mathcal{W}_k|}$ be the first, the $\tau$-th, and the last time slot in time window $\mathcal{W}_k$, respectively. For the first slot $t_k$ in each window $\mathcal{W}_k$, PerOnE has neither a previous user association $\pi$ to rely on, nor any prior information about the traffic vectors $\lambda(t)$ for $t \in \mathcal{W}_k$. Therefore, it simply splits the requested traffic evenly across neighbouring APs. At time $t = t_k^{\tau+1}$, it updates the association variables $\pi_{ji}(t_k^{\tau+1})$ as in (16). This update is based on the slot $t = t_k^{\tau}$, which precedes $t_k^{\tau+1}$ within the same time window $\mathcal{W}_k$, $\forall k$, as shown in (18).

PerOnE is a simple, projection-free and cost-efficient modification of OMD. It also achieves a sublinear bound on the regret against the OPS benchmark over the time horizon $T$ and over the total number $|\mathcal{J}||\mathcal{I}|$ of decision variables. Let

$$M_I = \max_j |\mathcal{N}_j| \quad \text{and} \quad M_J = \max_i |\mathcal{N}_i| \tag{19}$$

be the maximum number of locations that are in range of an AP $j$ in the system, and the maximum number of APs that a location $i$ is in range of, respectively. Then it holds:

Theorem 1 (PerOnE No-regret). For a Lipschitz-continuous and convex objective function $V(\cdot)$ with Lipschitz constant $L$, $M_I$ and $M_J$ as in (19), a stepsize $\eta$ and time horizon $T$, PerOnE achieves the regret bound:

$$\mathrm{Reg}(T, K) \le \frac{K M_I \log(M_J)}{\eta M_J} + \frac{\eta T L^2}{2|\mathcal{I}|}.$$

In particular, for stepsize $\eta = \sqrt{\frac{2 K M_I |\mathcal{I}| \log(M_J)}{T L^2 M_J}}$, and since $M_I \le |\mathcal{I}|$ and $\log(M_J) \le M_J$, we get:

$$\mathrm{Reg}(T, K) \le \sqrt{\frac{2 K M_I T L^2 \log(M_J)}{|\mathcal{I}| M_J}} \le L\sqrt{2KT}.$$

Please refer to Appendix B for a proof.

Algorithm 3 Periodic Online Exponentiated (PerOnE)

Input: Set of locations $\mathcal{I}$, APs $\mathcal{J}$ and neighbouring APs $\mathcal{N}_i$, $\forall i$; penalty-featured cost functions $V(\cdot)$; partition of the time horizon $T$ into time windows $\mathcal{W}_k$, $k = 1, \ldots, K$; step size $\eta$.

Output: User association $\pi(t)$, $\forall t = 1, \ldots, T$.

for $t = 1, 2, \ldots, T$ do
  Identify time window $\mathcal{W}_k \ni t$
  if $t = t_k$ for $\mathcal{W}_k$ then
    Initialize association as
    $$\pi_{ji}(t_k) = \begin{cases} \frac{1}{|\mathcal{N}_i|}, & j \in \mathcal{N}_i, \\ 0, & j \notin \mathcal{N}_i \end{cases} \tag{17}$$
  else if $t = t_k^{\tau+1}$ for $\mathcal{W}_k$ then
    Update association as
    $$\pi_{ji}(t_k^{\tau+1}) = \frac{\pi_{ji}(t_k^{\tau}) \cdot e^{-\eta\nabla V(\pi_{ji}(t_k^{\tau}), \lambda_i(t_k^{\tau}))}}{\sum_{j' \in \mathcal{J}} \pi_{j'i}(t_k^{\tau}) \cdot e^{-\eta\nabla V(\pi_{j'i}(t_k^{\tau}), \lambda_i(t_k^{\tau}))}} \tag{18}$$
  end if
  Observe actual traffic $\lambda(t)$
  Compute gradient $\nabla V\big(\pi(t), \lambda(t)\big)$
end for
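The PerOnE loop described above (uniform initialization (17) at the first slot of each window, multiplicative update (18) afterwards) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `uniform_association`, `exponentiated_step`, `perone`, `RATES` and `total_load_gradient` are hypothetical, windows are assumed to have equal length, and the toy cost is assumed to be the total load $\sum_i \sum_j \pi_{ji} \lambda_i / C_{ji}$, whose gradient entry is $\lambda_i / C_{ji}$.

```python
import math


def uniform_association(neighbors):
    """Eq. (17): split each location's traffic evenly over its in-range APs.

    neighbors[i] lists the AP indices in range of location i (the set N_i).
    Returns one dict per location mapping AP index -> traffic fraction.
    """
    return [{j: 1.0 / len(aps) for j in aps} for aps in neighbors]


def exponentiated_step(pi, grad, eta):
    """Eq. (18): reweight by exp(-eta * gradient), then normalize per location."""
    new_pi = []
    for pi_i, grad_i in zip(pi, grad):
        weights = {j: p * math.exp(-eta * grad_i[j]) for j, p in pi_i.items()}
        total = sum(weights.values())
        new_pi.append({j: w / total for j, w in weights.items()})
    return new_pi


def perone(traffic, grad_fn, neighbors, window_len, eta):
    """Play one association per slot; restart at the first slot of each window.

    Equal-length windows are an assumption of this sketch.
    """
    decisions, pi, grad = [], None, None
    for t, lam in enumerate(traffic):
        if t % window_len == 0:
            pi = uniform_association(neighbors)      # Eq. (17)
        else:
            pi = exponentiated_step(pi, grad, eta)   # Eq. (18), gradient of slot t-1
        decisions.append(pi)
        grad = grad_fn(pi, lam)  # traffic lam is revealed only after deciding
    return decisions


# Hypothetical toy instance: 2 locations, 2 APs; location 1 only reaches AP 0.
RATES = [{0: 1.0, 1: 4.0}, {0: 2.0}]  # RATES[i][j]: rate from AP j to location i


def total_load_gradient(pi, lam):
    """Gradient of the assumed toy cost sum_i sum_j pi_ji * lam_i / C_ji."""
    return [{j: lam[i] / RATES[i][j] for j in pi_i} for i, pi_i in enumerate(pi)]


neighbors = [[0, 1], [0]]
traffic = [[1.0, 1.0]] * 6
decisions = perone(traffic, total_load_gradient, neighbors, window_len=3, eta=0.5)
for t, pi in enumerate(decisions):
    print(t, [round(pi[0][j], 3) for j in (0, 1)])
```

In this toy run, location 0 gradually shifts traffic toward the faster AP within each window and returns to the even split at the first slot of the next window. Because non-neighbouring APs never receive weight and every update is renormalized, each location's fractions stay on the simplex without any projection step.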

Remark 1.

The EGD, obtained as OMD with the entropic regularization function $h_i(\pi)$ for only one location, has a regret that scales as $\sqrt{T}$ with the horizon and has a $\log(|\mathcal{J}||\mathcal{I}|)$ dependence on the number of association variables [22].

Remark 2.

Our entropic function $h(\pi)$, which considers multiple locations, and the initialization step in (17) imply a dependence of the regret on topological characteristics, such as the maximum number $M_I$ of locations that an AP has in its range and the maximum number $M_J$ of APs in whose range a location belongs. Overall, the regret is sublinear in the total number of association decision variables $|\mathcal{J}||\mathcal{I}|$. Moreover, in realistic systems, the impact on regret of the linear dependence on $M_I$ and of the logarithmic dependence on $M_J$ is very limited. In fact, their values can be considered constant compared to the system's dimension $|\mathcal{J}||\mathcal{I}|$, due to the progressively decreasing range of APs as technology evolves, which results in $\mathcal{N}_i$ and $\mathcal{N}_j$ being progressively smaller sets.

PerOnE's regret follows the $\sqrt{T}$ dependence of EGD, and it also depends on the number $K$ of time zones. For $K = o(T)$, the regret is sublinear in the time horizon $T$, i.e., $\lim_{T \to +\infty} \frac{\mathrm{Reg}(T, K)}{T} = 0$, which means that PerOnE learns association policies that are asymptotically optimal. The standard [14] static regret is obtained for $K = 1$, and aligns with the above. For $K = O(T)$, the regret scales linearly with time, which is aligned with the impossibility result stated in [24]: when the adversary can change its decision in each time slot, no regret is not attainable without other assumptions on the input. Indeed, PerOnE will just play the initialization for each $t$, or any other linear scaling, since it will play very few rounds for each part of the horizon. The state-of-the-art dynamic regret, obtained for $K = T$, aligns with the above.

TABLE II
Aggregate constraint violations during the time horizon, for different values of zones $K$ and load thresholds $\rho$.

| Scheme | $\rho = 1$, $K = 24$ | $\rho = 1$, $K = 12$ | $\rho = 1$, $K = 2$ | $\rho = 0.$, $K = 24$ | $\rho = 0.$, $K = 12$ | $\rho = 0.$, $K = 2$ |
|---|---|---|---|---|---|---|
| OPS | 234 | 936 | 2808 | 4212 | 3744 | 2808 |
| PerOnE | 127 | 57 | 9 | 145 | 72 | 11 |

| Scheme | $\rho = 1$ | $\rho = 0.$ |
|---|---|---|
| Optimum (OPS, $K = T$) | 0 | 0 |
| PerOnE ($K = 1$) | 1 | 3 |

VI. NUMERICAL EVALUATION

A. System architecture and traffic demand

We perform our evaluation on the internet traffic activity of the publicly available dataset [25]. It provides the demand of Telecom Italia's customers in Milano, Italy, from 1/11/2013 to 1/1/2014. The spatial distribution $\lambda_i$ of telecommunication events is aggregated in a 100 x 100 grid of locations $i \in \mathcal{I}$. The temporal distribution of events is aggregated over 10-minute time intervals. For our analysis we consider only working days, in order to evaluate the system under high traffic and under the periodicity created by people's work cycles. The dataset used consists of $P = 39$ days, each containing 144 time slots, giving a horizon of $T = 5616$ traffic observations.

Our network architecture consists of 40 BSs, most of them close to the city center, where the load is higher. We follow the setup of [7], and consider Macro- and Micro-BSs transmitting at $P_M = 43$ dBm and $P_m = 33$ dBm, respectively. The system bandwidth is $W = 10$ MHz, while the noise density is $N_0 = -$ dBm/MHz. The path loss exponent is $P_{lo} = 3$, and $G_{ji}$ is the resulting coefficient for the signal degradation from AP $j$ to location $i$. Then, the transmission rates $C_{ji}$ between location $i$ and AP $j$ are given by the Shannon formula:

$$C_{ji} = W \log_2 \left( 1 + \frac{G_{ji} P_j}{W N_0 + \sum_{k \neq j} G_{ki} P_k} \right).$$

Fig. 3. Total cell loads vs. time, for $\alpha = 0$ and cost function $\phi_a(\rho) = \sum_{j \in \mathcal{J}} \rho_j$, for load thresholds $\rho = 1$ and $\rho = 0.$, and for $K = 24$.

We are interested in evaluating the total cell loads that arise from user association policies produced by PerOnE, and in comparing them to those of policies produced by OPS.
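The Shannon-formula rates described above can be computed along the following lines. This is a minimal sketch under stated assumptions rather than the evaluation code of the paper: the helper names `dbm_to_mw` and `shannon_rate`, the distance-based gain model $G_{ji} = d_{ji}^{-3}$ (built from the stated path loss exponent), the distances, and the noise level are all hypothetical placeholders. The sketch uses the standard capacity form $W \log_2(1 + \mathrm{SINR})$, so with $W$ in MHz the rate comes out in Mbit/s.

```python
import math


def dbm_to_mw(power_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)


def shannon_rate(j, i, gains, powers_mw, bandwidth_mhz, noise_mw_per_mhz):
    """C_ji = W * log2(1 + G_ji P_j / (W N_0 + sum_{k != j} G_ki P_k))."""
    signal = gains[j][i] * powers_mw[j]
    interference = sum(
        gains[k][i] * powers_mw[k] for k in range(len(powers_mw)) if k != j
    )
    sinr = signal / (bandwidth_mhz * noise_mw_per_mhz + interference)
    return bandwidth_mhz * math.log2(1.0 + sinr)


# Hypothetical toy instance: one macro BS (43 dBm) and one micro BS (33 dBm)
# serving a single location. Distances and the noise level are placeholders.
PATH_LOSS_EXPONENT = 3.0
distances_m = [[500.0], [100.0]]             # distances_m[j][i]
gains = [[d ** -PATH_LOSS_EXPONENT for d in row] for row in distances_m]
powers_mw = [dbm_to_mw(43.0), dbm_to_mw(33.0)]
noise_mw_per_mhz = dbm_to_mw(-100.0)         # placeholder, not the paper's value

for j in range(2):
    rate = shannon_rate(j, 0, gains, powers_mw, 10.0, noise_mw_per_mhz)
    print(f"AP {j}: {rate:.2f} Mbit/s")
```

Every other AP contributes to the interference term of the denominator, so adding an interferer can only lower the rate of a given link.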

B. Results

We conduct a sensitivity analysis on the number $K$ of time zones and the load threshold $\rho$, to capture the scenario where the maximum available resources are considered ($\rho = 1$) and a scenario with more limited resources ($\rho = 0.$). The "Optimum" is OPS with $K = T$ and $\rho = 1$, i.e., the optimal association decision for each individual time slot $t$ under the maximum amount of resources that could be considered. PerOnE with $K = 1$ considers only one time zone, i.e., it runs taking as input the association policy of the previous slot, without any division of the time horizon. For convenience, our plots include an enlargement of the first time slots and of some slots that indicate how close PerOnE performs to the Optimum, at the top-right and bottom-right corner of the subfigures, respectively. We list some of our observations:

Fig. 4. Total cell loads vs. time, for $\alpha = 0$ and cost function $\phi_a(\rho) = \sum_{j \in \mathcal{J}} \rho_j$, for load thresholds $\rho = 1$ and $\rho = 0.$, and for $K = 12$.

PerOnE quickly learns the optimal user association.

PerOnE exploits the geometry of the problem and rapidly learns the optimal user association, despite the lack of actual traffic information. From Figs. 3-5 and Table II we see that, as the number $K$ of time zones increases, PerOnE needs more slots in order to learn not to violate constraints, to converge to optimal solutions and to produce more cost-efficient associations. This interesting feature allows PerOnE's solution updates to adapt to any traffic fluctuation. It is due to the fact that, as the number of considered time zones decreases, so does the number of time slots in which PerOnE initializes its decisions with the expensive uniform solution in (17), which similarly affects the total cell loads. The contrary holds for OPS, whose static solutions benefit from a partition of the time horizon into more zones. Observe that OPS produces the minimum-cost static policies for a given partition of the time horizon, which does not necessarily imply that it will not have any constraint violations. In fact, as $K$ grows, OPS violates constraints during more time slots and under more limited resources. However, observe from Figs. 3-5 that the actual cost of the produced associations grows as $K$ decreases.

Fig. 5. Total cell loads vs. time, for $\alpha = 0$ and cost function $\phi_a(\rho) = \sum_{j \in \mathcal{J}} \rho_j$, for load thresholds $\rho = 1$ and $\rho = 0.$, and for $K = 2$.

PerOnE effectively adapts to traffic changes.

Despite the arbitrary and large traffic variations, PerOnE manages to adapt its solutions and decide cost-effective and near-optimal policies, under both high and low load thresholds, as seen from Figs. 3-5. From these and Table II, it can be observed that OPS fails to adapt, thus resulting in association policies that lead to a higher system load and constraint violations.

PerOnE has no regret against OPS.

Despite the large fluctuations during the day, and the lack of actual information at decision time, PerOnE manages to produce asymptotically optimal solutions, under different partitions of each period into zones and under different load thresholds. As Fig. 6 suggests, PerOnE's advantage over OPS grows as the load threshold $\rho$ increases and as the number $K$ of time zones decreases. Intuitively, for larger $\rho$, PerOnE has greater "margins" to adapt to the upcoming actual traffic. Also, for smaller $K$, OPS is more restricted in its decisions, which increases PerOnE's advantage of adjusting its decisions dynamically.

Fig. 6. PerOnE's regret over OPS (average regret vs. time), for $\alpha = 0$ and cost function $\phi_a(\rho) = \sum_{j \in \mathcal{J}} \rho_j$, and for different values of $\rho$ and $K$.

VII. CONCLUSIONS

We assume arbitrary traffic variations over time. We introduce OPS, a novel periodic benchmark for online learning problems, which is significant to compare against in cases of conjectured traffic periodicity and generalizes the state of the art. We propose PerOnE, an asymptotically optimal online algorithm that produces association policies by performing a simple update. PerOnE learns to adapt to traffic fluctuations even under lack of actual information. We demonstrate its no-regret property against OPS both analytically and by performing simulations over a real-trace dataset. Moreover, PerOnE operates under no assumptions on traffic, which renders it a great user-association option for the highly dynamic environments envisioned for large-scale 5G and B5G/6G networks. In our future work we are interested in exploring algorithms that jointly learn several dynamic parameters, for example user association and power control.

VIII. ACKNOWLEDGMENTS

This work was supported by the CHIST-ERA LeadingEdge project, call on "Smart Distribution of Computing in Dynamic Networks" (SDCDN).

APPENDIX A
PROOF OF LEMMA 1

The function $h_i(\pi) = \sum_{j \in \mathcal{J}} \pi_{ji} \log \pi_{ji}$ is 1-strongly convex with respect to the 1-norm [22], i.e.,

$$h_i(\pi) \ge h_i(\pi') + \langle \nabla h_i(\pi'), \pi - \pi' \rangle + \frac{1}{2} \|\pi - \pi'\|^2, \quad \forall i \in \mathcal{I}.$$

Summing over all locations $i \in \mathcal{I}$ we obtain:

$$h(\pi) \ge h(\pi') + \sum_{i \in \mathcal{I}} \langle \nabla h_i(\pi'), \pi - \pi' \rangle + \frac{|\mathcal{I}|}{2} \|\pi - \pi'\|^2,$$

which, due to the interchangeability of the sum and the dot product, and due to (14), becomes:

$$h(\pi) \ge h(\pi') + \langle \nabla h(\pi'), \pi - \pi' \rangle + \frac{|\mathcal{I}|}{2} \|\pi - \pi'\|^2.$$

APPENDIX B
PROOF OF THEOREM 1

Let $z(t) := \nabla V\big(\pi(t), \lambda(t)\big)$. Theorem 2.21 in [22] states that when the regularization function $h(\cdot)$ is $|\mathcal{I}|$-strongly convex w.r.t. a $p$-norm and OMD is run with mirror function as in (13), then:

$$\sum_{t=1}^{T} \langle z(t), \pi(t) - \pi^* \rangle \le \frac{h(\pi^*)}{\eta} - \frac{h\big(\pi(1)\big)}{\eta} + \sum_{t=1}^{T} \frac{\eta \|z(t)\|_q^2}{2|\mathcal{I}|}, \tag{20}$$

where $\pi^*$ is an optimal static decision over $T$ time slots as in (8), and the $q$-norm is the dual norm of the $p$-norm. We adopt this result and modify it to fit the context of our periodic benchmark, using it for each time window $\mathcal{W}_k$ separately. Thus:

$$\sum_{t \in \mathcal{W}_k} \langle z(t), \pi(t) - \pi^*[k] \rangle \le \frac{h(\pi^*[k])}{\eta} - \frac{h\big(\pi(t_k)\big)}{\eta} + \sum_{t \in \mathcal{W}_k} \frac{\eta \|z(t)\|_q^2}{2|\mathcal{I}|}.$$

From the Lipschitz continuity of the objective function, there exists a positive constant $L \ge \|z(t)\|_q$, for all $q$-norms and time windows $\mathcal{W}_k$. Thus, for $k = 1, \ldots, K$:

$$\sum_{t \in \mathcal{W}_k} \langle z(t), \pi(t) - \pi^*[k] \rangle \le \frac{h(\pi^*[k])}{\eta} - \frac{h\big(\pi(t_k)\big)}{\eta} + \sum_{t \in \mathcal{W}_k} \frac{\eta L^2}{2|\mathcal{I}|}. \tag{21}$$

Since the association variables for each location belong to $[0, 1]$, for each location $i$ it holds that $h_i(\pi) = \sum_{j \in \mathcal{J}} \pi_{ji} \log \pi_{ji} \le 0$, which implies that

$$h(\pi^*[k]) = \sum_{i \in \mathcal{I}} h_i(\pi^*[k]) \le 0. \tag{22}$$

Moreover:

$$h\big(\pi(t_k)\big) \overset{(14),(17)}{=} \sum_{j \in \mathcal{J}} \sum_{i \in \mathcal{I}} \frac{1}{|\mathcal{N}_i|} \log\Big(\frac{1}{|\mathcal{N}_i|}\Big) = -\sum_{j \in \mathcal{J}} \sum_{i \in \mathcal{N}_j} \frac{\log(|\mathcal{N}_i|)}{|\mathcal{N}_i|} \ge -\frac{M_I \log(M_J)}{M_J}, \quad \forall k, \tag{23}$$

where the inequality is due to (19), because $M_I \le |\mathcal{I}|$ and $M_J \le |\mathcal{J}|$. Then, (21) together with (22) and (23) leads to:

$$\sum_{t \in \mathcal{W}_k} \langle z(t), \pi(t) - \pi^*[k] \rangle \le \frac{M_I \log(M_J)}{\eta M_J} + \frac{|\mathcal{W}_k| \eta L^2}{2|\mathcal{I}|}, \quad \forall k. \tag{24}$$

Convexity of the objective function implies:

$$V\big(\pi(t), \lambda(t)\big) - V\big(\pi^*[k], \lambda(t)\big) \le \langle z(t), \pi(t) - \pi^*[k] \rangle, \tag{25}$$

for all $t \in \mathcal{W}_k$, $k = 1, \ldots, K$. Then:

$$\begin{aligned}
\mathrm{Reg}(T, K) &\overset{(7)}{=} \sum_{t=1}^{T} \phi_a\big(\pi(t), \lambda(t)\big) - \sum_{k=1}^{K} \sum_{t \in \mathcal{W}_k} \phi_a\big(\pi^*[k], \lambda(t)\big) \\
&\overset{(10),(6)}{\le} \sum_{k=1}^{K} \sum_{t \in \mathcal{W}_k} \Big( V\big(\pi(t), \lambda(t)\big) - V\big(\pi^*[k], \lambda(t)\big) \Big) \\
&\overset{(25)}{\le} \sum_{k=1}^{K} \sum_{t \in \mathcal{W}_k} \langle z(t), \pi(t) - \pi^*[k] \rangle \\
&\overset{(24)}{\le} \sum_{k=1}^{K} \left( \frac{M_I \log(M_J)}{\eta M_J} + \frac{|\mathcal{W}_k| \eta L^2}{2|\mathcal{I}|} \right) \\
&= \frac{K M_I \log(M_J)}{\eta M_J} + \frac{\eta T L^2}{2|\mathcal{I}|},
\end{aligned} \tag{26}$$

which concludes the first part of the proof. The first equality is the definition of regret. From (6), $\pi^*[k]$ minimizes costs for $t \in \mathcal{W}_k$, and from (10) the objective $V(\cdot)$ is at least equal to costs. The first inequality comes from the observation that the penalty paid by the optimal benchmark $\pi^*[k]$ cannot be greater than that paid by the online algorithm, which takes decisions under lack of information. The second inequality is a result of the convexity of $V(\cdot)$, and the next one comes from substituting the RHS of (24) for each window $\mathcal{W}_k$. The last equality holds because $\sum_{k=1}^{K} |\mathcal{W}_k| = T$, since summing all time slots over all the time windows is equivalent to summing over the entire time horizon.

For the second part of the Theorem, it is easily verifiable that (26) is minimized for $\eta = \sqrt{\frac{2 K M_I |\mathcal{I}| \log(M_J)}{T L^2 M_J}}$. Substituting in (26), and since $\log(M_J) < M_J$ and $M_I < |\mathcal{I}|$, we get:

$$\mathrm{Reg}(T, K) \le \sqrt{\frac{2 K M_I \log(M_J) T L^2}{M_J |\mathcal{I}|}} \le L\sqrt{2KT}.$$

REFERENCES

[1] E. Calvanese Strinati, S. Barbarossa, J. L. Gonzalez-Jimenez, D. Ktenas, N. Cassiau, L. Maret, and C. Dehos, “6G: The Next Frontier: From Holographic Messaging to Artificial Intelligence Using Subterahertz and Visible Light Communication,” IEEE Vehicular Technology Magazine, vol. 14, no. 3, pp. 42–50, 2019.
[2] F. Kelly, A. Maulloo, and D. Tan, “Rate control for communication networks: Shadow prices, proportional fairness and stability,” Journal of the Operational Research Society, vol. 49, pp. 237–252, 1998.
[3] H. Kim, G. de Veciana, X. Yang, and M. Venkatachalam, “Distributed α-optimal user association and cell load balancing in wireless networks,” IEEE/ACM Trans. on Networking, vol. 20, no. 1, pp. 177–190, Feb 2012.
[4] L. Vigneri, G. Paschos, and P. Mertikopoulos, “Large-scale network utility maximization: Countering exponential growth with exponentiated gradients,” in IEEE Conference on Computer Communications - IEEE INFOCOM, 2019.
[5] B. Hajek, “Performance of global load balancing by local adjustment,” IEEE Transactions on Information Theory, vol. 36, no. 6, pp. 1398–1414, 1990.
[6] M. Alanyali and B. Hajek, “On simple algorithms for dynamic load balancing,” in IEEE Conference on Computer Communications - IEEE INFOCOM, vol. 1, 1995, pp. 230–238.
[7] N. Liakopoulos, G. S. Paschos, and T. Spyropoulos, “Robust user association for ultra dense networks,” in IEEE Conference on Computer Communications - IEEE INFOCOM, 2018, pp. 2690–2698.
[8] I. Koutsopoulos and L. Tassiulas, “Joint optimal access point selection and channel assignment in wireless networks,” IEEE/ACM Transactions on Networking, vol. 15, no. 3, pp. 521–532, 2007.
[9] S. Papavassiliou and L. Tassiulas, “Improving the capacity in wireless networks through integrated channel base station and power assignment,” IEEE Transactions on Vehicular Technology, vol. 47, no. 2, pp. 417–427, 1998.
[10] M. Karaliopoulos, L. E. Chatzieleftheriou, G. Darzanos, and I. Koutsopoulos, “On the joint content caching and user association problem in small cell networks,” in , 2020, pp. 1–6.
[11] G. Darzanos, L. E. Chatzieleftheriou, M. Karaliopoulos, and I. Koutsopoulos, “Content preference-aware user association and caching in cellular networks,” in International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT) Workshops, 2020, pp. 1–8.
[12] L. E. Chatzieleftheriou, G. Darzanos, M. Karaliopoulos, and I. Koutsopoulos, “Joint user association, content caching and recommendations in wireless edge networks,” ACM SIGMETRICS Performance Evaluation Review, vol. 46, no. 3, pp. 12–17, 2018.
[13] L. E. Chatzieleftheriou, M. Karaliopoulos, and I. Koutsopoulos, “Caching-Aware Recommendations: Nudging User Preferences towards better Caching Performance,” in IEEE Conference on Computer Communications - IEEE INFOCOM, 2017, pp. 784–792.
[14] M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” in International Conference on Machine Learning - ICML, 2003, pp. 928–935.
[15] E. Hazan and S. Kale, “Projection-free online learning,” in International Conference on Machine Learning - ICML, 2012.
[16] H. Yu, M. Neely, and X. Wei, “Online convex optimization with stochastic constraints,” in Conference on Neural Information Processing Systems - NIPS, 2017, pp. 1428–1438.
[17] N. Liakopoulos, A. Destounis, G. Paschos, T. Spyropoulos, and P. Mertikopoulos, “Cautious regret minimization: Online optimization with long-term budget constraints,” in International Conference on Machine Learning - ICML, 2019.
[18] G. Paschos, A. Destounis, and G. Iosifidis, “Online convex optimization for caching networks,” IEEE/ACM Trans. on Networking, 2020.
[19] T. Karagkioules, G. S. Paschos, N. Liakopoulos, A. Fiandrotti, D. Tsilimantos, and M. Cagnazzo, “Online learning for robust adaptive video streaming in mobile networks,” arXiv preprint: 1905.11705, 2019.
[20] T. Chen, Q. Ling, and G. Giannakis, “An online convex optimization approach to proactive network resource allocation,” IEEE Trans. on Signal Processing, 2017.
[21] D. Bertsekas, Convex Optimization Algorithms. Athena Scientific, 2015.
[22] S. Shalev-Shwartz, “Online Learning and Online Convex Optimization,” Foundations and Trends® in Machine Learning, 2012.
[23] E. V. Belmega, P. Mertikopoulos, R. Negrel, and L. Sanguinetti, “Online Convex Optimization and No-Regret Learning: Algorithms, Guarantees and Applications,” arXiv: 1804.04529, 2018.
[24] T. M. Cover, “Behavior of sequential predictors of binary sequences,” in