[PDF] Online Resource Inference in Network Utility Maximization Problems

Abstract

The amount of transmitted data in computer networks is expected to grow considerably in the future, putting more and more pressure on the network infrastructures. In order to guarantee a good service, it then becomes fundamental to use the network resources efficiently. Network Utility Maximization (NUM) provides a framework to optimize the rate allocation when network resources are limited. Unfortunately, in the scenario where the amount of available resources is not known a priori, classical NUM solving methods do not offer a viable solution. To overcome this limitation we design an overlay rate allocation scheme that attempts to infer the actual amount of available network resources while coordinating the users rate allocation. Due to the general and complex model assumed for the congestion measurements, a passive learning of the available resources would not lead to satisfying performance. The coordination scheme must then perform active learning in order to speed up the resources estimation and quickly increase the system performance. By adopting an optimal learning formulation we are able to balance the tradeoff between an accurate estimation, and an effective resources exploitation in order to maximize the long term quality of the service delivered to the users.

Full PDF

11 Online Resource Inference in Network UtilityMaximization Problems

Stefano D’Aronco, Pascal Frossard

Abstract —The amount of transmitted data in computer networks is expected to grow considerably in the future, putting more and morepressure on the network infrastructures. In order to guarantee a good service, it then becomes fundamental to use the networkresources efﬁciently. Network Utility Maximization (NUM) provides a framework to optimize the rate allocation when network resourcesare limited. Unfortunately, in the scenario where the amount of available resources is not known a priori, classical NUM solvingmethods do not offer a viable solution. To overcome this limitation we design an overlay rate allocation scheme that attempts to inferthe actual amount of available network resources while coordinating the users rate allocation. Due to the general and complex modelassumed for the congestion measurements, a passive learning of the available resources would not lead to satisfying performance.The coordination scheme must then perform active learning in order to speed up the resources estimation and quickly increase thesystem performance. By adopting an optimal learning formulation we are able to balance the tradeoff between an accurate estimation,and an effective resources exploitation in order to maximize the long term quality of the service delivered to the users.

Index Terms —Network Utility Maximization, Optimal Learning, Overlay Rate Allocation (cid:70)

NTRODUCTION

Since, in general, in computer networks multiple communi-cations are simultaneously active, the users have to share thelimited network resources. When sharing the network links,the users can either behave selﬁshly, meaning that each userstrives to use as much resources as possible; alternatively,they can cooperate to increase the overall efﬁciency of thecommunication system. In communication networks, the price of anarchy (which is the degradation of the system efﬁ-ciency due to a selﬁsh behavior) is rather high and the use ofresource allocation protocols for an efﬁcient sharing of thenetwork resources is recommended. Rate allocation prob-lems in communication networks are commonly referred toas Network Utility Maximization (NUM) problems [1]. Inthis sort of problems a utility function is associated to eachuser in order to map the usage of the network resources tothe obtained beneﬁts. The goal of the communication systemis then to share the network resources in such a way that theoverall users’ beneﬁt is maximized.Most of the NUM optimization problems can be solvedin a distributed way using primal and dual decomposi-tion methods [2]. These decomposition methods form thebasis of many network protocols used in communicationsystems [3]. In NUM problems it is typically assumed thateither the amount of available network resources is known,or that the users can measure some private congestionsignals (e.g., packet losses or experienced delay) that canbe used by the individual users to tune the transmittingrate and achieve the optimal rate allocation. Although inreal communication networks users have always access to . This work has been submitted to the IEEE for possible publication.Copyright may be transferred without notice, after which this versionmay no longer be accessible. a private congestion signal, it is not always true that thesesignals can be used by the individual users to adjust theirtransmitting rate to the optimal value. For instance, thisproblem arises clearly in the scenario of HTTP AdaptiveStreaming (HAS) [4], [5]. HAS represents nowadays thestandard technology for video streaming over the internetand it employs as communication protocols HTTP over TCP.The key point is that HAS users sharing a common linktend to converge to a rate-fair allocation if they tune therate according to their individual congestion measurements.As it is shown in [6], [7], such a rate-fair allocation isnot efﬁcient for video streaming and users should ratherbe coordinated at the application level in order to reacha higher quality of service. The same problem may arisewhenever a network application used by different users isforced to employ a speciﬁc congestion control algorithm(e.g., TCP), while it would be better to coordinate thetransmitting rates to match a speciﬁc rate allocation. If thesenetwork applications have no knowledge about the amountof available resources, the optimal rate allocation is hardto ﬁnd, and classical NUM algorithms cannot help in thisscenario. In this work, we aim at ﬁlling this gap and proposea way for extending the NUM framework to more generalscenarios where resource constraints are not known a priori.Speciﬁcally we consider a set of users sharing differentnetwork links. The users want to transmit data across thenetwork while optimizing the system efﬁciency. We assumethat the system knows how the network nodes are con-nected but does not know the transmitting capacity of thelinks. We further assume that, when the overall transmittingrate exceeds the network capacity, only some of the networkusers may detect the congestion event. The goal is then todesign a distributed overlay allocation method that tunesthe sending rate of all the users, so that even the users thatdo not detect any congestion event can adjust their sendingrate to their optimal value. a r X i v : . [ c s . N I] M a y In more detail, we ﬁrst separate the problem in two sub-problems, one corresponding to a classical NUM problemand one corresponding to a resource inference problem.Due to the general and complex model assumed for thecongestion signals the resource inference cannot be doneefﬁciently using a passive method and the adoption of anactive learning method is crucial in order to speed up the re-sources estimation. In the analyzed framework, performingactive learning comes at the expenses of reducing the serviceprovided to the users. In order to guarantee good systemperformance in both short and long term, we formulatethe problem as an optimal learning problem [8]. The origi-nal optimal learning problem is however computationallyintractable, hence we carefully approximate the differentmathematical steps and develop a decentralized algorithmto solve a simpliﬁed version of the problem. The algorithmdistributes the operations among different processes, eachassociated to a single speciﬁc network link. These processesare not bounded to be executed on any speciﬁc networknode, and only require communication among each otherand with the different users, enabling a ﬂexible deploy-ment. The experimental results show the effectiveness ofthe proposed method and the advantage that an optimallearning formulation offers with respect to a na¨ıve strategythat greedily optimizes the immediate performance.Motivated by similar arguments, some works, suchas [9], [10], analyze how classical NUM algorithms areaffected by potential noisy measurements of the feedbacksignals. In particular these studies consider the case wherethe measurements are corrupted by a biased noise whichprevents the network from achieving the optimal rate allo-cation. The aim of our study compared to the latter onesis however different: whereas these works try to assess theoptimality gap due to biased feedbacks, in our work we tryto infer the available resources in order to implement anoverlay NUM system that is then robust to biased feedback.We are not aware of any work on NUM that tries to builda user coordination system without having access to directmeasures of the available resources. The only known workthat attempts to solve a NUM problem with unknownconstraints is [11]. The authors however assume to haveaccess to measurements of the constraint vector perturbedby a zero mean noise. In our case we rather assume to neverhave direct access to the links capacities but only to indirectusers congestion signal measurements. A somewhat similarproblem is analyzed in [12]. In this NUM related work, theprivate congestion signals measured by the different usersare heterogenous, leading to an inefﬁcient rate allocation.The authors use a game theoretic framework in order todesign a coordination system based on local users’ beliefswithout any explicit user communication. However in thisstudy it is assumed that the congestion signals that theuser can measure are linearly coupled with the actions, i.e.,the sending rates, of all the users. In our case instead, wedo not make such assumption, and the congestion signalobserved by some users could be completely insensitive tothe actions of other users that employ the same networkresources. Therefore we are able to cover a larger class ofNUM problems.This paper is organized as follows. In Section 2 weprovide some background related to NUM problems. In Section 3 we introduce the problem settings and state theproblem formulation. In Section 4 we describe how weapproximate and solve our instance of the NUM problem. InSection 5 we summarize the proposed method. Results fromcomputer simulations are provided in Section 6. Finallyconclusions are provided in Section 7.

ACKGROUND

In this work we denote vectors with bold lowercase letters( x ) and matrices with bold uppercase letters ( X ). In bothcases the non-bold lowercase symbol with subscripts ( x n or x nm ) denotes a single element of the vector (or matrix). Thenotation ( x , y ) denotes a tuple of different variables whereas { x k } denotes a collection of elements indexed by k .As depicted in Fig. 1, we consider a set of M networklinks shared by a set of N users. Each user transmits databetween two nodes of the network. The subset of adjacentnetwork links that connect the source node and the desti-nation node of a user communication forms the route takenby the user’s data. The routing information is embedded inthe matrix A , which is a M × N binary matrix, where theelement a nm is equal to if user n employs link m , and otherwise.The vector x ∈ R N represents the transmitting rate ofeach user. We associate to each user n a utility function u n ( x n ) that maps the transmitting rate of user n , i.e., x n , tothe resulting beneﬁt. In NUM problems, utility functions areusually assumed to be strictly increasing smooth concavefunctions [13], [14]. An efﬁcient utilization of the resourcesis given by the solution to the following NUM problem:maximize x U ( x ) = N (cid:88) n =1 u n ( x n ) subject to Ax ≤ bx ∈ X , (1)where the vector b represents the available resources, i.e.,the capacities of the network links. X represents the domainof the utility functions, which usually coincides with thepositive orthant, as negative transmitting rates are meaning-less. The optimization problem in Eq. (1) aims at maximizingthe overall utility of the users subject to the limited availabil-ity of the resources. NUM problems are generally solvedonline with a distributed algorithm [3], [13]. The solvingmethod basically consists in the design of a distributedclosed loop control system whose equilibrium point is theoptimal rate allocation of the NUM problem. At each timestep t , the users demand a certain amount of resources x t and observe a private feedback signal representing theconstraint violation. The users then independently adaptthe request at step t + 1 according to the private feedbackreceived, converging to the equilibrium point.One possible way to solve the NUM problem in a dis-tributed way relies on a dual decomposition [2]. The dualfunction of problem in Eq. (1) corresponds to: g ( λ , b ) = max x ∈X U ( x ) − λ T ( Ax − b ) (2)where the vector λ represents the dual variables, or prices,associated to the M constraints. The optimal point for the link m user n a m x = y m AAACmHicbVHBbhMxEHUWSktaaEpvcLEaIXGKdlGlckGK4ADcCmraSrur1awzm1i1vSvbm7Ky9lu4wif1b+okeyBpRrL8/N4bz2gmrwQ3NgwfesGz53sv9g9e9g+PXr0+Hpy8uTZlrRlOWClKfZuDQcEVTiy3Am8rjSBzgTf53delfrNAbXiprmxTYSphpnjBGVhPZYPTZOGgzST19++WfqZNJrPBMByFq6BPQdSBIeniMjvpyWRaslqiskyAMXEUVjZ1oC1nAtt+UhusgN3BDGMPFUg0qVt139L3npnSotT+KEtX7P8ZDqQxjcy9U4Kdm21tSe7S4toWn1LHVVVbVGxdqKgFtSVdjoJOuUZmReMBMM19r5TNQQOzfmA7q2yTdi43udy/+4lGhfeslBLU1CWLNo5Slyw/yAs3jFqfsqHP1oZc7hCtBtXGYZdvCnflHX5B0fY6noLrj6MoHEU/z4fjL92qDsg7ckY+kIhckDH5Ti7JhDDSkD/kL/kXvA3Gwbfgx9oa9LqcU7IRwa9HMQ/Oyw== AAACmHicbVHBbhMxEHUWSktaaEpvcLEaIXGKdlGlckGK4ADcCmraSrur1awzm1i1vSvbm7Ky9lu4wif1b+okeyBpRrL8/N4bz2gmrwQ3NgwfesGz53sv9g9e9g+PXr0+Hpy8uTZlrRlOWClKfZuDQcEVTiy3Am8rjSBzgTf53delfrNAbXiprmxTYSphpnjBGVhPZYPTZOGgzST19++WfqZNJrPBMByFq6BPQdSBIeniMjvpyWRaslqiskyAMXEUVjZ1oC1nAtt+UhusgN3BDGMPFUg0qVt139L3npnSotT+KEtX7P8ZDqQxjcy9U4Kdm21tSe7S4toWn1LHVVVbVGxdqKgFtSVdjoJOuUZmReMBMM19r5TNQQOzfmA7q2yTdi43udy/+4lGhfeslBLU1CWLNo5Slyw/yAs3jFqfsqHP1oZc7hCtBtXGYZdvCnflHX5B0fY6noLrj6MoHEU/z4fjL92qDsg7ckY+kIhckDH5Ti7JhDDSkD/kL/kXvA3Gwbfgx9oa9LqcU7IRwa9HMQ/Oyw== AAACmHicbVHBbhMxEHUWSktaaEpvcLEaIXGKdlGlckGK4ADcCmraSrur1awzm1i1vSvbm7Ky9lu4wif1b+okeyBpRrL8/N4bz2gmrwQ3NgwfesGz53sv9g9e9g+PXr0+Hpy8uTZlrRlOWClKfZuDQcEVTiy3Am8rjSBzgTf53delfrNAbXiprmxTYSphpnjBGVhPZYPTZOGgzST19++WfqZNJrPBMByFq6BPQdSBIeniMjvpyWRaslqiskyAMXEUVjZ1oC1nAtt+UhusgN3BDGMPFUg0qVt139L3npnSotT+KEtX7P8ZDqQxjcy9U4Kdm21tSe7S4toWn1LHVVVbVGxdqKgFtSVdjoJOuUZmReMBMM19r5TNQQOzfmA7q2yTdi43udy/+4lGhfeslBLU1CWLNo5Slyw/yAs3jFqfsqHP1oZc7hCtBtXGYZdvCnflHX5B0fY6noLrj6MoHEU/z4fjL92qDsg7ckY+kIhckDH5Ti7JhDDSkD/kL/kXvA3Gwbfgx9oa9LqcU7IRwa9HMQ/Oyw== AAACmHicbVHBbhMxEHUWSktaaEpvcLEaIXGKdlGlckGK4ADcCmraSrur1awzm1i1vSvbm7Ky9lu4wif1b+okeyBpRrL8/N4bz2gmrwQ3NgwfesGz53sv9g9e9g+PXr0+Hpy8uTZlrRlOWClKfZuDQcEVTiy3Am8rjSBzgTf53delfrNAbXiprmxTYSphpnjBGVhPZYPTZOGgzST19++WfqZNJrPBMByFq6BPQdSBIeniMjvpyWRaslqiskyAMXEUVjZ1oC1nAtt+UhusgN3BDGMPFUg0qVt139L3npnSotT+KEtX7P8ZDqQxjcy9U4Kdm21tSe7S4toWn1LHVVVbVGxdqKgFtSVdjoJOuUZmReMBMM19r5TNQQOzfmA7q2yTdi43udy/+4lGhfeslBLU1CWLNo5Slyw/yAs3jFqfsqHP1oZc7hCtBtXGYZdvCnflHX5B0fY6noLrj6MoHEU/z4fjL92qDsg7ckY+kIhckDH5Ti7JhDDSkD/kL/kXvA3Gwbfgx9oa9LqcU7IRwa9HMQ/Oyw== Fig. 1. Overview of the considered scenario. Each user resides on anode of the network and transmits data to another network node. Thecolored arrows represent the communication route of each user. primal and dual variables ( x (cid:63) , λ (cid:63) ) , can be found by iteratingover the following steps:  x tn = arg max x (cid:48) ∈X n { u n ( x (cid:48) ) − x (cid:48) a T n λ t } n = 1 ...Nλ t +1 m = (cid:32) λ tm + (cid:15) (cid:0) a m x t − b m (cid:1) (cid:33) + m = 1 ...M, (3a)(3b)where (cid:15) is a parameter that controls the step size of the dualvariables update and () + denotes the projection onto thepositive orthant. The quantities a T n λ t represent the privatesignals that are needed by each user in order to updatethe sending rate. Eq. (3) represents a distributed feedbackcontrol loop whose equilibrium point corresponds to the op-timal point of the NUM problem in Eq. (1). In the commonsettings of the NUM problem, it is usually assumed thateither the link capacities b are known a priori, so that Eq. 3bcan be computed, or that the dual variables λ t correspondto physical signals that can be measured by the individualusers. For instance in delay-based congestion control algo-rithms the dynamics of the packet queuing delays matchthe dynamics of the dual variables, and the users can collecttheir private feedback signals by measuring the experienceddelay. How to achieve the optimal allocation efﬁciently,when the available resources are not known a priori andthe private signals cannot directly be used to control theusers sending rate, is an open problem that we aim to solvein this study. ROBLEM S ETTINGS AND F RAMEWORK

We want to solve the NUM problem of Eq. (1) when theexact value of the vector b is unknown. In order to ﬁnda solution to this problem we rely on the assumption thatwhen the sending rates of the users exceed the available resources, the users can detect a congestion event. In thefollowing we deﬁne more precisely what are the featuresof the congestion signals observed by the users, and deﬁnea probabilistic model that relates these signals to the linkcapacities. The model is then used by our rate allocationscheme to infer the constraint vector b .We deﬁne the random binary variable z tm to representthe occurrence of a congestion event on link m . Its probabil-ity is given by: p ( z tm | b m , y tm ) = (cid:2) σ ( κ ( y tm − ρb m )) (cid:3) z tm · (cid:2) − σ ( κ ( y tm − ρb m )) (cid:3) − z tm , (4)where σ () denotes the sigmoid function, y tm corresponds tothe sum of the users sending rates passing through link m at time t , y tm = a T m x t . κ and ρ are positive scalar parametersthat can be used to tune the steepness and the locationof the sigmoid function. Eq. (4) tells us that the larger isthe requested transmission rate for link m the larger is theprobability to face a congestion event. By using the sigmoidfunction instead of a step function in Eq. (4) we can accountfor some possible noise in the network, e.g., transmissionbursts and noisy estimates of y m . As the binary variable z tm represents the occurrence of a congestion event on link m at time t , the binary variable v tn represents the detection ofa congestion event on the route of user n at time t ( v tn = 1 when a congestion is detected by user n ).In this study we assume that the user congestion vari-ables v t and link congestion variables z t are related by thefollowing conditions: a ) if v tn = 1 then at least one variable z tm with a mn = 1 ,has to be equal to one. This condition tells us that if allthe links used by the user do not trigger any congestion,then the user cannot observe a congestion event. b ) If for all the users with a mn = 1 , the variables v tn arezero, then z tm = 0 . This condition tells us that if link m triggers a congestion then one of the users employinglink m has to observe a congestion event.If two vectors z t and v t do not verify the above conditions,then, they are not consistent and the probability to observe apair of inconsistent vectors is zero. In order to model math-ematically the above conditions we introduce the function Ψ v t ( z t ) . This function is parametrized by a vector v t , andmaps a vector z t to the set { , } . Given a pair of vectors v t and z t , Ψ v t ( z t ) is equal to one if the two vectors verifythe above conditions and equal to zero otherwise. Morespeciﬁcally the function is deﬁned as follows: Ψ v t ( z t ) = (cid:89) n : v tn =1 ψ n ( z t ) (cid:89) m : a T m v t =0 ψ m ( z tm ) , (5)with: ψ n ( z t ) = 1 − (cid:89) m (1 − a nm z tm ) , ψ m ( z tm ) = 1 − z tm . (6)The factors ψ m correspond to the above condition b )whereas the factors ψ n are associated to condition a ). Notethat the vector v t selects the factors ψ n and ψ m that areactive in full term Ψ v t .What we actually observe at each iteration t is not thelink congestion vector z t , which represents a latent variable in our model, but the user congestion signals v t and theusers rates x t (since we assume to know the routing matrixthe vector y t is also known). As shorthand we denoted by D t the observed data at time t : D t = ( y t , v t ) . What we aimto do is to use the observed variables in order to infer thevalue of the constraint vector b , which is the quantity weare interested in, in order to ﬁnd the optimal rate allocation.We can represent the above probabilistic model using thegraphical model of Fig. 2. As can be seen the z t variablesare generated from the vectors b and y t but they are relatedto each other depending on the value of the observed vector v t . Combining Eq. (4)-(6) for all the observation t we obtain: p ( { z t }| b , {D t } ) = 1 Z (cid:89) t Ψ v t ( z t ) p ( z t | b , y t ) , (7)where p ( z t | b , y t ) = (cid:81) m p ( z tm | b m , y tm ) and Z correspondsto a normalization constant required to obtain a valid poste-rior distribution. Denoting with p ( b ) the prior knowledgeon the parameters b we can write down the joint distribu-tion between the latent variables { z t } and the parameters b : p ( { z t } , b |{D t } ) = p ( { z t }| b , {D t } ) p ( b ) . (8)Finally the posterior on b can be obtained by marginalizingout all the latent variables { z t } , leading to: p ( b |{D t } ) = (cid:88) { z t } p ( { z t } , b |{D t } ) . (9)The factor graph associated to Eq. (7)-(8) is depicted inFig. 13.In our scenario we sequentially collect the observabledata D t , therefore we can see the belief on p ( b ) as somethingevolving with t : p t − ( b , z t |D t ) = 1 Z t Ψ v t ( z t ) p ( z t | b , y t ) p t − ( b ) , (10) p t ( b ) = p t − ( b |D t ) = (cid:88) z t p t − ( b , z t |D t ) . (11)We can think of p t ( b ) as the prior at time t , which is equalto the posterior at time t − ( p t − ( b |D t ) ).In this subsection we have formalized the user con-gestion signal v along with a probabilistic model able togenerate a posterior distribution on the constraint vector b using the observed quantities v and y . Analogously to the classical NUM framework in Section 2,we need to design a closed loop algorithm that selects ateach step t a certain rate vector x t . It then observes thecongestion feedback signal v t , and uses it to compute anew value of the sending rate x t +1 . This process shouldcontinue till convergence, ideally to the optimal allocation x (cid:63) of Problem (1).There is no unique solution to the design of such feed-back control system. One possible option consists in havinga single loop algorithm where the allocation method ateach iteration t computes under a deﬁned policy a newallocation vector x t +1 . While this design choice ideallyallows to achieve better performance, as it does not impose b M b y tM y t z t z tM T ...... b m y tm z tm b y t z t v t = 1 v tn = 1 Fig. 2. Graphical model of the considered scenario. Grey variable nodesrepresent quantities that are observed at each step t . b m z tm p ( z tm | b m ) ... z tl b l ... l ( z tl ) l ( z t ) p ( b l ) p ( b m ) T Fig. 3. Factor graph of one single observation t of the graphical modeldepicted in Fig. 2. Each factor ψ n ( z t ) forces at least one of the con-nected variables z t to be one. The factor ψ n ( z tl ) force z tl to be zero. any particular structure on the control system, it also posesseveral design challenges. The controller should, in fact, ﬁnda map between the prior belief p ( b ) , the users’ requests { x t } and the feedbacks { v t } (along with the probabilistic modelthat relates D t to b ), to ﬁnd the rate allocation x t +1 . Dueto the large dimension of the input and output space thisdesign choice is unpracticalAn alternative design choice consists in an adaptivecontrol architecture [15], where we separate the resourceallocation method in two separate loops. An inner loopresponsible for the users coordination, where b is seenas a controller parameter, and an outer loop responsiblefor the inference of the b parameter. A high-level blockdiagram of this system is depicted in Fig. 4. In this casethe design of the controller is much simpler as the user rateallocation process is separated from the resource inference.The inner loop simply has to solve at step t an instance usersresources Inner LoopOuter Loop ˆ b t D t = ( y t , v t )ˆ b t = ⇡ ( p t ( b )) p t ( b ) = X z p t ( b , z |D t ) ⌧ +1 = ⇣ ⌧ + ✏ ( y ⌧ ˆ b t ) ⌘ + resourceinferenceclassicalNUMconstraintparameterobservations y ⌧ = Ax ⌧ x ⌧ = u ( A T ⌧ ) ⌧ +1 Fig. 4. High-level block diagram of our system. The method is composedby two separate loops. The inner loop represents a simple primal-dualdistributed algorithm for solving the NUM problem (e.g., Eq. (3) ). Theouter loop is the method proposed in this work, see Section 4, forthe selection of the sequence of constraints { ˆ b t } to be used as inputparameter for the inner loop. of the NUM problem with the available resources set bythe vector ˆ b t . In this case any method capable to solvethe NUM problem, as for instance the one in Eq. (3), canbe used. The outer loop is responsible for the inference ofthe true b vector. It receives as input the samples D t , and,based on the current belief p t ( b ) , it then selects the bestvalue of ˆ b t under some deﬁned policy. Note that the belief p t ( b ) represents a hyperstate of the controller that evolvesaccording to Eq. (10)-(11). Differently from the inner loopthere is no off-the-shelf solution for the outer loop controller,the design of such subsystem is the main focus of this work.As ﬁnal remark, in our design we consider to update theouter loop, and compute a new parameter ˆ b , after the innerloop converges to equilibrium. In this case, the observeddata D t +1 corresponds to the equilibrium point of the innerloop at iteration t , which is the optimal point of the NUMproblem of Eq. (1) when the b = ˆ b t , i.e., y t = y (cid:63) (ˆ b t − ) . In this subsection we deﬁne the optimization problem thatwe ideally aim to solve in order to design the outer loopcontroller.The design of the outer loop controller corresponds toﬁnding a policy π ( · ) that maps the current belief on theconstraint vector b into a value of the constraint vector tobe used as parameter for the inner loop ( π : p ( b ) → ˆ b )while maximizing the performance of the system. We canmeasure the performance of the system at each step t as theexpected optimality gap when using as constraint vector forthe classical NUM algorithm of the inner loop ˆ b t instead ofthe true unknown value b . In order to do this, we introducea loss function L ( b , ˆ b ) . Ideally the loss function modelsthe difference in overall users’ utility attained by using theestimate ˆ b instead of the true value b . Unfortunately, anexact evaluation of the optimality gap is highly expensivein terms of computations since it would require to solve theNUM problem of Eq. (1) twice. Instead we can consider a simpler and rather classic loss function such as the squared l distance: L ( b , ˆ b ) = || b − ˆ b || . (12)The idea is that the closer ˆ b is to the true vector b thesmaller the optimality gap. Using the belief p ( b ) and theloss function we can then deﬁne the risk as the expectedvalue of the loss function: R (ˆ b , p ( b )) = E p ( b ) (cid:104) L ( b , ˆ b ) (cid:105) . (13)The risk corresponds to our performance metric at eachstep t and it represents the expected suboptimality of therate allocation algorithm when using a resource estimateequal to ˆ b . Having deﬁned the risk for a single step, wecan easily extend the same metric to an entire sequence ofbelief-estimates pairs { ( p t ( b ) , ˆ b t ) } : R ∞ ( { (ˆ b t , p t ( b )) } ) = ∞ (cid:88) t =0 γ t R (ˆ b t , p t ( b )) , (14)where γ ∈ [0 , represents a discount factor that permits tocompute the cumulative risk over an inﬁnite time horizon.Maximizing the expected long term performance of oursystem is equivalent to minimizing the long term risk ofbeing suboptimal, corresponding to Eq. (14). Note that con-secutive beliefs p t ( b )) are dependent; more precisely, theyevolve according to Eq. (10)-(11), and the evolution dependson the value of the observed data D t . By combining thedifferent elements we can formulate an optimal learningproblem for the long term risk minimization:minimize π ( · ) E p ( {D t }|{ π ( p t − ( b )) } ) (cid:34) ∞ (cid:88) t =0 γ t R ( π ( p t ( b )) , p t ( b )) (cid:35) subject to p t ( b ) = (cid:88) z t p t − ( b , z t |D t ) , (15)where p (cid:0) {D t }|{ π ( p t − ( b )) } (cid:1) corresponds to the probabilityof observing a sequence {D t } for t = 1 ... ∞ given that aconstraint vector ˆ b t = π ( p t − ( b )) is used at step t for theinner control loop, marginalized over the initial belief on b ,i.e., p ( {D t }|{ π ( p t − ( b )) } ) = (cid:89) t (cid:90) p ( D t | π ( p t − ( b )) , b ) p t − ( b ) d b . (16)Since the observed data D t is a random variable, the transi-tion from p t − ( b ) to p t ( b ) is also stochastic. As a result weneed to average the objective function of Eq. (15) over thepossible transitions.The discount parameter γ is a free parameter that repre-sents how much we care about the future performance. Thechoice γ = 0 represents a greedy strategy where at eachstep the controller would select ˆ b t in order to minimizethe immediate risk without caring about the future steps.Taking into account the long term risks basically is whatallows to perform active learning. Since different data points D t +1 lead to different future beliefs on b (and beliefs witha smaller variance are beneﬁcial because they result in asmaller risk), it is important to select ˆ b t that reduces both,the current risk and the uncertainty of the future beliefs.As we will see, in our problem values of ˆ b t that reducethe current risk, do not coincide with those that reduce theexpected value of the future risk. Using an optimal learningformulation we are able to explicitly take into account thistradeoff and perform a smart choice for ˆ b t at each step t . PPROXIMATE S OLUTION

In this section we describe how it is possible to approximatethe problem in Eq. (15) in order to be able to ﬁnd an effectivesolution. We ﬁrst limit the time horizon over which weoptimize, we then describe how it is possible to approximatethe true posterior on b using deterministic approximateinference, and ﬁnally we introduce a mean ﬁeld approxi-mation in order to deal with large network systems. In order to ﬁnd the optimal policy for the closed loopproblem of Eq. (15) we typically need to compute the statevalue function of our system using a dynamic programmingapproach. However, due to the inﬁnite dimension of thestate space and the required expectation operations, thismethod quickly runs into computational problems. Oneway to work around the complexity problem consists inapproximating the closed loop problem of Eq. (15) witha sequence of receding horizon open loop problems. Asalso suggested in [8], a rough but effective approximationfor optimal learning problems corresponds to computing ateach time step t the estimate ˆ b t that minimizes the longterm risk as step t would be the last step we were allowedto learn and modify our belief. Then, after taking decision ˆ b t and observing D t +1 , the belief is updated and the sameproblem is solved again with the new belief.Using this approach at each step t we aim at ﬁnding theestimate ˆ b t given the current belief p t ( b ) that minimizes therisk of the current step t plus an additional term correspond-ing to the discounted inﬁnite sum of the expected minimumimmediate risk at step t + 1 . Hence, we aim at solving thefollowing optimization problem:minimize ˆ b (cid:48) R (ˆ b (cid:48) , p t ( b )) + γ − γ E p t ( D| ˆ b (cid:48) ) (cid:2) R (cid:63) ( p t ( b |D )) (cid:3) (17)where R (cid:63) ( p t ( b |D )) corresponds to the minimum risk fora belief equal to the posterior distribution. The recedinghorizon approach, of Eq. (17), has largely simpliﬁed theproblem to solve, compared to Eq. (17), mainly for tworeasons: i ) being open loop we optimize over a vector ofdimension M rather than the inﬁnite dimension policy π ,and ii ) by limiting the learning horizon to one step we onlyneed to compute the expectation over a single observation D . In the next subsection we describe how to use ap-proximate inference methods in order to ﬁnd a convenientexpression of the belief p t ( b ) , which is necessary for thecomputation of the risk. We then deepen into the details of an approximate solution of the problem in Eq. (17) for largenetwork systems. The probability distribution p t ( b ) represents the belief onthe parameter b given our original prior p ( b ) and allthe observations collected so far {D t } . The distribution p t ( b ) is required to evaluate the risk of different estimates R (ˆ b , p t ( b )) . This operation involves averaging over all thepossible values of b and it is in general computationallyexpensive. There are mainly two possible ways to com-pute expectations under a ﬁxed distribution: deterministicapproximate inference methods and sampling methods. Inthis work we use the deterministic approximate methods asthey are considered faster and require less computations toobtain an approximate result [16]. These methods attempt tominimize a divergence measure between the true posterior, p ( b |{D t } ) , and a second distribution, q ( b ) , which belongsto a ﬁxed and predeﬁned distribution family. The key pointis that computing expectation is much simpler over thetarget distribution than over the original one and can oftenbe done analytically. More speciﬁcally we implement the Ex-pectation Propagation (EP) algorithm [16] on the graphicalmodel of Fig. 2, using as target distribution for the param-eter vector b a fully factorized distribution q ( b ) composedof M univariate lognormal distributions. The reasons forusing this speciﬁc distribution is twofold: i ) the lognormaldistribution has positive support, like the possible valuesof the link capacities, ii ) as required by the EP algorithm itbelongs to the exponential family. Note that considering amultivariate lognormal distribution with correlated compo-nents would certainly lead to a more accurate approxima-tion of the true distribution, but it also has a higher storageand computational cost (the storage cost of the second ordermoments of the M univariate distributions grows linearlywith M , whereas, for the multivariate distribution it growsquadratically with M ).The EP algorithm with a fully factorized target distri-bution is similar to the loopy belief propagation [17] algo-rithm and basically corresponds to an iterative distributedalgorithm where information is exchanged among adjacentfactor nodes and variable nodes on the factor graph. Thecommon intuition behind belief propagation algorithms isthe following. Each factor node represents a bond among the N neighbor variable nodes in the form of a function of the N variables. For a given factor node, when the value of the N − adjacent variable nodes is set, the factor node functioncan be used to produce an opinion (belief) on the value of theleft out variable node. The overall belief on a variable nodecan then be obtained by combining all the beliefs from allits adjacent factor nodes. A more technical and thoroughdescription of the belief propagation algorithm and the EPalgorithm goes beyond the scope of this work, we referthe interested reader to the following works for furtherreading [16], [18], [19]. Moreover, for a speciﬁc descriptionof the implementation on the factor graph of Fig. 13, werefer the reader to Appendix A.As we consider a fully factorized target distribution, webasically associate to each variable node of the factor graph,depicted in Fig. 13, a univariate distribution with tunable parameters. For the z nodes, this distribution is simply aBernoulli distribution; whereas for the b nodes as men-tioned earlier, the associated distribution is the lognormaldistribution. The iterative exchange of information amongthe nodes changes the parameters of these distributions tillthey converge to the value that minimizes a divergencemeasure with respect to the true distribution p t ( b ) . Afterconvergence, the outcome for the b parameter is a setof M univariate lognormal distributions that resemble thetrue belief p t ( b ) . At this point instead of computing therisk using the highly complex distribution p t ( b ) we canuse the approximate distribution q ( b ) , which, being of asimple form, allows for a closed form expression of the risk R (ˆ b , q ( b )) . We now focus on the solution of problem of Eq. (17).Considering the quadratic loss introduced in Subsection 3.3and the factorized approximate belief q ( b ) , we can expressthe risk R (ˆ b , q ( b )) in a simple form: R (ˆ b , q ( b )) = (cid:90) || ˆ b − b || q ( b ) d b = M (cid:88) m =1 (cid:90) (ˆ b m − b m ) q ( b m ) db m = M (cid:88) m =1 (cid:16) (ˆ b m − µ m ) + σ m (cid:17) , (18)where ( µ m , σ m ) represents the mean and the variance of thedistribution q ( b m ) . The last expression of Eq. (18) consists ofa sum of M parts, each corresponding to a single networklink. Each part is composed by a sum of two terms: thesquare of the deviation from the mean of q ( b m ) plus thevariance of q ( b m ) . If we substitute Eq. (18) in Eq. (17) weobtain an objective function composed by two parts, thesquared l distance of ˆ b from the mean of the current belief(immediate risk) plus a term that depends on the futurebelief (future risk). Considering a posterior distributioncomputed by applying the EP algorithm and using thecurrent q ( b ) as the true prior distribution, the second partof the objective function in Eq. (17) corresponds the sum ofthe variances of the future belief: E p ( D| ˆ b ) [ R (cid:63) ( p ( b |D ))] (cid:39) E p ( D| ˆ b ) [ R (cid:63) ( q ( b ))]= E p ( D| ˆ b ) (cid:34)(cid:88) m σ (cid:48) m (cid:35) , (19)where σ (cid:48) m denotes the posterior variance for parameter b m ,and we leveraged the fact that the risk is minimized when ˆ b m = µ m . The computation of the future posterior varianceafter observing data D can be done by simply running loopybelief propagation, similarly to Subsection 4.2. However,the difﬁculties reside in taking the expectation over all thepossible observations D . This operation is complicated fortwo reasons: i ) the number of possible combinations of thevector v grows exponentially with the number of users N , ii ) we do not actually assume to have a generativemodel for the observations v . As a workaround for thesetwo impediments, we propose to adopt a mean ﬁeld ap-proximation of the network, and to consider a worst case scenario for the observations v . The main motivation behindthis approach is the following. Finding the true optimalconstraint vector ˆ b is extremely complicated. Even for theapproximate receding horizon problem of Eq. (17), it wouldinvolve a large amount of computations, which are difﬁcultto handle for large M and N . Therefore we ask whether wecan optimize each entry of ˆ b t independently by consideringa mean interaction of all the network links.As mean ﬁeld approximation we consider a single net-work link with a lognormal distribution q (˜ b ) with mean andvariance equal to (˜ µ, ˜ σ ) . We can write down the mean ﬁeldversion of Eq. (17) as:minimize ˆ b (ˆ b − ˜ µ ) + γ − γ E p ( ˜ D| ˆ b ) (cid:2) ˜ σ (cid:48) (cid:3) . (20)The above equation simply represents the optimal learn-ing formulation for a single mean ﬁeld link. The quantity ˜ D = (˜ y, ˜ v ) represents the observed data related to themean ﬁeld link. We now need to deﬁne an approximaterelation between the parameter ˆ b and the future link rate ˜ y .The constraint vector ˆ b is used by the inner loop algorithmresponsible for solving the classical NUM problem. Becauseof the shape of the utility functions, the users strive toutilize the resources as much as possible tending to makethe constraints tight. As a result, we can consider, as a ﬁrstapproximation, ˜ y = ˆ b .We now need to ﬁnd an approximate expression forhow the posterior variance of the mean ﬁeld link ˜ σ (cid:48) varieswith respect to D . In order to proceed we need to considerseparately the cases where the mean ﬁeld link triggers acongestion, ˜ z = 1 , and when it does not, ˜ z = 0 . case ˜ z = 1 When the mean ﬁeld link triggers a congestion event weknow for sure that there must be at least one observation ˜ v = 1 (see Subsection 3.1). This observation corresponds toa factor node in the factor graph that connects the mean ﬁeldlink to other network links, see Fig. 5. If the average routelength of the network is L R , then the factor node is con-nected on average to the mean ﬁeld link plus other L R − links. In this case the posterior variance of ˜ b , denoted by ˜ σ (cid:48) ,corresponds to the variance of the following distribution: p (cid:48) (˜ b ) = (cid:18)(cid:16) − (cid:89) l p B (cid:39) , with the mean ﬁeld linkbelonging to class A, and the other links composing theroute belonging to class B, then Eq. (21) can actually lead todrastically change the mean ﬁeld posterior belief. By usingthe class notation Eq. (21) becomes p (cid:48) (˜ b ) = (cid:16) (cid:16) − p L R − B (cid:17) p (˜ z = 0 | ˜ y A , ˜ b )+ p (˜ z = 1 | ˜ y A , ˜ b ) (cid:17) q (˜ b ) /Z norm , (25)In the worst case scenario of a single user feedback v forthe mean ﬁeld link, the probability of having such route isequal to (1 − α ) L R − . In all the other cases, i.e., when there isanother link of the route that belongs to class A, we considerto have a posterior distribution with the same variance ofthe current belief. case ˜ z = 0 The second case corresponds to the event where the meanﬁeld link does not trigger a congestion event. In this scenariothe belief variance can be reduced only if none of theusers employing the link observes a congestion event, seeSubsection 3.1. In this case we can consider as worst casescenario the condition where a user observes no congestiononly when none of the employed links is congested. Theprobability for an uncongested mean ﬁeld link to have nousers with ˜ v = 1 is then equal to: p α ( L C − A p (1 − α )( L C − B , (26)where L C is the average number of links that share at leastone user with the mean ﬁeld link, see Fig. 5. In mathematicalterms L C is the average number of non-zero entries for therows of the matrix AA T . In this case the new variance isequal to the variance of the following distribution: p (cid:48) (˜ b ) = p (˜ z = 0 | ˜ y A , ˜ b ) q (˜ b ) /Z norm . (27)Finally, we need to consider the same analysis when themean ﬁeld link belongs to class B. It is easy to see that inthis case due to the shape of the likelihood function and theassumption p B (cid:39) , the belief variance would basically re-main constant even if we could observe the hidden variable ˜ z . As a result we assume a worst case scenario where theexpected posterior variance for the links of class B remainsconstant. ( p B1 p B1 p B1 p B1 p B1 p B1 (1 , p A1 ( L C L R ( (1 p L R , (˜ µ, ˜ ) (˜ µ, ˜ ) q (˜ b ) q (˜ b )˜ b ˜ b ˜ z ˜ z ˜ v = 1 z B z A z B z B z B z B z B ˜ v = 0all z = 0 p (˜ z | ˜ y A , ˜ b ) p (˜ z | ˜ y A , ˜ b ) Fig. 5. On the left, factor graph for the case of congestion event signalobserved from the users. The information ﬂowing upwards to the vari-able node b of the mean ﬁeld link affects the variance of the posteriorbelief. On the right, worst case scenario for observing no congestionsignals from the users of one network link. We request in this case thatall the links connected to the mean ﬁeld through all of its users to befeasible. Combining the two scenarios we can write down anapproximate expression for the expected future variance ofthe mean ﬁeld model: E [˜ σ (cid:48) ] =(1 − α )˜ σ + α (cid:18) p A (cid:16) (1 − α ) L R − (˜ σ (cid:48) A1 − ˜ σ ) + ˜ σ (cid:17) + p A (cid:16) p α ( L C − A p (1 − α )( L C − B (˜ σ (cid:48) A0 − ˜ σ ) + ˜ σ (cid:17)(cid:19) , (28)where ˜ σ (cid:48) A1 and ˜ σ (cid:48) A0 correspond to the variance of the distri-bution in Eq. (25) and Eq. (27) respectively. In Eq. (28) weconsider that the only scenarios where the belief variancechanges is when a link belongs to class A and the aforemen-tioned worst case condition are observed. In all the otherscenarios the belief variance does not change. Note that,since we lack of a generative model for v , the conductedanalysis does not aim at producing an accurate model forthe expected posterior variance, but rather at modelingsome situations where we expect to have a considerablevariance reduction of the belief on ˜ b .The developed model can now be used in order tominimize the mean ﬁeld objective function of Eq. (20). Asindependent variables of our model we consider α , p A and p B , which are all deﬁned over the interval [0 , . α hererepresents the ratio of links that belong to class A. Note thatsince p ( z = 1 | y = ˆ b ) is monotonically increasing, it can beinverted, and we can ﬁnd a map from p A to ˆ b A , which is denoted by ˆ b A ( p A ) . The ﬁnal problem then becomes:minimize p A ,p B ,α α (ˆ b A ( p A ) − ˜ µ ) + (1 − α )(ˆ b B ( p B ) − ˜ µ ) + ˜ σ + γ − γ E [˜ σ (cid:48) ]( p A , p B , α ) . (29)All the terms in Eq. (29) are well deﬁned and since theoptimization problem is non-convex, but it involves onlythree variables, we can simply solve it by performing a gridsearch.In Fig. 6 we show how the expected posterior varianceand the value of the objective function vary as a functionof the optimization variables. More speciﬁcally we show theapproximate expected posterior variance as a function of α and p A . Empirically we observed that the minimizer isusually located at α (cid:39) /L R . This is intuitively reasonable,as, in order to reduce the posterior variance in the case oflink congestion, we need to have routes that are composedby one link of class A and all the other links of class B.Having one link every L R that belongs to class A increasesthe chances of observing such routes. The minimizer withrespect to p A has instead a more complex dependencywith respect to the other variables and parameters of themodel, however, we observed empirically that it is usuallylocated between . and . making rather uncertain thepossibility of observing a congestion. The p B parameterinstead minimizes the future variance when is set to 0. Infact, in order to increase the chances to reduce the varianceof the links of class A p B should be as low as possible.However, a low p B increases the deviation from the meanbelief and consequently the immediate risk for the links ofclass B. The optimal learning formulation allows to easilytake care of this tradeoff. To this extent we can observein Fig. 6 that the minimizer of the objective function withrespect to p B is usually located around . − . . Note thatthe optimal value of p B is consistent with the assumption p B (cid:39) made in the analysis conducted above. If this con-dition was not true at the optimal point, then we could nothave used the approximate mean ﬁeld variance predictiondeveloped in this section. Finally, we noticed that as longas the sigmoid function of Eq. (4) is sharp with respect tothe mean ﬁeld belief q (˜ b ) , i.e., large variance of q (˜ b ) , thenthe optimal parameters ( p A , p B , α ) are not sensitive to thevalue of ˜ σ . As a result, as long as we are uncertain about thecapacity values, we are not forced to re-optimize the meanﬁeld parameters every time the belief changes.In the next section we summarize the complete algo-rithm proposed in this work and we discuss the key pointsof its implementation. VERVIEW OF THE C OMPLETE A LGORITHM

The complete set of operations required to run our algo-rithm is reported in Alg. 3. The overall system is composedby two groups of processes, namely the N user and the M link processes (each responsible for an individual net-work link). First, each user n requests the prices λ m ofthe employed links to the different resource processes and .

25 0 .

50 0 . p A . . . . . ↵ a) .

25 0 .

50 0 . p A . . . . . p B b) Fig. 6. a ) Numerical evaluation of the posterior expected variance of themean ﬁeld model deﬁned in Eq. (28) . The ﬁgure shows how this quantitychanges as a function of p A and α with a ﬁxed p B = 0 . . b ) Numericalevaluation of the objective function of the mean ﬁeld model deﬁned inEq. (29) . The ﬁgure shows how this quantity varies as a function of p A and p B ( γ = 0 . ). For both ﬁgures, the other involved parameters havebeen set to L R = 4 , L C = 20 , and ˜ σ = 0 . , respectively. computes the sending rate according to Eq. (3a) (lines 3-4). The users then forward the rate x tn and the feasibilitysignal v tn to all the process of the employed links (line 5).At this point the resource processes compute independentlythe link rate y tm simply by adding up all the users ratesand update the price using Eq. (3b) (lines 7-8). The nextoperation consists in updating the dataset {D t } with thenew observed datapoint (line 9). In order to limit the size ofthe dataset we subsample the points that are included, andwe set a maximum number of stored points (we discard theolder points when the maximum size is reached). In thisway we can reduce the number of operation required by theEP algorithm. A more detailed description of the updatingoperation can be found in Appendix A.When the inner loop has converged, we recompute thevalue of the constraint vector ˆ b (line 10). Note that we canestablish the convergence of the inner loop algorithm bymonitoring the variation of the vectors λ t and y t . Alterna-tively we can simply update the vector ˆ b on a ﬁxed timebasis. The next operation corresponds to running the EPalgorithm in a distributed way among the M link processesusing the data available in the dataset (line 11). Once thebelief is updated we use the results from the mean ﬁeldanalysis to set ˆ b . In order to execute this step we need toknow the optimal values for ( α, p A , p B ) . This operation caneither be computed by each link process independently, orit can be computed by one process and forwarded to the allthe link processes. Since we consider the network topologyand the sigmoid function characterizing the probability ofa link congestion to not change in time, we can computethe optimal values in advance using the information on thenetwork topology and the initial prior on the network links,then hardcode the optimal values of ( α, p A , p B ) in thelink processes. As shown in the results section this simplestrategy is sufﬁcient to provide satisfying performance ofthe proposed algorithm. Since we know that links belongingto class A are the ones that are likely to reduce their variance,instead to assign randomly at each step t αM links to classA and (1 − α ) M to class B, we assign to class A the αM Algorithm 1

Complete algorithm loop for each user n do collect λ tm ∀ m ∈ n compute and apply rate x t ( a T n λ t ) Forward v tn and x tn ∀ m ∈ n for each link m do λ mt +1 = max(0 , λ mt + (cid:15) ( y tm − ˆ b tm )) ˆ b t +1 m = ˆ b tm update dataset {D t } if inner loop converged then compute q t ( b ) using EP for each link m do if λ tm = 0 then σ m = 0 assign to class A M α links with largest σ m assign to class B other links for each link m do ﬁnd ˆ b m : p ( z (cid:48) = 1 | ˆ b m ) = p class m ˆ b t +1 m = ˆ b m links with the largest variance (line 15-16). Before doingthis operation we set the variance of the links that wereunderutilized in the previous time step to zero (lines 12-14). The intuition is that if links are underutilized, then itis useless to reduce the uncertainty on their capacity valueas they are associated to loose constraints, and the optimalrate allocation does not depend on them. Finally, after theclass assignment each link process, given its current beliefcomputes the value of ˆ b m that matches the probability totrigger a congestion equal to the value of its class. Notethat the class assignment operation can be done in a fullydistributed way by using a consensus algorithm. Moredetails on its implementation can be found in Appendix B.The operation, listed in Alg. 3 are executed continuously.The belief on b is expected after some steps to converge,shrinking the probability density around some value of b ,that should match the true value of the link capacities.We now brieﬂy discuss at a high-level the complexity ofthe algorithm in terms of communication and storage cost.The inner loop of the proposed system corresponds to aclassical NUM algorithm: at each step t each user commu-nicates with the resource processes of the links composingthe route to collect the link prices and forward the sendingrate and congestion signal. Considering an average routemade of L R links, this operation involves a transmission of O ( N L R ) messages. The execution of the EP algorithm is themost expensive part in terms of communication and storagerequirements of the entire algorithm. Each link process m has to store, for each episode t in the dataset, the route of theusers who employed link m and detected a congestion event v n = 1 (plus the link rate y m ). Therefore, if we denote by N link the average number of users per link, the storage costis O ( N link L R ) for each element in the dataset and for eachlink process. N link L R basically corresponds to the averagenumber of factor nodes ψ n connected to each variable node z tm in the factor graph of Fig. 13. This means that, in termsof communication requirements, one complete update of the incoming messages for all the variable nodes in the factorgraph requires an exchange of O ( N link L R ) messages foreach link process and for each episode in the dataset. EP ona factor graph with loops has to be executed multiple timeson the entire dataset before convergence. Unfortunately, it isnot easy to quantify the number of iterations required by theEP algorithm to converge. In our implementation we stopthe EP update after ten iterations, since empirically we haveobserved that they are usually sufﬁcient to provide a goodbelief estimate. Considering ﬁxed the amount of reﬁnementsrequired by the EP algorithm, the overall cost, in termsof communication and storage, at time t is O ( N link L R t ) for each process M . The complexity grows linearly withtime because the dataset is obviously growing. In order toavoid this we limit the size of the dataset to the last S D observations. The key point in the complexity analysis is thedependency of L R with respect to N link . The answer to thisquestion however depends on the network topology. Forinstance, if the network belongs to the class of small-worldnetworks, and user routes correspond to the shortest routesbetween the source and destination nodes, then L R growslogarithmically with respect to the number of total networknodes, making the overall complexity of the algorithm moretractable. Finally, the mean ﬁeld optimization is basicallyindependent of the numbers of links and users. The classassignment task uses a simple method derived from classi-cal consensus average algorithms [20], and simply involvesexchange of local messages among the M link processes.This concludes the discussion on the implementation ofthe proposed algorithm, in the next section we show anddiscuss the performance of the proposed method. IMULATION R ESULTS

In order to test the proposed method we generate somerandom networks with different numbers of links andusers. We assign to each user a log-shaped utility function u n ( x ) = w n log( x ) , where w n is a random positive param-eter, and a random route connecting two (non adjacent)nodes of the network. After the routing matrix is set, wesample the prior distribution of the link capacities in orderto set their true value. We consider that each link has alognormal distribution with mean and standard deviationequal to ( µ m , σ m ) . We set µ m = 2 . N m where N m is equalto the number of users using the link and σ m = 0 . N m .Regarding the parameters of the sigmoid function of Eq. (4)we ﬁx ρ = 0 . and κ = 5((1 − ρ ) b m ) − . In this case we havethat when y m = 0 . b m there is a 0.5 probability for link m to trigger a congestion event, whereas when y m = b m theprobability goes up to about . . We optimize the values of ( p A , p B , α ) at the beginning of each simulation using the L R and L C of the network in use, the average mean andvariance of the prior distribution of the different links, andthe parameters κ and ρ deﬁned above. In order to generatethe observation vector v at each step t , we ﬁrst generate thelink congestion vector z using Eq. (4) and then we randomlypick a value of v among the ones that are consistent with themodel speciﬁed in Subsection 3.1.In the ﬁrst test we run the algorithm on a network with links and users and a discount factor of γ = 0 . ,Fig. 7 shows the evolution of some quantities involved in t (link 1)100200 0 250 500 750 1000 t (link 2)50100150200250 0 250 500 750 1000 t (link 4)200400600 True b m valueˆ b tm µ tm y tm

95 % CI0 250 500 750 1000 t (link 3)1002003000 250 500 750 1000 t (link 5)50100150 0 250 500 750 1000 t (link 6)50100150 0 250 500 750 1000 t (link 7)50100150 0 250 500 750 1000 t (link 8)501001502000 250 500 750 1000 t (link 9)50100150 0 250 500 750 1000 t (link 10)50100150200 0 250 500 750 1000 t (link 11)50100150200 0 250 500 750 1000 t (link 12)100200 Fig. 7. Evolution of the most important quantities employed by the proposed algorithm for all the M links. Each subplot shows: the true b m value,the evolution of the mean value of the lognormal belief of b m , the % Conﬁdence Interval (CI) of the belief, the value of the decisions ˆ b t , andﬁnally, the users sending rate through link m y m . the algorithm operation. The plots show the evolution ofthe mean value of the lognormal belief of q t ( b m ) , the %Conﬁdence Interval (CI) of the belief, the value of ˆ b t , andﬁnally, the sum of the users sending rates for link m , y m . Theselection of ˆ b t is consistent with the results of the mean ﬁeldapproximation. For example, between t = 0 and t = 200 ,the value of ˆ b (which has a large variance) is set to a valuethat is slightly larger than the mean value, whereas for link and b is set to a value below the mean of their belief.We can verify our assumption about the approximation ˆ b = y . As it can be seen, the requested link rate y m tends toconverge to the value of the parameter ˆ b m . When it does notreach the value ˆ b m it is because the link capacity is too largeand the link is underutilized at equilibrium. The assumption ˆ b = y holds for the remaining active links. Concerning theunderutilized links, we can observe that, since the rate y m reached by the inner loop is much lower than the meanvalue of the capacity belief, the uncertainty of the constraintcannot be reduced and it remains rather high. However, inthis case, there is no need to reduce the variance of thesecapacities as they do not affect the optimal rate allocation.For the active links, the algorithm continues to sample atvalues of y m close to the mean value of the belief. As aresult, the variance sequentially shrinks and the mean valueapproaches the true value of b m .In order to evaluate the performance of the rate alloca-tion algorithm we compare it to a greedy algorithm that t e li n k s [ % ] greedy policyoptimal policy γ = 0 . γ = 0 . γ = 0 . Fig. 8. Evolution of the mean absolute percentage error between µ tm andthe true vector b true m . The plot shows the average and standard deviationof the metric among ten different runs. simply minimizes the immediate risk (i.e., γ = 0 ) andmatches the value of ˆ b with the mean value of the currentavailable belief. The greedy algorithm does not seek forvalues of ˆ b that are expected to reduce the future beliefuncertainty, and resembles a passive method that does notperform any active learning. For our algorithm we set the t e u s e r s [ % ] greedy policyoptimal policy γ = 0 . γ = 0 . γ = 0 . Fig. 9. Evolution of the mean absolute percentage error between therates x tm used and the optimal rates x (cid:63)n . The plot shows the averageand standard deviation of the metric among ten different runs. values of the future risk discount γ to . , . and . .We use a topology with links and users. We draw tendifferent sets of samples of b from the prior distributionand ten different sets of values for the parameters w ofthe utility functions. We then run the proposed algorithmwith different γ values and the greedy method on thesame ten different random settings. In Fig. 8 we plot theevolution of the average value over the different runs, andthe standard deviation, of the mean absolute percentageerror for the capacity values. More speciﬁcally the metricused corresponds to: e links = 100% (cid:88) m | µ m − b true m | M b true m (30)where µ m represent the mean of the current belief. Thegreedy algorithm is actually able to reduce the parametersuncertainty by a larger value in the very ﬁrst stages but thenit struggles to further reduce the uncertainty which remainsconstant around throughout the entire simulation. Run-ning the system with γ (cid:39) instead leads to lower values forthe mean absolute percentage error, between and for the different values of γ . The error decreases slowly atthe early stages, however the reduction is more persistentand eventually achieves much lower mean error. Moreoverwe can see that larger values of γ lead to lower long termerror. Due to the existence of underutilized links we cannotexpect the error to decrease to zero. In fact for these linksthe average error is expected to remain large since the beliefvariance does not decrease.Another metric that can be used to evaluate the algo-rithm is the evolution of the mean absolute percentage errorof the users’ sending rate with respect to the optimal ones: e users = 100% (cid:88) n | x n − x (cid:63)n | N x (cid:63)n (31)This quantity is not sensitive to potential underutilized linksand directly measures how the users’ rates are close tothe optimal ones. The evolution of this metric is shown inFig. 9. As for the previous ﬁgure, we show the mean and

24 36 48 60 72 M e li n k s [ % ] greedy policyoptimal policy γ = 0 . γ = 0 . γ = 0 . Fig. 10. Mean absolute percentage error between µ tm and the truevector b true m for networks with a different number of links M . The plotshows the average and standard deviation of the metric among tendifferent runs. standard deviation among ten different runs. In order toreduce the future risk our algorithm sacriﬁces the immediateone, achieving a larger error in the ﬁrst steps with respectto the greedy strategy. However, in the ﬁnal steps, when thecapacity beliefs are more accurate, the users’ rates tend to becloser to the optimal ones and our method, for γ = 0 . ,settles around e users (cid:39) . The greedy algorithm instead isnot able to signiﬁcantly reduce the error for the future steps,and settles to an average percentage error of .We further evaluate the performance of the algorithmfor different network sizes. We generate ﬁve differentnetworks, with M = [24 , , , , links and N =[400 , , , , users. For each network we drawten different sets of values for b and w . Though the networksize differs for the different simulations we generate theusers routes in order to have an average route length of links for each topology. We run the simulation for thegreedy policy and for γ equal to . , . and . .As for the previous tests we compute the mean absolutepercentage error of the capacity estimates and the users’rates. We then plot the mean and standard deviation forthe ten different runs after iterations. The results aredepicted in Fig. 10 and Fig. 11. The best performance isachieved with γ = 0 . , with e users between − . thegreedy strategy achieves the worst performance for all thedifferent network sizes, with about − of error withrespect to the optimal user rate. The results show that theperformance of the algorithm is not strongly correlated withthe network size. The performance discrepancies amongthe different sizes are likely due to the different randomrealizations of the network topology.In the ﬁnal simulation we investigate how the averagelength of the user routes affects the performance of theproposed method. We use a network topology with M = 48 and we generate ten different realizations of the user popu-lations for different values of the average route length, with L R = [3 , , . We then compute the mean and standard de-viation of the same metrics used in the previous experiments

24 36 48 60 72 M e u s e r s [ % ] greedy policyoptimal policy γ = 0 . γ = 0 . γ = 0 . Fig. 11. Mean absolute percentage error between the rates x tm usedand the optimal rates x (cid:63)n for networks with a different number of links M . The plot shows the average and standard deviation of the metricamong ten different runs. L R e li n k s [ % ] a) L R e u s e r s [ % ] b) greedy policyoptimal policy γ = 0 . Fig. 12. a ) Mean absolute percentage error between µ tm and the truevector b true m for different average route length L R . b ) Mean absolutepercentage error between the rates x tm used and the optimal rates x (cid:63)n for different average route length L R The plot shows the average andstandard deviation of the metric among ten different runs.. for the different scenarios. The results are shown in Fig. 12.The results show that, when the average route length in-creases the system performance decrease for both the greedypolicy and the proposed foresighted method. However theproposed method always outperforms the greedy policy.This is somewhat expected since longer routes make it morechallenging for the inference method to correctly estimatethe value of the latent variables z . As a consequence, thevalue of the link capacities is also harder to infer. ONCLUSIONS

In this work we consider a speciﬁc instance of the NUMproblem where the amount of the network resources isunknown and the private congestion signals of the userscannot be used to directly achieve the optimal rate allo-cation. The congestion signals, however, can be combinedto infer the amount of available resources. We design adistributed overlay rate allocation method where the users communicate with other helper processes, one for eachnetwork link, in order to achieve the optimal rate allocation.The proposed solution method consists in decomposing theoriginal problem into two subproblems: one subproblemcorresponds to the classical NUM problem, the second sub-problem instead corresponds to the design of an adaptivecontroller for the classical NUM algorithm that infers theamount of available resources. Using an optimal learningformulation we are able to balance the exploration versusexploitation tradeoff that arises in the design of the adaptivecontroller and guarantee good performance of the system inthe long run. As shown by the conducted evaluation, suchperformance cannot be attained using a greedy strategy.We believe that the analyzed problem and proposedframework, though being in its early stages, could be ofgreat interest for the future computer networks. Considerfor instance the Internet: the challenges and requirementsof the platforms using the network evolve faster thanthe underlying infrastructure. In this scenario if the newapplications want to optimize the data transmission theymight have to coordinate the users using only informationfrom the endpoints, since these are the network parts thatare accessible and can be updated more easily. The newapplications might then need to infer what are the actualglobal resources available. Though being extremely chal-lenging and complex, building an overlay allocation methodthat infers the available resources and adapts to differentnetwork conditions, might be the only viable solution if thelower layers, and the infrastructure of the communicationnetwork, cannot be changed. A CKNOWLEDGMENT

This work has been supported by the Swiss National ScienceFoundation under grant CHISTERA FNS 20CH21 151569. A PPENDIX AE XPECTATION P ROPAGATION I MPLEMENTATION

The implemented EP algorithm on the considered factorgraph of Fig. 13 is reported in Alg. 2. We denote by µ and ν the incoming and outgoing messages respectively, ofthe different variable nodes. These messages are possiblyunnormalized probability density (mass) functions deﬁnedover the domain of the variable node. Their subscriptsdenote the variable node and factor node of the message .Since the factor graph, depicted in Fig. 13, has loops, theincoming messages of the variable nodes, z tm and b m , needto be iteratively reﬁned (line 1) till they converge to theirﬁnal value. All the messages are initialized using non-informative distribution (lognormal distribution with inﬁ-nite variance for the b m variables and uniform distributionfor the binary variables z tm ). At each iteration the algo-rithm updates the incoming messages µ b m z tm parallelizingthe operations over M different processes. Each process m computes the incoming message to the z tm from the b m usingthe messages from the t (cid:48) (cid:54) = t observations and the priorinformation (line 4). Then we iterate over the individual

1. To make the notation more clear the messages to and from thefactor node p ( z tm | b m , y m ) are indexed by the variable nodes b m and z tm . factors of Ψ v t () to update the incoming messages of thelatent variables. Each process m computes the outgoingmessage from the z tm node to the factor node ψ n () associatedto any observation n with v tn = 1 (lines 5-7). As next step,each process collects all the outgoing messages ν z tl ψ n for l (cid:54) = m and a ln = 1 and computes the incoming message µ z tm ψ n (lines 8-9): µ z tm ψ n ( r m ) = (cid:88) l (cid:54) = mr l ∈{ , } ψ n ( r ) (cid:89) l (cid:54) = ma ln =1 ν z tl ψ n ( r l ) , (32)where we have emphasized the fact that the messages µ and ν are actually functions. A na¨ıve computation of thisquantity is rather expensive in terms of required operations.However, since ψ n ( r ) is equal to one when at least one r m variable employed by user n is equal to one and zerootherwise, we can compute ψ n ( r ) as the complementaryevent of having all the r m variables equal to zero. Wetherefore obtain: µ z tm ψ n (0) = (cid:89) l (cid:54) = ma ln =1 ( ν z tl ψ n (0) + ν z tl ψ n (1)) − (cid:89) l (cid:54) = ma ln =1 ν z tl ψ n (0) µ z tm ψ n (1) = (cid:89) l (cid:54) = ma ln =1 ( ν z tl ψ n (0) + ν z tl ψ n (1)) (33)if we consider to normalize the outgoing messages ν v tl z tn weobtain the simpler form: µ z tm ψ n (0) = 1 − (cid:89) l (cid:54) = ma ln =1 ν z tl ψ n (0) µ z tm ψ n (1) = 1 . (34)This is a simple operation that consists in the product ofthe outgoing messages associated with the event z tm = 0 for the observation v tn = 1 . The factors ψ m () can easily behandled locally by each link process. Once all the incomingmessages from all v t factors are updated, the incomingmessage µ tb m z m can be computed. The EP algorithm thenaims at minimizing the following divergence in order toobtain the belief for b m : q = arg min q (cid:48) KL ( µ b m z tm (cid:89) t (cid:48) (cid:54) = t ˜ µ b m z t (cid:48) m || q (cid:48) ) , (35)where q ( b m ) represents the lognormal distribution charac-terizing the belief on b m . In order to minimize Eq. (35) wesimply need to match the sufﬁcient statistics between thetwo distributions of the KL divergence. In order to matchthe sufﬁcient statistics we need to perform integration overthe ﬁrst argument of the KL divergence of Eq. (35). As theintegral is a simple one dimensional integral of a well be-haved function any common numerical integration methodcan be used to achieve this task. The message ˜ µ b m z tm whichcorresponds to the lognormal approximation of µ b m z tm isthen set to: ˜ µ b m z tm = q (cid:81) t (cid:48) (cid:54) = t ˜ µ b m z t (cid:48) m . (36) Algorithm 2

Implementation of the EP algorithm. for i < I EPMAX do for each observation to process t do for each link m do update µ z tm b m for each n with v tn = 1 do for each link m with a mn = 1 do compute ν z tm v tn collect ν z tl v tn ∀ l (cid:54) = m, a ln = 1 update µ z tm v tn compute µ b m z tm q = arg min q (cid:48) KL ( µ b m z tm (cid:81) t (cid:48) (cid:54) = t ˜ µ b m z t (cid:48) m || q (cid:48) ) ˜ µ b m z tm = q (cid:81) t (cid:48)(cid:54) = t ˜ µ bmzt (cid:48) m b m z tm p ( z tm | b m ) p ( b m ) tt = t ... µ b m z tm Y t = t ˜ µ b m z tm z tl b l ⌫ z tl n µ z tm n n ( z ) l ( z l ) µ z tl b l Fig. 13. Factor graph of one single observation t showing the messagesbetween the variables nodes exchanges during the EP algorithm execu-tion. Note that all the terms in Eq. (36) are lognormal distribu-tions, which belong to the exponential family, therefore theabove operation simply consists in algebraical manipulationof the distribution parameters. The EP algorithm allows alsoto add a damping factor ( (cid:15) EP ) in the update of the incomingmessages µ (and ˜ µ ) see [19]. This modiﬁcation preventsthe messages from changing drastically with respect totheir previous value; the equilibrium points do not changebut it can improve the convergence of the algorithm. Theabove steps are executed for each datapoint in the datasetand for I EPMAX iterations. At this point, the execution stopsand the lognormal distribution q ( b ) deﬁned by the mostupdated values of the messages ˜ µ b m z tm approximates thetrue posterior p ( b |{D t } ) . A PPENDIX BD ETAILED A LGORITHM I MPLEMENTATION

The detailed complete set of operations required to runour algorithm is reported in Alg. 3. The overall system iscomposed by two groups of processes, namely the N users Algorithm 3

Complete algorithm loop for each user n do collect λ tm ∀ m ∈ n compute and apply rate x t ( a T n λ t ) Forward v tn and x tn ∀ m ∈ n for each link m do y tm = a T m x t λ mt +1 = max(0 , λ mt + (cid:15) ( y tm − ˆ b tm )) ˆ b t +1 m = ˆ b tm if modulo ( t, T S ) = 0 then for each link m do update dataset {D t } with point ( y tm , v tn ) if | λ t +1 m − λ tm | ≤ V λ ∀ m then compute q t ( b ) using EP for each link m do if λ t = 0 then σ = 0 Initialize ˆ r i = 0 , r i = 0 , i = 0 while | h im − h i − m | ≤ V h ∀ m do for each link m do r im = arg max r (cid:48) ∈{ , } r (cid:48) ( σ m − h im ) ˆ r i +1 m = 1 − l T m ˆ r i + ∆ r im h i +1 m = (cid:0) − l T m h i + δ h (ˆ r im − α ) (cid:1) + p class m = r im p A + (1 − r im ) p B for each link m do ﬁnd ˆ b m : p ( z (cid:48) = 1 | ˆ b m ) = p class m ˆ b t +1 m = ˆ b m and the M link processes (each responsible for an individualnetwork link). As described in the main manuscript lines(3-8) correspond to the execution of the classical NUMalgorithm. The next operations are instead speciﬁc to oursystem.The next step corresponds to updating the dataset {D t } with observations { y t , v t } . Though our design optimizesthe choice of ˆ b t by taking into account the expected poste-rior variance when observing exclusively the point { y t , v t } after the inner loop has converged, we could as well usethe information collected during the inner loop dynamics.However, points that share similar link rates y t provideredundant information to the inference method, moreovera higher number of points increases also the computationcost of the running the EP algorithm. In order to have atradeoff between the two cases we adopt a simple heuristicthat consists in adding one observations to the dataset every T S time steps (lines 10-12). Moreover in order to limit theamount of computations of the EP algorithm we limit themaximum size of the dataset to S D . When the maximumsize is reached we discard the older points to make space forthe new ones. The most recent values of the approximatedincoming messages ˜ µ b m z tm for the discarded points are how-ever embedded in the prior belief. When the inner loop hasconverged (line 13) we execute the distributed EP algorithm,see Alg. 2, on our graphical model using the current datasetand obtain the new ﬁtted distribution q t ( b ) (line 14).At this point we need to compute the new value of ˆ b to use in the next iterations. In order to set the value of ˆ b according to the mean ﬁeld approximation, we need arule to assign each link either to class A or to class B. Oursolution consists in assigning to class A the links with thelarger variance. According to our model, the links of class Aare actually the ones expected to reduce more signiﬁcantlytheir belief variance. Before assigning the links to the dif-ferent classes it is however beneﬁcial to modify the currentvariances σ m for the links that are actually underutilizedas their values do not affect the rate allocation. We set thevariance σ m of these links to zero, basically forcing themto belong to class B (line 15-17). We design a distributedalgorithm for the selection of the αM largest links with thelargest variance σ m among the M link processes. We canformulate the problem as an optimization problem:maximize r r T σ subject to T r ≤ αM r ∈ { , } M , (38)where r is a binary vector representing the class member-ship of link m : if r m = 0 then the link m belongs to classB and to class A otherwise. For simplicity we assume herethat αM ∈ N and that the solution to problem in Eq. (38)is unique. If these conditions hold we can solve exactlythe above problem using a dual method, which allows todecompose the solution method among the M processes.We consider the Lagrange relaxation of Eq. (38):maximize r ∈{ , } M r T σ − h (cid:16) T r − αM (cid:17) , (39)where h denotes the Lagrange multiplier of the inequalityconstraint. Given the value of the dual variable h eachprocess can compute independently the value of r m . Thisoperation is extremely simple, basically r m = 1 if σ m > h and zero otherwise. In order to solve the dual problemthe variable h can be iteratively updated using a gradientdescent method: h (cid:48) = (cid:0) h + δ h (ˆ r − α ) (cid:1) + , (40)where ˆ r represents the mean value of the elements of r and δ h is a positive parameter that controls the step length.Unfortunately this update operation requires a central entitywhich knows the value of ˆ r and forwards the dual variable h to all the M link processes. Ideally we prefer to havean algorithm that is completely distributed, and does notrequire a central entity, therefore we modify the iterationin the following way. We ﬁrst create M copies of the dualvariable h and M estimates of the mean value ˆ r , one for eachlink process. Each process then, iteratively executes Eq. (40)using its local copies and runs in parallel a consensusalgorithm on both variables [20]. In our case we need tosolve a dynamic average consensus problem, which meansthat the nodes have to agree on the average value of a M di-mensional input signal that is varying over time (in our casethe M -dimensional signals are the dual variables h and theestimates ˆ r ). In order to run a consensus algorithm amongthe M link processes we deﬁne a connected undirectedgraph G = ( V , E ) , where the nodes V correspond to the M processes and the edges E denote the pairs of processesthat exchange messages in the class assignment method. We  r i = arg max r (cid:48) ∈{ , } M r (cid:48) T ( σ − h i ) (cid:20) ˆ r i +1 h i +1 (cid:21) = (cid:20) I − L 0 δ ih I I − L (cid:21) (cid:20) ˆ r i h i (cid:21) − (cid:20) I 00 − δ ih (cid:21) (cid:2) ∆ r i | α (cid:3) h i +1 = ( h i +1 ) + (37a)(37b)(37c)deﬁne by L the normalized Laplacian matrix of the graph G . Many works focused on how to optimize communicationamong the agents in order to speed up the convergenceof average consensus algorithms (see [21]) in this work,however, we simply assume that the M links processesdeﬁne a random connected communication network.The iterative steps of the class assignment algorithm aresummarized in Eq. (37), where ∆ r = r i − r i − denotes thevariation of vector r with respect to previous iteration, and I denotes the identity matrix. Each iteration is composedby three steps. First each process selects the optimal valueof r m according to the local copy of the dual variable h im .The variables ˆ r are then updated according to the followingdynamics ˆ r i +1 = ( I − L )ˆ r i + ∆ r i . Considering null initialconditions, we have that, under a steady input r the vector ˆ r converges to (cid:80) m r m /M (similar dynamic consensusmethods have been proposed in [22], [23]). Note that, at eachstep ,each link process has to communicate with the otherprocesses that correspond to its neighbors on the graph G inorder to collect the values of ˆ r il . At the same time, the dualvariables h evolve according to h i +1 = ( I − L ) h i + δ (ˆ r i − I α ) .The operations are basically the same as before, except tahteach link process integrates the difference between the targetratio α and the local estimate of the mean value of r . The useof the Laplacian matrix L in this case assures that possiblediscrepancies among the entries of the vector h are damped,making all the link processes to agree on the same value ofthe dual variable. Finally, the variables h are constrained tobe non-negative to be consistent with to Eq. (40). Executingiteratively Eq. (37) the link processes agree on the classassignment, and the links that have a largest uncertaintyare assigned to class A (lines 18-24). At this point each linkprocess computes the value of ˆ b m that makes p ( z m = 1 | ˆ b m ) equal to the probability of the link class (lines 25-26).Finally, all the parameters used in our simulations forAlg. 2 and Alg. 3 are given in Table 1. TABLE 1Parameters settings

Parameter Value Parameter Value (cid:15) m . / ˆ b m T S V λ − V h − δ ih . / √ i γ . − . I EPMAX (cid:15) EP . S D R EFERENCES [1] F. P. Kelly, A. K. Maulloo, and D. K. Tan, “Rate control forcommunication networks: shadow prices, proportional fairness and stability,”

Journal of the Operational Research society , vol. 49,no. 3, 1998.[2] D. P. Palomar and M. Chiang, “A tutorial on decomposition meth-ods for network utility maximization,”

IEEE Journal on SelectedAreas in Communications , vol. 24, no. 8, 2006.[3] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle, “Layeringas optimization decomposition: A mathematical theory of networkarchitectures,”

Proceedings of the IEEE , vol. 95, no. 1, 2007.[4] T. Stockhammer, “Dynamic adaptive streaming over http –: Stan-dards and design principles,” in

Conference on Multimedia Systems .ACM, 2011.[5] L. De Cicco and S. Mascolo, “An adaptive video streaming con-trol system: Modeling, validation, and performance evaluation,”

IEEE/ACM Transactions on Networking , vol. 22, no. 2, Apr. 2014.[6] D. Stefano, L. Toni, and P. Frossard, “Price-based controller forutility-aware http adaptive streaming,”

IEEE MultiMedia , no. 99,2017.[7] S. Petrangeli, J. Famaey, M. Claeys, S. Latr´e, and F. De Turck, “Qoe-driven rate adaptation heuristic for fair adaptive video stream-ing,”

ACM Transactions Multimedia Computing, Communications andApplications , vol. 12, no. 2, 2015.[8] W. B. Powell and I. O. Ryzhov, “Optimal learning,” 2012.[9] M. Mehyar, D. Spanos, and S. H. Low, “Optimization ﬂow controlwith estimation error,” in

INFOCOM . IEEE, 2004.[10] J. Zhang, D. Zheng, and M. Chiang, “The impact of stochasticnoisy feedback on distributed network utility maximization,”

IEEE Transactions on Information Theory , vol. 54, no. 2, 2008.[11] M. Zargham, A. Ribeiro, and A. Jadbabaie, “Network optimizationunder uncertainty,” in

Annual Conference on Decision and Control .IEEE, 2012.[12] Y. Su and M. Van Der Schaar, “Linearly coupled communicationgames,”

IEEE Transactions on Communications , vol. 59, no. 9, 2011.[13] S. H. Low, F. Paganini, and J. C. Doyle, “Internet congestioncontrol,”

IEEE control systems , vol. 22, no. 1, 2002.[14] A. Beck, A. Nedi, A. Ozdaglar, and M. Teboulle, “An o (1 /k ) gradient method for network resource allocation problems,” IEEETransactions on Control of Network Systems , vol. 1, no. 1, March 2014.[15] K. J. ˚Astr¨om and B. Wittenmark,

Adaptive control . Courier Corpo-ration, 2013.[16] T. Minka, “Expectation propagation for approximate bayesianinference,” in

Conference in Uncertainty in Artiﬁcial Intelligence .Morgan Kaufmann Publishers Inc., 2001.[17] B. J. Frey and D. J. MacKay, “A revolution: Belief propagation ingraphs with cycles,” in

Advances in neural information processingsystems , 1998.[18] C. M. Bishop,

Pattern recognition and machine learning . springer,2006, ch. 8.[19] T. Minka, “Divergence measures and message passing,” MicrosoftResearch, Tech. Rep., 2005.[20] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus andcooperation in networked multi-agent systems,”

Proceedings of theIEEE , vol. 95, no. 1, 2007.[21] D. Jakovetic, J. Xavier, and J. M. Moura, “Weight optimization forconsensus algorithms with correlated switching topology,”

IEEETransactions on Signal Processing , vol. 58, no. 7, 2010.[22] D. P. Spanos, R. Olfati-Saber, and R. M. Murray, “Dynamic consen-sus on mobile networks,” in

IFAC world congress . Prague CzechRepublic, 2005.[23] M. Zhu and S. Mart´ınez, “Discrete-time dynamic average consen-sus,”