Resource allocation in communication networks with large number of users: the stochastic gradient descent method

Abstract

We consider a communication network with a fixed number of links, shared by a large number of users. Resource allocation is based on aggregate utility maximization, following the popular approach proposed by Kelly and coauthors (1998). The problem is to construct a pricing mechanism for transmission rates that stimulates an optimal allocation of the available resources. In contrast to the usual approach, the proposed algorithm does not use information on the aggregate traffic over each link. Its inputs are the total number $N$ of users, the link capacities and the optimal myopic reactions of randomly selected users to the current prices. The dynamic pricing scheme is based on the dual projected stochastic gradient descent method. For a special class of utility functions $u_i$ we obtain upper bounds for the amount of constraint violation and the deviation of the objective function from the optimal value. These estimates are uniform in $N$ and are of order $O(T^{-1/4})$ in the number $T$ of reaction measurements. We present some computer experiments for quadratic utility functions $u_i$.


arXiv [math.OC]

RESOURCE ALLOCATION IN COMMUNICATION NETWORKS WITH LARGE NUMBER OF USERS: THE STOCHASTIC GRADIENT DESCENT METHOD

D.B. ROKHLIN

Southern Federal University, Rostov-on-Don

Key words and phrases: network utility maximization, duality, stochastic projected gradient descent method, large number of users

Introduction

Contemporary communication networks contain a large number of links, whose capacities are shared by a huge number of users. Network resource management aims to utilize the available resources optimally, prevent congestion and ensure the stability of the system. Furthermore, to be of practical value the control should be decentralized: users and links are considered as processors, updating their variables based on dynamically monitored local information. The now conventional optimality criterion, the sum of user utilities, was proposed in [7]. In economic terms, this criterion can be called utilitarian, since it corresponds to the maximization of social welfare.

E-mail address: [email protected]. The research is supported by the Russian Science Foundation, project 17-19-01038.

Consider a network with $m$ links and $N$ users. Each user $i$ transmits packets over a fixed set of links. The network structure is determined by the routing matrix $R = (R^j_i) \in \mathbb{R}^{m \times N}$. Its columns $R_i \neq 0$, $i = 1, \dots, N$ are binary $m$-dimensional vectors such that $R^j_i = 1$ if link $j$ is utilized by user $i$, and $R^j_i = 0$ otherwise. The link capacities are described by a vector $b \in \mathbb{R}^m$ with strictly positive components. The users evaluate the network quality by the utility functions $u_i(x^i)$, depending on the transmission rates $x^i \in \mathbb{R}_+$. An optimal resource allocation corresponds to an optimal solution $x^* \in \mathbb{R}^N_+$ of the network utility maximization (NUM) problem

$$u(x) = \sum_{i=1}^N u_i(x^i) \to \max, \tag{0.1}$$

$$Rx = \sum_{i=1}^N R_i x^i \le b, \quad x = (x^1, \dots, x^N) \in \mathbb{R}^N_+, \tag{0.2}$$

which was formulated in [7].

For given link prices $\lambda \in \mathbb{R}^m_+$, the users select optimal transmission rates $x^i$ maximizing the difference between the utility and the price of $x^i$:

$$x^i \in \arg\max_{x^i \in \mathbb{R}_+} \Big( u_i(x^i) - x^i \sum_{j=1}^m \lambda^j R^j_i \Big).$$

The aim of the management is to stimulate the optimal resource allocation $x^*$ by setting the link prices $\lambda^* \in \mathbb{R}^m_+$. Research related to this problem, its variants and generalizations is reviewed in [13, 5, 12, 14].

Under some technical assumptions, including the concavity of the utility functions $u_i$, the existence of the stimulating prices $\lambda^*$ follows from duality theory. Moreover, these prices can be approximated using the dual projected gradient descent method: see [8, 9].
The related computations are completely distributed: each link $j$ updates its price $\lambda^j$ in accordance with the difference between the supply $b^j$ and the total demand $\sum_{i=1}^N R^j_i x^i$, and each user $i$ updates the transmission rate $x^i$ on the basis of the link prices for the path $\{j : R^j_i = 1\}$.

In this paper the total traffic on the links is not assumed to be known. This problem statement makes sense, since the packets from users do not come simultaneously. The asynchronous model of [8] and the model with noisy feedback [15] address the same problem in different ways.

The input data for the dynamic pricing algorithm in question are (1) the total number of users $N$, (2) the link capacities $b$ and (3) the reactions $x^\xi$ of randomly selected users to the current prices $\lambda$. It should be emphasized that this approach requires the knowledge of one global parameter: the number of users. The algorithm builds an approximation for the optimal price $\lambda^*$ on the basis of a relatively small number of user reactions. The sequential procedure for constructing this approximation is based on the dual projected stochastic gradient descent method.

In section 1, for a special class of utility functions $u_i$, we give upper bounds for the amount of constraint violation and the deviation of the objective function from the optimal value. These estimates are uniform in $N$ and are of order $O(T^{-1/4})$ in the number $T$ of measured user reactions. Note that the fast gradient descent method of Nesterov [10], applied to the problem under consideration in [3], bounds the same quantities by $O(T^{-1})$ in the number $T$ of iterations. However, each iteration of the fast gradient descent method requires the knowledge of $N$ user reactions, if they are measured individually. So, for large values of $N$ the proposed algorithm may require a much smaller number of user reaction measurements to achieve the desired accuracy.
The computer experiments with quadratic utility functions $u_i$, presented in section 2, illustrate this fact.

We assume that the marginal utilities at zero $u_i'(0)$ are finite (and uniformly bounded in $N$). A consequence of this assumption is that a significant proportion of users receive zero optimal data transmission rates. So, we interpret the proposed pricing mechanism as a way to manage extra traffic. This means that initially each user receives a bandwidth of the order of $\min_{1 \le j \le m} b^j / N$, and only the remaining link capacities are shared according to the proposed pricing scheme. However, in what follows we do not consider this aspect.

Notation.

We do not explicitly distinguish between row and column vectors. The scalar product and Euclidean norm are denoted as follows:

$$\langle x, y \rangle = \sum_{i=1}^k x^i y^i, \quad \|x\| = \sqrt{\langle x, x \rangle}, \quad x, y \in \mathbb{R}^k.$$

We use lower indexes for vector numbers and upper indexes for their components. The gradient of a function is written as $g' := (g_{x^1}, \dots, g_{x^k})$.

1. The main result

Let us briefly describe the standard approach to the NUM problem (0.1), (0.2). Consider the Lagrange function

$$L(x, \lambda) = \sum_{i=1}^N u_i(x^i) + \Big\langle \lambda, b - \sum_{i=1}^N R_i x^i \Big\rangle, \quad (x, \lambda) \in \mathbb{R}^N_+ \times \mathbb{R}^m_+,$$

and the dual objective function

$$q(\lambda) = \sup_{x \in \mathbb{R}^N_+} L(x, \lambda) = \langle \lambda, b \rangle + \sum_{i=1}^N \sup_{x^i \in \mathbb{R}_+} \big( u_i(x^i) - \langle \lambda, R_i \rangle x^i \big), \quad \lambda \in \mathbb{R}^m_+.$$

Denote by $x^*$ the solution of the primal problem (0.1), (0.2) and by $\lambda^*$ the solution of the dual problem

$$q(\lambda) \to \min_{\lambda \in \mathbb{R}^m_+}. \tag{1.1}$$

Formally applying the Kuhn-Tucker conditions, we get the relations

$$x^{*,i} \in \arg\max_{x^i \in \mathbb{R}_+} \big( u_i(x^i) - \langle \lambda^*, R_i \rangle x^i \big), \tag{1.2}$$

$$\lambda^{*,j} \Big( b^j - \sum_{i=1}^N R^j_i x^{*,i} \Big) = 0, \quad j = 1, \dots, m, \tag{1.3}$$

$$\sum_{i=1}^N R_i x^{*,i} \le b, \quad x^* \in \mathbb{R}^N_+. \tag{1.4}$$

The Lagrange multipliers $(\lambda^{*,j})_{j=1}^m$ are interpreted as link prices, and $(R^j_i x^{*,i})_{j=1}^m$ are the optimal transmission rates of the $i$-th user. The relations (1.2) mean that $x^{*,i}$ are the optimal reactions to the prices $\lambda^{*,j}$, $j \in L_i$, on the part of the selfish users, seeking to maximize the "revenue" $u_i(x^i) - \langle \lambda^*, R_i \rangle x^i$; here $L_i = \{j : R^j_i = 1\}$ is the set of links utilized by user $i$. The conditions (1.4) ensure the feasibility of $x^*$. The complementary slackness conditions (1.3) imply that all links $j$ with non-zero prices $\lambda^{*,j} > 0$ are completely utilized: $\sum_{i=1}^N R^j_i x^{*,i} = b^j$.

If the functions $-u_i$ are strongly convex, then the elementary problems

$$x^i(\lambda) \in \arg\max_{x^i \in \mathbb{R}_+} \big( u_i(x^i) - \langle \lambda, R_i \rangle x^i \big) \tag{1.5}$$

have no more than one solution $x^i(\lambda)$ for any $\lambda \in \mathbb{R}^m_+$. If such solutions exist, then the function $q$ is differentiable and

$$q'(\lambda) = b - \sum_{i=1}^N R_i x^i(\lambda) = b - Rx(\lambda). \tag{1.6}$$

An optimal solution $\lambda^*$ can be computed by the dual projected gradient descent method:

$$\lambda_{t+1} = \big( \lambda_t - \eta_t (b - Rx(\lambda_t)) \big)^+, \quad t \ge 1, \tag{1.7}$$

where $\mu^+ = (\max\{\mu^j, 0\})_{j=1}^m$ and $\eta_t > 0$ is a step sequence.
If the aggregate demand $\sum_{i=1}^N R^j_i x^i$ is known, then each link $j$ can adjust its price $\lambda^j_t$ according to (1.7), using only the local information on its resource demand. Under suitable technical conditions the sequence $\lambda_t$ converges to $\lambda^*$ and $x(\lambda_t)$ converges to $x^*$. For the NUM problem this method was formulated in [8], see also [3, 9, 11].

As was already mentioned in the introduction, in this paper the quantities $b - Rx(\lambda_t)$ are assumed to be unknown. To model the demands coming from random users, we replace the sum $\sum_{i=1}^N R_i x^i(\lambda)$ by the random vector $N R_\xi x^\xi(\lambda)$, where $\xi$ is uniformly distributed on $\{1, \dots, N\}$, and apply the projected stochastic gradient descent method. To bound the approximation errors uniformly in $N$, we impose the following conditions on the utility functions.

Assumption 1.

The functions $u_i$ are twice continuously differentiable on $(-\varepsilon, \infty)$, $\varepsilon > 0$, and satisfy the conditions

$$u_i(0) = 0, \quad 0 < u_i'(0) \le B < \infty, \quad 1 \le i \le N.$$

Assumption 2.

The functions $-u_i$ are $(N\sigma)$-strongly convex:

$$-u_i''(x^i) \ge N\sigma, \quad x^i \in \mathbb{R}_+, \quad \sigma > 0. \tag{1.8}$$

Although, as is clear from (1.8), the functions $u_i$ depend on $N$, for readability we suppress this dependence in the notation. In what follows Assumptions 1, 2 are supposed to be fulfilled without further commentary.

Our main example is the quadratic utilities

$$u_i(x^i) = a_i x^i - \frac{N \sigma_i}{2} (x^i)^2. \tag{1.9}$$

To meet Assumptions 1, 2 we require that

$$0 < a_i \le B, \quad \sigma_i \ge \sigma.$$

If the data transmission is free, then such utility functions induce individual demands of order $1/N$:

$$\hat{x}^i = \frac{1}{N} \frac{a_i}{\sigma_i}.$$

So, the users are "small". Since the resources are usually scarce, the aggregate demand $\sum_{i=1}^N R_i \hat{x}^i$ should exceed $b$ componentwise.

One can regard (1.9) as an approximation of a general $(N\sigma_i)$-strongly concave function near the origin. Furthermore, $a_i$ can be considered as the value of a unit transmission rate for user $i$. The second term in (1.9) can be regarded as a penalty, assigned by the network. From (1.5) we get

$$x^i = \frac{(a_i - \langle \lambda, R_i \rangle)^+}{N \sigma_i}.$$

Thus, the optimal rates (if positive) are proportional to the difference between the value coefficient $a_i$ and the aggregate price $\langle \lambda, R_i \rangle$ of the utilized links. If the users are distinguished only by the values $a_i$ (i.e. $\sigma_i = \sigma$), then the proportionality coefficient is common to all of them. This coefficient takes into account the total number $N$ of users.

Note that the utilities (1.9) decrease for large values of the argument. However, the user demands $x^i(\lambda)$ cannot exceed $\hat{x}^i$ for any price vector $\lambda$. Hence, the users consider $u_i$ only on the intervals $[0, \hat{x}^i]$, where these functions are increasing.

For $f : \mathbb{R} \to (-\infty, \infty]$ denote by

$$f^*(z) = \sup_{x \in \mathbb{R}} (xz - f(x))$$

the conjugate function. Extend $-u_i$ to the whole real line by putting

$$(-u_i)(y) = \begin{cases} -u_i(y), & y \in \mathbb{R}_+, \\ +\infty, & \text{otherwise}. \end{cases}$$

The dual objective function takes the form

$$q(\lambda) = \langle \lambda, b \rangle + \sum_{i=1}^N (-u_i)^*(-\langle R_i, \lambda \rangle).$$
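For the quadratic utilities (1.9) both the user reaction and the dual gradient (1.6) have closed forms, so they are easy to compute directly. The following Python sketch illustrates this; the function names `best_response` and `dual_gradient` and the list-based data layout are illustrative assumptions, not part of the paper.

```python
from typing import List, Sequence


def best_response(a_i: float, sigma_i: float, route: Sequence[int],
                  prices: Sequence[float], n_users: int) -> float:
    """Reaction x^i(lambda) = (a_i - <lambda, R_i>)^+ / (N sigma_i) for the utility (1.9)."""
    # Aggregate price of the links on the user's path (route is the binary column R_i).
    path_price = sum(p for p, r in zip(prices, route) if r)
    return max(a_i - path_price, 0.0) / (n_users * sigma_i)


def dual_gradient(a: Sequence[float], sigma: Sequence[float],
                  routes: Sequence[Sequence[int]], prices: Sequence[float],
                  capacities: Sequence[float]) -> List[float]:
    """Gradient q'(lambda) = b - sum_i R_i x^i(lambda), formula (1.6)."""
    n_users = len(a)
    load = [0.0] * len(capacities)
    for a_i, sigma_i, route in zip(a, sigma, routes):
        x_i = best_response(a_i, sigma_i, route, prices, n_users)
        for j, uses_link in enumerate(route):
            if uses_link:
                load[j] += x_i
    return [b_j - load_j for b_j, load_j in zip(capacities, load)]
```

A negative component of `dual_gradient` means that the corresponding link is overloaded at the current prices, so a gradient step on the dual problem raises its price.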
Since the functions $-u_i$ are strongly convex, the elementary problems (1.5) have unique solutions $x^i(\lambda)$ (see [2, Theorem 5.25]). From the formula for the subdifferential of the conjugate function (see [2, Corollary 4.21]):

$$\partial f^*(z) = \arg\max_x (xz - f(x))$$

it follows that the functions $(-u_i)^*(z)$ are differentiable and

$$\frac{\partial}{\partial \lambda^j} (-u_i)^*(-\langle R_i, \lambda \rangle) = -R^j_i x^i(\lambda). \tag{1.10}$$

Hence, the function $q$ is differentiable and its gradient is given by (1.6).

The problem (0.1), (0.2) is solvable, since its set of feasible solutions is compact. By the strong duality theorem (see [4, Proposition 5.3.6]) an optimal solution $\lambda^*$ of the dual problem (1.1) exists and $u(x^*) = q(\lambda^*)$. Moreover, $(x^*, \lambda^*)$ is a pair of primal and dual optimal solutions if and only if $x^*$ is feasible, $\lambda^* \ge 0$, and the optimality conditions (1.2), (1.3) are satisfied: [4, Proposition 5.3.2].

Lemma 1.

An optimal solution $\lambda^*$ of the dual problem satisfies the inequalities

$$0 \le \lambda^{*,j} \le B, \quad j = 1, \dots, m.$$

Furthermore,

$$\left| \frac{\partial q(\lambda)}{\partial \lambda^j} \right| \le \max\left\{ b^j, \frac{B}{\sigma} - b^j \right\}, \quad \lambda \in [0, B]^m.$$

Proof. If $\lambda^*$ is an optimal solution of the dual problem, then the unique solution $x^*$ of the primal problem satisfies the relation

$$x^* \in \arg\max_{x \in \mathbb{R}^N_+} L(x, \lambda^*)$$

(see [1, Theorem 12.11]). By (1.5), $x^* = x(\lambda^*)$. It follows that $x^*$ is determined by the relations

$$u_i'(x^{*,i}) - \langle R_i, \lambda^* \rangle \le 0, \quad x^{*,i} \big( u_i'(x^{*,i}) - \langle R_i, \lambda^* \rangle \big) = 0, \quad 1 \le i \le N \tag{1.11}$$

(see [1, Example 9.4]).

Let $j \in \{1, \dots, m\}$. Recall that the set $I_j = \{i : R^j_i = 1\}$ of all users $i$ utilizing link $j$ is nonempty. If $\lambda^{*,j} > B$, then

$$u_i'(x^{*,i}) \le u_i'(0) \le B < \lambda^{*,j} \le \langle R_i, \lambda^* \rangle, \quad i \in I_j,$$

since the functions $x^i \mapsto u_i'(x^i)$ are decreasing. From (1.11) it follows that $x^{*,i} = 0$, $i \in I_j$. Hence,

$$\sum_{i=1}^N R^j_i x^{*,i} = \sum_{i \in I_j} R^j_i x^{*,i} = 0,$$

in contradiction to the complementary slackness condition (1.3).

Furthermore, for the $(N\sigma)$-strongly convex function $-u_i$ we have the inequality

$$\big( -u_i'(x^i) + u_i'(y^i) \big)(x^i - y^i) \ge N\sigma (x^i - y^i)^2, \quad x^i, y^i \in \mathbb{R}_+$$

(see [2, Theorem 5.24]). Put $x^i = x^i(\lambda)$, $y^i = 0$:

$$\big( -u_i'(x^i) + u_i'(0) \big) x^i \ge N\sigma (x^i)^2.$$

Using the optimality condition (1.11) for $x^i(\lambda)$:

$$x^i \big( u_i'(x^i) - \langle R_i, \lambda \rangle \big) = 0,$$

we get $x^i \big( u_i'(0) - \langle R_i, \lambda \rangle \big) \ge N\sigma (x^i)^2$. Thus,

$$x^i \le \frac{1}{N\sigma} \big( u_i'(0) - \langle R_i, \lambda \rangle \big) \le \frac{B}{N\sigma}. \tag{1.12}$$

The inequality

$$\left| \frac{\partial q(\lambda)}{\partial \lambda^j} \right| = \left| b^j - \sum_{i=1}^N R^j_i x^i \right| \le \max\left\{ b^j, \frac{B}{\sigma} - b^j \right\},$$

which follows from (1.12), completes the proof. $\square$

Lemma 1 shows that the minimization of the dual objective function can be performed over the hypercube $\Lambda = [0, B]^m$. Let

$$\Pi_\Lambda(y) = \arg\min \{ \|z - y\| : z \in \Lambda \}$$

be the orthogonal projection onto $\Lambda$:

$$\Pi_\Lambda(y)^j = \begin{cases} 0, & y^j \le 0, \\ y^j, & 0 \le y^j \le B, \\ B, & y^j \ge B \end{cases}$$

(see [1, Example 8.10]). On some probability space $(\Omega, \mathcal{F}, \mathsf{P})$ consider a sequence $(\xi_r)_{r=1}^\infty$ of independent random variables, uniformly distributed on $\{1, \dots, N\}$:

$$\mathsf{P}(\xi_r = i) = \frac{1}{N}, \quad i \in \{1, \dots, N\}.$$

Let $\mathcal{F}_k = \sigma(\xi_1, \dots, \xi_k)$ be the natural filtration of the process $(\xi_r)_{r=1}^\infty$.

The recurrence formula

$$\lambda_{t+1} = \Pi_\Lambda \big( \lambda_t - \eta_t (b - N R_{\xi_{t+1}} x^{\xi_{t+1}}(\lambda_t)) \big), \quad t \ge 1, \quad \lambda_1 \in \Lambda, \tag{1.13}$$

with deterministic steps $\eta_t > 0$ defines the projected stochastic gradient descent method for the problem

$$q(\lambda) \to \min_{\lambda \in \Lambda}.$$

Indeed, since the random variable $\lambda_t$ is $\mathcal{F}_t$-measurable, the following conditional expectation can be computed by "freezing" $\lambda_t$:

$$\mathsf{E} \big( b - N R_{\xi_{t+1}} x^{\xi_{t+1}}(\lambda_t) \,\big|\, \mathcal{F}_t \big) = b - \sum_{i=1}^N R_i x^i(\lambda_t) = q'(\lambda_t). \tag{1.14}$$

The argumentation in the proof of the following lemma is similar to [6, Theorem 3.1].
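The recurrence (1.13) needs only one user reaction per step. The following Python sketch shows one possible implementation for quadratic utilities of the form (1.9) with a common $\sigma$; the function names, the argument layout, the start $\lambda_1 = 0$ and the seeded generator are assumptions made for the illustration.

```python
import random
from typing import List, Sequence


def project_box(y: Sequence[float], upper: float) -> List[float]:
    """Componentwise projection onto the hypercube [0, upper]^m."""
    return [min(max(v, 0.0), upper) for v in y]


def projected_sgd(a: Sequence[float], sigma: float,
                  routes: Sequence[Sequence[int]], capacities: Sequence[float],
                  B: float, K: float, T: int, seed: int = 0) -> List[float]:
    """Run T steps of (1.13) with eta_t = K / sqrt(t); return the averaged price."""
    rng = random.Random(seed)
    n_users, m = len(a), len(capacities)
    lam = [0.0] * m   # lambda_1 = 0 (an assumption; any point of the hypercube works)
    avg = [0.0] * m   # accumulates (1/T) * sum_{t=1}^T lambda_t
    for t in range(1, T + 1):
        avg = [s + l / T for s, l in zip(avg, lam)]
        i = rng.randrange(n_users)   # uniformly selected user xi_{t+1}
        path_price = sum(l for l, r in zip(lam, routes[i]) if r)
        x_i = max(a[i] - path_price, 0.0) / (n_users * sigma)   # measured reaction
        eta = K / t ** 0.5
        # Unbiased estimate of q'(lambda_t): b - N * R_xi * x^xi(lambda_t).
        grad = [b_j - n_users * r * x_i for b_j, r in zip(capacities, routes[i])]
        lam = project_box([l - eta * g for l, g in zip(lam, grad)], B)
    return avg
```

The function returns the average of the iterates rather than the last iterate, because the estimate of Lemma 2 below is stated for the averaged price.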

Lemma 2.

Let $\lambda_t$ be a sequence generated by the projected stochastic gradient descent method (1.13). Then for any decreasing sequence $\eta_t > 0$ the following estimate holds true:

$$\mathsf{E}\, q(\bar{\lambda}_T) \le q(\lambda^*) + \frac{1}{T} \left( \frac{mB^2}{2\eta_T} + \frac{L^2}{2} \sum_{t=1}^T \eta_t \right), \tag{1.15}$$

$$\bar{\lambda}_T := \frac{1}{T} \sum_{t=1}^T \lambda_t, \quad L := \left( \sum_{j=1}^m \max\left\{ b^j, \frac{B}{\sigma} - b^j \right\}^2 \right)^{1/2}.$$

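Proof. Let $z_{t+1} = b - N R_{\xi_{t+1}} x^{\xi_{t+1}}(\lambda_t)$, $r_t = \lambda_t - \lambda^*$. Using the inequality (1.12), we get

$$\mathsf{E} \|z_{t+1}\|^2 = \mathsf{E} \sum_{j=1}^m \big( b^j - N R^j_{\xi_{t+1}} x^{\xi_{t+1}} \big)^2 \le \sum_{j=1}^m \max\left\{ b^j, \frac{B}{\sigma} - b^j \right\}^2 = L^2.$$

Furthermore, since by the "Pythagorean theorem"

$$\|\Pi_\Lambda \mu - \lambda^*\| \le \|\mu - \lambda^*\|, \quad \mu \in \mathbb{R}^m$$

(see [6, Theorem 2.1]), we have

$$\|r_{t+1}\|^2 = \|\lambda_{t+1} - \lambda^*\|^2 = \|\Pi_\Lambda(\lambda_t - \eta_t z_{t+1}) - \lambda^*\|^2 \le \|\lambda_t - \eta_t z_{t+1} - \lambda^*\|^2 = \|r_t\|^2 - 2\eta_t \langle z_{t+1}, \lambda_t - \lambda^* \rangle + \eta_t^2 \|z_{t+1}\|^2.$$

Using (1.14), we get

$$\mathsf{E} \|r_{t+1}\|^2 \le \mathsf{E} \|r_t\|^2 - 2\eta_t \mathsf{E} \langle \mathsf{E}(z_{t+1} | \mathcal{F}_t), \lambda_t - \lambda^* \rangle + \eta_t^2 \mathsf{E} \|z_{t+1}\|^2 \le \mathsf{E} \|r_t\|^2 - 2\eta_t \mathsf{E} \langle q'(\lambda_t), \lambda_t - \lambda^* \rangle + \eta_t^2 L^2.$$

By the convexity of $q$:

$$q(\lambda^*) - q(\lambda_t) \ge \langle q'(\lambda_t), \lambda^* - \lambda_t \rangle,$$

it follows that

$$\mathsf{E} \|r_{t+1}\|^2 \le \mathsf{E} \|r_t\|^2 + 2\eta_t \mathsf{E} \big( q(\lambda^*) - q(\lambda_t) \big) + \eta_t^2 L^2,$$

$$\mathsf{E}\, q(\lambda_t) - q(\lambda^*) \le \frac{\mathsf{E} \|r_t\|^2 - \mathsf{E} \|r_{t+1}\|^2}{2\eta_t} + \frac{L^2}{2} \eta_t.$$

After the summation and rearranging terms, we get

$$\sum_{t=1}^T \big( \mathsf{E}\, q(\lambda_t) - q(\lambda^*) \big) \le \frac{1}{2} \left( \frac{1}{\eta_1} \mathsf{E} \|r_1\|^2 + \Big( \frac{1}{\eta_2} - \frac{1}{\eta_1} \Big) \mathsf{E} \|r_2\|^2 + \dots + \Big( \frac{1}{\eta_T} - \frac{1}{\eta_{T-1}} \Big) \mathsf{E} \|r_T\|^2 - \frac{1}{\eta_T} \mathsf{E} \|r_{T+1}\|^2 \right) + \frac{L^2}{2} \sum_{t=1}^T \eta_t \le \frac{mB^2}{2\eta_T} + \frac{L^2}{2} \sum_{t=1}^T \eta_t.$$

Here we used the estimates $\|r_t\|^2 \le mB^2$ and the fact that $\eta_t$ is decreasing. Dividing by $T$ and using the convexity of $q$:

$$q(\bar{\lambda}_T) \le \frac{1}{T} \sum_{t=1}^T q(\lambda_t),$$

we get the desired estimate (1.15). $\square$

The main result of the paper is the following theorem. Its proof uses the ideas of [3, Theorem 1].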

Theorem 1.

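Define $\lambda_t$ by the recurrence formula (1.13) with $\eta_t = K/\sqrt{t}$. Then for $\bar{\lambda}_T = \frac{1}{T} \sum_{t=1}^T \lambda_t$ the following estimates hold true:

$$\mathsf{E} \left( \sum_{i=1}^N R^j_i x^i(\bar{\lambda}_T) - b^j \right)^+ \le \sqrt{\frac{2D}{\sigma}}\, \frac{1}{T^{1/4}}, \quad D = \frac{mB^2}{2K} + K L^2, \tag{1.16}$$

$$u(x^*) - \mathsf{E}\, u(x(\bar{\lambda}_T)) \le B \sqrt{\frac{2D}{\sigma}}\, \frac{1}{T^{1/4}}. \tag{1.17}$$

Proof.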

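Since

$$\sum_{t=1}^T \frac{1}{\sqrt{t}} = \sum_{t=1}^T \int_{t-1}^t \frac{du}{\sqrt{t}} \le \sum_{t=1}^T \int_{t-1}^t \frac{du}{\sqrt{u}} = \int_0^T \frac{du}{\sqrt{u}} = 2\sqrt{T},$$

by substituting $\eta_t = K/\sqrt{t}$ into (1.15), we get

$$\mathsf{E}\, q(\bar{\lambda}_T) \le q(\lambda^*) + \frac{D}{\sqrt{T}}. \tag{1.18}$$

By Assumption 2 the function $x \mapsto -L(x, \lambda)$ is $(N\sigma)$-strongly convex, that is, the function $-L(x, \lambda) - \frac{\sigma N}{2} \|x\|^2$ is convex. Hence,

$$\frac{N\sigma}{2} \sum_{i=1}^N (x^i - x^i(\lambda))^2 \le L(x(\lambda), \lambda) - L(x, \lambda)$$

(see [2, Theorem 5.25]). On the other hand,

$$L(x(\lambda), \lambda) - L(x, \lambda) = q(\lambda) - \sum_{i=1}^N u_i(x^i) - \Big\langle \lambda, b - \sum_{i=1}^N R_i x^i \Big\rangle.$$

In particular, for the optimal solution $x^*$ of the primal problem (0.1), (0.2) we have

$$\frac{N\sigma}{2} \sum_{i=1}^N (x^{*,i} - x^i(\lambda))^2 \le q(\lambda) - u(x^*) = q(\lambda) - q(\lambda^*).$$

By (1.18) it follows that

$$\mathsf{E} \sum_{i=1}^N (x^{*,i} - x^i(\bar{\lambda}_T))^2 \le \frac{2}{N\sigma} \big( \mathsf{E}\, q(\bar{\lambda}_T) - q(\lambda^*) \big) \le \frac{2D}{N\sigma\sqrt{T}}. \tag{1.19}$$

For the discrepancy in the feasibility conditions we have the estimate

$$\sum_{i=1}^N R^j_i x^i(\bar{\lambda}_T) - b^j \le \sum_{i=1}^N R^j_i x^i(\bar{\lambda}_T) - \sum_{i=1}^N R^j_i x^{*,i} \le \sum_{i=1}^N |x^i(\bar{\lambda}_T) - x^{*,i}| \le \sqrt{N} \left( \sum_{i=1}^N (x^i(\bar{\lambda}_T) - x^{*,i})^2 \right)^{1/2}.$$

Using (1.19), we get

$$\mathsf{E} \left[ \left( \Big( \sum_{i=1}^N R^j_i x^i(\bar{\lambda}_T) - b^j \Big)^+ \right)^2 \right] \le N\, \mathsf{E} \sum_{i=1}^N (x^i(\bar{\lambda}_T) - x^{*,i})^2 \le \frac{2D}{\sigma\sqrt{T}}.$$

By Jensen's inequality this implies (1.16). Furthermore, for $x^i = x^i(\bar{\lambda}_T)$,

$$\sum_{i=1}^N \big( u_i(x^{*,i}) - u_i(x^i) \big) \le \sum_{i=1}^N u_i'(x^i)(x^{*,i} - x^i) \le B \sum_{i=1}^N |x^{*,i} - x^i| \le B \sqrt{N} \left( \sum_{i=1}^N (x^{*,i} - x^i)^2 \right)^{1/2}.$$

Again using (1.19), we get the inequality

$$\mathsf{E} \left[ \left( \Big( \sum_{i=1}^N \big( u_i(x^{*,i}) - u_i(x^i) \big) \Big)^+ \right)^2 \right] \le \frac{2 B^2 D}{\sigma\sqrt{T}},$$

implying (1.17). $\square$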

The estimates (1.16), (1.17) do not depend on the number $N$ of users. This qualitative result is the main point of Theorem 1.

Let $B/\sigma \ge \max_{1 \le j \le m} b^j$. Then

$$L = \left( \sum_{j=1}^m \max\left\{ b^j, \frac{B}{\sigma} - b^j \right\}^2 \right)^{1/2} \le \frac{B}{\sigma} \sqrt{m}. \tag{1.20}$$

Replace the constant $L$ in (1.16) by its upper bound (1.20):

$$D = mB^2 \left( \frac{1}{2K} + \frac{K}{\sigma^2} \right),$$

and select the "optimal" constant $K$ by minimizing this expression:

$$K = \frac{\sigma}{\sqrt{2}}. \tag{1.21}$$

This constant will be used in the computer experiments in section 2.
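For completeness, the minimization that leads to (1.21) can be written out as follows. This is a short worked derivation under the bound (1.20); the notation $D(K)$ for $D$ viewed as a function of $K$ is introduced here only for the illustration.

```latex
D(K) = mB^2\left(\frac{1}{2K} + \frac{K}{\sigma^2}\right), \qquad
D'(K) = mB^2\left(-\frac{1}{2K^2} + \frac{1}{\sigma^2}\right) = 0
\iff K^2 = \frac{\sigma^2}{2}
\iff K = \frac{\sigma}{\sqrt{2}},
\qquad
D\!\left(\frac{\sigma}{\sqrt{2}}\right)
= mB^2\left(\frac{1}{\sqrt{2}\,\sigma} + \frac{1}{\sqrt{2}\,\sigma}\right)
= \frac{\sqrt{2}\, mB^2}{\sigma}.
```

Since $D(K)$ is convex for $K > 0$, the stationary point is the minimizer. With this choice the factor $\sqrt{2D/\sigma}$ in (1.16), (1.17) becomes $2^{3/4}\sqrt{m}\, B/\sigma$.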

Recall that a continuously differentiable function $f : \mathbb{R}^m \to \mathbb{R}$ is called $\beta$-smooth, if

$$\|f'(x) - f'(y)\| \le \beta \|x - y\|.$$

By Theorem 5.26 from [2] the functions $\varphi_i(z) := (-u_i)^*(z)$ of a single variable are $1/(N\sigma)$-smooth. By (1.10),

$$q'(\lambda) = b - \sum_{i=1}^N \varphi_i'(-\langle R_i, \lambda \rangle) R_i.$$

Hence the function $q$ is $m/\sigma$-smooth:

$$\|q'(\lambda) - q'(\mu)\| \le \sum_{i=1}^N |\varphi_i'(-\langle R_i, \lambda \rangle) - \varphi_i'(-\langle R_i, \mu \rangle)| \cdot \|R_i\| \le \sum_{i=1}^N \frac{1}{\sigma N} |\langle R_i, \lambda - \mu \rangle| \sqrt{m} \le \frac{m}{\sigma} \|\lambda - \mu\|. \tag{1.22}$$

The constant in this estimate can be refined, using the structure of the network: see [3, Lemma III.1].

Following [3], consider the fast gradient descent method of Nesterov [10]:

$$\mu_1 = \hat{\lambda}_0, \quad \tau_1 = 1, \tag{1.23}$$

$$\hat{\lambda}_t = \Big[ \mu_t - \frac{\sigma}{m} q'(\mu_t) \Big]^+ = \left[ \mu_t - \frac{\sigma}{m} \Big( b - \sum_{i=1}^N R_i x^i(\mu_t) \Big) \right]^+, \tag{1.24}$$

$$\tau_{t+1} = \frac{1 + \sqrt{1 + 4\tau_t^2}}{2}, \quad \mu_{t+1} = \hat{\lambda}_t + \frac{\tau_t - 1}{\tau_{t+1}} (\hat{\lambda}_t - \hat{\lambda}_{t-1}), \tag{1.25}$$

where $t \ge 1$. Denote by $L_u = B\sqrt{N}$ the Lipschitz constant of the function $u$:

$$|u(x) - u(y)| \le \sum_{i=1}^N |u_i'(z^i)(x^i - y^i)| \le \sum_{i=1}^N |u_i'(0)| |x^i - y^i| \le B \sum_{i=1}^N |x^i - y^i| \le B\sqrt{N} \|x - y\|,$$

and by $L_{q'} = m/\sigma$ the Lipschitz constant of the vector function $q'$: see (1.22). For the routing matrix $R$ we have the estimate

$$\left| \sum_{k=1}^N R^j_k x^k \right| \le \sum_{k=1}^N |x^k| \le \sqrt{N} \|x\|.$$

In the notation of [3] this means that

$$\|R\|_{2,\infty} := \max\left\{ \max_j \left| \sum_{k=1}^N R^j_k x^k \right| : \|x\| \le 1 \right\} \le \sqrt{N}.$$
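The iteration (1.23)–(1.25) can be sketched in Python as follows. The sketch assumes quadratic utilities (1.9) with a common $\sigma$, uses the step $\sigma/m$ from (1.22) and starts from $\hat{\lambda}_0 = 0$; the function name `fast_gradient` and the data layout are illustrative assumptions. Note that, unlike the stochastic method (1.13), every iteration evaluates the reactions of all $N$ users.

```python
from typing import List, Sequence


def fast_gradient(a: Sequence[float], sigma: float,
                  routes: Sequence[Sequence[int]], capacities: Sequence[float],
                  iters: int) -> List[float]:
    """Nesterov's fast gradient method (1.23)-(1.25) for the dual problem."""
    n_users, m = len(a), len(capacities)
    step = sigma / m   # 1 / L_{q'} with L_{q'} = m / sigma

    def grad(lam: Sequence[float]) -> List[float]:
        # Full dual gradient q'(lam) = b - sum_i R_i x^i(lam): uses all N reactions.
        load = [0.0] * m
        for a_i, route in zip(a, routes):
            price = sum(l for l, r in zip(lam, route) if r)
            x_i = max(a_i - price, 0.0) / (n_users * sigma)
            for j, uses_link in enumerate(route):
                if uses_link:
                    load[j] += x_i
        return [b_j - l_j for b_j, l_j in zip(capacities, load)]

    lam_prev = [0.0] * m   # hat-lambda_0 = 0
    mu = list(lam_prev)    # mu_1 = hat-lambda_0
    tau = 1.0              # tau_1 = 1
    for _ in range(iters):
        g = grad(mu)
        lam = [max(u - step * g_j, 0.0) for u, g_j in zip(mu, g)]        # (1.24)
        tau_next = (1.0 + (1.0 + 4.0 * tau * tau) ** 0.5) / 2.0          # (1.25)
        mu = [l + (tau - 1.0) / tau_next * (l - lp) for l, lp in zip(lam, lam_prev)]
        lam_prev, tau = lam, tau_next
    return lam_prev
```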

Note also that $\|\hat{\lambda}_0 - \lambda^*\| \le B\sqrt{m}$. Theorems 1 and 2 of [3] give the following estimates:

$$q(\hat{\lambda}_T) - q(\lambda^*) \le \frac{2 L_{q'} \|\hat{\lambda}_0 - \lambda^*\|^2}{(T+1)^2} \le \frac{C}{T^2}, \quad C = \frac{2m^2B^2}{\sigma},$$

$$u(x^*) - u(x(\hat{\lambda}_T)) \le L_u \sqrt{\frac{2C}{\sigma N T^2}} = \frac{2mB^2}{\sigma T}, \tag{1.26}$$

$$\left[ \sum_{i=1}^N R^j_i x^i(\hat{\lambda}_T) - b^j \right]^+ \le \|R\|_{2,\infty} \sqrt{\frac{2C}{\sigma N T^2}} \le \frac{2mB}{\sigma T}. \tag{1.27}$$

The fast gradient descent method will be used in section 2 for comparison with the projected stochastic gradient descent method (1.13). As already mentioned in the introduction, the estimates (1.26), (1.27) are much better than (1.16), (1.17) in the order of $T$, but each iteration of the fast gradient descent method can be significantly more labor-consuming than in the method (1.13).

2. Examples

Example 1. To better understand the properties of the optimal solutions corresponding to the quadratic utility functions (1.9), consider a network with two links and three users. Assume that user 1 utilizes both links, and users 2 and 3 utilize links 1 and 2 respectively. Let the link capacities be 2 and 1. Thus,

$$R_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad R_2 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad R_3 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad b = \begin{pmatrix} 2 \\ 1 \end{pmatrix}. \tag{2.1}$$

This network was considered in [13, Example 2.3] under the assumption that the user utility functions are logarithmic: $u_i(x^i) = \ln x^i$. In this case the solutions of the primal (0.1), (0.2) and dual (1.1) problems look as follows:

$$x^* = \left( \frac{1}{\lambda^{*,1} + \lambda^{*,2}}, \frac{1}{\lambda^{*,1}}, \frac{1}{\lambda^{*,2}} \right), \quad \lambda^{*,1} = \frac{\sqrt{3}}{1 + \sqrt{3}}, \quad \lambda^{*,2} = \sqrt{3}.$$

Note that

$$0 < \lambda^{*,1} < \lambda^{*,2}, \quad 0 < x^{*,1} < x^{*,3} < x^{*,2}. \tag{2.2}$$

Now consider users with identical utility functions (1.9):

$$u_i(x^i) = a x^i - \frac{3\sigma}{2} (x^i)^2.$$

An elementary analysis of the optimality conditions (1.2)–(1.4) gives the solutions presented in Table 1.

| $a/\sigma$ | $(0, 3/2]$ | $[3/2, 9/2]$ | $[9/2, 9]$ | $[9, \infty)$ |
|---|---|---|---|---|
| $\lambda^{*,1}$ | $0$ | $0$ | $2a/3 - 3\sigma$ | $a - 6\sigma$ |
| $\lambda^{*,2}$ | $0$ | $a - 3\sigma/2$ | $2a/3$ | $a - 3\sigma$ |
| $x^{*,1}$ | $a/(3\sigma)$ | $1/2$ | $1 - a/(9\sigma)$ | $0$ |
| $x^{*,2}$ | $a/(3\sigma)$ | $a/(3\sigma)$ | $1 + a/(9\sigma)$ | $2$ |
| $x^{*,3}$ | $a/(3\sigma)$ | $1/2$ | $a/(9\sigma)$ | $1$ |

Table 1. The dependence of the optimal solution on the parameter $a/\sigma$
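The entries of Table 1 can be checked numerically against the optimality conditions (1.2)–(1.4). The Python sketch below encodes the four regimes of the table for the network (2.1) with $N = 3$ and returns the largest violation of the conditions; the function names `table1_solution` and `kkt_residual` are illustrative assumptions.

```python
from typing import Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]

ROUTES = ((1, 1), (1, 0), (0, 1))   # columns R_1, R_2, R_3 from (2.1)
CAPACITIES = (2.0, 1.0)             # vector b from (2.1)


def table1_solution(a: float, sigma: float) -> Tuple[Vec2, Vec3]:
    """Prices and rates of Table 1 for identical utilities a*x - (3*sigma/2)*x^2."""
    r = a / sigma
    if r <= 1.5:                    # both links are free
        return (0.0, 0.0), (r / 3, r / 3, r / 3)
    if r <= 4.5:                    # only the second link is priced
        return (0.0, a - 1.5 * sigma), (0.5, r / 3, 0.5)
    if r <= 9.0:                    # both links are priced, all users active
        return (2 * a / 3 - 3 * sigma, 2 * a / 3), (1 - r / 9, 1 + r / 9, r / 9)
    return (a - 6 * sigma, a - 3 * sigma), (0.0, 2.0, 1.0)   # user 1 is priced out


def kkt_residual(a: float, sigma: float, lam: Vec2, x: Vec3) -> float:
    """Largest violation of (1.2)-(1.4) and of the sign condition on the prices."""
    worst = max(0.0, -min(lam))
    for x_i, route in zip(x, ROUTES):
        price = sum(l * r for l, r in zip(lam, route))
        best = max(a - price, 0.0) / (3 * sigma)        # maximizer in (1.2), N = 3
        worst = max(worst, abs(best - x_i))
    for j, b_j in enumerate(CAPACITIES):
        load = sum(x_i * route[j] for x_i, route in zip(x, ROUTES))
        worst = max(worst, max(load - b_j, 0.0))        # feasibility (1.4)
        worst = max(worst, abs(lam[j] * (b_j - load)))  # complementary slackness (1.3)
    return worst
```

Calling `kkt_residual(a, sigma, *table1_solution(a, sigma))` for one value of $a/\sigma$ from each column of the table gives a residual at the level of rounding errors.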

Note that for all values of the parameter $a/\sigma$ the inequalities (2.2) are satisfied at least in the non-strict sense. Furthermore, fix the common value coefficient $a$ and consider $\sigma$ as a penalty, assigned by the network. If the penalty is very high, $a/\sigma \le 3/2$, then the network resources are available for free and the users select identical transmission rates, since their utility functions are identical. If $a/\sigma \in (3/2, 9/2]$, then the first link is available for free. The capacity of the second link is divided equally between the first and third users. The second user utilizes the remaining capacity of the first link only partially. If $a/\sigma \in (9/2, 9)$, then both resources are scarce. In this case the inequalities (2.2) are literally satisfied. Finally, for a very small penalty, $a/\sigma > 9$, the first user is "eliminated from the market", and the remaining two fully utilize the capacities of the corresponding links.

Example 2. Consider the "network" containing a single link with capacity $b > 0$, utilized by a large number $N$ of users with the utility functions

$$u_i(x^i) = a_i x^i - \frac{\sigma N}{2} (x^i)^2, \quad a_i \in (0, B), \quad \sigma > 0. \tag{2.3}$$

The problems (1.5) are easily solved:

$$x^i(\lambda) = \frac{1}{N} \frac{(a_i - \lambda)^+}{\sigma}. \tag{2.4}$$

The optimal solution $\lambda^*$ of the dual problem is the solution of the equation

$$q'(\lambda) = b - \sum_{i=1}^N \frac{1}{N} \frac{(a_i - \lambda)^+}{\sigma} = 0. \tag{2.5}$$

The stochastic projected gradient descent method (1.13) takes the form

$$\lambda_{t+1} = \Pi_{[0,B]} \big[ \lambda_t - \eta_t (b - N x^{\xi_{t+1}}) \big], \quad \Pi_{[0,B]}(y) = \begin{cases} 0, & y \le 0, \\ y, & y \in (0, B), \\ B, & y \ge B. \end{cases}$$

Let $b = 5$, $\sigma = 1$, $N = 10^{[\ldots]}$ and assume that the value coefficients $(a_i)_{i=1}^N$ are uniformly distributed on $(0, B)$, $B = 100$. We generated $k = 30$ "populations" of $N$ users with the utilities (2.3). For each of the problem instances the solution $\lambda^*$ of the equation (2.5) was obtained by the bisection method optimize.bisect from the scipy module (Python) with the standard tolerance parameters. The average values of the optimal price, optimal aggregate utility and demand for the free resource are equal to

$$\lambda^* = 68.[\ldots], \quad u(x^*) = 394.[\ldots], \quad \sum_{i=1}^N x^i(0) = 50.$$

Thus, the demand for the free resource is 10 times higher than the available capacity $b = 5$. Since $a_i$ are uniformly distributed on $(0, 100)$, the formula (2.4) shows that on average almost 70% of users get zero optimal rate $x^i(\lambda^*)$.

We applied the projected stochastic gradient descent method (1.13) with $\lambda_1 = 0$, $\eta_t = K/\sqrt{t}$, where $K = 1/\sqrt{2}$ is defined by (1.21). Consider the relative errors in the optimal price, optimal aggregate demand and optimal network utility:

$$\varepsilon^1_t := \frac{|\bar{\lambda}_t - \lambda^*|}{\lambda^*}, \quad \varepsilon^2_t := \frac{1}{b} \left| \sum_{i=1}^N x^i(\bar{\lambda}_t) - b \right|, \tag{2.6}$$

$$\varepsilon^3_t := \frac{1}{u(x^*)} \left| \sum_{i=1}^N u_i(x^i(\bar{\lambda}_t)) - u(x^*) \right|. \tag{2.7}$$

The values $\varepsilon^i_T$, averaged over the sample of 30 problems, and their maximal values for the same sample, are presented in Table 2. Note that the errors in the optimal demand and utility are approximately 4 times larger than the errors in the optimal price. Note also that the number of measured transmission rates $T \le 4000$ is significantly smaller than the number of users: $T/N \le [\ldots]$.

The fast gradient descent method (1.23)–(1.25) (with $\hat{\lambda}_0 = 0$) requires only a few iterations to get a comparable accuracy. For example, the mean relative error in the optimal price equals $[\ldots]$ for 10 iterations. However, if the user demands are measured individually, this requires $10N$ measurements.

| The number of iterations $T$ | Mean relative error: prices | Mean relative error: demand | Mean relative error: utility | Maximal relative error: prices | Maximal relative error: demand | Maximal relative error: utility |
|---|---|---|---|---|---|---|
| 1000 | 0.0129 | 0.056 | 0.049 | 0.035 | 0.155 | 0.132 |
| 2000 | 0.0078 | 0.034 | 0.029 | 0.019 | 0.082 | 0.072 |
| 4000 | 0.0052 | 0.022 | 0.019 | 0.016 | 0.069 | 0.060 |

Table 2. The relative errors of the projected stochastic gradient descent method

| $\delta$ | Mean number of iterations | Maximal number of iterations | Minimal number of iterations | Mean relative price error | Maximal relative price error |
|---|---|---|---|---|---|
| $10^{-[\ldots]}$ | 792 | 1697 | 218 | 0.0140 | 0.045 |
| $10^{-[\ldots]}$ | [...] | [...] | [...] | [...] | [...] |
| $10^{-[\ldots]}$ | [...] | [...] | [...] | [...] | [...] |

Table 3. The relative errors for the stopping rule (2.8)
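The setting of Example 2 can be reproduced in a few lines. The Python sketch below is only an illustration under stated assumptions: it uses a hand-written bisection instead of scipy's `optimize.bisect`, the population size `n_users = 10_000` is a hypothetical choice, and all function names are introduced here for the example.

```python
import random
from typing import Sequence


def aggregate_demand(lam: float, a: Sequence[float], sigma: float) -> float:
    """Total demand sum_i x^i(lam) for the utilities (2.3), see (2.4)."""
    return sum(max(a_i - lam, 0.0) for a_i in a) / (len(a) * sigma)


def optimal_price(a: Sequence[float], sigma: float, b: float, B: float,
                  tol: float = 1e-10) -> float:
    """Bisection for equation (2.5) on [0, B]; the demand is nonincreasing in lam."""
    lo, hi = 0.0, B
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if aggregate_demand(mid, a, sigma) > b:
            lo = mid   # link overloaded: the price must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def averaged_sgd_price(a: Sequence[float], sigma: float, b: float, B: float,
                       T: int, K: float, rng: random.Random) -> float:
    """Average of T iterates of the single-link method with lambda_1 = 0."""
    n_users = len(a)
    lam, total = 0.0, 0.0
    for t in range(1, T + 1):
        total += lam
        x = max(rng.choice(a) - lam, 0.0) / (n_users * sigma)   # one user reaction
        lam = min(max(lam - K / t ** 0.5 * (b - n_users * x), 0.0), B)
    return total / T


if __name__ == "__main__":
    rng = random.Random(0)
    n_users, B, b, sigma = 10_000, 100.0, 5.0, 1.0   # n_users is hypothetical
    a = [rng.uniform(0.0, B) for _ in range(n_users)]
    lam_star = optimal_price(a, sigma, b, B)
    for T in (1000, 2000, 4000):
        lam_bar = averaged_sgd_price(a, sigma, b, B, T, sigma / 2 ** 0.5, rng)
        eps_price = abs(lam_bar - lam_star) / lam_star
        eps_demand = abs(aggregate_demand(lam_bar, a, sigma) - b) / b
        print(T, round(eps_price, 4), round(eps_demand, 4))
```

The printed errors depend on the seed and on the population size, so they should not be expected to coincide with the entries of Table 2.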

For unknown $\lambda^*$ the errors (2.6), (2.7) are unobservable. We used a very simple rule for stopping the iterations, which is based on the observable quantities $\bar{\lambda}_t = \frac{1}{t} \sum_{k=1}^t \lambda_k$:

$$\tau = \min\left\{ t \ge [\ldots] : \frac{|\bar{\lambda}_t - \bar{\lambda}_{t-1}|}{\bar{\lambda}_{t-1}} < \delta \right\}. \tag{2.8}$$

For the same sample of 30 problems the results are given in Table 3.
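The stopping rule (2.8) is easy to apply online, since it only tracks the running average of the prices. A minimal Python sketch follows; the function name `stopping_time`, the start of the check at $t = 2$ and the guard that skips steps with a zero previous average (which occurs when $\lambda_1 = 0$) are assumptions of this illustration.

```python
from typing import Iterable, Optional


def stopping_time(prices: Iterable[float], delta: float) -> Optional[int]:
    """First t >= 2 with |avg_t - avg_{t-1}| / avg_{t-1} < delta, or None."""
    total = 0.0
    prev_avg: Optional[float] = None
    for t, lam in enumerate(prices, start=1):
        total += lam
        avg = total / t   # running average of lambda_1, ..., lambda_t
        # Assumption: the rule is not checked while the previous average is zero.
        if prev_avg is not None and prev_avg > 0.0:
            if abs(avg - prev_avg) / prev_avg < delta:
                return t
        prev_avg = avg
    return None
```

Because `prices` may be any iterable, the function can consume the iterates of the method (1.13) from a generator and stop the computation as soon as the rule fires.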

Example 3. Consider a network with the routing matrix and link capacities (2.1). In contrast to Example 1, assume that there are $N = 1.[\ldots] \cdot 10^{[\ldots]}$ users. Let the users with numbers $i \in \{1, \dots, N/3\}$ utilize both links, and the users with the numbers $i \in \{N/3 + 1, \dots, 2N/3\}$ and $i \in \{2N/3 + 1, \dots, N\}$ utilize links 1 and 2 respectively. It is assumed that the utility functions are of the form (2.3), where $\sigma = 1$, and $a_i$ are uniformly distributed on $(0, B)$.

As in Example 2, we generated a sample of $k = 30$ problems. For each problem 200 iterations of the fast gradient descent method (1.23)–(1.25) were performed. The obtained vector $(\lambda^{*,1}, \lambda^{*,2})$ is considered as an exact solution of the dual problem. The corresponding solution $x^* = x(\lambda^*)$ of the primal problem gives a discrepancy in the constraints (0.2) of order $10^{-[\ldots]}$.

Put $\langle x^* \rangle_{i=k+1}^{k+r} = \frac{1}{r} \sum_{i=1}^r x^{*,i+k}$. For $B = 100$ the computer experiments show that the users utilizing both links are eliminated from the market: $\langle x^* \rangle_{i=1}^{N/3} = 0$. The resources are shared by the remaining users so that

$$\langle x^* \rangle_{i=N/3+1}^{2N/3} = 5 \cdot 10^{-[\ldots]}, \quad \langle x^* \rangle_{i=2N/3+1}^{N} = 2.[\ldots] \cdot 10^{-[\ldots]},$$

similarly to the case $a/\sigma > 9$ in Example 1. In this case, however, many elements $x^{*,i}$, $i > N/3$, are also equal to 0. For $B = 12$ we get the following results:

$$\langle x^* \rangle_{i=1}^{N/3} \approx 10^{-[\ldots]} < \langle x^* \rangle_{i=2N/3+1}^{N} \approx [\ldots] \cdot 10^{-[\ldots]} < \langle x^* \rangle_{i=N/3+1}^{2N/3} \approx [\ldots] \cdot 10^{-[\ldots]},$$

similar to the case $a/\sigma \in (9/2, 9)$ in Example 1.

We applied the projected stochastic gradient descent method with $\lambda_1 = 0$ and $\eta_t = 1/\sqrt{2t}$, as in Example 2. The errors in the prices and demand, analogous to (2.6), are understood componentwise:

$$\varepsilon^{1,j}_t := \frac{|\bar{\lambda}^j_t - \lambda^{*,j}|}{\lambda^{*,j}}, \quad \varepsilon^{2,j}_t := \frac{1}{b^j} \left| \sum_{i=1}^N R^j_i x^i(\bar{\lambda}_t) - b^j \right|.$$

The errors in the network utility are computed by the formula (2.7).
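The relative errors, averaged over the 30 problems of the sample, and their maximal values are given in Tables 4, 5. For each $T$ the prices and the demand occupy two rows, one per link.

| The number of iterations $T$ | Mean relative error: prices | Mean relative error: demand | Mean relative error: utility | Maximal relative error: prices | Maximal relative error: demand | Maximal relative error: utility |
|---|---|---|---|---|---|---|
| 2000 | 0.104 | 0.023 | 0.011 | 0.249 | 0.060 | 0.037 |
| | 0.022 | 0.030 | | 0.061 | 0.083 | |
| 4000 | 0.080 | 0.018 | 0.008 | 0.247 | 0.071 | 0.032 |
| | 0.015 | 0.021 | | 0.037 | 0.056 | |
| 8000 | 0.048 | 0.012 | 0.006 | 0.140 | 0.031 | 0.017 |
| | 0.012 | 0.016 | | 0.033 | 0.036 | |

Table 4. The relative errors of the projected stochastic gradient descent method for the network with two links, $B = 12$

| The number of iterations $T$ | Mean relative error: prices | Mean relative error: demand | Mean relative error: utility | Maximal relative error: prices | Maximal relative error: demand | Maximal relative error: utility |
|---|---|---|---|---|---|---|
| 2000 | 0.020 | 0.078 | 0.093 | 0.050 | 0.196 | 0.208 |
| | 0.033 | 0.217 | | 0.072 | 0.495 | |
| 4000 | 0.012 | 0.045 | 0.049 | 0.033 | 0.122 | 0.110 |
| | 0.018 | 0.116 | | 0.046 | 0.303 | |
| 8000 | 0.006 | 0.021 | 0.026 | 0.022 | 0.081 | 0.072 |
| | 0.010 | 0.062 | | 0.034 | 0.220 | |

Table 5. The relative errors of the projected stochastic gradient descent method for the network with two links, $B = 100$

Conclusion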

In this paper we used the dual projected stochastic gradient descent method for the pricing of information transmission rates over the links of a network. The main example of the utility function is the difference between a linear utility, individual for each user, and a quadratic penalty, assigned by the network. The penalty contains a coefficient which is proportional to the total number $N$ of users. For a class of utility functions containing the mentioned quadratic functions, we obtained estimates for the errors in the prices, stimulating an optimal resource allocation, and for the feasibility and the optimal network utility errors. These estimates are uniform in $N$. We presented computer experiments confirming that, at least for networks with a small number of links, a satisfactory accuracy can be obtained by measuring a relatively small number of individual user reactions to the link prices.

References

[1] A. Beck. Introduction to nonlinear optimization: theory, algorithms, and applications with MATLAB. SIAM, Philadelphia, 2014.

[2] A. Beck. First-order methods in optimization. SIAM, Philadelphia, 2017.

[3] A. Beck, A. Nedić, A. Ozdaglar, and M. Teboulle. An $O(1/k)$ gradient method for network resource allocation problems. IEEE Transactions on Control of Network Systems, 1(1):64–73, 2014.

[4] D.P. Bertsekas. Convex optimization theory. Athena Scientific, Belmont, 2009.

[5] M. Chiang, S.H. Low, A.R. Calderbank, and J.C. Doyle. Layering as optimization decomposition: a mathematical theory of network architectures. Proceedings of the IEEE, 95(1):255–312, 2007.

[6] E. Hazan. Introduction to online convex optimization. Foundations and Trends in Optimization, 2(3-4):157–325, 2016.

[7] F.P. Kelly, A.K. Maulloo, and D.K.H. Tan. Rate control for communication networks: shadow prices, proportional fairness and stability. Journal of the Operational Research Society, 49(3):237–252, 1998.

[8] S. Low and D.E. Lapsley. Optimization flow control, I: basic algorithm and convergence. IEEE/ACM Transactions on Networking, 7(6):861–874, 1999.

[9] A. Nedić and A. Ozdaglar. Cooperative distributed multi-agent optimization. In D.P. Palomar and Y.C. Eldar, editors, Convex Optimization in Signal Processing and Communications, chapter 10, pages 340–386. Cambridge University Press, Cambridge, 2010.

[10] Yu. Nesterov. A method for solving the convex programming problem with convergence rate $O(1/k^2)$. Dokl. Akad. Nauk SSSR, 269(3):543–547, 1983.

[11] Yu. Nesterov and V. Shikhman. Dual subgradient method with averaging for optimal resource allocation. European Journal of Operational Research, 270(3):907–916, 2018.

[12] S. Shakkottai and R. Srikant. Network optimization and control. Foundations and Trends in Networking, 2(3):271–379, 2008.

[13] R. Srikant. The mathematics of Internet congestion control. Birkhäuser, Boston, 2004.

[14] R. Srikant and L. Ying. Communication networks: an optimization, control, and stochastic networks perspective. Cambridge University Press, New York, 2014.

[15] J. Zhang, D. Zheng, and M. Chiang. The impact of stochastic noisy feedback on distributed network utility maximization.