Accelerated Multi-Agent Optimization Method over Stochastic Networks
Wicak Ananduta, Carlos Ocampo-Martinez, and Angelia Nedić
Abstract — We propose a distributed method to solve a multi-agent optimization problem with strongly convex cost functions and equality coupling constraints. The method is based on Nesterov's accelerated gradient approach and works over stochastically time-varying communication networks. We consider the standard assumptions of Nesterov's method and show that the sequence of the expected dual values converges toward the optimal value with the rate of O(1/k²). Furthermore, we provide a simulation study of solving an optimal power flow problem with a well-known benchmark case.

Index Terms — multi-agent optimization, distributed method, accelerated gradient method, distributed optimal power flow problem
I. INTRODUCTION
The advancement of information, computation, and communication technologies promotes the deployment of distributed approaches to solve complex large-scale problems, e.g., in power networks [1], [2] and water networks [3]. On one hand, such approaches offer flexibility and scalability. On the other hand, they require a more complex design than their centralized counterparts, as multiple computational units must cooperate and communicate with each other.

In this paper, we deal with a multi-agent optimization problem in which the cost function is a summation of strongly convex cost functions. Moreover, the problem has equality coupling constraints. This formulation is mainly motivated by optimal power flow (OPF) problems of large-scale power networks [1] and resource allocation problems [2], [4]. Furthermore, the problem can also be considered as a subclass of extended monotropic problems [5].

We solve the problem in a distributed manner through its dual to deal with the coupling constraints. Particularly, we develop the method based on Nesterov's accelerated gradient method [6], [7], which is an accelerated first-order approach with the rate of O(1/k²). This accelerated method has been used to develop a fast distributed gradient method to solve network utility maximization problems [8], a fast alternating direction method of multipliers (ADMM) for a certain class of problems with strongly convex cost functions [9], and distributed model predictive controllers [10], among others.

However, different from the aforementioned papers, one feature of the system that we particularly pay attention to is the time-varying nature of the communication network, over which the agents exchange information. Specifically, here we assume that the network is stochastically time-varying, an assumption that can model the communication failures that might occur in large-scale systems. A similar setup of communication networks can be found in [11]–[14], which develop unaccelerated first-order methods, and in [15], [16], which propose a Nesterov-like fast gradient method for distributed optimization problems with a common decision variable. Nevertheless, whereas the former four papers do not consider an accelerated method, the latter ones deal with a different problem and work directly in the primal space. Note that different models of time-varying communication networks have also been considered, as in [17]–[19].

To summarize, the main contribution of this paper is an accelerated first-order distributed method for a multi-agent optimization problem, which works over stochastic communication networks. As a fully distributed algorithm, the parameter design and iterations only need local information, i.e., neighbor-to-neighbor communication. Furthermore, since the method is based on Nesterov's accelerated approach, it enjoys the convergence rate of O(1/k²) on the expected dual value, as shown in the convergence analysis.

The paper is structured as follows. Section II provides the problem setup and the considered model of time-varying communication networks. Afterward, Section III presents the proposed distributed method along with its convergence statement. Then, in Section IV, we show the convergence analysis of the proposed method. Furthermore, we also showcase the performance of the proposed method to solve an intra-day OPF problem for a well-known benchmark case in Section V. Finally, Section VI concludes the paper by providing some remarks and discussions about future work.

(W. Ananduta is with the Delft Center of Systems and Control (DCSC), TU Delft, the Netherlands. C. Ocampo-Martinez is with Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Barcelona, Spain. A. Nedić is with the School of Electrical, Computer and Energy Engineering, Arizona State University. E-mail addresses: [email protected], [email protected], [email protected].)

Notation and properties
The set of real numbers is denoted by R. For any a ∈ R, R_{≥a} denotes {b ∈ R : b ≥ a}. The inner product of vectors x, y ∈ R^n is denoted by ⟨x, y⟩, whereas the Euclidean vector norm and the induced matrix norm are denoted by ‖·‖. The operator col{·} stacks the arguments column-wise. We use 0_n to denote the zero vector of dimension n. When the dimension is clear from the context, we may omit the subscript. Furthermore, the following properties will be used in the convergence analysis.

Property 1 (Strong convexity):
A differentiable function f : R^n → R is strongly convex if, for any x, y ∈ R^n, it holds that

⟨∇f(y) − ∇f(x), y − x⟩ ≥ σ ‖y − x‖²,

where σ > 0 is the strong convexity constant.

Property 2 (Lipschitz smoothness): A function f : R^n → R is continuously differentiable with Lipschitz continuous gradient if, for any x, y ∈ R^n, it holds that

‖∇f(y) − ∇f(x)‖ ≤ L ‖y − x‖,

where L denotes the Lipschitz constant.

II. PROBLEM SETUP
A. Multi-agent optimization problem
We consider a multi-agent system, where the set of agents is denoted by N := {1, 2, . . . , N}. The agents want to cooperatively solve an optimization problem in the following form:

minimize_{u_i ∈ U_i, ∀i∈N}  Σ_{i=1}^{N} f_i(u_i)   (1a)
s.t.  G_ii u_i + Σ_{j∈N_i} G_ji u_j = g_i,  ∀i ∈ N,   (1b)

where u_i ∈ R^{n_i} and U_i ⊆ R^{n_i} denote the decision vector and the local set constraint of agent i, respectively. In (1a), each cost function f_i(u_i) is associated with agent i. Moreover, each equality in (1b), with the non-zero matrices G_ji ∈ R^{m_i × n_j}, for each j ∈ N_i ∪ {i} and i ∈ N, and g_i ∈ R^{m_i}, is assigned to agent i and couples agent i with some other agents, i.e., j ∈ N_i ⊆ N. Based on the formulation of the coupling constraints in (1b), we can represent the system as a directed graph, denoted by S = (N, V), where V denotes the set of links that represents how each agent influences the coupling constraints (1b) of the other agents. Specifically, the link (j, i) ∈ V implies that u_j appears in the coupling constraint of agent i, i.e., j ∈ N_i. Therefore, we can say that N_i is the set of in-neighbors of agent i. On the other hand, we also introduce the set of out-neighbors, denoted by M_i, i.e., M_i = {j ∈ N : (i, j) ∈ V}. Furthermore, we define i ∈ M_i and, in general, M_i may not be equal to N_i ∪ {i} (see Figure 1).

Problem (1) is a subclass of the extended monotropic problem [5]. Resource allocation problems [2], [4] can also be formulated as in (1). A particular practical problem of interest, which can be represented by (1), is the direct current (DC) OPF problem [1], where the decision vector u_i might consist of the real powers and phase angle, whereas (1b) represents the DC approximation of the power flow equations. Note that, in the DC-OPF problem, M_i = N_i ∪ {i}.

Now, we consider that the following assumptions hold.

Assumption 1:
The function f_i : R^{n_i} → R, for each i ∈ N, is differentiable and strongly convex with strong convexity parameter denoted by σ_i.

Assumption 2:
The local set U_i, for each i ∈ N, is compact and convex.

Assumption 3:
The feasible set of Problem (1) is non-empty.

Assumptions 1 and 2 are rather restrictive; however, they are commonly used in the applications considered, i.e., OPF and resource allocation problems. Moreover, these assumptions allow us to apply Nesterov's accelerated gradient method to solve the dual problem of (1), as these assumptions result
Fig. 1. A small network of three agents: the set of coupling constraints and its graph representation. Notice that N_1 = {2} and M_1 = {1, 3}.

in a dual function with Lipschitz continuous gradient. This statement is elaborated further in Section IV. Furthermore, Assumption 3 is considered to ensure that the proposed algorithm can find a solution to Problem (1).

B. Stochastic communication networks
The aim of this work is to design a distributed optimization algorithm that solves Problem (1). As a distributed method, the algorithm requires each agent to communicate with other agents over a communication network, which we suppose to be time-varying. Precisely, the communication network is represented by the undirected graph G(k) = (N, L(k)), where L(k) ⊆ N × N denotes the set of communication links that may vary over iteration k, i.e., {i, j} ∈ L(k) implies that agents i and j can communicate at iteration k. Thus, we denote by E_i(k) the set of agents that can exchange information with agent i, i.e., E_i(k) = {j ∈ N : {i, j} ∈ L(k)}. Furthermore, we consider the activation of communication links as a random process, and the following assumption holds.

Assumption 4:
The set L(k) is a random variable that is independent and identically distributed across iterations. Furthermore, any communication link of neighboring agents is active with a positive probability denoted by β_{i,j}, i.e., P({i, j} ∈ L(k)) = β_{i,j} > 0, for {i, j} ∈ {{i′, j′} ∈ N × N : j′ ∈ N_{i′}, i′ ∈ N}. Additionally, β_{i,i} = 1, for all i ∈ N.

Assumption 4 implies that the probability that agent i can receive information from all of its in-neighbors j ∈ N_i at the same iteration k is positive. Let α_i denote this probability; thus, we have that α_i = Π_{j∈N_i} β_{i,j}.

III. PROPOSED METHOD
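The probability α_i introduced after Assumption 4 is simply the product of the individual link-activation probabilities, since the links are independent. A quick Monte-Carlo sketch confirms this formula; the β values below are hypothetical, chosen only for illustration:

```python
import random

# Monte-Carlo check of alpha_i = prod_{j in N_i} beta_{i,j}: one agent with
# three in-neighbors whose links activate i.i.d. at every iteration.
# (Hypothetical activation probabilities, not values from the paper.)
random.seed(1)
beta = [0.9, 0.8, 0.7]
alpha_exact = 1.0
for b in beta:
    alpha_exact *= b                 # 0.9 * 0.8 * 0.7 = 0.504

trials = 200_000
# One trial = one iteration; a "hit" means all three links are active at once
hits = sum(all(random.random() < b for b in beta) for _ in range(trials))
alpha_est = hits / trials
print(round(alpha_exact, 3), round(alpha_est, 3))  # both ≈ 0.504
```

The estimate matches the product formula; this is the event probability that drives the expected-value analysis in Section IV.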
In this section, we propose a distributed method to solve Problem (1) over stochastic communication networks. The proposed method actually solves the dual problem associated with (1) and is based on Nesterov's accelerated gradient approach [6], [7].

To that end, let λ_i ∈ R^{m_i} denote the Lagrange multiplier associated with (1b), for each i ∈ N, and let λ = col{λ_i, i ∈ N}. Thus, we define the dual function associated with (1), denoted by q(λ), as follows:

q(λ) = Σ_{i∈N} q_i(λ^i),   (2)

Algorithm 1 Distributed accelerated method
Initialization (for each i ∈ N): Set θ(1) = 1 and λ̂_i(1) = λ_i(0) = 0.
Iteration (for each i ∈ N, k ≥ 1):
1) Compute u_i(k):

u_i(k) = argmin_{u_i∈U_i} f_i(u_i) + Σ_{j∈M_i} ⟨G_ij^⊤ λ̂_j(k), u_i⟩   (5)

2) Send G_ij u_i(k) to the out-neighbors j ∈ M_i and receive G_ji u_j(k) from the in-neighbors j ∈ N_i
3) Compute λ_i(k):

λ_i(k) = λ̂_i(k) + η_i (G_ii u_i(k) + Σ_{j∈N_i} G_ji u_j(k) − g_i)   (6)

4) Compute θ(k+1) = (1 + √(1 + 4θ(k)²)) / 2
5) Compute λ̂_i(k+1):

λ̂_i(k+1) = λ_i(k) + ((θ(k) − 1)/θ(k+1)) (λ_i(k) − λ_i(k−1))   (7)

6) Send λ̂_i(k+1) to the in-neighbors j ∈ N_i and receive λ̂_j(k+1) from the out-neighbors j ∈ M_i

where

q_i(λ^i) = min_{u_i∈U_i} f_i(u_i) − ⟨λ_i, g_i⟩ + Σ_{j∈M_i} ⟨G_ij^⊤ λ_j, u_i⟩.   (3)

Note that λ^i denotes all the Lagrange multipliers associated with the coupling constraints that involve agent i, i.e., λ^i = col{λ_j, j ∈ M_i}. We will then solve the dual problem

maximize_λ q(λ),   (4)

by adapting Nesterov's accelerated gradient method such that it works over stochastically time-varying communication networks (c.f. Section II-B). Note that, due to Assumptions 1-3, strong duality holds [20, Proposition 5.2.1].

Hence, we first state the distributed method based on Nesterov's accelerated gradient approach without considering stochastic communication networks, i.e., assuming that the information required to perform the updates is always available. The method is shown in Algorithm 1. For a detailed design procedure of Nesterov's accelerated method, the reader might check [7], [8]. The main steps in the iteration of Nesterov's accelerated approach can be seen in Steps 4 and 5, where an interpolated point of each Lagrange multiplier λ_i (denoted by λ̂_i) is computed. As a distributed method, these steps are carried out by each agent. Furthermore, the step size of the gradient ascent in (6), denoted by η_i, is a local variable that must be chosen appropriately (c.f. Theorem 1). Finally, note that, in (5), u_i is updated by solving a local minimization derived from (3) and based on the interpolated points of the Lagrange multipliers from the out-neighbors,

Algorithm 2
Distributed accelerated method over stochastic networks
Initialization (for each i ∈ N): Set θ(1) = 1, λ_i(0) = 0, and ξ̂_ij(1) = ξ_ij(0) = 0, for all j ∈ M_i.
Iteration (for each i ∈ N, k ≥ 1), with a random realization of L(k):
1) Compute u_i(k):

u_i(k) = argmin_{u_i∈U_i} f_i(u_i) + Σ_{j∈M_i} ⟨G_ij^⊤ ξ̂_ij(k), u_i⟩   (8)

2) Send G_ij u_i(k) to the out-neighbors j ∈ E_i(k) ∩ M_i and receive G_ji u_j(k) from the in-neighbors j ∈ E_i(k) ∩ N_i
3) Compute λ_i(k):

λ_i(k) = { ξ̂_ii(k) + η_i (G_ii u_i(k) + Σ_{j∈N_i} G_ji u_j(k) − g_i),  if N_i ⊆ E_i(k),
         { ξ̂_ii(k),  otherwise   (9)

4) Send λ_i(k) to the in-neighbors j ∈ E_i(k) ∩ N_i and receive λ_j(k) from the out-neighbors j ∈ E_i(k) ∩ M_i
5) Update ξ_ij(k), for all j ∈ M_i:

ξ_ij(k) = { λ_j(k),  for j ∈ M_i ∩ E_i(k),
          { ξ̂_ij(k),  otherwise   (10)

6) Compute θ(k+1) = (1 + √(1 + 4θ(k)²)) / 2
7) Compute ξ̂_ij(k+1), for all j ∈ M_i:

ξ̂_ij(k+1) = ξ_ij(k) + ((θ(k) − 1)/θ(k+1)) (ξ_ij(k) − ξ_ij(k−1))   (11)

i.e., λ̂^i = col{λ̂_j, j ∈ M_i}. Due to Assumptions 1 and 2, the local minimization in Step 1 admits a unique solution.

Now, we are ready to state the proposed method, which works over stochastic communication networks. The method is shown in Algorithm 2. We adjust the gradient step update (Step 3) in order to take into account the time-varying nature of the communication network. As can be seen in Step 3, λ_i is only updated with the gradient step when agent i receives new information from all in-neighbors in N_i. Furthermore, the required Lagrange multipliers from the other agents j ∈ M_i are tracked by agent i using the auxiliary vector ξ_i = col{ξ_ij, j ∈ M_i}, where each ξ_ij is updated in (10). Additionally, each agent i must compute the interpolated point of ξ_ij, denoted by ξ̂_ij, in (11). This step is different from the corresponding step in Algorithm 1, where the exchanged information is actually the interpolated point λ̂_i.

The outcome of Algorithm 2, which is the main result of this work, is stated as the following theorem.

Theorem 1:
Let Assumptions 1-4 hold and let the sequence λ(k) be generated by Algorithm 2 with η_i ∈ (0, 1/L_i], where L_i is defined as follows:

L_i = Σ_{j∈N_i∪{i}} ‖G_j‖² / σ_j,   (12)

in which G_j = col{G_jl, l ∈ M_j} and σ_j is the strong convexity constant of f_j(u_j). Furthermore, let q(λ) be defined by (2) and let λ⋆ be an optimal solution of the dual problem (4). Then:

1) It holds that

E(q(λ⋆) − q(λ(k))) ≤ C / (k+1)²,   (13)

where C is a non-negative constant.

2) Hence, it also holds that

lim_{k→∞} E(q(λ⋆) − q(λ(k))) = 0,   (14)

almost surely.

Theorem 1 shows that the expected dual values converge to the optimal dual value with the rate of O(1/k²). Furthermore, the choice of the parameter η_i, for each agent i ∈ N, which is sufficient to achieve convergence, can be obtained locally, i.e., agent i only requires some information from its in-neighbors in N_i (see (12)).

IV. CONVERGENCE ANALYSIS
First, Section IV-A provides some preliminary results, which become the building blocks to prove Theorem 1. Then, the proof of Theorem 1 is given in Section IV-B.
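Before the analysis, the update pattern of Algorithm 2 can be made concrete on a toy instance. The sketch below is a minimal illustration under assumed problem data (two agents, scalar decisions, quadratic costs f_i(u_i) = (u_i − c_i)², box sets U_i = [−1, 1], and coupling constraints u_1 + u_2 = 1 and u_1 − u_2 = 0, whose unique feasible point is (0.5, 0.5)); it is not the paper's implementation or benchmark:

```python
import numpy as np

# Toy instance (hypothetical): two agents with scalar decisions,
# f_i(u_i) = (u_i - c_i)^2 (so sigma_i = 2), U_i = [-1, 1], and the
# constraints u_1 + u_2 = 1 (agent 1) and u_1 - u_2 = 0 (agent 2).
rng = np.random.default_rng(0)
c = [1.0, 0.0]                       # cost centers
G = np.array([[1.0, 1.0],            # G[i, j]: coefficient of u_j in constraint i
              [1.0, -1.0]])
g = np.array([1.0, 0.0])
sigma = 2.0
# Step size: eta_i in (0, 1/L_i], L_i = sum_{j in N_i ∪ {i}} ||G_j||^2 / sigma_j
L_i = sum(np.linalg.norm(G[:, j]) ** 2 / sigma for j in (0, 1))   # = 2 here
eta = 1.0 / L_i
gamma = 0.2                          # link-failure probability (Assumption 4)

theta = 1.0
xi_hat = np.zeros((2, 2))            # xi_hat[i, j]: agent i's interpolated copy
xi = np.zeros((2, 2))                # xi[i, j]: agent i's copy of lambda_j

for k in range(3000):
    link_up = rng.random() > gamma   # the single link is i.i.d. active
    # Step 1: local minimization (closed form for quadratic cost + box set)
    u = [float(np.clip(c[i] - (G[0, i] * xi_hat[i, 0] + G[1, i] * xi_hat[i, 1])
                       / sigma, -1.0, 1.0)) for i in (0, 1)]
    # Steps 2-3: gradient step only if all in-neighbors were heard (eq. (9))
    lam = np.empty(2)
    for i in (0, 1):
        if link_up:
            lam[i] = xi_hat[i, i] + eta * (G[i, 0] * u[0] + G[i, 1] * u[1] - g[i])
        else:
            lam[i] = xi_hat[i, i]
    # Steps 4-5: track the neighbors' multipliers over active links (eq. (10))
    xi_new = np.empty((2, 2))
    for i in (0, 1):
        for j in (0, 1):
            xi_new[i, j] = lam[j] if (j == i or link_up) else xi_hat[i, j]
    # Steps 6-7: momentum (interpolation) step (eq. (11))
    theta_next = (1.0 + np.sqrt(1.0 + 4.0 * theta ** 2)) / 2.0
    xi_hat = xi_new + (theta - 1.0) / theta_next * (xi_new - xi)
    xi, theta = xi_new, theta_next

print(u)  # both entries ≈ 0.5, the unique feasible (hence optimal) point
```

Setting γ = 0 and θ(k) ≡ 1 in the same loop recovers the unaccelerated dual ascent that Section V compares against.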
A. Preliminary results
First, we show that the local dual function q_i(λ^i), for any i ∈ N, is a Lipschitz smooth function.

Lemma 1:
Let Assumptions 1-3 hold. The local dual function q_i(λ^i) defined in (3) is Lipschitz smooth with Lipschitz constant ‖G_i‖²/σ_i.

Proof:
Recall the definition of q_i(λ^i) in (3) and let

u_i(λ^i) = argmin_{u_i∈U_i} { f_i(u_i) + Σ_{j∈M_i} ⟨G_ij^⊤ λ_j, u_i⟩ },
v_i(μ^i) = argmin_{u_i∈U_i} { f_i(u_i) + Σ_{j∈M_i} ⟨G_ij^⊤ μ_j, u_i⟩ }.

Since u_i(λ^i), v_i(μ^i) ∈ U_i, the optimality conditions [21] of the preceding minimizations yield the following inequalities:

0 ≤ ⟨∇f_i(u_i(λ^i)) + G_i^⊤ λ^i, v_i(μ^i) − u_i(λ^i)⟩,   (15)
0 ≤ ⟨∇f_i(v_i(μ^i)) + G_i^⊤ μ^i, u_i(λ^i) − v_i(μ^i)⟩.   (16)

Combining (15) and (16) gives

0 ≤ ⟨∇f_i(u_i(λ^i)) − ∇f_i(v_i(μ^i)), v_i(μ^i) − u_i(λ^i)⟩ + ⟨G_i^⊤(λ^i − μ^i), v_i(μ^i) − u_i(λ^i)⟩
  ≤ −σ_i ‖v_i(μ^i) − u_i(λ^i)‖² + ⟨λ^i − μ^i, G_i (v_i(μ^i) − u_i(λ^i))⟩,   (17)

where the second inequality is obtained since f_i(·) is strongly convex (c.f. Property 1). Furthermore, the strong convexity of f_i(·) also implies that u_i(λ^i) is unique and q_i(λ^i) is differentiable, with ∇q_i(λ^i) = G_i u_i(λ^i) − g̃_i, where g̃_i = col{g̃_ij, j ∈ M_i}, with g̃_ij = 0_{m_j} if j ≠ i and g̃_ii = g_i. Thus, ∇q_i(μ^i) − ∇q_i(λ^i) = G_i (v_i(μ^i) − u_i(λ^i)). Using [8, Lemma 1.1], we obtain that

‖∇q_i(μ^i) − ∇q_i(λ^i)‖ ≤ ‖G_i‖ ‖v_i(μ^i) − u_i(λ^i)‖.   (18)

By adding ⟨λ^i − μ^i, g̃_i − g̃_i⟩ = 0 to the right-hand side of (17), rearranging (17), and using (18) together with the facts that G_i v_i(μ^i) − g̃_i = ∇q_i(μ^i) and G_i u_i(λ^i) − g̃_i = ∇q_i(λ^i), we obtain that

(σ_i / ‖G_i‖²) ‖∇q_i(μ^i) − ∇q_i(λ^i)‖² ≤ ⟨λ^i − μ^i, ∇q_i(μ^i) − ∇q_i(λ^i)⟩ ≤ ‖μ^i − λ^i‖ ‖∇q_i(μ^i) − ∇q_i(λ^i)‖,

where the second inequality is obtained using the Cauchy-Schwarz inequality. Thus, we have that

‖∇q_i(μ^i) − ∇q_i(λ^i)‖ ≤ (‖G_i‖²/σ_i) ‖μ^i − λ^i‖,

showing that q_i(·) is Lipschitz smooth with Lipschitz constant ‖G_i‖²/σ_i (c.f. Property 2). ∎

Remark 1:
The Lipschitz constant of q_i(·) can be computed locally by each agent i ∈ N since G_i and the parameter σ_i are local information.

Lemma 2:
Let Assumptions 1-3 hold. For any μ, λ ∈ R^{Σ_{i∈N} m_i}, it holds that

q(λ) ≥ q(μ) + ⟨λ − μ, ∇q(μ)⟩ − Σ_{i∈N} (L_i/2) ‖λ_i − μ_i‖²,   (19)

where L_i, for each i ∈ N, is defined in (12).

Proof:
Since q_i(λ^i) is concave and has a Lipschitz continuous gradient (Lemma 1), it follows from [22] that

q_i(λ^i) ≥ q_i(μ^i) + ⟨λ^i − μ^i, ∇q_i(μ^i)⟩ − (‖G_i‖²/(2σ_i)) ‖λ^i − μ^i‖².   (20)

The desired inequality follows by summing (20) over i ∈ N. ∎

The Lipschitz smoothness property of the dual function (Lemma 2) is sufficient to show the inequality (22) stated in Lemma 3, which will become the key to proving Theorem 1. Note that Lemma 3 is similar to [7, Lemma 4.1] and [9, Lemma 5], although, differently from these references, the step size η_i in (6) does not need to be the inverse of the Lipschitz constant of the (dual) function.

Lemma 3:
Let Assumptions 1-3 hold and let the sequence {θ(k), u_i(k), λ_i(k), λ̂_i(k), ∀i ∈ N} be generated by Algorithm 1, with η_i ∈ (0, 1/L_i], where L_i is defined by (12). Furthermore, let λ⋆ = col{λ⋆_i, i ∈ N} be an optimal solution of the dual problem (4) and define ω_i(k) by

ω_i(k) = θ(k) λ_i(k) − (θ(k) − 1) λ_i(k−1) − λ⋆_i,   (21)

for each i ∈ N. Then, it holds that

Σ_{i∈N} (1/η_i) (‖ω_i(k+1)‖² − ‖ω_i(k)‖²)
  ≤ 2θ(k)² (q(λ⋆) − q(λ(k))) − 2θ(k+1)² (q(λ⋆) − q(λ(k+1))).   (22)

Proof: See Appendix A.

B. Proof of Theorem 1
Recall that α_i is the probability that the communication links between agent i and all of its in-neighbors j ∈ N_i are active, i.e., α_i = Π_{j∈N_i} β_{i,j}, and introduce the following function V(k):

V(k) = Σ_{i∈N} (1/(α_i η_i)) ‖ω_i(k)‖²,   (23)

where ω_i(k) is defined in (21).

To show the convergence, first we evaluate the sequence {E(V(k))}. To this end, define F(k) as the filtration up to and including iteration k, i.e., F(k) = {L(ℓ), λ(ℓ), ξ(ℓ), ℓ = 0, 1, 2, . . . , k}, where ξ(k) = col{ξ_i(k), i ∈ N}. Based on (9), λ_i(k), for each i ∈ N, is updated with the gradient ascent rule only when all the in-neighbors of agent i in N_i send new information to agent i. Otherwise, λ_i(k) = ξ̂_ii(k). Therefore, if N_i ⊆ E_i(k+1), then ω_i(k+1) is computed using λ_i(k+1) updated with the gradient ascent step. Otherwise, since λ_i(k+1) = ξ̂_ii(k+1) (c.f. (9)), we have that

ω_i(k+1) = θ(k+1) ξ̂_ii(k+1) − (θ(k+1) − 1) λ_i(k) − λ⋆_i
         = θ(k+1) λ_i(k) + (θ(k) − 1)(λ_i(k) − λ_i(k−1)) − (θ(k+1) − 1) λ_i(k) − λ⋆_i
         = ω_i(k),

where the second equality is obtained by using (11) and since λ_i(k) = ξ_ii(k), for any k ≥ 1, due to (10) and the proper initialization in Algorithm 2.

Thus, we can see that ω_i(k+1) is updated with probability α_i and remains the same, i.e., ω_i(k+1) = ω_i(k), with probability 1 − α_i. Based on this fact, we obtain, with probability 1, that

E(V(k+1) − V(k) | F(k))
 = Σ_{i∈N} (1/(α_i η_i)) ( α_i ‖ω_i(k+1)‖² + (1 − α_i) ‖ω_i(k)‖² − ‖ω_i(k)‖² )
 = Σ_{i∈N} (1/η_i) (‖ω_i(k+1)‖² − ‖ω_i(k)‖²)
 ≤ 2θ(k)² (q(λ⋆) − q(λ(k))) − 2θ(k+1)² (q(λ⋆) − q(λ(k+1))),   (24)

where, in these expressions, ω_i(k+1) denotes the value obtained with the gradient ascent update, and the inequality is obtained based on (22) in Lemma 3. Iterating (24), for ℓ = 1, 2, . . . , k−1, and taking the total
, k − , and taking the total G G G GG1 2 3456 7 89101112 13 14
Fig. 2. The IEEE 14-bus network.

expectation, we have that

E( Σ_{ℓ=1}^{k−1} (V(ℓ+1) − V(ℓ)) ) ≤ E( Σ_{ℓ=1}^{k−1} [ 2θ(ℓ)² (q(λ⋆) − q(λ(ℓ))) − 2θ(ℓ+1)² (q(λ⋆) − q(λ(ℓ+1))) ] )
⇔ E(V(k) − V(1)) ≤ 2θ(1)² E(q(λ⋆) − q(λ(1))) − 2E(θ(k)² (q(λ⋆) − q(λ(k)))).   (25)

Rearranging the inequality in (25) yields

E(θ(k)² (q(λ⋆) − q(λ(k)))) ≤ (1/2) E(V(1) − V(k)) + θ(1)² E(q(λ⋆) − q(λ(1)))
                           ≤ E(V(1) + q(λ⋆) − q(λ(1))),   (26)

where the second inequality is obtained since θ(1) = 1 and by dropping −E(V(k)), since it is non-positive for any k ≥ 1. Finally, note that θ(k) is not random and it holds that θ(k) ≥ (k+1)/2, since θ(1) = 1 and θ(k) is updated using the equation in Step 6 of Algorithm 2 [7]. Using this fact and (26), the desired inequality (13) follows, with C = 4E(V(1) + q(λ⋆) − q(λ(1))) ≥ 0, since E(V(k)) ≥ 0, for any k ≥ 1, and q(λ⋆) = max_λ q(λ), thus E(q(λ⋆) − q(λ(1))) ≥ 0.

Upon obtaining (13), we can show the limit (14). Since C in (13) is non-negative, the term E(q(λ⋆) − q(λ(k))) converges to 0. Furthermore, using the Markov inequality, for any δ ∈ R_{>0}, we have that

lim sup_{k→∞} P(q(λ⋆) − q(λ(k)) ≥ δ) ≤ lim sup_{k→∞} (1/δ) E(q(λ⋆) − q(λ(k))) = 0;

thus, lim_{k→∞} E(q(λ⋆) − q(λ(k))) = 0, almost surely. ∎
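The two facts about θ(k) used in this proof, the lower bound θ(k) ≥ (k+1)/2 and the identity θ(k+1)² − θ(k+1) = θ(k)² implied by Step 6 of Algorithm 2, can be checked numerically; a small sketch:

```python
import math

# Check theta(k) >= (k+1)/2 and the identity theta(k+1)^2 - theta(k+1) = theta(k)^2
# for the sequence theta(1) = 1, theta(k+1) = (1 + sqrt(1 + 4*theta(k)^2)) / 2.
theta = 1.0
for k in range(1, 5001):
    assert theta >= (k + 1) / 2.0
    theta_next = (1.0 + math.sqrt(1.0 + 4.0 * theta ** 2)) / 2.0
    assert abs(theta_next ** 2 - theta_next - theta ** 2) < 1e-6
    theta = theta_next
print("both properties hold for k = 1, ..., 5000")
```

The lower bound also follows by induction, since θ(k+1) ≥ (1 + 2θ(k))/2 = θ(k) + 1/2.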
V. NUMERICAL STUDY
Fig. 3. Convergence of ‖∇q(λ(k))‖ (top) and q(λ(k)) − q⋆ (bottom).

We use the IEEE 14-bus benchmark case, which is shown in Figure 2, as the test case in this simulation study, where we solve an intra-day DC-OPF problem with a time horizon (h) of 6 hourly steps. We suppose that each bus is an agent in the network, though there are only five active agents, which have the capability of generating power, bounded by the capacity of the generators. Furthermore, we consider the DC approximation of the power flow equations, as follows:

P^g_{i,t} − P^l_{i,t} = Σ_{j∈N_i} B_{i,j} (ψ_{i,t} − ψ_{j,t}),  ∀i ∈ N, t = 1, . . . , h,   (27)

where P^g_{i,t} ∈ R_{≥0} denotes the power generated at bus i at time step t, P^l_{i,t} ∈ R_{≥0} denotes the power demand, assumed to be known for the whole time horizon, B_{i,j} denotes the susceptance of line {i, j}, and ψ_{i,t} denotes the phase angle of bus i. The equalities in (27) become the coupling constraints of the network. In this problem, we compute the hourly set points of each generator for the whole time horizon. Additionally, we consider a strongly convex quadratic local cost.

We suppose that the communication links among the agents may fail with a certain probability, denoted by γ > 0. This implies that the activation probability of each communication link is equal, i.e., β_{i,j} = 1 − γ, for each i, j ∈ N with i ≠ j, and we perform 10 Monte-Carlo simulations for different values of γ. Moreover, we also compare Algorithm 2 with the unaccelerated version, where θ(k) = 1 and γ = 0, for all k ≥ 1. Figure 3 shows the convergence of the coupling constraint residual ‖∇q(λ(k))‖ toward 0 and of the dual value q(λ(k)) toward the optimal value q⋆. Additionally, Figure 4 shows the number of iterations required to meet the stopping criterion, which is the error of the equality constraints, i.e., ‖G_ii u_i(k) + Σ_{j∈N_i} G_ji u_j(k) − g_i‖ < ε, for a small ε > 0.
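The DC power-flow equalities (27) are exactly the template of the coupling constraints (1b). The sketch below builds and checks the bus-balance equation on a hypothetical 3-bus example (illustrative susceptances, angles, and loads, not the IEEE 14-bus data):

```python
import numpy as np

# DC power balance (27) at each bus i:  Pg_i - Pl_i = sum_j B_ij (psi_i - psi_j).
B = {(0, 1): 5.0, (1, 2): 4.0, (0, 2): 2.0}        # line susceptances (p.u.)
B.update({(j, i): b for (i, j), b in list(B.items())})  # symmetric lines
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

def balance_residual(i, Pg, Pl, psi):
    """Residual of bus i's DC power-balance equation (27) at one time step."""
    flow = sum(B[i, j] * (psi[i] - psi[j]) for j in neighbors[i])
    return Pg[i] - Pl[i] - flow

psi = np.array([0.0, -0.02, -0.05])                # phase angles (rad)
Pl = np.array([0.0, 0.1, 0.2])                     # known demands
# Generations chosen so that the balance holds exactly at this operating point
Pg = np.array([Pl[i] + sum(B[i, j] * (psi[i] - psi[j]) for j in neighbors[i])
               for i in range(3)])
ok = all(abs(balance_residual(i, Pg, Pl, psi)) < 1e-9 for i in range(3))
print(ok)  # → True
```

In the template of (1b), u_i collects (P^g_{i,t}, ψ_{i,t}) and the susceptance terms involving the neighboring angles ψ_{j,t} form the off-diagonal blocks G_ji; since the line flows are antisymmetric, the generations and demands also balance network-wide.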
As expected, Algorithm 2 significantly outperforms the unaccelerated version, and the smaller γ is, the faster the convergence.

VI. CONCLUSION
In this paper, we propose a distributed algorithm for multi-agent optimization problems over stochastic networks. The

Fig. 4. The number of iterations performed for different values of γ. The blue boxes indicate the 25th-75th percentiles, the red lines indicate the medians, and the + symbols indicate the outliers.

algorithm is based on Nesterov's accelerated gradient method, and we analytically show that the convergence rate of the expected dual value is O(1/k²). We also show the performance of the algorithm in an intra-day optimal power flow simulation. As ongoing work, we are performing an analysis of the convergence of the primal variables. Moreover, we are investigating methods to relax the assumptions considered, in order to generalize the approach.

APPENDIX
A. Proof of Lemma 3
To show Lemma 3, we can follow the approach used in the proof of [7, Lemma 2.3]. Therefore, first we need the following intermediate result.
Lemma 4:
Let ψ(μ, ξ) be a quadratic approximation model of q(μ), i.e.,

ψ(μ, ξ) = q(ξ) + ⟨μ − ξ, ∇q(ξ)⟩ − Σ_{i∈N} (1/(2η_i)) ‖μ_i − ξ_i‖²,   (28)

and let λ(ξ) be defined by λ(ξ) = argmax_μ ψ(μ, ξ). Furthermore, let Assumptions 1-3 hold and η_i ∈ (0, 1/L_i], where L_i is defined by (12). Then, for any μ ∈ R^{Σ_{i∈N} m_i},

q(λ(ξ)) − q(μ) ≥ Σ_{i∈N} (1/η_i) ⟨ξ_i − μ_i, λ_i(ξ) − ξ_i⟩ + Σ_{i∈N} (1/(2η_i)) ‖λ_i(ξ) − ξ_i‖².   (29)

Proof:
Since η_i ∈ (0, 1/L_i], it follows from Lemma 2 that q(λ(ξ)) ≥ ψ(λ(ξ), ξ). Thus,

q(λ(ξ)) − q(μ) ≥ ψ(λ(ξ), ξ) − q(μ).

Since q(·) is concave, we also have that

q(μ) ≤ q(ξ) + ⟨μ − ξ, ∇q(ξ)⟩.

The desired inequality (29) is obtained by combining the two preceding relations with the definition of ψ(λ(ξ), ξ) in (28). ∎

Remark 2: The update λ(k) in (6) follows λ(k) = argmax_μ ψ(μ, λ̂(k)), which admits a unique solution.

Next, [9, Lemma 4] shows that

ω_i(k+1) = ω_i(k) + θ(k+1) (λ_i(k+1) − λ̂_i(k+1)).

Based on this relation, we obtain that

‖ω_i(k+1)‖² − ‖ω_i(k)‖²
 = ‖ω_i(k) + θ(k+1)(λ_i(k+1) − λ̂_i(k+1))‖² − ‖ω_i(k)‖²
 = 2θ(k+1)(θ(k+1) − 1) ⟨λ_i(k+1) − λ̂_i(k+1), λ̂_i(k+1) − λ_i(k)⟩
   + (θ(k+1)² − θ(k+1)) ‖λ_i(k+1) − λ̂_i(k+1)‖²
   + θ(k+1) ‖λ_i(k+1) − λ̂_i(k+1)‖²
   + 2θ(k+1) ⟨λ_i(k+1) − λ̂_i(k+1), λ̂_i(k+1) − λ⋆_i⟩,

where the second equality is obtained by performing some algebraic manipulations using (21) and (7). Multiplying the above equality by 1/η_i and summing over i ∈ N, we obtain that

Σ_{i∈N} (1/η_i) (‖ω_i(k+1)‖² − ‖ω_i(k)‖²)
 = 2(θ(k+1)² − θ(k+1)) Σ_{i∈N} ( (1/η_i) ⟨λ_i(k+1) − λ̂_i(k+1), λ̂_i(k+1) − λ_i(k)⟩ + (1/(2η_i)) ‖λ_i(k+1) − λ̂_i(k+1)‖² )
 + 2θ(k+1) Σ_{i∈N} ( (1/(2η_i)) ‖λ_i(k+1) − λ̂_i(k+1)‖² + (1/η_i) ⟨λ_i(k+1) − λ̂_i(k+1), λ̂_i(k+1) − λ⋆_i⟩ ).
By applying the inequality (29) twice, with (ξ, μ) = (λ̂(k+1), λ(k)) and (ξ, μ) = (λ̂(k+1), λ⋆), noting that λ(λ̂(k+1)) = λ(k+1) by Remark 2, to bound the two summations, we obtain the desired inequality, as follows:

Σ_{i∈N} (1/η_i) (‖ω_i(k+1)‖² − ‖ω_i(k)‖²)
 ≤ 2(θ(k+1)² − θ(k+1)) (q(λ(k+1)) − q(λ(k))) + 2θ(k+1) (q(λ(k+1)) − q(λ⋆))
 = 2θ(k+1)² q(λ(k+1)) − 2(θ(k+1)² − θ(k+1)) q(λ(k)) − 2θ(k+1) q(λ⋆)
 = 2θ(k+1)² q(λ(k+1)) − 2θ(k)² q(λ(k)) + 2(θ(k)² − θ(k+1)²) q(λ⋆)
 = 2θ(k)² (q(λ⋆) − q(λ(k))) − 2θ(k+1)² (q(λ⋆) − q(λ(k+1))),

where the second equality is obtained based on Step 4 of Algorithm 1, which gives θ(k+1)² − θ(k+1) − θ(k)² = 0. ∎

REFERENCES

[1] D. K. Molzahn, F. Dörfler, H. Sandberg, S. H. Low, S. Chakrabarti, R. Baldick, and J. Lavaei, "A survey of distributed optimization and control algorithms for electric power systems,"
IEEE Transactions on Smart Grid, vol. 8, no. 6, pp. 2941–2962, 2017.
[2] P. Yi, Y. Hong, and F. Liu, "Initialization-free distributed algorithms for optimal resource allocation with feasibility constraints and application to economic dispatch of power systems," Automatica, vol. 74, pp. 259–269, 2016.
[3] J. M. Grosso, C. Ocampo-Martinez, and V. Puig, "A distributed predictive control approach for periodic flow-based networks: application to drinking water systems," International Journal of Systems Science, vol. 48, no. 14, pp. 3106–3117, 2017.
[4] L. Xiao and S. Boyd, "Optimal scaling of a gradient method for distributed resource allocation," Journal of Optimization Theory and Applications, vol. 129, pp. 469–488, 2006.
[5] D. P. Bertsekas, "Extended monotropic programming and duality," Journal of Optimization Theory and Applications, vol. 139, pp. 209–225, 2008.
[6] Y. Nesterov, "A method for solving the convex programming problem with convergence rate O(1/k²)," Dokl. Akad. Nauk SSSR, vol. 27, pp. 543–547, 1983, translated as Sov. Math. Dokl.
[7] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
[8] A. Beck, A. Nedić, A. Ozdaglar, and M. Teboulle, "An O(1/k) gradient method for network resource allocation problems," IEEE Transactions on Control of Network Systems, vol. 1, no. 1, pp. 64–73, 2014.
[9] T. Goldstein, B. O'Donoghue, S. Setzer, and R. Baraniuk, "Fast alternating direction optimization methods," SIAM Journal on Imaging Sciences, vol. 7, no. 3, pp. 1588–1623, 2014.
[10] X. Zhou, C. Li, T. Huang, and M. Xiao, "Fast gradient-based distributed optimisation approach for model predictive control and application in four-tank benchmark," IET Control Theory & Applications, vol. 9, no. 10, pp. 1579–1586, 2015.
[11] E. Wei and A. Ozdaglar, "On the O(1/k) convergence of asynchronous distributed alternating direction method of multipliers," pp. 1–30, 2013, arXiv:1307.8254.
[12] T. Chang, M. Hong, W. Liao, and X. Wang, "Asynchronous distributed ADMM for large-scale optimization, Part I: algorithm and convergence analysis," IEEE Transactions on Signal Processing, vol. 64, no. 12, pp. 3118–3130, 2016.
[13] M. Hong and T. Chang, "Stochastic proximal gradient consensus over random networks," IEEE Transactions on Signal Processing, vol. 65, no. 11, pp. 2933–2948, 2017.
[14] W. Ananduta, A. Nedić, and C. Ocampo-Martinez, "Distributed augmented Lagrangian method for link-based resource sharing problems of multi-agent systems," IEEE Transactions on Automatic Control, submitted.
[15] D. Jakovetić, J. M. F. Xavier, and J. M. F. Moura, "Convergence rates of distributed Nesterov-like gradient methods on random networks," IEEE Transactions on Signal Processing, vol. 62, no. 4, pp. 868–882, 2014.
[16] O. Fercoq and P. Richtárik, "Accelerated, parallel, and proximal coordinate descent," SIAM Journal on Optimization, vol. 25, no. 4, pp. 1997–2023, 2015.
[17] A. Nedić and A. Olshevsky, "Distributed optimization over time-varying directed graphs," IEEE Transactions on Automatic Control, vol. 60, no. 3, pp. 601–615, 2015.
[18] C. A. Uribe, S. Lee, A. Gasnikov, and A. Nedić, "A dual approach for optimal algorithms in distributed optimization over networks," Optimization Methods and Software, pp. 1–40, 2020.
[19] G. Scutari and Y. Sun, "Distributed nonconvex constrained optimization over time-varying digraphs," Mathematical Programming, vol. 176, pp. 497–544, 2019.
[20] D. Bertsekas, Nonlinear Programming. Athena Scientific, 1995.
[21] A. Nedić,