Dependable Distributed Nonconvex Optimization via Polynomial Approximation
Zhiyu He, Jianping He, Cailian Chen and Xinping Guan
Abstract—There has been work on exploiting polynomial approximation to solve distributed nonconvex optimization problems. This idea facilitates arbitrarily precise global optimization without requiring local evaluations of gradients at every iteration. Nonetheless, there remains a gap between existing theoretical guarantees and diverse practical requirements for dependability, including privacy preservation and robustness to network imperfections (e.g., time-varying directed communication, asynchrony and packet drops). To fill this gap and keep the above strengths, we propose a Dependable Chebyshev-Proxy-based distributed Optimization Algorithm (D-CPOA). Specifically, to ensure both accuracy of solutions and privacy preservation of local objective functions, a new privacy-preserving mechanism is designed. This mechanism leverages the randomness in block-wise insertions of perturbed data and separate subtractions of added noises, and its effects are thoroughly analyzed through $(\alpha,\beta)$-data-privacy. In addition, to gain robustness to various network imperfections, we use the push-sum consensus protocol as a backbone, discuss its specific enhancements, and evaluate the performance of the proposed algorithm accordingly. Thanks to the linear consensus-based structure of the iterations, we avoid the privacy-accuracy trade-off and the need to select appropriate step-sizes in different settings. We provide rigorous treatments of the accuracy, dependability and complexity. It is shown that the advantages brought by the idea of polynomial approximation are fully maintained even when all the above challenging requirements are present. Simulations demonstrate the efficacy of the developed algorithm.
Index Terms—Distributed optimization, Chebyshev polynomial approximation, dependability, privacy preservation, $(\alpha,\beta)$-data-privacy, robustness.

I. INTRODUCTION
Distributed optimization enables multiple agents in a network to agree on the optimal points of the average of local objective functions. This global aim is achieved by exploiting local computations and communication between neighboring agents. Such a distributed architecture is highly preferable in a variety of applications related to large-scale networked systems, e.g., distributed learning [2], energy management [3] and resource allocation [4]. In these applications, the needs of improving efficiency, scalability and robustness as well as protecting privacy have motivated the development of distributed strategies, which serve as plausible alternatives to their centralized counterparts.
The authors are with the Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China. Emails: {hzy970920, jphe, cailianchen, xpguan}@sjtu.edu.cn. This paper was presented in part at the 59th IEEE Conference on Decision and Control, Republic of Korea, December 2020 [1].
A. Motivations
Considerable effort has been devoted to designing efficient distributed optimization algorithms, e.g., [5]–[8], and to extending them to meet diverse practical requirements, including privacy preservation [4], [9], [10], time-varying and directed communication [11]–[13], asynchronous computations that tolerate lack of coordination [13], [14], and delays and packet drops [15], [16]. Most of these extensions focus on convex problems, and some critical and complex issues, including the privacy-accuracy trade-off [10], optimal numbers of iterations [4] and bounds for constant step-sizes [13], [16], have been explored.

Recently, [17] proposed a promising algorithm termed CPCA to address a class of constrained distributed nonconvex optimization problems. The core idea is to first use polynomial approximations (i.e., proxies) to substitute for general local objective functions, then employ consensus protocols, in which the coefficients of these proxies are exchanged, to enable agents to acquire a global proxy, and finally solve an easier approximate version of the original problem locally. The novel idea of employing polynomial approximation helps to achieve arbitrarily precise global optimization without demanding local evaluations of gradients or objective values at every iteration. More importantly, it separates this algorithm from typical gradient-based methods and offers a new perspective on distributed optimization problems.

Nevertheless, this algorithm is neither inherently privacy-preserving nor robust against various imperfections in network communication, and it is unclear whether the aforementioned advantages can be maintained when privacy and robustness are taken into account. First, it can easily cause the leakage of private information of local objective functions.
The leakage results from its consensus-based iterations, where the vectors of coefficients of local proxies are directly exchanged. Once the adversaries obtain the exact initial vector of a target agent, they can recover a fairly accurate estimate of the corresponding local objective. Hence, how to effectively preserve the privacy of objective functions in this algorithm, and how to quantify the resulting protection, are well worth consideration. Second, it only handles optimization over static undirected graphs with perfect communication. Given that issues including time-varying and directed links, lack of coordination, transmission delays and packet drops are common in applications, it is meaningful to investigate their effects on the performance of this algorithm and to find effective measures to make it more robust. The above issues have motivated this work. We aim to demonstrate that the novel idea of introducing polynomial approximation into distributed optimization not only allows for further enhancements to meet diverse practical needs, but also maintains notable performance advantages in these settings.
B. Contributions
In this paper, we exploit the idea of polynomial approximation and develop a Dependable Chebyshev-Proxy-based distributed Optimization Algorithm (D-CPOA), considering the typical needs of privacy preservation and robustness to various network imperfections, including time-varying and directed communication and asynchrony due to lack of coordination, delays or packet drops.

We first focus on the requirement of preserving the privacy of local objective functions. This requirement reduces to keeping the initial vectors secret in the consensus-based iterations of CPCA. These vectors store the coefficients of local approximations and are of different lengths. Instead of simply extending existing methods designed for the scalar case (e.g., [18]–[21]), we exploit the feature that vector states can be partitioned and propose a new privacy-preserving mechanism for consensus-based iterations. The key idea is to append initial vectors perturbed by noises block by block to the current states, and then to remove the influence of the perturbations at several separate iterations afterward. The randomness in the appending actions helps to hide useful perturbed initial values within the iterations, and the separate subtractions of noises ensure exact convergence while mitigating the negative impact of persistent noises on the convergence rate [20].
These designs contribute to the effective preservation of privacy, whose degree is properly analyzed through $(\alpha,\beta)$-data-privacy [22]. We avoid the trade-off between privacy and accuracy and consider a more general problem with nonconvex objectives, in sharp contrast with existing differentially private distributed convex optimization algorithms [4], [9], [10].

To gain robustness against various imperfections in network communication, we employ the push-sum average consensus protocol [23] as the backbone of the iterations to handle time-varying and directed graphs, and then discuss its asynchronous extensions to cope with issues including lack of coordination, delays and packet drops. We analyze in detail the relationship between the accuracy of consensus and that of the obtained solutions, thus verifying that the proposed algorithm remains effective when the above network imperfections exist. Since the iterations of the developed algorithm are linear and consensus-based, we are free from the problem of selecting appropriate step-sizes in different settings, which is a troublesome routine of typical gradient-based methods.

Preliminary results on addressing time-varying and directed communication and some computational issues were presented in [1]. In this paper, we extend the analysis by i) further fulfilling the requirement of privacy preservation, ii) offering more details on the design and analysis of the strategies to deal with various network imperfections, and iii) adding omitted proofs relating to the analysis of the developed algorithm. The main contributions are summarized as follows.

1) We propose D-CPOA to solve a class of constrained distributed nonconvex optimization problems, pursuing the guarantee of privacy preservation and robustness against various network imperfections.
We demonstrate that it maintains the advantages of CPCA in being able to obtain $\epsilon$-globally-optimal solutions for any arbitrarily small given precision $\epsilon$ and in being distributedly terminable.

2) A new privacy-preserving mechanism is introduced into the proposed algorithm to prevent sensitive information of local objective functions from being leaked. This mechanism is tailored to the setting where local objective functions are represented as vectors of different lengths, and it exploits the randomness in block-by-block insertions of perturbed data and separate subtractions of added noises to achieve both effective preservation of privacy and exact convergence. We thoroughly analyze the privacy degree through $(\alpha,\beta)$-data-privacy.

3) We address the robustness of the proposed algorithm in the face of various imperfections in network communication. The iterations of D-CPOA are based on the push-sum consensus protocol, which functions well over time-varying and directed networks and can be further enhanced to allow for asynchronous computations and to manage time delays and packet drops. We prove that the proposed algorithm remains effective when all these imperfections are present, and there is no need to carefully select proper step-sizes in different circumstances.

C. Paper Organization
The remainder of this paper is organized as follows. Section II describes the problem of interest and gives some preliminaries. Section III presents the algorithm D-CPOA. Section IV analyzes the accuracy, dependability and complexity of the proposed algorithm. Numerical evaluations are performed in Section V, followed by the review of related work in Section VI. Finally, Section VII concludes this paper.

II. PROBLEM DESCRIPTION AND PRELIMINARIES
Consider a network system consisting of $N$ agents, each of which owns a local objective function $f_i(x): \mathcal{X}_i \to \mathbb{R}$ and a local constraint set $\mathcal{X}_i \subset \mathbb{R}$. The network at time $t$ ($t \in \mathbb{N}$) is described as a directed graph $\mathcal{G}_t = (\mathcal{V}, \mathcal{E}_t)$, where $\mathcal{V}$ is the set of agents and $\mathcal{E}_t \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges. Note that $(i,j) \in \mathcal{E}_t$ if and only if (iff) agent $j$ can receive messages from agent $i$ at time $t$. In this paper, the superscript $t$, the subscripts $i, j$ and the index in parentheses $(k)$ denote the iteration number, the indices of agents and the index of an element in a vector, respectively.

A. Problem Description
In this paper, we aim to solve the following constrained optimization problem
$$\min_x\; f(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x), \quad \text{s.t. } x \in \mathcal{X} = \bigcap_{i=1}^{N} \mathcal{X}_i \qquad (1)$$
in a distributed and dependable manner. Specifically, the global aim of optimization needs to be achieved by means of local communication and computations. Meanwhile, diverse practical requirements will be taken into account, including preservation of the privacy of local objective functions and robustness to time-varying directed communication and asynchrony. Some basic assumptions are given as follows.

Assumption 1.
Every $f_i(x)$ is Lipschitz continuous on $\mathcal{X}_i$.

Assumption 2.
All $\mathcal{X}_i$ are closed, bounded and convex sets.

Assumption 3. $\{\mathcal{G}_t\}$ is $B$-strongly-connected, i.e., there exists a positive integer $B$ such that for any $k \in \mathbb{N}$, the graph $\big(\mathcal{V}, \bigcup_{t=kB}^{(k+1)B-1} \mathcal{E}_t\big)$ is strongly connected.

In problem (1), the objective functions are (possibly) nonconvex and the constraint sets are convex. Therefore, it is a constrained distributed nonconvex optimization problem. Under Assumption 2, $\mathcal{X}_i$ is a closed interval for any $i \in \mathcal{V}$. Hence, let $\mathcal{X}_i = [a_i, b_i]$, where $a_i, b_i \in \mathbb{R}$. As a result, $\mathcal{X} = [a, b]$, where $a = \max_{i\in\mathcal{V}} a_i$ and $b = \min_{i\in\mathcal{V}} b_i$.

B. Preliminaries

• Consensus Protocols
Let $\mathcal{N}_i^{\mathrm{in},t} = \{j \mid (j,i) \in \mathcal{E}_t\}$ and $\mathcal{N}_i^{\mathrm{out},t} = \{j \mid (i,j) \in \mathcal{E}_t\}$ be the sets of agent $i$'s in-neighbors and out-neighbors, respectively, and let $d_i^{\mathrm{out},t} = |\mathcal{N}_i^{\mathrm{out},t}|$ be its out-degree, where $|\mathcal{N}_i^{\mathrm{out},t}|$ is the cardinality of $\mathcal{N}_i^{\mathrm{out},t}$. Suppose that every agent $i$ owns a local variable $x_i^t \in \mathbb{R}$. There are two kinds of classical consensus protocols, i.e., maximum consensus and average consensus, that allow agents to reach a global agreement through local information exchange only. The maximum consensus protocol [25] is given by
$$x_i^{t+1} = \max_{j \in \mathcal{N}_i^{\mathrm{in},t}} x_j^t. \qquad (2)$$
It can be proven that with (2), all $x_i^t$ converge to $\max_{i\in\mathcal{V}} x_i^0$ in $T$ ($\le (N-1)B$) iterations, i.e., $x_i^t = \max_{i\in\mathcal{V}} x_i^0$, $\forall t \ge T$, $i \in \mathcal{V}$. The push-sum average consensus protocol [23] is given by
$$x_i^{t+1} = \sum_{j \in \mathcal{N}_i^{\mathrm{in},t}} a_{ij}^t x_j^t, \quad y_i^{t+1} = \sum_{j \in \mathcal{N}_i^{\mathrm{in},t}} a_{ij}^t y_j^t, \qquad (3)$$
where $y_i^t \in \mathbb{R}$ is initialized as $1$ for all $i \in \mathcal{V}$. The weight $a_{ij}^t$ is set according to
$$a_{ij}^t = \begin{cases} 1/d_j^{\mathrm{out},t}, & \text{if } j \in \mathcal{N}_i^{\mathrm{in},t}, \\ 0, & \text{else.} \end{cases} \qquad (4)$$
Let $A_t \triangleq (a_{ij}^t)_{N\times N}$. It follows that $A_t$ is column stochastic. In the implementation, (3) requires every agent $j$ to transmit the data $x_j^t/d_j^{\mathrm{out},t}$ and $y_j^t/d_j^{\mathrm{out},t}$ to its out-neighbors. With (3), the ratio $z_i^t \triangleq x_i^t/y_i^t$ converges geometrically to the average of all the initial values, $\bar{x} = \frac{1}{N}\sum_{i=1}^N x_i^0$ [23], i.e., $\lim_{t\to\infty} z_i^t = \bar{x}$, $\forall i \in \mathcal{V}$. As in [24], we assume that $i \in \mathcal{N}_i^{\mathrm{in},t}$, $\forall t \in \mathbb{N}$, i.e., agent $i$ can always access its own information.

• Chebyshev Polynomial Approximation
The degree-$m$ Chebyshev interpolant $p^{(m)}(x)$ corresponding to a Lipschitz continuous function $g(x)$ defined on $[a,b]$ takes the form
$$p^{(m)}(x) = \sum_{j=0}^{m} c_j T_j\Big(\frac{2x-(a+b)}{b-a}\Big), \quad x \in [a,b], \qquad (5)$$
where $T_j(u)$ is the $j$-th Chebyshev polynomial defined on $[-1,1]$ and satisfies $|T_j(u)| \le 1$, $\forall u \in [-1,1]$. As $m$ increases, $p^{(m)}(x)$ converges uniformly to $g(x)$ on the entire $[a,b]$ [26], i.e., $\forall x \in [a,b]$, $|p^{(m)}(x) - g(x)| \to 0$ as $m \to \infty$. Note that the convergence rates of the approximation errors depend on the smoothness of $g(x)$ and are discussed in [17]. Consequently, computing $p^{(m)}(x)$ is a practical way to construct an arbitrarily precise polynomial approximation of $g(x)$, as theoretically ensured by the Weierstrass Approximation Theorem [26].
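The interpolant above is fully determined by its values at a Chebyshev grid, which amounts to a type-I discrete cosine transform of the samples. A minimal NumPy sketch (illustrative only; following the standard Chebyshev–Lobatto formula, the $j = 0$ and $j = m$ coefficients are additionally halved):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_interp_coeffs(f, a, b, m):
    """Coefficients c_0..c_m of the degree-m Chebyshev interpolant of f on
    [a, b], sampled at the (m+1)-point Chebyshev-Lobatto grid."""
    k = np.arange(m + 1)
    x = 0.5 * (b - a) * np.cos(k * np.pi / m) + 0.5 * (a + b)  # Lobatto grid
    fk = f(x)
    c = np.empty(m + 1)
    for j in range(m + 1):
        terms = fk * np.cos(j * k * np.pi / m)
        terms[0] *= 0.5           # halve the two endpoint samples
        terms[-1] *= 0.5
        c[j] = (2.0 / m) * terms.sum()
    c[0] *= 0.5                   # endpoint coefficients are halved as well
    c[m] *= 0.5
    return c

# interpolating exp on [0, 2] at degree 20 is accurate to near machine precision
c = cheb_interp_coeffs(np.exp, 0.0, 2.0, 20)
u = np.linspace(-1.0, 1.0, 101)           # u = (2x - (a+b)) / (b - a)
x = u + 1.0                               # map back to [0, 2]
err = np.max(np.abs(C.chebval(u, c) - np.exp(x)))
```

Here `chebval` evaluates $\sum_j c_j T_j(u)$, matching the form of (5) after the affine change of variables.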
C. Models of Adversaries of Privacy
In this paper, we mainly consider honest-but-curious adversaries [21]. These adversaries are agents in the network that faithfully follow the specified protocol but intend to infer $f_i(x)$ of the target agent $i$ based on the received data.

In applications, the evolution of time-varying networks can be arbitrary and unpredictable. Hence, it is hard for these adversaries to possess stable and perfect knowledge of the key information on which an accurate estimation relies. In this paper, we are first concerned with the issue of privacy disclosure arising in the consensus iterations of D-CPOA. For push-sum consensus algorithms, this key information concerning each agent $i$ refers to
$$I_i^{\mathrm{own},t} = \{a_{ii}^t, x_i^t\}, \quad I_i^{\mathrm{in},t} = \{a_{ij}^t, x_j^t \mid j \in \mathcal{N}_i^{\mathrm{in},t}\},$$
which are the information sets of the states and weights owned by agent $i$ and those transmitted from $\mathcal{N}_i^{\mathrm{in},t}$ to agent $i$ at time $t$, respectively. As discussed in [18], [27], knowledge of $\bigcup_{t\in\mathbb{N}} I_i^{\mathrm{own},t}$, $\bigcup_{t\in\mathbb{N}} I_i^{\mathrm{in},t}$ and the coupling relationship between the locally added noises is a sufficient condition for the privacy compromise of noise-adding-based privacy-preserving consensus algorithms. We make the following assumption on the abilities of these adversaries.

Assumption 4.
At every time $t$, for the target agent $i$, honest-but-curious adversaries can always access $I_i^{\mathrm{own},t}$ but can only obtain the full knowledge of $I_i^{\mathrm{in},t}$ with a probability upper bounded by $p \in (0,1)$.

D. Privacy Definition

Without loss of generality, we consider the requirement of preserving the privacy of agent $i$'s local objective $f_i(x)$. In CPCA, local communication happens in its second stage of average consensus iterations, where agents directly exchange and update their local variables $p_i \in \mathbb{R}^{m_i+1}$. These variables are the vectors of coefficients of the approximations $p_i(x)$ of the local objectives $f_i(x)$. Once the adversaries obtain an estimate $\hat{p}_i$ of $p_i$, they can recover an approximation $\hat{f}_i(x): \mathcal{X} \to \mathbb{R}$ of $f_i(x)$. Note that $\hat{f}_i(x)$ is of the form (5) with its coefficients stored in $\hat{p}_i$. Hence, $p_i$ is the sensitive information of $f_i(x)$, and its privacy needs to be preserved.

In this paper, we aim to design a secure average consensus algorithm for D-CPOA to effectively preserve the privacy of $f_i(x)$, or more specifically, of $p_i$. This algorithm will be tailored to the case where agents own local variables of different lengths. To characterize the privacy degree, we use $(\alpha,\beta)$-data-privacy, which is a comprehensive measure of the estimation accuracy and disclosure probability [22]. Let $\hat{p}_i$ be the estimate of $p_i$ based on the available information set $\mathcal{I}$ and the predefined rule. The definition of $(\alpha,\beta)$-data-privacy is given as follows.

Definition 1.
A distributed algorithm achieves $(\alpha,\beta)$-data-privacy for $p_i$ iff
$$\Pr\big\{\|\hat{p}_i - p_i\| \le \alpha \mid \mathcal{I}\big\} \le \beta. \qquad (6)$$
In the above definition, $\alpha \ge 0$ and $\beta \ge 0$ are parameters that indicate the estimation accuracy and the bound on the disclosure probability of $p_i$, respectively. When $\alpha$ is specified, a smaller $\beta$ corresponds to a higher degree of privacy preservation. The original definition of $(\alpha,\beta)$-data-privacy in [22] considers the estimation of scalar states; it is extended in this paper to handle vector states. We use the $\ell_1$-norm of the error $\hat{p}_i - p_i$ to measure the estimation accuracy. This choice leads to a neat relationship between the estimation accuracy of $p_i$ and that of $f_i(x)$. Detailed discussions are provided in Remark 3.

III. DESIGN OF DEPENDABLE-CPOA

In this section, we present the design of Dependable-CPOA (D-CPOA). The proposed algorithm consists of three stages, whose details are discussed in the following three subsections.
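To make Definition 1 concrete, consider a toy Monte Carlo reading of (6) (all numbers here are illustrative and not from the paper): suppose the adversary's best estimate of $p_i$ is the perturbed value $p_i + \theta_i$, with each noise entry uniform on $[-1,1]$, and take the norm in (6) to be the $\ell_1$-norm, consistent with the bound (25) later on.

```python
import numpy as np

rng = np.random.default_rng(0)
m_i = 2                 # p_i has m_i + 1 = 3 entries (toy size)
alpha = 0.5
trials = 400_000

# adversary's estimate error is exactly the noise: p_hat - p_i = theta_i
theta = rng.uniform(-1.0, 1.0, (trials, m_i + 1))

# empirical probability of the event in (6): ||p_hat - p_i||_1 <= alpha
beta_hat = np.mean(np.abs(theta).sum(axis=1) <= alpha)

# analytic value for comparison: |theta(k)| ~ U(0,1), so
# Pr(sum of 3 iid U(0,1) <= 1/2) = (1/2)^3 / 3! = 1/48 ~ 0.0208
```

The simulation recovers the simplex-volume probability $1/48$, i.e., under this (hypothetical) noise model the scheme would be roughly $(0.5,\,0.021)$-data-private for a length-3 vector.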
A. Construction of Local Chebyshev Proxies
In this stage, every agent $i$ computes a polynomial approximation $p_i(x)$ of $f_i(x)$ on $\mathcal{X} = [a,b]$, such that
$$|f_i(x) - p_i(x)| \le \epsilon_1, \quad \forall x \in [a,b] \qquad (7)$$
holds, where $\epsilon_1 > 0$ is a specified tolerance. This goal is achieved by using the adaptive Chebyshev interpolation method [28], in which the degree of the interpolant is systematically increased until a certain stopping criterion is satisfied. The details are as follows. Agent $i$ sets $m_i = 2$ and begins to calculate a Chebyshev interpolant of degree $m_i$. It evaluates $f_i(x)$ at the $(m_i+1)$-point grid $S_{m_i} \triangleq \{x_k\}$ given by
$$x_k = \frac{b-a}{2}\cos\Big(\frac{k\pi}{m_i}\Big) + \frac{a+b}{2}, \quad f_k = f_i(x_k), \qquad (8)$$
where $k = 0, 1, \ldots, m_i$. Then, it calculates the coefficients of the interpolant in (5) by
$$c_j = \frac{1}{m_i}\big(f_0 + f_{m_i}\cos(j\pi)\big) + \frac{2}{m_i}\sum_{k=1}^{m_i-1} f_k \cos\Big(\frac{jk\pi}{m_i}\Big), \qquad (9)$$
where $j = 0, 1, \ldots, m_i$ (for $j = 0$ and $j = m_i$, the resulting coefficient is additionally halved, per the standard Chebyshev–Lobatto formula) [28]. At every iteration, the degree $m_i$ is doubled until the stopping criterion
$$\max_{x_k \in (S_{2m_i} \setminus S_{m_i})} |f_i(x_k) - p_i(x_k)| \le \epsilon_1 \qquad (10)$$
is met, where $p_i(x)$ takes the form of (5) with $\{c_j\}$ being the coefficients. The intersection $\mathcal{X} = [a,b]$ of the local constraint sets is known beforehand by running a number of max/min consensus iterations as in (2).

B. Privacy-Preserving Information Dissemination
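The degree-doubling loop and the stopping test can be sketched as follows. This sketch is illustrative: it uses NumPy's `chebinterpolate` (Chebyshev points of the first kind) as the inner interpolation step rather than the Lobatto-grid formulas (8)–(9), which does not change the adaptive logic of checking the newly refined grid points, as in (10).

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def adaptive_cheb(f, a, b, eps1):
    """Double the interpolant degree until f is matched to within eps1
    at the newly refined grid points (cf. stopping criterion (10))."""
    g = lambda u: f(0.5 * (b - a) * u + 0.5 * (a + b))  # map [-1,1] -> [a,b]
    m = 2
    c = C.chebinterpolate(g, m)
    while True:
        # points of the degree-2m Lobatto grid that are not in the degree-m one
        u_new = np.cos(np.arange(1, 2 * m, 2) * np.pi / (2 * m))
        if np.max(np.abs(g(u_new) - C.chebval(u_new, c))) <= eps1:
            return c, m
        m *= 2
        c = C.chebinterpolate(g, m)

# a smooth but oscillatory toy objective on [0, 1]
f = lambda x: np.sin(3.0 * x) + 0.5 * np.cos(7.0 * x)
c, m = adaptive_cheb(f, 0.0, 1.0, 1e-9)
```

For smooth objectives the loop terminates after a handful of doublings, since the interpolation error decays spectrally with the degree.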
After the initialization stage, each agent $i$ owns a local variable $p_i \in \mathbb{R}^{m_i+1}$, which is the vector of coefficients of the local polynomial approximation $p_i(x)$. In this stage, the goal is to enable agents to agree on the average $\bar{p} = \frac{1}{N}\sum_{i=1}^N p_i$ of their initial values via a distributed mechanism while, at the same time, preserving the privacy of these initial values.

We propose a privacy-preserving consensus-based scheme of information dissemination to achieve this goal. The backbone of the scheme is the push-sum average consensus protocol. The key ideas of the developed privacy-preserving mechanism are i) adding random noises to $p_i$ to mask the true values, ii) inserting the elements of the perturbed initial states block by block to hide them within the iterations, and iii) subtracting the noises separately in several randomly chosen rounds of iterations to guarantee the accuracy of average consensus. The details are as follows.

First, every agent $i$ generates a noise vector $\theta_i \in \Theta^{m_i+1}$, whose elements are independent random variables with domain $\Theta$, and adds $\theta_i$ to its initial state $p_i$ to form a perturbed state $\tilde{p}_i$, i.e., $\tilde{p}_i = p_i + \theta_i$. Then, agents run push-sum consensus iterations to exchange and update their local variables $x_i^t$ and $y_i^t$. The initial value of $y_i^t$ is set as $1$ for all $i \in \mathcal{V}$. Nonetheless, instead of directly setting the initial value of $x_i^t$ as $\tilde{p}_i$, every agent $i$ gradually extends $x_i^t$ with the elements of $\tilde{p}_i$ in the first $K_1$ iterations. Let $(d_i^1, \ldots, d_i^{K_1})$ be drawn from the multinomial distribution with parameters $m_i+1$ and $(\frac{1}{K_1}, \ldots, \frac{1}{K_1})$. Then,
$$\sum_{t=1}^{K_1} d_i^t = m_i + 1, \quad d_i^t \in \{0, \ldots, m_i+1\}, \; \forall t.$$
Hence, $(d_i^1, \ldots, d_i^{K_1})$ can be used to denote the numbers of elements of $\tilde{p}_i$ that are inserted into $x_i^t$ at every iteration. Let
$$l_i^0 = 0, \quad l_i^t = \sum_{k=1}^{t} d_i^k, \quad t = 1, \ldots, K_1.$$
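To see why the mechanism can both mask $p_i$ and still recover the exact average: the iterations are linear and the push-sum weight matrices are column stochastic, so the network-wide sum of the states is altered only by the injected noises and their later removal; once each agent has subtracted all $L$ noise fractions (the $\zeta_i$ defined below), the running sum equals $\sum_i p_i$ again. A toy scalar simulation on a static complete digraph (illustrative parameters; the block-insertion schedule is omitted and each perturbed value enters at once):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
p = rng.standard_normal(N)        # scalar "initial coefficients", one per agent
theta = rng.uniform(-1, 1, N)     # additive masking noises
L = 4                             # each noise removed in L equal fractions

x = p + theta                     # perturbed initial states
y = np.ones(N)
# each agent picks L random rounds (within the first 30) for its subtractions
sub_rounds = [set(rng.choice(30, L, replace=False)) for _ in range(N)]

for t in range(200):
    # push-sum mixing on a complete digraph with self-loops: every agent
    # splits its value among all N agents (weights 1/N, column stochastic)
    x = np.full(N, x.sum() / N)
    y = np.full(N, y.sum() / N)
    for i in range(N):
        if t in sub_rounds[i]:
            x[i] -= theta[i] / L  # separate subtraction of one noise fraction

z = x / y
# despite the masking noises, the exact average of p is recovered
```

The paper's scheme runs the same bookkeeping element-wise over vectors of different lengths and on time-varying digraphs; this sketch only isolates the cancellation argument.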
At the $t$-th iteration, the $(l_i^{t-1}+1)$-th to $l_i^t$-th elements of $\tilde{p}_i$ are inserted into $x_i^t$ to form $x_i^{t+}$. The rule of insertion is as follows. For all $t = 1, \ldots, K_1$,
$$x_i^{t+}(k) = \begin{cases} x_i^t(k) + \tilde{p}_i(k), & \text{for } k = l_i^{t-1}+1, \ldots, l_i^t, \\ x_i^t(k), & \text{else.} \end{cases} \qquad (11)$$
(In the expression of the average $\bar{p}$, variables of shorter lengths are extended with zeros when necessary to ensure agreement in dimensions.) Note that if the corresponding $x_i^t(k)$ is null, it is regarded as $0$ in (11). Then, agents transmit $x_i^{t+}$ and $y_i^t$ to their out-neighbors and update $x_i^{t+1}$ and $y_i^{t+1}$ by
$$x_i^{t+1}(k) = \sum_{j \in \mathcal{N}_i^{\mathrm{in},t}} a_{ij}^t x_j^{t+}(k), \; \forall k, \qquad y_i^{t+1} = \sum_{j \in \mathcal{N}_i^{\mathrm{in},t}} a_{ij}^t y_j^t. \qquad (12)$$
Note that (12) also involves the extension of $x_i^t$: the length of $x_i^{t+1}$ will be the same as that of the longest $x_j^{t+}$, $j \in \mathcal{N}_i^{\mathrm{in},t}$. At the end of the $K_1$-th iteration, all the elements of $\tilde{p}_i$ have been gradually inserted, and the length of $x_i^t$ is at least $m_i+1$.

In the following $K_2 - K_1$ iterations, to guarantee the accuracy of the average consensus iterations, every agent properly subtracts the added noises. Let $L$ be a random integer between $1$ and $K_2 - K_1$ such that
$$|\zeta_i(k)| > \alpha, \quad \text{where } \zeta_i(k) \triangleq \frac{\theta_i(k)}{L}. \qquad (13)$$
Note that $L$ can be drawn from various discrete distributions, e.g., the discrete uniform, binomial and hypergeometric distributions. The choices of such distributions are up to the agents and are unknown to the adversaries. For the $k$-th element of $x_i^t$ ($\forall k$), at $L$ randomly selected iterations, agent $i$ subtracts a fraction $\zeta_i(k)$ of the added noise from the updated state, i.e.,
$$x_i^{t+1}(k) = \sum_{j \in \mathcal{N}_i^{\mathrm{in},t}} a_{ij}^t x_j^{t+}(k), \quad x_i^{(t+1)+}(k) = x_i^{t+1}(k) - \zeta_i(k), \; \forall k, \qquad y_i^{t+1} = \sum_{j \in \mathcal{N}_i^{\mathrm{in},t}} a_{ij}^t y_j^t. \qquad (14)$$
The numbers of these selected iterations form a set $\mathcal{X}_{i,k}$, $\forall k$. It is assumed that the duration of this period is sufficient for all $x_i^t$ to be extended to the length $m+1$, where $m \triangleq \max_{i\in\mathcal{V}} m_i$ is the maximum degree of all the local approximations. At the rest of the iterations, agents update their local variables by (12), where $x_i^{t+}(k)$ is set as $x_i^t(k)$, $\forall k$, for $t \ge K_2+1$.

To realize distributed stopping when the precision of the iterations has met the requirement, we utilize the max/min-consensus-based stopping mechanism in [29] after the $K_2$-th iteration. Note that the scheme in [29] deals with static digraphs, but it can be easily extended to settings with time-varying digraphs, given that in this case the max/min consensus protocols still converge in finite time. The following assumption is required by this mechanism.

Assumption 5.
Every agent $i$ in $\mathcal{G}$ knows an upper bound $U$ on $(N-1)B$, such that $U$ is of the same order as $(N-1)B$.

Specifically, there are two auxiliary variables $r_i^t, s_i^t \in \mathbb{R}^{m+1}$, initialized as $p_i^{K_2} = x_i^{K_2}/y_i^{K_2}$ and updated together with $x_i^t$ and $y_i^t$ by
$$r_i^{t+1}(k) = \max_{j \in \mathcal{N}_i^{\mathrm{in},t}} r_j^t(k), \quad s_i^{t+1}(k) = \min_{j \in \mathcal{N}_i^{\mathrm{in},t}} s_j^t(k), \qquad (15)$$
where $k = 1, \ldots, m+1$. These variables are reinitialized as $p_i^t$ every $U$ iterations, so that the recent information of $p_i^t$ is continually disseminated. When the stopping criterion
$$\|r_i^K - s_i^K\|_\infty \le \delta, \quad \delta = \frac{\epsilon_2}{m+1} \qquad (16)$$
is satisfied at the $K$-th iteration, agents terminate the iterations and set $p_i^K = x_i^K/y_i^K$.

C. Polynomial Optimization by Solving SDPs
In this stage, agents locally optimize the polynomial proxy $p_i^K(x)$ recovered from $p_i^K$ on $\mathcal{X} = [a,b]$ to obtain $\epsilon$-optimal solutions of problem (1). This optimization problem can be transformed into a semidefinite program (SDP), which can be efficiently solved by the interior-point method [30]. We provide such reformulations based on the Chebyshev coefficients of $p_i^K(x)$.

Note that $p_i^K(x)$ is a polynomial of degree $m$ and takes the form of (5). The elements of $p_i^K = [c_0, \ldots, c_m]^T$ are its coefficients. To simplify the notation, let
$$g_i^K(u) \triangleq p_i^K\Big(\frac{b-a}{2}u + \frac{a+b}{2}\Big) = \sum_{j=0}^{m} c_j T_j(u), \quad u \in [-1,1].$$
The optimal values of $p_i^K(x)$ on $[a,b]$ and $g_i^K(u)$ on $[-1,1]$ are the same, and the optimal points $x_p^*$ and $u_g^*$ satisfy
$$x_p^* = \frac{b-a}{2}u_g^* + \frac{a+b}{2}. \qquad (17)$$
Hence, we solve the following problem
$$\min_u\; g_i^K(u), \quad \text{s.t. } u \in [-1,1], \qquad (18)$$
and then use (17) to obtain the optimal value and optimal points of $p_i^K(x)$ on $[a,b]$. To this end, we first transform problem (18) into
$$\max_t\; t, \quad \text{s.t. } g_i^K(u) - t \ge 0, \; \forall u \in [-1,1]. \qquad (19)$$
Then, we introduce new optimization variables $Q, Q' \in \mathbb{S}_+$. When $m$ is odd, problem (19) is transformed into
$$\begin{aligned} \max_{t,Q,Q'}\;\; & t\\ \text{s.t.}\;\; & c_0 = t + Q_{00} + Q'_{00} + \frac{1}{2}\sum_{u\ge 1}(Q_{uu} + Q'_{uu}) + \frac{1}{4}\sum_{|u-v|=1}(Q_{uv} - Q'_{uv}),\\ & c_j = \frac{1}{2}\sum_{(u,v)\in\mathcal{A}_j}(Q_{uv} + Q'_{uv}) + \frac{1}{4}\sum_{(u,v)\in\mathcal{B}_j}(Q_{uv} - Q'_{uv}), \quad j = 1, \ldots, m,\\ & Q \in \mathbb{S}_+^{\lfloor m/2\rfloor+1}, \; Q' \in \mathbb{S}_+^{\lfloor (m-1)/2\rfloor+1}, \end{aligned} \qquad (20)$$
where the rows and columns of $Q$ and $Q'$ are indexed by $0, 1, \ldots$, and
$$\mathcal{A}_j = \{(u,v) \mid u+v = j \,\vee\, |u-v| = j\},$$
$$\mathcal{B}_j = \big\{(u,v) \mid u+v = j-1 \,\vee\, |u-v| = j-1 \,\vee\, |u+v-1| = j \,\vee\, \big||u-v|-1\big| = j\big\}.$$
When $m$ is even, the transformed problem takes a similar form. We refer readers to our work [17] for more details on the forms and properties of these reformulations. The aforementioned transformed problems are SDPs, and can therefore be efficiently solved via the primal-dual interior-point method [30].
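The paper's route is the SDP reformulation (20). As a lightweight sanity check for low degrees, one can instead minimize a Chebyshev-form polynomial on $[-1,1]$ directly, by comparing its values at the real stationary points and the two endpoints, then mapping back via (17). This is a sketch of an alternative check, not the paper's method:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def minimize_cheb_poly(c, a, b):
    """Minimize p(x) = sum_j c[j] * T_j(u), with u = (2x - (a+b))/(b-a),
    over x in [a, b], by checking stationary points and endpoints."""
    dc = C.chebder(c)                       # derivative in the Chebyshev basis
    roots = C.chebroots(dc)
    cand = [r.real for r in np.atleast_1d(roots)
            if abs(r.imag) < 1e-10 and -1.0 <= r.real <= 1.0]
    cand += [-1.0, 1.0]                     # interval endpoints
    vals = C.chebval(np.array(cand), c)
    i = int(np.argmin(vals))
    x_star = 0.5 * (b - a) * cand[i] + 0.5 * (a + b)   # map back, cf. (17)
    return x_star, float(vals[i])
```

For example, $p(u) = T_2(u) = 2u^2 - 1$ on $[0,2]$ (coefficients `[0, 0, 1]`) attains its minimum $-1$ at $u^* = 0$, i.e., $x^* = 1$. For moderate degrees the derivative-root approach is fine numerically; the SDP route in the text additionally certifies global optimality and scales to the constrained setting.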
The iterations of this method are terminated when
$$0 \le f_e^* - p^* \le \epsilon_3,$$
where $f_e^*$ is the obtained estimate of the optimal value $p^*$ of $p_i^K(x)$ on $\mathcal{X} = [a,b]$, and $\epsilon_3 > 0$ is the specified precision. The optimal points of $g_i^K(u)$ are computed from the complementary slackness condition [31]. The optimal points of $p_i^K(x)$ on $\mathcal{X}$ can then be calculated by (17).

The full details of the proposed algorithm are summarized as Algorithm 1. We set all the precisions used in the three stages, i.e., $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$, as $\epsilon/3$, such that their sum equals $\epsilon$ and the reach of $\epsilon$-optimality is ensured.

IV. PERFORMANCE ANALYSIS
A. Accuracy
We establish the accuracy of D-CPOA in this subsection. We use $\epsilon$ and $f^*$ to denote the specified precision and the optimal value of problem (1), respectively. The following lemma guarantees the accuracy of the consensus iterations within the proposed algorithm.

Lemma 1.
When (16) is satisfied, we have
$$\max_{i\in\mathcal{V}} \big\|p_i^K - \bar{p}\big\|_\infty \le \delta, \qquad (21)$$
where $\delta = \epsilon_2/(m+1)$.

Proof. The proof is provided in Appendix A.
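The factor $m+1$ in $\delta$ comes from $|T_j(u)| \le 1$: an entrywise coefficient error of at most $\delta$ perturbs the recovered proxy by at most $(m+1)\delta$ pointwise. A quick numeric check with illustrative values:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)
m, eps2 = 12, 1e-3
delta = eps2 / (m + 1)                     # the delta of (16) and (21)

c = rng.standard_normal(m + 1)             # some proxy coefficients (toy)
err = rng.uniform(-delta, delta, m + 1)    # per-coefficient consensus error

u = np.linspace(-1.0, 1.0, 2001)
dev = np.max(np.abs(C.chebval(u, c + err) - C.chebval(u, c)))
# since |T_j(u)| <= 1 on [-1, 1], dev <= (m + 1) * delta = eps2
```

So a consensus accuracy of $\delta$ per coefficient translates directly into a stage-2 function-level error of at most $\epsilon_2$.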
Remark 1.
Lemma 1 is in the same form as [17, Theorem 1], but its proof is much more involved. Here we need to prove that, with the insertions of perturbed data and the separate subtractions of noises, exact average consensus is still reached. We also need to verify the effectiveness of the stopping criterion (16) in this case.
The following theorem demonstrates the accuracy of the proposed algorithm.
Theorem 2.
Suppose that Assumptions 1-5 hold. D-CPOA ensures that every agent obtains an $\epsilon$-optimal solution $f_e^*$ for problem (1), i.e.,
$$|f_e^* - f^*| \le \epsilon,$$
where $\epsilon > 0$ is any arbitrarily small specified precision.

Proof. The proof is provided in Appendix B.
Remark 2.
Theorem 2 takes the same form as [17, Theorem 4]. It implies that even though various challenging requirements concerning privacy and robustness are taken into account, arbitrarily precise globally optimal solutions are still obtained by the proposed algorithm.

Such SDP reformulations depend only on the Chebyshev coefficients of the polynomial $p_i^K(x)$ to be optimized, and are independent of the network topology. Hence, the analysis of the reformulations in [17], which considers static undirected networks, also applies to this paper.

Algorithm 1
D-CPOA
Input: $f_i(x)$, $\mathcal{X}_i = [a_i, b_i]$, $U$ and $\epsilon$.
Output: $f_e^*$ for every agent $i \in \mathcal{V}$.
Initialize: $a_i^0 = a_i$, $b_i^0 = b_i$, $m_i = 2$.
1: for each agent $i \in \mathcal{V}$ do
2:   for $t = 0, \ldots, U-1$ do
3:     $a_i^{t+1} = \max_{j\in\mathcal{N}_i^{\mathrm{in},t}} a_j^t$, $b_i^{t+1} = \min_{j\in\mathcal{N}_i^{\mathrm{in},t}} b_j^t$.
4:   end for
5:   Set $a = a_i^t$, $b = b_i^t$.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6:   Calculate $\{x_j\}$ and $\{f_j\}$ by (8). Calculate $\{c_k\}$ by (9).
7:   If (10) is satisfied (where $\epsilon_1 = \epsilon/3$), go to step 8. Otherwise, set $m_i \leftarrow 2m_i$ and go to step 6.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8:   Set $\tilde{p}_i = p_i + \theta_i$, $x_i^0 = \text{null}$, $y_i^0 = 1$, $(d_i^1, \ldots, d_i^{K_1}) \sim \mathrm{Multi}\big(m_i+1, \frac{1}{K_1}(1, \ldots, 1)\big)$, $l = 1$.
9:   for $t = 0, 1, \ldots$ do
10:    if $t \le K_1$ then
11:      Extend $x_i^t$ to form $x_i^{t+}$ by (11). Update $x_i^{t+1}(k)$, $\forall k$, and $y_i^{t+1}$ by (12).
12:    else if $K_1 + 1 \le t \le K_2$ then
13:      for each element $k = 1, \ldots, m_i+1$ do
14:        if $t \in \mathcal{X}_{i,k}$ then update $x_i^{t+1}(k)$ and $y_i^{t+1}$ by (14); else update them by (12).
15:      end for
16:    else
17:      if $t = lU$ then
18:        if $\|r_i^t - s_i^t\|_\infty \le \epsilon_2/(m+1)$ then set $p_i^K = x_i^t/y_i^t$ and break.
19:        $r_i^t = s_i^t = p_i^t$, $l \leftarrow l+1$.
20:      end if
21:      Update $x_i^{t+1}(k)$, $\forall k$, and $y_i^{t+1}$ by (3).
22:    end if
23:  end for
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
24:  Solve the reformulated problem, e.g., (20), with $\epsilon_3 = \epsilon/3$ and return $f_e^*$.
25: end for

B. Data-Privacy
In this subsection, we show that the developed algorithm preserves the privacy of $p_i$ and investigate the privacy-preserving property through the notion of $(\alpha,\beta)$-data-privacy [22]. We first define the information set $\mathcal{I}_i^t$ used by the adversaries for state estimation. Let $\mathcal{I}_i^t = \mathcal{I}_i^{\mathrm{own},t} \cup \mathcal{I}_i^{\mathrm{in},t}$, where
$$\mathcal{I}_i^{\mathrm{own},t} = \bigcup_{s=1}^{t} I_i^{\mathrm{own},s} = \bigcup_{s=1}^{t} \{a_{ii}^s, x_i^{s+}\}, \qquad \mathcal{I}_i^{\mathrm{in},t} = \bigcup_{s \in \mathcal{S}_t} I_i^{\mathrm{in},s} = \bigcup_{s \in \mathcal{S}_t} \{a_{ij}^s, x_j^{s+} \mid j \in \mathcal{N}_i^{\mathrm{in},s}\}.$$
The set $\mathcal{S}_t$ contains the iteration indices $s$ ($s \le t$) at which the adversaries have obtained the full knowledge of $I_i^{\mathrm{in},s}$. Note that $\mathcal{I}_i^t$ consists of all the available information on the states and weights owned by and transmitted to agent $i$ up to the $t$-th iteration.

Let $X$ be a random variable whose distribution and any other relevant information are unknown. Since $X$ can be any arbitrary value in its domain, it is reasonable to assume that the probability that an accurate enough estimate $\hat{X}$ of $X$ can be obtained is rather small [27], i.e.,
$$\Pr\big\{|\hat{X} - X| \le \alpha\big\} \le \gamma, \qquad (22)$$
where $\alpha \ge 0$ and $\gamma \ge 0$ are small given constants. The bound $\gamma$ on the disclosure probability satisfies
$$\gamma \ll p \max_{\nu \in \Theta} \int_{\nu-\alpha}^{\nu+\alpha} f_{\theta_i(k)}(y)\,\mathrm{d}y, \quad \forall k,$$
where $f_{\theta_i(k)}(y)$ is the probability density function (PDF) of the added noise $\theta_i(k)$.

Recall that we aim to preserve the privacy of $p_i \in \mathbb{R}^{m_i+1}$. Let $\alpha$ and $\alpha_k$ be the estimation accuracy of $p_i$ and of each element $p_i(k)$, respectively, such that
$$\sum_{k=1}^{m_i+1} \alpha_k = \alpha, \quad \alpha_k \ge 0, \; \forall k = 1, \ldots, m_i+1. \qquad (23)$$
The following theorem characterizes the privacy-preservation effects of the developed algorithm.

Theorem 3.
If Assumptions 3 and 4 hold, then D-CPOA achieves (α, β)-data-privacy for p_i, where

β = ∏_{k=1}^{m_i+1} [ (1 − p^{K₂−K₁}) h_i(α_k) + p^{K₂−K₁} ],   (24)

with

h_i(α_k) = p · max_{ν ∈ Θ} ∫_{ν−α_k}^{ν+α_k} f_{θ_i(k)}(y) dy + γ,

and {α_k} satisfying (23).

Proof. The proof is provided in Appendix C.

Theorem 3 states that D-CPOA preserves the privacy of p_i. The effects of privacy preservation are evaluated through (α, β)-data-privacy. The interpretation of β in (24) is as follows. Note that β is the product of a set of bounds β_k on the disclosure probabilities corresponding to the elements p_i(k), ∀k = 1, ..., m_i + 1 (see (35) and (36)). The bounds β_k are derived via the law of total probability. If the adversaries know I_i^{in,t} for every time t between K₁ + 1 and K₂ (an event whose probability is not more than p^{K₂−K₁}), then the added noises θ_i(k) and the states p_i(k) can be perfectly inferred. Otherwise, the disclosure probability will not exceed h_i(α_k). The bounds h_i(α_k) are derived likewise, based on whether the adversaries know I_i^{in,s−1}, where s is the time when agent i inserts its perturbed state p̃_i(k). If the adversaries know this information, then the maximum disclosure probability is

max_{ν ∈ Θ} ∫_{ν−α_k}^{ν+α_k} f_{θ_i(k)}(y) dy,

which equals the probability that the optimal distributed estimation falls into [p_i(k) − α_k, p_i(k) + α_k] [22]. Otherwise, the disclosure probability is rather small, since the adversaries own little relevant information on p_i.

From (24), we know that for p_i of longer length (i.e., with larger m_i), β will generally be smaller, which implies a higher degree of privacy preservation. In addition, β increases with α_k but decreases with K₂ − K₁. These relationships support the intuitions that less accurate estimates can be acquired with higher probabilities, and that more room for randomness leads to lower probabilities of privacy disclosure.

Remark 3.
In this paper, we investigate the privacy-preserving property of D-CPOA by studying its degree of data-privacy for p_i. The reasons are twofold. First, this degree directly reflects the effectiveness of the incorporated privacy-preserving mechanism, since p_i is exactly the initial value calling for protection in the iterations. Second, this degree is closely related to the effects of privacy preservation of f_i(x). In (6), if ‖p̂_i − p_i‖₁ ≤ α, i.e., a fairly precise estimate p̂_i of p_i is obtained, then ∀x ∈ X = [a, b], we have

|f̂_i(x) − p_i(x)| = | ∑_{k=0}^{m} (p̂_i(k) − p_i(k)) T_k( (2x − (a + b)) / (b − a) ) |
  ≤ ∑_{k=0}^{m} |p̂_i(k) − p_i(k)| = ‖p̂_i − p_i‖₁ ≤ α.   (25)

By referring to (7), it follows that |f̂_i(x) − f_i(x)| ≤ α + ǫ₁, i.e., an accurate enough estimate f̂_i(x) of f_i(x) is acquired. Nevertheless, deriving the requirement of closeness between p̂_i and p_i from that between f̂_i(x) and f_i(x) (or p_i(x)) is very difficult due to the coupling of terms in (25). Hence, the characterization of the degree of data-privacy for p_i is the main focus of this paper.

C. Discussions on Dependability

In this section, we discuss the dependability of the proposed algorithm, considering various requirements including privacy preservation and robustness to network imperfections. We summarize the comparisons of the performance of D-CPOA and other typical algorithms in Table I. The details are given as follows.

• Privacy Guarantee.
We have shown in Theorem 3 that the consensus-based iterations of D-CPOA preserve the privacy of the sensitive p_i, and we have analyzed the effects of this preservation through the notion of (α, β)-data-privacy. Next, we study such effects via differential privacy, which provides a strong privacy guarantee in the face of adversaries owning arbitrarily much auxiliary information [4], [19], [33]. Let P = {p_i | ∀i ∈ V} be the dataset of all the initial states, and let

M(P) = {x_i^+(t) | ∀t ∈ N, i ∈ V},

i.e., the set of transmitted states of the consensus protocols, be the randomized query output. By referring to [19], [33], in our setting, a privacy-preserving consensus protocol is ǫ-differentially private if

Pr{M(P) ∈ O} ≤ e^ǫ · Pr{M(P') ∈ O}

holds for any O ⊆ range(M) and any σ-adjacent P, P' satisfying

‖p_i − (p_i)'‖ ≤ σ if i = i₀,  and  ‖p_i − (p_i)'‖ = 0 if i ≠ i₀,
for all i ∈ V, where i₀ is some element of V. Note that we have used correlated noises (see (13)) to pursue the proximity of p_i^K to the exact average p̄ (see Lemma 1), thus ensuring the accuracy of the obtained solutions (see Theorem 2). Based on the impossibility of simultaneously achieving exact average consensus and differential privacy [19], [33], we conclude that the current algorithm is not ǫ-differentially private. If we want to preserve differential privacy at the cost of losing some accuracy of the obtained solutions, we can add uncorrelated noises that satisfy the condition in [33, Theorem 4.3] to the transmitted states at every iteration. In this case, the almost sure limit of the differentially private consensus iterations is an unbiased estimate of the exact average. The random difference between the limit and the average will lead to an additional random error in the returned solutions of the proposed algorithm.

TABLE I
COMPARISONS OF D-CPOA AND OTHER TYPICAL DISTRIBUTED OPTIMIZATION ALGORITHMS

| Algorithms | Nonconvex Objectives | Time-varying | Digraphs | Privacy Guarantee | Asynchrony | Exact Convergence | Complexities |
| Push-DIGing [11] | | X | X | | | X | scvx: linear |
| G-Push-Pull [13] | | | X | | X | mean-square | scvx: linear |
| SONATA [32] | X | X | X | | | X | scvx: linear; ncvx: O(1/ǫ) |
| ASY-SONATA [16] | X | | X | | X | X | scvx: linear; ncvx: O(1/ǫ) |
| Algorithm in [4] | | cloud-based | | DP | | trade-off | |
| Algorithm in [9] | | cloud-based | | DP | | trade-off | |
| Algorithm in [10] | | X | X | DP | X | trade-off | |
| D-CPOA | X | X | X | (α, β)-data-privacy (Theorem 3) | X | X | 0th-ord. oracle: O(m); Commn.: O(log(m/ǫ)); PD itr.: O(√m log(1/ǫ)) |

Notes: "scvx" and "ncvx" refer to strongly convex and nonconvex objective functions, respectively. For the algorithms with convergence time O(1/ǫ), both the complexity of the evaluations of local gradients (i.e., queries of the first-order oracle) and that of inter-agent communication are O(1/ǫ). "DP" stands for differential privacy, which comes with a trade-off between accuracy and privacy. In [10], the authors propose a general strategy of function perturbation to achieve differential privacy in distributed optimization; this strategy can be combined with any distributed convex constrained optimization algorithm to take effect, so several marks in that row indicate feasible possibilities. Detailed discussions are provided in Sec. IV-C. See Theorem 4 for the details of the complexities of D-CPOA.

Remark 4.
Existing privacy-preserving distributed optimization algorithms (e.g., [4], [9], [10]) mainly use uncorrelated noises to perturb the exchanged messages and are differentially private. The notion of differential privacy provides strong privacy guarantees. Also, its nice property of sequential composability facilitates the analysis of privacy when confronted with complex and nonlinear iterations involving gradients. Nonetheless, there always exists a trade-off between privacy and accuracy [10], which calls for a careful selection of the related parameters to obtain a rather small bound on the sub-optimality. In contrast, due to the simple and linear consensus-based iterations of the proposed algorithm, we can either use correlated noises to readily achieve effective preservation of privacy while ensuring the accuracy of the obtained solutions, or use uncorrelated noises to pursue the strong guarantee offered by differential privacy.

• Asynchrony.
We discuss the asynchronous extension of the proposed algorithm. Compared with synchronous models, asynchronous paradigms are more desirable in applications for their increased efficiency in handling uncoordinated computations and imperfect communication, e.g., transmission delays and packet drops. The design of consensus-based information dissemination presented in Algorithm 1 is synchronous. Its extension to cope with asynchrony is readily available and can benefit from the extensive research on asynchronous consensus protocols, including those allowing for random activations (e.g., gossip algorithms [34]), delays [35], packet drops [36] and all of these issues [16]. In these protocols, the basic idea for proving convergence is to first transform the asynchronous models into synchronous counterparts over augmented graphs, where virtual nodes and edges are added to facilitate the analysis, and then to establish the convergence of the synchronous models. All these asynchronous protocols converge deterministically to the average of the initial values. If they are incorporated into the proposed algorithm, then by Lemma 1 and Theorem 2, the accuracy of the obtained solutions can still be guaranteed. In addition, since the iterations of the developed algorithm are consensus-based and do not involve gradients, there is no need to select varying step-sizes in different circumstances of asynchrony.
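To make the dependability mechanisms above concrete, the dissemination stage of Algorithm 1 (perturb the coefficients, insert them one by one, subtract the noises in L separate pieces, and then run plain push-sum) can be sketched in a few lines. This is a minimal illustration under simplifying assumptions, not the paper's implementation: it uses a fixed directed ring with column-stochastic weights in place of the time-varying digraphs, one scalar coefficient per agent, and illustrative constants.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5                                  # number of agents
p = rng.normal(size=N)                 # one scalar coefficient per agent
theta = rng.uniform(-1, 1, size=N)     # additive perturbation noises
L = 2                                  # each noise is removed in L pieces

# Column-stochastic weights of a fixed directed ring with self-loops
# (a stand-in for the time-varying digraphs handled by push-sum).
A = np.zeros((N, N))
for i in range(N):
    A[i, i] = 0.5                      # keep half of the local mass
    A[(i + 1) % N, i] = 0.5            # push half to the out-neighbor

x = np.zeros(N)                        # numerator states (initially "null")
y = np.ones(N)                         # push-sum weights
K1, K2 = N, N + L                      # insertion / subtraction phase ends
for t in range(200):
    if t < K1:
        x[t] += p[t] + theta[t]        # agent t inserts its perturbed value
    elif t < K2:
        x -= theta / L                 # every agent removes one noise piece
    x, y = A @ x, A @ y                # plain push-sum mixing

# The subtracted pieces sum exactly to the inserted noises, so every
# ratio x[i]/y[i] converges to the exact average of p.
print(np.max(np.abs(x / y - p.mean())))
```

Because the column sums of the weights are preserved at every mixing step, the invariant 1ᵀx = Σᵢ pᵢ holds after time K₂, which is precisely the mechanism exploited in Lemma 1 to obtain exact averages without a privacy-accuracy trade-off.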
D. Complexity
The following theorem describes the complexities of the developed algorithm.
Theorem 4.
D-CPOA ensures that every agent obtains ǫ-optimal solutions to problem (1) with O(m) evaluations of local objective functions, O(log(m/ǫ)) rounds of inter-agent communication, and O(√m log(1/ǫ)) iterations of primal-dual interior-point methods.

Proof. Note that the evaluations of local objective functions (i.e., queries of the zeroth-order oracle) are performed only in the stage of initialization, and the primal-dual interior-point method [30] is used to solve problem (20) in the stage of polynomial optimization. By referring to the proof of [17, Theorem 6], we know that for every agent, the orders of the numbers of evaluations of local objective functions and of primal-dual iterations are O(m) and O(√m log(1/ǫ)), respectively, where m is the maximum degree of the local approximations.

In the stage of information dissemination, the insertions of block data and the subtractions of noises are completed in finite time, i.e., within K₂ iterations. Since the consensus-based protocol converges geometrically, the order of the total number of iterations (i.e., of rounds of inter-agent communication) is

K₂ + O(log(1/δ)) = O(log(1/δ)) = O(log(m/ǫ)),

where the required precision δ is given by (16).

TABLE II
COMPARISONS OF COMPLEXITIES

| Alg. | 0th-order Oracles | Communication | PD Iterations |
| CPCA | O(m) | O(log(m/ǫ)) | O(√m log(1/ǫ)) |
| D-CPOA | O(m) | O(log(m/ǫ)) | O(√m log(1/ǫ)) |

Note: Compared with CPCA, D-CPOA generally requires more inter-agent communication to reach a certain precision. This increase results from potential network imperfections and from the extra steps of insertions and subtractions, which may slow down the convergence rate. Nonetheless, the communication complexities of the two algorithms are of the same order (see the proof of Theorem 4).

The comparisons of the complexities of D-CPOA and those of CPCA [17] are shown in Table II. We observe that the complexities of the two algorithms are the same. The reasons are as follows. The major difference between the two algorithms lies in the stage of consensus-based information dissemination. In this stage, D-CPOA fulfills privacy preservation by utilizing the randomness of the insertions of block data and the subtractions of added noises.
These actions are completed in finite time, and thus they change only the values, not the orders, of the needed numbers of iterations (i.e., of rounds of inter-agent communication). Hence, we conclude that the dependability of the proposed algorithm brings no extra costs in terms of complexity.

E. Discussions on Multivariate Extensions
In this paper, we mainly consider problems with univariate objective functions to highlight the various advantages brought by the idea of using polynomial approximation, e.g., achieving efficient optimization of nonconvex problems and readily allowing for enhancements toward dependability when diverse practical needs exist. We now briefly discuss the possibility of multivariate extensions of the proposed idea. (The dependence of m on ǫ and on the smoothness of the local objective functions is discussed in [17, Lemma 7].)

The differences will mainly lie in the stage of initialization and in that of optimization of approximations. Specifically, let L²(X) be the set of square-integrable functions over X ⊂ R^n, and let f_i(x) ∈ L²(X) be a general local objective function. Then, there exists an orthonormal basis {h_k(x)}_{k ∈ N₊} (e.g., of orthonormal polynomials) and an arbitrarily precise approximation

f̂_i(x) = ∑_{k=1}^{m} c_k h_k(x)

of f_i(x), where {c_k}_{k=1}^{m} is the set of coefficients. Afterward, agents can exchange and update their local variables storing these coefficients (as in Sec. III-B) and acquire an approximation of the global objective function. Finally, they can locally optimize this approximation via tools for polynomial optimization or for finding stationary points of general nonconvex functions, thus obtaining the desired solutions. Nevertheless, the aforementioned idea of extension calls for further investigation and more careful analysis, and it remains part of our ongoing work.

V. NUMERICAL EVALUATIONS
In this section, we perform numerical experiments to illustrate the performance of D-CPOA. We consider a network with N = 20 agents. At each time t, besides itself, every agent i has two out-neighbors: one belongs to a fixed cycle, and the other is chosen uniformly at random. Hence, every G^t is strongly connected. We assume that all the local constraint sets are the same interval X, and that the local objective function f_i(x) of agent i is

f_i(x) = a_i e^{−x} + b_i log(1 + x),

where the coefficients a_i and b_i are normally distributed with means 10 and 5, respectively. It follows that f_i(x) is nonconvex and Lipschitz continuous on X. The Chebfun toolbox [26] is used to construct the Chebyshev polynomial approximations p_i(x) corresponding to all the local objective functions f_i(x).

The convergence of the proposed algorithm is shown in Fig. 1(a). In this experiment, we set K₁ = 10 and K₂ = 20. We generate i.i.d. random noises θ_i(k) from a uniform distribution and randomly select L from the discrete uniform distribution U{1, K₂ − K₁} so that (13) is satisfied. In Fig. 1(a), the square markers on the blue line indicate how many iterations t of information dissemination have been performed when certain precisions ǫ are specified. The triangle markers on the orange line represent the actual values of the objective errors |f*_e − f*| when those numbers of iterations are completed. We observe that the relationship between log ǫ and t is roughly linear. This phenomenon results from the linear convergence of the consensus-based information dissemination in the developed algorithm.

The effects of privacy preservation are presented in Fig. 1(b). This figure demonstrates the relationships between the estimation accuracy α_k and the bound β_k on the disclosure probability for a single element p_i(k) when different types of noises θ_i(k) are used. These relationships are explicitly characterized by (35) in Appendix C. In this experiment, we set K₁ = 10 and K₂ = 20, and fix the values of p and γ.
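The role of the noise distribution in these bounds can be anticipated analytically. The quantity max_ν ∫_{ν−α_k}^{ν+α_k} f_{θ_i(k)}(y) dy appearing in (28) is the largest probability mass that a window of width 2α_k can capture, attained at the mode for symmetric unimodal densities. A small sketch comparing zero-mean, unit-variance uniform, normal and Laplace noises (the closed forms below are standard; the function name is illustrative):

```python
from math import erf, exp, sqrt

def window_mass(dist, alpha, sigma=1.0):
    """Largest probability mass captured by a window of width 2*alpha,
    for zero-mean noise with standard deviation sigma."""
    if dist == "uniform":                  # support [-sqrt(3)s, sqrt(3)s]
        return min(1.0, alpha / (sqrt(3) * sigma))
    if dist == "normal":                   # 2*Phi(alpha/sigma) - 1
        return erf(alpha / (sigma * sqrt(2)))
    if dist == "laplace":                  # scale b = sigma / sqrt(2)
        return 1.0 - exp(-alpha * sqrt(2) / sigma)
    raise ValueError(dist)

alpha_k = 0.2
masses = {d: window_mass(d, alpha_k) for d in ("uniform", "normal", "laplace")}
# the flat uniform density captures the least mass, the peaked Laplace the most
assert masses["uniform"] < masses["normal"] < masses["laplace"]
```

The uniform density concentrates the least mass in any small window, which is consistent with the ordering of β_k observed in Fig. 1(b).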
Fig. 1. Performance of D-CPOA. (a) Convergence. (b) (α, β)-data-privacy. (c) Convergence of the proposed protocol in Sec. III-B and of other privacy-preserving consensus protocols.

We consider three types of noises, which follow uniform, normal and Laplace distributions, respectively. We assume that these noises are zero-mean and share a common variance. We observe that β_k increases with α_k, which confirms the intuition that a less accurate estimate can be obtained with a higher probability. We also notice that uniformly distributed noises yield the smallest β_k, and thus the most effective preservation of p_i(k). This observation supports the conclusion in [22]. Note that the bound β on the disclosure probability of p_i is the product of all the β_k for k = 1, ..., m_i + 1 (see (36) in Appendix C). Hence, given the degrees m_i of the local approximations constructed in this experiment, β is an extremely small number for the values of α_k and β_k in the figure.

Next, we specifically study the convergence rates of the consensus-based iterations incorporating the proposed privacy-preserving mechanism. The initial states of the agents are set as the vectors of coefficients of the local approximations. The rest of the settings are the same as those in the study of the convergence of D-CPOA. We also implement SCDA [20] and a differentially private consensus protocol [19], [33] for comparison, where uniformly distributed and Laplace distributed noises are added to every element of the local variables, respectively. In all three protocols, the initially added noises are of zero mean and of a common variance. In the last two protocols, the variances of the added noises decay at a fixed rate. The relationships between the maximum deviations max_{i∈V} ‖p_i^t − p̄‖ and the numbers of iterations t are shown in Fig. 1(c).
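A toy version of the comparison behind Fig. 1(c) can be written in a few lines, with one-step exact averaging standing in for the consensus protocol; the matrix, the noise magnitudes and the decay rate are illustrative choices, not the settings of the experiment:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 20
p = rng.normal(size=N)
W = np.full((N, N), 1.0 / N)      # doubly stochastic one-step averaging

# Correlated scheme (as in the proposed protocol): the noises are added
# once and later subtracted in full, so the exact average is recovered.
theta = rng.normal(size=N)
x = W @ (p + theta)               # perturbed consensus step
x = W @ (x - theta)               # noises removed -> exact average
assert np.allclose(x, p.mean())

# Differentially private scheme: fresh decaying noises at every step
# leave an almost surely nonzero random offset from the average.
z = p.copy()
for t in range(200):
    z = W @ (z + 0.5 ** t * rng.normal(size=N))
print(abs(z[0] - p.mean()))       # a random, nonvanishing deviation
```

The first scheme reaches the exact average because the injected noises cancel in full, whereas the accumulated uncorrelated noises of the second scheme produce exactly the nonvanishing deviation visible in Fig. 1(c).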
It is observed that our protocol converges faster than SCDA to the exact average. The main reason is that we do not continuously add noises to the local variables throughout the iterations. Hence, the possible negative effects of the noises on the convergence rate are mitigated. Also, the deviation of the differentially private consensus protocol does not converge to 0. This phenomenon reflects the fundamental trade-off between privacy and accuracy in this class of protocols.

VI. RELATED WORK
There has been extensive research on designing efficient distributed optimization algorithms, e.g., primal methods [5], [6], [14], [37] and dual-based methods [7], [8], [38]. The core idea of the primal methods is to combine consensus with gradient-based optimization algorithms, thus achieving consensual iterative convergence in the primal domain. Thanks to the development of gradient tracking [6], [14], [37], [39], which enables local agents to approximately track the gradients of the global objective function, the convergence rates of these distributed algorithms can nearly match that of the optimal centralized gradient-based algorithm [40]. The basic intuition behind the dual-based methods is to express the consensus requirement as equality constraints, and then to solve the dual problems of the equivalent reformulations or to perform primal-dual updates. These carefully constructed dual problems are decoupled, thus readily allowing for distributed implementations of certain linearly convergent centralized optimization algorithms, e.g., ADMM [2]. For convex problems, distributed algorithms guarantee convergence to globally optimal points; for nonconvex problems, convergence to stationary or locally optimal points is ensured [32], [39], [41]–[43].

The aforementioned work mainly centers on bridging the gap in convergence behaviors between distributed and centralized optimization algorithms. To effectively deploy these distributed algorithms in applications, some specific issues need to be addressed. These issues include, but are not limited to, privacy preservation, time-varying and directed communication, and asynchronous computations due to lack of coordination, transmission delays or packet drops.

Specifically, the privacy concern of distributed algorithms has received growing attention. Conventional approaches are based on the premise that exact local data is exchanged between agents.
Nevertheless, if there exist adversaries that intentionally gather certain data necessary for estimation, the sensitive information of objective functions, constraints and local states can be disclosed [4]. To tackle this problem, a number of privacy-preserving consensus and distributed optimization algorithms have been proposed. One typical approach, based on the idea of message perturbation, is to add random noises to the data transmitted within iterations. The perturbation of the critical data (e.g., states [18]–[20], [27], gradients [4], [9] and functions [10]) limits its utility for yielding sensible estimates. Some work considers the use of uncorrelated Laplacian or Gaussian noises and develops various differentially private consensus [19], [44] and distributed optimization algorithms [4], [9], [10]. The differentially private mechanism equips these algorithms with strong privacy guarantees, even against adversaries owning arbitrarily much side information. Nonetheless, it also brings about the trade-off between privacy and accuracy [10], [19]. Other work thus turns to correlated noises and shows that exact average consensus is reached [18], [20], [27]. The effects of privacy preservation can be characterized by using the notion of data-privacy [22]. Another typical approach is to apply cryptographic techniques, e.g., homomorphic encryption. Related algorithms can be found in [45]–[47].
These cryptography-based methods are suitable if the requirements of trusted agents or shared keys/secrets are satisfied, and if the extra communication and computation burdens induced by encryption and decryption are acceptable.

In addition to the privacy concern, the robustness issues of distributed optimization have also been widely investigated. Time-varying and directed communication inhibits the efficient construction of doubly stochastic weight matrices, which are crucial for achieving convergence over undirected graphs. To overcome this challenge, push-sum-based algorithms [11], [32], [48], [49] and push-pull-based algorithms [12], [13] have been developed. The former combine the push-sum consensus protocol [23] with gradient-based methods and only require column-stochastic weight matrices. The latter use one row-stochastic and one column-stochastic weight matrix to mix estimates of optimal solutions and trackers of average gradients, respectively. Algorithms that purely handle random transmission delays can be found in [3], [50], where the basic idea is to locally fuse the delayed information as soon as it arrives. To achieve asynchronous computations, gossip-type algorithms [13], [14] and those further allowing delays and packet drops [15], [16] have been developed.

Different from the aforementioned work, the proposed algorithm is based on the idea of using polynomial approximation and is equipped with effective mechanisms to meet diverse practical requirements concerning privacy and robustness. We show that efficient distributed optimization of general nonconvex problems is achieved, while the common issues of the privacy-accuracy trade-off and of step-size selection are avoided.

VII. CONCLUSION
In this paper, we proposed D-CPOA to solve a class of constrained distributed nonconvex optimization problems, considering the needs of privacy preservation and robustness to various network imperfections. We achieved exact convergence and effective preservation of the privacy of local objective functions by incorporating a new privacy-preserving mechanism into the consensus-based iterations. The developed mechanism utilized the randomness in block-by-block insertions of perturbed data and separate subtractions of the added noises, and its privacy degree was explicitly characterized through (α, β)-data-privacy. We ensured the robustness of the proposed algorithm by using the push-sum average consensus protocol as the basis of the iterations, and discussed its extensions that help maintain the performance when diverse imperfections in network communication exist. We proved that the major benefits brought by the idea of using polynomial approximation are preserved, while the aforementioned demanding requirements are satisfied at the same time.

APPENDIX
A. Proof of Lemma 1

Proof.
The proof consists of two steps. First, we prove that the limit value of p_i^t ≜ x_i^t / y_i^t (t ∈ N) is indeed p̄, i.e.,

lim_{t→∞} p_i^t = p̄.   (26)

Then, we demonstrate that the satisfaction of the stopping criterion (16) is a sufficient condition for (21).

• Step 1: Proof of the Limit Value
We consider the k-th element of the involved local variables, ∀k = 1, ..., m + 1. Let

x^t ≜ [x_1^t(k), ..., x_N^t(k)]^T,  θ ≜ [θ_1(k), ..., θ_N(k)]^T,
p ≜ [p_1(k), ..., p_N(k)]^T,  y^t ≜ [y_1^t, ..., y_N^t]^T.

Note that if the k-th elements of some x_j^t, θ_j and p_j (j ∈ V) are null, they are regarded as 0 in these expressions. We investigate the effects of the insertions (11) and the subtractions (14) on the accuracy of the consensus-based updates in Algorithm 1 as follows.

We first consider the effect of the insertions that happen in the first K₁ iterations. Let t_k be the index of the iteration at which agent i inserts the perturbed state p̃_i(k). Since A^{t_k} is column stochastic, from (11) and (12), we have

1^T x^{t_k+1} = 1^T A^{t_k} x^{t_k+} = 1^T x^{t_k+} = 1^T x^{t_k} + p̃_i(k),

which means that the sum of the elements of x^t increases by p̃_i(k). At the end of the K₁-th iteration, all the agents have inserted their perturbed initial states. Hence,

1^T x^{K₁} = 1^T x^0 + ∑_{i∈V} p̃_i(k) = ∑_{i∈V} p̃_i(k) = 1^T (p + θ).

Then, we focus on the effect of the subtractions that happen between time K₁ + 1 and time K₂. Suppose that the smallest element in X_{i,k} is t₀, i.e., agent i performs its first subtraction at the t₀-th iteration. From the column stochasticity of A^t (t ∈ N) and (3), it is not difficult to obtain

1^T x^{t₀} = 1^T A^{t₀−1} x^{t₀−1} = 1^T x^{t₀−1} = ... = 1^T x^{K₁}.

At the t₀-th iteration, we have

1^T x^{t₀+1} = 1^T x^{t₀} − δ_i(k) = 1^T x^{K₁} − δ_i(k),

which implies that the sum of the elements of x^t decreases by δ_i(k) = θ_i(k)/L. At the end of the K₂-th iteration, every agent has completed its L rounds of noise subtractions. It follows that

1^T x^{K₂} = 1^T x^{K₁} − 1^T θ = 1^T p.

Since y_i^t is constantly updated by (3), we have

1^T y^{K₂} = 1^T A^{K₂−1} y^{K₂−1} = 1^T y^{K₂−1} = ... = 1^T y^0.

Later on, agents continue to update x_i^t and y_i^t by (3).
Based on the convergence of the push-sum consensus protocol, we conclude that the exact average can still be achieved, i.e.,

lim_{t→∞} p_i^t(k) = lim_{t→∞} x_i^t(k) / y_i^t = (1^T x^{K₂}) / (1^T y^{K₂}) = (1^T p) / (1^T y^0) = (1/N) ∑_{j=1}^{N} p_j(k) = p̄(k).

Note that this result holds for every k = 1, ..., m + 1. Therefore, the limit value of p_i^t is p̄, i.e., (26) holds.

• Step 2: Proof of the Sufficiency
Next, we verify the effectiveness of the stopping criterion (16). Note that p_i^t = x_i^t / y_i^t, ∀t ∈ N. The push-sum-based update of x_i^t in (3) can be transformed into

p_i^{t+1} = ∑_{j=1}^{N} w_{ij}^t p_j^t,  where w_{ij}^t = a_{ij}^t y_j^t / y_i^{t+1}.

It follows from (3) and (4) that W^t ≜ (w_{ij}^t)_{N×N} is row stochastic, i.e.,

∑_{j=1}^{N} w_{ij}^t = 1,  0 ≤ w_{ij}^t ≤ 1,  ∀i, j = 1, ..., N, ∀t.

Hence, we have

p_i^{t+1}(k) = ∑_{j=1}^{N} w_{ij}^t p_j^t(k) ≤ ∑_{j=1}^{N} w_{ij}^t max_{j∈V} p_j^t(k) = max_{j∈V} p_j^t(k),  ∀k = 1, ..., m + 1, ∀i ∈ V.

Let M^t(k) ≜ max_{i∈V} p_i^t(k) and m^t(k) ≜ min_{i∈V} p_i^t(k). It follows that

M^{t+1}(k) ≤ M^t(k),  m^{t+1}(k) ≥ m^t(k).

It has been proven that lim_{t→∞} p_i^t(k) = p̄(k), ∀i ∈ V. Hence,

lim_{t→∞} M^t(k) = p̄(k),  lim_{t→∞} m^t(k) = p̄(k).

Since the sequences (M^t(k))_{t∈N} and (m^t(k))_{t∈N} are non-increasing and non-decreasing, respectively, we have

m^t(k) ≤ p̄(k) ≤ M^t(k),  ∀t ∈ N.

Note that the max/min consensus protocols converge within U iterations. When the agents terminate the iterations at time K, we have

r_i^K(k) − s_i^K(k) = M^{K'}(k) − m^{K'}(k),

where K' ≜ K − U. The satisfaction of the stopping criterion (16) then implies that

|p_i^K(k) − p̄(k)| ≤ M^K(k) − m^K(k) ≤ r_i^K(k) − s_i^K(k) ≤ δ,  ∀i, k.

B. Proof of Theorem 2

Proof.
The proof is rather similar to that of Theorem 4 in [17]. We provide a sketch of the main steps here. The key idea is to prove the closeness between p_i^K(x) and f(x) on the entire X = [a, b]. Then, their optimal values are also close enough (see [17, Lemma 3]). Note that p_i^K(x) and p̄(x) are in the form of (5), with the coefficients stored in p_i^K and p̄, respectively. It follows from (21) that

|p_i^K(x) − p̄(x)| = | ∑_{j=0}^{m} (p_i^K(j) − p̄(j)) T_j( (2x − (a + b)) / (b − a) ) |
  ≤ ∑_{j=0}^{m} |p_i^K(j) − p̄(j)| · 1 ≤ (m + 1) ‖p_i^K − p̄‖_∞ ≤ δ(m + 1) = ǫ₁,  ∀x ∈ [a, b],

where the first inequality is based on |T_j(u)| ≤ 1, ∀u ∈ [−1, 1]. Note that p̄ is the average of all the p_i. Hence, p̄(x) is also the average of all the p_i(x). Based on (7), we have

|p̄(x) − f(x)| = | (1/N) ∑_{i=1}^{N} (p_i(x) − f_i(x)) | ≤ (1/N) ∑_{i=1}^{N} |p_i(x) − f_i(x)| ≤ (1/N) · Nǫ₁ = ǫ₁,  ∀x ∈ [a, b].

Given that ǫ₁ = ǫ₂ = ǫ/3, we have

|p_i^K(x) − f(x)| ≤ |p_i^K(x) − p̄(x)| + |p̄(x) − f(x)| ≤ ǫ₁ + ǫ₁ = (2/3)ǫ,  ∀x ∈ [a, b].

Let p* be the optimal value of p_i^K(x) on X = [a, b]. It follows from [17, Lemma 3] that

|p* − f*| ≤ (2/3)ǫ.

Note that p* ≤ f*_e ≤ p* + ǫ₂ = p* + ǫ/3. Hence,

f* − (2/3)ǫ ≤ p* ≤ f*_e ≤ p* + ǫ/3 ≤ f* + ǫ,

which leads to |f*_e − f*| ≤ ǫ.

C. Proof of Theorem 3

Proof.
We consider the estimation of p_i(k), ∀k = 1, ..., m_i + 1. Suppose that at the t_k-th iteration, agent i inserts the perturbed state p̃_i(k) by (11). Note that the estimate p̂_i(k) of p_i(k) can be calculated at three types of time, i.e., before t_k, at t_k, and after t_k. We discuss each of these scenarios in detail as follows.

At time t < t_k, p̃_i(k) has not been inserted yet. What the adversaries have collected are either null values or combinations of the perturbed states of agent i's neighbors. Since there is not any available information on p_i(k) that can serve as a basis for the estimation, by (22), we have

Pr{ |p̂_i(k) − p_i(k)| ≤ α_k | I_i^t } ≤ γ.

At time t = t_k, p̃_i(k) is inserted. By Assumption 4, the probability that the adversaries acquire the full knowledge of I_i^{in,t_k−1} is not more than p. If this is the case, based on (11) and (12), they can easily calculate p̃_i(k) by

p̃_i(k) = x_i^{t_k+}(k) − ∑_{j ∈ N_i^{in,t_k−1}} a_{ij}^{t_k−1} x_j^{(t_k−1)+}(k).   (27)

Note that p̃_i(k) = p_i(k) + θ_i(k). Hence, after an estimate θ̂_i(k) of θ_i(k) is obtained, p̂_i(k) is calculated by p̂_i(k) = p̃_i(k) − θ̂_i(k). Therefore, we have

Pr{ |p̂_i(k) − p_i(k)| ≤ α_k | I_i^{t_k} } = Pr{ |θ̂_i(k) − θ_i(k)| ≤ α_k | I_i^{t_k} }
  = Pr{ θ_i(k) ∈ [θ̂_i(k) − α_k, θ̂_i(k) + α_k] | I_i^{t_k} }
  = ∫_{θ̂_i(k)−α_k}^{θ̂_i(k)+α_k} f_{θ_i(k)}(y | I_i^{t_k}) dy
  ≤ max_{ν∈Θ} ∫_{ν−α_k}^{ν+α_k} f_{θ_i(k)}(y) dy,   (28)

where θ̂_i(k) ∈ Θ. However, if the adversaries can only access part of I_i^{t_k−1}, they are unable to calculate x_i^{t_k}(k) by (12) and then to recover p̃_i(k) by (27). Note that

x_i^{t_k+}(k) = x_i^{t_k}(k) + p̃_i(k) = x_i^{t_k}(k) + θ_i(k) + p_i(k).
Hence, in this case, they need to first obtain an estimate η̂_i(k) of x_i^{t_k}(k) + θ_i(k), and then calculate p̂_i(k) by p̂_i(k) = x_i^{t_k+}(k) − η̂_i(k). According to (12), x_i^{t_k}(k) is a linear combination of the states x_j^{(t_k−1)+} for j ∈ N_i^{in,t_k−1}. These states depend on some p̃_l(k) and thus also on some θ_l(k), where l ∈ V. Note that the adversaries only have partial knowledge of I_i^{in,t_k−1} and know only part of these states. Hence, there exist certain independent random variables, i.e., the θ_l(k), of which the adversaries do not own any prior or relevant knowledge. As a result, by (22), it is hard to estimate x_i^{t_k}(k) with high precision. It follows that

Pr{ |p̂_i(k) − p_i(k)| ≤ α_k | I_i^{t_k} }
  = Pr{ |η̂_i(k) − (x_i^{t_k}(k) + θ_i(k))| ≤ α_k | I_i^{t_k} }
  ≤ Pr{ η̂_i(k) − x_i^{t_k}(k) ∈ [θ_i(k) − α_k, θ_i(k) + α_k] | I_i^{t_k}, θ_i(k) }
  ≤ γ.   (29)

Combining (28) and (29), we have

Pr{ |p̂_i(k) − p_i(k)| ≤ α_k | I_i^{t_k} } ≤ p · max_{ν∈Θ} ∫_{ν−α_k}^{ν+α_k} f_{θ_i(k)}(y) dy + γ ≜ h_i(α_k).   (30)

At time t > t_k, the adversaries can estimate p_i(k) either by the same rule as that adopted at time t = t_k, or by a new rule based on the newly available information. In the former case, we still obtain (30). We now discuss the latter case in detail. We first consider the time t = t_k + 1.
Note that
$$\begin{aligned}
\frac{x_i^{(t_k+1)+}(k)}{a_{ii}^{t_k}} &= \frac{x_i^{t_k+1}(k)}{a_{ii}^{t_k}} = x_i^{t_k+}(k) + \frac{1}{a_{ii}^{t_k}} \Big( \sum_{j \in \mathcal{N}_i^{\mathrm{in}, t_k} \setminus \{i\}} a_{ij}^{t_k} x_j^{t_k+}(k) - \tau_{i, t_k+1}(k) \Big) \\
&= p_i(k) + \theta_i(k) + x_i^{t_k}(k) + \frac{1}{a_{ii}^{t_k}} \Big( \sum_{j \in \mathcal{N}_i^{\mathrm{in}, t_k} \setminus \{i\}} a_{ij}^{t_k} x_j^{t_k+}(k) - \tau_{i, t_k+1}(k) \Big) \\
&= p_i(k) + \theta_i(k) + \theta_i'(k), \quad (31)
\end{aligned}$$
where $\tau_{i,t}(k) = \zeta_i(k)$ if $t \in \mathcal{X}_{i,k}$, i.e., when the noises are subtracted, and $\tau_{i,t}(k) = 0$ otherwise. If the full knowledge of $\mathcal{I}_i^{\mathrm{in}, t_k}$ is available, the adversaries can not only collect all the $x_j^{t_k+}$ for $j \in \mathcal{N}_i^{\mathrm{in}, t_k}$, but also accurately infer $\tau_{i, t_k+1}(k)$ by
$$\tau_{i, t_k+1}(k) = \sum_{j \in \mathcal{N}_i^{\mathrm{in}, t_k}} a_{ij}^{t_k} x_j^{t_k+}(k) - x_i^{(t_k+1)+}(k).$$
Hence, $\theta_i'(k)$ is a deterministic constant. In this case, by using (31), we still have
$$\Pr\big\{ |\hat{p}_i(k) - p_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{t_k+1} \big\} = \Pr\big\{ |\hat{\theta}_i(k) - \theta_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{t_k+1} \big\}.$$
Next, we analyze the disclosure probability of $\theta_i(k)$ given $\mathcal{I}_i^{t_k+1}$. The newly available information, i.e., the subtracted noise $\zeta_i(k)$, allows for another means of inferring $\theta_i(k)$. We now show that the resulting disclosure probability is rather small when $L$ is drawn from an unknown distribution. Note that $\zeta_i(k) = \theta_i(k)/L > \alpha_k$. Hence,
$$\Pr\big\{ |\hat{p}_i(k) - p_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{t_k+1} \big\} = \Pr\big\{ |\hat{\theta}_i(k) - \theta_i(k)| \le \alpha_k \,\big|\, \zeta_i(k) \big\} = \Pr\big\{ |\hat{L} - L| \cdot \zeta_i(k) \le \alpha_k \,\big|\, \zeta_i(k) \big\} = \Pr\big\{ \hat{L} = L \,\big|\, \zeta_i(k) \big\} \le \gamma,$$
where $\hat{L}$ is an estimation of $L$, and the last inequality follows from (22).
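The separate-subtraction step behind $\tau_{i,t}(k)$ can be sketched as follows; the concrete values of $\theta_i(k)$, $L$, and the candidate range for $\hat{L}$ are illustrative assumptions, not values from the paper. The point is that the $L$ shares $\zeta_i(k)$ cancel $\theta_i(k)$ exactly over the subtraction window, while a single observed share remains consistent with one hypothesis $\hat{L} \cdot \zeta_i(k)$ for every candidate $\hat{L}$:

```python
import random

# Sketch of the separate subtractions in (31): the added noise
# theta_i(k) is removed in L equal shares zeta_i(k) = theta_i(k) / L,
# i.e., tau_{i,t}(k) = zeta_i(k) for the L steps t in X_{i,k}.
random.seed(7)
theta = random.uniform(1.0, 4.0)  # added noise theta_i(k) (illustrative)
L = 5                             # number of subtraction steps, unknown to adversaries
zeta = theta / L                  # per-step subtracted share

# the shares cancel theta exactly over the subtraction window
residual = theta - sum(zeta for _ in range(L))
assert abs(residual) < 1e-12

# an eavesdropper who observes zeta but not L is left with one
# consistent hypothesis L_hat * zeta per candidate value L_hat
hypotheses = sorted({round(L_hat * zeta, 12) for L_hat in range(2, 9)})
print(len(hypotheses))  # 7 distinct candidates for theta_i(k)
```

Without any prior on $L$, none of these hypotheses can be preferred, which is the informal content of $\Pr\{\hat{L} = L \mid \zeta_i(k)\} \le \gamma$.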
Thus, the disclosure probability will not exceed the upper bound in (28), i.e.,
$$\Pr\big\{ |\hat{p}_i(k) - p_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{t_k+1} \big\} = \Pr\big\{ |\hat{\theta}_i(k) - \theta_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{t_k+1} \big\} \le \max_{\nu \in \Theta} \int_{\nu - \alpha_k}^{\nu + \alpha_k} f_{\theta_i(k)}(y) \,\mathrm{d}y. \quad (32)$$
If the full knowledge of $\mathcal{I}_i^{\mathrm{in}, t}$ is unavailable, then $\theta_i'(k)$ contains those independent random variables whose relevant information is unknown to the adversaries. Specifically, if $t_k + 1 \le K_1$, those variables refer to certain added noises $\theta_l(k)$ that are included in $x_l^{t_k+}(k)$, where $l \in \mathcal{V}$; otherwise, they refer to certain subtracted noises $\zeta_l(k)$ for some $l \in \mathcal{V}$. Thus, it follows from (22) that
$$\Pr\big\{ |\hat{p}_i(k) - p_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{t_k+1} \big\} \le \gamma. \quad (33)$$
Combining (32) and (33), we have
$$\Pr\big\{ |\hat{p}_i(k) - p_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{t_k+1} \big\} \le p \max_{\nu \in \Theta} \int_{\nu - \alpha_k}^{\nu + \alpha_k} f_{\theta_i(k)}(y) \,\mathrm{d}y + \gamma = h_i(\alpha_k). \quad (34)$$
A similar analysis can be performed for any other $t \ge t_k + 1$, $t \in \mathbb{N}$. However, for $t \ge K_2$, there exists an extreme case where the adversaries successfully obtain the full knowledge of $\mathcal{I}_i^{\mathrm{in}, t}$ from time $t = K_1 + 1$ to time $t = K_2$. In this case, they can acquire $\tau_{i,t}(k)$ and perfectly infer $\theta_i(k)$ by
$$\theta_i(k) = \sum_{t = K_1 + 1}^{K_2} \tau_{i,t}(k).$$
Hence, the exact value of $p_i(k)$ can be inferred, and
$$\Pr\big\{ |\hat{p}_i(k) - p_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{K_2} \big\} = \Pr\big\{ |\hat{\theta}_i(k) - \theta_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{K_2} \big\} = 1.$$
The probability that such an extreme case happens is not more than $p^{K_2 - K_1}$. Therefore, we have
$$\Pr\big\{ |\hat{p}_i(k) - p_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{t} \big\} \le \beta_k,$$
where
$$\beta_k = \big(1 - p^{K_2 - K_1}\big) h_i(\alpha_k) + p^{K_2 - K_1} \quad (35)$$
for any $k = 1, \ldots, m_i + 1$ and $t \in \mathbb{N}$. It is easy to verify that $\beta_k$ is larger than the RHS of (30).
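A quick numeric sanity check of (30) and (35) with placeholder constants (the eavesdropping probability $p$, baseline $\gamma$, maximum window mass, and window length $K_2 - K_1$ below are illustrative values, not from the paper):

```python
# Placeholder constants for illustrating h_i(alpha_k) and beta_k.
p, gamma, q = 0.3, 0.05, 0.4  # eavesdrop prob., baseline bound, max window mass
K_diff = 6                    # K_2 - K_1, length of the subtraction window

h = p * q + gamma                              # h_i(alpha_k) as in (30)
beta_k = (1 - p ** K_diff) * h + p ** K_diff   # bound (35)

# beta_k dominates h_i(alpha_k), the RHS of (30), and stays below 1
assert h < beta_k < 1
print(round(h, 4), round(beta_k, 4))
```

Since $h_i(\alpha_k) < 1$, the convex combination in (35) always sits strictly between $h_i(\alpha_k)$ and $1$, consistent with the closing remark above.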
It follows that
$$\Pr\big\{ \|\hat{p}_i - p_i\| \le \alpha \,\big|\, \mathcal{I} \big\} = \prod_{k=1}^{m_i+1} \Pr\big\{ |\hat{p}_i(k) - p_i(k)| \le \alpha_k \,\big|\, \mathcal{I}_i^{t} \big\} \le \prod_{k=1}^{m_i+1} \beta_k = \beta, \quad (36)$$
where $\beta$ is given by (24).
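The factorized bound in (36) can be checked numerically with placeholder per-coefficient bounds $\beta_k$ (the list below is illustrative, not values from the paper):

```python
# Illustration of (36): independence across the m_i + 1 coefficients
# lets the joint disclosure probability factor into per-coefficient
# bounds beta_k; the sample values below are placeholders.
betas = [0.17, 0.22, 0.15, 0.30]  # sample beta_k for k = 1, ..., m_i + 1

beta = 1.0
for b_k in betas:
    beta *= b_k

# the joint bound is no larger than any single per-coefficient bound
assert beta <= min(betas)
print(round(beta, 6))  # 0.001683
```

This shrinkage under the product is what makes estimating the whole coefficient vector $p_i$ within accuracy $\alpha$ much harder than estimating any single coefficient.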