Decentralized Computation of Effective Resistances and Acceleration of Consensus Algorithms
Necdet Serhat Aybat∗, Department of IME, Pennsylvania State University, University Park, PA, [email protected]
Mert Gürbüzbalaban†, Department of MSIS, Rutgers Business School, Piscataway, NJ, [email protected]
ABSTRACT
The effective resistance between a pair of nodes in a weighted undirected graph is defined as the potential difference induced between them when a unit current is injected at the first node and extracted at the second node, treating edge weights as the conductance values of edges. The effective resistance is a key quantity of interest in many applications and fields, including solving linear systems, Markov chains, and continuous-time averaging networks. We develop an efficient, linearly convergent distributed algorithm for computing effective resistances and demonstrate its performance through numerical studies. We also apply our algorithm to the consensus problem, where the aim is to compute the average of node values in a distributed manner. We show that the distributed algorithm we developed for effective resistances can be used to accelerate the convergence of the classical consensus iterations considerably, by a factor depending on the network structure.
Index Terms — Effective resistance, graph, distributed optimization, consensus, Laplacian matrix, Kaczmarz method
1. INTRODUCTION
Let G = (N, E, w) be an undirected, weighted, and connected graph defined by the set of nodes (agents) N = {1, . . . , n}, the set of edges E ⊂ N × N, and the edge weights w_ij > 0 for (i, j) ∈ E. Since G is undirected, we assume that both (i, j) and (j, i) refer to the same edge when it exists, and for all (i, j) ∈ E, we set w_ji = w_ij. Identifying the weighted graph G as an electrical network in which each edge (i, j) corresponds to a branch of conductance w_ij, the effective resistance R_ij between a pair of nodes i and j is defined as the voltage potential difference induced between them when a unit current is injected at i and extracted at j.

The effective resistance, also known as the resistance distance, is a key quantity of interest to compute in many applications and algorithmic questions over graphs. It defines a metric on graphs, providing bounds on its conductance [1, 2]. Furthermore, it is closely associated with the hitting time and commute time of a random walk on the graph G in which the probability of a transition from i to j* ∈ N_i is w_{ij*} / Σ_{j∈N_i} w_ij, where N_i ≜ {j ∈ N : (i, j) ∈ E} denotes the set of neighboring nodes of i ∈ N; therefore, it arises naturally in the study of random walks over graphs and their mixing-time properties [3, 4, 5], and of continuous-time averaging networks including consensus problems in distributed optimization [3].

∗ Research of N. S. Aybat was partially supported by NSF grants CMMI-1400217 and CMMI-1635106, and ARO grant W911NF-17-1-0298.
† Research of M. Gürbüzbalaban was partially supported by NSF grant DMS-1723085.
Other prominent applications include distributed control and estimation [6], solving symmetric diagonally dominant (SDD) linear systems [7], deriving complexity bounds for the Asymmetric Traveling Salesman Problem (ATSP) [8], design and control of communication networks [9, 10], and spectral sparsification of graphs [11].

There exist centralized algorithms for computing or approximating {R_ij}_{i≠j} accurately which require global communication beyond local communication among the neighboring agents [7, 12]. They are based on computing or approximating the entries of the pseudoinverse L† of the Laplacian matrix, using the identity R_ij = L†_ii + L†_jj − 2 L†_ij [7]. However, such centralized algorithms are impractical or infeasible for several key applications in multi-agent systems where only local communications between the neighboring agents are allowed; this motivates the development of distributed algorithms for computing effective resistances, which are used in solving many optimization and estimation problems over graphs. Prominent examples include least-squares regression and more general estimation problems over graphs, formation control of moving agents with noisy measurements, and stability of multi-vehicle swarms [6].

(The hitting time H_ij is the expected number of steps of a random walk starting from i until it first visits j; the commute time C_ij is the expected number of steps required to go from i to j and back again.)

To our knowledge, there has been no systematic study of distributed algorithms for computing effective resistances. In this work, we discuss how existing algorithms in the distributed optimization literature for solving linear systems can be adapted to solve this problem. First, we show that a naive implementation of consensus optimization methods, e.g., the EXTRA algorithm [13], is inefficient in terms of convergence and communication requirements. Second, we propose
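For reference, the identity R_ij = L†_ii + L†_jj − 2 L†_ij is straightforward to evaluate in a centralized way; below is a minimal NumPy sketch (the helper functions and the path-graph example are our illustrative assumptions, not the paper's distributed method).

```python
import numpy as np

# Centralized reference computation: build a weighted Laplacian and read
# off all effective resistances via R_ij = L+_ii + L+_jj - 2 L+_ij.

def laplacian(n, weighted_edges):
    """Weighted Laplacian from (i, j, w_ij) triples."""
    L = np.zeros((n, n))
    for i, j, w in weighted_edges:
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def effective_resistances(L):
    Lp = np.linalg.pinv(L)                       # pseudoinverse L+
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2.0 * Lp    # matrix of R_ij

# Sanity check on the path graph 0-1-2 with unit weights:
# resistances add in series, so R_02 = R_01 + R_12 = 2.
L = laplacian(3, [(0, 1, 1.0), (1, 2, 1.0)])
R = effective_resistances(L)
```

This requires assembling the full Laplacian at one location, which is exactly the global-communication requirement the paper's distributed method avoids.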
a variant of the Kaczmarz method and show that it is linearly convergent while being efficient in terms of the total number of local communications carried out. Third, we demonstrate the performance of our algorithms on numerical examples. In particular, numerical experiments suggest finite convergence of our algorithms, which is of independent interest. Finally, we apply our results to the consensus problem [14], where the aim is to compute the average of values assigned to each node in a distributed manner. Specifically, we propose a variant of the classical asynchronous consensus protocol and show that we can accelerate the convergence considerably by a factor depending on the underlying network. The main idea is to use the distributed algorithm we developed for effective resistances to design a weight matrix which can help pass information among neighbors more effectively; an alternative approach in [15] also builds on modifying the weights depending on the degrees of the neighbors. Since consensus iterations are the building block of many core distributed optimization algorithms, such as the distributed subgradient, distributed proximal-gradient, and ADMM methods, we believe that our method and framework have far-reaching potential for accelerating many other distributed algorithms in addition to consensus algorithms; this will be the subject of future work. Outline.
In Section 2, we introduce our algorithm for computing effective resistances. In Section 3, we provide numerical results; finally, in Section 4, we give a summary of our results and discuss future work.
Notation.
Let d_i ≜ |N_i| denote the degree of i ∈ N, and m ≜ |E|. Throughout the paper, L ∈ R^{|N|×|N|} denotes the weighted Laplacian of G, i.e., L_ii = Σ_{j∈N_i} w_ij, L_ij = −w_ij if j ∈ N_i, and L_ij = 0 otherwise. The set S^n denotes the set of n × n real symmetric matrices. We use the notation Z = [z_i]_{i=1}^n, where the z_i are either the columns or the rows of the matrix Z depending on the context. 1 is the column vector with all entries equal to 1, and I is the identity matrix.
2. METHODOLOGY
Clearly, L is symmetric and positive semidefinite; and since G is connected, the nullspace of L is spanned by 1. In particular, consider the eigenvalue decomposition L = Σ_{i=1}^n λ_i u_i u_i^T; we have 0 = λ_1 < λ_2 ≤ . . . ≤ λ_n and u_1 = (1/√n) 1. Recall that we would like to compute L† = Σ_{i=2}^n (1/λ_i) u_i u_i^T in a decentralized way. First, we describe a naive way to solve this problem, which converges at a linear rate but requires storing and communicating n × n matrices among the neighboring nodes. Next, we discuss how L† can be computed in a distributed way using the randomized Kaczmarz (RK) method with a significantly smaller communication burden.

A naive approach for computing L†: Let θ ≥ λ_2 and define L̄ ≜ L + (θ/n) 1 1^T, i.e., L̄ = θ u_1 u_1^T + Σ_{i=2}^n λ_i u_i u_i^T; hence, L̄^{-1} = L† + (1/θ) u_1 u_1^T. To compute L̄^{-1}, consider solving

  (P):  min_{X ∈ S^n}  f(X) ≜ ||L̄ X − I||_F².

Note that f is strongly convex with modulus λ_2² since θ ≥ λ_2; moreover, such a θ can be chosen easily in certain cases. For instance, for unweighted G, i.e., w_ij = 1 for (i, j) ∈ E, it is known that λ_2 ≤ min_{i∈N} d_i; hence, θ can be chosen after running a min-consensus algorithm over G. To solve (P) in a decentralized manner, we will exploit the connectivity of G. Let ℓ̄_i ∈ R^n be a column vector for i ∈ N such that L̄ = [(ℓ̄_i)^T]_{i∈N}, i.e., (ℓ̄_i)^T denotes the i-th row of L̄. (P) can be equivalently written as follows:

  (P'):  min_{X_i ∈ S^n, i∈N}  { Σ_{i∈N} ||X_i ℓ̄_i − e_i||²  :  X_i = X_j  ∀ (i, j) ∈ E },

where e_i denotes the i-th standard basis vector of R^n. Although this problem is not strongly convex in [X_i]_{i∈N}, there is a way to regularize the objective f̄([X_i]_{i∈N}) ≜ Σ_{i∈N} ||X_i ℓ̄_i − e_i||² to make it strongly convex.
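The identity L̄^{-1} = L† + (1/θ) u_1 u_1^T behind (P) is easy to verify numerically; in the sketch below, the 4-cycle graph and the choice θ = 2 (which equals λ_2 for this graph) are illustrative assumptions.

```python
import numpy as np

# Numerical check of the identity behind (P): with
# Lbar = L + (theta/n) 1 1^T and theta >= lambda_2, we should have
# inv(Lbar) = pinv(L) + (1/theta) u_1 u_1^T, where u_1 u_1^T = (1/n) 1 1^T.

n = 4
A = np.zeros((n, n))
for i in range(n):                               # cycle 0-1-2-3-0
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A                   # Laplacian of C_4

theta = 2.0                                      # C_4 spectrum is {0, 2, 2, 4}
J = np.ones((n, n)) / n                          # u_1 u_1^T
Lbar = L + theta * J
lhs = np.linalg.inv(Lbar)                        # Lbar is nonsingular
rhs = np.linalg.pinv(L) + J / theta
```

Since L̄ shares the eigenvectors of L and merely replaces the zero eigenvalue by θ, the two sides agree up to floating-point roundoff.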
Indeed, it can be shown that for α > 0 sufficiently large, f̄_α ≜ f̄ + α r is strongly convex in [X_i]_{i∈N}, where r([X_i]_{i∈N}) ≜ Σ_{(i,j)∈E} ||X_i − X_j||_F²; and one can equivalently consider min { f̄_α([X_i]_{i∈N}) : X_i = X_j ∀ (i, j) ∈ E }. In particular, the EXTRA algorithm in [13] exploits a similar restricted strong convexity argument and achieves a linear convergence rate for the iterate sequence. That said, communication overhead is the main problem with this approach to solving (P'). In fact, at each iteration k, each node i ∈ N communicates its local estimate X_i^k to its neighbors N_i; thus, each iteration of these consensus-based methods, e.g., EXTRA, would require O(2|E| n²) real-variable communications in total. Next, we discuss the distributed implementation of the RK method to compute L†, which will prove to be a more communication-efficient and practical method.

Computing L† via randomized Kaczmarz: Consider a consistent system A x = b, where A = [a_i^T]_{i=1}^m ∈ R^{m×n} and b ∈ R^m. Suppose A has no rows that are all zeros, and let x* = argmin {||x|| : A x = b}. In [16], it is shown that x* can be computed using a randomized Kaczmarz method. In particular, it follows from the results in [16] that, starting from x^0 ∈ Null(A)^⊥, the method displayed in Algorithm 1 produces {x^k}_{k≥0} such that E[||x^k − x*||²] ≤ ρ^k ||x^0 − x*||² for k ≥ 0, with ρ ≜ 1 − λ_min^+(A^T H A), where λ_min^+(·) denotes the smallest positive eigenvalue and H = Σ_{i=1}^m (p_i/||a_i||²) e_i e_i^T; furthermore, 1 − 1/rank(A) ≤ ρ < 1. Note that fixing p_i = ||a_i||²/||A||_F² gives the randomized Kaczmarz method in [17, 18].
Algorithm 1: RK({p_i}_{i=1}^m) – Randomized Kaczmarz
Initialization: x^0 ∈ Null(A)^⊥
for k ≥ 0 do
  Pick i ∈ {1, . . . , m} with probability p_i
  x^{k+1} ← x^k − ((a_i^T x^k − b_i)/||a_i||²) a_i

Note that L L† = Σ_{i=2}^n u_i u_i^T and I = Σ_{i=1}^n u_i u_i^T; hence, L L† = I − u_1 u_1^T = I − (1/n) 1 1^T. Although the solution set {X ∈ S^n : L X = I − (1/n) 1 1^T} has infinitely many elements, it is well known that L† is the unique solution to

  L† = argmin_{X ∈ S^n} { ||X||_F : L X = B },   (1)

where B ≜ I − (1/n) 1 1^T. Let x^l, b^l ∈ R^n for l ∈ N be column vectors such that X = [x^l]_{l∈N} and B = [b^l]_{l∈N}, i.e., b^l = e_l − (1/n) 1. Note that the n columns of L† can be computed in parallel:

  x_*^l ≜ argmin_{x ∈ R^n} { ||x|| : L x = b^l },  l ∈ N,   (2)

i.e., L† = [x_*^l]_{l∈N}. Since L† 1 = 0, we have x_*^n = −Σ_{l=1}^{n−1} x_*^l. Thus, one does not need to solve (2) for all l ∈ N; it suffices to compute {x_*^l}_{l∈N\{n}} and calculate x_*^n from these.

Let {x^{l,k}}_{k≥0} be the sequence generated when RK is implemented on (2) for l ∈ N \ {n}. In Algorithm 2, we summarize the distributed nature of the RK steps, assuming that each i ∈ N has an exponential clock with rate r_i > 0; when its clock ticks, node i wakes up and communicates with its neighbors j ∈ N_i on G. More precisely, consider the resulting superposition of these point processes, and let {t_k}_{k∈Z_+} be the times at which one of the clocks ticks; hence, for all k ≥ 0, the node that wakes up at time t_k is node i with probability p_i = r_i / Σ_{j∈N} r_j, i.e., {t_k}_{k≥0} denotes the arrival times of a Poisson process with rate Σ_{j∈N} r_j.
Algorithm 2: D-RK({r_i}_{i∈N}) – Decentralized RK
Initialization: x_i^{l,0} ← 0 for l ∈ N \ {n} and i ∈ N
for k ≥ 0 do
  At time t_k, i ∈ N wakes up w.p. p_i = r_i / Σ_{j∈N} r_j
  for l ∈ N \ {n} do
    Node i requests and receives x_j^{l,k} from j ∈ N_i
    Node i computes and sends q_i^{l,k} to all j ∈ N_i:
      q_i^{l,k} = (Σ_{j∈N_i∪{i}} L_ij x_j^{l,k} − b_i^l) / Σ_{j∈N_i∪{i}} L_ij²
    Each j ∈ N_i ∪ {i} updates x_j^{l,k+1} ← x_j^{l,k} − L_ij q_i^{l,k}

For k ≥ 0, let X^k ≜ [x^{l,k}]_{l∈N} be the concatenation of the D-RK sequence, where x^{n,k} ≜ −Σ_{l=1}^{n−1} x^{l,k}, and define S = diag(s) such that s_i ≜ Σ_{j∈N_i∪{i}} L_ij² for i ∈ N. According to [16, 17], for r_i = s_i we get H = (1/||L||_F²) I, and this implies linear convergence of {X^k}_{k≥0} to L† with rate ρ = 1 − (λ_min^+(L)/||L||_F)², i.e., E[||X^k − L†||_F²] ≤ ρ^k ||L†||_F² for k ≥ 0. Moreover, when node i ∈ N wakes up, D-RK requires 2 d_i (n−1) communications, where in each communication i sends/receives one real variable to/from a neighboring node in N_i; hence, at each iteration, i.e., each time a node wakes up, the expected number of communications is N̄ = Σ_{i∈N} p_i 2 d_i (n−1) ≤ 2 d_max (n−1). In particular, for unweighted graphs, i.e., w_ij = 1 for (i, j) ∈ E, we have p_i = d_i (d_i + 1) / (2m + Σ_{j∈N} d_j²) for i ∈ N.

Next, instead of (1), consider implementing D-RK on the normalized system S^{-1/2} L X = S^{-1/2} B to obtain a better convergence rate in practice; the i-th equation in this normalized system can be formed locally at i ∈ N. For this system, in which all rows have unit norm, one can set r_i = r for some r > 0 for all i ∈ N; hence, nodes wake up with uniform probability p_i = 1/n for i ∈ N. For this choice of equal clock rates, H = (1/n) I and {X^k}_k converges linearly to L† with rate ρ_S ≜ 1 − (1/n) λ_min^+(L S^{-1} L).
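The column-wise strategy in (2), together with the sum trick for the last column, can be prototyped serially; the sketch below is an illustrative stand-in for Algorithm 2 (not a true message-passing implementation), sampling rows of L with p_i proportional to s_i as in the text. The graph and iteration budget are assumptions for illustration.

```python
import numpy as np

# Serial prototype of (2): run row-sampled Kaczmarz on
# L x = b_l = e_l - (1/n) 1 for l = 1..n-1, with row probabilities
# p_i proportional to s_i = sum_j L_ij^2, and recover the last
# column from L+ 1 = 0.

def kaczmarz_column(L, b, iters, rng):
    s = (L * L).sum(axis=1)                   # s_i = squared norm of row i
    p = s / s.sum()                           # p_i proportional to s_i
    x = np.zeros(L.shape[0])                  # x^0 = 0 lies in Range(L)
    for _ in range(iters):
        i = rng.choice(len(s), p=p)
        x -= (L[i] @ x - b[i]) / s[i] * L[i]  # project onto i-th hyperplane
    return x

def pinv_laplacian_by_columns(L, iters=4000, seed=0):
    rng = np.random.default_rng(seed)
    n = L.shape[0]
    cols = [kaczmarz_column(L, np.eye(n)[l] - 1.0 / n, iters, rng)
            for l in range(n - 1)]
    cols.append(-np.sum(cols, axis=0))        # x^n = -(x^1 + ... + x^{n-1})
    return np.column_stack(cols)

# Path graph 0-1-2: the result should match numpy's pseudoinverse.
L = np.array([[1.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 1.0]])
Lp = pinv_laplacian_by_columns(L)
```

In the distributed algorithm, each hyperplane projection is carried out jointly by a woken node and its neighbors; the serial loop above performs the same sequence of projections at one location.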
Moreover, the expected number of communications per iteration is N̄ = 4m(n−1)/n ≤ 4m. In all experiments on small-world random networks (see the definition in the numerical section), D-RK implemented on the normalized system worked much better than implementing it directly on (1) (see Fig. 1). We conjecture that for certain families of random graphs,

  (1/n) λ_min^+(L S^{-1} L) ≥ (λ_min^+(L)/||L||_F)²   (3)

holds with high probability, which would directly imply ρ_S ≤ ρ, i.e., that D-RK on the normalized system is faster.
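The two rates can be compared numerically on any given graph; the sketch below does so on an assumed ring-plus-random-chords instance (checking one instance, of course, does not verify the conjecture).

```python
import numpy as np

# One-instance numerical comparison of the two rates:
#   rho   = 1 - (lambda_min^+(L) / ||L||_F)^2      (D-RK on (1))
#   rho_S = 1 - (1/n) lambda_min^+(L S^{-1} L)     (normalized system)

def lam_min_pos(M, tol=1e-9):
    """Smallest positive eigenvalue of a symmetric PSD matrix."""
    ev = np.linalg.eigvalsh(M)
    return ev[ev > tol].min()

rng = np.random.default_rng(1)
n = 10
A = np.zeros((n, n))
for i in range(n):                                  # ring
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
for _ in range(8):                                  # random extra chords
    i, j = rng.choice(n, size=2, replace=False)
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

s = (L * L).sum(axis=1)                             # s_i = sum_j L_ij^2
rho = 1.0 - (lam_min_pos(L) / np.linalg.norm(L, 'fro')) ** 2
rho_S = 1.0 - lam_min_pos(L @ np.diag(1.0 / s) @ L) / n
```

Printing `rho` and `rho_S` for instances of interest gives a quick empirical probe of inequality (3) before attempting any formal argument.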
3. NUMERICAL EXPERIMENTS
In this section, we first provide numerical experiments showing that {R_ij}_{(i,j)∈E} can be computed very efficiently in a decentralized fashion, and second, we demonstrate the benefits of using effective resistances in consensus algorithms.

Computing L†: We tested D-RK and its normalized version on unweighted small-world-type communication networks, and we compared these randomized methods with the deterministic (cyclic) Kaczmarz method. Given positive integers n, m such that m ≥ n, let E ∈ S^n denote the adjacency matrix of the small-world network parameterized by (n, m) such that E_{i,i+1} = 1 for i = 1, . . . , n−1 and E_{1,n} = 1, and the other m − n entries are chosen uniformly at random among the remaining upper-diagonal elements of E and set to 1. We considered n ∈ {10, 20}, and for each n we chose m such that the edge density, m/(n² − n), takes one of two fixed values. For each scenario, we plot the average of log log(1 + ||X^k − L†||_F / ||L†||_F) over 100 sample paths versus k. The results show that the randomized algorithms are slower than their deterministic counterpart; this is the price to pay for asynchronous computations. D-RK applied to the normalized system was also faster than the standard D-RK, i.e., numerically we see ρ_S < ρ, as suggested by inequality (3). We also observed finite convergence on every sample path numerically; the finite number of iterations required for convergence depended on the sample path chosen, so averaging iterates over sample paths led to the smooth curves reported in Fig. 1.

Consensus: Let y^0 ∈ R^n be a vector such that the i-th component represents the initial value at node i, and let ȳ ≜ Σ_{i=1}^n y_i^0 / n be the average. In consensus algorithms, the aim is to compute ȳ at each node in a distributed manner.
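The classical primitive for this is randomized pairwise-averaging gossip [14], in which a randomly activated node averages its value with that of a random neighbor; a minimal serial simulation follows (the complete graph K_5, initial values, and iteration count are illustrative assumptions).

```python
import numpy as np

# Minimal serial simulation of randomized pairwise-averaging gossip:
# a randomly woken node averages its value with a random neighbor's.

def gossip(neighbors, y0, iters, seed=0):
    rng = np.random.default_rng(seed)
    y = np.array(y0, dtype=float)
    n = len(y)
    for _ in range(iters):
        i = rng.integers(n)                  # node i wakes up w.p. 1/n
        j = rng.choice(neighbors[i])         # picks a neighbor w.p. 1/d_i
        y[i] = y[j] = 0.5 * (y[i] + y[j])    # pairwise averaging step
    return y

n = 5
neighbors = [[j for j in range(n) if j != i] for i in range(n)]  # K_5
y0 = np.array([10.0, 0.0, 4.0, 6.0, 0.0])    # average is 4
y = gossip(neighbors, y0, iters=3000)
```

Each averaging step preserves the sum of the entries, so the iterates can only converge to the true average ȳ.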
As in Section 2, we assume that each i ∈ N has an exponential clock with rate r_i > 0; however, now, when its clock ticks at time t_k, node i wakes up and picks one of its neighbors j ∈ N_i with probability p_ij ∈ (0, 1], where Σ_{j∈N_i} p_ij = 1. Next, nodes i and j exchange their local variables y_i^k and y_j^k. We assume that each node i ∈ N knows {R_ij}_{j∈N_i}. We will compare two consensus protocols; in both, nodes operate as in Algorithm 3, but with different {p_i}_{i∈N} and {p_ij}_{j∈N_i}, i ∈ N.

Fig. 1. Performance of D-RK and normalized D-RK on small-world G; top left: (n, m) = (10, ·); top right: (n, m) = (10, ·); bottom left: (n, m) = (20, ·); bottom right: (n, m) = (20, ·).
Algorithm 3: Randomized Gossiping
Initialization: y^0 = [y_1, y_2, . . . , y_n]^T ∈ R^n
for k ≥ 0 do
  At time t_k, i ∈ N wakes up w.p. p_i
  Picks j ∈ N_i randomly w.p. p_ij
  y_i^{k+1} ← (y_i^k + y_j^k)/2,  y_j^{k+1} ← (y_i^k + y_j^k)/2
Classic Randomized Gossiping: At each iteration k, each edge (i, j) ∈ E has equal probability of being activated. If an edge (i, j) is activated at iteration k, the nodes average their decision variables y_i^k and y_j^k. This algorithm admits an asynchronous implementation; see, e.g., [14]. In our node-wake-up-based asynchronous setting, the same behavior can be achieved if each node i wakes up with equal probability p_i = 1/n, i.e., using uniform clock rates r_i = r > 0 for i ∈ N, and node i picks (i, j) w.p. p_ij = 1/d_i for all j ∈ N_i.
Randomized Gossiping with Effective Resistances: This algorithm is similar to classic randomized gossiping, with the only difference that edges are sampled with non-uniform probabilities proportional to the effective resistances {R_ij}_{(i,j)∈E}. In our node-wake-up-based asynchronous setting, the same behavior can be achieved if each node i wakes up with probability p_i = Σ_{j∈N_i} R_ij / (2 Σ_{(i,j)∈E} R_ij), i.e., setting the clock rate r_i = Σ_{j∈N_i} R_ij for i ∈ N, and node i picks (i, j) w.p. p_ij = R_ij / Σ_{j∈N_i} R_ij for all j ∈ N_i.

We compare the performance of both protocols over an unweighted barbell graph K_n − K_n with 2n nodes; such a graph is illustrated in Fig. 3. In our experiment, we set n = 20. Let N_R and N_L denote the node sets of the right and left lobes (the subgraphs isomorphic to K_n on the right and left) of the barbell graph. To initialize y^0, we sample y_i^0 from a Gaussian with mean 100 for i ∈ N_L and from a Gaussian with mean 0 for i ∈ N_R; this way the two lobes have significantly different local means. On the left of Fig. 2, we plot log log(1 + ||y^k − ȳ 1|| / |ȳ|); on the right, we plot the averages of {y_i^k}_{i∈N_L} and {y_i^k}_{i∈N_R} vs k for both protocols. The results show that randomized gossiping with effective resistances is much faster.

Fig. 2. Performance of classic vs. effective-resistance-based gossiping on the barbell K_n − K_n; left: relative error vs. k; right: averages of the left and right lobes vs. k for both protocols.

Fig. 3. Barbell graph K_n − K_n with 2n nodes.
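The sampling weights of this protocol are easy to examine on a small barbell; the sketch below uses K_5 − K_5 for brevity (the experiment above uses n = 20). The bridge is a cut edge of unit conductance, so its effective resistance is exactly 1, the largest over all edges, and the resistance-weighted protocol activates it much more often than uniform edge sampling does.

```python
import numpy as np

# Edge-sampling weights of the effective-resistance gossip protocol
# on a barbell graph K_n - K_n (illustrative size n = 5).

def barbell_edges(n):
    """Edges of K_n - K_n: two complete lobes joined by a bridge edge."""
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]            # left lobe
    edges += [(n + i, n + j) for i in range(n) for j in range(i + 1, n)]   # right lobe
    edges.append((n - 1, n))                                               # bridge
    return edges

def edge_resistances(n_nodes, edges):
    L = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    Lp = np.linalg.pinv(L)
    return {(i, j): Lp[i, i] + Lp[j, j] - 2.0 * Lp[i, j] for i, j in edges}

n = 5
edges = barbell_edges(n)
R = edge_resistances(2 * n, edges)
total = sum(R.values())
p_edge = {e: R[e] / total for e in edges}   # edge-activation probabilities
bridge = (n - 1, n)
```

Since intra-lobe edges have many parallel paths, their resistances are well below 1, and the bridge receives a sampling probability well above the uniform value 1/|E|; this is the mechanism behind the speedup observed in Fig. 2.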
4. CONCLUSIONS AND FUTURE WORK
In this work, we developed a distributed algorithm for computing effective resistances over an undirected graph G. Our method builds on an efficient, distributed, and asynchronous implementation of the Kaczmarz method for solving linear Laplacian systems L x = b. We also presented an application of our algorithm to the consensus problem.

As part of our future work, we will investigate the finite-convergence behavior suggested by the experiments. We will also further study inequality (3), which was satisfied for a wide class of random graph models in our tests. Finally, we will investigate applications of effective resistances to the wide class of distributed optimization algorithms that contain consensus-like iterations, including the distributed proximal-gradient algorithm (DPGA) and ADMM. In particular, one could design the communication matrix W for the DPGA-W method in [19] using effective resistances by setting W_ij = −R_ij for (i, j) ∈ E and W_ii = Σ_{j∈N_i} R_ij. Similarly, it would be interesting to design the communication matrix in ADMM [20] using effective resistances to improve its performance for optimization problems defined over ill-conditioned graphs.

5. REFERENCES

[1] D. J. Klein, "Resistance-distance sum rules," Croatica Chemica Acta, vol. 75, no. 2, pp. 633–649, 2002.

[2] D. J. Klein and M. Randić, "Resistance distance,"
Journal of Mathematical Chemistry, vol. 12, no. 1, pp. 81–95, 1993.

[3] A. Ghosh, S. Boyd, and A. Saberi, "Minimizing effective resistance of a graph," SIAM Review, vol. 50, no. 1, pp. 37–66, 2008.

[4] D. Aldous and J. A. Fill, Reversible Markov Chains and Random Walks on Graphs, unfinished monograph, available at ∼aldous/RWG/book.html.

[5] P. G. Doyle and J. L. Snell, Random Walks and Electric Networks, Mathematical Association of America, 1984.

[6] P. Barooah and J. P. Hespanha, "Graph effective resistance and distributed control: Spectral properties and applications," in Proceedings of the 45th IEEE Conference on Decision and Control, Dec. 2006, pp. 3479–3485.

[7] D. A. Spielman and N. Srivastava, "Graph sparsification by effective resistances," SIAM Journal on Computing, vol. 40, no. 6, pp. 1913–1926, 2011.

[8] N. Anari and S. O. Gharan, "Effective-resistance-reducing flows, spectrally thin trees, and asymmetric TSP," in 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), Oct. 2015, pp. 20–39.

[9] A. Tizghadam and A. Leon-Garcia, "Betweenness centrality and resistance distance in communication networks," IEEE Network, vol. 24, no. 6, pp. 10–16, Nov. 2010.

[10] A. Jadbabaie, "On geographic routing without location information," in Proceedings of the 43rd IEEE Conference on Decision and Control (CDC), 2004, vol. 5, pp. 4764–4769.

[11] M. Kapralov and R. Panigrahy, "Spectral sparsification via random spanners," in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (ITCS), ACM, 2012, pp. 393–398.

[12] R. B. Bapat, I. Gutman, and W. Xiao, "A simple method for computing resistance distance," Zeitschrift für Naturforschung A, vol. 58, no. 9–10, pp. 494–498, 2003.

[13] W. Shi, Q. Ling, G. Wu, and W. Yin, "EXTRA: An exact first-order algorithm for decentralized consensus optimization," SIAM Journal on Optimization, vol. 25, no. 2, pp. 944–966, 2015.

[14] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE/ACM Transactions on Networking, vol. 14, no. SI, pp. 2508–2530, 2006.

[15] A. Olshevsky, "Linear time average consensus on fixed graphs and implications for decentralized optimization and multi-agent control," arXiv preprint arXiv:1411.4186, 2016.

[16] R. M. Gower and P. Richtárik, "Stochastic dual ascent for solving linear systems," arXiv preprint arXiv:1512.06890, 2015.

[17] T. Strohmer and R. Vershynin, "A randomized Kaczmarz algorithm with exponential convergence," Journal of Fourier Analysis and Applications, vol. 15, no. 2, pp. 262–278, 2009.

[18] A. Zouzias and N. M. Freris, "Randomized extended Kaczmarz for solving least squares," SIAM Journal on Matrix Analysis and Applications, vol. 34, no. 2, pp. 773–793, 2013.

[19] N. S. Aybat, Z. Wang, T. Lin, and S. Ma, "Distributed linearized alternating direction method of multipliers for composite convex consensus optimization," arXiv preprint arXiv:1512.08122, accepted to IEEE Transactions on Automatic Control, Dec. 2015.

[20] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed optimization and statistical learning via the alternating direction method of multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.