Sparse Output Feedback Synthesis via Proximal Alternating Linearization Method
Fu Lin and Veronica Adetola
Abstract—We consider the co-design problem of sparse output feedback and row/column-sparse output matrix. A row-sparse (resp. column-sparse) output matrix implies a small number of outputs (resp. sensor measurements). We impose a row/column-cardinality constraint on the output matrix and a cardinality constraint on the output feedback gain. The resulting nonconvex, nonsmooth optimal control problem is solved by using the proximal alternating linearization method (PALM). One advantage of PALM is that the proximal operators for the sparsity constraints admit closed-form expressions and are easy to implement. Furthermore, the bilinear matrix function introduced by the multiplication of the feedback gain and the output matrix lends itself well to PALM. By establishing the Lipschitz conditions of the bilinear function, we show that PALM is globally convergent and that the objective value decreases monotonically throughout the algorithm. Numerical experiments verify the convergence results and demonstrate the effectiveness of our approach on an unstable system with 60,000 design variables.
Keywords: Bilinear matrix function, proximal alternating linearization method, row/column-sparse matrix, static output feedback.

I. INTRODUCTION
Recent years have seen progress on the design of sparse, structured feedback controllers [1]–[11]. One driving force for this research direction is its wide range of applications in the control of complex systems, including power systems [12], [13], multi-agent systems [14], [15], oscillator networks [16], [17], and social networks [18]. A recent survey on the development of this research effort can be found in [19].

A diverse set of tools for optimal sparsity control has been developed and tailored to specific design requirements. In [1], an augmented Lagrangian method was proposed for the structured state feedback problem. In [2], [3], sparse LQR state feedback controllers were obtained via the alternating direction method of multipliers. In [4], an approach based on linear matrix inequalities was proposed for the row/column-sparse feedback problem. In [5], a convex–concave decomposition method for bilinear matrix inequalities was shown effective for static output feedback problems. In [6], a rank-constrained optimization method was developed for sparse output feedback design. In [9], a sparse output feedback controller that resembles the centralized controller in frequency characteristics was proposed. In [7], [10], localized output feedback controllers with communication delay were developed.

In this work, we design the output feedback and the output matrix simultaneously. The motivation for this co-design output feedback problem is two-fold. First, output feedback controllers require fewer sensors than state feedback controllers. One may have a limited budget for the number of sensors and is thus constrained to output feedback design. Second, it is useful to estimate the tradeoff between the number of sensors and the number of communication links for the output controllers.

F. Lin and V. Adetola are with the Systems Department, United Technologies Research Center, 411 Silver Ln, East Hartford, CT 06118. E-mail: {linf, adetova}@utrc.utc.com.
In practice, it is challenging to strike a good balance between the choice of sensor networks and communication networks. Our work is a step in this direction, as it includes the output matrix in the design process. Co-design problems of linear systems with system matrices have been considered in [20], [21].

The placement of sensors and actuators for feedback control has been an active research topic [22]–[25]. In [23], a two-part cost function was proposed for state feedback with full information and state estimation with candidate sensors. From a system integration and cost perspective, it is desirable to use the least number of sensors that achieves the required performance objective [24], [25]. In this context, we simultaneously design the feedback sensor structure and sparse feedback gains to reduce the sensing cost and the number of communication links in distributed control.

We impose the row/column-sparsity condition on the output matrix and the sparsity condition on the output feedback gain. In particular, we employ the row/column-cardinality constraint in order to directly control the number of nonzero rows/columns of the output matrix. The nonconvex, nonsmooth optimal control problem is solved by using the proximal alternating linearization method (PALM). We establish the global convergence of PALM by proving the Lipschitz conditions of the bilinear matrix function. Furthermore, when the closed-loop performance index satisfies the Kurdyka-Lojasiewicz property, we show that PALM is guaranteed to converge to a critical point of the optimal control problem.

The presentation is organized as follows. In Section II, we formulate the co-design output feedback problem. In Section III, we develop the PALM algorithm, and in Section IV, we provide the convergence analysis. In Section V, we demonstrate the convergence behavior of PALM via numerical experiments. In Section VI, we summarize our contributions.

II. CO-DESIGN OUTPUT FEEDBACK PROBLEM
Consider the static output feedback design
$$\dot{x}(t) = A x(t) + B_1 d(t) + B_2 u(t), \qquad y(t) = C x(t), \qquad u(t) = -K y(t),$$
where $x(t) \in \mathbb{R}^n$ is the state, $d(t) \in \mathbb{R}^q$ is the disturbance input, $u(t) \in \mathbb{R}^m$ is the control input, and $y(t) \in \mathbb{R}^p$ is the measured output.

In this work, we design both the output matrix $C \in \mathbb{R}^{p \times n}$ and the output feedback gain $K \in \mathbb{R}^{m \times p}$ simultaneously. We impose sparsity conditions on both design variables. The cardinality of the output feedback gain $K$ is defined as
$$\mathrm{card}(K) := \text{number of nonzero entries of } K.$$
A sparser $K$ implies a smaller number of communication channels from the sensors to the actuators. We are interested in an output matrix $C$ with sparse rows or sparse columns, because a row-sparse $C$ implies a small number of outputs, while a column-sparse $C$ implies a small number of sensors to measure the states. The row-cardinality of a matrix is defined as
$$\mathrm{card}_{\mathrm{row}}(C) := \text{number of nonzero rows of } C,$$
or, equivalently,
$$\mathrm{card}_{\mathrm{row}}(C) = \sum_{i=1}^{p} \mathrm{card}\left( \|C_i\| \right),$$
where $C_i$ denotes the $i$th row of $C$ and $\|\cdot\|$ denotes the Euclidean norm. The column-cardinality of $C$ is equal to the row-cardinality of its transpose, $C^T$. In what follows, we use row-sparsity without loss of generality.

The co-design problem of the sparse output feedback can be expressed as follows:
$$\begin{array}{ll} \underset{K,\,C,\,F}{\text{minimize}} & J(F) \\ \text{subject to} & F = KC, \quad \mathrm{card}(K) \le s, \quad \mathrm{card}_{\mathrm{row}}(C) \le r, \end{array} \tag{1}$$
where $s$ and $r$ are prespecified positive integers. Here, $J$ is a user-specified performance index of the closed-loop system. We assume that $J$ is bounded below for all $F$. When $A - B_2 F$ is not Hurwitz, $J$ is defined as positive infinity.

Problem (1) is a nonconvex, nonsmooth optimal control problem.
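The two sparsity measures above are easy to compute. A minimal sketch in Python (the matrix `C` below is an arbitrary example, not data from the paper):

```python
import numpy as np

def card(K):
    """card(K): number of nonzero entries of K."""
    return int(np.count_nonzero(K))

def card_row(C):
    """card_row(C): number of rows of C with nonzero Euclidean norm."""
    return int(np.count_nonzero(np.linalg.norm(C, axis=1)))

# Arbitrary example: 3 nonzero entries spread over 2 nonzero rows.
C = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])
print(card(C))        # 3
print(card_row(C))    # 2
print(card_row(C.T))  # 3 (column-cardinality of C, via its transpose)
```

As the last line illustrates, the column-cardinality is obtained by applying `card_row` to the transpose, matching the convention in the text.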
The cardinality constraints are nonconvex and nonsmooth, and the bilinear constraint $F = KC$ is nonconvex. This difficulty limits the number of solution algorithms, since existing algorithms typically require convexity or smoothness or both [26]–[29]. One may relax the cardinality constraint by using convex surrogates such as the $\ell_1$ norm. It is noteworthy that the PALM algorithm can handle both convex and nonconvex penalty functions [29].

We next put the co-design problem into a formulation suited to PALM, the proximal alternating linearization method, originally proposed for generic nonconvex, nonsmooth problems [29]. We begin by penalizing the difference between $F$ and $KC$ in the cost function:
$$\begin{array}{ll} \underset{K,\,C,\,F}{\text{minimize}} & J(F) + \dfrac{\gamma}{2}\,\|F - KC\|_F^2 \\ \text{subject to} & \mathrm{card}(K) \le s, \quad \mathrm{card}_{\mathrm{row}}(C) \le r, \end{array} \tag{2}$$
where $\gamma$ is a sufficiently large, positive coefficient and $\|\cdot\|_F$ denotes the Frobenius norm. By introducing the indicator function
$$f(K) := \begin{cases} 0, & \mathrm{card}(K) \le s \\ \infty, & \text{otherwise} \end{cases} \tag{3}$$
for the cardinality constraint, and the indicator function
$$g(C) := \begin{cases} 0, & \mathrm{card}_{\mathrm{row}}(C) \le r \\ \infty, & \text{otherwise} \end{cases} \tag{4}$$
for the row-cardinality constraint, problem (2) can be expressed as
$$\Phi := f(K) + g(C) + J(F) + H(K, C, F). \tag{5}$$
Note that $\Phi$ is separable with respect to $K$, $C$, and $F$ except for the coupling function
$$H(K, C, F) := \frac{\gamma}{2}\,\|F - KC\|_F^2. \tag{6}$$
It turns out that this bilinear matrix function lends itself well to PALM. In particular, the Lipschitz constants of the partial gradients of $H$ can be calculated explicitly, which facilitates the implementation of PALM and the proof of its convergence.

III. PROXIMAL ALTERNATING LINEARIZATION METHOD
PALM falls in the class of proximal methods for nonconvex, nonsmooth optimization problems recently developed in [26]–[29]. It is also closely related to the alternating direction method of multipliers for convex problems. In this section, we show that the co-design problem is well suited to PALM; in particular, the proximal operators for the sparsity constraints can be computed efficiently.

The PALM algorithm computes the minimizers of the proximal functions iteratively:
$$K^{k+1} := \underset{K}{\mathrm{argmin}} \left\{ f(K) + \frac{a_k}{2} \|K - X^k\|_F^2 \right\}, \tag{7a}$$
$$C^{k+1} := \underset{C}{\mathrm{argmin}} \left\{ g(C) + \frac{b_k}{2} \|C - Y^k\|_F^2 \right\}, \tag{7b}$$
$$F^{k+1} := \underset{F}{\mathrm{argmin}} \left\{ J(F) + \frac{c_k}{2} \|F - Z^k\|_F^2 \right\}. \tag{7c}$$
The quadratic term $\|K - X^k\|_F^2$ encourages the solution of (7a) to be in the proximity of $X^k$, where
$$X^k = K^k - \frac{1}{a_k} \nabla_K H(K^k, C^k, F^k).$$
Note that $X^k$ is a linear combination of the current iterate $K^k$ and the partial gradient of $H$ with respect to $K$, hence the name linearization in PALM. A key requirement for the convergence of PALM is that the coefficient $a_k$ be chosen greater than the Lipschitz constant of $\nabla_K H$. For fixed $(C^k, F^k)$, the Lipschitz constant $L_1$ satisfies
$$\|\nabla_K H(K_1, C^k, F^k) - \nabla_K H(K_2, C^k, F^k)\| \le L_1(C^k, F^k)\, \|K_1 - K_2\|$$
for all $K_1$ and $K_2$. We set $a_k = \gamma_1 L_1$ for some $\gamma_1 > 1$.

Similarly, the proximal points $(Y^k, Z^k)$ are linear combinations of the current iterates $(C^k, F^k)$ and the partial gradients $(\nabla_C H, \nabla_F H)$:
$$Y^k = C^k - \frac{1}{b_k} \nabla_C H(K^{k+1}, C^k, F^k), \qquad Z^k = F^k - \frac{1}{c_k} \nabla_F H(K^{k+1}, C^{k+1}, F^k).$$
Let $L_2$ and $L_3$ be the Lipschitz constants of $\nabla_C H$ and $\nabla_F H$, respectively.
That is, $L_2$ satisfies
$$\|\nabla_C H(K^{k+1}, C_1, F^k) - \nabla_C H(K^{k+1}, C_2, F^k)\| \le L_2(K^{k+1}, F^k)\, \|C_1 - C_2\|$$
for all $C_1$ and $C_2$, and $L_3$ satisfies
$$\|\nabla_F H(K^{k+1}, C^{k+1}, F_1) - \nabla_F H(K^{k+1}, C^{k+1}, F_2)\| \le L_3(K^{k+1}, C^{k+1})\, \|F_1 - F_2\|$$
for all $F_1$ and $F_2$. We set $b_k = \gamma_2 L_2$ and $c_k = \gamma_3 L_3$ for constants $\gamma_2, \gamma_3 > 1$.

A. Lipschitz conditions
The Lipschitz conditions of the partial gradients of $H$ are critical for the global convergence of PALM. Furthermore, the Lipschitz conditions are necessary for the implementation of PALM because they determine the coefficients $a_k$, $b_k$, $c_k$ in the proximal operators (7). The Lipschitz constants for the co-design problem can be computed via closed-form expressions. This is because the partial gradients of the bilinear coupling function are linear; in particular, the partial gradients of $H$ with respect to $K$, $C$, and $F$ are given by
$$\nabla_K H = \gamma (KC - F) C^T, \qquad \nabla_C H = \gamma K^T (KC - F), \qquad \nabla_F H = \gamma (F - KC).$$
Since $\nabla_K H$, $\nabla_C H$, and $\nabla_F H$ are linear functions of $K$, $C$, and $F$, respectively, it follows that the Lipschitz constants are given by
$$L_1 = \gamma \|C C^T\|_F, \qquad L_2 = \gamma \|K^T K\|_F, \qquad L_3 = \gamma. \tag{8}$$

B. Explicit formulas for proximal operators
We next show that the proximal operators (7a) and (7b) can be computed efficiently via explicit formulas. As a result, the implementation of PALM is particularly simple.

The proximal operator (7a) can be written as
$$\begin{array}{ll} \underset{K}{\text{minimize}} & \dfrac{a_k}{2} \|K - X^k\|_F^2 \\ \text{subject to} & \mathrm{card}(K) \le s. \end{array}$$
The solution is obtained by keeping the $s$ largest entries of $X^k$ in magnitude and setting the remaining entries to zero. This result is well known; e.g., see [29]. Let $X^k_s$ be the $s$th largest entry of $X^k$ in magnitude and let $I^k_s \in \mathbb{R}^{m \times n}$ be such that
$$(I^k_s)_{ij} = \begin{cases} 1 & \text{if } |X^k_{ij}| \ge X^k_s \\ 0 & \text{otherwise}. \end{cases}$$
The solution is obtained by truncating the entries whose magnitude is less than $X^k_s$:
$$K^{k+1} = X^k \circ I^k_s, \tag{9}$$
where $\circ$ denotes the entry-wise multiplication of matrices. When the $\ell_1$ norm is used to promote sparsity, an efficient algorithm for the projection onto the $\ell_1$ ball can be found in [30].

The proximal operator (7b) can be written as
$$\begin{array}{ll} \underset{C}{\text{minimize}} & \dfrac{b_k}{2} \|C - Y^k\|_F^2 \\ \text{subject to} & \mathrm{card}_{\mathrm{row}}(C) \le r. \end{array}$$
Similar to the entry-wise truncation, the row-wise truncation amounts to keeping the $r$ largest rows of $Y^k$ in Euclidean norm and setting the remaining rows to zero. Let $\delta^k$ be the $r$th largest element of $\{\|Y^k_i\|\}_{i=1}^{n}$, where $Y^k_i$ denotes the $i$th row of $Y^k$. Define a binary vector $v^k_r$ of length $n$ as follows:
$$(v^k_r)_i = \begin{cases} 1 & \text{if } \|Y^k_i\| \ge \delta^k \\ 0 & \text{otherwise}. \end{cases}$$
Then the row truncation of $Y^k$ can be expressed as
$$C^{k+1} = Y^k \circ (v^k_r \mathbf{1}^T), \tag{10}$$
where $\mathbf{1} \in \mathbb{R}^n$ is the vector of all ones. For the column-sparsity constraint, apply the truncation operator to the rows of $C^T$.

C. Computation of proximal operator (7c)

One advantage of PALM is that it allows the computation of the proximal operator (7c) to be independent of the other proximal operators (7a)-(7b). In other words, it does not rely on a specific performance index $J$ in the minimization problem
$$\underset{F}{\text{minimize}} \quad J(F) + \frac{c_k}{2} \|F - Z^k\|_F^2. \tag{11}$$
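The truncation formulas (9) and (10) amount to hard-thresholding. A minimal sketch (the function names are ours, and ties in magnitude are broken arbitrarily by the sort):

```python
import numpy as np

def prox_card(X, s):
    """Proximal operator (7a): keep the s largest-magnitude entries of X, Eq. (9)."""
    K = np.zeros_like(X)
    # Indices of the s largest |X_ij|, taken over the flattened matrix.
    keep = np.unravel_index(np.argsort(np.abs(X), axis=None)[-s:], X.shape)
    K[keep] = X[keep]
    return K

def prox_card_row(Y, r):
    """Proximal operator (7b): keep the r rows of Y largest in Euclidean norm, Eq. (10)."""
    C = np.zeros_like(Y)
    rows = np.argsort(np.linalg.norm(Y, axis=1))[-r:]
    C[rows] = Y[rows]
    return C

X = np.array([[3.0, -1.0],
              [0.5, -4.0]])
print(prox_card(X, 2))       # keeps 3.0 and -4.0, zeros the rest

Y = np.array([[1.0, 1.0],
              [5.0, 0.0],
              [0.1, 0.1]])
print(prox_card_row(Y, 1))   # keeps only the row [5.0, 0.0]
```

For the column-sparsity constraint one would apply `prox_card_row` to the transpose and transpose back, mirroring the remark after (10).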
This feature of separability has been noted in the state feedback design using the alternating direction method of multipliers (ADMM); see [3].

As an example, we consider the closed-loop $\mathcal{H}_2$ norm from the disturbance $d$ to the performance output
$$z := \left[\, x^T Q^{1/2}, \; u^T R^{1/2} \,\right]^T,$$
where $Q$ and $R$ are positive definite matrices. In [3], the Anderson-Moore method was developed for the minimization step (11); see the Appendix for details.

We summarize PALM in Algorithm 1.

IV. CONVERGENCE ANALYSIS
The global convergence of PALM for nonconvex, nonsmooth problems is analyzed in [28], [29]. In this section, we build on the results in [29] and show the global convergence of PALM for the co-design problem. Furthermore, the objective value is monotonically decreasing throughout the PALM algorithm. When the performance index $J$ satisfies the so-called KL property, PALM is guaranteed to converge to a critical point. The proofs can be found in the Appendix.

We begin with a technical lemma on the Lipschitz conditions of $\Phi$.

Lemma 1:
The objective function $\Phi$ in (5) satisfies the following properties:

1) $\inf_{K,C,F} \Phi(K, C, F) > -\infty$, $\inf_K f(K) > -\infty$, $\inf_C g(C) > -\infty$, and $\inf_F J(F) > -\infty$.

Algorithm 1
PALM for nonconvex, nonsmooth problem (5)

Initialization: start with any $(K^0, C^0, F^0)$.
for $k = 0, 1, 2, \ldots$ until convergence do
  // $K$-minimization step
  Compute the Lipschitz constant $L_1 = \gamma \|C^k (C^k)^T\|_F$.
  Compute $a_k = \gamma_1 L_1(C^k)$ and the partial gradient $\nabla_K H(K^k, C^k, F^k) = \gamma (K^k C^k - F^k)(C^k)^T$.
  Update $X^k = K^k - (1/a_k)\, \nabla_K H(K^k, C^k, F^k)$.
  Perform the entry-wise truncation of $X^k$ by using (9).
  // $C$-minimization step
  Compute the Lipschitz constant $L_2 = \gamma \|(K^{k+1})^T K^{k+1}\|_F$.
  Compute $b_k = \gamma_2 L_2(K^{k+1})$ and the partial gradient $\nabla_C H(K^{k+1}, C^k, F^k) = \gamma (K^{k+1})^T (K^{k+1} C^k - F^k)$.
  Update $Y^k = C^k - (1/b_k)\, \nabla_C H(K^{k+1}, C^k, F^k)$.
  Perform the row-wise truncation of $Y^k$ by using (10).
  // $F$-minimization step
  When $J$ is the closed-loop $\mathcal{H}_2$ norm, employ the Anderson-Moore method in Appendix D.
end for
2) The partial gradients $\nabla_K H$, $\nabla_C H$, and $\nabla_F H$ are globally Lipschitz.

3) There exist bounded constants $q_i^-, q_i^+ > 0$ for $i = 1, 2, 3$ such that the Lipschitz constants in (8) are bounded:
$$\inf_k \{L_i^k\} \ge q_i^- \quad \text{and} \quad \sup_k \{L_i^k\} \le q_i^+. \tag{12}$$

4) The entire gradient $\nabla H$ is Lipschitz continuous on bounded subsets of $\mathbb{R}^{m \times n} \times \mathbb{R}^{n \times n} \times \mathbb{R}^{m \times n}$.

Property 1) ensures that the proximal operators in PALM are well defined and that the minimization of $\Phi$ is well defined. Property 2) on the Lipschitz continuity of the partial gradients is critical for convergence. Note that the block-Lipschitz property in $K$, $C$, and $F$ is weaker than standard assumptions in proximal methods that require $\Phi$ to be globally Lipschitz in the joint variables $(K, C, F)$; see [29]. Property 3) guarantees that the Lipschitz constants for the partial gradients are lower and upper bounded by finite numbers. Property 4) is a technical condition for controlling the distance between two consecutive steps in the sequence $(K^k, C^k, F^k)$. This is a mild condition that holds when $H$ is twice continuously differentiable, as in (6).

Assumption 1:
The closed-loop performance metric $J(F) : \mathbb{R}^{m \times n} \to (-\infty, +\infty]$ is a proper, lower semicontinuous function.

Here $J$ is defined as positive infinity for a destabilizing state feedback gain $F$. From Lemma 1 and from the convergence results established in Lemma 3.3 of [29] for generic PALM, it follows that the objective value $\Phi$ is monotonically decreasing in PALM. Specifically, we have the following result.

Proposition 1:
Suppose that Assumption 1 holds. Let $G^k := (K^k, C^k, F^k)$ be a sequence generated by Algorithm 1. Then
$$\delta\, \|G^{k+1} - G^k\|_F^2 < \Phi(G^k) - \Phi(G^{k+1}), \quad \forall\, k \ge 0,$$
where $\delta = \min_i \{(\gamma_i - 1)\, q_i^-\}$ for $i = 1, 2, 3$.

Note that $\delta > 0$ throughout the PALM iterations because $\gamma_i > 1$ for $i = 1, 2, 3$ (see Algorithm 1) and $q_i^- > 0$ for $i = 1, 2, 3$ (see Lemma 1). Thus, the convergence of the decision variable $G^k$ can be measured by the convergence of the objective value $\Phi$. The numerical experiments in Section V verify this convergence behavior.

Proposition 1 guarantees global convergence of PALM starting from any initial point. We next show that PALM converges to a critical point of $\Phi$ when the closed-loop performance metric satisfies the Kurdyka-Lojasiewicz (KL) property; see Appendix A for the definition.

Lemma 2:
If the performance index $J$ in (5) satisfies the KL property, then $\Phi$ in (5) satisfies the KL property.

The KL property of $\Phi$ established in Lemma 2 allows us to invoke the convergence results in [29].

Proposition 2:
Let $G^k = (K^k, C^k, F^k)$ be a sequence generated by Algorithm 1. If the performance index $J$ in (5) satisfies the KL property, then the following results hold.

1) The sequence $\{G^k\}$ has finite length, that is,
$$\sum_{k=1}^{\infty} \|G^{k+1} - G^k\|_F < \infty.$$
2) The sequence $\{G^k\}$ converges to a critical point $G^* = (K^*, C^*, F^*)$ of $\Phi$.

Proposition 2 follows from Lemma 1 and the convergence result for generic PALM; see Theorem 3.1 in [29].

Remark 1 (Comparison with ADMM):
The convergence analysis of ADMM typically relies on convexity assumptions [31]. As noted above, no convexity assumption is required for PALM. Another noteworthy point is that ADMM is primarily used for two-block problems (i.e., two variables with a coupling constraint), while the co-design problem (5) is a three-block problem. It is shown in [32] via a counterexample that the direct extension of ADMM to multi-block convex problems may not converge. In contrast, the convergence of PALM for multi-block problems has been established in [29].
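To make the mechanics of Algorithm 1 and the monotone decrease of Proposition 1 concrete, here is a small self-contained sketch. Since implementing the closed-loop $\mathcal{H}_2$ step is involved, the smooth stand-in $J(F) = \tfrac{1}{2}\|F - F_\star\|_F^2$ (with a made-up target $F_\star$) replaces the $\mathcal{H}_2$ norm so that step (7c) has a closed form; all other steps follow (7)-(10) and the constants (8):

```python
import numpy as np

def keep_top_entries(X, s):
    """Projection onto {card(X) <= s}: keep the s largest-magnitude entries, Eq. (9)."""
    out = np.zeros_like(X)
    keep = np.unravel_index(np.argsort(np.abs(X), axis=None)[-s:], X.shape)
    out[keep] = X[keep]
    return out

def keep_top_rows(Y, r):
    """Projection onto {card_row(Y) <= r}: keep the r largest rows, Eq. (10)."""
    out = np.zeros_like(Y)
    rows = np.argsort(np.linalg.norm(Y, axis=1))[-r:]
    out[rows] = Y[rows]
    return out

rng = np.random.default_rng(1)
m, n, s, r = 3, 5, 4, 2
gamma = 10.0
g1 = g2 = g3 = 1.1                        # step factors gamma_i > 1, as required
F_star = rng.standard_normal((m, n))      # made-up target for the surrogate J

# Feasible initialization so the indicator terms f, g vanish from the start.
K = keep_top_entries(rng.standard_normal((m, n)), s)
C = keep_top_rows(rng.standard_normal((n, n)), r)
F = rng.standard_normal((m, n))

def Phi(K, C, F):
    """Surrogate objective (5): J(F) + H(K, C, F) with the stand-in J."""
    return (0.5 * np.linalg.norm(F - F_star, 'fro')**2
            + 0.5 * gamma * np.linalg.norm(F - K @ C, 'fro')**2)

vals = [Phi(K, C, F)]
for k in range(100):
    a = g1 * max(gamma * np.linalg.norm(C @ C.T, 'fro'), 1e-12)  # a_k = gamma_1 L1
    K = keep_top_entries(K - gamma * (K @ C - F) @ C.T / a, s)   # step (7a) via (9)
    b = g2 * max(gamma * np.linalg.norm(K.T @ K, 'fro'), 1e-12)  # b_k = gamma_2 L2
    C = keep_top_rows(C - gamma * K.T @ (K @ C - F) / b, r)      # step (7b) via (10)
    c = g3 * gamma                                               # c_k = gamma_3 L3
    Z = F - gamma * (F - K @ C) / c
    F = (F_star + c * Z) / (1.0 + c)                             # closed-form (7c)
    vals.append(Phi(K, C, F))

# Proposition 1: the objective value never increases along the iterations.
print(all(v2 <= v1 + 1e-8 for v1, v2 in zip(vals, vals[1:])))
```

The `max(..., 1e-12)` guards are ours, protecting against a degenerate all-zero block; they do not appear in the paper's algorithm statement.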
Remark 2 (KL property and semi-algebraic functions):
While it may not be straightforward to establish the KL property for a given function, it is often easier to show the semi-algebraic property; see Appendix A for the definition. More importantly, a variety of nonsmooth functions that arise in modern applications can be shown to be KL via semi-algebraic analysis, for example, all polynomial functions, indicator functions of semi-algebraic sets, finite sums and products of semi-algebraic functions, compositions of semi-algebraic functions, and supremum/infimum functions of semi-algebraic functions. Furthermore, several important sets are semi-algebraic, including the cone of positive semidefinite matrices, Stiefel manifolds, and matrices of constant rank. More details on the KL property and its relation to semi-algebraic functions can be found in [26]–[29].
Remark 3 (Convergence rate):
The convergence rate of PALM for nonconvex, nonsmooth problems with the KL property is still an ongoing research topic. For semi-algebraic problems with special forms, a desingularizing technique has been developed to characterize the convergence rate. Depending on the desingularization parameters of the semi-algebraic functions, PALM converges in a finite number of steps, with a linear convergence rate, or with a sublinear rate [26]. Our numerical experience suggests a linear convergence rate for the co-design output feedback problem; see Section V.

V. NUMERICAL EXPERIMENTS
In this section, we illustrate the convergence properties of PALM for the sparse output feedback problem. We consider a mass-spring system with 600 design variables and an unstable system with 60,000 design variables. For both systems, PALM finds sparse solutions with prespecified sparsity levels in a few hundred steps. We take the closed-loop $\mathcal{H}_2$ norm as the performance index with $Q = I$ and $R = 10 I$. PALM is initialized with the state feedback LQR solution and an output matrix whose elements are all ones.

A. Mass-spring system
We consider the mass-spring system with $N = 10$ masses connected in series. Let $x = [\,p^T, v^T\,]^T$, where $p$ and $v \in \mathbb{R}^N$ denote the positions and velocities of the masses, respectively. The state-space representation is given by
$$A = \begin{bmatrix} O & I \\ T & O \end{bmatrix} \in \mathbb{R}^{2N \times 2N}, \qquad B_1 = B_2 = \begin{bmatrix} O \\ I \end{bmatrix} \in \mathbb{R}^{2N \times N},$$
where $T \in \mathbb{R}^{N \times N}$ is a tridiagonal Toeplitz matrix with $-2$ on the main diagonal and $1$ on the first subdiagonal and superdiagonal.

The total number of unknown variables in $C \in \mathbb{R}^{2N \times 2N}$ and $K \in \mathbb{R}^{N \times 2N}$ is 600. We set $r = N$ nonzero columns in $C$ and prescribe $s$ nonzero elements in $K$. In other words, we take the column-sparsity of $C$ and the entry-sparsity of $K$.

Figure 1 shows the convergence results of the objective value $\Phi$ and the errors of the variables in consecutive steps,
$$e_K^k = \|K^{k+1} - K^k\|_F, \quad e_C^k = \|C^{k+1} - C^k\|_F, \quad e_F^k = \|F^{k+1} - F^k\|_F.$$
As predicted in Proposition 1, the objective value decreases monotonically with the PALM iterations. The errors between two consecutive steps converge quickly; in particular, it takes fewer than 300 iterations for $e_K^k$, $e_C^k$, and $e_F^k$ to fall below the levels reported in Fig. 1.

The sparsity patterns of $K$ and $C$ are shown in Fig. 2. Note that only the velocities of the masses are measured. On the other hand, the sparsity pattern of $K$ shows that the velocities of the neighboring masses are used to control the masses. The product $F = KC$ is a column-sparse matrix with the same column-sparsity pattern as $C$. Therefore, one only needs $N$ sensors measuring the velocities of the masses to implement the sparse output feedback controller.
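The mass-spring data used above can be assembled as follows. The values $-2$ and $1$ in $T$ are the standard mass-spring stiffness pattern, assumed here since the specific numbers did not survive extraction:

```python
import numpy as np

N = 10
# Tridiagonal Toeplitz stiffness matrix T: standard mass-spring pattern
# (-2 on the main diagonal, 1 on the first sub/superdiagonal; values assumed).
T = (-2.0 * np.eye(N)
     + np.diag(np.ones(N - 1), k=1)
     + np.diag(np.ones(N - 1), k=-1))

O, I = np.zeros((N, N)), np.eye(N)
A = np.block([[O, I],
              [T, O]])       # state matrix, 2N x 2N
B = np.vstack([O, I])        # B1 = B2: disturbances and controls enter as forces
print(A.shape)               # (20, 20)

# Undamped springs: all eigenvalues of A lie on the imaginary axis,
# so the maximal real part is numerically zero.
print(np.max(np.linalg.eigvals(A).real))
```

With this structure, $\lambda(A)^2 = \lambda(T) \in (-4, 0)$, so the open-loop system is marginally stable (purely oscillatory), which is why feedback design is needed.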
Fig. 1: Convergence results of PALM for the mass-spring system with column sparsity: the monotonic decrease of $\Phi$ (top) and the convergence of the errors in two consecutive steps for the variables (bottom).

B. Distributed system
We next consider $N = 100$ identical unstable subsystems randomly distributed in a square region; see Fig. 3. The state-space representation for the $i$th subsystem is given by
$$\dot{x}_i = A_{ii} x_i + \sum_{j \ne i} A_{ij} x_j + B_i u_i + B_i w_i,$$
where $A_{ij} = \alpha_{ij} I$ and the coupling coefficient $\alpha_{ij}$ is determined by the Euclidean distance between two subsystems,
$$\alpha_{ij} = e^{-\|p_i - p_j\|}, \tag{13}$$
where $p_i$ denotes the position of the $i$th subsystem.

The total number of unknown variables in $C \in \mathbb{R}^{200 \times 200}$ and $K \in \mathbb{R}^{100 \times 200}$ is 60,000. We consider the co-design problem with $r = 20$ nonzero rows in $C$ and $s = 200$ nonzero entries in $K$.

Figure 4 shows the convergence results of the objective value $\Phi$ and the errors of the variables in consecutive PALM steps. As in the mass-spring example, $\Phi$ is monotonically decreasing with the PALM iterations. It takes fewer than 500 iterations for the consecutive-step errors $e_K$, $e_C$, and $e_F$ to fall below the levels reported in Fig. 4. The computation takes a few minutes on a laptop running Matlab 2016b with a 2.4 GHz CPU and 8 GB RAM.

Fig. 2: Sparsity pattern of $K$ (top) and column-sparsity pattern of $C$ (bottom) for the mass-spring system.

Fig. 3: A network of unstable coupled subsystems randomly distributed in a square. The coupling strength is determined by the distance between two subsystems as in (13).

Figure 5 shows the sparsity patterns of $K$ and $C$. As required, the output matrix $C$ has exactly $r = 20$ nonzero rows (row-sparsity) and the output feedback gain $K$ has exactly $s$ nonzero entries.

VI. CONCLUSIONS
We considered the co-design problem of the output feedback and the output matrix. We impose the row/column-cardinality constraint to guarantee row/column sparsity of the output matrix, and we use the cardinality constraint to obtain a sparse output feedback gain. The resulting nonconvex, nonsmooth problem is solved by using the PALM algorithm. We show the global convergence of PALM by establishing the Lipschitz conditions of the bilinear matrix function. When the closed-loop performance index satisfies the KL property, the PALM algorithm converges to a critical point. Numerical results verify the convergence analysis and illustrate the effectiveness of our approach.

Fig. 4: Convergence results of PALM for the coupled unstable system with row sparsity: the monotonic decrease of $\Phi$ (top) and the convergence of the errors in two consecutive steps for the variables (bottom).

APPENDIX
A. Definitions of KL functions and semi-algebraic functions

Definition 1 (Kurdyka-Lojasiewicz property):
Let $f : \mathbb{R}^d \to (-\infty, +\infty]$ be proper and lower semicontinuous. The function $f$ is said to have the Kurdyka-Lojasiewicz (KL) property at $\bar{u} \in \mathrm{dom}\, \partial f := \{u \in \mathbb{R}^d : \partial f(u) \ne \emptyset\}$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\bar{u}$, and a scalar-valued (desingularizing) function $\psi$ such that for all
$$u \in U \cap \{u : f(\bar{u}) < f(u) < f(\bar{u}) + \eta\},$$
the following inequality holds:
$$\psi'\big(f(u) - f(\bar{u})\big) \cdot \mathrm{dist}\big(0, \partial f(u)\big) \ge 1,$$
where $(\cdot)'$ denotes the derivative and $\mathrm{dist}(x, S) := \inf \{\|y - x\| : y \in S\}$ denotes the distance from a point $x \in \mathbb{R}^d$ to a set $S \subset \mathbb{R}^d$. A function $f$ is called a KL function if $f$ satisfies the KL property at each point of $\mathrm{dom}\, \partial f$.

Fig. 5: Sparsity structure of $K$ (top) and row-sparsity structure of $C$ (bottom) for the distributed system.

Definition 2 (Semi-algebraic function):
A subset $S$ of $\mathbb{R}^d$ is a real semi-algebraic set if there exists a finite number of real polynomial functions $g_{ij}, h_{ij} : \mathbb{R}^d \to \mathbb{R}$ such that
$$S = \bigcup_{j=1}^{p} \bigcap_{i=1}^{q} \left\{ u \in \mathbb{R}^d : g_{ij}(u) = 0 \ \text{and} \ h_{ij}(u) < 0 \right\}.$$
A function $h : \mathbb{R}^d \to (-\infty, +\infty]$ is called a semi-algebraic function if its graph
$$\{(u, v) \in \mathbb{R}^{d+1} : h(u) = v\}$$
is a semi-algebraic subset of $\mathbb{R}^{d+1}$.

The connection between KL functions and semi-algebraic functions is provided by the following result.

Proposition 3 (Theorem 5.1 in [29]):
A proper, lower semicontinuous, and semi-algebraic function satisfies the KL property.
B. Proof of Lemma 1
Property 1) is a consequence of the coupling function $H$ in (6), the indicator functions $f$ in (3) and $g$ in (4), and the performance metric $J$ in Assumption 1. Property 2) follows from the Lipschitz constants derived in (8). To show Property 3), note that $L_3 = \gamma$ is a constant throughout the PALM iterations. On the other hand, $L_1(C)$ in (8) is bounded below for all $C$. Since $C^k$ is the minimizer of a feasible problem over a bounded set, it is bounded above for all $k$. Hence the entire sequence $L_1(C^k)$ satisfies the upper and lower bounds in (12). An analogous argument shows that the Lipschitz constant $L_2(K)$ satisfies (12). Finally, Property 4) is a direct consequence of the twice continuous differentiability of $H$.

C. Proof of Lemma 2
Since KL functions are stable with respect to summation and since $J$ is assumed to be a KL function, one only needs to show that $f$ and $g$ are KL functions. By Proposition 3, we proceed to show that the indicator functions $f$ and $g$ are semi-algebraic. To this end, we use the result that the indicator function of the semi-algebraic set $\{K \mid \mathrm{card}(K) \le s\}$ is semi-algebraic. This is because the graph of the cardinality function can be represented by a finite union of piecewise linear sets; see [29, Example 5.2]. Similarly, the set of row/column-sparse matrices is also semi-algebraic. Since the indicator function of a semi-algebraic set is semi-algebraic, it follows that $f$ and $g$ are semi-algebraic functions. This completes the proof.

D. Anderson-Moore method
Let $J$ be the closed-loop $\mathcal{H}_2$ norm from the disturbance $d$ to the performance output $z$. The necessary and sufficient conditions for the optimality of (11) are determined by the following coupled matrix equations [1], [3]:
$$(R F - B_2^T P) L + c_k (F - Z^k) = 0, \tag{14a}$$
$$(A - B_2 F) L + L (A - B_2 F)^T = -B_1 B_1^T, \tag{14b}$$
$$(A - B_2 F)^T P + P (A - B_2 F) = -(Q + F^T R F). \tag{14c}$$
When $F$ is fixed, (14b)-(14c) are two Lyapunov equations in $L$ and $P$. On the other hand, when $L$ and $P$ are fixed, (14a) is a Sylvester equation in $F$. This observation motivates the Anderson-Moore method [3], namely, solving the Sylvester equation for $F$ and the two Lyapunov equations for $(L, P)$ iteratively. The descent property of the new direction in conjunction with the Armijo line search guarantees convergence of this approach [2].

The Anderson-Moore method is provided in Algorithm 2.
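One sweep of (14a)-(14c) can be sketched with standard Lyapunov solvers. The problem data below are random stand-ins (with $F = 0$ stabilizing by construction), and the solve of (14a) uses the vectorization identity $\mathrm{vec}(RFL) = (L^T \otimes R)\,\mathrm{vec}(F)$:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# One Anderson-Moore sweep for (14a)-(14c) on small random stand-in data.
rng = np.random.default_rng(3)
n, m = 4, 2
A = rng.standard_normal((n, n)) - 4.0 * np.eye(n)  # shifted so A is stable for the demo
B1 = rng.standard_normal((n, n))
B2 = rng.standard_normal((n, m))
Q, R = np.eye(n), 10.0 * np.eye(m)
F = np.zeros((m, n))               # current (stabilizing) gain
ck, Zk = 1.0, np.zeros((m, n))     # stand-in proximal data from step (11)

Acl = A - B2 @ F
# (14b): Acl L + L Acl^T = -B1 B1^T   (controllability-Gramian-type equation)
L = solve_continuous_lyapunov(Acl, -B1 @ B1.T)
# (14c): Acl^T P + P Acl = -(Q + F^T R F)   (observability-type equation)
P = solve_continuous_lyapunov(Acl.T, -(Q + F.T @ R @ F))
# (14a): (R F - B2^T P) L + ck (F - Zk) = 0, i.e. R F L + ck F = B2^T P L + ck Zk.
# Vectorize with vec(R F L) = (L^T kron R) vec(F), using column-major (order='F') vec.
rhs = (B2.T @ P @ L + ck * Zk).reshape(-1, order='F')
M = np.kron(L.T, R) + ck * np.eye(m * n)
F_new = np.linalg.solve(M, rhs).reshape((m, n), order='F')

residual = (R @ F_new - B2.T @ P) @ L + ck * (F_new - Zk)
print(np.linalg.norm(residual) < 1e-8)  # (14a) is satisfied to machine precision
```

In Algorithm 2, `F_new` plays the role of $\bar{F}^l$; the sweep is then repeated with the Armijo-damped update of $F$.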
Anderson-Moore methodInitialization: Start with a stabilizing F for l = 0 , , , . . . until convergence do Solve (14b)-(14c) to get the solutions ( L l , P l ) .Solve (14a) to get the solution ¯ F l .Form the direction ∆ F l = ¯ F l − F l .Determine stepsize α by using the Armijo rule.Update F l +1 = F l + α ∆ F l . end for R EFERENCES[1] F. Lin, M. Fardad, and M. R. Jovanovi´c, “Augmented Lagrangianapproach to design of structured optimal state feedback gains,”
IEEETransactions on Automatic Control , vol. 56, no. 12, pp. 2923–2929,2011. [2] ——, “Sparse feedback synthesis via the alternating direction method ofmultipliers,” in
Proceedings of the 2012 American Control Conference ,2012, pp. 4765–4770.[3] ——, “Design of optimal sparse feedback gains via the alternatingdirection method of multipliers,”
IEEE Transactions on AutomaticControl , vol. 58, no. 9, pp. 2426–2431, 2013.[4] B. Polyak, M. Khlebnikov, and P. Shcherbakov, “An LMI approachto structured sparse feedback design in linear control systems,” in
Proceedings of the 2013 European Control Conference , 2013, pp. 833–838.[5] Q. T. Dinh, S. Gumussoy, W. Michiels, and M. Diehl, “Combiningconvex–concave decompositions and linearization approaches for solv-ing BMIs, with application to static output feedback,”
IEEE Transactionson Automatic Control , vol. 57, no. 6, pp. 1377–1390, 2012.[6] R. Arastoo, N. Motee, and M. V. Kothare, “Optimal sparse outputfeedback control design: a rank constrained optimization approach,” arXiv preprint arXiv:1412.8236 , 2014.[7] Y.-S. Wang and N. Matni, “Localized distributed optimal control withoutput feedback and communication delays,” in ,2014, pp. 605–612.[8] Y. Wang, J. Lopez, and M. Sznaier, “Sparse static output feedbackcontroller design via convex optimization,” in , 2014, pp. 376–381.[9] R. Arastoo, M. Bahavarnia, M. V. Kothare, and N. Motee, “Out-put feedback controller sparsification via H -approximation,” IFAC-PapersOnLine , vol. 48, no. 22, pp. 112–117, 2015.[10] N. Matni, “Communication delay co-design in H distributed controlusing atomic norm minimization,” IEEE Transactions on Control ofNetwork Systems , vol. 4, no. 2, pp. 267–278, 2015.[11] Y.-S. Wang, N. Matni, and J. C. Doyle, “Separable and local-ized system level synthesis for large-scale systems,” arXiv preprintarXiv:1701.05880 , 2017.[12] F. D¨orfler, M. R. Jovanovi´c, M. Chertkov, and F. Bullo, “Sparsity-promoting optimal wide-area control of power networks,”
IEEE Trans. Power Syst., vol. 29, no. 5, pp. 2281–2291, September 2014.
[13] X. Wu, F. Dörfler, and M. R. Jovanović, “Input-output analysis and decentralized optimal control of inter-area oscillations in power systems,” IEEE Trans. Power Syst., vol. 31, no. 3, pp. 2434–2444, May 2016.
[14] F. Lin, M. Fardad, and M. R. Jovanović, “Identification of sparse communication graphs in consensus networks,” in Proceedings of the 50th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, 2012, pp. 85–89.
[15] D. Zelazo, S. Schuler, and F. Allgöwer, “Performance and design of cycles in consensus networks,” Systems & Control Letters, vol. 62, no. 1, pp. 85–96, 2013.
[16] M. Fardad, F. Lin, and M. R. Jovanović, “On the optimal synchronization of oscillator networks via sparse interconnection graphs,” in Proceedings of the 2012 American Control Conference, Montréal, Canada, 2012, pp. 4777–4782.
[17] ——, “Design of optimal sparse interconnection graphs for synchronization of oscillator networks,” IEEE Trans. Automat. Control, vol. 59, no. 9, pp. 2457–2462, September 2014.
[18] M. Fardad, X. Zhang, F. Lin, and M. R. Jovanović, “On the optimal dissemination of information in social networks,” in Proceedings of the 51st IEEE Conference on Decision and Control, Maui, HI, 2012, pp. 2539–2544.
[19] M. R. Jovanović and N. K. Dhingra, “Controller architectures: tradeoffs between performance and structure,” Eur. J. Control, vol. 30, pp. 76–91, July 2016.
[20] P. V. Chanekar, N. Chopra, and S. Azarm, “A new formulation for co-design of linear systems with system matrices having affine design variables,” in , 2016, pp. 507–513.
[21] T. Liu, S. Azarm, and N. Chopra, “On decentralized optimization for a class of multi-subsystem co-design problems,” in ASME 2016 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, 2016, p. V02AT03A007.
[22] K. Lim, “Method for optimal actuator and sensor placement for large flexible structures,” Journal of Guidance, Control, and Dynamics, vol. 15, no. 1, pp. 49–57, 1992.
[23] W. Gawronski and K. Lim, “Balanced actuator and sensor placement for flexible structures,” International Journal of Control, vol. 65, no. 1, pp. 131–145, 1996.
[24] G. J. Balas and P. M. Young, “Sensor selection via closed-loop control objectives,” IEEE Transactions on Control Systems Technology, vol. 7, no. 6, pp. 692–705, 1999.
[25] C. P. Moreno, H. Pfifer, and G. J. Balas, “Actuator and sensor selection for robust control of aeroservoelastic systems,” in , 2015, pp. 1899–1904.
[26] H. Attouch and J. Bolte, “On the convergence of the proximal algorithm for nonsmooth functions involving analytic features,” Mathematical Programming, vol. 116, no. 1, pp. 5–16, 2009.
[27] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, “Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka–Łojasiewicz inequality,” Mathematics of Operations Research, vol. 35, no. 2, pp. 438–457, 2010.
[28] H. Attouch, J. Bolte, and B. F. Svaiter, “Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods,” Mathematical Programming, vol. 137, no. 1-2, pp. 91–129, 2013.
[29] J. Bolte, S. Sabach, and M. Teboulle, “Proximal alternating linearized minimization for nonconvex and nonsmooth problems,” Mathematical Programming, vol. 146, no. 1-2, pp. 459–494, 2014.
[30] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra, “Efficient projections onto the ℓ1-ball for learning in high dimensions,” in Proceedings of the 25th International Conference on Machine Learning, 2008, pp. 272–279.
[31] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,”
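For illustration, the Armijo stepsize rule invoked in the Anderson–Moore iteration above can be sketched as follows. This is a minimal sketch under stated assumptions: the quadratic objective `J`, its gradient, and the matrices are toy placeholders, not the solutions of the subproblems (14a)–(14c); only the backtracking logic (shrink α until a sufficient-decrease condition holds along the direction ΔF) reflects the algorithm.

```python
import numpy as np

def armijo_step(J, grad_J, F, dF, alpha0=1.0, beta=0.5, sigma=1e-4, max_iter=50):
    """Backtracking line search (Armijo rule): shrink alpha until
    J(F + alpha*dF) <= J(F) + sigma * alpha * <grad_J(F), dF>."""
    J0 = J(F)
    slope = np.sum(grad_J(F) * dF)  # directional derivative <∇J(F), ΔF>
    alpha = alpha0
    for _ in range(max_iter):
        if J(F + alpha * dF) <= J0 + sigma * alpha * slope:
            return alpha  # sufficient decrease achieved
        alpha *= beta  # otherwise shrink the stepsize
    return alpha

# Toy usage (placeholder objective, not the paper's H2 cost):
# J(F) = ||F||_F^2 with steepest-descent direction dF = -∇J(F).
J = lambda F: np.sum(F ** 2)
grad_J = lambda F: 2.0 * F
F = np.array([[1.0, -2.0], [0.5, 3.0]])
dF = -grad_J(F)
alpha = armijo_step(J, grad_J, F, dF)
assert J(F + alpha * dF) < J(F)  # objective monotonically decreases
```

In the Anderson–Moore loop, `dF` would be the direction ΔF^l = F̄^l − F^l and the accepted α would be used in the update F^(l+1) = F^l + α ΔF^l; the sufficient-decrease test is what guarantees the monotone decrease of the objective along the iterates.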