Numerical Optimization of Eigenvalues of Hermitian Matrix Functions
EMRE MENGI∗, E. ALPER YILDIRIM†, AND MUSTAFA KILIÇ‡

∗ Department of Mathematics, Koç University, Rumelifeneri Yolu, 34450 Sarıyer, İstanbul, Turkey ([email protected]). The work of this author was supported in part by the European Commission grant PIRG-GA-268355 and the TÜBİTAK (The Scientific and Technological Research Council of Turkey) Career Grant 109T660.
† Department of Industrial Engineering, Koç University, Rumelifeneri Yolu, 34450 Sarıyer, İstanbul, Turkey ([email protected]). This author was supported in part by TÜBİTAK (The Scientific and Technological Research Council of Turkey) Grant 112M870 and by TÜBA-GEBİP (Turkish Academy of Sciences Young Scientists Award Program).
‡ Department of Mathematics, Koç University, Rumelifeneri Yolu, 34450 Sarıyer, İstanbul, Turkey ([email protected]). The work of this author was partly supported by the European Commission Grant PIRG-GA-268355.

Abstract.
This work concerns the global minimization of a prescribed eigenvalue or a weighted sum of prescribed eigenvalues of a Hermitian matrix-valued function depending on its parameters analytically in a box. We describe how the analytical properties of eigenvalue functions can be put into use to derive piece-wise quadratic functions that underestimate the eigenvalue functions. These piece-wise quadratic under-estimators lead us to a global minimization algorithm, originally due to Breiman and Cutler. We prove the global convergence of the algorithm, and show that it can be effectively used for the minimization of extreme eigenvalues, e.g., the largest eigenvalue or the sum of the largest specified number of eigenvalues. This is particularly facilitated by the analytical formulas for the first derivatives of eigenvalues, as well as analytical lower bounds on the second derivatives that can be deduced for extreme eigenvalue functions. The applications that we have in mind also include the H∞-norm of a linear dynamical system, the numerical radius, the distance to uncontrollability, and various other non-convex eigenvalue optimization problems, for which, generically, the eigenvalue function involved is simple at all points.

Key words.
Hermitian eigenvalues, analytic, global optimization, perturbation of eigenvalues, quadratic programming
AMS subject classifications.
1. Introduction.
The main object of this work is a matrix-valued function A(ω): R^d → C^{n×n} that is analytic and Hermitian at all ω ∈ R^d. Here, we consider the numerical global minimization of a prescribed eigenvalue λ(ω) of A(ω) over ω ∈ B ⊆ R^d, where B denotes a box. From an application point of view, a prescribed eigenvalue typically refers to the j-th largest eigenvalue, i.e., λ(ω) := λ_j(A(ω)), or to a weighted sum of the j largest eigenvalues, i.e., λ(ω) := ∑_{k=1}^j d_k λ_k(A(ω)) for given real numbers d_1, ..., d_j. However, it may as well refer to a particular eigenvalue with respect to a different criterion, as long as the (piece-wise) analyticity properties discussed below and in Section 3 are satisfied.

The literature from various engineering fields and applied sciences is rich with eigenvalue optimization problems that fit into the setting of the previous paragraph. There are problems arising in structural design and vibroacoustics for which the minimization of the largest eigenvalue or the maximization of the smallest eigenvalue of a matrix-valued function is essential, e.g., the problem of designing the strongest column, which originated with Euler in the 18th century [30]. In control theory, various quantities regarding dynamical systems can be posed as eigenvalue optimization problems. For instance, the distance from a linear dynamical system to a nearest unstable system [47] and the H∞-norm of a linear dynamical system have non-convex eigenvalue optimization characterizations [3]. In graph theory, relaxations of some NP-hard graph partitioning problems give rise to optimization problems in which the sum of the j largest eigenvalues is to be minimized [10].

In this paper, we offer a generic algorithm based on the analytical properties of eigenvalues of an analytic and Hermitian matrix-valued function that is applicable to any eigenvalue optimization problem whenever lower bounds on the second derivatives of the eigenvalue function can be calculated analytically or numerically. All of the existing global eigenvalue optimization algorithms in the non-convex setting are designed for specific problems, e.g., [3, 5, 6, 7, 15, 17, 19, 20, 21, 22, 32], while widely adopted techniques such as interior point methods [34] (when it is possible to pose an eigenvalue optimization problem as a semi-definite program) or bundle methods [31] are effective in the convex setting.
We foresee non-convex eigenvalue optimization problems that depend on a few parameters as the typical setting for the use of the algorithm presented here.

For the optimization of non-convex eigenvalue functions, it appears essential to benefit from the global properties of eigenvalue functions, such as their global Lipschitzness or global bounds on their derivatives. Such global properties lead us to approximate λ(ω) globally with under-estimating functions, which we call support functions. Furthermore, the derivatives of the eigenvalue functions can be evaluated effectively at no cost once the eigenvalue function is evaluated (due to analytic expressions for the derivatives of eigenvalues in terms of eigenvectors, as discussed in Section 3.2.1). Therefore, the incorporation of the derivatives into the support functions yields the quadratic support functions on which our algorithm relies. The quadratic support functions for eigenvalue functions are derived by exploiting the analytical properties of eigenvalues, and presume the availability of a lower bound γ on the second derivatives of the eigenvalue function that is obtained either analytically or numerically.

Example: Consider the minimization of the largest eigenvalue λ(ω) = λ_1(A(ω)) of

A: R → R^{n×n},  A(ω) := A_0 + ωA_1 + ω²A_2,

where A_0, A_1, A_2 ∈ R^{n×n} are given symmetric matrices. It can be deduced from the expressions in Section 3.2.2 that λ''(ω) ≥ γ := 2λ_min(A_2) for all ω such that λ(ω) is simple. Furthermore, due to the expressions in Section 3.2.1, at all such ω we have

λ'(ω) = v(ω)^T (A_1 + 2ωA_2) v(ω),

where v(ω) is a unit eigenvector associated with λ(ω). Consequently, it turns out that, about any ω_k ∈ R where λ(ω_k) is simple, there is a support function

q(ω) := λ(ω_k) + λ'(ω_k)(ω − ω_k) + (γ/2)(ω − ω_k)²

satisfying q(ω) ≤ λ(ω) for all ω ∈ R; see Section 5.2 for the details.

Support functions have earlier been explored by the global optimization community. The Piyavskii-Shubert algorithm [40, 45] is derivative-free, and constructs conic support functions based on Lipschitz continuity with a known global Lipschitz constant. It converges sub-linearly in practice. Sophisticated variants that make use of several Lipschitz constants simultaneously have appeared in the literature [24, 43]. The idea of using derivatives in the context of global optimization yields powerful algorithms. Breiman and Cutler [4] developed an algorithm that utilizes quadratic support functions depending on the derivatives, and variants of the Breiman-Cutler algorithm have been suggested for functions with Lipschitz-continuous derivatives. The quadratic support functions that we derive for λ(ω) coincide with the quadratic support functions on which the Breiman-Cutler algorithm is built. Consequently, our approach is a variant of the algorithm due to Breiman and Cutler [4].

At every iteration of the algorithm, a global minimizer of a piece-wise quadratic model, defined as the maximum of a set of quadratic support functions, is determined. A new quadratic support function is constructed around this global minimizer, and the piece-wise quadratic model is refined with the addition of this new support function. In practice, we observe a linear rate of convergence to a global minimizer.

The algorithm appears applicable especially to extremal eigenvalue functions of the form

λ(ω) = ∑_{k=1}^j d_k λ_k(A(ω)),

where the d_k are given real numbers such that d_1 ≥ d_2 ≥ ⋯ ≥ d_j ≥ 0.
This is facilitated by the simple quadratic support functions derived in Section 5.2, and by the expressions for the lower bound γ on the second derivatives derived in Section 6. The algorithm is also applicable if the eigenvalue function λ(ω) is simple over all ω ∈ B, which holds for various eigenvalue optimization problems of interest.
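To make the Example above concrete, here is a minimal numerical sketch (Python with NumPy; the random symmetric test matrices, the function names, and the sampling of test points are our own illustrative choices, not part of the original text):

import numpy as np

def random_sym(n, rng):
    # random symmetric test matrix (illustrative only)
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

def lam_max_and_der(A0, A1, A2, w):
    # largest eigenvalue of A(w) = A0 + w*A1 + w^2*A2 and its derivative
    # lambda'(w) = v^T (A1 + 2 w A2) v, valid where the eigenvalue is simple
    evals, evecs = np.linalg.eigh(A0 + w * A1 + w**2 * A2)
    v = evecs[:, -1]   # unit eigenvector of the largest eigenvalue
    return evals[-1], v @ (A1 + 2 * w * A2) @ v

rng = np.random.default_rng(0)
A0, A1, A2 = (random_sym(6, rng) for _ in range(3))
gamma = 2 * np.linalg.eigvalsh(A2)[0]   # analytic lower bound on lambda''(w)

wk = 0.3   # point of contact
lk, dk = lam_max_and_der(A0, A1, A2, wk)
q = lambda w: lk + dk * (w - wk) + 0.5 * gamma * (w - wk) ** 2

# the quadratic support function underestimates lambda(w) everywhere
for w in np.linspace(-2.0, 2.0, 9):
    assert q(w) <= lam_max_and_der(A0, A1, A2, w)[0] + 1e-12

The key point is that γ is available analytically here, so the support function can be formed from a single eigenvalue decomposition at ω_k.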
Outline: We start in the next section with a list of eigenvalue optimization problems to which our proposed algorithm fits well. In Section 3, the basic results concerning the analyticity and the derivatives of the eigenvalues of a Hermitian matrix-valued function A(ω) that depends analytically on ω are reviewed. In Section 4, for a general eigenvalue function, piece-wise quadratic support functions that are defined as the minimum of n quadratic functions are derived. In Section 5, it is shown that these piece-wise quadratic support functions simplify to smooth quadratic support functions for the extremal eigenvalue functions, as well as for eigenvalue functions that are simple for all ω ∈ B. Global lower bounds γ on the second derivatives of an extremal eigenvalue function are deduced in Section 6. The algorithm based on the quadratic support functions is presented in Section 7. We establish the global convergence of the proposed algorithm in Section 8. Finally, comprehensive numerical experiments are provided in Section 9. The examples indicate the superiority of the algorithm over algorithms based on Lipschitz continuity, e.g., [24, 40, 45], as well as over the level-set based approaches devised for particular non-convex eigenvalue optimization problems, e.g., [20, 32]. The reader who prefers to avoid technicalities at first could glance at the algorithm in Section 7, then go through Sections 3-6 for the theoretical foundation.
2. Applications.

2.1. Quantities Related to Dynamical Systems.
The numerical radius r(A) of A ∈ C^{n×n} is the modulus of the outer-most point in its field of values [23], and is defined by

r(A) := max{ |z*Az| : z ∈ C^n s.t. ‖z‖_2 = 1 }.

This quantity gives information about the powers of A, e.g., ‖A^k‖_2 ≤ 2 r(A)^k, and is used in the literature to analyze the convergence of iterative methods for the solution of linear systems [1, 11]. An eigenvalue optimization characterization is given by [23]:

r(A) = −[ min_{θ ∈ [0, 2π]} λ_n(A(θ)) ],  A(θ) := −(Ae^{iθ} + A*e^{−iθ})/2.

The H∞-norm is one of the most widely used norms in practice for the descriptor system

Ex'(t) = Ax(t) + Bu(t),  y(t) = Cx(t) + Du(t),

where u(t) and y(t) are the input and output functions, respectively, and E, A ∈ C^{n×n}, B ∈ C^{n×m}, C ∈ C^{p×n}, D ∈ C^{p×m} with m, p ≤ n are the system matrices. The H∞-norm of the transfer function for this system is defined as

‖H‖_∞ := 1 / inf_{ω ∈ R} σ_n[H(iω)^†],  H(s) := C(sE − A)^{−1}B + D.

Here and elsewhere, σ_j(·) represents the j-th largest singular value, and H(iω)^† denotes the pseudoinverse of H(iω). Also above, with zero initial conditions for the descriptor system, the transfer function H(s) reveals the linear relation between the input and the output, namely Y(s) = H(s)U(s), with U(s) and Y(s) denoting the Laplace transforms of u(t) and y(t), respectively. Note that the H∞-norm above is ill-posed (i.e., the associated operator is unbounded) if the pencil L(λ) = A − λE has an eigenvalue on the imaginary axis or to the right of the imaginary axis. Therefore, when the H∞-norm is well-posed, the matrix-valued function A(ω) := H(iω) is analytic at all ω ∈ R. A relevant quantity is the (continuous) distance to instability from a matrix A ∈ C^{n×n}; the eigenvalue optimization characterization for the H∞-norm with E = B = C = I_n and D = 0 reduces to that for the distance to instability [47] from A with respect to the ℓ_2-norm.

Paige [39] suggested the distance to uncontrollability, for a given A ∈ C^{n×n} and B ∈ C^{n×m} with m ≤ n, defined by

τ(A, B) := inf{ ‖[ΔA  ΔB]‖_2 : (A + ΔA, B + ΔB) is uncontrollable },

as a robust measure of controllability. Here, the controllability of a linear control system (A, B) of the form x'(t) = Ax(t) + Bu(t) means that the function x(t) can be driven into any state at a particular time by some input u(t), and can be equivalently characterized as rank([A − zI  B]) = n for all z ∈ C. Therefore, the eigenvalue optimization characterization for the distance to uncontrollability takes the form [12]:

τ(A, B) = min_{z ∈ C} σ_n(A(z)),  A(z) := [A − zI  B].

2.2. Minimization of the Largest Eigenvalue. In the 18th century, Euler considered the design of the strongest column with a given volume with respect to the radii of the cross-sections [30, 37]. The problem can be formulated as finding the parameters, representing the radii of the cross-sections, that maximize the smallest eigenvalue of a fourth order differential operator. The analytical solution of the problem was considered in several studies in the 1970s and 1980s [2, 33, 35], which were motivated by the earlier work of Keller and Tadjbakhsh [46].
Later, the problem was treated numerically [8] by means of a finite-element discretization, giving rise to the problem

min_{ω ∈ R^d} λ_1(A(ω)). (2.1)

Frequently, the matrix-valued function depends on the parameters affinely, i.e., A(ω) := A_0 + ∑_{j=1}^d ω_j A_j. In this affine setting, the minimization of the largest eigenvalue is a convex optimization problem (immediate from Theorem 6.1 below) and has received considerable attention [14, 16, 36].

In the general setting, when the dependence of the matrix function A(ω) on the parameters is not affine, the problem in (2.1) is non-convex. Such non-convex problems are significant (though they have not been studied much, excluding a few studies such as [38] that offer only local analysis), for instance in robust control theory to ensure robust stability. The dual form, which concerns the maximization of the smallest eigenvalue, is of interest in vibroacoustics.

2.3. Minimization of the Sum of the j Largest Eigenvalues.
In graph theory, relaxations of NP-hard partitioning problems lead to eigenvalue optimization problems that require the minimization of the sum of the j largest eigenvalues. For instance, given a weighted graph with n vertices and nonnegative integers d_1 ≥ d_2 ≥ ⋯ ≥ d_j summing up to n, consider finding a partitioning of the graph such that the ℓ-th partition contains exactly d_ℓ vertices for ℓ = 1, ..., j and the sum of the weights of the edges within each partition is maximized. The relaxation of this problem suggested in [10] is of the form

min_{ω ∈ R^d} ∑_{k=1}^j d_k λ_k(A(ω)). (2.2)

The problem (2.2) is convex if A(ω) is an affine function of ω, as in the case considered by [10]; see also [9].

Once again, in general, the minimization of the sum of the j largest eigenvalues is not a convex optimization problem, and there are a few studies in the literature that attempt to analyze the problem locally, for instance around the points where the eigenvalues coalesce [44].
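Evaluating the objective in (2.2) requires nothing beyond a Hermitian eigensolver; the following minimal sketch (Python with NumPy; the function name and calling convention are our own) illustrates the affine case:

import numpy as np

def sum_largest_eigs(A0, mats, w, d):
    # objective of (2.2): sum_k d_k * lambda_k(A(w)) for the affine family
    # A(w) = A0 + sum_j w_j A_j, with weights d_1 >= d_2 >= ... >= d_j >= 0
    A = A0 + sum(wj * Aj for wj, Aj in zip(w, mats))
    evals = np.linalg.eigvalsh(A)[::-1]   # eigenvalues in descending order
    return float(np.dot(d, evals[: len(d)]))

Each evaluation costs one eigenvalue decomposition; as discussed in Section 3, the gradient then comes essentially for free from the associated eigenvectors.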
3. Background on Perturbation Theory of Eigenvalues.
In this section, we first briefly summarize the analyticity results, mostly borrowed from [41, Chapter 1], related to the eigenvalues of matrix-valued functions. Then, expressions [28] are provided for the derivatives of Hermitian eigenvalues in terms of eigenvectors and the derivatives of the matrix-valued function. Finally, we elaborate on the analyticity of singular value problems as special Hermitian eigenvalue problems.
3.1. Analyticity of Eigenvalues. For a univariate matrix-valued function A(ω) that depends on ω analytically, which may or may not be Hermitian, the characteristic polynomial is of the form

g(ω, λ) := det(λI − A(ω)) = a_n(ω)λ^n + ⋯ + a_1(ω)λ + a_0(ω),

where a_0(ω), ..., a_n(ω) are analytic functions of ω. It follows from Puiseux's theorem (see, e.g., [48, Chapter 2]) that each root λ̃_j(ω) such that g(ω, λ̃_j(ω)) = 0 has a Puiseux series of the form

λ̃_j(ω) = ∑_{k=0}^∞ c_{k,j} ω^{k/r}, (3.1)

for all small ω, where r is the multiplicity of the root λ̃_j(0).
Now suppose A(ω) is Hermitian for all ω, and let ℓ be the smallest integer such that c_{ℓ,j} ≠ 0. Then we have

lim_{ω→0^+} (λ̃_j(ω) − λ̃_j(0)) / ω^{ℓ/r} = c_{ℓ,j},

which implies that c_{ℓ,j} is real, since λ̃_j(ω) and ω^{ℓ/r} are real numbers for each ω. Furthermore,

lim_{ω→0^−} (λ̃_j(ω) − λ̃_j(0)) / (−ω)^{ℓ/r} = (−1)^{ℓ/r} c_{ℓ,j}

is real, which implies that (−1)^{ℓ/r} is real, or equivalently that ℓ/r is an integer. This observation reveals that the first nonzero term in the Puiseux series of λ̃_j(ω) is an integer power of ω. The same argument applied to the derivatives of λ̃_j(ω) and the associated Puiseux series indicates that only integer powers of ω can appear in the Puiseux series (3.1); that is, the Puiseux series reduces to a power series. This establishes that λ̃_j(ω) is an analytic function of ω. Indeed, it can also be deduced that, associated with λ̃_1(ω), ..., λ̃_n(ω), there is an orthonormal set {v_1(ω), ..., v_n(ω)} of eigenvectors, where each of v_1(ω), ..., v_n(ω) varies analytically with respect to ω (see [41] for details).
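Before stating the theorem, it is worth seeing numerically that the analytic arrangement differs from plain sorting; a small illustration (Python with NumPy; the 2 × 2 matrix is our own toy example, not from the original text):

import numpy as np

# Eigenvalues of A(w) = [[1, w], [w, 1]] are 1 - w and 1 + w: analytic as a
# pair, whereas the *sorted* branches 1 -/+ |w| are not differentiable at w = 0.
for w in (-0.5, -0.1, 0.1, 0.5):
    sorted_eigs = np.linalg.eigvalsh(np.array([[1.0, w], [w, 1.0]]))
    analytic = np.array([1.0 - w, 1.0 + w])   # one analytic arrangement
    print(w, sorted_eigs, analytic)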
Theorem 3.1 (Rellich). Let A(ω): R → C^{n×n} be a Hermitian matrix-valued function that depends on ω analytically.
(i) The n roots of the characteristic polynomial of A(ω) can be arranged so that each root λ̃_j(ω) for j = 1, ..., n is an analytic function of ω.
(ii) There exists an eigenvector v_j(ω) associated with λ̃_j(ω) for j = 1, ..., n that satisfies the following:
(1) (λ̃_j(ω)I − A(ω)) v_j(ω) = 0, ∀ω ∈ R;
(2) ‖v_j(ω)‖_2 = 1, ∀ω ∈ R;
(3) v_j*(ω) v_k(ω) = 0, ∀ω ∈ R, for k ≠ j; and
(4) v_j(ω) is an analytic function of ω.

The eigenvalues of a multivariate matrix-valued function A(ω): R^d → C^{n×n} that depends on ω analytically do not have a power series representation in general, even when A(ω) is Hermitian. As an example, consider

A(ω) = [ 2ω_1  ω_1 + ω_2 ; ω_1 + ω_2  2ω_2 ]  with  λ̃_{1,2}(ω) = ω_1 + ω_2 ± √(2ω_1² + 2ω_2²).

On the other hand, it follows from Theorem 3.1 that there are underlying eigenvalue functions λ̃_j(ω), j = 1, ..., n, of A(ω), each of which is analytic along every line in R^d, when A(ω) is Hermitian. This analyticity property along lines in R^d implies the existence of the first partial derivatives of λ̃_j(ω) everywhere. Expressions for the first partial derivatives will be derived in the next subsection, indicating their continuity. As a consequence of the continuity of the first partial derivatives, each λ̃_j(ω) must be differentiable.
Theorem 3.2. Let A(ω): R^d → C^{n×n} be a Hermitian matrix-valued function that depends on ω analytically. Then the n roots of the characteristic polynomial of A(ω) can be arranged so that each root λ̃_j(ω) is (i) analytic on every line in R^d, and (ii) differentiable on R^d.

3.2. Derivatives of Eigenvalues.

3.2.1. First Derivatives in the Univariate Case. Consider a univariate Hermitian matrix-valued function A(ω) that depends on ω analytically. An analytic eigenvalue λ̃_j(ω) and the associated eigenvector v_j(ω), as described in Theorem 3.1, satisfy

A(ω) v_j(ω) = λ̃_j(ω) v_j(ω).

Taking the derivatives of both sides, we obtain

(dA(ω)/dω) v_j(ω) + A(ω) (dv_j(ω)/dω) = (dλ̃_j(ω)/dω) v_j(ω) + λ̃_j(ω) (dv_j(ω)/dω). (3.2)

Multiplying both sides by v_j(ω)* and using the identities v_j(ω)* A(ω) = λ̃_j(ω) v_j(ω)* as well as v_j(ω)* v_j(ω) = ‖v_j(ω)‖²_2 = 1, we get

dλ̃_j(ω)/dω = v_j(ω)* (dA(ω)/dω) v_j(ω). (3.3)

3.2.2. Second Derivatives in the Univariate Case. By differentiating both sides of (3.3), it is possible to deduce the formula (the details are omitted for brevity)

d²λ̃_j(ω)/dω² = v_j(ω)* (d²A(ω)/dω²) v_j(ω) + 2 ∑_{k=1, k≠j}^n (1/(λ̃_j(ω) − λ̃_k(ω))) | v_k(ω)* (dA(ω)/dω) v_j(ω) |² (3.4)

for the second derivatives, assuming that the (algebraic) multiplicity of λ̃_j(ω) is one. If, on the other hand, the eigenvalues repeat at a given ω̂, specifically when the (algebraic) multiplicity of λ̃_j(ω̂) is greater than one, the formula (3.4) generalizes as

d²λ̃_j(ω̂)/dω² = v_j(ω̂)* (d²A(ω̂)/dω²) v_j(ω̂) + 2 ∑_{k=1, k≠j, k∉α}^n lim_{ω̃→ω̂} ( (1/(λ̃_j(ω̃) − λ̃_k(ω̃))) | v_k(ω̃)* (dA(ω̃)/dω) v_j(ω̃) |² ). (3.5)

Here, α denotes the set of indices of the analytic eigenvalues (specified in Theorem 3.1) that are identical to λ̃_j(ω) at all ω.

3.2.3. Derivatives in the Multivariate Case. Let A(ω): R^d → C^{n×n} be Hermitian and analytic. It follows from (3.3) that

∂λ̃_j(ω)/∂ω_k = v_j*(ω) (∂A(ω)/∂ω_k) v_j(ω). (3.6)

Since A(ω) and v_j(ω) are analytic with respect to ω_ℓ for ℓ = 1, ..., d, this implies the continuity, indeed the analyticity with respect to ω_ℓ, of each partial derivative ∂λ̃_j(ω)/∂ω_k, and hence the existence of ∂²λ̃_j(ω)/(∂ω_k ∂ω_ℓ), everywhere. If the multiplicity of λ̃_j(ω) is one, differentiating both sides of (3.6) with respect to ω_ℓ yields the following expressions for the second partial derivatives:

∂²λ̃_j(ω)/(∂ω_k ∂ω_ℓ) = v_j*(ω) (∂²A(ω)/(∂ω_k ∂ω_ℓ)) v_j(ω) + 2·ℜ ∑_{m=1, m≠j}^n (1/(λ̃_j(ω) − λ̃_m(ω))) ( v_j(ω)* (∂A(ω)/∂ω_k) v_m(ω) ) ( v_m(ω)* (∂A(ω)/∂ω_ℓ) v_j(ω) ).

Expressions similar to (3.5) can be obtained for the second partial derivatives when λ̃_j(ω) has multiplicity greater than one.
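The first-derivative formula (3.6) is simple to validate numerically; a minimal sanity check (Python with NumPy; the Hermitian test family and all names are our own, and a simple eigenvalue is assumed so that the analytic ordering agrees with sorting):

import numpy as np

def check_derivative(A, dA, w, j, h=1e-6):
    # compare v_j^* A'(w) v_j with a central finite difference of the
    # j-th (ascending) eigenvalue; assumes that eigenvalue is simple at w
    evals, evecs = np.linalg.eigh(A(w))
    v = evecs[:, j]
    analytic = np.real(np.conj(v) @ dA(w) @ v)
    fd = (np.linalg.eigvalsh(A(w + h))[j] - np.linalg.eigvalsh(A(w - h))[j]) / (2 * h)
    return analytic, fd

rng = np.random.default_rng(1)
herm = lambda M: (M + M.conj().T) / 2
H0 = herm(rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))
H1 = herm(rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))
print(check_derivative(lambda w: H0 + w * H1, lambda w: H1, w=0.7, j=2))

The two printed numbers agree to roughly the order of the finite-difference error.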
3.3. Singular Values. Some of the applications (see Section 2.1) concern the optimization of the j-th largest singular value of an analytic matrix-valued function. Singular value problems are special Hermitian eigenvalue problems. In particular, denoting the j-th largest singular value of an analytic matrix-valued function B(ω): R^d → C^{n×m} (not necessarily Hermitian) by σ_j(ω), the set of eigenvalues of the Hermitian matrix-valued function

A(ω) := [ 0  B(ω) ; B(ω)*  0 ]

is {σ_j(ω), −σ_j(ω) : j = 1, ..., n}. In the univariate case, σ_j(ω) is the j-th largest of the 2n analytic eigenvalues λ̃_1(ω), ..., λ̃_{2n}(ω) of A(ω). The multivariate d-dimensional case is similar, with the exception that each eigenvalue λ̃_j(ω) is differentiable and analytic along every line in R^d. Let us focus on the univariate case throughout the rest of this section; extensions to the multivariate case are similar to those in the previous subsections. Suppose

v_j(ω) := [ u_j(ω) ; w_j(ω) ], with u_j(ω) ∈ C^n, w_j(ω) ∈ C^m,

is the analytic eigenvector function of A(ω), as specified in Theorem 3.1, associated with λ̃_j(ω); that is,

[ 0  B(ω) ; B(ω)*  0 ] [ u_j(ω) ; w_j(ω) ] = λ̃_j(ω) [ u_j(ω) ; w_j(ω) ].

The above equation implies

B(ω) w_j(ω) = λ̃_j(ω) u_j(ω) and B(ω)* u_j(ω) = λ̃_j(ω) w_j(ω). (3.7)

In other words, u_j(ω) and w_j(ω) are analytic, and constitute a pair of consistent left and right singular vectors associated with λ̃_j(ω). To summarize, in the univariate case, λ̃_j(ω) can be considered as a signed analytic singular value of B(ω), and there is a consistent pair of analytic left and right singular vector functions, u_j(ω) and w_j(ω), respectively.

Next, in the univariate case, we derive expressions for the first derivative of λ̃_j(ω) in terms of the corresponding left and right singular vectors. It follows from the singular value equations (3.7) above that ‖u_j(ω)‖_2 = ‖w_j(ω)‖_2 = 1/√2 (whenever λ̃_j(ω) ≠ 0; this equality follows from analyticity). Now, the application of the expression (3.3) yields

dλ̃_j(ω)/dω = [ u_j(ω)*  w_j(ω)* ] [ 0  dB(ω)/dω ; dB(ω)*/dω  0 ] [ u_j(ω) ; w_j(ω) ]
            = u_j(ω)* (dB(ω)/dω) w_j(ω) + w_j(ω)* (dB(ω)*/dω) u_j(ω)
            = 2·ℜ( u_j(ω)* (dB(ω)/dω) w_j(ω) ).

In terms of the unit left and right singular vectors û_j(ω) := √2 · u_j(ω) and ŵ_j(ω) := √2 · w_j(ω), respectively, associated with λ̃_j(ω), we obtain

dλ̃_j(ω)/dω = ℜ( û_j(ω)* (dB(ω)/dω) ŵ_j(ω) ). (3.8)
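In code, the reduction of singular values to Hermitian eigenvalues and the derivative formula (3.8) look as follows (Python with NumPy; a minimal sketch with our own names, assuming a simple nonzero singular value):

import numpy as np

def augmented(B):
    # Hermitian matrix [[0, B], [B^*, 0]] whose nonzero eigenvalues are
    # plus/minus the singular values of B
    n, m = B.shape
    return np.block([[np.zeros((n, n), dtype=complex), B],
                     [B.conj().T, np.zeros((m, m), dtype=complex)]])

def dsigma(B, dB, j=-1):
    # derivative of a singular value via (3.8): Re(u^* B'(w) v) with unit
    # left/right singular vectors; j indexes the descending singular values,
    # so j = -1 picks the smallest one, sigma_n
    U, s, Vh = np.linalg.svd(B)
    return float(np.real(U[:, j].conj() @ dB @ Vh[j, :].conj()))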
Notation: Throughout the rest of the text, we denote by λ̃_1(ω), ..., λ̃_n(ω) the eigenvalues of A(ω) that are analytic in the univariate case (stated in Theorem 3.1), and differentiable as well as analytic along every line in the multivariate case (stated in Theorem 3.2). On the other hand, λ_j(ω) or λ_j(A(ω)) denotes the j-th largest eigenvalue, and σ_j(ω) or σ_j(A(ω)) denotes the j-th largest singular value of A(ω).
4. Piece-wise Quadratic Support Functions.
Let λ̃_1, ..., λ̃_n: R^d → R be eigenvalue functions of a Hermitian matrix-valued function A(ω) that are analytic along every line in R^d and differentiable on R^d, and let B ⊂ R^d be the box defined by

B := B(ω_1^(l), ω_1^(u), ..., ω_d^(l), ω_d^(u)) := { ω ∈ R^d : ω_j ∈ [ω_j^(l), ω_j^(u)] for j = 1, ..., d }. (4.1)

Consider the closed and connected subsets P_1, ..., P_q of B, with q as small as possible, such that ∪_{k=1}^q P_k = B, P_k ∩ P_ℓ = ∂P_k ∩ ∂P_ℓ for each k and ℓ, and in the interior of each P_k none of the eigenvalue functions λ̃_1, ..., λ̃_n intersect each other. Define λ: B → R as follows:

λ(ω) := f( λ̃_{s_{k1}}(ω), ..., λ̃_{s_{kj}}(ω) ) for all ω ∈ int(P_k), (4.2)

where f is analytic, and s_k = [s_{k1} ... s_{kj}]^T ∈ Z_+^j is a vector of indices such that

λ̃_{s_{ki}}(ω) = λ̃_{s_{ℓi}}(ω) for i = 1, ..., j, and for all ω ∈ ∂P_k ∩ ∂P_ℓ,

in order to ensure the continuity of λ(ω) on B. The extremal eigenvalue function λ(ω) = ∑_{k=1}^j d_k λ_k(ω) fits into this framework.

We derive a piece-wise quadratic support function q_k(ω) about a given point ω_k ∈ B bounding λ(ω) from below for all ω ∈ B, and such that q_k(ω_k) = λ(ω_k). Let us focus on the direction p := (ω − ω_k)/‖ω − ω_k‖_2, the univariate function φ(α) := λ(ω_k + αp), and the analytic univariate functions φ̃_j(α) := λ̃_j(ω_k + αp) for j = 1, ..., n. Also, let us denote by α^(1), ..., α^(m) the isolated points in the interval [0, ‖ω − ω_k‖_2] where two distinct functions among φ̃_1(α), ..., φ̃_n(α) intersect each other. At these points, φ(α) may not be differentiable. We have

λ(ω) = λ(ω_k) + ∑_{ℓ=0}^m ∫_{α^(ℓ)}^{α^(ℓ+1)} φ'(t) dt, (4.3)

where α^(0) := 0 and α^(m+1) := ‖ω − ω_k‖_2. Due to the existence of the second partial derivatives of λ̃_j(ω) (since the expression (3.6) implies the analyticity of the first partial derivatives with respect to each parameter disjointly), there exists a constant γ that satisfies

λ_min( ∇²λ̃_j(ω) ) ≥ γ, for all ω ∈ B, j = 1, ..., n. (4.4)

Furthermore, φ̃''_j(α) = p^T ∇²λ̃_j(ω_k + αp) p ≥ λ_min( ∇²λ̃_j(ω_k + αp) ) ≥ γ for all α ∈ [0, ‖ω − ω_k‖_2]. Thus, applying the mean value theorem to the analytic functions φ̃'_j(α) for j = 1, ..., n, and since φ'(t) ≥ min_{j=1,...,n} φ̃'_j(t), we obtain

φ'(t) ≥ min_{j=1,...,n} φ̃'_j(0) + γt.

By substituting the last inequality in (4.3), integrating the right-hand side of (4.3), and using φ̃'_j(0) = ∇λ̃_j(ω_k)^T p (since λ̃_j is differentiable), we arrive at the following:
Theorem 4.1.
Suppose A(ω): R^d → C^{n×n} is an analytic and Hermitian matrix-valued function, the eigenvalue function λ(ω) is defined as in (4.2) in terms of the eigenvalues λ̃_1(ω), ..., λ̃_n(ω) of A(ω) that are differentiable and analytic on every line in R^d, and γ is a lower bound as in (4.4). Then the following inequality holds for all ω ∈ B:

λ(ω) ≥ [ q_k(ω) := λ(ω_k) + ( min_{j=1,...,n} ∇λ̃_j(ω_k)^T (ω − ω_k) ) + (γ/2)‖ω − ω_k‖²_2 ]. (4.5)
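Evaluating the support function of (4.5) requires only the n gradients at ω_k and the scalar γ; a minimal sketch (Python with NumPy; the function and argument names are ours):

import numpy as np

def support_value(lam_k, grads, w_k, gamma, w):
    # q_k(w) = lam(w_k) + min_j grad_j(w_k)^T (w - w_k) + (gamma/2)*||w - w_k||^2,
    # where grads holds the gradients of all n eigenvalue branches at w_k
    step = np.asarray(w, dtype=float) - np.asarray(w_k, dtype=float)
    lin = min(float(np.dot(g, step)) for g in grads)
    return lam_k + lin + 0.5 * gamma * float(step @ step)

Under generic simplicity, or for the extremal eigenvalue functions of Section 5, the minimum over the branches disappears and only the gradient of λ itself is needed.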
5. Simplified Piece-wise Quadratic Support Functions.

5.1. Support Functions under Generic Simplicity.
In various instances, the eigenvalue functions λ̃_1, ..., λ̃_n do not intersect each other at any ω ∈ B generically. In such cases, for some j, we have λ(ω) = λ̃_j(ω) for all ω ∈ B; therefore λ(ω) is analytic in the univariate case and analytic along every line in the multivariate case. For instance, the singular values of the matrix function A(ω) := C(ωiI − A)^{−1}B + D involved in the definition of the H∞-norm do not coalesce at any ω ∈ R on a dense subset of the set of quadruples (A, B, C, D). Similar remarks apply to all of the specific eigenvalue optimization problems in Section 2.1.

Under the generic simplicity assumption, the piece-wise quadratic support function (4.5) simplifies to

q_k(ω) = λ(ω_k) + ∇λ(ω_k)^T (ω − ω_k) + (γ/2)‖ω − ω_k‖²_2. (5.1)

Here, γ is a lower bound on λ_min( ∇²λ(ω) ) for all ω ∈ B. In many cases, it may be possible to obtain a rough lower bound γ numerically by means of the expressions for the second derivatives in Sections 3.2.2 and 3.2.3, and by exploiting the Lipschitz continuity of the eigenvalue λ(ω) and the other eigenvalues.

5.2. Support Functions for Extremal Eigenvalue Functions. Consider the extremal eigenvalue function

λ(ω) = ∑_{k=1}^j d_k λ_k(ω) (5.2)

for given real numbers d_1 ≥ d_2 ≥ ⋯ ≥ d_j ≥ 0.
A special case, when d_1, d_2, ..., d_j are integers, is discussed in Section 2.3. When d_1 = 1, d_2 = ⋯ = d_j = 0, this reduces to the maximal eigenvalue function λ(ω) = λ_1(ω) of Section 2.2.

For simplicity, let us suppose that λ(ω) is differentiable at ω_k, about which we derive a support function below. This is generically the case. In the unlikely case of two eigenvalues coalescing at ω_k, the non-differentiability is isolated at this point; therefore, λ(ω) is differentiable at all nearby points. For a fixed ω ∈ B, as in the previous section, define φ: R → R, φ(α) := λ(ω_k + αp) for p = (ω − ω_k)/‖ω − ω_k‖_2. Denote by α^(1), ..., α^(m), in increasing order, the points α ∈ (0, ‖ω − ω_k‖_2] where one of λ_1(ω_k + αp), ..., λ_j(ω_k + αp) is not simple. These are the points where φ(α) is possibly not analytic. At any α ∈ (0, ‖ω − ω_k‖_2), by φ_{+,α} we refer to the analytic function satisfying φ_{+,α}(α̃) = φ(α̃) for all α̃ > α sufficiently close to α. Similarly, φ_{−,α} refers to the analytic function satisfying φ_{−,α}(α̃) = φ(α̃) for all α̃ < α sufficiently close to α. Furthermore, φ'_+(α) and φ'_−(α) represent the right-hand and left-hand derivatives of φ(α), respectively.

Lemma 5.1.
The following relations hold for all α ∈ (0, ‖ω − ω_k‖_2): (i) φ_{+,α}(α̃) ≥ φ_{−,α}(α̃) for all α̃ > α sufficiently close to α; (ii) consequently, φ'_+(α) ≥ φ'_−(α).

Proof. The functions φ_{−,α}(α̃) and φ_{+,α}(α̃) are of the form

φ_{−,α}(α̃) = ∑_{k=1}^j d_k λ̃_{n_k}(ω_k + α̃p) and φ_{+,α}(α̃) = ∑_{k=1}^j d_k λ_k(ω_k + α̃p) (5.3)

for some indices n_1, ..., n_j and for all α̃ > α sufficiently close to α. In (5.3), the latter equality follows from φ_{+,α}(α̃) = φ(α̃) = λ(ω_k + α̃p) for all α̃ > α sufficiently close to α by definition. The former equality is due to φ_{−,α}(α̃) = φ(α̃) = λ(ω_k + α̃p) for all α̃ < α sufficiently close to α, implying that φ_{−,α}(α̃) is a weighted sum of j of the analytic eigenvalues λ̃_1(ω_k + α̃p), ..., λ̃_n(ω_k + α̃p) with weights d_1, ..., d_j. We rephrase the inequality φ_{+,α}(α̃) ≥ φ_{−,α}(α̃) as φ_{+,α}(α̃) − φ_{−,α}(α̃) = ∑_{k=1}^j d_k · a_k ≥ 0, where a_k := λ_k(ω_k + α̃p) − λ̃_{n_k}(ω_k + α̃p) for k = 1, ..., j.

Note that ∑_{k=1}^q a_k ≥ 0 for q = 1, ..., j, since λ_1(ω_k + α̃p), ..., λ_q(ω_k + α̃p) are the largest q eigenvalues; in particular, their sum cannot be less than the sum of λ̃_{n_1}(ω_k + α̃p), ..., λ̃_{n_q}(ω_k + α̃p). Therefore,

∑_{k=1}^j d_k · a_k = d_j ∑_{k=1}^j a_k + (d_{j−1} − d_j) ∑_{k=1}^{j−1} a_k + (d_{j−2} − d_{j−1}) ∑_{k=1}^{j−2} a_k + ⋯ + (d_1 − d_2) a_1 ≥ 0,

since d_1 ≥ d_2 ≥ ⋯ ≥ d_j ≥ 0.

Part (ii) is immediate from part (i) due to φ'_+(α) = φ'_{+,α}(α) ≥ φ'_{−,α}(α) = φ'_−(α), where the inequality follows from an application of Taylor's theorem to φ_{+,α}(α̃) and φ_{−,α}(α̃) for α̃ > α and around α.
Let ω_k ∈ R^d be such that all of the eigenvalues λ_1(ω_k), ..., λ_j(ω_k) are simple, and let γ satisfy λ_min( ∇²λ(ω) ) ≥ γ for all ω ∈ B such that λ(ω) is simple. Then the following inequality holds for the extremal eigenvalue function λ(ω) given by (5.2) and for all ω ∈ B:

λ(ω) ≥ q_k(ω) := λ(ω_k) + ∇λ(ω_k)^T (ω − ω_k) + (γ/2)‖ω − ω_k‖²_2.

Proof. It follows from Taylor's theorem that, for each k = 0, ..., m and α in the closure of (α^(k), α^(k+1)) (here we define α^(0) := 0 and α^(m+1) := ‖ω − ω_k‖_2), we have

φ(α) = φ(α^(k)) + φ'_+(α^(k))(α − α^(k)) + (φ''(η)/2)(α − α^(k))²,

where η ∈ (α^(k), α). By observing

φ''(η) = p^T ∇²λ(ω_k + ηp) p ≥ λ_min( ∇²λ(ω_k + ηp) ) ≥ γ,

we deduce

φ(α) ≥ φ(α^(k)) + φ'_+(α^(k))(α − α^(k)) + (γ/2)(α − α^(k))², (5.4)

and by differentiating we also deduce

φ'_−(α) ≥ φ'_+(α^(k)) + γ(α − α^(k)). (5.5)
Next, we prove the inequalities

φ(α^(ℓ)) ≥ φ(α^(0)) + φ'_+(α^(0))(α^(ℓ) − α^(0)) + (γ/2)(α^(ℓ) − α^(0))², (5.6)
φ'_−(α^(ℓ)) ≥ φ'_+(α^(0)) + γ(α^(ℓ) − α^(0)), (5.7)

for each ℓ = 1, 2, ..., m + 1 by induction. The inequalities (5.6) and (5.7) for ℓ = 1 hold by applications of (5.4) and (5.5) with α = α^(1) and α^(k) = α^(0). Let us suppose that the inequalities indeed hold for ℓ = 2, ..., k as the inductive hypothesis. Now, by another application of (5.5), and since φ'_+(α^(k)) ≥ φ'_−(α^(k)) (see Lemma 5.1), we obtain

φ'_−(α^(k+1)) ≥ φ'_+(α^(k)) + γ(α^(k+1) − α^(k))
             ≥ φ'_−(α^(k)) + γ(α^(k+1) − α^(k))
             ≥ [ φ'_+(α^(0)) + γ(α^(k) − α^(0)) ] + γ(α^(k+1) − α^(k))
             = φ'_+(α^(0)) + γ(α^(k+1) − α^(0)),

where we use the inductive hypothesis with ℓ = k in the third inequality. Furthermore, by (5.4), the inequality φ'_+(α^(k)) ≥ φ'_−(α^(k)), and the inductive hypothesis with ℓ = k, we end up with

φ(α^(k+1)) ≥ φ(α^(k)) + φ'_+(α^(k))(α^(k+1) − α^(k)) + (γ/2)(α^(k+1) − α^(k))²
           ≥ φ(α^(k)) + φ'_−(α^(k))(α^(k+1) − α^(k)) + (γ/2)(α^(k+1) − α^(k))²
           ≥ [ φ(α^(0)) + φ'_+(α^(0))(α^(k) − α^(0)) + (γ/2)(α^(k) − α^(0))² ]
             + [ φ'_+(α^(0)) + γ(α^(k) − α^(0)) ](α^(k+1) − α^(k)) + (γ/2)(α^(k+1) − α^(k))²
           = φ(α^(0)) + φ'_+(α^(0))(α^(k+1) − α^(0)) + (γ/2)(α^(k+1) − α^(0))²,

proving the validity of (5.6) and (5.7) for each ℓ = 1, ..., m + 1.

The inequality (5.6) with ℓ = m + 1 yields

φ(α^(m+1)) ≥ φ(α^(0)) + φ'_+(α^(0))(α^(m+1) − α^(0)) + (γ/2)(α^(m+1) − α^(0))²,

from which the result follows by noting φ(α^(m+1)) = λ(ω), φ(α^(0)) = λ(ω_k), φ'_+(α^(0)) = ∇λ(ω_k)^T p, and α^(m+1) − α^(0) = ‖ω − ω_k‖_2.

A lower bound γ as stated in Theorem 5.2 can be deduced analytically in various cases. This is discussed next.
6. Lower Bound γ for the Extremal Eigenvalue Functions. The quadratic support functions discussed so far rely on the lower bound γ. Such bounds can be derived analytically for the extremal eigenvalue function (5.2) with d_1 ≥ d_2 ≥ ⋯ ≥ d_j ≥ 0.

Theorem 6.1.
Suppose A ( ω ) : R d → C n × n is analytic and Hermitian at all ω ∈ R d . Then, λ ( ω ) defined as in (5.2) satisfies λ min (cid:0) ∇ λ ( ω ) (cid:1) ≥ (cid:32) j (cid:88) k =1 d k (cid:33) · λ min (cid:0) ∇ A ( ω ) (cid:1) ptimization of Hermitian Eigenvalues for each ω such that λ ( ω ) , . . . , λ j ( ω ) are simple, where ∇ A ( ω ) ∈ C nd × nd is givenby ∇ A ( ω ) = ∂ A ( ω ) ∂ω ∂ A ( ω ) ∂ω ∂ω . . . ∂ A ( ω ) ∂ω ∂ω d ∂ A ( ω ) ∂ω ∂ω ∂ A ( ω ) ∂ω . . . ∂ A ( ω ) ∂ω ∂ω d . . . ∂ A ( ω ) ∂ω d ∂ω ∂ A ( ω ) ∂ω d ω . . . ∂ A ( ω ) ∂ω d . Proof . First observe that, by the formulas in Section 3.2.3, we obtain ∂ λ ( ω ) ∂ω (cid:96) ∂ω i = j (cid:88) k =1 d k · (cid:18) v k ( ω ) ∗ ∂ A ( ω ) ∂ω (cid:96) ∂ω i v k ( ω ) (cid:19) +2 · (cid:60) (cid:34) j (cid:88) k =1 (cid:32) n (cid:88) m = k +1 d k − d m λ k ( ω ) − λ m ( ω ) (cid:18) v k ( ω ) ∗ ∂ A ( ω ) ∂ω (cid:96) v m ( ω ) (cid:19) (cid:18) v m ( ω ) ∗ ∂ A ( ω ) ∂ω i v k ( ω ) (cid:19)(cid:33)(cid:35) , where we define d j +1 = · · · = d n = 0. As for the Hessian, this yields ∇ λ ( ω ) = j (cid:88) k =1 d k H ( k ) ( ω ) + 2 · j (cid:88) k =1 n (cid:88) m = k +1 d k − d m λ k ( ω ) − λ m ( ω ) (cid:60) (cid:16) H ( k,m ) ( ω ) (cid:17) , (6.1)where H ( k ) ( ω ) , H ( k,m ) ( ω ) ∈ R d × d are such that the ( (cid:96), i ) entries of H ( k ) ( ω ) and H ( k,m ) ( ω ) are given by v k ( ω ) ∗ ∂ A ( ω ) ∂ω (cid:96) ∂ω i v k ( ω ) and (cid:18) v k ( ω ) ∗ ∂ A ( ω ) ∂ω (cid:96) v m ( ω ) (cid:19) (cid:18) v m ( ω ) ∗ ∂ A ( ω ) ∂ω i v k ( ω ) (cid:19) , respectively. Furthermore, it can easily be verified that H ( k,m ) ( ω ) is positive semi-definite for each k and m . Indeed, denoting h ( k,m ) i := v m ( ω ) ∗ ( ∂ A ( ω ) /∂ω i ) v k ( ω ) , foreach u ∈ C d , observe that u T H ( k,m ) ( ω ) u = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) d (cid:88) i =1 u i h ( k,m ) i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ . This in turn implies that (cid:60) (cid:0) H ( k,m ) ( ω ) (cid:1) is positive semi-definite, i.e., since H ( k,m ) ( ω ) = (cid:60) (cid:0) H ( k,m ) ( ω ) (cid:1) + i (cid:61) (cid:0) H ( k,m ) ( ω ) (cid:1) , for all u ∈ R d we have u T (cid:60) (cid:0) H ( k,m ) ( ω ) (cid:1) u = u T H ( k,m ) ( ω ) u ≥
Consequently, it follows from (6.1) that

λ_min( ∇²λ(ω) ) ≥ λ_min( ∑_{k=1}^j d_k H^(k)(ω) ) ≥ ∑_{k=1}^j d_k λ_min( H^(k)(ω) ) ≥ ( ∑_{k=1}^j d_k ) λ_min( ∇²A(ω) ).

To see the last inequality, note that H^(k)(ω) = [I_d ⊗ v_k(ω)*] · ∇²A(ω) · [I_d ⊗ v_k(ω)], where ⊗ denotes the Kronecker product, and therefore, for some v(ω) ∈ C^d of unit length,

λ_min( H^(k)(ω) ) = v(ω)* · H^(k)(ω) · v(ω) = [v(ω)* ⊗ v_k(ω)*] · ∇²A(ω) · [v(ω) ⊗ v_k(ω)] ≥ λ_min( ∇²A(ω) ).
There are various instances in which the theorem above yields an immediate lower bound γ on λ_min( ∇²λ(ω) ) over all ω ∈ B. A particular instance is given by the corollary below, when A(ω) is a quadratic function of ω.

Corollary 6.2.
Suppose that A(ω): R^d → C^{n×n} is of the form

A(ω) = A_0 + ∑_{ℓ=1}^d ω_ℓ A_ℓ + (1/2) ∑_{ℓ=1}^d ∑_{i=1}^d ω_ℓ ω_i A_{ℓi},

where the A_ℓ and A_{ℓi} are Hermitian and A_{ℓi} = A_{iℓ}. Then λ(ω) defined as in (5.2) satisfies

λ_min( ∇²λ(ω) ) ≥ ( ∑_{k=1}^j d_k ) · λ_min( [ A_{11}  A_{12}  ⋯  A_{1d} ;  A_{21}  A_{22}  ⋯  A_{2d} ;  ⋮ ;  A_{d1}  A_{d2}  ⋯  A_{dd} ] )

for each ω such that λ_1(ω), ..., λ_j(ω) are simple.
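The bound of Corollary 6.2 is straightforward to compute, and easy to sanity-check against a finite-difference Hessian; a minimal sketch (Python with NumPy; the random quadratic test family and the crude finite-difference check are our own illustrative choices):

import numpy as np

rng = np.random.default_rng(2)
sym = lambda M: (M + M.T) / 2
A0, A1, A2, A11, A22, A12 = (sym(rng.standard_normal((5, 5))) for _ in range(6))

def A(w):
    # A(w) = A0 + w1 A1 + w2 A2 + (1/2)(w1^2 A11 + 2 w1 w2 A12 + w2^2 A22)
    return (A0 + w[0] * A1 + w[1] * A2
            + 0.5 * (w[0] ** 2 * A11 + w[1] ** 2 * A22) + w[0] * w[1] * A12)

d_weights = np.array([1.0, 0.5])   # d_1 >= d_2 >= 0

def lam(w):
    evals = np.linalg.eigvalsh(A(w))[::-1]
    return float(d_weights @ evals[:2])

# gamma from Corollary 6.2: lambda_min of the block matrix of the A_{li}
gamma = d_weights.sum() * np.linalg.eigvalsh(np.block([[A11, A12],
                                                       [A12, A22]]))[0]

# finite-difference Hessian of lam at a (generic) point, for comparison
w0, h = np.array([0.3, -0.2]), 1e-4
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        ei, ej = np.eye(2)[i] * h, np.eye(2)[j] * h
        H[i, j] = (lam(w0 + ei + ej) - lam(w0 + ei - ej)
                   - lam(w0 - ei + ej) + lam(w0 - ei - ej)) / (4 * h * h)
print(np.linalg.eigvalsh(H)[0], ">=", gamma)

At points where the leading eigenvalues are simple, the printed smallest Hessian eigenvalue respects the analytic lower bound γ.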
7. The Algorithm.
We consider the minimization of an eigenvalue function λ: B → R, formally defined in Section 4, over a box B ⊂ R^d. We utilize the quadratic function given by (5.1) under the assumption that it is indeed a support function. This is certainly the case for λ(ω) = ∑_{k=1}^j d_k λ_k(ω) with weights d_k such that d_1 ≥ d_2 ≥ ⋯ ≥ d_j ≥ 0, as well as when the eigenvalue function λ(ω) is simple at all ω ∈ B generically, as discussed in Section 5.1. Thus the algorithm is applicable to all of the problems in Section 2.

The algorithm starts by constructing a support function q_0 about an arbitrary point ω_0 ∈ B. The next step is to find the global minimizer ω_1 of q_0 in B, and to construct the quadratic support function q_1 about ω_1. In general, suppose there are s + 1 support functions q_0, q_1, ..., q_s about the points ω_0, ω_1, ..., ω_s. A new quadratic support function q_{s+1} is constructed about ω_{s+1}, which is a global minimizer of

q^s(ω) = max_{k=0,...,s} q_k(ω). (7.1)

The details of the algorithm are formally presented in Algorithm 1. In the algorithm, the computationally challenging task is the determination of a global minimizer of q^s defined by (7.1). For this purpose, we partition the box B into regions R_0, ..., R_s such that the quadratic function q_k takes the largest value inside the region R_k (see Figure 7.1). Therefore, the minimization of q^s over the region R_k is equivalent to the minimization of q_k over the same region. This problem can be posed as the following quadratic programming problem:

minimize_{ω ∈ R^d} q_k(ω)
subject to q_k(ω) ≥ q_ℓ(ω), ℓ ≠ k,
           ω_j ∈ [ω_j^(l), ω_j^(u)], j = 1, ..., d. (7.2)

Here, we remark that the seemingly quadratic inequalities q_k(ω) ≥ q_ℓ(ω) are in fact linear, since the quadratic terms cancel out.
Algorithm 1 Support-based Eigenvalue Optimization

Require: A box B ⊂ R^d and an eigenvalue function λ: B → R satisfying either the generic simplicity assumption or of the form λ(ω) = ∑_{k=1}^j d_k λ_k(ω) with d_1 ≥ ⋯ ≥ d_j ≥ 0.
1: Pick an arbitrary ω_0 ∈ B.
2: u_0 ← λ(ω_0); ω_best ← ω_0.
3: q_0(ω) := q^0(ω) := λ(ω_0) + ∇λ(ω_0)^T (ω − ω_0) + (γ/2)‖ω − ω_0‖²_2.
4: ω_1 ← argmin_{ω ∈ B} q^0(ω); l_1 ← q^0(ω_1).
5: if λ(ω_1) < u_0 then
6:   u_1 ← λ(ω_1); ω_best ← ω_1.
7: else u_1 ← u_0.
8: end if
9: s ← 1.
10: while u_s − l_s > ε do
11:   q_s(ω) := λ(ω_s) + ∇λ(ω_s)^T (ω − ω_s) + (γ/2)‖ω − ω_s‖²_2.
12:   q^s(ω) := max_{k=0,...,s} q_k(ω).
13:   ω_{s+1} ← argmin_{ω ∈ B} q^s(ω); l_{s+1} ← q^s(ω_{s+1}).
14:   if λ(ω_{s+1}) < u_s then
15:     u_{s+1} ← λ(ω_{s+1}); ω_best ← ω_{s+1}.
16:   else u_{s+1} ← u_s.
17:   end if
18:   s ← s + 1.
19: end while
Output: l_s, u_s, ω_best.

The feasibility of the algorithm largely relies on the efficiency with which we can solve the subproblems (7.2). These subproblems are non-convex whenever γ < 0.
However, since they involve the minimization of a concave function over a polytope, the solution of each subproblem must be attained at one of the vertices of the polytope.

Breiman and Cutler introduced the notion of quadratic support functions of the form (5.1) for global optimization [4]. (We are grateful to an anonymous referee who pointed us to this reference.) They described how subproblems of the form (7.2) can be solved efficiently. At each iteration, when a new quadratic support function is added, a new polytope associated with it is introduced, and some points - call these dead vertices - that used to be vertices of some polytope are no longer vertices (i.e., they now lie strictly inside the new polytope). A vertex v is dead after the introduction of q_{s+1}(ω) if and only if q_{s+1}(v) > q^s(v). Breiman and Cutler observed that these dead vertices form a connected graph; therefore, they can be identified efficiently. Furthermore, a new vertex v (on the boundary of the new polytope) appears between a dead vertex v_d and each vertex v_a - each existing vertex that is still a vertex after the addition of q_{s+1}(ω) and adjacent to v_d - given by the formula

v = [ (q_{s+1}(v_a) − q^s(v_a)) / ((q_{s+1}(v_a) − q^s(v_a)) − (q_{s+1}(v_d) − q^s(v_d))) ] v_d
  + [ 1 − (q_{s+1}(v_a) − q^s(v_a)) / ((q_{s+1}(v_a) − q^s(v_a)) − (q_{s+1}(v_d) − q^s(v_d))) ] v_a.

Fig. 7.1. To minimize the piece-wise quadratic function q^s over the box B, the box is split into regions R_k, k = 0, ..., s, such that q_k is the largest inside the region R_k. A possible partitioning with s = 4 is illustrated in the 2-dimensional case. [figure omitted]

These new vertices, possibly with the addition of some of the vertices where only box constraints are active, form the set of vertices for the new polytope. The edges of the new polytope can be determined from the common active constraints of its vertices; two vertices are adjacent if and only if they share d − 1 common active constraints. These data structures (the vertices of the polytopes, updated after the formation of each q^s, and an adjacency list for each vertex) lead to an efficient algorithm when the dimension d is small. We refer to [4] for further details.

Algorithm 1 could also be based on the more general piece-wise quadratic support functions (4.5), by adjusting lines 3 and 11 accordingly. This would make the algorithm applicable to the optimization of more general eigenvalue functions, i.e., those that do not involve the sum of the largest eigenvalues and that violate generic simplicity everywhere. Subproblems analogous to (7.2) could be devised; however, their solutions appear prohibitively expensive.
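In the one-dimensional case the entire method fits in a few dozen lines, because all support functions share the quadratic coefficient γ/2: pairwise differences are affine, so the upper envelope q^s can be minimized by inspecting the interval endpoints, the pairwise crossing points, and (when γ > 0) the vertices of the individual quadratics. The following sketch (Python; our own simplified implementation, which recomputes all crossings at every iteration instead of maintaining the vertex and adjacency structures of [4]) is meant only to make the mechanics concrete:

import numpy as np

def eigopt_1d(f_and_df, a, b, gamma, tol=1e-8, maxit=500):
    # 1-D sketch of Algorithm 1 with support functions
    # q_k(w) = f(w_k) + f'(w_k)(w - w_k) + (gamma/2)(w - w_k)^2
    pts = []   # (w_k, f_k, g_k) triples

    def add(w):
        f, g = f_and_df(w)
        pts.append((w, f, g))
        return f

    def envelope(w):
        return max(fk + gk * (w - wk) + 0.5 * gamma * (w - wk) ** 2
                   for wk, fk, gk in pts)

    w_best = 0.5 * (a + b)
    u = add(w_best)
    for _ in range(maxit):
        # rewrite each q_k as gamma/2 * w^2 + b_k * w + c_k
        lin = [(gk - gamma * wk, fk - gk * wk + 0.5 * gamma * wk * wk)
               for wk, fk, gk in pts]
        cand = {a, b}
        if gamma > 0:   # vertices of the convex pieces
            cand |= {min(max(-bk / gamma, a), b) for bk, _ in lin}
        for i in range(len(lin)):   # pairwise crossings inside [a, b]
            for j in range(i + 1, len(lin)):
                den = lin[i][0] - lin[j][0]
                if abs(den) > 1e-14:
                    w = (lin[j][1] - lin[i][1]) / den
                    if a <= w <= b:
                        cand.add(w)
        w_next = min(cand, key=envelope)
        l = envelope(w_next)   # lower bound l_{s+1}
        fw = add(w_next)       # new support function about w_next
        if fw < u:
            u, w_best = fw, w_next   # upper bound u_{s+1}
        if u - l <= tol:
            break
    return w_best, u, l

With f_and_df returning, e.g., λ_n(θ) and its derivative (9.1) for the numerical radius, together with γ = −2‖A‖_2, this captures the flavor of the one-dimensional experiments of Section 9.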
8. Convergence Analysis.
In this section, we analyze the convergence of Al-gorithm 1 - in the general setting when the support functions (4.5) are used on lines3 and 11 - for the following optimization problem:(P) λ ∗ := min ω ∈B λ ( ω ) . Recall that the algorithm starts off by picking an arbitrary point ω ∈ B . At iteration s , the algorithm picks ω s +1 to be a global minimizer of q s ( x ) over B , where q s ( ω ) isthe maximum of the functions q k ( ω ) constructed at the points ω k , k = 0 , . . . , s in B .Note that { l s } is a non-decreasing sequence of lower bounds on λ ∗ , while { u s } is anon-increasing sequence of upper bounds on λ ∗ .We require that λ : B → R be a continuous and piece-wise function defined interms of the differentiable functions ˜ λ j : R d → R , j = 1 , . . . , n as described in Section4. The differentiability of each ˜ λ j on R d implies the boundedness of (cid:107)∇ ˜ λ j ( x ) (cid:107) on B . ptimization of Hermitian Eigenvalues µ := max j =1 ,...,n max ω ∈B (cid:107)∇ ˜ λ j ( ω ) (cid:107) . We furthermore require each piece ˜ λ j to be analytic along every line in R d , and exploitthe existence of a scalar γ that satisfies (4.4). Our convergence analysis depends onthe scalars µ and γ . We now establish the convergence of Algorithm 1 to a globalminimizer of (P). Theorem 8.1.
Let {ω_s} be the sequence of iterates generated by Algorithm 1, in the general case when the support functions (4.5) are used on lines 3 and 11. (i) Every limit point of this sequence is a global minimizer of the problem (P).
(ii) Furthermore,

lim_{s→∞} u_s = lim_{s→∞} l_s = λ*.

Proof. Since B is a bounded subset of R^d, it follows that the sequence {ω_s} has at least one limit point ω* ∈ B. By passing to a subsequence if necessary, we may assume that {ω_s} itself is a convergent sequence. Let l* denote the limit of the bounded non-decreasing sequence {l_s}. Since l_s ≤ λ* ≤ λ(ω_s) for each s ≥ 0,
it suffices to show that l* = λ(ω*).

Suppose, for a contradiction, that there exists a real number δ > 0 such that

λ(ω*) ≥ l* + 4δ. (8.1)

By the continuity of λ, there exists s_1 ∈ N such that

λ(ω_s) ≥ l* + 3δ, for all s ≥ s_1. (8.2)

Since ω* is the limit of the sequence {ω_s}, there exists s_2 ∈ N such that

‖ω_{s'} − ω_{s''}‖_2 < min{ √(2δ/|γ|), δ/µ }, for all s', s'' ≥ s_2, (8.3)

where we define 1/µ := +∞ if µ = 0, and 1/|γ| := +∞ if γ = 0. Let s* := max{s_1, s_2}. For each s ≥ s*, it follows from the definition of the functions q^s(ω) that

q^s(ω_{s+1}) ≥ q^{s*}(ω_{s+1}) ≥ q_{s*}(ω_{s+1}) = λ(ω_{s*}) + ∇λ̃_{j*}(ω_{s*})^T (ω_{s+1} − ω_{s*}) + (γ/2)‖ω_{s+1} − ω_{s*}‖²_2,

where j* ∈ {1, ..., n} is the index of the function that determines the value of q_{s*}(ω_{s+1}) (see (4.5)). Now, by applying the Cauchy-Schwarz and triangle inequalities, and then using the inequalities (8.2) and (8.3), we arrive at

q^s(ω_{s+1}) ≥ λ(ω_{s*}) − ‖∇λ̃_{j*}(ω_{s*})‖_2 ‖ω_{s+1} − ω_{s*}‖_2 − (|γ|/2)‖ω_{s+1} − ω_{s*}‖²_2
            ≥ (l* + 3δ) − (µ · δ/µ) − ((|γ|/2) · 2δ/|γ|)
            = l* + δ.

Using the definition l_{s+1} = q^s(ω_{s+1}), it follows that l_{s+1} ≥ l* + δ for all s ≥ s*.
Since δ > 0, this contradicts our assumption that l* is the limit of the non-decreasing sequence {l_s}. Therefore, we have λ(ω*) < l* + 4δ for all δ > 0, or equivalently, λ(ω*) ≤ l*. Since l_s ≤ λ(ω) for all s ∈ N and ω ∈ B, it follows that l* ≤ λ(ω*), which establishes that λ(ω*) = l* ≤ λ(ω) for all ω ∈ B. Therefore, ω* is a global minimizer of (P). Moreover, lim_{s→∞} l_s = λ*. The sequence {u_s} must also converge to λ*, which can be deduced by observing λ* ≤ u_s ≤ λ(ω_s) for each s ∈ N and taking the limit as s → ∞. The proof of assertion (i) is completed by repeating the same argument for any other limit point of the sequence {ω_s}.

Assertion (ii) follows by noting that the monotone, bounded sequences {l_s} and {u_s} must converge, and that they have subsequences converging to λ*.
9. Numerical Experiments.
We compare Algorithm 1 with the following algorithms: (1) a brute force approach; (2) the Piyavskii-Shubert algorithm [40, 45] (only in the one-dimensional case); (3) the DIRECT method [24]; and (4) specialized level-set based algorithms, whenever possible.

The brute force approach (1) splits the box B into sub-boxes of equal side-lengths, and the eigenvalue function is computed at the corners of the sub-boxes. Algorithms (2) and (3) are global optimization techniques based on the Lipschitz continuity of the function. The latter method (3) benefits from several Lipschitz constant estimates simultaneously, while the former one (2) utilizes a global Lipschitz constant. Algorithms that fall into category (4) are level-set based approaches. They typically converge fast, but each iteration is costly. Each of the algorithms in (4) is devised for a particular eigenvalue optimization problem.

Example 1 (Numerical Radius):
This one-dimensional example concerns the calculation of the numerical radius (defined and motivated in Section 2.1) of an n × n matrix A. The matrix-valued function involved is A(θ) = −(Ae^{iθ} + A*e^{−iθ})/2, and λ_n(θ) := λ_n(A(θ)) is sought to be minimized over all θ ∈ [0, 2π]. The numerical radius of A corresponds to the negative of this globally minimal value of λ_n(θ). Here, we assume that generic simplicity holds, that is, the eigenvalue λ_n(θ) is simple for all θ. The derivative

dλ_n(θ)/dθ = ℑ( v_n*(θ) A e^{iθ} v_n(θ) ) (9.1)

can be deduced from (3.3). We numerically observe that typically λ''_n(θ) ≥ −2‖A‖_2 holds for all θ, and set γ = −2‖A‖_2. We specifically focus on matrices A_n = P_n − c_n · iR_n of various sizes, where the scaling factor c_n is proportional to n, P_n is an n × n matrix obtained from a finite difference discretization of the Poisson operator, and R_n is a random n × n matrix with entries selected from a normal distribution with zero mean and unit variance. This is a carefully chosen challenging example, as λ_n(θ) has many local minima (see Figure 9.1).

In Table 9.1, the numbers of function evaluations of algorithms (1)-(3) are given along with those of Algorithm 1 for the matrix A_n with n = 400, with respect to the absolute accuracy. Algorithm 1 - called eigopt in the table - converges linearly; this is evident from the roughly fixed number of additional function evaluations required for every two decimal digits of accuracy. All other algorithms, including the DIRECT method, require considerably more function evaluations at high accuracy. We plot λ_n(θ) with respect to θ ∈ [0, 2π] in Figure 9.1 on the left. In the same figure on the right, the first five quadratic support functions formed are shown for the same example. The level-set based algorithm (4) for the numerical radius [32] requires the solutions of eigenvalue problems twice the size of A(θ). It is not included in Table 9.1, because these larger eigenvalue problems, rather than the calculation of λ_n(θ), dominate the computation time. Instead, we compare the CPU times (in seconds) of Algorithm 1 and this specialized algorithm on Poisson matrices A_n of various sizes n in Table 9.2. Algorithm 1 is run to retrieve the results with at least 10-decimal-digit accuracy. The table displays the superiority of the running times of Algorithm 1 as compared to those of the level-set approach.

ε                   10^{-2}   10^{-4}   10^{-6}   10^{-8}   10^{-10}   10^{-12}
eigopt                   46        59        69        79         89         98
brute force             881      8812     88125    881249    8815191   86070462
Piyavskii-Shubert      1907     18817         –         –          –          –
DIRECT                   25        51        61       105        245        597

Table 9.1
Number of function evaluations by various algorithms to compute the numerical radius of the 400 × 400 Poisson example (Example 1) with respect to the absolute accuracy ε. Note that eigopt makes use of the derivatives in addition to the eigenvalues evaluated; however, the derivatives come essentially at no cost once the eigenvalues are evaluated.

Fig. 9.1. On the left, the plot of λ_n(A(θ)) for the 400 × 400 Poisson example (Example 1) with respect to θ. On the right, the plot of λ_n(A(θ)) together with the first five quadratic support functions q_j, j = 0, ..., 4, formed. In both figures, the asterisk represents the global minimizer; the squares on the right mark the iterates of the algorithm around which the support functions are constructed. [figure omitted]
n            400    900   1600   2500   3600
eigopt        14    103    328   1079   2788
level-set     17    181   1477      –      –
Table 9.2
CPU times (in seconds) required by Algorithm 1 (eigopt) and the specialized level-set approach in [32] to compute the numerical radius of Poisson matrices of varying size n.
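A minimal sketch of the objective and derivative used in Example 1 (Python with NumPy; the small test matrix is our own, and a coarse grid scan stands in for Algorithm 1):

import numpy as np

def nr_objective(A, theta):
    # lambda_n(A(theta)) with A(theta) = -(A e^{i theta} + A^* e^{-i theta})/2,
    # and its derivative (9.1); the numerical radius is -min over theta
    Ath = -(A * np.exp(1j * theta) + A.conj().T * np.exp(-1j * theta)) / 2
    evals, evecs = np.linalg.eigh(Ath)
    v = evecs[:, 0]   # unit eigenvector of the smallest eigenvalue
    return evals[0], float(np.imag(v.conj() @ (A * np.exp(1j * theta)) @ v))

A = np.array([[0, 1, 0], [0, 0, 2], [0.5, 0, 0]], dtype=complex)
gamma = -2 * np.linalg.norm(A, 2)   # the heuristic lower bound used above
grid_min = min(nr_objective(A, t)[0] for t in np.linspace(0, 2 * np.pi, 2000))
print("numerical radius (grid estimate):", -grid_min)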
Example 2 (Distance to Uncontrollability): Next, we consider the distance to uncontrollability defined and motivated in Section 2.1. Here, for a given linear system x'(t) = Ax(t) + Bu(t), where A ∈ C^{n×n} and B ∈ C^{n×m} are such that n ≥ m, the smallest singular value of A(z) = [A − zI  B] is to be minimized over z ∈ C. We again assume generic simplicity, that is, the multiplicity of σ_n(z) := σ_n(A(z)) is one for all z; this property holds on a dense subset of all pairs (A, B). In this case, expressions for the gradient are given by (3.8). In particular, denoting a consistent pair of unit left and right singular vectors associated with σ_n(z) by u_n(z) ∈ C^n and

v_n(z) = [ ṽ_n(z) ; v̂_n(z) ] ∈ C^{n+m}, where ṽ_n(z) ∈ C^n, v̂_n(z) ∈ C^m,

we have

∇σ_n(z) = ( ∂σ_n(z)/∂ℜz, ∂σ_n(z)/∂ℑz ) = ( −ℜ(u_n*(z) ṽ_n(z)), ℑ(u_n*(z) ṽ_n(z)) ).

Furthermore, we obtain a lower bound γ on λ_min( ∇²σ_n(z) ) numerically. We perform tests on linear systems (A, B) arising from a discretization of the heat equation, taken from the SLICOT library; see [25, Example 3.2]. The matrix A is real, symmetric, tridiagonal, and n × n, whereas B is real and n × 1.
In Table 9.3, the numbers of function evaluations required by the global optimization algorithms for Lipschitz continuous functions and by Algorithm 1 are presented when the order of the system satisfies n = 30. The Piyavskii-Shubert algorithm is omitted, because it would be based on locating global minimizers of piecewise cones, and it is not immediate how one would locate these minimizers. Once again, the number of function evaluations for Algorithm 1 seems to suggest linear convergence.

The progress of the algorithm can be traced from the graph associated with it. Recall that the box is split into subregions (indeed polytopes). Inside each subregion, one of the quadratic support functions dominates the others. For the heat equation example of order n = 30, these graphs are provided after 60, 360, and 580 iterations in Figure 9.2 in the top three plots. The bottom plot in Figure 9.2 is an illustration of the level sets of σ_n(z) in the complex plane, along with the computed global minimizer (accurate up to 13 decimal digits after 589 function evaluations) marked with an asterisk. The box is split into subregions more or less uniformly initially, e.g., after 60 iterations. However, later iterations form finer subregions around the minimizers of σ_n(z).

We also compare Algorithm 1 with the level-set approach in [20] on the heat equation examples of varying order. The level-set approaches become prohibitively expensive as the number of optimization parameters increases. Here, with the minimization over two parameters, the approach requires the solutions of eigenvalue problems of size 2n² for a system of order n. For small systems, the level-set approach works very well; however, even for medium-scale systems it becomes computationally infeasible. This is illustrated in Table 9.4. On the other hand, Algorithm 1 is capable of solving even a problem of order 1000 in a reasonable amount of time.

    ε              10^-2      10^-4      10^-6      10^-8      10^-10     10^-12
    eigopt           520        531        543        556        572        585
    brute force   4.·×10^·    ·×10^·     ·×10^·     ·×10^·     ·×10^·     ·×10^·
    DIRECT         37781      37867      37867      38233      38441      38805
Table 9.3
Number of function evaluations by various algorithms to compute the distance to uncontrollability for the linear system (A, B) arising from the heat equation [25, Example 3.2], where A ∈ R^{30×30} and B ∈ R^{30×1}, with respect to absolute accuracy ε.
Fig. 9.2. The top three plots display the graph associated with Algorithm 1 after 60, 360, and 580 iterations to compute the distance to uncontrollability for the heat equation example [25, Example 3.2] of order n = 30 (see Example 2). The red squares correspond to dead vertices. The bottom plot displays the level sets of σ_n(z) for the same heat equation example, together with the computed global minimizer marked with an asterisk.
    n              30     60    100    200    400   1000
    eigopt         45     46     56     85    119    384
    level-set      20    393      –      –      –      –
Table 9.4
CPU times (in seconds) required by Algorithm 1 (eigopt) and the level-set approach in [20] to compute the distance to uncontrollability for the heat equation examples (Example 2) with respect to the order of the system n.

    ε           10^-2      10^-4      10^-6      10^-8       10^-10      10^-12
    eigopt     17 (34)   34 (203)   50 (501)   67 (1021)   87 (2040)   108 (3725)
    DIRECT       459       45937          –           –           –            –

Table 9.5
Number of function evaluations by Algorithm 1 (eigopt) and the DIRECT method for minimizing the largest eigenvalue of the affine matrix function (Example 3) with respect to absolute accuracy ε. The CPU times for eigopt (in seconds) are also provided in parentheses.
Example 3 (Minimizing the Largest Eigenvalue, Convex):
This example is taken from [13], and concerns the minimization of the largest eigenvalue of an affine matrix function of the form

\[
A(\omega) = A_0 + \sum_{j=1}^{5} \omega_j A_j,
\]

depending on five parameters, where A_j ∈ R^{5×5}, j = 0, ..., 5, are real, symmetric, and as in [13], where the globally minimal value of λ(ω) := λ₁(A(ω)) is also cited. Since A(ω) is affine, the largest eigenvalue λ₁(ω) is convex by Theorem 6.1 (see also [14, Theorem A.1]), so we set γ = 0. Thus, this eigenvalue optimization problem can be posed as a semi-definite program (SDP), and solved by means of interior-point methods [34]. The purpose here is to compare the performances of Algorithm 1 and the DIRECT method on a multi-dimensional example for which the solution is known (even though our algorithm is really devised for non-convex problems, and in all likelihood an interior-point method would outperform it in the convex case).

This comparison is provided in Table 9.5, where we again observe linear convergence of Algorithm 1. The DIRECT method is not suitable even for a few decimal digits of precision. Indeed, even after 500,000 function evaluations and 3732 seconds of CPU time, it cannot achieve six-decimal-digit accuracy. In contrast to the one-dimensional and two-dimensional examples, maintaining the graph structure, in particular forming the adjacencies between the new vertices, takes almost all of the computational time, rather than the function evaluations. Even if the matrices A_j were much bigger than 5 × 5, the computation times would not be affected significantly. Therefore, in the table, the CPU times (in seconds) are also provided in parentheses.

The later iterations are more expensive, since the number of new vertices created increases at the later iterations. Up to four dimensions, the increase in the number of vertices at every iteration seems more or less fixed. In contrast, this does not hold when the dimension is five or more. (These observations are solely based on numerical experiments; we do not have a clear understanding of this phenomenon at the moment.) This is illustrated for the affine example with five parameters in Figure 9.3. In the figure on the left, the solid and dashed lines represent the number of dead vertices and the number of newly added vertices, respectively, with respect to the iteration number.

Fig. 9.3. On the left, the number of dead vertices (solid line) and the number of newly added vertices (dashed line) with respect to the iteration number are displayed, whereas on the right, the total number of vertices with respect to the iteration number is displayed for Example 3.
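Before turning to the non-convex variant, we note that an objective routine for the affine family of Example 3 reduces to a few lines; the sketch below is illustrative, using the standard first-order formula ∂λ₁/∂ω_j = v₁* A_j v₁ for a simple eigenvalue with unit eigenvector v₁:

```matlab
% Illustrative sketch: largest eigenvalue of the affine family and its
% gradient; Acell{j} holds A_j for j = 1, ..., 5, and A0 is the constant term.
function [f, g] = affine_maxeig(omega, A0, Acell)
    M = A0;
    for j = 1:numel(Acell)
        M = M + omega(j) * Acell{j};
    end
    [V, D] = eig((M + M')/2);           % symmetrized for safety
    [f, idx] = max(diag(D));            % largest eigenvalue lambda_1(omega)
    v = V(:, idx);
    g = zeros(numel(Acell), 1);
    for j = 1:numel(Acell)
        g(j) = real(v' * Acell{j} * v); % gradient of a simple eigenvalue
    end
end
```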
Example 4 (Minimizing the Largest Eigenvalue, Non-convex): This is a non-convex example depending on four parameters, involving the minimization of the largest eigenvalue of the matrix function

\[
A(\omega) = A_0 + \sum_{j=1}^{4} \omega_j A_j + \frac{1}{2} \sum_{j=1}^{4} \sum_{k=1}^{4} \omega_j \omega_k A_{jk}, \tag{9.2}
\]

where A₀, A₁, ..., A₄ are as in the previous affine example taken from [13], while the A_jk are randomly chosen 5 × 5 symmetric matrices with A_jk = A_kj. By Corollary 6.2, a lower bound for λ_min(∇²λ(ω)) is given by γ = λ_min(∇²A), where ∇²A ∈ R^{20×20} and A_jk constitutes the 5 × 5 block of ∇²A at rows 5(j−1)+1 : 5j and columns 5(k−1)+1 : 5k. Table 9.6 displays a comparison of the number of function evaluations for Algorithm 1 and the DIRECT method with respect to accuracy. As usual, Algorithm 1 seems to exhibit linear convergence.

    ε           10^-2    10^-4    10^-6    10^-8    10^-10    10^-12
    eigopt         52      141      225      309       391       472
    DIRECT         45      161      369      939      1367      2049

Table 9.6
Number of function evaluations by Algorithm 1 (eigopt) and the DIRECT method for minimizing the largest eigenvalue of the matrix function (9.2) with respect to absolute accuracy ε.
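The lower bound γ of Corollary 6.2 used in this example is cheap to compute once the blocks A_jk are available; a minimal sketch (the cell array Ablocks is our own illustrative container) is:

```matlab
% Illustrative sketch: assemble the constant 20 x 20 Hessian of (9.2) from
% the 5 x 5 blocks A_jk and take gamma as its smallest eigenvalue.
H = zeros(20, 20);
for j = 1:4
    for k = 1:4
        H(5*(j-1)+1 : 5*j, 5*(k-1)+1 : 5*k) = Ablocks{j, k};  % block (j,k) is A_jk
    end
end
gamma = min(eig((H + H')/2));   % lower bound on the eigenvalues of the Hessian
```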
10. Software.
A MATLAB implementation of Algorithm 1 is available on the web at http://home.ku.edu.tr/~emengi/software/eigopt.html. The implementation makes use of heaps, adjacency lists, and stacks, and updates the underlying graph structure efficiently. The user is expected to write a MATLAB routine that calculates the eigenvalue function and its gradient at a given point. The name of this routine and γ, a global lower bound on the minimum eigenvalues of the Hessians of the eigenvalue function, must be supplied by the user. We refer to the web page associated with this implementation, where a user guide is also provided.
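Schematically, the inputs described above could be assembled as in the following fragment; the actual calling convention of eigopt differs in detail and is documented in the user guide, so this is only indicative:

```matlab
% Indicative fragment only; see the eigopt user guide for the actual interface.
funname = 'numrad_eig';   % name of the user routine returning [f, g] at a point
bounds  = [0, pi];        % the box B (here for the one-dimensional Example 1)
gamma   = -norm(A);       % global lower bound on the second derivatives
% result = eigopt(funname, bounds, gamma, ...);   % schematic call, see the guide
```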
11. Conclusion.
The analytical properties of eigenvalues of matrix-valued functions facilitate the use of so-called quadratic support functions that globally underestimate the eigenvalue curves for their optimization. This observation motivates the idea of adapting support-function-based global optimization approaches, especially the approach due to Breiman and Cutler [4], for non-convex eigenvalue optimization. In this paper, we illustrated how such global optimization approaches based on derivative information can be realized in the context of non-convex eigenvalue optimization. We derived the necessary quadratic support functions, elaborated on deducing analytical global lower bounds γ for the second derivatives of the extreme eigenvalue functions, which are essential for the algorithm, and provided a global convergence proof. The algorithm is especially applicable for the optimization of extreme eigenvalues, for instance, for the minimization of the largest eigenvalue, and for those eigenvalue functions that exhibit generic simplicity.

Acknowledgements
We thank two anonymous referees for their invaluable comments, which improved this paper considerably. We are also grateful to Michael Karow and Melina Freitag for helpful discussions and feedback.
REFERENCES

[1] O. Axelsson, H. Lu, and B. Polman. On the numerical radius of matrices and its application to iterative solution methods. Linear and Multilinear Algebra, 37:225–238, 1994.
[2] D. Barnes. The shape of the strongest column is arbitrarily close to the shape of the weakest columns. Quart. Appl. Math., XLIV:605–609, 1986.
[3] S. Boyd and V. Balakrishnan. A regularity result for the singular values of a transfer matrix and a quadratically convergent algorithm for computing its L∞-norm. Systems Control Lett., 15(1):1–7, 1990.
[4] L. Breiman and A. Cutler. A deterministic algorithm for global optimization. Math. Program., 58(2):179–199, 1993.
[5] N.A. Bruinsma and M. Steinbuch. A fast algorithm to compute the H∞-norm of a transfer function matrix. Systems Control Lett., 14:287–293, 1990.
[6] R. Byers. A bisection method for measuring the distance of a stable matrix to the unstable matrices. SIAM J. Sci. Stat. Comp., 9:875–881, 1988.
[7] R. Byers. The descriptor controllability radius. In Numerical Methods: Proceedings of the International Symposium MTNS-93, volume II, pages 85–88. Uwe Helmke, Reinhard Mennicken, and Josef Saurer, eds., Akademie Verlag, Berlin, 1993.
[8] S.J. Cox and M.L. Overton. On the optimal design of columns against buckling. SIAM J. Math. Anal., 23:287–325, 1992.
[9] J. Cullum, W.E. Donath, and P. Wolfe. The minimization of certain nondifferentiable sums of eigenvalues of symmetric matrices. Math. Program. Study, 3:35–55, 1975.
[10] W.E. Donath and A.J. Hoffman. Lower bounds for the partitioning of graphs. IBM J. Res. Dev., 17(5):420–425, 1973.
[11] M. Eiermann. Field of values and iterative methods. Linear Algebra Appl., 180:167–197, 1993.
[12] R. Eising. Between controllable and uncontrollable. Systems Control Lett., 4(5):263–264, 1984.
[13] M.K.H. Fan and B. Nekooie. On minimizing the largest eigenvalue of a symmetric matrix. Linear Algebra Appl., 214:225–246, 1995.
[14] R. Fletcher. Semidefinite matrix constraints in optimization. SIAM J. Control Optim., 23:493–513, 1985.
[15] M.A. Freitag and A. Spence. A Newton-based method for the calculation of the distance to instability. Linear Algebra Appl., 435(12):3189–3205, 2011.
[16] S. Friedland, J. Nocedal, and M.L. Overton. The formulation and analysis of numerical methods for inverse eigenvalue problems. SIAM J. Numer. Anal., 24(3):634–667, 1987.
[17] M. Gao and M. Neumann. A global minimum search algorithm for estimating the distance to uncontrollability. Linear Algebra Appl., 188–189:305–350, 1993.
[18] V.P. Gergel. A global optimization algorithm for multivariate functions with Lipschitzian first derivatives. J. Global Optim., 10(3):257–281, 1997.
[19] M. Gu. New methods for estimating the distance to uncontrollability. SIAM J. Matrix Anal. Appl., 21(3):989–1003, 2000.
[20] M. Gu, E. Mengi, M.L. Overton, J. Xia, and J. Zhu. Fast methods for estimating the distance to uncontrollability. SIAM J. Matrix Anal. Appl., 28(2):477–502, 2006.
[21] C. He and G.A. Watson. An algorithm for computing the distance to instability. SIAM J. Matrix Anal. Appl., 20:101–116, 1999.
[22] D. Hinrichsen and M. Motscha. Optimization problems in the robustness analysis of linear state space systems. In Proceedings of the International Seminar on Approximation and Optimization, pages 54–78, New York, NY, USA, 1988. Springer-Verlag.
[23] R.A. Horn and C.R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.
[24] D.R. Jones, C.D. Perttunen, and B.E. Stuckman. Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl., 79(1):157–181, 1993.
[25] D. Kressner, V. Mehrmann, and T. Penzl. CTDSX - a collection of benchmarks for state-space realizations of continuous-time dynamical systems. SLICOT Working Note, 1998.
[26] D.E. Kvasov and Ya.D. Sergeyev. A univariate global search working with a set of Lipschitz constants for the first derivative. Optimization Lett., 3(2):303–318, 2009.
[27] D.E. Kvasov and Ya.D. Sergeyev. Lipschitz gradients for global optimization in a one-point-based partitioning scheme. J. Comput. Appl. Math., 236(16):4042–4054, 2012.
[28] P. Lancaster. On eigenvalues of matrices dependent on a parameter. Numer. Math., 6:377–387, 1964.
[29] D. Lera and Ya.D. Sergeyev. Acceleration of univariate global optimization algorithms working with Lipschitz functions and Lipschitz first derivatives. SIAM J. Optimization, 23(1):508–529, 2013.
[30] A.S. Lewis and M.L. Overton. Eigenvalue optimization. Acta Numer., 5:149–190, 1996.
[31] M. Mäkelä. Survey of bundle methods for nonsmooth optimization. Optim. Method. Softw., 17(1):1–29, 2002.
[32] E. Mengi and M.L. Overton. Algorithms for the computation of the pseudospectral radius and the numerical radius of a matrix. IMA J. Numer. Anal., 25:648–669, 2005.
[33] M. Myers and W. Spillers. A note on the strongest fixed-fixed column. Quart. Appl. Math., XLIV:583–588, 1986.
[34] Y. Nesterov and A. Nemirovski. Interior Point Polynomial Methods in Convex Programming. SIAM, Philadelphia, 1994.
[35] N. Olhoff and S. Rasmussen. On single and bimodal optimum buckling loads of clamped columns. Int. J. Solids Struct., 13(7):605–614, 1977.
[36] M.L. Overton. On minimizing the maximum eigenvalue of a symmetric matrix. SIAM J. Matrix Anal. Appl., 9(2):256–268, 1988.
[37] M.L. Overton. Large-scale optimization of eigenvalues. SIAM J. Optimization, 2:88–120, 1991.
[38] M.L. Overton and R.S. Womersley. Second derivatives for optimizing eigenvalues of symmetric matrices. SIAM J. Matrix Anal. Appl., 16(3):697–718, 1995.
[39] C.C. Paige. Properties of numerical algorithms relating to computing controllability. IEEE Trans. Automat. Control, 26:130–138, 1981.
[40] S.A. Piyavskii. An algorithm for finding the absolute extremum of a function. USSR Comput. Math. and Math. Phys., 12:57–67, 1972.
[41] F. Rellich. Perturbation Theory of Eigenvalue Problems. Gordon and Breach, 1969.
[42] Ya.D. Sergeyev. Multidimensional global optimization using the first derivatives. Comp. Math. Math. Phys., 39(5):743–752, 1999.
[43] Ya.D. Sergeyev and D.E. Kvasov. Global search based on efficient diagonal partitions and a set of Lipschitz constants. SIAM J. Optimization, 16(3):910–937, 2006.
[44] A. Shapiro and M.K.H. Fan. On eigenvalue optimization. SIAM J. Optimization, 5:552–569, 1995.
[45] B. Shubert. A sequential method seeking the global maximum of a function. SIAM J. Numer. Anal., 9:379–388, 1972.
[46] I. Tadjbakhsh and J.B. Keller. Strongest columns and isoperimetric inequalities for eigenvalues. J. Appl. Mech., 29(1):159–164, 1962.
[47] C.F. Van Loan. How near is a matrix to an unstable matrix? Lin. Alg. and its Role in Systems Theory, 47:465–479, 1984.
[48] C.T.C. Wall.