Variational analysis of spectral functions simplified
D. Drusvyatskiy*   C. Kempton†

Abstract.
Spectral functions of symmetric matrices – those depending on matrices only through their eigenvalues – appear often in optimization. A cornerstone variational analytic tool for studying such functions is a formula relating their subdifferentials to the subdifferentials of their diagonal restrictions. This paper presents a new, short, and revealing derivation of this result. We then round off the paper with an illuminating derivation of the second derivative of C^2-smooth spectral functions, highlighting the underlying geometry. All of our arguments have direct analogues for spectral functions of Hermitian matrices, and for singular value functions of rectangular matrices.

Key words.
Eigenvalues, singular values, nonsmooth analysis, proximal mapping, subdifferential, Hessian, group actions.
AMS Subject Classification.
Primary 49J52, 15A18; Secondary 49J53, 49R05, 58D19.

*University of Washington, Department of Mathematics, Seattle, WA 98195. Research of Drusvyatskiy and Kempton was partially supported by the AFOSR YIP award FA9550-15-1-0237.

1 Introduction

This work revolves around spectral functions. These are functions on the space of n × n symmetric matrices S^n that depend on matrices only through their eigenvalues, that is, functions that are invariant under the action of the orthogonal group by conjugation. Spectral functions can always be written in a composite form f ∘ λ, where f is a permutation-invariant function on R^n and λ is a mapping assigning to each matrix X the vector of eigenvalues (λ_1(X), …, λ_n(X)) in nonincreasing order. A pervasive theme in the study of such functions is that various variational properties of the permutation-invariant function f are inherited by the induced spectral function f ∘ λ; see e.g. [1–5, 16–18]. Take convexity for example. Supposing f is closed and convex, the main result of [7] shows that the Fenchel conjugate of f ∘ λ admits the elegant representation

(f ∘ λ)^⋆ = f^⋆ ∘ λ.   (1.1)

An immediate conclusion is that f ∘ λ agrees with its double conjugate and is therefore convex; that is, convexity of f is inherited by the spectral function f ∘ λ. An elegant characterization of the subdifferential ∂(f ∘ λ)(X) in terms of ∂f(λ(X)) then readily follows [7, Theorem 3.1] — an important result for optimization specialists. In a follow-up paper [8], Lewis showed that even for nonconvex functions f, the following exact relationship holds:

∂(f ∘ λ)(X) = { U (Diag v) U^T : v ∈ ∂f(λ(X)), U ∈ O^n_X },   (1.2)

where O^n_X := { U ∈ O^n : X = U (Diag λ(X)) U^T }. Here, the symbol O^n denotes the group of orthogonal matrices, and the symbols ∂(f ∘ λ) and ∂f may refer to the Fréchet, limiting, or Clarke subdifferentials; see e.g. [14] for the relevant definitions.
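To make formula (1.2) concrete, consider f(x) = max_i x_i, so that f ∘ λ = λ_1 is the maximum eigenvalue. At a matrix X whose top eigenvalue is simple, ∂f(λ(X)) = {e_1}, and (1.2) then predicts the gradient ∇λ_1(X) = uu^T for a unit top eigenvector u. The following sketch (our own numerical illustration, not part of the paper) checks this against central finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
X = (M + M.T) / 2                            # random symmetric matrix

w, V = np.linalg.eigh(X)                     # eigenvalues in increasing order
u = V[:, -1]                                 # unit eigenvector of the top eigenvalue
G = np.outer(u, u)                           # gradient predicted by formula (1.2)

lam1 = lambda Z: np.linalg.eigvalsh(Z)[-1]   # the maximum eigenvalue function

t, err = 1e-6, 0.0
for _ in range(5):
    M = rng.standard_normal((n, n)); B = (M + M.T) / 2
    fd = (lam1(X + t * B) - lam1(X - t * B)) / (2 * t)   # directional derivative
    err = max(err, abs(fd - np.sum(G * B)))  # compare with <uu^T, B>
```

When the top eigenvalue is repeated, ∂f(λ(X)) is no longer a singleton and λ_1 is genuinely nonsmooth; formula (1.2) then produces the whole subdifferential rather than a single gradient.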
Thus calculating the subdifferential of the spectral function f ∘ λ on S^n reduces to computing the subdifferential of the usually much simpler function f on R^n. For instance, subdifferential computation of the k'th largest eigenvalue function X ↦ λ_k(X) amounts to analyzing a piecewise polyhedral function, the k'th order statistic on R^n [8, Section 9]. Moreover, the subdifferential formula allows one to gauge the underlying geometry of spectral functions, through their "active manifolds" [1], for example.

In striking contrast to the convex case [7], the proof of the general subdifferential formula (1.2) requires much finer tools, and is less immediate to internalize. This paper presents a short, elementary, and revealing derivation of equation (1.2) that is no more involved than its convex counterpart. Here's the basic idea. Consider the Moreau envelope

f^α(x) := inf_y { f(y) + (1/(2α))|x − y|^2 }.

Similar notation will be used for the envelope of f ∘ λ. In direct analogy to equation (1.1), we will observe that the Moreau envelope satisfies the equation

(f ∘ λ)^α = f^α ∘ λ,

and derive a convenient formula for the corresponding proximal mapping. The case when f is an indicator function was treated in [2], and the argument presented here is a straightforward adaptation, depending solely on the Theobald–von Neumann inequality [19, 20]. The key observation now is independent of the eigenvalue setting: membership of a vector v in the proximal or in the Fréchet subdifferential of any function g at a point x is completely determined by the local behavior of the univariate function α ↦ g^α(x + αv) near the origin. The proof of the subdifferential formula (1.2) quickly flows from there. It is interesting to note that the argument uses very little information about the properties of the eigenvalue map, with the exception of the Theobald–von Neumann inequality.
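This univariate criterion (made precise in Lemma 2.1 below) is easy to observe numerically. In the sketch that follows, the test function g(y) = y^4 and the brute-force grid minimization are our own choices, not from the paper: for v = ∇g(x), the quotient (g^α(x + αv) − g(x))/α should approach |v|^2/2, and x itself should be the proximal point of x + αv.

```python
import numpy as np

g = lambda y: y**4
x = 1.0
v = 4.0 * x**3                 # v = g'(x), a (proximal) subgradient of g at x

ys = np.linspace(-1.0, 3.0, 400001)   # grid for the infimum defining the envelope

alpha = 0.01
point = x + alpha * v
vals = g(ys) + (ys - point)**2 / (2 * alpha)
i = np.argmin(vals)
phi_alpha, prox_pt = vals[i], ys[i]   # g^alpha(x + alpha v) and P_alpha g(x + alpha v)

quotient = (phi_alpha - g(x)) / alpha  # should be close to |v|^2 / 2 = 8
```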
Consequently, it applies equally well in a more general algebraic setting of certain isometric group actions, encompassing also an analogous subdifferential formula for functions of singular values derived in [11, 12, 15]; a discussion can be found in the appendix. A different Lie theoretic approach in the convex case appears in [9].

We complete the paper by reconsidering the second-order theory of spectral functions. In [10, 16, 17], the authors derived a formula for the second derivative of a C^2-smooth spectral function. In its simplest form it reads

∇^2 F(Diag a)[B] = Diag(∇^2 f(a) diag(B)) + A ∘ B,

where A ∘ B is the Hadamard product and

A_ij = (∇f(a)_i − ∇f(a)_j)/(a_i − a_j)   if a_i ≠ a_j,
A_ij = ∇^2 f(a)_ii − ∇^2 f(a)_ij          if a_i = a_j.

This identity is quite mysterious, and its derivation is quite opaque geometrically. In the current work, we provide a transparent derivation, making clear the role of the invariance properties of the gradient graph. To this end, we borrow some ideas from [17], while giving them a geometric interpretation.

The outline of the manuscript is as follows. Section 2 records some basic notation and an important preliminary result about the Moreau envelope (Lemma 2.1). Section 3 contains background material on orthogonally invariant functions. Section 4 describes the derivation of the subdifferential formula and Section 5 focuses on the second-order theory – the main results of the paper.
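As a sanity check of the displayed Hessian formula, the sketch below (our own; the symmetric test function f(y) = log Σ_i exp(y_i) is not from the paper) assembles the right-hand side entrywise and compares it with a central-difference approximation of the derivative of ∇F, once with distinct diagonal entries and once with a repeated entry, so that both branches of the formula are exercised:

```python
import numpy as np

def grad_f(y):                       # gradient of f(y) = log(sum exp(y)): softmax
    e = np.exp(y - y.max())
    return e / e.sum()

def hess_f(y):                       # Hessian of log-sum-exp
    p = grad_f(y)
    return np.diag(p) - np.outer(p, p)

def grad_F(X):                       # gradient of F = f∘λ: V Diag(grad f(λ)) V^T
    w, V = np.linalg.eigh(X)
    return V @ np.diag(grad_f(w)) @ V.T

def hessian_formula(a, B):           # right-hand side of the displayed formula
    n = len(a)
    g, H = grad_f(a), hess_f(a)
    Z = np.diag(H @ np.diag(B))      # Diag(Hess f(a) diag(B))
    for i in range(n):
        for j in range(n):
            if i != j:
                if np.isclose(a[i], a[j]):
                    Z[i, j] = B[i, j] * (H[i, i] - H[i, j])
                else:
                    Z[i, j] = B[i, j] * (g[i] - g[j]) / (a[i] - a[j])
    return Z

rng = np.random.default_rng(0)
t, err = 1e-5, 0.0
for a in (np.array([1.7, 0.9, 0.2, -0.5]),   # distinct entries
          np.array([1.0, 1.0, -0.5])):       # repeated entry: a_1 = a_2
    n = len(a)
    M = rng.standard_normal((n, n)); B = (M + M.T) / 2
    Z_fd = (grad_F(np.diag(a) + t * B) - grad_F(np.diag(a) - t * B)) / (2 * t)
    err = max(err, np.abs(hessian_formula(a, B) - Z_fd).max())
```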
2 Notation

This section briefly records some basic notation, following closely the monograph [14]. The symbol E will always denote a Euclidean space (finite-dimensional real inner product space) with inner product ⟨·,·⟩ and induced norm |·|. The closed ball of radius ε > 0 around a point x will be denoted by B_ε(x). The closure and the convex hull of a set Q in E will be denoted by cl Q and conv Q, respectively.

Throughout, we will consider functions f on E taking values in the extended real line R̄ := R ∪ {±∞}. For such a function f and a point x̄ with f(x̄) finite, the proximal subdifferential ∂_p f(x̄) consists of all vectors v ∈ E such that there exist constants r > 0 and ε > 0 satisfying

f(x) ≥ f(x̄) + ⟨v, x − x̄⟩ − r|x − x̄|^2   for all x ∈ B_ε(x̄).

Whenever f is C^1-smooth near x̄, the proximal subdifferential ∂_p f(x̄) consists only of the gradient ∇f(x̄). A function f is said to be prox-bounded if it majorizes some quadratic function. In particular, all lower-bounded functions are prox-bounded. For prox-bounded functions, the inequality in the definition of the proximal subdifferential can be taken to hold globally at the cost of increasing r [14, Proposition 8.46]. The Fréchet subdifferential of f at x̄, denoted ∂̂f(x̄), consists of all vectors v ∈ E satisfying

f(x) ≥ f(x̄) + ⟨v, x − x̄⟩ + o(|x − x̄|).

Here, as usual, o(|x − x̄|) denotes any term satisfying o(|x − x̄|)/|x − x̄| → 0. Whenever f is C^1-smooth near x̄, the set ∂̂f(x̄) consists only of the gradient ∇f(x̄). The subdifferentials ∂_p f(x̄) and ∂̂f(x̄) are always convex, while ∂̂f(x̄) is also closed. The limiting subdifferential of f at x̄, denoted ∂f(x̄), consists of all vectors v ∈ E so that there exist sequences x_i and v_i ∈ ∂̂f(x_i) with (x_i, f(x_i), v_i) → (x̄, f(x̄), v). The same object arises if the vectors v_i are restricted instead to lie in ∂_p f(x_i) for each index i; see for example [14, Corollary 8.47]. The horizon subdifferential, denoted ∂^∞ f(x̄), consists of all limits of λ_i v_i for some sequences v_i ∈ ∂f(x_i) and λ_i ≥ 0 with x_i → x̄ and λ_i ↘ 0. This object records horizontal "normals" to the epigraph of the function. For example, f is locally Lipschitz continuous around x̄ if and only if the set ∂^∞ f(x̄) contains only the zero vector.

The two key constructions at the heart of the paper are defined as follows. Given a function f : E → R̄ and a parameter α > 0, the Moreau envelope f^α and the proximal mapping P_α f are defined by

f^α(x) := inf_{y ∈ E} { f(y) + (1/(2α))|y − x|^2 },
P_α f(x) := argmin_{y ∈ E} { f(y) + (1/(2α))|y − x|^2 }.

Extending the definition slightly, we will set f^0(x) := f(x). It is easy to see that f is prox-bounded if and only if there exist some point x ∈ E and a real α > 0 satisfying f^α(x) > −∞.

The proximal and Fréchet subdifferentials are conveniently characterized by a differential property of the function α ↦ f^α(x + αv). This observation is recorded below. To this end, for any function ϕ : [0, ∞) → R, the one-sided derivative will be denoted by

ϕ'_+(0) := lim_{α↘0} (ϕ(α) − ϕ(0))/α.

Lemma 2.1 (Subdifferential and the Moreau envelope). Consider an lsc, prox-bounded function f : E → R̄, and a point x with f(x) finite. Fix a vector v ∈ E and define the function ϕ : [0, ∞) → R by setting ϕ(α) := f^α(x + αv). Then the following are true.

(i) The vector v lies in ∂̂f(x) if and only if

ϕ'_+(0) = (1/2)|v|^2.   (2.1)

(ii) The vector v lies in ∂_p f(x) if and only if there exists α > 0 satisfying x ∈ P_α f(x + αv), or equivalently ϕ(α) = f(x) + (1/2)|v|^2 α. In this case, the equation above continues to hold for all α̃ ∈ [0, α].

Proof. Claim (ii) is immediate from definitions; see for example [14, Proposition 8.46]. Hence we focus on claim (i). To this end, note first that the inequality

(f^α(x + αv) − f(x))/α ≤ (1/2)|v|^2   (2.2)

holds for all α > 0 and all v ∈ E. Consider now a vector v ∈ ∂̂f(x) and any sequences α_i ↘ 0 and x_i ∈ P_{α_i} f(x + α_i v). We may assume x_i ≠ x since otherwise there's nothing to prove. Clearly the points x_i tend to x, and hence

f^{α_i}(x + α_i v) − f(x) = f(x_i) − f(x) + (1/(2α_i))|(x_i − x) − α_i v|^2 ≥ o(|x_i − x|) + (1/(2α_i))|x_i − x|^2 + (α_i/2)|v|^2.

Consequently, we obtain the inequality

(f^{α_i}(x + α_i v) − f(x))/α_i ≥ (|x_i − x|/α_i) · (o(|x_i − x|)/|x_i − x|) + (1/2)|(x_i − x)/α_i|^2 + (1/2)|v|^2.
Taking into account (2.2) yields the inequality

0 ≥ (|x_i − x|/α_i) · ( o(|x_i − x|)/|x_i − x| + (1/2)|(x_i − x)/α_i| ).

In particular, we deduce (x_i − x)/α_i → 0, and the equation (2.1) follows.

Conversely, suppose that equation (2.1) holds, and for the sake of contradiction that v does not lie in ∂̂f(x). Then there exist κ > 0 and a sequence y_i → x satisfying

f(y_i) − f(x) − ⟨v, y_i − x⟩ ≤ −κ|y_i − x|.

Then for any α > 0, observe

(f^α(x + αv) − f(x))/α ≤ (1/α)( f(y_i) − f(x) + (1/(2α))|(y_i − x) − αv|^2 ) ≤ −κ|y_i − x|/α + (1/2)|(y_i − x)/α|^2 + (1/2)|v|^2.

Setting α_i := |y_i − x|/κ and letting i tend to ∞ yields a contradiction.

3 Symmetry and orthogonal invariance
Next we recall a basic correspondence between symmetric functions and spectral functions of symmetric matrices. The discussion follows that of [8]. Henceforth, R^n will denote an n-dimensional real Euclidean space with a specified basis. Hence one can associate R^n with a collection of n-tuples (x_1, …, x_n), in which case the inner product ⟨·,·⟩ is the usual dot product. The finite group of coordinate permutations of R^n will be denoted by Π^n. A function f : R^n → R̄ is symmetric whenever it is Π^n-invariant, meaning

f(πx) = f(x)   for all x ∈ R^n and π ∈ Π^n.

It is immediate to verify that if f is symmetric, then so is the Moreau envelope f^α for any α ≥ 0. This elementary observation will be important later.

The vector space of real n × n symmetric matrices will be denoted by S^n and will be endowed with the trace inner product ⟨X, Y⟩ = tr XY and the induced Frobenius norm |X| = √(tr X^2). For any x ∈ R^n, the symbol Diag x will denote the n × n matrix with x on its diagonal and with zeros off the diagonal, while for a matrix X ∈ S^n, the symbol diag X will denote the n-vector of its diagonal entries. The group of real n × n orthogonal matrices will be written as O^n. The eigenvalue mapping λ : S^n → R^n assigns to each matrix X in S^n the vector of its eigenvalues (λ_1(X), …, λ_n(X)) in nonincreasing order. A function F : S^n → R̄ is spectral if it is O^n-invariant under the conjugation action, meaning

F(U X U^T) = F(X)   for all X ∈ S^n and U ∈ O^n.

In other words, spectral functions are those that depend on matrices only through their eigenvalues. A basic fact is that any spectral function F on S^n can be written as a composition F = f ∘ λ for some symmetric function f on R^n. Indeed, f can be realized as the restriction of F to diagonal matrices, f(x) = F(Diag x).

Two matrices X and Y in S^n are said to admit a simultaneous spectral decomposition if there exists an orthogonal matrix U ∈ O^n such that U^T X U and U^T Y U are both diagonal matrices. It is well-known that this condition holds if and only if X and Y commute. The matrices X and Y are said to admit a simultaneous ordered spectral decomposition if there exists an orthogonal matrix U ∈ O^n satisfying U^T X U = Diag λ(X) and U^T Y U = Diag λ(Y). The following result characterizing this property, essentially due to Theobald [19] and von Neumann [20], plays a central role in spectral variational analysis.

Theorem 3.1 (Theobald–von Neumann). Any two matrices X and Y in S^n satisfy the inequality

|λ(X) − λ(Y)| ≤ |X − Y|.

Equality holds if and only if X and Y admit a simultaneous ordered spectral decomposition.
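Theorem 3.1, and the equivalent trace inequality ⟨λ(X), λ(Y)⟩ ≥ ⟨X, Y⟩ used below, can be probed numerically. The following sketch (our own check, not part of the paper) also exercises the equality case by rebuilding Y in the ordered eigenvector frame of X:

```python
import numpy as np

rng = np.random.default_rng(0)

def lam(X):
    # eigenvalues in nonincreasing order, as in the text
    return np.sort(np.linalg.eigvalsh(X))[::-1]

n = 6
M = rng.standard_normal((n, n)); X = (M + M.T) / 2
M = rng.standard_normal((n, n)); Y = (M + M.T) / 2

gap = np.linalg.norm(X - Y) - np.linalg.norm(lam(X) - lam(Y))   # should be >= 0
trace_gap = lam(X) @ lam(Y) - np.trace(X @ Y)                   # should be >= 0

# Equality case: rebuild Y in the ordered eigenvector frame of X.
w, V = np.linalg.eigh(X)
U = V[:, ::-1]                       # columns ordered to match lam(X)
Y2 = U @ np.diag(lam(Y)) @ U.T      # X, Y2 admit a simultaneous ordered decomposition
eq_gap = np.linalg.norm(X - Y2) - np.linalg.norm(lam(X) - lam(Y2))
```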
Equivalently, by expanding the squares, the theorem can be restated as the trace inequality

⟨λ(X), λ(Y)⟩ ≥ ⟨X, Y⟩   for all X, Y ∈ S^n.

4 The subdifferential formula

In this section, we derive the subdifferential formula for spectral functions. In what follows, for any matrix X ∈ S^n define the diagonalizing matrix set

O_X := { U ∈ O^n : U (Diag λ(X)) U^T = X }.

The spectral subdifferential formula readily follows from Lemma 2.1 and the following intuitive proposition, a proof of which can essentially be seen in [2, Proposition 8].
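The envelope identity (4.1) and proximal formula (4.2) proved below lend themselves to a quick numerical preview. In the sketch (our own choice of test function, not from the paper), f(y) = Σ_i |y_i|, whose proximal map is coordinatewise soft-thresholding; the matrix assembled as in (4.2) attains the value f^α(λ(X)), and random symmetric competitors do no better:

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 5, 0.7

f = lambda y: np.abs(y).sum()                    # symmetric function on R^n
soft = lambda y: np.sign(y) * np.maximum(np.abs(y) - alpha, 0.0)   # P_alpha f
sym = lambda M: (M + M.T) / 2

X = sym(rng.standard_normal((n, n)))
w, V = np.linalg.eigh(X)
lamX, U = w[::-1], V[:, ::-1]                    # nonincreasing eigenvalues and frame

y_star = soft(lamX)                              # proximal point in R^n
env_vec = f(y_star) + np.linalg.norm(y_star - lamX)**2 / (2 * alpha)  # f^alpha(lam(X))

Y = U @ np.diag(y_star) @ U.T                    # candidate built as in formula (4.2)
obj = lambda Z: f(np.linalg.eigvalsh(Z)) + np.linalg.norm(Z - X)**2 / (2 * alpha)

attained = obj(Y)                                # should equal f^alpha(lam(X))
best_other = min(obj(Y + 0.5 * sym(rng.standard_normal((n, n)))) for _ in range(200))
```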
Theorem 4.1 (Proximal analysis of spectral functions). Consider a symmetric function f : R^n → R̄. Then the equation

(f ∘ λ)^α = f^α ∘ λ   (4.1)

holds. In addition, the proximal mapping admits the representation

P_α(f ∘ λ)(X) = { U (Diag y) U^T : y ∈ P_α f(λ(X)), U ∈ O_X }.   (4.2)

Moreover, for any Y ∈ P_α(f ∘ λ)(X), the matrices X and Y admit a simultaneous ordered spectral decomposition.

Proof. For any X and Y, applying the trace inequality (Theorem 3.1), we deduce

f(λ(Y)) + (1/(2α))|Y − X|^2 ≥ f(λ(Y)) + (1/(2α))|λ(Y) − λ(X)|^2 ≥ f^α(λ(X)).   (4.3)

Taking the infimum over Y, we deduce (f ∘ λ)^α(X) ≥ f^α(λ(X)). On the other hand, for any U ∈ O_X, the inequalities hold:

(f ∘ λ)^α(X) = inf_Y { f(λ(Y)) + (1/(2α))|Y − X|^2 } = inf_Y { f(λ(Y)) + (1/(2α))|U^T Y U − Diag λ(X)|^2 } ≤ f^α(λ(X)).

This establishes (4.1).

To establish equation (4.2), consider first a matrix U ∈ O_X and a vector y ∈ P_α f(λ(X)), and define Y := U (Diag y) U^T. Then we have

(f ∘ λ)(Y) + (1/(2α))|Y − X|^2 = f(y) + (1/(2α))|y − λ(X)|^2 = f^α(λ(X)) = (f ∘ λ)^α(X).

Hence the inclusion Y ∈ P_α(f ∘ λ)(X) is valid, as claimed. Conversely, fix any matrix Y ∈ P_α(f ∘ λ)(X). Then plugging Y into (4.3), the left-hand side equals (f ∘ λ)^α(X), and hence the two inequalities in (4.3) hold as equalities. The second equality immediately yields the inclusion λ(Y) ∈ P_α f(λ(X)), while the first, along with Theorem 3.1, implies that X and Y admit a simultaneous ordered spectral decomposition, as claimed.

Combining Lemma 2.1 and Theorem 4.1, the main result of the paper readily follows.

Theorem 4.2 (Subdifferentials of spectral functions). Consider an lsc symmetric function f : R^n → R̄. Then the following equation holds:

∂(f ∘ λ)(X) = { U (Diag v) U^T : v ∈ ∂f(λ(X)), U ∈ O_X }.
(4.4)

Analogous formulas hold for the proximal, Fréchet, and horizon subdifferentials.

Proof. Fix a matrix X in the domain of f ∘ λ and define x := λ(X). Without loss of generality, suppose that f is lower-bounded. Indeed, if this were not the case, then since f is lsc there exists ε > 0 so that f is lower-bounded on the ball B_ε(x). Consequently, adding to f the indicator function of the symmetric set ∪_{π ∈ Π^n} B_ε(πx) assures that the function is lower-bounded.

We first dispense with the easy inclusion ⊆ for all the subdifferentials. To this end, recall that if V is a proximal subgradient of f ∘ λ at X, then there exists α > 0 satisfying X ∈ P_α(f ∘ λ)(X + αV). Theorem 4.1 then implies that X and V commute. Taking limits, we deduce that all Fréchet, limiting, and horizon subgradients of f ∘ λ at X also commute with X. Recalling that commuting matrices admit a simultaneous spectral decomposition, basic definitions immediately yield the inclusion ⊆ in equation (4.4) for the proximal and for the Fréchet subdifferentials. Taking limits, we deduce the inclusion ⊆ in (4.4) for the limiting and for the horizon subdifferentials, as well.

Next, we argue the reverse inclusion. To this end, define V := U (Diag v) U^T for an arbitrary matrix U ∈ O_X and any vector v ∈ R^n. Then Theorem 4.1, along with the symmetry of the envelope f^α, yields the equation

((f ∘ λ)^α(X + αV) − f(λ(X)))/α = (f^α(x + αv) − f(x))/α.

Consequently, if v lies in ∂_p f(x), then Lemma 2.1 shows that for some α > 0 both sides equal (1/2)|v|^2, or equivalently (1/2)|V|^2. Lemma 2.1 then yields the inclusion V ∈ ∂_p(f ∘ λ)(X). Similarly, if v lies in ∂̂f(x), then the same argument but with α tending to 0 shows that V lies in ∂̂(f ∘ λ)(X). Thus the inclusion ⊇ in equation (4.4) holds for the proximal and for the Fréchet subdifferentials. Taking limits, the same inclusion holds for the limiting and for the horizon subdifferentials. This completes the proof.

Remark. It easily follows from Theorem 4.2 that the inclusion ⊇ holds for the Clarke subdifferential.
The reverse inclusion, however, requires a separate argument given in [8, Sections 7-8].

In conclusion, we should mention that all the arguments in the section apply equally well for Hermitian matrices (with the standard Hermitian trace product), with the orthogonal matrices replaced by unitary matrices. Entirely analogous arguments also apply for functions of singular values of rectangular matrices (real or complex). For more details, see the appendix in the arXiv version of the paper.

5 C^2-smooth spectral functions

In this section, we revisit the second-order theory of spectral functions. To this end, fix for the entire section an lsc symmetric function f : R^n → R̄ and define the spectral function F := f ∘ λ on S^n. It is well known that F is C^2-smooth around a matrix X if and only if f is C^2-smooth around λ(X); see [10, 16–18]. Moreover, a formula for the Hessian of F is available: for matrices A = Diag(a) and B ∈ S^n we have

∇^2 F(A)[B] = Diag(∇^2 f(a) diag(B)) + H ∘ B,

where H ∘ B is the Hadamard product and

H_ij = (∇f(a)_i − ∇f(a)_j)/(a_i − a_j)   if a_i ≠ a_j,
H_ij = ∇^2 f(a)_ii − ∇^2 f(a)_ij          if a_i = a_j.

The assumption that A is a diagonal matrix is made without loss of generality, as will be apparent shortly. In this section, we provide a transparent geometric derivation of the Hessian formula by considering invariance properties of gph ∇F. Some of our arguments give a geometric interpretation of the techniques in [17].

Remark 5.1. Throughout the section we will appeal to the following basic property of the Hessian. For any C^2-smooth function g on a Euclidean space, the vector z := ∇^2 g(a)[b] is the unique vector satisfying (z, −b) ∈ N_{gph ∇g}(a, ∇g(a)).

Consider now the action of the orthogonal group O^n on S^n by conjugation, namely U.X = U X U^T. Recall that F is invariant under this action, meaning F(U.X) = F(X) for all orthogonal matrices U. This action naturally extends to the product space S^n × S^n by setting U.(X, Y) = (U.X, U.Y). As we have seen, the graph gph ∇F is then invariant with respect to this action: U. gph ∇F = gph ∇F for all U ∈ O^n. One immediate observation is that N_{gph ∇F}(U.X, U.Y) =
U.N_{gph ∇F}(X, Y). Consequently we deduce

(Z, −B) ∈ N_{gph ∇F}(X, Y)  ⟺  (U.Z, −U.B) ∈ N_{gph ∇F}(U.X, U.Y).

The formula

∇^2 F(X)[B] = U^T.(∇^2 F(U.X)[U.B])   (5.1)

now follows directly from Remark 5.1, whenever F is C^2-smooth around X. As a result, when speaking about the operator ∇^2 F(X), we may assume without loss of generality that X and ∇F(X) are both diagonal matrices.

Next we briefly recall a few rudimentary properties of the conjugation action; see for example [6, Sections 4, 8, 9]. We say that an n × n matrix W is skew-symmetric if W^T = −W. It is well-known that O^n is a smooth manifold and that the tangent space to O^n at the identity matrix consists of the skew-symmetric matrices:

T_{O^n}(I) = { W ∈ R^{n×n} : W is skew-symmetric }.

The commutator of two matrices
A, B ∈ R^{n×n}, denoted by [A, B], is the matrix [A, B] := AB − BA. An easy computation shows that the commutator of a skew-symmetric matrix with a symmetric matrix is itself symmetric. Moreover, the identity

⟨X, [W, Z]⟩ = ⟨[X, W], Z⟩

holds for any matrices X, Z ∈ S^n and skew-symmetric W. For any matrix A ∈ S^n, the orbit of A, denoted by O^n.A, is the set O^n.A = { U.A : U ∈ O^n }. Similarly, the orbit of a pair (A, B) ∈ S^n × S^n is the set O^n.(A, B) = { (U.A, U.B) : U ∈ O^n }. A standard computation (compute the differential of the mapping O^n ∋ U ↦ U.A at the identity) now shows that orbits are smooth manifolds with tangent spaces

T_{O^n.A}(A) = { [W, A] : W is skew-symmetric },
T_{O^n.(A,B)}(A, B) = { ([W, A], [W, B]) : W is skew-symmetric }.

Now supposing that F is twice differentiable at a matrix A ∈ S^n, the graph gph ∇F certainly contains the orbit O^n.(A, ∇F(A)). In particular, this implies that the tangent space to gph ∇F at (A, ∇F(A)) contains the tangent space to the orbit:

{ ([W, A], [W, ∇F(A)]) : W skew-symmetric }.

Thus for any B ∈ S^n, the tuple (∇^2 F(A)[B], −B) is orthogonal to the tuple ([W, A], [W, ∇F(A)]) for any skew-symmetric matrix W. We record this elementary observation in the following lemma. This also appears as [17, Lemma 3.2].

Lemma 5.2 (Orthogonality to orbits). Suppose F is C^2-smooth around A ∈ S^n. Then for any skew-symmetric matrix W and any B ∈ S^n, we have

⟨∇^2 F(A)[B], [W, A]⟩ = ⟨B, [W, ∇F(A)]⟩.

Proof.
This is immediate from the preceding discussion.

Next recall that the stabilizer of a matrix A ∈ S^n is the set

Stab(A) = { U ∈ O^n : U.A = A }.

Similarly we may define the set Stab(A, B).
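Lemma 5.2 can also be checked numerically. In the sketch below (our own test, not from the paper; F = f ∘ λ with f the log-sum-exp function, and the Hessian-vector product ∇^2 F(A)[B] approximated by central differences of ∇F), the two pairings agree for a random skew-symmetric W:

```python
import numpy as np

def grad_f(y):                  # softmax: gradient of log-sum-exp
    e = np.exp(y - y.max())
    return e / e.sum()

def grad_F(X):                  # gradient of F = f∘λ
    w, V = np.linalg.eigh(X)
    return V @ np.diag(grad_f(w)) @ V.T

rng = np.random.default_rng(0)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag([1.5, 0.6, -0.2, -1.0]) @ Q.T    # symmetric, well-separated spectrum
M = rng.standard_normal((n, n)); B = (M + M.T) / 2
M = rng.standard_normal((n, n)); W = M - M.T      # skew-symmetric

t = 1e-5
hvp = (grad_F(A + t * B) - grad_F(A - t * B)) / (2 * t)   # approx Hess F(A)[B]

comm = lambda P, R: P @ R - R @ P
lhs = np.sum(hvp * comm(W, A))            # <Hess F(A)[B], [W, A]>
rhs = np.sum(B * comm(W, grad_F(A)))      # <B, [W, grad F(A)]>
```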
Lemma 5.3 (Tangent space to the stabilizer). For any matrices A, B ∈ S^n, the tangent spaces to Stab(A) and to Stab(A, B) at the identity matrix are the sets

{ W ∈ R^{n×n} : W skew-symmetric, [W, A] = 0 },
{ W ∈ R^{n×n} : W skew-symmetric, [W, A] = [W, B] = 0 },

respectively.

(Proof sketch). Define the orbit map θ^(A) : O^n → O^n.A by setting θ^(A)(U) := U.A. A quick computation shows that θ^(A) is equivariant with respect to the left-multiplication action of O^n on itself and the conjugation action of O^n on O^n.A. Hence the equivariant rank theorem [6, Theorem 7.25] implies that θ^(A) has constant rank. In fact, since θ^(A) is surjective, it is a submersion. It follows that the stabilizer Stab(A) = (θ^(A))^{−1}(A) is a smooth manifold with tangent space at the identity equal to the kernel of the differential dθ^(A)|_{U=I}(W) = [W, A]. The expression for the tangent space to Stab(A) immediately follows. The analogous expression for Stab(A, B) follows along similar lines.

With this, we are able to state and prove the main theorem.
Theorem 5.4 (Hessian of C^2-smooth spectral functions). Consider a symmetric function f : R^n → R̄ and the spectral function F = f ∘ λ. Suppose that F is C^2-smooth around the matrix A := Diag(a), and for any matrix B ∈ S^n define Z := ∇^2 F(A)[B]. Then the equality

diag(Z) = ∇^2 f(a)[diag(B)]

holds, while for indices i ≠ j, we have

Z_ij = B_ij (∇f(a)_i − ∇f(a)_j)/(a_i − a_j)    if a_i ≠ a_j,
Z_ij = B_ij (∇^2 f(a)_ii − ∇^2 f(a)_ij)        if a_i = a_j.

Proof. First observe that clearly f must be C^2-smooth at a. Now, since A is diagonal, so is the gradient ∇F(A). So without loss of generality, we can assume ∇F(A) = Diag(∇f(a)).

Observe now that (Z, −B) is orthogonal to the tangent space of gph ∇F at (A, ∇F(A)). On the other hand, for any vector a′ ∈ R^n, we have the equality

⟨(Z, −B), (Diag(a′) − Diag(a), Diag(∇f(a′)) − Diag(∇f(a)))⟩ = ⟨(diag(Z), −diag(B)), (a′ − a, ∇f(a′) − ∇f(a))⟩.

It follows immediately that the tuple (diag(Z), −diag(B)) is orthogonal to the tangent space of gph ∇f at (a, ∇f(a)). Hence we deduce the equality diag(Z) = ∇^2 f(a)[diag(B)], as claimed.

Next fix indices i and j with a_i ≠ a_j, and define the skew-symmetric matrix W^(i,j) := e_i e_j^T − e_j e_i^T, where e_k denotes the k'th standard basis vector. Applying Lemma 5.2 with the skew-symmetric matrix W = (2(a_i − a_j))^{−1} W^(i,j), we obtain

−Z_ij = ⟨Z, [(2(a_i − a_j))^{−1} W^(i,j), A]⟩ = −⟨[(2(a_i − a_j))^{−1} W^(i,j), B], ∇F(A)⟩ = −⟨diag [(2(a_i − a_j))^{−1} W^(i,j), B], ∇f(a)⟩ = −B_ij (∇f(a)_i − ∇f(a)_j)/(a_i − a_j).

The claimed formula Z_ij = B_ij (∇f(a)_i − ∇f(a)_j)/(a_i − a_j) follows.

Finally, fix indices i ≠ j with a_i = a_j. Observe now the inclusion Stab(A) ⊂ Stab(∇F(A)).
Indeed, for any matrix U ∈ Stab(A), we have ∇F(A) = ∇F(U A U^T) = U ∇F(A) U^T. This in particular immediately implies that the tangent space T_{gph ∇F}(A, ∇F(A)) is invariant under the action of Stab(A), that is, U.T_{gph ∇F}(A, ∇F(A)) = T_{gph ∇F}(A, ∇F(A)) for any U ∈ Stab(A). Hence the entire orbit Stab(A).(X, Y) of any tangent vector (X, Y) ∈ T_{gph ∇F}(A, ∇F(A)) is contained in the tangent space T_{gph ∇F}(A, ∇F(A)). We conclude that the tangent space to such an orbit Stab(A).(X, Y) at (X, Y) is contained in T_{gph ∇F}(A, ∇F(A)) as well.

Define now the matrices E_i := Diag(e_i) and Ẑ := Diag(∇^2 f(a)[e_i]). Because F is C^2-smooth, clearly the inclusion (E_i, Ẑ) ∈ T_{gph ∇F}(A, ∇F(A)) holds. The above argument, along with Lemma 5.3, immediately implies the inclusion

{ ([W, E_i], [W, Ẑ]) : W skew-symmetric, [W, A] = 0 } ⊆ T_{gph ∇F}(A, ∇F(A)),

and in particular, ([W, E_i], [W, Ẑ]) is orthogonal to (Z, −B) for any skew-symmetric W satisfying [W, A] = 0. To finish the proof, simply set W := (1/2) W^(i,j). Then since a_i = a_j, we have [W, A] = 0 and therefore

−Z_ij = ⟨Z, [W, E_i]⟩ = ⟨B, [W, Ẑ]⟩ = −⟨[W, B], Ẑ⟩ = −B_ij (∇^2 f(a)_ii − ∇^2 f(a)_ij),

as claimed. This completes the proof.

Remark. The appealing geometric techniques presented in this section seem promising for obtaining at least necessary conditions for the generalized Hessian, in the sense of [13], of spectral functions that are not necessarily C^2-smooth. Indeed, the arguments presented deal entirely with the graph gph ∇f, a setting perfectly adapted to generalized Hessian computations. There are difficulties, however. To illustrate, consider a matrix Z ∈ ∂^2 F(A | V). Then one can easily establish properties of diag Z analogous to those presented in Theorem 5.4, as well as properties of Z_ij for indices i and j satisfying a_i ≠ a_j. The difficulty occurs for indices i and j with a_i = a_j. In this case, our argument used explicitly the fact that tangent cones to gph ∂f are linear subspaces, a property that is decisively false in the general setting.

A Comments on isometric group actions
It is clear from Sections 1–4 that there is a richer underlying structure governing the results of Theorems 4.1 and 4.2, with the trace inequality (Theorem 3.1) playing an essential role. This appendix outlines a rudimentary algebraic framework in which the previous arguments can be understood, unifying the eigenvalue and the singular value pictures [11], while leaving room for new settings to be explored.

Fix a metric space V and a group G acting on V by isometries. Let H be another metric space injecting isometrically by a mapping i : H ↪ V into V. Intuitively, H is a subset of V with i the canonical injection. Notationally, however, it is cleaner to consider H as a separate entity. Without loss of generality, we will use the symbol d(·,·) to denote the metric both in V and in H. Fix also a distinguished G-invariant mapping p : V → H. The diagram summarizes the notation:

H ⇄ V   (i : H → V and p : V → H).

It is instructive to keep in mind the following motivating examples:

R^n ⇄ S^n (i = Diag, p = λ),   R^n ⇄ H^n (i = Diag, p = λ),   R^m ⇄ R^{m×n} (i = Diag, p = σ),   R^m ⇄ C^{m×n} (i = Diag, p = σ).

In the first example (the focus of the previous sections), the group G = O^n acts by conjugation U.X = U X U^T. In the second example, H^n is the space of n × n Hermitian matrices (with the standard Hermitian inner product ⟨X, Y⟩ = Re tr X*Y), and G is the unitary group acting by U.X = U X U*. In the third example, R^{m×n} is the space of real m × n matrices (with the trace product ⟨X, Y⟩ = tr X^T Y), the group G = O^m × O^n acts by (U, V).X = U X V^T, and σ is the mapping assigning to each m × n matrix its vector of singular values in nonincreasing order. The fourth example is analogous. The goal of this section is to isolate the shared features of the four examples above that make a subdifferential formula along the lines of (4.4) possible. That is, we aim to investigate conditions on p under which one can effectively treat G-invariant functions F : V → R̄ by instead considering their restrictions F ∘ i : H → R̄.
Some notational abstraction will greatly help simplify the ensuing formulas. To this end, following standard terminology, the pullback of any mapping F on V is the mapping F_* := F ∘ i defined now on H. Similarly, the pullback of a mapping f on H is the mapping f^* := f ∘ p on V. The pushforward of p is the mapping p^* := i ∘ p : V → V. For instance, in the first example, for any function F : S^n → R̄, the pullback F_* is the diagonal restriction x ↦ F(Diag(x)); the pullback of a function f on R^n is the spectral mapping f^* = f ∘ λ; and the pullback of λ is the reordering mapping x ↦ x↑ on R^n, meaning that x↑ is obtained by permuting the coordinates of x to be nonincreasing. The following definition identifies the salient properties needed, in light of the current paper, for effective treatment of G-invariant functions F : V → R̄ by means of their restrictions F_* : H → R̄. For clarity, elements of H will be denoted with lower-case letters, while elements of V will be denoted with upper-case letters.

Definition A.1 (Metric reduction). The space V metrically reduces to H if the following compatibility conditions hold:
1. (Idempotence) p^* ∘ p^* = p^*;
2. (Orbit preservation) p^*(X) lies in the G-orbit of X, for all X ∈ V;
3. (Non-expansiveness) d(p(X), p(Y)) ≤ d(X, Y) for all X, Y ∈ V.
The reduction is faithful if in addition the following is true for all X, Y ∈ V:

d(p(X), p(Y)) = d(X, Y)  ⟹  ∃ g ∈ G with gX = p^*(X) and gY = p^*(Y).

An appropriate notion of symmetry on H that is compatible with G-invariance on V is as follows. A function f : H → R̄ is p-symmetric whenever f(p_*(x)) = f(x) for any x ∈ H, where p_* := p ∘ i. In the spectral example, S^n faithfully reduces to R^n as a consequence of Theorem 3.1; G-invariant functions are what we called spectral, while λ-symmetric functions are what we called symmetric. The other three running examples are analogous.

Lemma A.2 (Invariance and symmetry). The following properties of a function F : V → R̄ are equivalent.
(i) F is G-invariant.
(ii) F = f ∘ p for some p-symmetric function f on H.
(iii) F = (F_*)^*.

Proof.
Suppose that (i) holds and define f := F∗. Then observe f∗(X) = F(p∗(X)). By the orbit preservation property, there exists some g ∈ G satisfying gX = p∗(X), and hence f∗(X) = F(X) for all X ∈ V. Hence implication (iii) holds. Suppose now that (iii) holds, meaning F(X) = F∗(p(X)) for all X ∈ V. Then in particular

F∗(y) = F(i(y)) = (F∗)∗(i(y)) = F∗(p ∘ i(y)) for all y ∈ H.

By definition, F∗ is then p-symmetric and (ii) follows. The final implication (ii) ⇒ (i) is trivial since p is G-invariant.

For notational convenience, henceforth, for any point y ∈ H the corresponding capital letter Y will stand for i(y). Observe that the Moreau envelopes and proximal mappings of functions on V and on H have obvious meanings. A proof nearly identical to that of Theorem 4.1 shows that if V metrically reduces to H, then for any lsc p-symmetric function f : H → R the commutation relation

(f∗)_α = (f_α)∗

holds. Assuming in addition that the reduction is faithful, the equation

P_α f∗(X) = { g⁻¹Y : y ∈ P_α f(p(X)), g ∈ G_X }

holds, where G_X := { g ∈ G : p∗(X) = gX }. Moreover, for any Z ∈ P_α f∗(X) there exists g ∈ G satisfying p∗(Z) = gZ and p∗(X) = gX. Suppose moreover that V and H are Euclidean spaces with i a linear mapping, and that G is a compact subgroup of linear isometries. Then a proof identical to that of Theorem 4.2 shows that the formula

∂f∗(X) = { g⁻¹V : v ∈ ∂f(p(X)), g ∈ G_X }

holds. The four running examples of the section fit nicely into this framework.
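For a concrete instance of the proximal formula in the first example, take f to be the ℓ1-norm on R^n, which is permutation-invariant and hence λ-symmetric; its proximal mapping is coordinatewise soft-thresholding. The formula then predicts that P_α(f ∘ λ)(X) is obtained by soft-thresholding the eigenvalues of X in an eigenbasis of X. The sketch below (not from the paper; it assumes NumPy, and the helper names are illustrative) builds this candidate and checks numerically that no nearby symmetric matrix achieves a smaller value of the proximal objective.

```python
import numpy as np

rng = np.random.default_rng(1)

def eig_decomp(X):
    """X = U Diag(w) U^T with the eigenvalues w in nonincreasing order."""
    w, U = np.linalg.eigh(X)
    idx = np.argsort(w)[::-1]
    return w[idx], U[:, idx]

def soft(v, a):
    """Proximal mapping of the l1-norm: coordinatewise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - a, 0.0)

def prox_spectral_l1(X, a):
    """Candidate for P_a(f o lambda)(X): soft-threshold the eigenvalues."""
    w, U = eig_decomp(X)
    return U @ np.diag(soft(w, a)) @ U.T

n, a = 5, 0.3
A = rng.standard_normal((n, n))
X = (A + A.T) / 2
Z = prox_spectral_l1(X, a)

def obj(W):
    """The proximal objective  f(lambda(W)) + ||W - X||_F^2 / (2a)."""
    return np.abs(np.linalg.eigvalsh(W)).sum() + np.linalg.norm(W - X, 'fro')**2 / (2 * a)

# Z should not be beaten by random symmetric perturbations of itself.
for _ in range(100):
    B = rng.standard_normal((n, n))
    W = Z + 0.1 * (B + B.T) / 2
    assert obj(Z) <= obj(W) + 1e-10
```

Since the objective here is convex, the candidate is in fact the global minimizer; the random-perturbation test is only a sanity check, not a proof.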
References

[1] A. Daniilidis, D. Drusvyatskiy, and A.S. Lewis. Orthogonal invariance and identifiability. SIAM J. Matrix Anal. Appl., 35(2):580–598, 2014.
[2] A. Daniilidis, A.S. Lewis, J. Malick, and H. Sendov. Prox-regularity of spectral functions and spectral sets. J. Convex Anal., 15(3):547–560, 2008.
[3] A. Daniilidis, J. Malick, and H.S. Sendov. Locally symmetric submanifolds lift to spectral manifolds. Preprint U.A.B. /2009, 43 p., arXiv:1212.3936 [math.OC], 2012.
[4] C. Davis. All convex invariant functions of Hermitian matrices. Arch. Math., 8:276–278, 1957.
[5] D. Drusvyatskiy and M. Larsson. Approximating functions on stratified sets. Trans. Amer. Math. Soc., 367(1):725–749, 2015.
[6] J.M. Lee. Introduction to Smooth Manifolds, volume 218 of Graduate Texts in Mathematics. Springer, New York, second edition, 2013.
[7] A.S. Lewis. Convex analysis on the Hermitian matrices. SIAM J. Optim., 6(1):164–177, 1996.
[8] A.S. Lewis. Nonsmooth analysis of eigenvalues. Math. Program., 84(1, Ser. A):1–24, 1999.
[9] A.S. Lewis. Convex analysis on Cartan subspaces. Nonlinear Anal., 42(5, Ser. A: Theory Methods):813–820, 2000.
[10] A.S. Lewis and H.S. Sendov. Twice differentiable spectral functions. SIAM J. Matrix Anal. Appl., 23(2):368–386, 2001.
[11] A.S. Lewis and H.S. Sendov. Nonsmooth analysis of singular values. I. Theory. Set-Valued Anal., 13(3):213–241, 2005.
[12] A.S. Lewis and H.S. Sendov. Nonsmooth analysis of singular values. II. Applications. Set-Valued Anal., 13(3):243–264, 2005.
[13] B.S. Mordukhovich. Variational Analysis and Generalized Differentiation I: Basic Theory. Grundlehren der mathematischen Wissenschaften, Vol. 330. Springer, Berlin, 2006.
[14] R.T. Rockafellar and R.J-B. Wets. Variational Analysis. Grundlehren der mathematischen Wissenschaften, Vol. 317. Springer, Berlin, 1998.
[15] H.S. Sendov. Variational Spectral Analysis. ProQuest LLC, Ann Arbor, MI, 2001. Thesis (Ph.D.)–University of Waterloo (Canada).
[16] H.S. Sendov. The higher-order derivatives of spectral functions. Linear Algebra Appl., 424(1):240–281, 2007.
[17] M. Šilhavý. Differentiability properties of isotropic functions. Duke Math. J., 104(3):367–373, 2000.
[18] J. Sylvester. On the differentiability of O(n) invariant functions of symmetric matrices. Duke Math. J., 52(2):475–483, 1985.
[19] C.M. Theobald. An inequality for the trace of the product of two symmetric matrices. Math. Proc. Cambridge Philos. Soc., 77:265–267, 1975.
[20] J. von Neumann. Some matrix inequalities and metrization of matrix-space.