A Tutorial on Matrix Perturbation Theory (using compact matrix notation)
Bassam Bamieh
Abstract
Analytic perturbation theory for matrices and operators is an immensely useful mathematical technique. Most elementary introductions to this method have their background in the physics literature, and quantum mechanics in particular. In this note, we give an introduction to this method that is independent of any physics notions, and relies purely on concepts from linear algebra. An additional feature of this presentation is that matrix notation and methods are used throughout. In particular, we formulate the equations for each term in the analytic expansions of eigenvalues and eigenvectors as matrix equations, namely Sylvester equations. Solvability conditions and explicit expressions for solutions of such matrix equations are given, and expressions for each term in the analytic expansions are given in terms of those solutions. This unified treatment simplifies somewhat the complex notation that is commonly seen in the literature, and in particular provides relatively compact expressions for the non-Hermitian and degenerate cases, as well as for higher order terms.
We want to study the behavior of eigenvalues and eigenvectors of matrices that are a function of a "small" parameter $\epsilon$ of the form

    A_\epsilon = A_0 + \epsilon A_1,    (1)

where $A_0$ and $A_1$ are given matrices. If the eigenvectors and eigenvalues are analytic functions of $\epsilon$ in a neighborhood of zero, then we can write the eigenvalue/eigenvector relations as power series in $\epsilon$, equate terms of the same power in $\epsilon$, and derive expressions for each set of terms. These expressions and the related algebra can get messy. It is shown in this document that by adopting matrix notation, expressions are simplified and compactified, and additional insight is obtained. It is rather ironic that matrix notation is not fully utilized in standard treatments of matrix perturbation theory. It can be argued that a better way to think about finding expansion terms in the eigenvectors is to treat them all together as a matrix, rather than as individual vectors. Better insight is achieved in this manner, especially for the cases of degenerate eigenvalues and higher order terms.

Notation and Preliminaries
Before we begin, we set some useful matrix notation for eigenvector/eigenvalue relations, as well as manipulations with diagonal matrices and the Hadamard product.

• A vector $v_i$ (respectively $w_i^*$) is a right (left) eigenvector of a matrix $A$ if $A v_i = \lambda_i v_i$ (respectively $w_i^* A = \lambda_i w_i^*$). Throughout this note, we will assume the semi-simple case, i.e. that $A_\epsilon$ has a full set of eigenvectors (i.e. is diagonalizable) for each $\epsilon$ in some neighborhood of zero. In this case, for an $n \times n$ matrix $A$, there are $n$ eigenvalue/eigenvector relations

    A v_i = \lambda_i v_i \quad (w_i^* A = \lambda_i w_i^*), \qquad i = 1, \ldots, n.

It is very useful to note that these $n$ relations can be compactly rewritten as a matrix equation

    \begin{bmatrix} A v_1 & \cdots & A v_n \end{bmatrix} = \begin{bmatrix} \lambda_1 v_1 & \cdots & \lambda_n v_n \end{bmatrix}
    \;\Leftrightarrow\;
    A \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}
    \;\Leftrightarrow\;
    A V = V \Lambda,    (2)

where $V$ is a matrix whose columns are the eigenvectors of $A$, and $\Lambda$ is the diagonal matrix made up of the eigenvalues of $A$. Similarly, we also have

    W^* A = \Lambda W^*,

where the rows of $W^*$ are the left eigenvectors. Note that in the special case of Hermitian (or more generally normal) matrices, the left and right eigenvectors coincide, i.e. $W = V$.

• For any square matrix $M$, let $\mathrm{dg}(M)$ be the square diagonal matrix made up of only the diagonal elements of $M$. Clearly $\mathrm{dg}(M + N) = \mathrm{dg}(M) + \mathrm{dg}(N)$ for any two matrices $M$ and $N$. If $D$ is a diagonal matrix with compatible dimensions, then it immediately follows that

    \mathrm{dg}(MD) = \mathrm{dg}(DM) = \mathrm{dg}(M)\,\mathrm{dg}(D).

Thus for any two matrices $M$ and $N$ with compatible dimensions

    \mathrm{dg}\big( M\,\mathrm{dg}(N) \big) = \mathrm{dg}(M)\,\mathrm{dg}(N).

• The Hadamard product $M \circ N$ of two matrices is the element-by-element product. It is distributive over matrix addition, but not over matrix products in general, except with diagonal matrices:

    M \circ (N_1 + N_2) = M \circ N_1 + M \circ N_2, \qquad M \circ (N \Lambda) = (M \circ N) \Lambda \quad \text{when } \Lambda \text{ is diagonal.}

If $M = u w^*$ is rank 1, then it follows that

    M \circ N = (u w^*) \circ N = \mathrm{diag}(u)\, N\, \mathrm{diag}(w),

where $\mathrm{diag}(u)$ is the diagonal matrix formed from the entries of the vector $u$. Thus if $M = \sum_i \lambda_i u_i w_i^*$ is a diagonalizable matrix, then we can obtain an expression for the Hadamard product in terms of a sum of standard matrix products

    M \circ N = \Big( \sum_i \lambda_i u_i w_i^* \Big) \circ N = \sum_i \lambda_i\, \mathrm{diag}(u_i)\, N\, \mathrm{diag}(w_i).

We will use the notation $M^{k\circ}$ for the Hadamard $k$'th power of $M$, i.e. the matrix whose entries are $m_{ij}^k$, with $m_{ij}$ being the entries of $M$. Similarly, $M^{-\circ}$ is the matrix with entries $1/m_{ij}$, defined when all entries satisfy $m_{ij} \neq 0$. The Hadamard pseudo-inverse $M^{\dagger\circ}$ is the matrix with entries $1/m_{ij}$ if $m_{ij} \neq 0$, and $0$ if $m_{ij} = 0$.

Perturbation Expansions

Consider (1) and assume that the eigenvectors and eigenvalues are analytic functions of $\epsilon$ in a neighborhood of 0. The eigenvector/eigenvalue relationships then read as

    A_\epsilon V_\epsilon = V_\epsilon \Lambda_\epsilon, \qquad W_\epsilon^* A_\epsilon = \Lambda_\epsilon W_\epsilon^*
    \;\Updownarrow\;
    (A_0 + \epsilon A_1)(V_0 + \epsilon V_1 + \cdots) = (V_0 + \epsilon V_1 + \cdots)(\Lambda_0 + \epsilon \Lambda_1 + \cdots)    (3)
    (W_0^* + \epsilon W_1^* + \cdots)(A_0 + \epsilon A_1) = (\Lambda_0 + \epsilon \Lambda_1 + \cdots)(W_0^* + \epsilon W_1^* + \cdots)    (4)

Note that $\Lambda_0$ and the $\Lambda_i$'s are diagonal matrices. These equations describe eigenvectors that each belong to a one-dimensional subspace, and thus $V_\epsilon$ and $W_\epsilon$ are not unique unless we impose some normalization constraint. There are many possible such constraints. The first we will use is the reciprocal basis constraint $W^*(\epsilon) V(\epsilon) = I$, which gives

    W^*(\epsilon)\, V(\epsilon) = I \;\Leftrightarrow\; (W_0^* + \epsilon W_1^* + \cdots)(V_0 + \epsilon V_1 + \cdots) = I.    (5)
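Throughout, it may help to have a concrete numerical companion to this notation. The following is a minimal Python/numpy sketch (our own illustration, not part of the original text; helper names such as `dg` and `hadamard_pinv` are ours). It sets up a generic pair $(A_0, A_1)$, the eigenvector matrices $V$ and $W$ in the reciprocal-basis normalization $W^* V = I$, and the diagonal-extraction and Hadamard pseudo-inverse operations defined above. Later sketches reuse these names.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A0 = rng.standard_normal((n, n))   # generic A0: distinct eigenvalues with probability 1
    A1 = rng.standard_normal((n, n))   # the perturbation direction

    lam, V = np.linalg.eig(A0)         # right eigenvectors: A0 V = V Lam  (eq. (2))
    Lam = np.diag(lam)
    W = np.linalg.inv(V).conj().T      # reciprocal basis: W^* V = I; rows of W^* are left eigenvectors

    assert np.allclose(A0 @ V, V @ Lam)
    assert np.allclose(W.conj().T @ A0, Lam @ W.conj().T)
    assert np.allclose(W.conj().T @ V, np.eye(n))

    def dg(M):
        """dg(M): the diagonal matrix carrying the diagonal entries of M."""
        return np.diag(np.diag(M))

    def hadamard_pinv(M, tol=1e-12):
        """Hadamard pseudo-inverse M^{dagger o}: entries 1/m_ij where m_ij != 0, else 0."""
        out = np.zeros_like(M)
        mask = np.abs(M) > tol
        out[mask] = 1.0 / M[mask]
        return out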
Equating equal powers of $\epsilon$ in (3) and (4) gives a sequence of matrix equations

    A_0 V_0 = V_0 \Lambda_0, \qquad W_0^* A_0 = \Lambda_0 W_0^*    (6)
    A_0 V_1 + A_1 V_0 = V_0 \Lambda_1 + V_1 \Lambda_0, \qquad W_0^* A_1 + W_1^* A_0 = \Lambda_0 W_1^* + \Lambda_1 W_0^*    (7)
    A_0 V_2 + A_1 V_1 = V_0 \Lambda_2 + V_1 \Lambda_1 + V_2 \Lambda_0, \qquad W_1^* A_1 + W_2^* A_0 = \Lambda_0 W_2^* + \Lambda_1 W_1^* + \Lambda_2 W_0^*    (8)
    \vdots
    A_0 V_k + A_1 V_{k-1} = \sum_{i=0}^{k} V_i \Lambda_{k-i}, \qquad W_k^* A_0 + W_{k-1}^* A_1 = \sum_{i=0}^{k} \Lambda_{k-i} W_i^*    (9)

A similar exercise for (5) gives

    W_0^* V_0 = I, \qquad W_0^* V_1 + W_1^* V_0 = 0, \qquad \cdots, \qquad \sum_{i=0}^{k} W_i^* V_{k-i} = 0.    (10)

In the special case of Hermitian matrices, we have $W_\epsilon = V_\epsilon$, and the eigenvectors are thus normalized to be orthonormal. We also then have the condition $V_0^* V_1 = -V_1^* V_0$, i.e. $V_0^* V_1$ is skew-Hermitian, which implies that its diagonal entries must be imaginary, or zero if the vectors are real. In the real case, the diagonal entries are $v_i^* v_{1i} = 0$, i.e. for each $i$, $v_i$ and $v_{1i}$ are orthogonal. Geometrically, this means that the curve $v_i(\epsilon)$ has a tangent at $\epsilon = 0$ that is orthogonal to the direction of the unperturbed eigenvector $v_i$. More generally, the normalization condition ensures that the curves $v_i(\epsilon)$ lie on the sphere in $\mathbb{R}^n$ for any $\epsilon$ (thus the orthogonality of their initial tangents to each $v_i$).

Calculating First Order Eigenvalue Terms

We first begin by calculating the first order behavior of the eigenvalues. This is the term $\Lambda_1$, for which we can use equation (7). Left multiplication by $W_0^*$ (to get rid of the $V_1$ factor multiplying $\Lambda_0$ and the $V_0$ factor multiplying $\Lambda_1$) gives

    W_0^* A_0 V_1 + W_0^* A_1 V_0 = W_0^* V_0 \Lambda_1 + W_0^* V_1 \Lambda_0
    \;\Leftrightarrow\;
    \Lambda_0 W_0^* V_1 - W_0^* V_1 \Lambda_0 + W_0^* A_1 V_0 = \Lambda_1.

Now make the following observations:

• Since $\Lambda_0$ is diagonal, $(W_0^* V_1)\Lambda_0$ and $\Lambda_0 (W_0^* V_1)$ have equal diagonals, i.e. $\Lambda_0 W_0^* V_1 - W_0^* V_1 \Lambda_0$ has zeros on the diagonal.

• Since $\Lambda_1$ must be diagonal,

    \Lambda_1 = \mathrm{dg}\big( W_0^* A_1 V_0 + \Lambda_0 W_0^* V_1 - W_0^* V_1 \Lambda_0 \big) = \mathrm{dg}\big( W_0^* A_1 V_0 \big) + 0
    \;\Rightarrow\;
    \Lambda_1 = \mathrm{dg}\big( W_0^* A_1 V_0 \big).    (11)

Thus the first order correction to the $i$'th eigenvalue is $\lambda_{1i} = w_i^* A_1 v_i$, which is the well known expression [1]. To fully appreciate this, expand (11) using partitioned matrix notation

    \Lambda_1 = \mathrm{dg}\left( \begin{bmatrix} w_1^* \\ \vdots \\ w_n^* \end{bmatrix} A_1 \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \right)
    = \begin{bmatrix} w_1^* A_1 v_1 & & \\ & \ddots & \\ & & w_n^* A_1 v_n \end{bmatrix}.
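As a quick numerical check of (11), continuing the sketch above, the diagonal of $W_0^* A_1 V_0$ should match a finite-difference derivative of the eigenvalues of $A_\epsilon$. The proximity-based eigenvalue matching below is a heuristic that assumes well-separated eigenvalues; it is illustration only.

    # first order eigenvalue terms: Lam1 = dg(W0^* A1 V0)  (eq. (11))
    Lam1 = dg(W.conj().T @ A1 @ V)

    eps = 1e-6
    lam_eps = np.linalg.eig(A0 + eps * A1)[0]
    # match each perturbed eigenvalue to its unperturbed counterpart by proximity
    order = [int(np.argmin(np.abs(lam_eps - l))) for l in lam]
    fd = (lam_eps[order] - lam) / eps        # (lambda_i(eps) - lambda_i(0)) / eps
    assert np.allclose(fd, np.diag(Lam1), atol=1e-4)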
Calculating First Order Eigenvector Terms: Distinct Eigenvalues

Equations (7) can be rewritten as matrix equations with $V_1$ and $W_1$ as the unknowns respectively as follows

    A_0 V_1 + A_1 V_0 = V_0 \Lambda_1 + V_1 \Lambda_0 \;\Leftrightarrow\; A_0 V_1 - V_1 \Lambda_0 = V_0 \Lambda_1 - A_1 V_0,    (12)
    W_1^* A_0 - \Lambda_0 W_1^* = \Lambda_1 W_0^* - W_0^* A_1.    (13)

The last two equations are Sylvester equations for the matrices $V_1$ and $W_1$ respectively. We first discuss (12). Define the matrix-valued operator

    L(X) := A_0 X - X \Lambda_0,

and note that (12) can be rewritten as $L(V_1) = V_0 \Lambda_1 - A_1 V_0$. The properties of this "Sylvester operator" are discussed in Appendix A.1, and the solvability of equation (12) is determined by those properties. The following matrices appear in the solution: $V_0$ and $W_0$, the eigenvector matrices of $A_0$ and $A_0^*$ respectively, and the matrices whose $ij$'th entries are given by

    (\Pi)_{ij} := \lambda_i - \lambda_j, \qquad (\Pi^{\dagger\circ})_{ij} := \begin{cases} \dfrac{1}{\lambda_i - \lambda_j}, & i \neq j, \\ 0, & i = j. \end{cases}

Entries of $\Pi$ are made up of all possible sums of the eigenvalues of $A_0$ and $-\Lambda_0$ (i.e. differences of the eigenvalues of $A_0$), and $\Pi^{\dagger\circ}$ is the Hadamard (element-by-element) pseudo-inverse of $\Pi$. We now apply the machinery in the Appendices to the particular operator $L$ in equation (12).

• The operator $L$ has the following spectral decomposition (applying formula (24) with $U = Z = I$)

    L(X) = V_0 \big( \Pi \circ (W_0^* X) \big) = \sum_{ij} (\lambda_i - \lambda_j)\, (w_i^* X e_j)\, v_i e_j^*.    (14)

Since $\lambda_i - \lambda_j = 0$ for $i = j$, $L$ is rank deficient by at least $n$.

• Even though $L$ is not of full rank, equation (12) is solvable since its right hand side is in the range of $L$. This argument is detailed in Appendix A.2.

• The minimum norm solution of (12) is obtained from the formula for the pseudo-inverse $L^\dagger$ (eq. (26), where we again use $U = Z = I$)

    V_1 = L^\dagger(V_0 \Lambda_1 - A_1 V_0) = V_0 \big( \Pi^{\dagger\circ} \circ \big( W_0^* (V_0 \Lambda_1 - A_1 V_0) \big) \big) = V_0 \big( \Pi^{\dagger\circ} \circ \big( \Lambda_1 - W_0^* A_1 V_0 \big) \big)
    \;\Rightarrow\;
    V_1 = -V_0 \big( \Pi^{\dagger\circ} \circ \big( W_0^* A_1 V_0 \big) \big),    (15)

where we used $W_0^* V_0 = I$, the distributive property of the Hadamard product $\circ$, and the fact that $\Pi^{\dagger\circ} \circ \Lambda_1 = 0$ since $\Lambda_1$ is diagonal and $\Pi^{\dagger\circ}$ has zeros on the diagonal.

• The Sylvester operator in (13) is just $L^*$, and we can similarly derive (see Appendix A.2) the minimum-norm solution

    W_1^* = \big( \Pi^{\dagger\circ} \circ \big( W_0^* A_1 V_0 \big) \big) W_0^*.

To compare the solution formula (15) with standard expressions in the literature, we expand it as

    V_1 = -V_0 \big( \Pi^{\dagger\circ} \circ \big( W_0^* A_1 V_0 \big) \big) = -\sum_{i \neq j} \frac{w_i^* A_1 v_j}{\lambda_i - \lambda_j}\, v_i e_j^*.

Note that $V_1$ is a matrix whose $k$'th column is $v_{1k}$, obtained from $v_{1k} = V_1 e_k$

    v_{1k} = V_1 e_k = -\sum_{i \neq j} \frac{w_i^* A_1 v_j}{\lambda_i - \lambda_j}\, v_i e_j^* e_k = -\sum_{i \neq k} \frac{w_i^* A_1 v_k}{\lambda_i - \lambda_k}\, v_i = \sum_{i \neq k} \frac{w_i^* A_1 v_k}{\lambda_k - \lambda_i}\, v_i,    (16)

which is the standard expression in the literature [1].
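Continuing the sketch, formula (15) is a one-liner, and can be checked directly against the Sylvester equation (12):

    # first order eigenvectors, distinct eigenvalues (eq. (15)):
    #   V1 = -V0 ( Pi^{dagger o} o (W0^* A1 V0) ),  Pi_ij = lambda_i - lambda_j
    Pi = lam[:, None] - lam[None, :]
    Pi_pinv = hadamard_pinv(Pi)
    V1 = -V @ (Pi_pinv * (W.conj().T @ A1 @ V))

    # V1 satisfies (12):  A0 V1 - V1 Lam0 = V0 Lam1 - A1 V0
    assert np.allclose(A0 @ V1 - V1 @ Lam, V @ Lam1 - A1 @ V)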
Uniqueness of $V_1$ and $W_1$

Note that (15) is just one solution to the Sylvester equation. All other solutions can be obtained by adding arbitrary elements of the null space of $L$. In Appendix A.2, it is shown how the null spaces of $L$ and $L^*$ are characterized, and from that we can conclude that all solutions of the Sylvester equations (12) and (13) can be written as

    V_1 = -V_0 \big( \Pi^{\dagger\circ} \circ (W_0^* A_1 V_0) \big) + V_0 D_1, \qquad W_1^* = \big( \Pi^{\dagger\circ} \circ (W_0^* A_1 V_0) \big) W_0^* - D_2 W_0^*,    (17)

where $D_1$ and $D_2$ are arbitrary diagonal matrices. The normalization conditions (10) now give

    0 = W_0^* V_1 + W_1^* V_0 = -W_0^* V_0 \big( \Pi^{\dagger\circ} \circ (W_0^* A_1 V_0) \big) + \big( \Pi^{\dagger\circ} \circ (W_0^* A_1 V_0) \big) W_0^* V_0 + D_1 - D_2 = D_1 - D_2.    (18)

We now make several observations.

• In the self-adjoint case, we have $W_0 = V_0$ and $W_1 = V_1$, and the normalization conditions (10) give (recalling from the discussion after (10) that $\mathrm{dg}(V_0^* V_1) = 0$ in this case)

    0 = \mathrm{dg}(V_0^* V_1) = \mathrm{dg}\Big( V_0^* \big( -V_0 \big( \Pi^{\dagger\circ} \circ (V_0^* A_1 V_0) \big) + V_0 D_1 \big) \Big) = -\mathrm{dg}\big( \Pi^{\dagger\circ} \circ (V_0^* A_1 V_0) \big) + D_1.

Since the first term is zero ($\Pi^{\dagger\circ}$ is zero on the diagonal) and $D_1$ is diagonal, we conclude that $D_1 = 0$. Thus, the minimum norm solution (15) to the Sylvester equation does indeed satisfy the normalization conditions (10).

• In the general (non-self-adjoint) case, it seems that the following set of solutions

    V_1 = -V_0 \big( \Pi^{\dagger\circ} \circ (W_0^* A_1 V_0) \big) + V_0 D, \qquad W_1^* = \big( \Pi^{\dagger\circ} \circ (W_0^* A_1 V_0) \big) W_0^* - D W_0^*,

where $D$ is any diagonal matrix, will all satisfy the normalization condition (10) up to first order. Since $V_1$ and $W_1$ must be unique (the condition $W_\epsilon^* V_\epsilon = I$ uniquely determines the functions $V_\epsilon$ and $W_\epsilon$, and therefore uniquely determines their Taylor series expansion terms $\{V_k\}$ and $\{W_k\}$), it seems that consideration of higher order terms is necessary to find the correctly normalized solutions.

• In some references [2, Eq. 5.1.31], a non-standard normalization is used. In our notation, this normalization is stated as follows

    w_i^* v_j = \delta_{ij}, \quad w_i^* v_{\epsilon i} = 1, \qquad \text{i.e.} \qquad W_0^* V_0 = I, \quad \mathrm{dg}(W_0^* V_\epsilon) = I \;\Rightarrow\; \forall k \geq 1, \; \mathrm{dg}(W_0^* V_k) = 0.    (19)

This normalization will uniquely determine $v_{\epsilon i}$ unless it becomes orthogonal to $w_i$. For finite matrices, we can guarantee that $v_{\epsilon i}$ will not be orthogonal to $w_i$ for $\epsilon$ in some neighborhood of 0. This non-standard normalization is used because it simplifies the recursive formulae for higher order perturbation terms, as we will see later on. For now, observe that if we adopt this normalization and check the general solutions (17)

    0 = \mathrm{dg}(W_0^* V_1) = \mathrm{dg}\Big( -W_0^* V_0 \big( \Pi^{\dagger\circ} \circ (W_0^* A_1 V_0) \big) + W_0^* V_0 D \Big) = -\mathrm{dg}\big( \Pi^{\dagger\circ} \circ (W_0^* A_1 V_0) \big) + D = 0 + D,

so we can conclude that the minimum-norm solution (15) is indeed normalized with respect to the non-standard normalization (19).

Calculating First Order Eigenvector Terms $V_1$: Repeated (Degenerate) Case

We consider the case when eigenvalues of $A_0$ are repeated, but $A_0$ still has a full set of eigenvectors. This condition is automatically satisfied in the Hermitian case, but may not always hold in the non-normal case. We will not consider cases with non-trivial Jordan blocks (in $A_0$) in this note. The main difficulty when $A_0$ has repeated eigenvalues is that there is not a unique choice of the eigenvectors $V_0$, even after normalization. To any repeated eigenvalue there corresponds an $m$-dimensional invariant subspace, where $m$ is the geometric multiplicity of the eigenvalue, and any basis of this invariant subspace is composed of eigenvectors. It turns out that there is a special choice of basis which renders the expansion (3) valid. The main task is to find such a basis.

Now assume $A_0$ has a repeated eigenvalue $\bar\lambda$ (multiplicity $m$), and choose a basis so that $A_0$ is block-diagonal, with the first $m$ basis elements forming a basis for the eigensubspace of $\bar\lambda$ (so that $(A_0)_{11} = \bar\lambda I$). In this basis, $A_0$, $V_0$, $W_0$ and $\Lambda_0$ have the block partitioning

    A_0 = \begin{bmatrix} \bar\lambda I & 0 \\ 0 & (A_0)_{22} \end{bmatrix}, \quad
    V_0 = \begin{bmatrix} (V_0)_{11} & 0 \\ 0 & (V_0)_{22} \end{bmatrix}, \quad
    W_0 = \begin{bmatrix} (W_0)_{11} & 0 \\ 0 & (W_0)_{22} \end{bmatrix}, \quad
    \Lambda_0 = \begin{bmatrix} \bar\lambda I & 0 \\ 0 & (\Lambda_0)_{22} \end{bmatrix},

where $I$ is the $m \times m$ identity matrix. Now take eq. (7),

    A_0 V_1 + A_1 V_0 = V_0 \Lambda_1 + V_1 \Lambda_0.

If we partition all matrices conformably with the above partitions, we get
    \begin{bmatrix} \bar\lambda I & 0 \\ 0 & (A_0)_{22} \end{bmatrix}
    \begin{bmatrix} (V_1)_{11} & (V_1)_{12} \\ (V_1)_{21} & (V_1)_{22} \end{bmatrix}
    + \begin{bmatrix} (A_1)_{11} & (A_1)_{12} \\ (A_1)_{21} & (A_1)_{22} \end{bmatrix}
    \begin{bmatrix} (V_0)_{11} & 0 \\ 0 & (V_0)_{22} \end{bmatrix}
    =
    \begin{bmatrix} (V_0)_{11} & 0 \\ 0 & (V_0)_{22} \end{bmatrix}
    \begin{bmatrix} (\Lambda_1)_{11} & 0 \\ 0 & (\Lambda_1)_{22} \end{bmatrix}
    + \begin{bmatrix} (V_1)_{11} & (V_1)_{12} \\ (V_1)_{21} & (V_1)_{22} \end{bmatrix}
    \begin{bmatrix} \bar\lambda I & 0 \\ 0 & (\Lambda_0)_{22} \end{bmatrix}.

Taking the 11 block equation we get

    \bar\lambda I (V_1)_{11} + (A_1)_{11} (V_0)_{11} = (V_0)_{11} (\Lambda_1)_{11} + (V_1)_{11} \bar\lambda I,

but $\bar\lambda I$ commutes with any matrix, so we simply get

    (A_1)_{11} (V_0)_{11} = (V_0)_{11} (\Lambda_1)_{11},

from which we can obtain $(\Lambda_1)_{11}$ by left multiplication by $(W_0)_{11}^*$

    (\Lambda_1)_{11} = (W_0)_{11}^* (A_1)_{11} (V_0)_{11} = (W_0^* A_1 V_0)_{11},

where the last equality follows from the block-diagonal structure of $V_0$ and $W_0$. Now let's examine the meaning of this. The perturbation expansion (3) assumes that all the $\Lambda_i$'s are diagonal matrices (otherwise (3) is not an eigenvalue/eigenvector relation). This equation states that for $(\Lambda_1)_{11}$ to be diagonal as required, we must choose $(V_0)_{11}$ so that its columns are the eigenvectors of $(A_1)_{11}$. Consequently, $(\Lambda_1)_{11}$ will simply be the diagonal matrix of eigenvalues of $(A_1)_{11}$.

The meaning of the above is that only this particular choice of $V_0$ will yield the expansion (3). Unlike the non-repeated case, we also have to discover $V_0$, not just $V_1$ and $\Lambda_1$.

Now to calculate $V_1$ we can apply the pseudo-inverse formula (26) (which produced (15) earlier), but now we pay special attention to the structure of $\Pi^{\dagger\circ}$. In this case, it is

    (\Pi^{\dagger\circ})_{ij} := \begin{cases} \dfrac{1}{\lambda_i - \lambda_j}, & \lambda_i \neq \lambda_j, \\ 0, & \lambda_i = \lambda_j. \end{cases}

It is instructive to look at the structure of $\Pi^{\dagger\circ}$: the leading $m \times m$ block (corresponding to the repeated eigenvalue $\bar\lambda$) is zero, the diagonal is all zeros, and the remaining entries are $1/(\lambda_i - \lambda_j)$. The expression for $V_1$ is now the same as (15), but we must use the $V_0$ that made $\Lambda_1$ diagonal above (recall that the derivation of (15) assumed $\Lambda_1$ to be diagonal)

    V_1 = -V_0 \big( \Pi^{\dagger\circ} \circ \big( W_0^* A_1 V_0 \big) \big).

Again, to compare with existing expressions, find the $k$'th column $v_{1k} = V_1 e_k$

    v_{1k} = -\sum_{\lambda_i \neq \lambda_k} \frac{w_i^* A_1 v_k}{\lambda_i - \lambda_k}\, v_i = \sum_{\lambda_i \neq \lambda_k} \frac{w_i^* A_1 v_k}{\lambda_k - \lambda_i}\, v_i.    (20)
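A small numerical illustration of the degenerate case (our own construction, with multiplicity $m = 2$ assumed): choosing $(V_0)_{11}$ to be the eigenvectors of $(A_1)_{11}$ makes $(\Lambda_1)_{11}$ the eigenvalue matrix of $(A_1)_{11}$, and these eigenvalues should match the first order splitting of the repeated eigenvalue.

    # degenerate case: A0 has eigenvalue lam_bar repeated m times
    rng_d = np.random.default_rng(7)
    n, m = 4, 2
    lam_bar = 1.0
    A0d = np.zeros((n, n))
    A0d[:m, :m] = lam_bar * np.eye(m)        # (A0)_11 = lam_bar * I
    A0d[m:, m:] = np.diag([3.0, 5.0])        # (A0)_22 with distinct eigenvalues
    A1d = rng_d.standard_normal((n, n))

    # (Lambda1)_11 = eigenvalues of (A1)_11, with (V0)_11 its eigenvectors
    mu, V0_11 = np.linalg.eig(A1d[:m, :m])

    # eigenvalues of A0 + eps*A1 splitting from lam_bar behave like lam_bar + eps*mu
    eps = 1e-6
    lam_eps = np.linalg.eig(A0d + eps * A1d)[0]
    near = lam_eps[np.abs(lam_eps - lam_bar) < 0.1]
    split = np.sort_complex((near - lam_bar) / eps)
    assert np.allclose(split, np.sort_complex(mu.astype(complex)), atol=1e-4)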
Higher Order Terms

The following normalization will lead to rather compact recursive formulae for terms of all orders:

    w_i^* v_j = \delta_{ij}, \quad w_i^* v_{\epsilon i} = 1, \qquad W_0^* V_0 = I, \quad \mathrm{dg}(W_0^* V_\epsilon) = I \;\Rightarrow\; \forall k \geq 1, \; \mathrm{dg}(W_0^* V_k) = 0.    (21)

We find the eigenvalues first. Decompose $\Lambda_\epsilon = \Lambda_0 + \epsilon \Lambda_1 + \cdots =: \Lambda_0 + \tilde\Lambda$, and rearrange to obtain an equation for $\tilde\Lambda$:

    (A_0 + \epsilon A_1) V_\epsilon = V_\epsilon \big( \Lambda_0 + \tilde\Lambda \big)    (22)
    V_\epsilon \tilde\Lambda = -V_\epsilon \Lambda_0 + A_0 V_\epsilon + \epsilon A_1 V_\epsilon    [rearrange terms]
    W_0^* V_\epsilon \tilde\Lambda = -W_0^* V_\epsilon \Lambda_0 + \Lambda_0 W_0^* V_\epsilon + \epsilon W_0^* A_1 V_\epsilon    [left multiply by W_0^*; use W_0^* A_0 = \Lambda_0 W_0^*]
    \mathrm{dg}(W_0^* V_\epsilon) \tilde\Lambda = -\mathrm{dg}(W_0^* V_\epsilon) \Lambda_0 + \Lambda_0\, \mathrm{dg}(W_0^* V_\epsilon) + \epsilon\, \mathrm{dg}(W_0^* A_1 V_\epsilon)    [take diagonals; dg(M\Lambda) = dg(M)\Lambda for diagonal \Lambda]
    \tilde\Lambda = \epsilon\, \mathrm{dg}(W_0^* A_1 V_\epsilon)    [dg(W_0^* V_\epsilon) = I, and the first two terms cancel]
    \epsilon \Lambda_1 + \epsilon^2 \Lambda_2 + \cdots = \epsilon\, \mathrm{dg}\big( W_0^* A_1 (V_0 + \epsilon V_1 + \cdots) \big)    [expand and equate powers]
    \Rightarrow \Lambda_k = \mathrm{dg}\big( W_0^* A_1 V_{k-1} \big).    (23)

The first of these formulae is the familiar $\Lambda_1 = \mathrm{dg}(W_0^* A_1 V_0)$. Successive $\Lambda_k$'s require finding the vectors $V_{k-1}$ that are normalized according to (21) (and not the standard normalization).

To find the $V_k$'s, recall and rearrange equation (9) into a Sylvester equation for $V_k$

    A_0 V_k + A_1 V_{k-1} = V_0 \Lambda_k + V_1 \Lambda_{k-1} + \cdots + V_{k-1} \Lambda_1 + V_k \Lambda_0
    \;\Leftrightarrow\;
    A_0 V_k - V_k \Lambda_0 = V_0 \Lambda_k + V_1 \Lambda_{k-1} + \cdots + V_{k-1} \Lambda_1 - A_1 V_{k-1}.

All solutions to this Sylvester equation are given by the pseudo-inverse formula

    V_k = V_0 \Big( \Pi^{\dagger\circ} \circ W_0^* \big( V_0 \Lambda_k + V_1 \Lambda_{k-1} + \cdots + V_{k-1} \Lambda_1 - A_1 V_{k-1} \big) \Big) + V_0 D_k,

where $D_k$ is any diagonal matrix. Enforcing the normalization (21), i.e. $W_0^* V_0 = I$ and $\mathrm{dg}(W_0^* V_k) = 0$ for $k \geq 1$, gives $D_0 = I$ and $D_k = 0$ for $k \geq 1$, and we conclude
    V_k = V_0 \Big( \Pi^{\dagger\circ} \circ W_0^* \big( V_0 \Lambda_k + V_1 \Lambda_{k-1} + \cdots + V_{k-1} \Lambda_1 - A_1 V_{k-1} \big) \Big).

Note that the right hand side involves only the terms $V_0, \ldots, V_{k-1}$ and $\Lambda_1, \ldots, \Lambda_k$ (with $\Lambda_k$ already available from (23)), so we can obtain $V_k$ using previous calculations. For computations, the following equivalent form may be easier to implement

    V_k = V_0 \left( \sum_{i=0}^{k-1} \big( \Pi^{\dagger\circ} \circ (W_0^* V_i) \big) \Lambda_{k-i} - \Pi^{\dagger\circ} \circ \big( W_0^* A_1 V_{k-1} \big) \right).
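The recursion lends itself to a direct implementation. The sketch below (our own, for the distinct-eigenvalue case, reusing the helpers defined earlier) computes $\Lambda_k$ and $V_k$ up to a given order, then checks the truncated eigenvalue series against a direct computation.

    def perturbation_terms(A0, A1, K):
        """Terms Lambda_1..Lambda_K and V_1..V_K under normalization (21).
        Assumes A0 has distinct eigenvalues."""
        lam0, V0 = np.linalg.eig(A0)
        W0s = np.linalg.inv(V0)                        # W0^* = V0^{-1}
        Pi_pinv = hadamard_pinv(lam0[:, None] - lam0[None, :])
        Vs, Lams = [V0], []
        for k in range(1, K + 1):
            Lams.append(dg(W0s @ A1 @ Vs[k - 1]))      # Lambda_k = dg(W0^* A1 V_{k-1})
            S = sum((Pi_pinv * (W0s @ Vs[i])) @ Lams[k - 1 - i] for i in range(k))
            Vs.append(V0 @ (S - Pi_pinv * (W0s @ A1 @ Vs[k - 1])))
        return Lams, Vs[1:]

    # sanity check: the truncated series reproduces the eigenvalues of A_eps
    eps = 1e-3
    Lams, _ = perturbation_terms(A0, A1, 3)
    lam_series = lam + sum(eps**k * np.diag(Lams[k - 1]) for k in range(1, 4))
    lam_true = np.linalg.eig(A0 + eps * A1)[0]
    lam_true = lam_true[[int(np.argmin(np.abs(lam_true - l))) for l in lam_series]]
    assert np.allclose(lam_series, lam_true, atol=1e-8)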
Acknowledgments

The author would like to acknowledge helpful input from Maurice Filo, Stacy Patterson and Karthik Chikmagalur.

Appendix
A.1 Spectral Decomposition of the Sylvester Operator
Consider the Sylvester matrix equation (in the unknown $X$) of the form

    A X + X B = Q,

where $A$, $B$ and $Q$ are given (square) matrices. Solutions and properties of this equation are determined by the Sylvester operator $L$

    L(X) := A X + X B.
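Numerically, $L$ can be represented as an $n^2 \times n^2$ matrix acting on $\mathrm{vec}(X)$, since $\mathrm{vec}(AX + XB) = (I \otimes A + B^T \otimes I)\,\mathrm{vec}(X)$ for column-major $\mathrm{vec}$. A minimal sketch (ours, numpy assumed) that also confirms the eigenvalue characterization derived below:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 3
    A = rng.standard_normal((n, n))
    B = rng.standard_normal((n, n))
    X = rng.standard_normal((n, n))

    def sylvester_matrix(A, B):
        """Matrix of L(X) = A X + X B acting on vec(X) (column-major order)."""
        I = np.eye(A.shape[0])
        return np.kron(I, A) + np.kron(B.T, I)

    Lmat = sylvester_matrix(A, B)
    LX = (Lmat @ X.flatten(order="F")).reshape((n, n), order="F")
    assert np.allclose(LX, A @ X + X @ B)

    # eigenvalues of L are all sums lambda_i + gamma_j (shown below)
    lamA = np.linalg.eigvals(A)
    gamB = np.linalg.eigvals(B)
    sums = np.sort_complex((lamA[:, None] + gamB[None, :]).ravel())
    assert np.allclose(sums, np.sort_complex(np.linalg.eigvals(Lmat)), atol=1e-8)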
We will need to find the "eigen-matrices" of $L$ and its adjoint $L^*$. First we calculate $L^*$ using the standard inner product on matrices $\langle M, N \rangle = \mathrm{tr}(M^* N)$. Starting from the defining identity $\langle L(X), Y \rangle \equiv \langle X, L^*(Y) \rangle = \mathrm{tr}(X^* L^*(Y))$ and computing

    \langle L(X), Y \rangle = \mathrm{tr}\big( (AX + XB)^* Y \big) = \mathrm{tr}\big( X^* A^* Y + B^* X^* Y \big) = \mathrm{tr}\big( X^* (A^* Y + Y B^*) \big),

we see that

    L^*(Y) = A^* Y + Y B^*.

Thus in particular, $L$ is self-adjoint only if $A$ and $B$ are self-adjoint (a simple calculation will also show that $L$ is normal if $A$ and $B$ are normal).

Now we assume that $A$ and $B$ are diagonalizable (i.e. each has a full set of linearly independent eigenvectors). A spectral decomposition of $L$ can be obtained from the spectral decompositions of $A$ and $B$. This will then allow for applying arbitrary analytic functions to $L$, including inversion. Denote the eigenvalues and right/left eigenvectors of $A$ and $B$ as follows

    A v_i = \lambda_i v_i, \quad A^* w_i = \lambda_i^* w_i,
    B u_j = \gamma_j u_j, \quad B^* z_j = \gamma_j^* z_j,

or in matrix form

    A V = V \Lambda, \quad A^* W = W \Lambda^*, \qquad B U = U \Gamma, \quad B^* Z = Z \Gamma^*.

Note that $W = V^{-*}$ and $Z = U^{-*}$. The $n^2$ eigenvalues and "eigen-matrices" of $L$ and $L^*$ are found as follows

    L(v_i z_j^*) = A v_i z_j^* + v_i z_j^* B = \lambda_i v_i z_j^* + \gamma_j v_i z_j^* = (\lambda_i + \gamma_j)\, v_i z_j^*,
    L^*(w_i u_j^*) = A^* w_i u_j^* + w_i u_j^* B^* = \lambda_i^* w_i u_j^* + \gamma_j^* w_i u_j^* = (\lambda_i + \gamma_j)^*\, w_i u_j^*.

In other words, the eigenvalues of $L$ (resp. $L^*$) are all $n^2$ possible combinations $\lambda_i + \gamma_j$ (resp. $(\lambda_i + \gamma_j)^*$) of eigenvalues of $A$ and $B$, and the eigen-matrices are all the corresponding outer products of right/left eigenvectors of $A$ and $B$.

Using the above, a spectral decomposition of $L$ can now be written as follows

    L(X) = \sum_{ij} (\lambda_i + \gamma_j)\, \langle w_i u_j^*, X \rangle\, v_i z_j^*.

This can be rewritten in compact notation. First define the matrix $\Pi$ whose $ij$'th entry is

    (\Pi)_{ij} := \lambda_i + \gamma_j.

Noting that $v_i = V e_i$ (where $e_i$ is the vector of all zeros except a 1 in the $i$'th entry), and similarly for the other eigenvectors, calculate

    L(X) = \sum_{ij} (\lambda_i + \gamma_j)\, \langle w_i u_j^*, X \rangle\, v_i z_j^* = \sum_{ij} (\lambda_i + \gamma_j)\, (w_i^* X u_j)\, V e_i e_j^* Z^*
         = V \Big( \sum_{ij} (\lambda_i + \gamma_j)\, (w_i^* X u_j)\, e_i e_j^* \Big) Z^* = V \big( \Pi \circ (W^* X U) \big) Z^*,    (24)

where $\circ$ is the Hadamard (element-by-element) product of matrices. The last equation follows from observing that $W^* X U$ is the matrix whose $ij$'th entry is $w_i^* X u_j$. For later reference, we also calculate $L^*$

    L^*(X) = \sum_{ij} (\lambda_i + \gamma_j)^*\, \langle v_i z_j^*, X \rangle\, w_i u_j^* = \sum_{ij} (\lambda_i + \gamma_j)^*\, (v_i^* X z_j)\, W e_i e_j^* U^* = W \big( \bar\Pi \circ (V^* X Z) \big) U^*,    (25)

where $\bar\Pi$ is the complex conjugate (without transposing) of $\Pi$.

The inverse $L^{-1}$ (if it exists) and the pseudo-inverse $L^\dagger$ can be calculated from the spectral decomposition

    L^{-1}(X) = \sum_{ij} \frac{1}{\lambda_i + \gamma_j}\, \langle w_i u_j^*, X \rangle\, v_i z_j^*, \qquad L^\dagger(X) = \sum_{\lambda_i + \gamma_j \neq 0} \frac{1}{\lambda_i + \gamma_j}\, \langle w_i u_j^*, X \rangle\, v_i z_j^*.

We can also give compact formulae if we define $\Pi^{-\circ}$ and $\Pi^{\dagger\circ}$ using element-by-element operations

    (\Pi^{-\circ})_{ij} := \frac{1}{\lambda_i + \gamma_j}, \qquad (\Pi^{\dagger\circ})_{ij} := \begin{cases} \dfrac{1}{\lambda_i + \gamma_j}, & \lambda_i + \gamma_j \neq 0, \\ 0, & \lambda_i + \gamma_j = 0. \end{cases}

$L^{-1}$ and $L^\dagger$ can now be rewritten as

    L^{-1}(X) = V \big( \Pi^{-\circ} \circ (W^* X U) \big) Z^*, \qquad L^\dagger(X) = V \big( \Pi^{\dagger\circ} \circ (W^* X U) \big) Z^*.    (26)
The above can be generalized, using the spectral decomposition, to any function $f$ analytic in a neighborhood of the spectrum of $L$ by

    \big( f(L) \big)(X) = V \big( f^\circ(\Pi) \circ (W^* X U) \big) Z^*,

where $f^\circ(\Pi)$ is the element-by-element application of the function $f$ to each entry of the matrix $\Pi$. For example, given a matrix differential equation

    \dot X = A X + X B = L(X), \qquad X(0) = \bar X,

we can write the solution formally as $\big( e^{tL} \big)(\bar X)$. The formula above gives

    X(t) = \big( e^{tL} \big)(\bar X) = V \big( e^{\circ}(t\Pi) \circ (W^* \bar X U) \big) Z^*,

where $e^{\circ}(t\Pi)$ is the matrix whose $ij$'th entry is $e^{t(\lambda_i + \gamma_j)}$.
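Continuing the appendix sketch, the compact formulas (24) and (26) and the matrix exponential formula are straightforward to verify numerically. This is our own check; the inversion step assumes $\lambda_i + \gamma_j \neq 0$ for all $i, j$, and `scipy` is assumed for the reference matrix exponential.

    from scipy.linalg import expm

    lamA, Va = np.linalg.eig(A)
    gamB, Ub = np.linalg.eig(B)
    Wa = np.linalg.inv(Va).conj().T      # W = V^{-*}
    Zb = np.linalg.inv(Ub).conj().T      # Z = U^{-*}
    Pi = lamA[:, None] + gamB[None, :]   # Pi_ij = lambda_i + gamma_j

    # (24): L(X) = V ( Pi o (W^* X U) ) Z^*
    LX = Va @ (Pi * (Wa.conj().T @ X @ Ub)) @ Zb.conj().T
    assert np.allclose(LX, A @ X + X @ B)

    # (26): if no lambda_i + gamma_j vanishes, L^{-1}(Q) solves A X + X B = Q
    Q = rng.standard_normal((n, n))
    Xsol = Va @ ((1.0 / Pi) * (Wa.conj().T @ Q @ Ub)) @ Zb.conj().T
    assert np.allclose(A @ Xsol + Xsol @ B, Q)

    # matrix ODE: X(t) = V ( exp(t*Pi) o (W^* X(0) U) ) Z^*
    t = 0.7
    Xt = Va @ (np.exp(t * Pi) * (Wa.conj().T @ X @ Ub)) @ Zb.conj().T
    Xt_ref = (expm(t * Lmat) @ X.flatten(order="F")).reshape((n, n), order="F")
    assert np.allclose(Xt, Xt_ref)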
A.2 Solvability of the Sylvester Equations (12) and (13)

We begin with the equation (12) for $V_1$.

• We need to characterize $N(L)$, the null space of $L$. Using (14) we note that $L(X) = 0$ means

    \forall i, j: \; (\lambda_i - \lambda_j)(w_i^* X e_j) = 0 \;\Leftrightarrow\; \forall i \neq j: \; w_i^* X e_j = 0 \;\Leftrightarrow\; W_0^* X = D,

where $D$ is some diagonal matrix (the diagonal entries of $D$ are the numbers $w_i^* X e_i$, which cannot be determined from the above condition). Finally we note that $W_0^*$ and $V_0$ are inverses of each other, so $W_0^* X = D$ is equivalent to $X = V_0 D$, and we conclude

    X \in N(L) \;\Leftrightarrow\; X = V_0 D, \quad D \text{ any diagonal matrix.}
• Similarly for $L^*(X) = A_0^* X - X \Lambda_0^*$. Its spectral decomposition is

    L^*(X) = \sum_{ij} (\lambda_i - \lambda_j)^*\, (v_i^* X e_j)\, w_i e_j^*,

and again $L^*(X) = 0$ iff $v_i^* X e_j = 0$ for all $i \neq j$. Repeating the above argument, we conclude

    X \in N(L^*) \;\Leftrightarrow\; X = W_0 D, \quad D \text{ any diagonal matrix.}

• The Sylvester equation (12) is solvable iff $(V_0 \Lambda_1 - A_1 V_0) \in R(L) = N(L^*)^\perp$. Thus to verify solvability, we need to show that $V_0 \Lambda_1 - A_1 V_0$ is orthogonal to $N(L^*)$. Indeed, for $X \in N(L^*)$, i.e. $X = W_0 D$,

    \langle X, V_0 \Lambda_1 - A_1 V_0 \rangle = \langle W_0 D, V_0 \Lambda_1 - A_1 V_0 \rangle = \mathrm{tr}\big( D^* W_0^* V_0 \Lambda_1 \big) - \mathrm{tr}\big( D^* W_0^* A_1 V_0 \big)
      = \mathrm{tr}\big( D^* \Lambda_1 \big) - \mathrm{tr}\big( D^* W_0^* A_1 V_0 \big)
      = \mathrm{tr}\big( D^*\, \mathrm{dg}(W_0^* A_1 V_0) \big) - \mathrm{tr}\big( D^* W_0^* A_1 V_0 \big)
      = \mathrm{tr}\big( D^* W_0^* A_1 V_0 \big) - \mathrm{tr}\big( D^* W_0^* A_1 V_0 \big) = 0,

where we used $\Lambda_1 = \mathrm{dg}(W_0^* A_1 V_0)$, and the fact that $\mathrm{tr}(D^* M) = \mathrm{tr}(D^*\, \mathrm{dg}(M))$ for diagonal $D$.

For equation (13), note that by taking its adjoint we get a Sylvester equation of a form similar to (12)

    A_0^* W_1 - W_1 \Lambda_0^* = W_0 \Lambda_1^* - A_1^* W_0.

This equation is $L^*(W_1) = W_0 \Lambda_1^* - A_1^* W_0$, where $L^*$ is the adjoint of the Sylvester operator of equation (12), which we have already analyzed. In particular, the spectral decomposition of $L^*$ can be calculated from (25)

    L^*(Y) = W_0 \big( \bar\Pi \circ (V_0^* Y) \big).

Applying the pseudo-inverse to the right hand side,

    W_1 = L^{*\dagger}\big( W_0 \Lambda_1^* - A_1^* W_0 \big) = W_0 \Big( \bar\Pi^{\dagger\circ} \circ \big( V_0^* ( W_0 \Lambda_1^* - A_1^* W_0 ) \big) \Big) = W_0 \Big( \bar\Pi^{\dagger\circ} \circ \big( \Lambda_1^* - V_0^* A_1^* W_0 \big) \Big) = -W_0 \Big( \bar\Pi^{\dagger\circ} \circ \big( V_0^* A_1^* W_0 \big) \Big),

where the last equality follows from $\bar\Pi^{\dagger\circ}$ having all zeros on the diagonal. We can rewrite the solution for $W_1^*$ by noting that $\bar\Pi^* = \bar\Pi^T = -\Pi$ and similarly $\big( \bar\Pi^{\dagger\circ} \big)^* = -\Pi^{\dagger\circ}$:

    W_1^* = -\Big( W_0 \big( \bar\Pi^{\dagger\circ} \circ ( V_0^* A_1^* W_0 ) \big) \Big)^* = -\Big( \big( \bar\Pi^{\dagger\circ} \big)^* \circ \big( V_0^* A_1^* W_0 \big)^* \Big) W_0^* = \big( \Pi^{\dagger\circ} \circ \big( W_0^* A_1 V_0 \big) \big) W_0^*.

A.3 Background: Spectral Decomposition of a Matrix