Further Results on the Convergence of the Pavon-Ferrante Algorithm for Spectral Estimation
aa r X i v : . [ m a t h . O C ] J a n Further Results on the Convergenceof the Pavon–Ferrante Algorithmfor Spectral Estimation
Giacomo Baggio
Abstract
In this paper, we provide a detailed analysis of the global convergence properties of an extensivelystudied and extremely effective fixed-point algorithm for the Kullback–Leibler approximation of spectraldensities, proposed by Pavon and Ferrante in [1]. Our main result states that the algorithm globallyconverges to one of its fixed points.
Index Terms
Approximation of spectral densities, spectral estimation, generalized moment problems, Kullback–Leibler divergence, fixed-point iteration, convergence analysis.
I. I
NTRODUCTION
In recent years, the problem of approximating—in an optimal sense—a spectral density withanother one satisfying some given constraints has received considerable attention in the controland signal processing community. In [2], Georgiou and Lindquist considered the followingformulation of the above-mentioned approximation problem: Find the optimal Kullback–Leiblerapproximation of a spectral density given(i) an a-priori estimate of the spectrum describing a zero-mean second-order stationary process,and
G. Baggio is with the Dipartimento di Ingegneria dell’Informazione, Universit`a di Padova, via Gradenigo, 6/B I-35131 Padova,Italy. E-mail: [email protected] . Thursday 30 th August, 2018 DRAFT (ii) asymptotic state-covariance data that are typically inconsistent with the a-priori estimate.These data are obtained by feeding the above process to a measurement device consistingof a bank of rational filters.According to this formulation, the approximation problem turns into a (convex) optimizationproblem with integral constraints which falls into the celebrated category of (generalized) momentproblems . During the past century, the latter class of problems has played a crucial role in manyares of the engineering and mathematical sciences, see e.g. [3], [4] and references therein.In particular, some noteworthy (generalized) moment problems, which are close relatives of theGeorgiou–Lindquist approximation problem, are the covariance extension problem [5, Ch. 12.5],[6], [7], [8],
THREE-like spectral estimation [9], [10], [11], [12], the classical
Nevanlinna–Pickinterpolation problem [13], [14], [15], and its generalization, the augmented Basic InterpolationProblem (aBIP) [16], [17]. Moreover, variations of the latter classes of problems have generateda huge stream of literature in recent years, see for instance [18], [19], [20], [21], [22], [23], [24],[25], [26]. Among the large number of applications emerging from the class of (generalized)moment problems it is worth mentioning, besides spectral estimation, those related to systemmodeling/identification and optimal H ∞ control [27], [28], [29].In [2], the optimization problem was approached by resorting to the dual problem whichis finite-dimensional but, in general, it does not admit a closed-form solution. Also, standardgradient-based minimization techniques for the numerical solution of the dual problem haveproved to be computationally demanding and severely ill-conditioned. In order to tackle theseissues, in [1], Pavon and Ferrante proposed an alternative iterative method for the solution ofthe Georgiou–Lindquist approximation problem. The Pavon–Ferrante algorithm is a surprisinglysimple and efficient nonlinear fixed-point iteration in the set of positive semi-definite unittrace matrices. Furthermore, the algorithm exhibits very attractive and robust properties froma numerical viewpoint, since it can be implemented via the solution of an algebraic Riccatiequation and a Lyapunov equation [30]. On the other hand, despite the huge amount of numericalevidences, proving the convergence of the latter algorithm to a prescribed set of fixed points—which provide the solution of the approximation problem—has revealed to be an highly non-trivial challenge [31], [30]. In particular, in [30] it has been shown that the Pavon–Ferrantealgorithm is locally convergent to the aforesaid set of fixed points, through a rather tortuousyet enlightening proof. Nonetheless, a proof of global convergence of the algorithm, thoughconjectured and supported by a large number of numerical simulations, has so far been elusive. Thursday 30 th August, 2018 DRAFT
The present paper addresses this problem. Specifically, we consider a cost functional arisingfrom the formulation of the dual problem and we show that the latter is decreasing along thetrajectories generated by the Pavon–Ferrante algorithm. This provides an answer to a conjectureraised in [30, Sec. V] and leads to the main contribution of the paper, namely, a proof of global convergence of the Pavon–Ferrante algorithm towards its set of fixed points.
Paper structure.
The paper is organized as follows. In Section II, we introduce the Georgiou–Lindquist spectral approximation problem and its solution via the Pavon–Ferrante algorithm.In Section III, we estabilish some auxiliary results. Section IV contains the proof of globalconvergence to the set of fixed points of the Pavon–Ferrante algorithm. Finally, Section V collectssome concluding remarks.
Notation.
We let Z , C , R , R > , R ≥ , and T denote the set of integer, complex, real, positivereal, non-negative real numbers, and the unit circle in the complex plane, respectively. We denoteby C n × m the set of n × m matrices with complex entries. Given A ∈ C n × m , A ∗ will denote theHermitian transpose of A . We write A ≥ ( A > ) to mean that A = A ∗ ∈ C n × n is positivesemi-definite (positive definite, respectively). For A ≥ , A / will denote the principal matrixsquare root of A , i.e., the unique positive semi-definite Hermitian matrix whose square is A . Weendow the space C n with the standard inner product h x, y i = x ∗ y and norm k x k := h x, x i , for x, y ∈ C n , and the space of Hermitian matrices of dimension n × n , denoted by H n , with thetrace inner product h X, Y i := tr( XY ∗ ) and Frobenius norm k X k := tr( XX ∗ ) , for X, Y ∈ H n ,where tr( · ) denotes the trace operator. Moreover, we denote by S n the convex set of n × n positive semi-definite unit trace matrices. For a matrix-valued function G ( z ) in the complexvariable z , G ∗ ( z ) will denote the analytic continuation of the function that for z ∈ T equalsthe Hermitian transpose of G ( z ) . Finally, C ( T ) will denote the set of continuous functions on T , and C + ( T ) the set of continuous functions on T which take (strictly) positive values on thesame region, i.e., the space of continuous coercive discrete-time spectral density functions.II. P ROBLEM STATEMENT
In this section, we first review the spectral density approximation problem treated in [2]. Then,we discuss its solution via the algorithm proposed in [1].
Thursday 30 th August, 2018 DRAFT
A. The Georgiou–Lindquist approximation problem
Let { y ( t ) , t ∈ Z } be a zero-mean purely nondeterministic second-order stationary processand assume that an a-priori estimate Ψ( e jθ ) ∈ C + ( T ) of the spectral density of y ( t ) is given.Consider the rational matrix transfer function G ( z ) = ( zI − A ) − B, A ∈ C n × n , B ∈ C n × , of the discrete-time system x ( t + 1) = Ax ( t ) + By ( t ) , t ∈ Z , where A is Schur stable, i.e., all the eigenvalues of A are strictly inside T , and the pair ( A, B ) isreachable. Note that the n -dimensional process x ( t ) := [ x ( t ) , x ( t ) , . . . , x n ( t )] ⊤ coincides withthe output of a bank of n filters G ( z ) := [ g ( z ) , g ( z ) , . . . , g n ( z )] ⊤ fed by y ( t ) .Suppose we know the steady-state covariance of the process x ( t ) , which we denote by Σ > . Given the estimate Ψ( e jθ ) and the steady-state covariance Σ , the task is to estimate the spectraldensity of the process y ( t ) . To this end, we need to find the spectral density ˆΦ( e jθ ) ∈ C + ( T ) which is the “closest possible”, in a suitable sense, to the a-priori estimate Ψ among all spectra Φ( e jθ ) ∈ C + ( T ) satisfying the constraint Z π − π G ( e jθ )Φ( e jθ ) G ∗ ( e jθ ) d θ π = Z G θ Φ θ G ∗ θ = Σ . (In order to lighten the notation, throughout the paper we let G θ := G ( e jθ ) , Φ θ := Φ( e jθ ) , Ψ θ := Ψ( e jθ ) , and for integrals we use the above shorthand, where the integration takes place onthe unit circle w.r.t. the normalized Lebesgue measure.) By using the Kullback–Leibler divergence[32] as “measure of closeness” between spectral densities, namely, S (Φ θ k Ψ θ ) := Z Ψ θ log Ψ θ Φ θ , the problem can be formally stated as follows. Problem 1 (Georgiou–Lindquist approximation problem [2]) . Let Ψ θ ∈ C + ( T ) and Σ ∈ H n , Σ > . Find ˆΦ θ ∈ C + ( T ) that solves min Φ θ ∈K S (Φ θ k Ψ θ ) , On the problem of estimating covariance matrices from measurements obtained by linear filtering see also [22], [23].
Thursday 30 th August, 2018 DRAFT where K := (cid:26) Φ θ ∈ C + ( T ) : Z G θ Φ θ G ∗ θ = Σ (cid:27) . (1)The variational analysis outlined in [2] (see also [1], [30] where some additional details arespelled out and [10], [33] for the existence part) leads to the following result. Theorem 1.
The set K as defined in (1) is non-empty if and only if there exists H ∈ C × n s.t. Σ − A Σ A ∗ = BH + H ∗ B ∗ . (2)Moreover, assuming that the above condition is fulfilled, there exists ˆΛ ∈ H n such that G ∗ θ ˆΛ G θ > , ∀ θ ∈ [ − π, π ) , (3) Z G θ Ψ θ G ∗ θ ˆΛ G θ G ∗ θ = Σ . (4)For any such ˆΛ , ˆΦ θ := Ψ θ G ∗ θ ˆΛ G θ (5)is the unique solution of Problem 1.Supposing the feasibility condition (2) satisfied, in view of the above theorem, Problem 1 canbe reduced to the problem of finding Λ ∈ H n satisfying conditions (3)-(4).In [2], Georgiou and Lindquist exploited duality theory to arrive at the equivalent convexoptimization problem min Λ ∈L J (Λ) , (6)where J : L → R Λ tr(ΛΣ) − Z Ψ θ log G ∗ θ Λ G θ , (7)and L := { Λ ∈ H n : G ∗ θ Λ G θ > , ∀ θ ∈ [ − π, π ) } . Thursday 30 th August, 2018 DRAFT
In [2, Thm. 5] it has been established that the above dual problem admits a unique solution ˆΛ on L (Γ) := L ∩
Range Γ , where Γ is the linear operator defined by Γ : C ( T ) → H n Φ θ Z G θ Φ θ G ∗ θ , (8)and such a solution satisfies (4), so that ˆΛ returns the optimal estimate ˆΦ θ via (5). After asuitable parametrization of L (Γ) , problem (6) can be numerically solved using Newton-likeminimization methods [2, Sec. VII]. However, as mentioned in the introduction, these techniquesare affected by several numerical issues related to unboundedness of the gradient of J ( · ) aroundthe neighborhood of the boundary and high computational burden due to a large number ofbackstepping iterations [2, Sec. VII], [1], [31], [30]. B. The Pavon–Ferrante algorithm
An alternative, numerically robust algorithm for the solution of Problem 1 has been proposedby Pavon and Ferrante in [1] and further investigated in [31], [30]. Before presenting thealgorithm, we introduce some simplifications in the formulation of Problem 1; namely, wesuppose(i) the a-priori estimate Ψ θ to be such that R Ψ θ = 1 , and(ii) the steady-state covariance Σ to be normalized to identity, i.e., Σ = I .These assumptions can be made without any loss of generality, as explained in [30, Remark2.3]. Furthermore, notice that, with these simplifications in place, the cost functional J ( · ) in (7)becomes J (Λ) = tr(Λ) − Z Ψ θ log G ∗ θ Λ G θ . (9)The Pavon–Ferrante algorithm is a fixed-point iteration of the form Λ k +1 = Θ(Λ k ) := Z Λ / k G θ (cid:18) Ψ θ G ∗ θ Λ k G θ (cid:19) G ∗ θ Λ / k , (10) It is worth observing however that Problem (6) has in general infinitely many solutions on L . Thursday 30 th August, 2018 DRAFT for k ∈ Z , k ≥ , where the initialization is taken to be a positive definite trace-one matrix Λ > . Iteration (10) features several remarkable properties. Firstly, it preserves unit trace andpositivity. Furthermore, by introducing the sets M := { Λ ∈ S n : G ∗ θ Λ G θ > , ∀ θ ∈ [ − π, π ) } , (11) M + := { Λ ∈ M : Λ > } ⊂ M , (12)it has been shown in [1, Thm. 4.1] that Θ( · ) maps elements of M ( M + ) into elements of M ( M + , respectively).Secondly, and most importantly, if iteration (10) converges to a positive definite fixed pointof Θ( · ) , say ˆΛ > , then G ∗ θ ˆΛ G θ > , ∀ θ ∈ [ − π, π ) , and, by multiplying Eq. (10) on both sidesby ˆΛ − / , Z G θ Ψ θ G ∗ θ ˆΛ G θ G ∗ θ = I, so that conditions (3)-(4) are satisfied. As a consequence, if the feasibility condition in (2) issatisfied by Σ = I , such a ˆΛ yields the solution of Problem 1 via (5). Importantly, such afixed point always exists. In fact, let S denote the space of Λ ∈ H n satisfying (3)-(4), in [30,Thm. 3.2] it has been shown that the set of positive definite fixed points of iteration (10) definesa non-empty open convex set P of the space S . To conclude, we remark that the positive definiteones are not the only fixed points of iteration (10) which provide the solution to Problem 1.Indeed, in the closure of P there exist singular elements which still satisfy (3)-(4) and, thus,solve Problem 1 via (5), see [30, Sec. II-B]. In general, however, singular fixed points are notguaranteed to satisfy conditions (3)-(4) and, therefore, to solve Problem 1 via (5).III. P RELIMINARY RESULTS
In this section, we collect some auxiliary results which will be used in the proof of the maintheorems presented in the next section. The first result is a consequence of Jensen’s inequality[32, Thm. 2.6.2].
Lemma 1.
Let X ⊆ R . Consider an integrable function f : X → R > and an integrable function w : X → R > satisfying R X w ( x ) d x = 1 , then log Z X w ( x ) f ( x ) d x ≥ Z X w ( x ) log f ( x ) d x, and the equality is attained if and only if f ( x ) is constant a.e. on X . Thursday 30 th August, 2018 DRAFT
Another ancillary lemma is stated and proved below.
Lemma 2.
Let X ⊆ R . Let w : X → R and f : X × X → R ≥ be integrable functions with f ( · , · ) symmetric, i.e. f ( x, y ) = f ( y, x ) for all x, y ∈ X . Then it holds Z X Z X w ( x ) f ( x, y ) d x d y ≥ Z X Z X w ( x ) w ( y ) f ( x, y ) d x d y. (13) Proof.
Thanks to the symmetry of f ( · , · ) , we have Z X Z X w ( x ) f ( x, y ) d x d y = Z X Z X (cid:18) w ( x ) + w ( y ) (cid:19) f ( x, y ) d x d y. Now, since w ( x ) + w ( y ) ≥ w ( x ) w ( y ) , due to the fact that ( w ( x ) − w ( y )) ≥ , for all x, y ∈ X , the claim follows.Next we focus the attention on the function J ( · ) , as defined in Eq. (9). This function will playa key role in the convergence analysis presented in Section IV. Lemma 3. J ( · ) is a continuous and bounded function on S n . Proof.
We first note that, in view of the stability of A and reachability of the pair ( A, B ) ,for all Λ ∈ S n , G ∗ θ Λ G θ is a nonzero rational spectral density analytic on (an open annuluscontaining) T . This in turn implies that log G ∗ θ Λ G θ is integrable on T , see e.g. [34, p. 64].Since Ψ θ is bounded on T , Ψ θ log G ∗ θ Λ G θ is again integrable on T . This in turn implies that J ( · ) is bounded on S n . Now let ¯Λ ∈ S n and consider any sequence { Λ k } k ≥ , in S n such that lim k →∞ Λ k = ¯Λ . The corresponding sequence { G ∗ θ Λ k G θ } k ≥ is composed of nonzero rationalspectral densities analytic on T and such that lim k →∞ G ∗ θ Λ k G θ = G ∗ θ ¯Λ G θ uniformly on T , wherethe limit G ∗ θ ¯Λ G θ is a spectral density as before. Hence, from [35, Cor. 4.6], it follows that thesequence { log G ∗ θ Λ k G θ } k ≥ is uniformly integrable on T . Eventually, since Ψ θ is bounded on T , { Ψ θ log G ∗ θ Λ k G θ } k ≥ is again uniformly integrable, so that by Vitali’s convergence theorem[36, p. 133] it holds lim k →∞ J (Λ k ) = 1 − lim k →∞ Z Ψ θ log G ∗ θ Λ k G θ = 1 − Z lim k →∞ Ψ θ log G ∗ θ Λ k G θ = 1 − Z Ψ θ log G ∗ θ ¯Λ G θ = J ( ¯Λ) , which proves continuity. Thursday 30 th August, 2018 DRAFT
Consider the orthogonal complement (w.r.t. the trace inner product in H n ) of Range Γ , wherethe linear operator Γ has been defined in (8). This quantity has been shown in [18, Sec. IV-A]to be given by (Range Γ) ⊥ = { X ∈ H n : G ∗ θ XG θ = 0 , ∀ θ ∈ [ − π, π ) } . (14)Given Λ ∈ S n and any non-zero matrix Λ ⊥ ∈ (Range Γ) ⊥ such that Λ + Λ ⊥ ≥ , we observethat J (Λ) = J (Λ + Λ ⊥ ) and Λ + Λ ⊥ ∈ S n , since every element in (Range Γ) ⊥ is traceless [30,Sec. II].At this point, we analyze the behavior of J ( · ) in the region of the boundary of S n defined by N := { Λ ∈ S n : ∃ ¯ θ ∈ [ − π, π ) s.t. G ∗ ¯ θ Λ G ¯ θ = 0 } . (15)The following lemma provides a useful result in this regard. Lemma 4.
Suppose Ψ θ ∈ C + ( T ) . For all ¯Λ ∈ N , the (right-sided) directional derivative of J ( · ) at ¯Λ along any direction ∆ ¯Λ ∈ H n such that ¯Λ + ∆ ¯Λ ∈ M takes the value −∞ . Proof.
Let ¯Λ ∈ N . First, we note that ¯Λ + ε ∆ ¯Λ ∈ M for all ∆ ¯Λ ∈ H n such that ¯Λ + ∆ ¯Λ ∈ M ,and for all ε ∈ (0 , . The (right-sided) directional derivative of J ( · ) at ¯Λ in the direction ∆ ¯Λ isgiven by ∇ J ( ¯Λ; ¯Λ + ∆ ¯Λ) := lim ε → + J ( ¯Λ + ε ∆ ¯Λ) − J ( ¯Λ) ε = lim ε → + (cid:18) ε tr( ¯Λ + ε ∆ ¯Λ) − ε tr( ¯Λ) − ε Z Ψ θ log G ∗ θ ( ¯Λ + ε ∆ ¯Λ) G θ G ∗ θ ¯Λ G θ (cid:19) = − lim ε → + ε Z Ψ θ log (cid:18) ε G ∗ θ ∆ ¯Λ G θ G ∗ θ ¯Λ G θ (cid:19) = − Z Ψ θ G ∗ θ ∆ ¯Λ G θ G ∗ θ ¯Λ G θ , where we exploited the fact that tr(∆ ¯Λ) = 0 , and, in the last step, the Taylor expansion of log(1 + x ) . Eventually, since (i) ¯Λ ∈ N , and (ii) ∆ ¯Λ is such that ¯Λ + ∆ ¯Λ ∈ M , there exists (atleast) a frequency ¯ θ ∈ [ − π, π ) such that G ∗ ¯ θ ¯Λ G ¯ θ = 0 and G ∗ ¯ θ ∆ ¯Λ G ¯ θ = 0 . Since Ψ θ > for all θ ∈ [ − π, π ) , this in turn implies R Ψ θ G ∗ θ ∆¯Λ G θ G ∗ θ ¯Λ G θ = ∞ , which yields the thesis. Remark 1.
Lemma 4 provides a characterization of the elements of N in terms of directionalderivatives of J ( · ) along directions belonging to the subset of S n given by M . Notice that thisresult typically does not characterize all directional derivatives of J ( · ) evaluated at elements in Thursday 30 th August, 2018 DRAFT0 N along directions pointing to S n . However, for a particular subset of N this is indeed the case.Let x ∈ C n , k x k = 1 , and let P x := xx ∗ denote the orthogonal projection onto the subspacespanned by x . Consider the following subset of NN := { P ¯ x ∈ S n : ∃ ¯ θ ∈ [ − π, π ) , ¯ x ∈ C n , k ¯ x k = 1 , s.t.(i) h ¯ x, G ¯ θ i = 0 , and(ii) h x, G ¯ θ i 6 = 0 , ∀ x ∈ C n , k x k = 1 , x = ¯ x } . (16)By exploiting the same argument of the proof of Lemma 4, it follows that the directionalderivative of J ( · ) evaluated at ¯Λ ∈ N along any direction ∆ ¯Λ ∈ H n \{ } such that ¯Λ+∆ ¯Λ ∈ S n is unbounded below. Moreover, it is worth noticing that, for the particular case n = 2 , it holds N ≡ N . ♦ To conclude this section, we recall the discrete-time version of LaSalle’s invariance principle,whose proof can be found in [37, Prop. 2.6].
Proposition 1 (Discrete-time LaSalle’s invariance principle) . Consider a discrete-time system x ( t + 1) = f ( x ( t )) , x (0) ∈ X , t ≥ , where f : X → X is continuous and X is an invariant and compact set. Suppose V ( · ) is acontinuous function of x ∈ X , bounded below and satisfying ∆ V ( x ) := V ( f ( x )) − V ( x ) ≤ , ∀ x ∈ X , that is V ( x ) is non-increasing along (forward) trajectories of the dynamics. Then any trajectoryconverges to the largest invariant subset I contained in E := { x ∈ X : ∆ V ( x ) = 0 } .IV. G LOBAL CONVERGENCE ANALYSIS
In this section, we present the main results of this note. The first result (Theorem 2) states thatthe cost function (9) is always non-increasing along the trajectories of (10). This result providesa positive answer to a conjecture raised in the conclusive part of [30].
Theorem 2.
For every Λ ∈ S n it holds ∆ J (Λ) := J (Θ(Λ)) − J (Λ) ≤ , (17) Thursday 30 th August, 2018 DRAFT1 where J ( · ) has been defined in (9). Moreover ∆ J (Λ) = 0 if and only if Θ(Λ) = Λ + Λ ⊥ , with Λ ⊥ ∈ (Range Γ) ⊥ , where Γ is the linear operator defined in Eq. (8). Proof.
By plugging the expression of Θ( · ) into (17), we get ∆ J (Λ) = J (Θ(Λ)) − J (Λ) ≤ ⇔ Z Ψ θ log G ∗ θ Θ(Λ) G θ − Z Ψ θ log G ∗ θ Λ G θ ≥ ⇔ Z Ψ θ log f θ ≥ , (18)where f θ := G ∗ θ Θ(Λ) G θ G ∗ θ Λ G θ is well-defined and strictly positive on T , since Θ(Λ) has the same rankand kernel of Λ ∈ S n , cf. [30, Prop. 2.1].Firstly, we notice that the following inequality holds Z Ψ θ log f θ = − Z Ψ θ log f − / θ ≥ − Z Ψ θ f − / θ , (19)which is a consequence of Lemma 1.Secondly, by defining Π θ := Λ / G θ G ∗ θ Λ / G ∗ θ Λ G θ , we have the following chain of equations Z Ψ θ f − θ f θ = Z Z f − θ Ψ θ Ψ ω h Π θ , Π ω i (20) ≥ Z Z f − / θ f − / ω Ψ θ Ψ ω h Π θ , Π ω i (21) = (cid:28)Z Ψ θ f − / θ Π θ , Z Ψ ω f − / ω Π ω (cid:29) = k Λ / k (cid:13)(cid:13)(cid:13)(cid:13)Z Ψ θ f − / θ Π θ (cid:13)(cid:13)(cid:13)(cid:13) (22) ≥ (cid:12)(cid:12)(cid:12)(cid:12)(cid:28) Λ / , Z Ψ θ f − / θ Π θ (cid:29)(cid:12)(cid:12)(cid:12)(cid:12) (23) = (cid:18)Z Ψ θ f − / θ (cid:19) (24)where • Eq. (20) follows by noticing that f θ = R Ψ ω h Π θ , Π ω i , • Eq. (21) uses Lemma 2 applied to the symmetric function Ψ θ Ψ ω h Π θ , Π ω i , • Eq. (22) exploits the fact that k Λ / k = tr(Λ) = 1 , • Eq. (23) follows from Cauchy–Schwarz inequality, and • Eq. (24) uses the fact that h Λ / , Π θ i = 1 . Thursday 30 th August, 2018 DRAFT2
Eventually, a combination of the two sets of inequalities (19) and (20)-(24) yields R Ψ θ log f θ ≥ which, in turn, implies ∆ J (Λ) ≤ , in view of equivalence (18).Now it remains to prove that we attain equality in (17) if only if Λ ∈ S n is such that Θ(Λ) = Λ + Λ ⊥ with Λ ⊥ ∈ (Range Γ) ⊥ . In view of the definition of (Range Γ) ⊥ given inEq. (14), the “if” part becomes straightforward. Indeed, if Θ(Λ) = Λ + Λ ⊥ , we have ∆ J (Λ) = Z Ψ θ log G ∗ θ Θ(Λ) G θ G ∗ θ Λ G θ = Z Ψ θ log G ∗ θ (Λ + Λ ⊥ ) G θ G ∗ θ Λ G θ = Z Ψ θ log G ∗ θ Λ G θ G ∗ θ Λ G θ = 0 . So it remains to prove the “only if” part, i.e., if equality in (17) is attained for Λ then Θ(Λ) =Λ + Λ ⊥ with Λ ⊥ ∈ (Range Γ) ⊥ . To this end, we notice that a necessary condition for (17) tohold with equality is to have (19) satisfied with equality. By Lemma 1, this implies that thefunction f θ is constant for every θ ∈ [ − π, π ) , namely f θ = G ∗ θ Θ(Λ) G θ G ∗ θ Λ G θ = κ, ∀ θ ∈ [ − π, π ) , where κ > is a real constant. Now equality in (17) is attained (if and) only if κ = 1 , andtherefore we have that G ∗ θ Θ(Λ) G θ = G ∗ θ Λ G θ , ∀ θ ∈ [ − π, π ) . From the latter equation and by definition of (Range Γ) ⊥ in (14), it follows that Θ(Λ) = Λ + Λ ⊥ , Λ ⊥ ∈ (Range Γ) ⊥ . This completes the proof.The following theorem is based on the previous one and states that iteration (10) alwaysconverges to the set of fixed points of Θ( · ) . Theorem 3.
The trajectories generated by iteration (10) converge for all Λ ∈ S n to elementsbelonging to F := { Λ ∈ S n : Θ(Λ) = Λ } . Proof.
The proof consists of an application of the discrete-time version of LaSalle’s invarianceprinciple (Proposition 1). The natural candidate Lyapunov function V ( · ) of Proposition 1 isgiven in this case by J (Λ) which is continuous and bounded for every Λ ∈ S n (Lemma 3), and,by virtue of Theorem 2, non-increasing along the (forward) trajectories of the dynamics (10).Hence, by LaSalle’s invariance principle, we have that the (forward) trajectories generated byiteration (10) converges to the largest invariant set I contained in E := { Λ ∈ S n : ∆ J (Λ) = 0 } . Thursday 30 th August, 2018 DRAFT3
Therefore, it remains to show that the trajectories in I consist of fixed points of Θ( · ) only, thatis, I ≡ F . To this end, by Theorem 2, we know that the elements Λ ∈ E satisfy the condition Θ(Λ) = Λ + Λ ⊥ , (25)with Λ ⊥ ∈ (Range Γ) ⊥ . In view of the latter constraint on the dynamics (10) and the definitionof (Range Γ) ⊥ in (14), it follows that any trajectory belonging to E must obey to the recurrencerelation Λ k +1 = Λ / k M Λ / k , Λ ∈ E , k ≥ , (26)where M := Z π − π Ψ θ G θ G ∗ θ G ∗ θ Λ G θ , depends on the initial condition Λ only, in view of Eq. (25). Now, since (26) must generateunit trace trajectories starting from any Λ ∈ S n , we have tr(Λ ) = tr(Λ ) = tr(Λ ) = 1 . Byexploiting the cyclic property and the linearity of the trace, this in turn implies that tr(Λ ) − ) + tr(Λ ) = tr(Λ ) − / M Λ / ) + tr( M Λ / M Λ / )= tr(Λ ) − tr(Λ / M Λ / ) − tr(Λ / M Λ / ) + tr(Λ / M Λ / M Λ / )= tr (cid:16) Λ / ( I − M )Λ / (cid:17) = (cid:13)(cid:13)(cid:13) Λ / ( I − M )Λ / (cid:13)(cid:13)(cid:13) = 0 . The previous equation is satisfied if and only if Λ / ( I − M )Λ / = 0 , or, equivalently, if andonly if Λ = Λ / M Λ / . From the previous equation and Eq. (26) it readily follows that Λ must be a fixed point of Θ( · ) .This ends the proof.As a final result, we characterize a whole family of fixed points of Θ( · ) that are not asymptot-ically stable. The following result provides a partial answer to another conjecture of [30, Sec. V]claiming that orthogonal rank-one projections which do not belong to the closure of the set ofpositive definite fixed points P are unstable equilibrium points of Θ( · ) . Proposition 2.
The set N defined in Eq. (16) consists of fixed points that are not asymptoticallystable for the dynamics (10). Proof.
Let ¯Λ ∈ N . Notice that ¯Λ is a rank-one orthogonal projection so that ¯Λ is a fixed point of Θ( · ) by [1, Prop. 4.3]. Now observe that, in view of Lemma 4 and Remark 1, all the (right-sided) Thursday 30 th August, 2018 DRAFT4 directional derivatives at J ( ¯Λ) along directions pointing to S n take the value −∞ . This impliesthat in a sufficiently small neighbourhood U of ¯Λ , it holds J ( ¯Λ) > J (Λ) , ∀ Λ ∈ U ∩ S n , Λ = ¯Λ .In light of this, the claim follows from the fact that J ( · ) is non-increasing along trajectories ofthe dynamics (10) (Theorem 2). V. C ONCLUDING REMARKS
In this paper, we analyzed the global convergence properties of an extensively studied fixed-point algorithm for the Kullback–Leibler approximation of spectral densities introduced by Pavonand Ferrante in [1]. Our main result states that the Pavon–Ferrante algorithm globally convergesto one of its fixed points.A question which remains unanswered in the paper concerns global convergence of the Pavon–Ferrante algorithm to a fixed point leading to the solution of the Georgiou–Lindquist spectralapproximation problem, and, in particular, to the set P of positive definite fixed points. A possibleapproach to guarantee convergence to a positive definite fixed point is to modify the Pavon–Ferrante iteration by adding at each step a suitable “correction” term belonging to (Range Γ) ⊥ which prevents the iteration to approach the boundary of S n . Notice in particular that, in viewof Theorem 2, the presence of such terms does not affect the decreasing behavior of J ( · ) alongthe trajectories of the iteration, and, consequently, the convergence argument used in the proofof Theorem 3. This aspect will be the subject of future investigation.VI. A CKNOWLEDGEMENTS
The author wishes to thank Prof. R. Sepulchre for having brought to his attention the problemaddressed in the paper, and Prof. A. Ferrante for several enlightening discussions.R
EFERENCES [1] M. Pavon and A. Ferrante, “On the Georgiou–Lindquist approach to constrained Kullback–Leibler approximation of spectraldensities,”
IEEE Trans. Autom. Control , vol. 51, no. 4, pp. 639–644, 2006.[2] T. T. Georgiou and A. Lindquist, “Kullback–Leibler approximation of spectral density functions,”
IEEE Trans. Inf. Theory ,vol. 49, no. 11, pp. 2910–2917, 2003.[3] N. I. Akhiezer,
The classical moment problem and some related questions in analysis . Edinburgh: Oliver & Boyd, 1965.[4] C. I. Byrnes and A. Lindquist, “Important moments in systems and control,”
SIAM J. Control Optim. , vol. 47, no. 5, pp.2458–2469, 2008.[5] A. Lindquist and G. Picci,
Linear Stochastic Systems , ser. Series in Contemporary Mathematics. Springer-Verlag BerlinHeidelberg, 2015.
Thursday 30 th August, 2018 DRAFT5 [6] T. T. Georgiou, “Realization of power spectra from partial covariance sequences,”
IEEE Trans. Acoust., Speech, SignalProcess , vol. 35, no. 4, pp. 438–449, 1987.[7] C. I. Byrnes, A. Lindquist, S. V. Gusev, and A. S. Matveev, “A complete parameterization of all positive rational extensionsof a covariance sequence,”
IEEE Trans. Autom. Control , vol. 40, no. 11, pp. 1841–1857, 1995.[8] C. I. Byrnes, S. V. Gusev, and A. Lindquist, “A convex optimization approach to the rational covariance extension problem,”
SIAM J. Control Optim. , vol. 37, no. 1, pp. 211–229, 1998.[9] C. I. Byrnes, T. T. Georgiou, and A. Lindquist, “A new approach to spectral estimation: A tunable high-resolution spectralestimator,”
IEEE Trans. Signal Process. , vol. 48, no. 11, pp. 3189–3205, 2000.[10] T. T. Georgiou, “Spectral estimation via selective harmonic amplification,”
IEEE Trans. Autom. Control , vol. 46, no. 1,pp. 29–42, 2001.[11] F. Ramponi, A. Ferrante, and M. Pavon, “A globally convergent matricial algorithm for multivariate spectral estimation,”
IEEE Trans. Autom. Control , vol. 54, no. 10, pp. 2376–2388, 2009.[12] ——, “On the well-posedness of multivariate spectrum approximation and convergence of high-resolution spectralestimators,”
Systems & Control Lett. , vol. 59, no. 3, pp. 167–172, 2010.[13] C. I. Byrnes, T. T. Georgiou, and A. Lindquist, “A generalized entropy criterion for Nevanlinna–Pick interpolation withdegree constraint,”
IEEE Trans. Autom. Control , vol. 46, no. 6, pp. 822–839, 2001.[14] A. Blomqvist, A. Lindquist, and R. Nagamune, “Matrix-valued Nevanlinna–Pick interpolation with complexity constraint:An optimization approach,”
IEEE Trans. Autom. Control , vol. 48, no. 12, pp. 2172–2190, 2003.[15] C. I. Byrnes, T. T. Georgiou, A. Lindquist, and A. Megretski, “Generalized interpolation in H ∞ with a complexityconstraint,” Trans. Am. Math. Soc. , vol. 358, no. 3, pp. 965–987, 2006.[16] H. Dym, “A basic interpolation problem,”
Holomorphic spaces , vol. 33, pp. 381–423, 1998.[17] V. Bolotnikov and H. Dym,
On boundary interpolation for matrix valued Schur functions . American Mathematical Soc.,2006, vol. 181, no. 856.[18] A. Ferrante, M. Pavon, and F. Ramponi, “Hellinger versus Kullback–Leibler multivariable spectrum approximation,”
IEEETrans. Autom. Control , vol. 53, no. 4, pp. 954–967, 2008.[19] T. T. Georgiou and A. Lindquist, “A convex optimization approach to ARMA modeling,”
IEEE Trans. Autom. Control ,vol. 53, no. 5, pp. 1108–1119, 2008.[20] F. P. Carli, A. Ferrante, M. Pavon, and G. Picci, “A maximum entropy solution of the covariance extension problem forreciprocal processes,”
IEEE Trans. Autom. Control , vol. 56, no. 9, pp. 1999–2012, 2011.[21] A. Ferrante, C. Masiero, and M. Pavon, “Time and spectral domain relative entropy: A new approach to multivariatespectral estimation,”
IEEE Trans. Autom. Control , vol. 57, no. 10, pp. 2561–2575, 2012.[22] M. Zorzi and A. Ferrante, “On the estimation of structured covariance matrices,”
Automatica , vol. 48, no. 9, pp. 2145–2151,2012.[23] A. Ferrante, M. Pavon, and M. Zorzi, “A maximum entropy enhancement for a family of high-resolution spectral estimators,”
IEEE Trans. Autom. Control , vol. 57, no. 2, pp. 318–329, 2012.[24] M. Zorzi, “A new family of high-resolution multivariate spectral estimators,”
IEEE Trans. Autom. Control , vol. 59, no. 4,pp. 892–904, 2014.[25] ——, “Multivariate spectral estimation based on the concept of optimal prediction,”
IEEE Trans. Autom. Control , vol. 60,no. 6, pp. 1647–1652, 2015.[26] T. T. Georgiou and A. Lindquist, “Likelihood analysis of power spectra and generalized moment problems,”
IEEE Trans.Autom. Control (To appear) , 2017.
Thursday 30 th August, 2018 DRAFT6 [27] J. D. Stefanovski, “ H ∞ problem with nonstrict inequality and all solutions: Interpolation approach,” SIAM J. ControlOptim. , vol. 53, no. 4, pp. 1734–1767, 2015.[28] ——, “A reformulation of augmented basic interpolation problem and an application to H ∞ control,” Linear AlgebraAppl. , vol. 485, pp. 103–123, 2015.[29] ——, “New interpolation solution and application in system modeling and optimal control with prescribed distance toinstability,”
Int. J. Robust Nonlin. , vol. 26, no. 11, pp. 2455–2477, 2016.[30] A. Ferrante, F. Ramponi, and F. Ticozzi, “On the convergence of an efficient algorithm for Kullback–Leibler approximationof spectral densities,”
IEEE Trans. Autom. Control , vol. 56, no. 3, pp. 506–515, 2011.[31] A. Ferrante, M. Pavon, and F. Ramponi, “Further results on the Byrnes–Georgiou–Lindquist generalized moment problem,”in
Modeling, Estimation and Control . Springer Berlin Heidelberg, 2007, pp. 73–83.[32] T. M. Cover and J. A. Thomas,
Elements of information theory . John Wiley & Sons, 2012.[33] T. T. Georgiou, “The structure of state covariances and its relation to the power spectrum of the input,”
IEEE Trans. Autom.Control , vol. 47, no. 7, pp. 1056–1066, 2002.[34] Yu. A. Rozanov,
Stationary random processes . San Francisco: Holden-Day, 1967.[35] H. I. Nurdin, “Spectral factorization of a class of matrix-valued spectral densities,”
SIAM J. Control Optim. , vol. 45, no. 5,pp. 1801–1821, 2006.[36] W. Rudin,
Real and complex analysis . New York: McGraw-Hill, Inc., 1987.[37] J. P. LaSalle,
The Stability and Control of Discrete Processes , ser. Applied Mathematical Sciences. New York: SpringerVerlag, 1986, vol. 62.
Thursday 30 thth