A New Proof of Hopf's Inequality Using a Complex Extension of the Hilbert Metric

Abstract

It is well known from the Perron-Frobenius theory that the spectral gap of a positive square matrix is positive. In this paper, we give a more quantitative characterization of the spectral gap. More specifically, using a complex extension of the Hilbert metric, we show that the so-called spectral ratio of a positive square matrix is upper bounded by its Birkhoff contraction coefficient, which in turn yields a lower bound on its spectral gap.


arXiv [math.SP]

Wendi Han, The University of Hong Kong
Guangyue Han, The University of Hong Kong

July 17, 2019

Abstract

Hopf's inequality for positive linear operators yields a strengthening of Perron's theorem. We give in this paper an alternative proof of this strengthening using a complex extension of the Hilbert metric.

Index terms:

Perron's theorem, Hopf's inequality, positive matrix, Hilbert metric, Birkhoff contraction coefficient.

Let $n$ be an integer greater than or equal to 2. Let $A = (a_{ij})$ be an $n \times n$ positive matrix, i.e., $a_{ij} > 0$ for all $i, j$. By Perron's theorem [18], the largest eigenvalue (in modulus) of $A$, denoted by $\rho(A)$, is unique, real and positive, and therefore the spectral ratio $\kappa(A)$ of $A$, defined as
\[ \kappa(A) \triangleq \frac{\max\{|\lambda| : \lambda \text{ is an eigenvalue of } A,\ \lambda \neq \rho(A)\}}{\rho(A)}, \]
is strictly less than 1. Ostrowski [16] strengthened this result and showed that
\[ \kappa(A) \le \frac{M^2 - m^2}{M^2 + m^2}, \tag{1} \]
where $m = \min_{i,j} a_{ij}$ and $M = \max_{i,j} a_{ij}$. Inspired by Ostrowski's theorem, Hopf [11] further strengthened Perron's theorem and showed that
\[ \kappa(A) \le \frac{M - m}{M + m}. \tag{2} \]
It has been observed [17] that Hopf's strengthening is tight in the sense that there are examples of $A$ for which (2) holds with equality.

Though not the major concern of this work, let us mention that Frobenius [9, 10] generalized Perron's theorem to non-negative matrices; the result is popularly known as the Perron-Frobenius theorem. It is the key pillar of the theory of non-negative matrices, which has a wide range of applications in multiple disciplines; see, e.g., [21, 14, 2, 1, 12]. Accordingly, there are numerous results characterizing the isolation of the largest eigenvalue of non-negative matrices, most of them in the form of upper bounds on the modulus of the second largest eigenvalue; see, e.g., [19] and the references therein. It is also worthwhile to note that for certain special families of symmetric non-negative matrices (such as adjacency matrices of regular graphs and transition probability matrices of reversible stationary Markov chains), numerous Cheeger-type inequalities, which take the form of bounds on the difference between the largest and second largest eigenvalues, have been established; see, e.g., [5, 4, 15, 13] and the references therein.

Although it often shows up in the literature, the exact expression in (2) does not actually appear in [11]; it only follows from Theorem 4 therein, which is stated for more general positive linear operators. As a matter of fact, a careful examination of the proof of Theorem 4 reveals that it yields a bound stronger than (2).

To precisely state this stronger result, we need to introduce some notation and terminology. Let $W$ denote the standard simplex in the $n$-dimensional Euclidean space:
\[ W = \Big\{ w = (w_1, w_2, \ldots, w_n) \in \mathbb{R}^n : \sum_{i=1}^n w_i = 1,\ w_i \ge 0 \text{ for all } i \Big\}, \tag{3} \]
and let $W^\circ$ denote its interior, consisting of all the positive vectors in $W$. Let $d_H$ denote the Hilbert metric on $W^\circ$, which is defined by
\[ d_H(v, w) \triangleq \max_{i,j} \log\left( \frac{w_i / w_j}{v_i / v_j} \right), \quad \text{for any two vectors } v, w \in W^\circ. \tag{4} \]
(Footnote: The Hilbert metric is often defined on a projective space (see, e.g., [21, 12]), which is equivalent to the definition in this paper up to a usual normalization.)

For any positive vector $w = (w_1, w_2, \ldots, w_n) \in \mathbb{R}^n$, we define its normalized version $N(w)$ as
\[ N(w) = \frac{(w_1, w_2, \ldots, w_n)}{w_1 + w_2 + \cdots + w_n}, \tag{5} \]
which obviously belongs to $W^\circ$. The matrix $A$ induces a mapping $f_A : W^\circ \to W^\circ$, defined by
\[ f_A(w) = N(Aw), \quad \text{for any vector } w \in W^\circ. \tag{6} \]
It is well known that $f_A$ is a contraction mapping under the Hilbert metric and that the contraction coefficient $\tau(A)$, defined by
\[ \tau(A) \triangleq \sup_{v \neq w \in W^\circ} \frac{d_H(Av, Aw)}{d_H(v, w)} \]
and often referred to as the Birkhoff contraction coefficient, can be explicitly computed as
\[ \tau(A) = \frac{1 - \sqrt{\phi(A)}}{1 + \sqrt{\phi(A)}}, \tag{7} \]
where
\[ \phi(A) = \min_{i,j,k,l} \frac{a_{ik} a_{jl}}{a_{jk} a_{il}}. \tag{8} \]
We are now ready to state the aforementioned stronger result:

Theorem 1.1. For an $n \times n$ positive matrix $A$, we have
\[ \kappa(A) \le \tau(A). \tag{9} \]

As mentioned before, Theorem 1.1 follows from Theorem 4 in [11], which is a contraction result with respect to the Hopf oscillation.
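As a numerical sanity check on the quantities defined in (2) and (4)-(9), the following is a minimal sketch in Python. It is not part of the paper: the function names, the example matrix and the test vectors are all illustrative assumptions, and the eigenvalue computation is deliberately restricted to the $2 \times 2$ case so that the closed-form roots of the characteristic polynomial can be used without any external library.

```python
import cmath
import itertools
import math


def spectral_ratio_2x2(A):
    """kappa(A) for a positive 2x2 matrix, from the closed-form eigenvalues."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    moduli = sorted(abs(z) for z in ((trace + root) / 2, (trace - root) / 2))
    return moduli[0] / moduli[1]  # |second eigenvalue| / rho(A)


def birkhoff_coefficient(A):
    """tau(A) from (7), with phi(A) computed by brute force from (8)."""
    n = len(A)
    phi = min(A[i][k] * A[j][l] / (A[j][k] * A[i][l])
              for i, j, k, l in itertools.product(range(n), repeat=4))
    s = math.sqrt(phi)
    return (1 - s) / (1 + s)


def hopf_bound(A):
    """(M - m) / (M + m), the right-hand side of (2)."""
    entries = [a for row in A for a in row]
    m, M = min(entries), max(entries)
    return (M - m) / (M + m)


def hilbert_metric(v, w):
    """d_H(v, w) as in (4). For positive vectors the max of |log| equals the
    max of log, because swapping i and j flips the sign of the logarithm."""
    n = len(v)
    return max(abs(cmath.log((w[i] / w[j]) / (v[i] / v[j])))
               for i in range(n) for j in range(n))


def f_A(A, w):
    """The induced map f_A(w) = N(Aw) from (6)."""
    Aw = [sum(a * x for a, x in zip(row, w)) for row in A]
    total = sum(Aw)
    return [y / total for y in Aw]


# Example matrix and simplex points: illustrative choices, not from the paper.
A = [[1.0, 2.0], [3.0, 4.0]]
kappa, tau, hopf = spectral_ratio_2x2(A), birkhoff_coefficient(A), hopf_bound(A)
print(f"kappa = {kappa:.4f}, tau = {tau:.4f}, Hopf bound = {hopf:.4f}")

v, w = [0.5, 0.5], [0.2, 0.8]
ratio = hilbert_metric(f_A(A, v), f_A(A, w)) / hilbert_metric(v, w)
print(f"contraction ratio = {ratio:.4f}")
```

For this particular matrix the sketch gives roughly $\kappa(A) \approx 0.069$, $\tau(A) \approx 0.101$ and $(M - m)/(M + m) = 0.6$, so $\tau(A)$ is a noticeably sharper bound than (2) here, and the observed contraction ratio (about $0.072$) stays below $\tau(A)$. For the symmetric matrix with rows $(2, 1)$ and $(1, 2)$ all three quantities equal $1/3$, which is one instance of (2) holding with equality.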
Ostrowski [17] modified Birkhoff's argument in [3] and gave an alternative proof of Theorem 1.1, which however still used the Hopf oscillation. In this work, we give a new proof of Theorem 1.1 using a complex extension of the Hilbert metric in lieu of the Hopf oscillation. As it turns out, the complex Hilbert metric can be applied elsewhere; more specifically, it has been used [8] to establish the analyticity of the entropy rate of hidden Markov chains and to specify the corresponding domain of analyticity.

Let
\[ W_{\mathbb{C}} = \Big\{ w = (w_1, w_2, \ldots, w_n) \in \mathbb{C}^n : \sum_{i=1}^n w_i = 1 \Big\} \]
and let
\[ W^+_{\mathbb{C}} = \{ w = (w_1, w_2, \ldots, w_n) \in W_{\mathbb{C}} : \Re(w_i / w_j) > 0 \text{ for all } i, j \}. \]
The following complex extension of the Hilbert metric has been proposed in [8]:
\[ d_H(v, w) = \max_{i,j} \left| \log\left( \frac{w_i / w_j}{v_i / v_j} \right) \right|, \quad \text{for any } v, w \in W^+_{\mathbb{C}}, \tag{10} \]
where $\log(\cdot)$ is taken as the principal branch of the complex logarithm. Here we remark that there are other complex extensions of the Hilbert metric; see, e.g., [20, 6]. Our treatment however only uses the extension in (10), which will henceforth be referred to as the complex Hilbert metric.

For any $\varepsilon > 0$, we define
\[ W^\circ_{\mathbb{C}}(\varepsilon) \triangleq \{ w = (w_1, w_2, \ldots, w_n) \in W_{\mathbb{C}} : \exists\, v \in W^\circ \text{ such that } |w_i - v_i| \le \varepsilon v_i \text{ for all } i \}. \tag{11} \]
It can be easily verified that for $\varepsilon$ small enough, $W^\circ_{\mathbb{C}}(\varepsilon) \subset W^+_{\mathbb{C}}$, and thereby the complex Hilbert metric is well-defined on $W^\circ_{\mathbb{C}}(\varepsilon)$.

Extending the definition in (5), for any complex vector $w = (w_1, w_2, \ldots, w_n)$ with $w_1 + w_2 + \cdots + w_n \neq 0$, we define its normalized version $N(w)$ as
\[ N(w) = \frac{(w_1, w_2, \ldots, w_n)}{w_1 + w_2 + \cdots + w_n}, \]
which obviously belongs to $W_{\mathbb{C}}$. Furthermore, for any $\varepsilon > 0$, extending the definition in (6), we define $f_A : W^\circ_{\mathbb{C}}(\varepsilon) \to W^\circ_{\mathbb{C}}(\varepsilon)$ by
\[ f_A(w) = N(Aw), \quad \text{for any vector } w \in W^\circ_{\mathbb{C}}(\varepsilon), \tag{12} \]
which is well-defined if $\varepsilon$ is small enough.

The following lemma has been implicitly established in [8]. We outline its proof for completeness and clarity. An interested reader may refer to the proofs of Theorem 2. […]

Lemma 2.1. Consider an $n \times n$ positive square matrix $A$. For any small enough $\varepsilon > 0$, there exists $0 < \tau_\varepsilon(A) < 1$ such that for any $x, y \in W^\circ_{\mathbb{C}}(\varepsilon)$,
\[ d_H(f_A(x), f_A(y)) \le \tau_\varepsilon(A)\, d_H(x, y), \tag{13} \]
and moreover, $\tau_\varepsilon(A)$ tends to $\tau(A)$ as $\varepsilon$ tends to 0.

Proof.

First of all, we note, by the definition in (10), that for any $x, y \in W^\circ_{\mathbb{C}}(\varepsilon)$,
\[ \frac{d_H(f_A(x), f_A(y))}{d_H(x, y)} = \frac{d_H(N(Ax), N(Ay))}{d_H(x, y)} = \max_{i,j} |L_{i,j}|, \]
where
\[ L_{i,j} = \frac{\log\left( \sum_m a_{im} x_m \big/ \sum_m a_{jm} x_m \right) - \log\left( \sum_m a_{im} y_m \big/ \sum_m a_{jm} y_m \right)}{\max_{k,l} |\log(x_k / y_k) - \log(x_l / y_l)|}. \]
Letting $c_i = \log(x_i / y_i)$ for all $i$ and choosing $p, q$ such that $|c_p - c_q| = \max_{k,l} |c_k - c_l|$, we note that $L_{i,j}$ can be rewritten as
\[ L_{i,j} = \frac{\log\left( \sum_m e^{c_m - c_q} a_{im} y_m \big/ \sum_m e^{c_m - c_q} a_{jm} y_m \right) - \log\left( \sum_m a_{im} y_m \big/ \sum_m a_{jm} y_m \right)}{|c_p - c_q|}. \]
An application of the mean value theorem then yields that there exists $\xi \in [0, 1]$ such that
\[ |L_{i,j}| \le \left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} \left( \frac{e^{(c_l - c_q)\xi} a_{il} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{im} y_m} - \frac{e^{(c_l - c_q)\xi} a_{jl} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{jm} y_m} \right) \right|. \]
By the definition of $W^\circ_{\mathbb{C}}(\varepsilon)$, there exist $x^\circ, y^\circ \in W^\circ$ such that for some constant $C_1 > 0$,
\[ |x_k - x^\circ_k| \le C_1 \varepsilon x^\circ_k, \quad |y_k - y^\circ_k| \le C_1 \varepsilon y^\circ_k \quad \text{for all } k. \]
Now, let
\[ D_l = \frac{e^{(c_l - c_q)\xi} a_{il} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{im} y_m} - \frac{e^{(c_l - c_q)\xi} a_{jl} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{jm} y_m} \]
and
\[ D^\circ_l = \frac{e^{(c^\circ_l - c^\circ_q)\xi} a_{il} y^\circ_l}{\sum_m e^{(c^\circ_m - c^\circ_q)\xi} a_{im} y^\circ_m} - \frac{e^{(c^\circ_l - c^\circ_q)\xi} a_{jl} y^\circ_l}{\sum_m e^{(c^\circ_m - c^\circ_q)\xi} a_{jm} y^\circ_m}, \]
where we have, similarly as above, defined $c^\circ_i = \log(x^\circ_i / y^\circ_i)$ for all $i$. It then follows from the established facts that, for some constant $C_2 > 0$,
\[ \left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} D_l - \sum_l \frac{c_l - c_q}{|c_p - c_q|} D^\circ_l \right| < C_1 C_2 \varepsilon \]
and
\[ \left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} D^\circ_l \right| \le \tau(A) \]
that
\[ \left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} D_l \right| \le C_1 C_2 \varepsilon + \tau(A), \]
and hence
\[ \frac{d_H(f_A(x), f_A(y))}{d_H(x, y)} \le C_1 C_2 \varepsilon + \tau(A). \]
Setting $\tau_\varepsilon(A) = C_1 C_2 \varepsilon + \tau(A)$ and noting that $\varepsilon$ can be chosen arbitrarily small, we establish (13) and conclude that $\tau_\varepsilon(A)$ tends to $\tau(A)$ as $\varepsilon$ tends to 0.

For a subset $S$ of $W^\circ$, we generalize the definition in (11) and define
\[ S_{\mathbb{C}}(\varepsilon) \triangleq \{ w = (w_1, w_2, \ldots, w_n) \in W_{\mathbb{C}} : \exists\, v \in S \text{ such that } |w_i - v_i| \le \varepsilon v_i \text{ for all } i \}. \]
We will need the following lemma, which, roughly speaking, asserts the equivalence between the Euclidean metric (denoted by $d_E$) and the Hilbert metric on a complex neighborhood of a compact subset of $W^\circ$.

Lemma 3.1.

For any compact subset $S$ of $W^\circ$, there exist $\varepsilon_0 > 0$ and $G_1, G_2 > 0$ such that for all $0 < \varepsilon < \varepsilon_0$ and for all $v, w \in S_{\mathbb{C}}(\varepsilon)$,
\[ G_1\, d_H(v, w) < d_E(v, w) < G_2\, d_H(v, w). \]

Proof.

The lemma follows from some straightforward arguments underpinned by the mean value theorem and the compactness of $S$, which are completely parallel to those in the proof of Proposition 2. […]

Proof of Theorem 1.1.

Consider an $n \times n$ positive square matrix $A$. Let $x = (x_1, x_2, \ldots, x_n)$ be the eigenvector corresponding to $\rho(A)$. By the Perron-Frobenius theorem, we can choose $x$ to be a positive vector with $x_1 + x_2 + \cdots + x_n = 1$, i.e., $x \in W^\circ$. Let $\lambda$ be an eigenvalue of $A$ that is different from $\rho(A)$ and let $y$ be a corresponding eigenvector. Here we remark that while $\rho(A)$ and $x$ are real, $\lambda$ and $y$ can be complex.

Now, consider a compact subset $S$ of $W^\circ$ that contains $x$. It can be easily verified that for any $\varepsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that for any $n \ge n_0$,
\[ N(A^n (x + y)) = N(\rho^n(A)\, x + \lambda^n y) \in S_{\mathbb{C}}(\varepsilon). \]
Henceforth, we fix such an $n$ and let $v = \rho(A)^n x$ and $w = \lambda^n y$. For any $m \in \mathbb{N}$, it can be verified that
\[ d_H(N(A^m v), N(A^m (v + w))) = d_H(N(\rho(A)^m v), N(\rho(A)^m v + \lambda^m w)) = d_H(N(v), N(v + \tilde{\lambda}^m w)), \]
where we have written $\lambda / \rho(A)$ as $\tilde{\lambda}$ for notational simplicity. Now, using the definition of the complex Hilbert metric, we continue
\begin{align*}
d_H(N(A^m v), N(A^m (v + w)))
&= \max_{i,j = 1, 2, \ldots, n} \left| \log \frac{(v_i + \tilde{\lambda}^m w_i) / (v_j + \tilde{\lambda}^m w_j)}{v_i / v_j} \right| \\
&= \max_{i,j = 1, 2, \ldots, n} \left| \log \frac{1 + \tilde{\lambda}^m (w_i / v_i)}{1 + \tilde{\lambda}^m (w_j / v_j)} \right| \\
&= \max_{i,j = 1, 2, \ldots, n} \left| \log \left( 1 + \tilde{\lambda}^m \frac{(w_i / v_i) - (w_j / v_j)}{1 + \tilde{\lambda}^m (w_j / v_j)} \right) \right| \\
&= \max_{i,j = 1, 2, \ldots, n} \left| \log \left( 1 + \frac{(w_i / v_i) - (w_j / v_j)}{(1 / \tilde{\lambda}^m) + (w_j / v_j)} \right) \right| \\
&= \left| \log \left( 1 + \frac{(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right) \right|, \tag{14}
\end{align*}
where we have assumed that $i_0, j_0$ achieve the maximum in (14). We note that $w_{i_0} / v_{i_0} \neq w_{j_0} / v_{j_0}$, since otherwise it would mean $d_H(N(A^m v), N(A^m (v + w))) = 0$ and therefore $w$ would be a scaled version of $v$, contradicting the fact that $\lambda$ is different from $\rho(A)$.

It follows from the fact that $0 < |\tilde{\lambda}| < 1$ that there exists $C_3 > 0$ such that for sufficiently large $m$,
\[ d_H(N(A^m v), N(A^m (v + w))) = \left| \log \left( 1 + \frac{(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right) \right| \ge C_3 \left| \frac{(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right|. \]
And by Lemmas 2.1 and 3.1, there exist $0 < \tau_\varepsilon(A) < 1$ and $C_4 > 0$ such that
\[ d_H(N(A^m v), N(A^m (v + w))) \le C_4\, \tau_\varepsilon^m(A)\, d_E(N(v), N(v + w)), \]
which immediately implies that
\[ C_3 \left| \frac{1}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right| \le C_4\, \tau_\varepsilon^m(A)\, \frac{d_E(N(v), N(v + w))}{|(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})|}. \]
One then verifies that there exists a constant $C_5 > 0$, depending only on $(x, y)$, such that
\[ \frac{d_E(N(v), N(v + w))}{|(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})|} < C_5, \]
and furthermore, there exists a constant $C_6 > 0$ such that for sufficiently large $m$,
\[ \left| \frac{1}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right| \ge C_6\, |\tilde{\lambda}|^m. \]
It then follows that, after choosing $\varepsilon$ small enough and then $n$ large enough, we have
\[ C_3 C_6\, |\tilde{\lambda}|^m \le C_4 C_5\, \tau_\varepsilon^m(A), \]
which, upon taking $m$-th roots and letting $m$ tend to infinity, yields $|\tilde{\lambda}| \le \tau_\varepsilon(A)$, where we have used the fact that all the constants $C_3, C_4, C_5, C_6$ can be chosen independent of $\varepsilon$. Moreover, using the fact that $\varepsilon$ can be chosen arbitrarily small, we apply Lemma 2.1 to obtain $|\tilde{\lambda}| \le \tau(A)$, which, since $\lambda$ was an arbitrary eigenvalue different from $\rho(A)$, immediately leads to $\kappa(A) \le \tau(A)$, as desired.

Acknowledgement.

This work is supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, under Project 17301017 and by the National Natural Science Foundation of China, under Project 61871343.

References

[1] R. Bapat and T. Raghavan. Nonnegative Matrices and Applications, New York: Cambridge University Press, 1997.

[2] A. Berman and R. Plemmons. Nonnegative Matrices in the Mathematical Sciences, Philadelphia, Pa.: Society for Industrial and Applied Mathematics, 1994.

[3] G. Birkhoff. Extensions of Jentzsch's Theorem. Transactions of the American Mathematical Society, vol. 85, no. 1, pp. 219-227, 1957.

[4] A. Brouwer and W. Haemers. Spectra of graphs, Springer, New York, 2012.

[5] F. Chung. Spectral graph theory, Providence, R.I.: Published for the Conference Board of the Mathematical Sciences by the American Mathematical Society, 1997.

[6] L. Dubois. Projective metrics and contraction principles for complex cones. Journal of the London Mathematical Society, vol. 79, no. 3, pp. 719-727, 2009.

[7] G. Han and B. Marcus. Analyticity of entropy rate of hidden Markov chains. IEEE Trans. Info. Theory, vol. 52, no. 12, pp. 5251-5266, 2006.

[8] G. Han, B. Marcus and Y. Peres. A note on a complex Hilbert metric with application to domain of analyticity for entropy rate of hidden Markov processes. Entropy of Hidden Markov Processes and Connections to Dynamical Systems, London Mathematical Society Lecture Note Series, vol. 385, pp. 98-116, 2011.

[9] G. Frobenius. Über matrizen aus positiven elementen. Sitzungsberichte Preussische Akademie der Wissenschaft, Berlin, pp. 471-476, 514-518, 1908, 1909.

[10] G. Frobenius. Über matrizen aus nicht negativen elementen. Sitzungsberichte Preussische Akademie der Wissenschaft, Berlin, pp. 456-477, 1912.

[11] E. Hopf. An inequality for positive linear integral operators. J. Math. Mech., vol. 12, no. 5, pp. 683-692, 1963.

[12] B. Lemmens and R. Nussbaum. Nonlinear Perron-Frobenius Theory, Cambridge University Press, 2012.

[13] D. Levin and Y. Peres. Markov Chains and Mixing Times, American Mathematical Society, 2nd Revised Edition, 2017.

[14] H. Minc. Nonnegative Matrices, New York: Wiley, 1988.

[15] R. Montenegro and P. Tetali. Mathematical Aspects of Mixing Times in Markov chains, Foundations and Trends in Theoretical Computer Science, Now Publishers, 2006.

[16] A. Ostrowski. On positive matrices. Math. Ann., vol. 150, no. 3, pp. 276-284, 1963.

[17] A. Ostrowski. Positive matrices and functional analysis. Recent Advances in Matrix Theory, Madison: Univ. of Wisconsin Press, 1964.

[18] O. Perron. Grundlagen für eine theorie des Jacobischen Kettenbruchalgorithmus. Math. Ann., vol. 64, pp. 1-76, 1907.

[19] U. Rothblum and C. Tan. Upper bounds on the maximum modulus of subdominant eigenvalues of nonnegative matrices. Linear Algebra Appl, vol. 66, pp. 45-86, 1985.

[20] H. Rugh. Cones and gauges in complex spaces: Spectral gaps and complex Perron-Frobenius theory. Annals of Mathematics, vol. 171, no. 3, 2010.

[21] E. Seneta.