A New Proof of Hopf's Inequality Using a Complex Extension of the Hilbert Metric
arXiv [math.SP]
Wendi Han, The University of Hong Kong, email: [email protected]
Guangyue Han, The University of Hong Kong, email: [email protected]
July 17, 2019
Abstract
Hopf’s inequality for positive linear operators yields a strengthening of Perron’s theorem. In this paper, we give an alternative proof of this strengthening using a complex extension of the Hilbert metric.
Index terms:
Perron’s theorem, Hopf’s inequality, positive matrix, Hilbert metric, Birkhoff contraction coefficient.
Let $n$ be an integer greater than or equal to 2. Let $A = (a_{ij})$ be an $n \times n$ positive matrix, i.e., $a_{ij} > 0$ for all $i, j$. By Perron’s theorem [18], the largest eigenvalue (in modulus) of $A$, denoted by $\rho(A)$, is real, positive and simple, and every other eigenvalue of $A$ has strictly smaller modulus. Therefore, the spectral ratio $\kappa(A)$ of $A$, defined as
$$\kappa(A) := \max\{ |\lambda| : \lambda \text{ is an eigenvalue of } A,\ \lambda \neq \rho(A) \} / \rho(A),$$
is strictly less than 1. Ostrowski [16] strengthened this result and showed that
$$\kappa(A) \le \frac{M^2 - m^2}{M^2 + m^2}, \tag{1}$$
where $m = \min_{i,j} a_{ij}$ and $M = \max_{i,j} a_{ij}$. Inspired by Ostrowski’s theorem, Hopf [11] further strengthened Perron’s theorem and showed that
$$\kappa(A) \le \frac{M - m}{M + m}. \tag{2}$$
It has been observed [17] that Hopf’s strengthening is tight, in the sense that there are examples of $A$ for which (2) holds with equality.

Though not the major concern of this work, let us mention that Frobenius [9, 10] generalized Perron’s theorem to non-negative matrices; this generalization is popularly known as the Perron-Frobenius theorem. This result is the key pillar of the theory of non-negative matrices, which has a wide range of applications in multiple disciplines; see, e.g., [21, 14, 2, 1, 12]. Accordingly, there are numerous results characterizing the isolation of the largest eigenvalue of non-negative matrices, most of them in the form of upper bounds on the modulus of the second largest eigenvalue; see, e.g., [19] and the references therein. It is also worth noting that for certain special families of symmetric non-negative matrices (such as adjacency matrices of regular graphs and transition probability matrices of reversible stationary Markov chains), numerous Cheeger-type inequalities, which take the form of bounds on the difference between the largest and second largest eigenvalues, have been established; see, e.g.,
[5, 4, 15, 13] and the references therein.

Although it often appears in the literature, the exact expression (2) does not actually appear in [11]; it only follows from Theorem 4 therein, which is stated for more general positive linear operators. As a matter of fact, a careful examination of the proof of Theorem 4 reveals that it yields a bound stronger than (2).

To state this stronger result precisely, we need to introduce some notation and terminology. Let $W$ denote the standard simplex in the $n$-dimensional Euclidean space:
$$W = \Big\{ w = (w_1, w_2, \ldots, w_n) \in \mathbb{R}^n : \sum_{i=1}^n w_i = 1,\ w_i \ge 0 \text{ for all } i \Big\}, \tag{3}$$
and let $W^\circ$ denote its interior, consisting of all the positive vectors in $W$. Let $d_H$ denote the Hilbert metric on $W^\circ$, which is defined by
$$d_H(v, w) := \max_{i,j} \log \left( \frac{w_i / w_j}{v_i / v_j} \right), \quad \text{for any two vectors } v, w \in W^\circ. \tag{4}$$
We note that the Hilbert metric is often defined on a projective space (see, e.g., [21, 12]); that definition is equivalent to the one in this paper up to the usual normalization. For any positive vector $w = (w_1, w_2, \ldots, w_n) \in \mathbb{R}^n$, we define its normalized version $N(w)$ as
$$N(w) = \frac{(w_1, w_2, \ldots, w_n)}{w_1 + w_2 + \cdots + w_n}, \tag{5}$$
which obviously belongs to $W^\circ$. Clearly, the matrix $A$ induces a mapping $f_A : W^\circ \to W^\circ$, defined by
$$f_A(w) = N(Aw), \quad \text{for any vector } w \in W^\circ. \tag{6}$$
It is well known that $f_A$ is a contraction mapping under the Hilbert metric and that the contraction coefficient $\tau(A)$, defined by
$$\tau(A) := \sup_{v \neq w \in W^\circ} \frac{d_H(Av, Aw)}{d_H(v, w)}$$
and often referred to as the Birkhoff contraction coefficient, can be explicitly computed as
$$\tau(A) = \frac{1 - \sqrt{\phi(A)}}{1 + \sqrt{\phi(A)}}, \tag{7}$$
where
$$\phi(A) = \min_{i,j,k,l} \frac{a_{ik} a_{jl}}{a_{jk} a_{il}}. \tag{8}$$
We are now ready to state the aforementioned stronger result.

Theorem 1.1. For an $n \times n$ positive matrix $A$, we have
$$\kappa(A) \le \tau(A). \tag{9}$$
Since $\phi(A) \ge m^2/M^2$ and the right-hand side of (7) is decreasing in $\sqrt{\phi(A)}$, we have $\tau(A) \le (M - m)/(M + m)$, so (9) indeed implies (2). As mentioned before, Theorem 1.1 follows from Theorem 4 in [11], which is a contraction result with respect to the Hopf oscillation.
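As a numerical sanity check of (1), (2) and (7) to (9), the following minimal NumPy sketch computes $\kappa(A)$, the Birkhoff coefficient $\tau(A)$, and the bounds of Ostrowski and Hopf for a given positive matrix. It also estimates the contraction ratio of $f_A$ under the Hilbert metric (4) on random pairs of points in $W^\circ$. The helper names (spectral_ratio, birkhoff_tau, and so on) are hypothetical, $\phi(A)$ is computed by brute force over all $n^4$ index quadruples (reasonable only for small $n$), and random sampling can only illustrate, not prove, the contraction property.

```python
import numpy as np


def spectral_ratio(A):
    """kappa(A): largest modulus of an eigenvalue other than rho(A), divided by rho(A)."""
    moduli = np.sort(np.abs(np.linalg.eigvals(A)))[::-1]
    # For a positive matrix, rho(A) is simple and strictly dominant (Perron),
    # so the second-largest modulus is the subdominant one.
    return moduli[1] / moduli[0]


def birkhoff_tau(A):
    """tau(A) via (7) and (8); phi(A) is found by brute force over all (i, j, k, l)."""
    num = A[:, None, :, None] * A[None, :, None, :]  # num[i, j, k, l] = a_ik * a_jl
    den = A[None, :, :, None] * A[:, None, None, :]  # den[i, j, k, l] = a_jk * a_il
    s = np.sqrt((num / den).min())
    return (1.0 - s) / (1.0 + s)


def hopf_bound(A):
    """Right-hand side of (2)."""
    m, M = A.min(), A.max()
    return (M - m) / (M + m)


def ostrowski_bound(A):
    """Right-hand side of (1)."""
    m, M = A.min(), A.max()
    return (M**2 - m**2) / (M**2 + m**2)


def hilbert_metric(v, w):
    """d_H(v, w) from (4) for positive vectors: max_{i,j} (r_i - r_j) with r = log(w / v)."""
    r = np.log(w / v)
    return r.max() - r.min()


def f_A(A, w):
    """The induced map (6): f_A(w) = N(Aw)."""
    Aw = A @ w
    return Aw / Aw.sum()


def max_contraction_ratio(A, trials=1000, seed=0):
    """Largest observed d_H(f_A(v), f_A(w)) / d_H(v, w) over random v, w in the open simplex."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    best = 0.0
    for _ in range(trials):
        v, w = rng.dirichlet(np.ones(n), size=2)  # two random positive vectors summing to 1
        ratio = hilbert_metric(f_A(A, v), f_A(A, w)) / hilbert_metric(v, w)
        best = max(best, ratio)
    return best


if __name__ == "__main__":
    # A 2 x 2 example for which Hopf's bound (2) is attained.
    A = np.array([[2.0, 1.0], [1.0, 2.0]])
    print(spectral_ratio(A), birkhoff_tau(A), hopf_bound(A), ostrowski_bound(A))
    # A random positive 5 x 5 matrix.
    B = np.random.default_rng(1).uniform(0.1, 1.0, size=(5, 5))
    print(spectral_ratio(B), birkhoff_tau(B), hopf_bound(B), ostrowski_bound(B))
    print(max_contraction_ratio(B), "<=", birkhoff_tau(B))
```

For the matrix with rows $(2, 1)$ and $(1, 2)$, the eigenvalues are 3 and 1 and $\phi(A) = 1/4$, so $\kappa(A)$, $\tau(A)$ and $(M - m)/(M + m)$ all equal $1/3$ up to floating-point rounding, while the Ostrowski bound (1) gives $3/5$. For the random matrix, one should observe $\kappa(A) \le \tau(A) \le (M - m)/(M + m) \le (M^2 - m^2)/(M^2 + m^2)$.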
Ostrowski [17] modified Birkhoff’s argument in [3] and gave an alternative proof of Theorem 1.1, which, however, still uses the Hopf oscillation. In this work, we give a new proof of Theorem 1.1 using a complex extension of the Hilbert metric in lieu of the Hopf oscillation. As it turns out, the complex Hilbert metric can be applied elsewhere; more specifically, it has been used in [8] to establish the analyticity of the entropy rate of hidden Markov chains and to specify the corresponding domain of analyticity.

Let $W_{\mathbb{C}} = \{ w = (w_1, w_2, \ldots, w_n) \in \mathbb{C}^n : \sum_{i=1}^n w_i = 1 \}$ and let $W_{\mathbb{C}}^+ = \{ w = (w_1, w_2, \ldots, w_n) \in W_{\mathbb{C}} : \mathrm{Re}(w_i / w_j) > 0 \text{ for all } i, j \}$. The following complex extension of the Hilbert metric has been proposed in [8]:
$$d_H(v, w) = \max_{i,j} \left| \log \left( \frac{w_i / w_j}{v_i / v_j} \right) \right|, \quad \text{for any } v, w \in W_{\mathbb{C}}^+, \tag{10}$$
where $\log(\cdot)$ is taken as the principal branch of the complex logarithm. We remark that there are other complex extensions of the Hilbert metric; see, e.g., [20, 6]. Our treatment, however, only uses the extension in (10), which will henceforth be referred to as the complex Hilbert metric.

For any $\varepsilon > 0$, we define
$$W^\circ_{\mathbb{C}}(\varepsilon) := \{ w = (w_1, w_2, \ldots, w_n) \in W_{\mathbb{C}} : \exists\, v \in W^\circ \text{ such that } |w_i - v_i| \le \varepsilon v_i \text{ for all } i \}. \tag{11}$$
It can be easily verified that for $\varepsilon$ small enough, $W^\circ_{\mathbb{C}}(\varepsilon) \subset W_{\mathbb{C}}^+$, and thereby the complex Hilbert metric is well defined on $W^\circ_{\mathbb{C}}(\varepsilon)$.

Extending the definition in (5), for any complex vector $w = (w_1, w_2, \ldots, w_n)$ with $w_1 + w_2 + \cdots + w_n \neq 0$, we define its normalized version $N(w)$ as
$$N(w) = \frac{(w_1, w_2, \ldots, w_n)}{w_1 + w_2 + \cdots + w_n},$$
which obviously belongs to $W_{\mathbb{C}}$. Furthermore, for any $\varepsilon > 0$, extending the definition in (6), we define $f_A : W^\circ_{\mathbb{C}}(\varepsilon) \to W^\circ_{\mathbb{C}}(\varepsilon)$ by
$$f_A(w) = N(Aw), \quad \text{for any vector } w \in W^\circ_{\mathbb{C}}(\varepsilon), \tag{12}$$
which is well defined if $\varepsilon$ is small enough.

The following lemma has been implicitly established in [8]. We outline its proof for completeness and clarity. An interested reader may refer to the proofs of Theorem 2.…

Lemma 2.1. Consider an $n \times n$ positive square matrix $A$. For any small enough $\varepsilon > 0$, there exists $0 < \tau_\varepsilon(A) < 1$ such that for any $x, y \in W^\circ_{\mathbb{C}}(\varepsilon)$,
$$d_H(f_A(x), f_A(y)) \le \tau_\varepsilon(A)\, d_H(x, y), \tag{13}$$
and moreover, $\tau_\varepsilon(A)$ tends to $\tau(A)$ as $\varepsilon$ tends to $0$.

Proof.
First of all, we note, by the definition in (10), that for any $x, y \in W^\circ_{\mathbb{C}}(\varepsilon)$,
$$\frac{d_H(f_A(x), f_A(y))}{d_H(x, y)} = \frac{d_H(N(Ax), N(Ay))}{d_H(x, y)} = \max_{i,j} |L_{i,j}|,$$
where
$$L_{i,j} = \frac{\log\left( \sum_m a_{im} x_m \big/ \sum_m a_{jm} x_m \right) - \log\left( \sum_m a_{im} y_m \big/ \sum_m a_{jm} y_m \right)}{\max_{k,l} \left| \log(x_k / y_k) - \log(x_l / y_l) \right|}.$$
Letting $c_i = \log(x_i / y_i)$ for all $i$ and choosing $p, q$ such that $|c_p - c_q| = \max_{k,l} |c_k - c_l|$, we note that $L_{i,j}$ can be rewritten as
$$L_{i,j} = \frac{\log\left( \sum_m e^{c_m - c_q} a_{im} y_m \big/ \sum_m e^{c_m - c_q} a_{jm} y_m \right) - \log\left( \sum_m a_{im} y_m \big/ \sum_m a_{jm} y_m \right)}{|c_p - c_q|}.$$
An application of the mean value theorem then yields that there exists $\xi \in [0, 1]$ such that
$$|L_{i,j}| \le \left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} \left( \frac{e^{(c_l - c_q)\xi} a_{il} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{im} y_m} - \frac{e^{(c_l - c_q)\xi} a_{jl} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{jm} y_m} \right) \right|.$$
By the definition of $W^\circ_{\mathbb{C}}(\varepsilon)$, there exist $x^\circ, y^\circ \in W^\circ$ such that, for some constant $C_1 > 0$,
$$|x_k - x^\circ_k| \le C_1 \varepsilon x^\circ_k, \quad |y_k - y^\circ_k| \le C_1 \varepsilon y^\circ_k \quad \text{for all } k.$$
Now, let
$$D_l = \frac{e^{(c_l - c_q)\xi} a_{il} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{im} y_m} - \frac{e^{(c_l - c_q)\xi} a_{jl} y_l}{\sum_m e^{(c_m - c_q)\xi} a_{jm} y_m}$$
and
$$D^\circ_l = \frac{e^{(c^\circ_l - c^\circ_q)\xi} a_{il} y^\circ_l}{\sum_m e^{(c^\circ_m - c^\circ_q)\xi} a_{im} y^\circ_m} - \frac{e^{(c^\circ_l - c^\circ_q)\xi} a_{jl} y^\circ_l}{\sum_m e^{(c^\circ_m - c^\circ_q)\xi} a_{jm} y^\circ_m},$$
where, similarly as above, we have defined $c^\circ_i = \log(x^\circ_i / y^\circ_i)$ for all $i$. It then follows from the facts established above that, for some constant $C_2 > 0$,
$$\left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} D_l - \sum_l \frac{c_l - c_q}{|c_p - c_q|} D^\circ_l \right| < C_1 C_2 \varepsilon,$$
which, together with
$$\left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} D^\circ_l \right| \le \tau(A),$$
yields
$$\left| \sum_l \frac{c_l - c_q}{|c_p - c_q|} D_l \right| \le C_1 C_2 \varepsilon + \tau(A),$$
and hence
$$\frac{d_H(f_A(x), f_A(y))}{d_H(x, y)} \le C_1 C_2 \varepsilon + \tau(A).$$
Setting $\tau_\varepsilon(A) = C_1 C_2 \varepsilon + \tau(A)$ and noting that $\varepsilon$ can be chosen arbitrarily small, we establish (13) and conclude that $\tau_\varepsilon(A)$ tends to $\tau(A)$ as $\varepsilon$ tends to $0$.

For a subset $S$ of $W^\circ$, we generalize the definition in (11) and define
$$S_{\mathbb{C}}(\varepsilon) := \{ w = (w_1, w_2, \ldots, w_n) \in W_{\mathbb{C}} : \exists\, v \in S \text{ such that } |w_i - v_i| \le \varepsilon v_i \text{ for all } i \}.$$
We will need the following lemma, which, roughly speaking, asserts the equivalence between the Euclidean metric (denoted by $d_E$) and the Hilbert metric on a complex neighborhood of a compact subset of $W^\circ$.

Lemma 3.1.
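For any compact subset $S$ of $W^\circ$, there exist $\varepsilon_0 > 0$ and $G_1, G_2 > 0$ such that for all $0 < \varepsilon < \varepsilon_0$ and all $v, w \in S_{\mathbb{C}}(\varepsilon)$,
$$G_1\, d_H(v, w) \le d_E(v, w) \le G_2\, d_H(v, w).$$
Proof.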
The lemma follows from some straightforward arguments underpinned by the mean value theorem and the compactness of $S$, which are completely parallel to those in the proof of Proposition 2.…

Proof of Theorem 1.1.
Consider an $n \times n$ positive square matrix $A$. Let $x = (x_1, x_2, \ldots, x_n)$ be the eigenvector corresponding to $\rho(A)$. By the Perron-Frobenius theorem, we can choose $x$ to be a positive vector with $x_1 + x_2 + \cdots + x_n = 1$, i.e., $x \in W^\circ$. Let $\lambda$ be an eigenvalue of $A$ that is different from $\rho(A)$ and let $y$ be a corresponding eigenvector; we may assume $\lambda \neq 0$, since otherwise $|\lambda| \le \tau(A) \rho(A)$ holds trivially. We remark that while $\rho(A)$ and $x$ are real, $\lambda$ and $y$ can be complex.

Now, consider a compact subset $S$ of $W^\circ$ that contains $x$. It can be easily verified that for any $\varepsilon > 0$, there exists $n_0 \in \mathbb{N}$ such that for any $n \ge n_0$,
$$N(A^n (x + y)) = N(\rho(A)^n x + \lambda^n y) \in S_{\mathbb{C}}(\varepsilon).$$
Henceforth, we let $v = \rho(A)^n x$ and $w = \lambda^n y$. For any $m \in \mathbb{N}$, it can be verified that
$$d_H(N(A^m v), N(A^m (v + w))) = d_H(N(\rho(A)^m v), N(\rho(A)^m v + \lambda^m w)) = d_H(N(v), N(v + \tilde{\lambda}^m w)),$$
where we have written $\lambda / \rho(A)$ as $\tilde{\lambda}$ for notational simplicity. Now, using the definition of the complex Hilbert metric, we continue
$$\begin{aligned}
d_H(N(A^m v), N(A^m (v + w))) &= \max_{i,j = 1, 2, \ldots, n} \left| \log \frac{(v_i + \tilde{\lambda}^m w_i) / (v_j + \tilde{\lambda}^m w_j)}{v_i / v_j} \right| \\
&= \max_{i,j = 1, 2, \ldots, n} \left| \log \frac{1 + \tilde{\lambda}^m (w_i / v_i)}{1 + \tilde{\lambda}^m (w_j / v_j)} \right| \\
&= \max_{i,j = 1, 2, \ldots, n} \left| \log \left( 1 + \frac{\tilde{\lambda}^m \big( (w_i / v_i) - (w_j / v_j) \big)}{1 + \tilde{\lambda}^m (w_j / v_j)} \right) \right| \\
&= \max_{i,j = 1, 2, \ldots, n} \left| \log \left( 1 + \frac{(w_i / v_i) - (w_j / v_j)}{(1 / \tilde{\lambda}^m) + (w_j / v_j)} \right) \right| \\
&= \left| \log \left( 1 + \frac{(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right) \right|,
\end{aligned} \tag{14}$$
where we have assumed that $i_0, j_0$ achieve the maxima in (14). We note that $w_{i_0} / v_{i_0} \neq w_{j_0} / v_{j_0}$, since otherwise we would have $d_H(N(A^m v), N(A^m (v + w))) = 0$, and therefore $w$ would be a scaled version of $v$, contradicting the fact that $\lambda$ is different from $\rho(A)$.

It follows from the fact that $0 < |\tilde{\lambda}| < 1$ that there exists $C_1 > 0$ such that, for all large enough $m$,
$$d_H(N(A^m v), N(A^m (v + w))) = \left| \log \left( 1 + \frac{(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right) \right| \ge C_1 \left| \frac{(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right|.$$
Moreover, by Lemmas 2.1 and 3.1, there exist $0 < \tau_\varepsilon(A) < 1$ and $C_2 > 0$ such that
$$d_H(N(A^m v), N(A^m (v + w))) \le C_2\, \tau_\varepsilon^m(A)\, d_E(N(v), N(v + w)),$$
which immediately implies that
$$C_1 \left| \frac{1}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right| \le C_2\, \tau_\varepsilon^m(A)\, \frac{d_E(N(v), N(v + w))}{|(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})|}.$$
One then verifies that there exists a constant $C_3 > 0$ (depending on $x, y$) such that
$$\frac{d_E(N(v), N(v + w))}{|(w_{i_0} / v_{i_0}) - (w_{j_0} / v_{j_0})|} < C_3,$$
and furthermore, there exists a constant $C_4 > 0$ such that, for all large enough $m$,
$$\left| \frac{1}{(1 / \tilde{\lambda}^m) + (w_{j_0} / v_{j_0})} \right| \ge C_4 |\tilde{\lambda}|^m.$$
It then follows that, after choosing $\varepsilon$ small enough and then $n$ large enough, we have
$$C_1 C_4 |\tilde{\lambda}|^m \le C_2 C_3\, \tau_\varepsilon^m(A),$$
which, by letting $m$ tend to infinity, yields $|\tilde{\lambda}| \le \tau_\varepsilon(A)$, where we have used the fact that all the constants $C_1, C_2, C_3, C_4$ can be chosen independent of $\varepsilon$. Moreover, since $\varepsilon$ can be chosen arbitrarily small, we apply Lemma 2.1 to obtain $|\tilde{\lambda}| \le \tau(A)$, which immediately leads to $\kappa(A) \le \tau(A)$, as desired.

Acknowledgement.
This work is supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, under Project 17301017 and by the National Natural Science Foundation of China, under Project 61871343.
References

[1] R. Bapat and T. Raghavan. Nonnegative Matrices and Applications, New York: Cambridge University Press, 1997.
[2] A. Berman and R. Plemmons. Nonnegative Matrices in the Mathematical Sciences, Philadelphia, PA: Society for Industrial and Applied Mathematics, 1994.
[3] G. Birkhoff. Extensions of Jentzsch’s Theorem. Transactions of the American Mathematical Society, vol. 85, no. 1, pp. 219-227, 1957.
[4] A. Brouwer and W. Haemers. Spectra of Graphs, Springer, New York, 2012.
[5] F. Chung. Spectral Graph Theory, Providence, RI: Published for the Conference Board of the Mathematical Sciences by the American Mathematical Society, 1997.
[6] L. Dubois. Projective metrics and contraction principles for complex cones. Journal of the London Mathematical Society, vol. 79, no. 3, pp. 719-727, 2009.
[7] G. Han and B. Marcus. Analyticity of entropy rate of hidden Markov chains. IEEE Trans. Info. Theory, vol. 52, no. 12, pp. 5251-5266, 2006.
[8] G. Han, B. Marcus and Y. Peres. A note on a complex Hilbert metric with application to domain of analyticity for entropy rate of hidden Markov processes. Entropy of Hidden Markov Processes and Connections to Dynamical Systems, London Mathematical Society Lecture Note Series, vol. 385, pp. 98-116, 2011.
[9] G. Frobenius. Über Matrizen aus positiven Elementen. Sitzungsberichte Preussische Akademie der Wissenschaft, Berlin, pp. 471-476, 514-518, 1908, 1909.
[10] G. Frobenius. Über Matrizen aus nicht negativen Elementen. Sitzungsberichte Preussische Akademie der Wissenschaft, Berlin, pp. 456-477, 1912.
[11] E. Hopf. An inequality for positive linear integral operators. J. Math. Mech., vol. 12, no. 5, pp. 683-692, 1963.
[12] B. Lemmens and R. Nussbaum. Nonlinear Perron-Frobenius Theory, Cambridge University Press, 2012.
[13] D. Levin and Y. Peres. Markov Chains and Mixing Times, American Mathematical Society, 2nd Revised Edition, 2017.
[14] H. Minc. Nonnegative Matrices, New York: Wiley, 1988.
[15] R. Montenegro and P. Tetali. Mathematical Aspects of Mixing Times in Markov Chains, Foundations and Trends in Theoretical Computer Science, Now Publishers, 2006.
[16] A. Ostrowski. On positive matrices. Math. Ann., vol. 150, no. 3, pp. 276-284, 1963.
[17] A. Ostrowski. Positive matrices and functional analysis. Recent Advances in Matrix Theory, Madison: Univ. of Wisconsin Press, 1964.
[18] O. Perron. Grundlagen für eine Theorie des Jacobischen Kettenbruchalgorithmus. Math. Ann., vol. 64, pp. 1-76, 1907.
[19] U. Rothblum and C. Tan. Upper bounds on the maximum modulus of subdominant eigenvalues of nonnegative matrices. Linear Algebra Appl., vol. 66, pp. 45-86, 1985.
[20] H. Rugh. Cones and gauges in complex spaces: Spectral gaps and complex Perron-Frobenius theory. Annals of Mathematics, vol. 171, no. 3, 2010.
[21] E. Seneta.