Decision Theory and Large Deviations for Dynamical Hypotheses Test: Neyman-Pearson, Min-Max and Bayesian Tests
Hermes H. Ferreira, Artur O. Lopes‡ and Sílvia R.C. Lopes

Mathematics and Statistics Institute, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil

January 21, 2021
Abstract
We analyze hypotheses tests via classical results on large deviations for the case of two different Hölder Gibbs probabilities. The main difference from the classical hypotheses tests in Decision Theory is that here the two considered measures are singular with respect to each other. We analyze the classical Neyman-Pearson test, showing its optimality: this test becomes exponentially better when compared to other alternative tests, as the sample size goes to infinity. We also consider both the Min-Max and a certain type of Bayesian hypotheses tests. We shall consider these tests in the log-likelihood framework, by using several tools of Thermodynamic Formalism. Versions of Stein's Lemma and of the Chernoff information are also presented.
Keywords:
Decision Theory, Large Deviations Properties, Rejection Region, Neyman-Pearson Hypotheses Test, Min-Max Hypotheses Test, Bayesian Hypotheses Test, Thermodynamic Formalism, Gibbs Probabilities. MSC: 62C20, 62C10, 37D35.
The problem we are interested in here can be simply expressed as the following: there are two measures µ_0 and µ_1 that we know in advance. A data set is obtained by sampling, but we do not know, in advance, if it was originated from µ_0 or µ_1. Suppose it comes from µ_1. From this data set, we need to decide which one of the two generated this sampling data. A hypotheses test is a method that helps us to make such a choice. Taking large samples from the random process, we will be able to make the right decision, that is, to choose the alternative µ_1. Classical results on Large Deviations properties can estimate the risk of a wrong decision. From the Bayesian point of view, we should attach to µ_0 a probability π_0 and to µ_1 a probability π_1, where π_0 + π_1 = 1.

We shall extend the reasoning described on page 91 of Section VI in [9], where the author considers LDP properties. However, we point out that in [9] there is no dynamics involved in the process.

‡ Corresponding author. E-mail: [email protected]
LD for Dynamical Hypotheses Tests
We are interested in probabilities on the symbolic space Ω = {1, 2, ..., d}^ℕ. The shift transformation σ is given by σ(b_1, b_2, b_3, ..., b_n, ...) = (b_2, b_3, b_4, ..., b_n, ...).

A nice reference for Thermodynamic Formalism is [29] (see also [13] and [27]). For results on Large Deviations for Thermodynamic Formalism we refer the reader to [22], [25], [26] and [27]. Important references for basic results in Hypotheses tests are Sections 3.4 and 3.5 in [15], [1], [30], [9], [6], [5], [21], [11], [12], [14], where some of these references use Large Deviations techniques. For additional results on the Bayesian point of view in Thermodynamic Formalism, we refer the reader to [16], [23] and [28].

Invariant probabilities for the shift transformation correspond to stationary processes X_n, n ∈ ℕ, with values on {1, 2, ..., d}.

Given a Hölder potential A : Ω → ℝ, the pressure of A is defined as

P(A) = sup_{µ invariant for the shift} { ∫ A dµ + h(µ) },

where h(µ) is the Shannon-Kolmogorov entropy of the invariant probability measure µ. The unique probability which realizes such supremum is called the Hölder equilibrium probability for the potential A. It is known that P(A) is an analytic function of the potential A (see [29]). This property is quite useful for obtaining good large deviation properties (see, for instance, [26] and [27]).

Consider a Hölder continuous function log J : Ω → ℝ, where J > 0, such that

Σ_{a=1}^{d} J(a, b_1, b_2, b_3, ...) = 1, for all x = (b_1, b_2, b_3, ...) ∈ Ω.

In this case P(log J) = 0, and the Hölder equilibrium probability will be called a Hölder Gibbs equilibrium probability for log J. For instance, when log J = −log d, the corresponding equilibrium probability is the maximum entropy probability, which is the independent probability with weights 1/d.

Equilibrium probabilities play a central role in several problems in Statistical Physics and Information Theory. Hypotheses tests are relevant in all these domains (see, for instance, [10], [20], [31], [8], [33], [32] and [4]).

To each Hölder Gibbs probability µ one can associate a unique Hölder continuous function log J : Ω = {1, ..., d}^ℕ → ℝ (see [29]). We call J the Jacobian of µ. All Jacobian functions considered here are of Hölder class, and all measures are ergodic when log J is in the Hölder class. The Shannon-Kolmogorov entropy of such µ is given by the formula

h(µ) = −∫ log J dµ.
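For intuition (this is an illustrative sketch, not part of the paper), the normalization condition on the Jacobian and the entropy formula h(µ) = −∫ log J dµ can be checked in the simplest independent case, where J depends only on the first symbol; the weights below are hypothetical:

```python
import math

# Independent (Bernoulli) measure on d = 3 symbols: J(a, x) = p[a] depends
# only on the first symbol a.  Hypothetical weights:
p = [0.5, 0.3, 0.2]

# Normalization of a Jacobian: summing over the first symbol gives 1.
norm = sum(p)

# Entropy h(mu) = -integral log J dmu = -sum_a p[a] log p[a] here.
h = -sum(pa * math.log(pa) for pa in p)

# Maximum entropy case log J = -log d: weights 1/d and h = log d.
d = len(p)
h_max = -sum((1 / d) * math.log(1 / d) for _ in range(d))

print(norm, h, h_max)
```

For the weights above, h ≈ 1.03 < log 3 ≈ 1.10, consistent with the maximum entropy probability being the one with equal weights 1/d.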
Two different Hölder Gibbs probabilities are singular with respect to each other (see [29]). We point out that, in most of the cases in the classical setting of hypotheses tests, the researchers consider families of probabilities which are absolutely continuous with respect to each other. The set of Hölder Gibbs probabilities is dense in the set of invariant probabilities (see, for instance, [26]).

Here we will only consider probabilities µ on Ω of Hölder Gibbs type. The associated stochastic process {X_n}_{n ∈ ℕ}, taking values on {1, 2, ..., d}, is described by

P(X_1 = a_1, X_2 = a_2, ..., X_n = a_n) = µ(a_1, a_2, ..., a_n),

where a_1, a_2, ..., a_n denotes a general cylinder set in Ω.

For instance, if µ is a Markov measure associated to a row (line) stochastic matrix P = (p_{ij})_{i,j=1}^{d}, then the function J on the cylinder ij has the constant value π_i p_{ij} / π_j, where π = (π_1, π_2, ..., π_d) is the initial stationary vector for P. The references [3], [18] and [19] consider statistical tests for Markov chains. In the two by two case, we get that π_i p_{ij} / π_j = p_{ji}, for i, j = 1, 2. The Jacobian is the natural extension of the concept of stochastic matrix (see page 27 in [29], or Example 1 in [27]).

The paper is organized as follows: in Section 2 we present the basic idea of two simple hypotheses tests in the thermodynamical formalism sense, where the definitions of the type I and type II errors are stated. In Section 3, Large Deviation properties and some basic results are presented. The Neyman-Pearson hypotheses test and its main result are considered in Section 4. The Min-Max hypotheses test is presented in Section 5, while the Bayesian hypotheses test is in Section 6. Finally, Section 7 presents an example based on the Min-Max hypotheses test.
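The Markov case above can be checked numerically. In the sketch below (with a hypothetical 2 × 2 matrix), the Jacobian on two-cylinders is J(i, j) = π_i p_{ij} / π_j; summing over the first symbol gives 1, and in the two by two case J(i, j) = p_{ji}:

```python
import numpy as np

# Hypothetical 2x2 row-stochastic matrix P.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Stationary vector pi, with pi P = pi, normalized to sum 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# Jacobian on the two-cylinder (i, j): J(i, j) = pi_i p_ij / pi_j.
J = np.array([[pi[i] * P[i, j] / pi[j] for j in range(2)]
              for i in range(2)])

# Summing J over the first symbol gives 1 for each j (pi is stationary).
col_sums = J.sum(axis=0)

# Two-state chains satisfy detailed balance, so J(i, j) = p_ji here.
print(col_sums, np.allclose(J, P.T))
```

The second printed value is True precisely because every two-state stationary chain satisfies detailed balance π_i p_{ij} = π_j p_{ji}.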
In this section, we set the preliminaries and basic concepts needed to consider a simple hypotheses test in the thermodynamic formalism sense.

Let {X_n}_{n ∈ ℕ} be a stochastic process, defined by a probability on Ω = {1, ..., d}^ℕ. We can test two simple hypotheses in the following way:

H_0: {X_n}_{n ∈ ℕ} is described by µ_0 with Jacobian J_0;
H_1: {X_n}_{n ∈ ℕ} is described by µ_1 with Jacobian J_1.

The two measures µ_0 and µ_1 considered here are Hölder Gibbs probabilities and are, therefore, singular with respect to each other. As far as the authors know, this type of test has not been considered in the literature.

We want to decide which one of the two hypotheses is true from samples x_i = σ^i(x), i = 0, 1, 2, ..., n − 1, where x ∈ Ω is chosen at random according to a given measure µ. In Section 4 and Subsection 4.1, we will choose to fix such µ as µ_1. We are interested in the Large Deviations properties for such type of tests.

One can announce H_1 when H_0 is true. This is called a false alarm or type I error. The probability of a false alarm is usually denoted by α, which is called the test size. Therefore, the value α denotes P(Decide H_1 | H_0 is true) = α. We choose α such that 0 < α < 1.

One can also announce H_0 when H_1 is true. This is called a misspecification or type II error. The probability of misspecification is usually denoted by 1 − β. The detection rate is the value β ∈ (0, 1) given by β = P(Decide H_1 | H_1 is true). The value β is called the power of the test. In general, one would like to fix a value of β close to 1.

We do not know in advance which hypothesis, H_0 or H_1, is more likely to happen (at least for non-Bayesian tests). For a fixed α, we would like to choose a test that minimizes the total error probability, that is, we would like to maximize β. We point out that the Bayesian point of view will be explored in Section 6.

The test statistics will be associated to samples, and they are given in a log-likelihood form by

S_n(x) := (1/n) Σ_{i=0}^{n−1} log( J_0(x_i) / J_1(x_i) ).   (2.1)
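To get a feeling for the statistic (2.1), one can simulate it in the simplest independent (i.i.d.) case, where each Jacobian depends only on the first symbol. By the ergodic theorem, S_n concentrates near a positive value (a Kullback-Leibler divergence) under µ_0 and near a negative value under µ_1. The weights p0, p1 below are hypothetical:

```python
import math
import random

random.seed(0)

# Hypothetical weights of two independent measures on two symbols {0, 1};
# the Jacobians are J_j(x) = p_j[x_0].
p0 = [0.7, 0.3]
p1 = [0.4, 0.6]

def S_n(sample):
    """The statistic (2.1): (1/n) sum_i log(J0(x_i)/J1(x_i))."""
    return sum(math.log(p0[a] / p1[a]) for a in sample) / len(sample)

n = 20000
sample0 = random.choices([0, 1], weights=p0, k=n)   # drawn from mu_0
sample1 = random.choices([0, 1], weights=p1, k=n)   # drawn from mu_1

# Ergodic limits of S_n: KL(p0||p1) > 0 under mu_0, -KL(p1||p0) < 0 under mu_1.
kl01 = sum(p0[a] * math.log(p0[a] / p1[a]) for a in (0, 1))
kl10 = sum(p1[a] * math.log(p1[a] / p0[a]) for a in (0, 1))

print(S_n(sample0), kl01)
print(S_n(sample1), -kl10)
```

The separation of the two limits is what makes a threshold u_n between them an effective decision rule.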
A similar log-likelihood test was considered in Section VI of [9], but there the author uses no dynamics.

We shall introduce a sequence u_n, n ∈ ℕ, which will be necessary for the test. The rejection region R_n is defined as

R_n = { x ∈ Ω | S_n < u_n }, n ∈ ℕ.   (2.2)

We assume that

lim_{n→∞} u_n = E.   (2.3)

In the dynamical sense, the important quantity is the limit value E and not the specific values u_n. Given a sample of size n, if S_n < u_n we announce H_1, and if S_n > u_n we announce H_0. If S_n = u_n, from an asymptotic perspective, the choice does not matter.

In all tests considered here, the main point is to find the optimal choice for E (the limit of the u_n, n ∈ ℕ) and its relationship with the asymptotic values of

µ_1(S_n ≥ u_n) and µ_0(S_n ≤ u_n),   (2.4)

for large n.

For all tests, we shall consider large samples, and we are interested in minimizing the exponential rate of the probability of a wrong decision. In this direction, it will be necessary to study Large Deviations properties first. The results on Large Deviations properties needed here are presented in Section 3.

For the Neyman-Pearson hypotheses test (see Section 4 and Subsection 4.1), the large samples will be taken according to µ_1, and we want to estimate how small is the probability 1 − β_n of announcing H_0 when H_1 is true. In this case, we shall consider samples of the process S_n, n ∈ ℕ, which will be produced with the random choice given by µ_1 and not by µ_0.

We denote by β_n = µ_1(R_n) the power of the test at time n. We want to analyze the misspecification probability, or type II error, which will be denoted by 1 − β_n, from samples of size n.

The asymptotic values of the probabilities µ_1(S_n > u_n) and µ_0(S_n ≤ u_n), n ∈ ℕ, are the essential information we shall consider. The main issue here is: µ_1{x | S_n > u_n} is associated with a wrong decision, by announcing H_0 when H_1 is true. On the other hand, µ_0{x | S_n ≤ u_n} is associated with a wrong decision, by announcing H_1 when H_0 is true.

The main result of Section 4 and Subsection 4.1 is given by Theorem 4.1. We state this theorem below.
Theorem A.
The optimal choice for the value E, in the Neyman-Pearson hypotheses test, is given by

E = ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0.

Moreover, the decay rate for minimizing the probability of wrong decisions will be of order

e^{ −n ( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) }.

The value ∫ (log J_0 − log J_1) dµ_0 is also known as the Kullback-Leibler divergence. For some results on the Kullback-Leibler divergence, we refer the reader to [31], [18], [7] and [24].

In the other two tests (see Sections 5 and 6) we will also consider large samples S_n, but we have to compare the corresponding asymptotic laws according to µ_0 and also µ_1, in terms of the expression (2.4).

In Section 5 we shall need the pressure as a function of a real parameter t ∈ ℝ; more precisely, we shall need the function P_1 given by

t → P_1(t) = P( t (log J_0 − log J_1) + log J_1 ).

The main result in Section 5 is Theorem 5.1. We state this theorem below.

Theorem B.
In the Min-Max hypotheses test, the best choice of E will be Ẽ = 0. Moreover, the best decay rate for minimizing the probability of wrong decisions is given by e^{n r}, where r is the minimum of the pressure function P_1.

In Section 6 a certain type of Bayesian hypotheses test will be studied. Hypothesis H_0 will have probability π_0 and hypothesis H_1 will have probability π_1, where π_0 + π_1 = 1. In that section, we shall consider rejection regions of the form

R_{n,λ} = { x ∈ Ω | (1/n) Σ_{i=0}^{n−1} log J_λ(x_i) < u_n }, for n ∈ ℕ,   (2.5)

where

J_λ = λ J_0 + (1 − λ) J_1, for λ ∈ [0, 1].   (2.6)

We shall estimate π_1 µ_1(S_n > u_n) and π_0 µ_0(S_n ≤ u_n), for n ∈ ℕ. For this test, we shall exhibit the best value E_λ that minimizes the probability of a wrong decision (see (6.21), (6.22) and (6.24)), for each λ. We shall also find the best possible E_λ, producing the best decay rate, among all possible values of λ. We will also show a version of the Chernoff information in Section 6.

In this section, we shall present the Large Deviations properties which will be necessary for the proofs of our main results in Sections 4, 5 and 6. We shall be interested in estimating

µ_1(S_n > u_n) = µ_1(S_n − u_n > 0) and µ_0(S_n ≤ u_n) = µ_0(S_n − u_n ≤ 0),   (3.7)

where S_n is defined by (2.1) and u_n → E, when n goes to infinity. We are interested in Large Deviations for S_n − u_n; that is, given an interval (a, b) ⊂ ℝ, we want to estimate µ_j{ (S_n − u_n) ∈ (a, b) }, for j = 0, 1. Intervals of the type (−∞, 0) and (0, ∞) are particularly important.

It is a classical result (see, for instance, [29]) that

∫ (log J_0 − log J_1) dµ_0 > 0 > ∫ (log J_0 − log J_1) dµ_1.

We need to estimate

µ_j(S_n − u_n ∈ (a, b)) = P_{µ_j}( (1/n) Σ_{i=0}^{n−1} [ log( J_0(x_i) / J_1(x_i) ) − u_n ] ∈ (a, b) ),

for j = 0, 1. For that, consider

φ^j_n(t) := (1/n) log E_{µ_j}[ exp( t Σ_{i=0}^{n−1} ( log( J_0(x_i) / J_1(x_i) ) − u_n ) ) ],   (3.8)

for each n and each real value t, where E_{µ_j} denotes the expected value with respect to the probability µ_j, for j = 0, 1. Expression (3.8) is equivalent to

φ^j_n(t) = (1/n) log( ∫ e^{ t Σ_{i=0}^{n−1} (log J_0 − log J_1)(σ^i(x)) } dµ_j(x) ) − t u_n,   (3.9)

for j = 0, 1. It is known that

lim_{n→∞} (1/n) log( ∫ e^{ t Σ_{i=0}^{n−1} (log J_0 − log J_1)(σ^i(x)) } dµ_j(x) ) = P( t (log J_0 − log J_1) + log J_j ).   (3.10)

Hence, from the expressions (3.8), (3.9) and (3.10), one has

φ^j(t) := lim_{n→∞} φ^j_n(t) = P( t (log J_0 − log J_1) + log J_j ) − t E,   (3.11)

for j = 0, 1. Denote by P_j the function

t → P_j(t) := P( t (log J_0 − log J_1) + log J_j ),   (3.12)

for j = 0, 1. The function t → P_j(t) is convex, for j = 0, 1.
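In the independent (Bernoulli) case, where J_j(x) = p_j[x_0], the pressure of a potential depending only on the first symbol is the logarithm of a finite sum, so the functions of (3.12) have the closed form P_j(t) = log Σ_a p_0[a]^t p_1[a]^{−t} p_j[a]. The sketch below (hypothetical weights) checks P_j(0) = 0, the relation P_1(t) = P_0(t − 1) stated in (3.13) below, and the signs of the derivatives at t = 0:

```python
import math

# Hypothetical Bernoulli weights; J_j depends only on the first symbol.
p0 = [0.7, 0.3]
p1 = [0.4, 0.6]

def P(t, j):
    """P_j(t) = P(t(log J0 - log J1) + log J_j) = log sum_a p0[a]^t p1[a]^(-t) p_j[a]."""
    pj = p0 if j == 0 else p1
    return math.log(sum(p0[a] ** t * p1[a] ** (-t) * pj[a] for a in (0, 1)))

# Normalization: P_j(0) = P(log J_j) = 0.
print(P(0, 0), P(0, 1))

# The relation P_1(t) = P_0(t - 1), cf. (3.13).
print(P(0.3, 1), P(-0.7, 0))

# Derivatives at t = 0, cf. (3.14): KL(p0||p1) > 0 and -KL(p1||p0) < 0.
h = 1e-6
dP0 = (P(h, 0) - P(-h, 0)) / (2 * h)
dP1 = (P(h, 1) - P(-h, 1)) / (2 * h)
print(dP0, dP1)
```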
Figure 3.1 shows the graphs of P_0 (in blue) and P_1 (in orange), for the example in Section 7.

One can easily show that, for any t ∈ ℝ,

P_1(t) = P_0(t − 1).   (3.13)

The function t → P( t (log J_0 − log J_1) + log J_j ) − t E = P_j(t) − t E is also convex, for j = 0, 1. Note that P_j(0) = P( 0 · (log J_0 − log J_1) + log J_j ) = P(log J_j) = 0, for j = 0, 1. From Chapter 4 in [29], note that

(d/dt) P_j(t) |_{t=0} = (d/dt) ( P( t (log J_0 − log J_1) + log J_j ) ) |_{t=0} = ∫ (log J_0 − log J_1) dµ_j,   (3.14)

for j = 0, 1. In particular, (d/dt) P_0(t) |_{t=0} > 0 and (d/dt) P_1(t) |_{t=0} < 0, if µ_0 ≠ µ_1. Besides,

(d/dt) ( P( t (log J_0 − log J_1) + log J_j ) ) |_{t} = ∫ (log J_0 − log J_1) dµ_{j,t},   (3.15)

where µ_{j,t} is the equilibrium probability for t (log J_0 − log J_1) + log J_j (see [29]).

Figure 3.1: Graphs of P_0 (in blue) and P_1 (in orange) for the functions defined in (3.12). For these plots we use the data from the example in Section 7.

There exist values c^+ > 0 > c^− defined by

c^− = inf_{t ∈ ℝ} P_0'(t) and c^+ = sup_{t ∈ ℝ} P_0'(t).

From expression (3.13) we also have

c^− = inf_{t ∈ ℝ} P_1'(t) and c^+ = sup_{t ∈ ℝ} P_1'(t).

The main interest here is the following: for each value E, where u_n → E, one wants to estimate the asymptotic values of µ_1(S_n − u_n > 0) and µ_0(S_n − u_n ≤ 0).

• From expression (3.13), it holds

(d/dt) P_1(t) |_{t=1} = (d/dt) ( P( t (log J_0 − log J_1) + log J_1 ) ) |_{t=1} = (d/dt) P_0(t) |_{t=0} = ∫ (log J_0 − log J_1) dµ_0 > 0.   (3.16)

• It is also true that

(d/dt) P_1(t) |_{t=0} = (d/dt) P_0(t) |_{t=−1}.   (3.17)

Proposition 3.1.
For a fixed value E, it is true that:

(i) If E < ∫ (log J_0 − log J_1) dµ_0, then

lim_{n→∞} µ_0(S_n − u_n > 0) = 1.   (3.18)

(ii) If E > ∫ (log J_0 − log J_1) dµ_1, then

lim_{n→∞} µ_1(S_n − u_n ≤ 0) = 1.   (3.19)
Proof:
According to [26], [22] or [25], the deviation function I_j, for (S_n − u_n), n ∈ ℕ, and for the measure µ_j, is

I_j(x) = sup_t [ t x − φ^j(t) ] = sup_t [ t (x + E) − P( t (log J_0 − log J_1) + log J_j ) ] = sup_t [ t (x + E) − P_j(t) ],   (3.20)

for a fixed value E and j = 0, 1. Moreover,

µ_j{ x ∈ Ω | (S_n − u_n)(x) ∈ (a, b) } ∼ e^{ −n inf_{z ∈ (a,b)} I_j(z) },   (3.21)

for j = 0, 1. The function I_j(·) is a real analytic one.

Given E, take

x = v_j = −E + ( ∫ log J_0 dµ_j − ∫ log J_1 dµ_j ).   (3.22)

By using (3.20) and (3.14), the supremum is attained at t = 0. Then,

I_j(v_j) = 0, for j = 0, 1.   (3.23)

On the other hand, from (3.20) and (3.15), for x = 0, we get t_j^E, where

P_j'(t_j^E) = (d/dt) ( P( t (log J_0 − log J_1) + log J_j ) ) |_{t_j^E} = E = ∫ (log J_0 − log J_1) dµ_{j, t_j^E},   (3.24)

and

I_j(0) = t_j^E E − P( t_j^E (log J_0 − log J_1) + log J_j ) = t_j^E E − P_j(t_j^E)
= t_j^E ( ∫ (log J_0 − log J_1) dµ_{j, t_j^E} ) − [ t_j^E ( ∫ (log J_0 − log J_1) dµ_{j, t_j^E} ) + ∫ log J_j dµ_{j, t_j^E} + h(µ_{j, t_j^E}) ]
= −[ ∫ log J_j dµ_{j, t_j^E} + h(µ_{j, t_j^E}) ] > 0,   (3.25)

if µ_{j, t_j^E} ≠ µ_j. It follows from (3.13) that t_0^E = t_1^E − 1 and P_1(t_1^E) = P_0(t_0^E). From this, it follows that I_1(0) = t_1^E E − P_1(t_1^E) and

I_0(0) = t_0^E E − P_0(t_0^E) = I_1(0) − E.   (3.26)

Item (i): From expression (3.21), with (a, b) = (0, ∞), if E < ∫ (log J_0 − log J_1) dµ_0, that is, v_0 > 0, then

lim_{n→∞} µ_0(S_n − u_n > 0) = 1,

since −inf_{x>0} I_0(x) = 0. Hence, expression (3.18) is true.

Now, if v_1 < 0, as I_1(v_1) = 0, from (3.21), with (a, b) = (0, ∞),

lim_{n→∞} (1/n) log( µ_1(S_n − u_n > 0) ) = lim_{n→∞} (1/n) log( 1 − β_n ) = −inf_{x>0} I_1(x) = −I_1(0) < 0.   (3.27)

Therefore,

1 − β_n = µ_1(S_n − u_n > 0) ∼ e^{ −n inf{ I_1(x) | x ≥ 0 } } = e^{ −n I_1(0) } → 0.

This corresponds to −E + ∫ (log J_0 − log J_1) dµ_1 < 0. Note that if E = 0, then v_1 < 0 and

µ_1(S_n > 0) ∼ e^{ −n I_1(0) },   (3.28)

where I_1(0) > 0.

Item (ii): On the other hand, if E > ∫ (log J_0 − log J_1) dµ_1, that is, v_1 < 0, then

lim_{n→∞} µ_1(S_n − u_n ≤ 0) = 1,   (3.29)

since −inf_{x<0} I_1(x) = 0. Expression (3.19) is true by (3.21), with (a, b) = (−∞, 0).

Now, if v_0 > 0, as I_0(v_0) = 0, from (3.21),

lim_{n→∞} (1/n) log( µ_0(S_n − u_n ≤ 0) ) = −inf_{x<0} I_0(x) = −I_0(0) < 0.

Then, we obtain

µ_0(S_n − u_n ≤ 0) ∼ e^{ −n inf{ I_0(x) | x ≤ 0 } } = e^{ −n I_0(0) } → 0.   □

In this section, we shall deal with the dynamical Neyman-Pearson hypotheses test. Its optimality property will be shown in Subsection 4.1.

From the ergodicity of µ_0, we have

S_n → ∫ log( J_0 / J_1 ) dµ_0 = ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0, µ_0-almost surely,   (4.1)

whenever hypothesis H_0 is true (that is, the samples are obtained from the measure µ_0). We point out that the right-hand side of (4.1) is a relative entropy expression (see [8]).

Analyzing the asymptotic values of µ_0(S_n ≤ u_n), associated to the type I error value 0 < α < 1, when n → ∞, it follows that

u_n → ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 > 0.

This is true because, otherwise, µ_0(S_n < u_n) would converge to 1 or to 0.

We set, in the present section, a sequence u_n such that

lim_{n→∞} u_n = E = ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0.   (4.2)
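In the same Bernoulli sketch as before (hypothetical weights), the rate function I_1 of (3.20) can be evaluated numerically as a Legendre transform over a grid in t; with E given by (4.2), one recovers I_1(v_1) = 0, cf. (3.23), and I_1(0) = E, the exponent that will appear in (4.6):

```python
import math

# Hypothetical Bernoulli weights standing in for the Jacobians J_0, J_1.
p0 = [0.7, 0.3]
p1 = [0.4, 0.6]

def P1(t):
    """Pressure P_1(t) = log sum_a p0[a]^t p1[a]^(1-t) (independent case)."""
    return math.log(sum(p0[a] ** t * p1[a] ** (1 - t) for a in (0, 1)))

# The limit (4.2): E = KL(p0||p1) = integral (log J0 - log J1) dmu_0.
E = sum(p0[a] * math.log(p0[a] / p1[a]) for a in (0, 1))

def I1(x):
    """Rate function (3.20): I_1(x) = sup_t [t(x + E) - P_1(t)], on a grid in t."""
    return max(t * (x + E) - P1(t) for t in (i / 1000 for i in range(-3000, 3001)))

# v_1 = -E + integral (log J0 - log J1) dmu_1 is a zero of I_1.
v1 = -E - sum(p1[a] * math.log(p1[a] / p0[a]) for a in (0, 1))
print(I1(v1))

# For this E the supremum defining I_1(0) is attained at t = 1, and I_1(0) = E.
print(I1(0), E)
```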
Later, in Subsection 4.1, we will show that such E is optimal.

The probability of misspecification is given by P(Decide H_0 | H_1 is true) = 1 − β. We will consider large samples of size n, and we shall apply classical results of large deviations theory. In the Neyman-Pearson (NP, for short) hypotheses test we shall consider the rejection region given by (2.2). According to this test, at time n (from samples of size n from the measure µ_1), we want to estimate 1 − β_n.

The sequence u_n should be consistent with the value α, in the sense that it should preserve P(Decide H_1 | H_0 is true) = α; that is, we define u_n by

µ_0(S_n < u_n) = α.   (4.3)

As mentioned before,

1 − β_n = µ_1(S_n ≥ u_n) = µ_1(S_n − u_n ≥ 0).

From expression (3.22), when E = ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 (see expression (4.2)), we get

v_1 = −E + ( ∫ log J_0 dµ_1 − ∫ log J_1 dµ_1 ) = −( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) + ( ∫ log J_0 dµ_1 − ∫ log J_1 dµ_1 ) < 0.   (4.4)

Then,

v_1 < 0 and I_1(v_1) = 0.   (4.5)

When E = ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0, we get from (3.20) and (3.16) that t_1^E = 1. From expressions (3.27), (3.20), (3.21), (3.23) and (3.25), we get

lim_{n→∞} (1/n) log( µ_1(S_n − u_n > 0) ) = −I_1(0) = −E = −( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) < 0.   (4.6)

The expression (4.6) shows that the Neyman-Pearson hypotheses test works well, since the probability of misspecification 1 − β_n goes exponentially fast to 0. The above expression can be seen as a version of Stein's Lemma (see, for instance, [8]).

In this subsection, we shall prove the optimality property for the Neyman-Pearson hypotheses test (NP, for short). One wonders if there is another alternative test that provides a smaller mean value error when the sample size n goes to infinity. We shall prove that the Neyman-Pearson hypotheses test is optimal in the sense of the largest power test. From the dynamical point of view, the limit value E is the most important issue, and not the specific values u_n. We shall show, in the theorem below, that E = ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 is the best choice for E. For each possible value E we consider a sequence u_n such that expression (2.3) holds. One may ask if there exist better values for E.

Theorem 4.1.
The optimal choice for the value E, in the Neyman-Pearson hypotheses test, is given by

E = ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0.

Moreover, the decay rate for minimizing the probability of wrong decisions will be of order

e^{ −n ( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) }.

Proof:
We denote any other alternative hypotheses test by A. The sequence u_n is defined by

µ_0(S_n < u_n) = α,

which describes the false alarm. We choose α such that 0 < α < 1. We are interested in announcing H_1 when H_1 is true, that is, in optimizing the value β_n = µ_1(S_n − u_n < 0), n ∈ ℕ. The rejection region for the NP hypotheses test, at time n, is denoted by R^{NP}_n and defined as

R^{NP}_n = { x ∈ Ω | S_n(x) = (1/n) Σ_{i=0}^{n−1} log( J_0(x_i) / J_1(x_i) ) < u_n }.

Similarly, the rejection region for another alternative hypotheses test A, at time n, will be denoted by R^A_n, where

R^A_n = { x ∈ Ω | S_n(x) = (1/n) Σ_{i=0}^{n−1} log( J_0(x_i) / J_1(x_i) ) < ũ_n },

for another sequence ũ_n > 0. To the test A corresponds a different choice of the limit value, ũ_n → G. Assume that ũ_n is such that

α̃ = α̃_n = µ_0(S_n < ũ_n) ≤ α = α_n = µ_0(S_n < u_n),   (4.7)

for all n. This means

−µ_0(S_n < u_n) + µ_0(S_n < ũ_n) ≤ 0.   (4.8)

Note that ũ_n ≤ u_n, for all n. The inequality (4.8) means we are assuming that the alternative hypotheses test A has a smaller or equal false alarm probability. In other words, we do not want to increase the size of the test. Each choice of the sequence ũ_n, n ∈ ℕ, will be understood as an alternative possible hypotheses test. For each alternative hypotheses test A, we obtain the associated value

β̃_n := µ_1(S_n < ũ_n),

for any n ∈ ℕ. Notice that β̃_n = µ_1(R^A_n) and β_n = µ_1(R^{NP}_n). The optimality property means to compare the NP test with another test A based on the above sequence R^A_n, n ∈ ℕ, using a sequence ũ_n satisfying (4.7) and such that

G := lim_{n→∞} ũ_n < lim_{n→∞} u_n = ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0.   (4.9)

We shall assume G > 0. We point out that, for the alternative hypotheses test A, the associated value β̃_n satisfies µ_1({ S_n − ũ_n < 0 }) = µ_1(R^A_n) = β̃_n. We want to show that

β̃_n < β_n = µ_1(R^{NP}_n) = µ_1({ S_n − u_n < 0 }),

for large n. More precisely, we want to show that

lim_{n→∞} (1 − β_n) / (1 − β̃_n) = 0.   (4.10)

The expression (4.10) will guarantee that the NP test is exponentially better than the other alternative test A. It is known that

1 − β_n ∼ e^{ −n I_1(0) } = e^{ −n ( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) }.

Now we will show that, for large n,

µ_1(R^{NP}_n) = β_n = µ_1(S_n < u_n) ≥ β̃_n = µ_1(S_n < ũ_n) = µ_1(R^A_n),   (4.11)

or, equivalently, that

1 − β_n = µ_1(S_n ≥ u_n) ≤ 1 − β̃_n = µ_1(S_n > ũ_n).   (4.12)

Note that

µ_1(S_n > ũ_n) = µ_1(S_n − u_n > ũ_n − u_n).   (4.13)

As u_n → ( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) and ũ_n → G, from expressions (4.13), (3.20)-(3.21) and (3.25), we get the following limit:

lim_{n→∞} (1/n) log( µ_1(S_n > ũ_n) ) = −inf{ I_1(x) | x ≥ G − ( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) }.

From the expression (4.9), we obtain

G_1 := G − ( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) < 0.   (4.14)

Observe that v_1 < G_1. Indeed, as G > 0, from the expression (4.14), one has

v_1 = ( ∫ log J_0 dµ_1 − ∫ log J_1 dµ_1 ) − ( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) < G − ( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) = G_1.

Then, from the expression (4.14), since I_1 vanishes at v_1 and is increasing on [v_1, ∞), and G_1 < 0, we obtain

I_1(G_1) = I_1( G − ( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) ) < ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 = I_1(0).

Figure 4.1: The rate function I_1(·) at the points v_1, G_1 = G − ( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ), and zero.

Therefore,

lim_{n→∞} (1/n) log( µ_1(S_n > ũ_n) ) = −I_1(G_1) > −I_1(0) = −( ∫ log J_0 dµ_0 − ∫ log J_1 dµ_0 ) = lim_{n→∞} (1/n) log( µ_1(S_n > u_n) ).

This proves expressions (4.12) and (4.10). □
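The Stein-type decay (4.6) can be observed exactly in the i.i.d. sketch: for two-symbol Bernoulli measures, the type II error µ_1(S_n ≥ u_n) is a binomial tail, computable in closed form. Below (hypothetical weights, threshold u_n ≡ E), the empirical exponent −(1/n) log µ_1(S_n ≥ E) slowly approaches E = ∫ (log J_0 − log J_1) dµ_0:

```python
import math

# Hypothetical Bernoulli weights; per-symbol increments log(J0/J1).
p0 = [0.7, 0.3]
p1 = [0.4, 0.6]
l = [math.log(p0[a] / p1[a]) for a in (0, 1)]

# E = KL(p0||p1), the limit threshold of (4.2).
E = sum(p0[a] * l[a] for a in (0, 1))

def type2_error(n, u):
    """Exact mu_1(S_n >= u) for the two-symbol i.i.d. case (a binomial tail)."""
    total = 0.0
    for k in range(n + 1):                    # k = number of symbols equal to 1
        s = ((n - k) * l[0] + k * l[1]) / n   # common value of S_n on such samples
        if s >= u:
            total += math.comb(n, k) * p1[1] ** k * p1[0] ** (n - k)
    return total

for n in (100, 400):
    err = type2_error(n, E)
    print(n, err, -math.log(err) / n)   # the last column slowly approaches E
```

The convergence of the exponent is slow (corrections of order (log n)/n), which is typical for large deviation estimates.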
Figure 4.1 shows the large deviation rate function I_1(·) at the points v_1, G_1 and zero, where the point G_1 is given by (4.14).

In this section, we shall present the Min-Max hypotheses test in the dynamical sense. Once more, for S_n, n ∈ ℕ, given by (2.1), µ_1{x | S_n > u_n} is associated to a wrong decision, by announcing H_0 when H_1 is true, while µ_0{x | S_n ≤ u_n} is associated to a wrong decision, by announcing H_1 when H_0 is true. For each value E we consider a sequence u_n such that (2.3) holds. In the same way as before, the limit value E is more important than the specific values u_n.

In this section we shall consider large deviation properties for both µ_1{x | S_n > u_n} and µ_0{x | S_n ≤ u_n}. For the Min-Max hypotheses test, we need loss functions for a false alarm. This is a classical ingredient in Hypotheses Tests (see [8] or [30]). Here we consider the case when the loss functions for a false alarm for H_0 and H_1 are constants, respectively given by y_0 and y_1. The main question here is, once again: what is the best value for E? The idea behind the use of loss functions is that wrong decisions can have a cost. We are interested in finding some optimality property in this setting. From the large deviation properties for this setting, for each choice of the limit value E we shall obtain C_1(E) = C_1 ≥ 0 and C_0(E) = C_0 ≥ 0, such that

µ_1{x | S_n > u_n} ∼ e^{−C_1 n} and µ_0{x | S_n ≤ u_n} ∼ e^{−C_0 n}.
In the Min-Max hypotheses test, we have to compare the asymptotic values of the maximum of

y_1 µ_1{x | S_n > u_n} ∼ y_1 e^{−C_1 n} and y_0 µ_0{x | S_n ≤ u_n} ∼ y_0 e^{−C_0 n},   (5.1)

which takes into account the loss constants y_0 and y_1, for each value E. Finally, we shall consider the optimal Ẽ among all possible values of E, that is, the value attaining the minimum of the function E → max{C_0(E), C_1(E)}. This means that in the Min-Max hypotheses test we are interested in minimizing the maximal cost of wrong decisions, by taking either H_0 or H_1. Later, we will show that Ẽ = 0.

Given an interval (a, b) ⊂ ℝ, we will be interested simultaneously in Large Deviation properties for S_n − u_n, where u_n → E. Then, we have to estimate both µ_0{(S_n − u_n) ∈ (a, b)} and µ_1{(S_n − u_n) ∈ (a, b)}. This problem was addressed in Section 3. The main properties we shall need in the Min-Max hypotheses test are related to the values of the deviation functions I_0 and I_1. From Section 3, for each value E, we obtain the corresponding values t_0^E and t_1^E, such that

(d/dt) P_0(t_0^E) = E and (d/dt) P_1(t_1^E) = E.

From the expression (3.13), we obtain t_0^E = t_1^E − 1. And, from expression (3.25), we get

I_j(0) = t_j^E E − P_j(t_j^E),

for j = 0, 1. Recall, from expression (3.26), that I_0(0) = I_1(0) − E.

Denote by c^+ > 0 the limit, when t → ∞, of the derivative

(d/dt) P_1(t) = (d/dt) ( P( t (log J_0 − log J_1) + log J_1 ) ).

The value c^+ is the maximal value of the ergodic optimization for the potential log J_0 − log J_1 (see [2] and also Section 7). When E → c^+, we get that

t_1^E E − P_1(t_1^E) = t_1^E (d/dt) P_1(t) |_{t = t_1^E} − P_1(t_1^E) → ∞,

since t_1^E → ∞. On the other hand, denote by c^− < 0 the limit, when t → −∞, of the derivative (d/dt) P_1(t). The value c^− is the minimal value of the ergodic optimization for the potential log J_0 − log J_1 (see [2] and also Section 7). When E → c^−, we get that

t_1^E E − P_1(t_1^E) = t_1^E (d/dt) P_1(t) |_{t = t_1^E} − P_1(t_1^E) → ∞,

since t_1^E → −∞.

Remark 5.1. Observe that both I_0(0) ≥ 0 and I_1(0) ≥ 0 depend on E. Only the values of E such that

I_1(0) = t_1^E E − P_1(t_1^E) ≥ 0 and I_0(0) = t_0^E E − P_0(t_0^E) ≥ 0

are relevant for the Min-Max hypotheses test analysis. Indeed, if one of the two conditions does not hold, then y_1 µ_1{x | S_n > u_n} ∼ y_1 e^{−C_1 n} or y_0 µ_0{x | S_n ≤ u_n} ∼ y_0 e^{−C_0 n} will be large, and this value of E should be discarded in the search for the optimal Ẽ.
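In the Bernoulli sketch, the quantity r of Theorem B, the minimum of the pressure function P_1, can be found numerically; the minimizer lies in (0, 1), where P_1'(t*) = 0 (that is, E = 0), and −r is the Chernoff information of the two weight vectors (hypothetical weights below):

```python
import math

# Hypothetical Bernoulli weights.
p0 = [0.7, 0.3]
p1 = [0.4, 0.6]

def P1(t):
    """Pressure P_1(t) = log sum_a p0[a]^t p1[a]^(1-t) (independent case)."""
    return math.log(sum(p0[a] ** t * p1[a] ** (1 - t) for a in (0, 1)))

# P_1(0) = P_1(1) = 0 and P_1 is strictly convex, so its minimum r < 0 is
# attained at some t* in (0, 1), where P_1'(t*) = 0 (that is, E = 0).
ts = [i / 10000 for i in range(10001)]
t_star = min(ts, key=P1)
r = P1(t_star)

h = 1e-6
slope = (P1(t_star + h) - P1(t_star - h)) / (2 * h)
print(t_star, r, slope)   # -r is the Chernoff information of p0 and p1
```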
0, then I (0) = 0 and, if v > I (0) = 0.If C > C , then the dominant part of the maximum of (5.1) is y e − C n . On the otherhand, if C > C , then the dominant part of the maximum of the same expression (5.1)is y e − C n . The specific values of y and y are irrelevant for this test and we just haveto look for the minimum value ˜ E of the function E → r ( E ) := max { inf { I ( x ) | x ≤ } , inf { I ( x ) | x ≥ } } , but only for values E such that both I (0) > I (0) > I (0) = t E E − P ( t E ) = I (0) − E . Then,we just have to find the minimum value ˜ E of the function E → inf { I (0) , I (0) } = inf { I (0) , I (0) − E } , (5.2)for values E , such that both I (0) > I (0) > µ { x | S n > u n } ∼ e − n inf { I ( x ) | x ≥ } , which depends oneach value of E through the limit (2.3). These values µ { x | S n > u n } will be maximumwhen inf { I ( x ) | x ≥ } is minimum. In the same way, according to Section 3, for eachvalue of E , we have that µ { x | S n > u n } ∼ e − n inf { I ( x ) | x ≤ } . These values µ { x | S n ≤ u n } will be maximum when inf { I ( x ) | x ≤ } is minimum.In the search for the optimal ˜ E , we consider several different cases according to theposition of E in the set ( c − , c + ). • Case 1: c − < E < R (log J − log J ) dµ < < R (log J − log J ) dµ < c + . In thissituation, inf { I ( x ) | x ≤ } = E − P ( t E ) and inf { I ( x ) | x ≥ } = 0. Hence, v > v >
0. Therefore, such values of E should be discarded according to Remark5.1. • Case 2: c + < R (log J − log J ) dµ < < R (log J − log J ) dµ < E < c + . In thissituation, inf { I ( x ) | x ≤ } = 0 and inf { I ( x ) | x ≥ } = E − P ( t E ). Hence, v < v <
0. Therefore, such values of E should be discarded according to Remark5.1. • Case 3: R (log J − log J ) dµ ≤ E ≤ R (log J − log J ) dµ . As r is a continuousfunction it follows from the above that there exists a minimum ˜ E for the functiondescribed by (5.2) restricted to this interval of values of E . This corresponds to v < v > Case 3 , t E range in an increasing monotonous way from 0 to 1.From Section 3 we obtain inf { I ( x ) | x ≤ } = t E E − P ( t E ) and inf { I ( x ) | x ≥ } = t E E − P ( t E ). When R (log J − log J ) dµ ≤ E <
0, from (5.2) we obtain r ( E ) = t E E − P ( t E ) − E = ( t E − E − P ( t E ) , (5.3)and, when R (log J − log J ) dµ ≥ E >
0, from (5.2) we obtain r ( E ) = t E E − P ( t E ) . (5.4)6 LD for Dynamical Hypotheses Tests
We shall analyze the following two functions: for ∫(log J₁ − log J₀) dµ₀ ≤ E ≤ ∫(log J₁ − log J₀) dµ₁,

E → P(t_E) − t_E E + E = P(t_E) − (t_E − 1) P′(t_E)   (5.5)

and

E → P(t_E) − t_E E = P(t_E) − t_E P′(t_E).   (5.6)

Note that (5.5) and (5.6) are, respectively, −I₁(0) and −I₀(0). Observe that (5.6) is a monotonous decreasing function. Indeed,

d/dE [ P(t_E) − t_E P′(t_E) ] = −t_E (t_E)′ P″(t_E) < 0,

since (t_E)′ > 0 and t_E > 0. In the same way, (5.5) is monotonous increasing:

d/dE [ P(t_E) − t_E P′(t_E) + P′(t_E) ] = (1 − t_E)(t_E)′ P″(t_E) > 0,

since (1 − t_E) > 0 and (t_E)′ > 0. When E = ∫(log J₁ − log J₀) dµ₀, which is a negative value, we have t_E = 0 and P(t_E) − t_E E + E = P(0) + E = E. Hence, expression (5.5) is equal to ∫(log J₁ − log J₀) dµ₀ < 0. On the other hand, when E = ∫(log J₁ − log J₀) dµ₁, which is a positive value, we have t_E = 1 and P(t_E) = P(1) = 0. Hence, P(t_E) − t_E E + E = −E + E = 0. This describes the values of the function given by (5.5) on the interval [∫(log J₁ − log J₀) dµ₀, ∫(log J₁ − log J₀) dµ₁].
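The case analysis above can be explored numerically. The following is a minimal sketch (not the paper's code; the matrices, parametrization and function names are ours) that, for two Markov Gibbs measures given by column stochastic matrices, treats P(t) as the log spectral radius of the transfer matrix of t log J₁ + (1 − t) log J₀, so that P(0) = P(1) = 0, finds t_E with P′(t_E) = E by bisection, and evaluates the two rates I₀(0) = t_E E − P(t_E) and I₁(0) = I₀(0) − E.

```python
import math

# Column stochastic matrices (the ones used in the example of Section 7,
# here only as an illustration); J_k(ij) is the (i, j) entry of P_k.
P0 = [[1/4, 1/2], [3/4, 1/2]]
P1 = [[2/3, 1/5], [1/3, 4/5]]

def spectral_radius(M):
    # Perron root of a nonnegative 2x2 matrix via the quadratic formula.
    tr = M[0][0] + M[1][1]
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return (tr + math.sqrt(tr*tr - 4*det)) / 2

def pressure(t):
    # P(t) = log spectral radius of the transfer matrix of
    # t*log J1 + (1-t)*log J0; note P(0) = P(1) = 0.
    M = [[P0[i][j]**(1-t) * P1[i][j]**t for j in range(2)] for i in range(2)]
    return math.log(spectral_radius(M))

def dpressure(t, h=1e-5):
    # numerical derivative P'(t) (central difference)
    return (pressure(t+h) - pressure(t-h)) / (2*h)

def t_of_E(E, lo=-30.0, hi=30.0):
    # P is convex, so P' is increasing: bisection finds t_E with P'(t_E) = E.
    for _ in range(200):
        mid = (lo + hi) / 2
        if dpressure(mid) < E:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

E = 0.0                      # the admissible interval of Case 3 contains 0
tE = t_of_E(E)
I0 = tE*E - pressure(tE)     # rate of the error under mu_0
I1 = I0 - E                  # rate of the error under mu_1
```

At E = 0 the two rates coincide and equal −min_t P(t); moving E in either direction increases one rate and decreases the other, which is the trade-off behind (5.3) and (5.4).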
Given the Jacobians J₀ and J₁ and λ ∈ [0, 1], consider their convex combination given by

J_λ = λ J₁ + (1 − λ) J₀.   (6.1)

We point out that J_λ is also a Hölder Jacobian and our setting has a different nature than the one mentioned in section 10 of [8]. Making an analogy, for being consistent with [8], we should consider a convex combination of the logarithms of the Jacobians J₀ and J₁; instead, we did not use the logarithm function. In [8] the probabilities are on finite sets. We denote by µ_λ the Hölder Gibbs probability associated with log J_λ. For λ ∈ [0, 1] and n ∈ ℕ, set

S^λ_n(x) := (1/n) Σ_{i=0}^{n−1} log( J_λ(σ^i(x)) ).

The rejection regions are of the form

R_{n,λ} = { x ∈ Ω | (1/n) Σ_{i=0}^{n−1} log J_λ(σ^i(x)) < u_n },  for n ∈ ℕ.   (6.2)

The a priori probability of H₀ is given by π₀ and the a priori probability of H₁ is given by π₁ = 1 − π₀. We shall consider for the Bayesian hypotheses test a sequence u_n → E, n ∈ ℕ, in the same way as in (2.3). The expression

π₁ µ₁{x | S^λ_n > u_n}

represents the mean value probability of a wrong decision by announcing H₀ when H₁ is true, while the expression

π₀ µ₀{x | S^λ_n ≤ u_n}

represents the mean value probability of a wrong decision by announcing H₁ when H₀ is true.

In the Bayes hypotheses test, for each value of λ ∈ [0, 1], we have to choose the best sequence u_n, n ∈ ℕ; this means to choose the E that asymptotically minimizes

π₀ µ₀{x | S^λ_n ≤ u_n} + π₁ µ₁{x | S^λ_n > u_n}, as n → ∞.   (6.3)

We shall denote by E_λ the best value of E, for each value λ. We will show later the explicit expression for such E_λ. We can also ask: among the different values of λ, which one determines the best E_λ, in the sense of getting the best rate? We shall denote by Ẽ, λ̃ the optimal values, among all possible values of E and λ, for the asymptotic minimizer of (6.3). In our reasoning, we want to find the best choice of λ̃ for which a best choice of Ẽ is possible.
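As a quick sanity check on (6.1) (a toy illustration, with the matrices of Section 7 standing in for the Jacobians; the code and names are ours), a convex combination of two normalized Jacobians is again normalized: in matrix form, every column of λP₁ + (1 − λ)P₀ still sums to 1, so J_λ indeed defines a transfer operator with P(log J_λ) = 0.

```python
# Convex combination (6.1) of two Jacobians stored as column stochastic
# matrices: J_lambda = lambda*J1 + (1-lambda)*J0 is again normalized.
P0 = [[1/4, 1/2], [3/4, 1/2]]
P1 = [[2/3, 1/5], [1/3, 4/5]]

def convex_combination(lam):
    # entrywise convex combination lam*P1 + (1-lam)*P0
    return [[lam*P1[i][j] + (1-lam)*P0[i][j] for j in range(2)]
            for i in range(2)]

def column_sums(M):
    return [M[0][j] + M[1][j] for j in range(2)]

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    Plam = convex_combination(lam)
    # each column still sums to 1: J_lambda is a normalized Jacobian
    assert all(abs(s - 1.0) < 1e-12 for s in column_sums(Plam))
```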
In the same way as before, it will follow that, for each choice of λ and limit value E, we shall obtain C₀(E, λ) = C₀ ≥ 0 and C₁(E, λ) = C₁ ≥ 0, such that

µ₀{x | S^λ_n ≤ u_n} ∼ e^{−C₀ n} and µ₁{x | S^λ_n > u_n} ∼ e^{−C₁ n}.   (6.4)

From the expression (6.4), for each E and λ we get that the asymptotic of (6.3) is of order

e^{−min{C₀(E,λ), C₁(E,λ)} n}.   (6.5)

When C₀(E, λ) = 0 or C₁(E, λ) = 0 we do not get the optimal values of E and λ for the asymptotic in (6.3). Such values of E and λ should be discarded. We will show that for the optimal solution it is required that C₀(E, λ) = C₁(E, λ). This optimal value is called the Chernoff information (we refer the reader to the end of the proof of Theorem 11.9.1 in [8], which considers a different setting).

The optimal choice of E and λ will be described by expressions (6.21), (6.22) and (6.24) at the end of this section. The optimal value C(E, λ) will have a relative entropy expression given by (6.22).

We shall be interested in estimating

µ₁(S^λ_n > u_n) = µ₁(S^λ_n − u_n > 0), and also µ₀(S^λ_n ≤ u_n) = µ₀(S^λ_n − u_n ≤ 0),

where u_n → E. This requires to estimate

µ_j( (S^λ_n − u_n) ∈ (a, b) ) = µ_j( (1/n) Σ_{i=0}^{n−1} [ log(J_λ(σ^i(x))) − u_n ] ∈ (a, b) ),

for j = 0, 1. Consider

φ^j_{n,λ}(t) := (1/n) log( ∫ e^{t Σ_{i=1}^{n} log J_λ(σ^i(x))} dµ_j(x) ) − t u_n,

for each n, λ and real value t. For j = 0, 1, λ ∈ [0, 1] and t ∈ ℝ,

lim_{n→∞} (1/n) log( ∫ e^{t Σ_{i=1}^{n} log J_λ(σ^i(x))} dµ_j(x) ) = P(t log J_λ + log J_j).

Then, for j = 0, 1, λ ∈ [0, 1] and t ∈ ℝ, denote

φ^j_λ(t) := lim_{n→∞} φ^j_{n,λ}(t) = P(t log J_λ + log J_j) − t E.

We denote by P_{j,λ}, for j = 0, 1 and λ ∈ [0, 1], the function

t → P_{j,λ}(t) = P(t log J_λ + log J_j),

which is convex and also monotone decreasing in t. Moreover, P_{j,λ}(0) = P(0 · log J_λ + log J_j) = 0, for j = 0, 1, and

d/dt P_{j,λ}(t)|_{t=0} = d/dt P(t log J_λ + log J_j)|_{t=0} = ∫ log J_λ dµ_j.   (6.6)

More generally, for any t ∈ ℝ,

d/dt P(t log J_λ + log J_j)|_t = ∫ log J_λ dµ^{t,λ}_j < 0,   (6.7)

where µ^{t,λ}_j is the equilibrium probability for t log J_λ + log J_j. The deviation function I^λ_j for (S^λ_n − u_n), n ∈ ℕ, and for µ_j, j = 0,
1, is

I^λ_j(x) = sup_t [ t x − φ^j_λ(t) ] = sup_t [ t(x + E) − P(t log J_λ + log J_j) ] = sup_t [ t(x + E) − P_{j,λ}(t) ].   (6.8)

If x = v^λ_j = v^{E,λ}_j = −E + ∫ log J_λ dµ_j, then the supremum is attained at t = 0 and

I^λ_j(v^{E,λ}_j) = 0.   (6.9)

The suitable values v^{E,λ}_j are the ones such that v^{E,λ}_1 < 0 and v^{E,λ}_0 > 0. For each fixed λ, this will require that

E ≥ ∫ log J_λ dµ₁ and E ≤ ∫ log J_λ dµ₀.

We will show there exist values λ such that it is possible to find a non-trivial interval for E. We just have to find values λ such that

∫ log J_λ dµ₀ > ∫ log J_λ dµ₁.   (6.10)

We claim that there are values of λ such that the expression (6.10) holds. Indeed,

λ = 0 ⇒ ∫ log J₀ dµ₀ − ∫ log J₀ dµ₁ > 0, while λ = 1 ⇒ ∫ log J₁ dµ₀ − ∫ log J₁ dµ₁ < 0.

There exists a value λ such that

∫ log J_λ dµ₀ − ∫ log J_λ dµ₁ = 0,   (6.11)

that is, there exists a value λ such that

∫ log( λJ₁ + (1 − λ)J₀ ) dµ₀ = ∫ log( λJ₁ + (1 − λ)J₀ ) dµ₁.

In fact, consider the functions

g_j(λ) = ∫ log( λJ₁ + (1 − λ)J₀ ) dµ_j, for j = 0, 1.

Then g₀(0) > g₁(0) and g₀(1) < g₁(1). From this fact, the claim follows. The functions g_j, for j = 0, 1, are concave, and g₀ is a decreasing function while g₁ is an increasing one. Besides, the point where the two graphs coincide is unique.
Therefore, there exists a value λ_s such that, for 0 ≤ λ < λ_s, a non-trivial interval of suitable parameters E exists and it holds that

0 > ∫ log J_λ dµ₀ > E > ∫ log J_λ dµ₁.   (6.12)

In this case, for such parameters E, we have v^{E,λ}_1 < 0 and v^{E,λ}_0 > 0. From now on we assume that E is in the interval described by the expression (6.12).

When x = 0, for λ ∈ [0, λ_s] and for j = 0, 1, we get t^{E,λ}_j ∈ ℝ, for which

P′_{j,λ}(t^{E,λ}_j) = d/dt P(t log J_λ + log J_j)|_{t^{E,λ}_j} = E = ∫ log J_λ dµ_{t^{E,λ}_j},   (6.13)

where µ_{t^{E,λ}_j} is the equilibrium probability for t^{E,λ}_j log J_λ + log J_j, and E satisfies (6.12). From the convexity argument and expression (6.7), for fixed λ and for j = 0, 1, the value t^{E,λ}_j is monotonous increasing on E. That is, for fixed λ and for j = 0, 1, the function E → t^{E,λ}_j satisfies

(d/dE) t^{E,λ}_j > 0.   (6.14)

Denote by I^{E,λ}_j(0) the value

I^{E,λ}_j(0) := t^{E,λ}_j E − P(t^{E,λ}_j log J_λ + log J_j) = t^{E,λ}_j E − P_{j,λ}(t^{E,λ}_j)
= −[ ∫ log J_j dµ_{t^{E,λ}_j} + h(µ_{t^{E,λ}_j}) ] = −[ ∫ log J_j dµ_{t^{E,λ}_j} − ∫ log J^{j,λ,E} dµ_{t^{E,λ}_j} ] > 0,   (6.15)

where J^{j,λ,E} is the Jacobian of the invariant probability µ_{t^{E,λ}_j}, which is, in its turn, the equilibrium probability for t^{E,λ}_j log J_λ + log J_j, for j = 0, 1 and λ ∈ [0, λ_s].

Using expressions (6.15) and (6.8), when u_n → E, one can rewrite both probabilities mentioned in (6.3) as

µ₀{x | S^λ_n ≤ u_n} ∼ e^{−I^{E,λ}_0(0) n} and µ₁{x | S^λ_n > u_n} ∼ e^{−I^{E,λ}_1(0) n}.   (6.16)

Hence, in the notation of (6.4), we get C_j(λ, E) = I^{E,λ}_j(0), for j = 0, 1. For fixed λ and j = 0, 1,

d/dE [ t^{E,λ}_j P′_{j,λ}(t^{E,λ}_j) − P_{j,λ}(t^{E,λ}_j) ] = t^{E,λ}_j (t^{E,λ}_j)′ P″_{j,λ}(t^{E,λ}_j).   (6.17)

Since

P(0 · log J_λ + log J₀) = 0, d/dt P_{0,λ}(t)|_{t=0} = ∫ log J_λ dµ₀ and 0 > ∫ log J_λ dµ₀ > E,

from the pressure convexity we get t^{E,λ}_0 < 0, and

E → I^{E,λ}_0(0) decreases with E,   (6.18)

for each fixed λ.

Figure 6.1: Graph of the function R(λ) = I^{E_λ,λ}(0), when 0 ≤ λ ≤ λ_s, using the stochastic matrices P_j, for j = 0, 1, from the example in Section 7.

As ∫ log J_λ dµ₁ < E, we obtain t^{E,λ}_1 > 0. From this property, one can show, in a similar way, that

E → I^{E,λ}_1(0) increases with E,   (6.19)

for each fixed λ.

For fixed 0 ≤ λ ≤ λ_s, consider the functions

E ∈ [ ∫ log J_λ dµ₁, ∫ log J_λ dµ₀ ] → y_a(E) = I^{E,λ}_0(0)

and

E ∈ [ ∫ log J_λ dµ₁, ∫ log J_λ dµ₀ ] → y_b(E) = I^{E,λ}_1(0).

As y_a(∫ log J_λ dµ₀) = 0, it follows, from the decreasing monotonicity (see expression (6.18)), that y_a(∫ log J_λ dµ₁) > 0. In the same way, as y_b(∫ log J_λ dµ₁) = 0, it follows, from the increasing monotonicity (see expression (6.19)), that y_b(∫ log J_λ dµ₀) > 0. Hence, for each λ, 0 ≤ λ ≤ λ_s, there exists a point E_λ such that I^{E_λ,λ}_0(0) = I^{E_λ,λ}_1(0). The value E_λ determines the best rate for the parameter λ.

Note also that this point E_λ belongs to the interval [∫ log J_λ dµ₁, ∫ log J_λ dµ₀]. Furthermore, if λ_p < λ_q ≤ λ_s, then, as g₀ is a decreasing function and g₁ is an increasing one, we have that [∫ log J_{λ_q} dµ₁, ∫ log J_{λ_q} dµ₀] ⊂ [∫ log J_{λ_p} dµ₁, ∫ log J_{λ_p} dµ₀]. Moreover, E_{λ_s} = ∫ log J_{λ_s} dµ₀ = ∫ log J_{λ_s} dµ₁.

From the expression (6.5), property (6.16) and the fact that I^{E,λ}_0(0) decreases with E (while I^{E,λ}_1(0) increases with E), for each fixed λ, we get that the best value of E is E = E_λ (see the definition above). That is, when

t^{E_λ,λ}_0 E_λ − P_{0,λ}(t^{E_λ,λ}_0) = I^{E_λ,λ}_0(0) = I^{E_λ,λ}_1(0) = t^{E_λ,λ}_1 E_λ − P_{1,λ}(t^{E_λ,λ}_1).   (6.20)
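The crossing point E_λ solving (6.20) can be located numerically. Here is a hedged sketch for λ = 0 with the matrices of Section 7 (the code, names and normalizations are ours): it computes P_{j,0}(t) = P(t log J₀ + log J_j) as a log spectral radius, evaluates the rate I^{E,0}_j(0) = sup_t [tE − P_{j,0}(t)] by ternary search on the concave objective, and bisects on E until the two rates agree.

```python
import math

P0 = [[1/4, 1/2], [3/4, 1/2]]   # J0 on cylinders, column stochastic
P1 = [[2/3, 1/5], [1/3, 4/5]]   # J1 on cylinders

def rho(M):
    # Perron root of a nonnegative 2x2 matrix
    tr = M[0][0] + M[1][1]
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return (tr + math.sqrt(tr*tr - 4*det)) / 2

def pressure_j(j, t):
    # P_{j,0}(t) = P(t log J0 + log J_j): log spectral radius of the
    # transfer matrix with entries J0^t * J_j; note P_{j,0}(0) = 0.
    Pj = P0 if j == 0 else P1
    M = [[P0[i][k]**t * Pj[i][k] for k in range(2)] for i in range(2)]
    return math.log(rho(M))

def rate(j, E, lo=-40.0, hi=40.0):
    # I_j^{E,0}(0) = sup_t [ t*E - P_{j,0}(t) ]; the objective is concave
    # in t, so a ternary search finds the supremum.
    for _ in range(200):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if m1*E - pressure_j(j, m1) < m2*E - pressure_j(j, m2):
            lo = m1
        else:
            hi = m2
    t = (lo + hi) / 2
    return t*E - pressure_j(j, t)

def mean(j, h=1e-6):
    # int log J0 dmu_j = P'_{j,0}(0), via a central difference
    return (pressure_j(j, h) - pressure_j(j, -h)) / (2*h)

# Bisect on E for the crossing in (6.20): the j = 0 rate decreases in E
# while the j = 1 rate increases, so their difference changes sign once.
lo_E, hi_E = mean(1) + 1e-6, mean(0) - 1e-6
for _ in range(100):
    E = (lo_E + hi_E) / 2
    if rate(0, E) > rate(1, E):
        lo_E = E
    else:
        hi_E = E
E_star = (lo_E + hi_E) / 2
```

The bisection relies precisely on (6.18) and (6.19); at E_star the common value of the two rates is the exponent of (6.24).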
Figure 6.2: Graphs of the functions λ → ∫ log J_λ dµ₀ (in blue) and λ → ∫ log J_λ dµ₁ (in orange), together with the graph of the values E_λ (in green), as a function of λ, when 0 ≤ λ ≤ λ_s. The stochastic matrices P_j, for j = 0, 1, are from the example in Section 7.

Then, we need to find the value λ̃ which maximizes λ → I^{E_λ,λ}_0(0), simultaneously with (6.20) holding, among all λ ∈ [0, λ_s]. Note that when λ = λ_s we have both I^{E_λ,λ}_0(0) = 0 = I^{E_λ,λ}_1(0), which is not a good choice.

The function λ → I^{E_λ,λ}(0) is monotonous decreasing on λ ∈ [0, λ_s]. Therefore, the largest I^{E_λ,λ}_0(0) occurs when λ = 0. This means that we need to take J_λ = J₀.

The value E₀ belongs to the interval [∫ log J₀ dµ₁, ∫ log J₀ dµ₀] and, by definition, I^{E₀,0}_0(0) = I^{E₀,0}_1(0). The value E₀ is determined (see expression (6.13)) by the equation

P′_{0,0}(t^{E₀,0}_0) = E₀ = P′_{1,0}(t^{E₀,0}_1).   (6.21)

From expression (6.15), the corresponding value of I^{E₀,0}_0(0) is given by

I^{E₀,0}_0(0) = t^{E₀,0}_0 E₀ − P_{0,0}(t^{E₀,0}_0) = −[ ∫ log J₀ dµ_{t^{E₀,0}_0} − ∫ log J^{0,0,E₀} dµ_{t^{E₀,0}_0} ] > 0.   (6.22)

Expression (6.22) is a relative entropy.

Finally, the best choice for the hypotheses test, in the Thermodynamic Formalism sense, will be when the rejection region is of the form

R_{n,0} = { x ∈ Ω | (1/n) Σ_{i=0}^{n−1} log J₀(σ^i(x)) < u_n }, n ∈ ℕ,   (6.23)

with u_n → E₀, where E₀ satisfies (6.21). In this case,

π₀ µ₀{x | S_n ≤ u_n} + π₁ µ₁{x | S_n > u_n} ∼ e^{−I^{E₀,0}_0(0) n}   (6.24)

will describe the best possible rate, among all λ, for minimizing the probability of a wrong decision.

In this section, we present an example for the Min-Max hypotheses test. Recall that, given a two-by-two line stochastic matrix P, the value of the Jacobian J on the cylinder ij has the constant value π_i p_{ij} / π_j, where π = (π₁, π₂) is the initial stationary vector for P; these values define a column stochastic matrix. We consider the case where P₀ and P₁ are described by the following two column stochastic matrices:

P₀ := [ 1/4  1/2 ; 3/4  1/2 ]   and   P₁ := [ 2/3  1/5 ; 1/3  4/5 ].

In this case, the best rate is described by (5.7), as it was explained in Section 5. We shall present now some explicit values for this example.

Using techniques given in [17], one can show that the maximizing probability (see [2]) for log J₀ − log J₁ is an orbit of period two. More precisely, m(log J₀ − log J₁) = (1/2) log(45/8), which is a value close to 0.8636, and this gives c⁺. For m(log J₁ − log J₀) = log(8/3), which is realized by an orbit of period 1, we get c⁻ = −log(8/3) ≈ −0.9808. These values require the estimation of the asymptotic behaviour of the derivatives P′₀ and P′₁, that is, of c⁺ and c⁻. We will show the domain of the Legendre transform for P₀, which is the same as for P₁.

Define K = log(J₀/J₁). It is true that lim_{t→+∞} P′₀(t) = lim_{t→+∞} P′₁(t), and this limit value (see [2]) is given by m(K), as long as there exists a function u, the so-called calibrated subaction, such that

max_y [ K(yx) + u(yx) ] = m(K) + u(x), for all x ∈ Ω,

where the maximum is taken over the preimages yx of x. We claim that this equation is satisfied when

m(K) = (1/2) log(45/8).

We refer the reader to the reference [2] for the max-plus algebra properties in Ergodic Optimization. The proof of the claim is as follows. Writing K(ij) for the value of K on the cylinder ij, define the matrix, with ε = −∞, whose rows and columns are indexed by the cylinders 11, 12, 21, 22,

W := [ K(11) ε K(11) ε ; K(12) ε K(12) ε ; ε K(21) ε K(21) ; ε K(22) ε K(22) ],

so that the entry in row bc and column ab is K(bc) when the transition ab → bc is allowed, and ε otherwise. Now,

m(K) = max_{n ≥ 1} Tr⊕(W^⊗n)/n

is simply the maximum cyclic mean in the directed graph which has these transition costs. Here, we denote by Tr⊕ the max-plus trace and by W^⊗n the n-th max-plus power of W. It is easy to see that the maximal cyclic mean in such a graph is given by the mean

( K(12) + K(21) )/2 = (1/2) log( (5/2)(9/4) ) = (1/2) log(45/8),

attained at the cycle corresponding to the periodic orbit of period two. In fact, from this, one checks that the function u depending on the first symbol of x, given by u = (u(1), u(2)) = (0, K(12) − m(K)), is a calibrated subaction; that is, the calibrated subaction is determined by K and m(K).
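The max-plus computation above can be checked mechanically. The sketch below (ours, under the stated convention that K = log(J₀/J₁) lives on the four cylinders and W encodes the transitions ab → bc) computes the maximum cyclic mean via max-plus powers and verifies the calibrated subaction equation.

```python
import math

NEG = float('-inf')  # the max-plus "epsilon"

# K = log(J0/J1) on the cylinders 11, 12, 21, 22, with J0, J1 read off
# the column stochastic matrices of this section.
J0 = {'11': 1/4, '12': 1/2, '21': 3/4, '22': 1/2}
J1 = {'11': 2/3, '12': 1/5, '21': 1/3, '22': 4/5}
K = {c: math.log(J0[c] / J1[c]) for c in J0}

# Row r = target cylinder, column a = source cylinder: an edge a -> r is
# allowed when the second symbol of a equals the first symbol of r, and
# its cost is K(r).  This reproduces the sparsity pattern of W.
nodes = ['11', '12', '21', '22']
W = [[K[r] if a[1] == r[0] else NEG for a in nodes] for r in nodes]

def maxplus_mul(A, B):
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_cycle_mean(M):
    # m = max_n Tr_maxplus(M^{otimes n}) / n; in a 4-node graph it is
    # enough to look at cycles of length at most 4.
    best, power = NEG, M
    for n in range(1, len(M) + 1):
        trace = max(power[i][i] for i in range(len(M)))
        best = max(best, trace / n)
        power = maxplus_mul(power, M)
    return best

m_K = max_cycle_mean(W)  # the period-two orbit: (K(12)+K(21))/2
m_negK = max_cycle_mean([[-x if x != NEG else NEG for x in row] for row in W])

# Verify the calibrated subaction u depending on the first symbol:
u = {'1': 0.0, '2': K['12'] - m_K}
for s in '12':
    lhs = max(K[i + s] + u[i] for i in '12')  # max over preimages
    assert abs(lhs - (m_K + u[s])) < 1e-12
```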
Hence, we conclude that lim_{t→+∞} P′_j(t) = m(K) = (1/2) log(45/8), for j = 0, 1. We can also compute lim_{t→−∞} P′_j(t) = −m(−K) by following the same procedure, but now with the matrix −W (here we only change the sign of the finite entries of W). In this way, we get

m(−K) = log(8/3),

attained at the fixed point of the shift on the symbol 1, and the function v, depending on the first symbol of x and given by

v = ( −K(11) − m(−K), −K(12) − m(−K) ),

satisfies max_{σ(y) = x} { −K(y) + v(y) } = m(−K) + v(x). This means that

lim_{t→−∞} P′_j(t) = −log(8/3), for j = 0, 1.

This concludes the example. ♦

The method described in this section can be adapted to other cases.
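As a numerical cross-check of the two limits just computed (again an illustration: the parametrization P(t) = P(log J₀ + tK), with K = log(J₀/J₁), is our choice of normalization, so that P(0) = P(−1) = 0), the derivative of the pressure at large |t| approaches m(K) and −m(−K).

```python
import math

P0 = [[1/4, 1/2], [3/4, 1/2]]
P1 = [[2/3, 1/5], [1/3, 4/5]]

def rho(M):
    # Perron root of a nonnegative 2x2 matrix
    tr = M[0][0] + M[1][1]
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return (tr + math.sqrt(tr*tr - 4*det)) / 2

def pressure(t):
    # P(t) = P(log J0 + t*K) with K = log(J0/J1): transfer matrix with
    # entries J0^(1+t) * J1^(-t); note P(0) = 0 and P(-1) = 0.
    M = [[P0[i][j]**(1+t) * P1[i][j]**(-t) for j in range(2)]
         for i in range(2)]
    return math.log(rho(M))

def dpressure(t, h=1e-4):
    return (pressure(t+h) - pressure(t-h)) / (2*h)

c_plus = dpressure(60.0)    # approaches m(K) = (1/2) log(45/8)
c_minus = dpressure(-60.0)  # approaches -m(-K) = -log(8/3)
```

The convergence of P′(t) to the two extreme cycle means is geometric in |t|, so moderate values of t already give the limits to high accuracy.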
Acknowledgments
H.H. Ferreira was supported by CAPES-Brazil. The research of A.O. Lopes and S.R.C. Lopes was partially supported by CNPq-Brazil.
References

[1] F. Abramovich and Y. Ritov, Statistical Theory: A Concise Introduction, Chapman-Hall, 2013.
[2] A. Baraviera, R. Leplaideur and A. O. Lopes, Ergodic Optimization, zero temperature and the Max-Plus algebra, 23º Colóquio Brasileiro de Matemática, IMPA, Rio de Janeiro, 2013.
[3] R. R. Bahadur, Large deviations of the maximum likelihood estimate in the Markov chain case, in J. S. Rostag, M. H. Rizvi and D. Siegmund, editors, Recent advances in statistics, 273-283, Academic Press, Boston, 1983.
[4] T. Benoist, V. Jaksic, Y. Pautrat and C.-A. Pillet, On Entropy Production of Repeated Quantum Measurements I. General Theory, Commun. Math. Phys., Vol. 357, 77-123, 2018.
[5] L. D. Broemeling, Bayesian Inference for Stochastic Processes, CRC Press, 2018.
[6] D. Bohle, A. Marynych and M. Meiners, A Fundamental Problem of Hypothesis testing with finite e-commerce, arXiv, 2020.
[7] J.-R. Chazottes and D. Gabrielli, Large deviations for empirical entropies of g-measures, Nonlinearity, Vol. 18, 2545-2563, 2005.
[8] T. Cover and J. Thomas, Elements of Information Theory, second edition, New York: Wiley Press, 2006.
[9] J. A. Bucklew, Large Deviation Techniques in Decision, Simulation and Estimation, New York: Wiley, 1990.
[10] A. Caticha, Entropic Physics: Lectures on Probability, Entropy and Statistical Physics, arXiv, 2021.
[11] G. B. Cybis, S. R. C. Lopes and H. P. Pinheiro, Power of the Likelihood Ratio Test for Models of DNA Base Substitution, Journal of Applied Statistics, Vol. 38, 2723-2737, 2011.
[12] M. Denker and W. Woyczynski, Introductory Statistics and Random Phenomena: Uncertainty, Complexity and Chaotic Behavior in Engineering and Science, New York: Birkhäuser, 2012.
[13] M. Denker, Basics of Thermodynamics, Lecture Notes, Penn State Univ., 2011.
[14] R. Dakovic, M. Denker and M. Gordin, Circular unitary ensembles: parametric models and their asymptotic maximum likelihood estimates, Journal of Mathematical Sciences, Vol. 219(5), 714-730, 2016.
[15] A. Dembo and O. Zeitouni, Large Deviation Techniques and Applications, New York: Springer Verlag, 2010.
[16] A. C. D. van Enter, A. O. Lopes, S. R. C. Lopes and J. K. Mengue, How to get the Bayesian a posteriori probability from an a priori probability via Thermodynamic Formalism for plans; the connection to Disordered Systems, to appear, 2020.
[17] H. H. Ferreira, A. O. Lopes and E. R. Oliveira, An iteration process for approximating subactions, to appear in "Modeling, Dynamics, Optimization and Bioeconomics IV", editors: Alberto Pinto and David Zilberman, Springer Proceedings in Mathematics and Statistics, Springer Verlag, 2020.
[18] V. Girardin and P. Regnault, Escort distributions minimizing the Kullback-Leibler divergence for a large deviations principle and tests of entropy level, Ann. Inst. Stat. Math., Vol. 68, 439-468, 2016.
[19] V. Girardin, L. Lhote and P. Regnault, Different Closed-Form Expressions for Generalized Entropy Rates of Markov Chains, Methodology and Computing in Applied Probability, Vol. 21, 1431-1452, 2019.
[20] S. Ihara, Information Theory for continuous systems, World Scientific, 1993.
[21] M. J. Karling, S. R. C. Lopes and R. M. de Souza, A Bayesian Approach for Estimating the Parameters of an α-Stable Distribution, Journal of Statistical Computation and Simulation, available online at https://doi.org/10.1080/00949655.2020.1865958, 2020.
[22] Y. Kifer, Large Deviations in Dynamical Systems and Stochastic processes, TAMS, Vol. 321(2), 505-524, 1990.
[23] A. O. Lopes, S. R. C. Lopes and P. Varandas, Bayes posterior convergence for loss functions via almost additive Thermodynamic Formalism, arXiv, 2020.
[24] A. O. Lopes and J. K. Mengue, On information gain, Kullback-Leibler divergence, entropy production and the involution kernel, arXiv, 2020.
[25] A. O. Lopes, Entropy, Pressure and Large Deviation, in Cellular Automata, Dynamical Systems and Neural Networks, E. Goles and S. Martinez (eds.), Kluwer, Massachusetts, 79-146, 1994.
[26] A. O. Lopes, Entropy and Large Deviation, Nonlinearity, Vol. 3(2), 527-546, 1990.
[27] A. O. Lopes, Thermodynamic Formalism, Maximizing Probabilities and Large Deviations, Preprint, UFRGS.
[28] K. McGoff, S. Mukherjee and A. Nobel, Gibbs posterior convergence and Thermodynamic formalism, arXiv, 2019.
[29] W. Parry and M. Pollicott, Zeta functions and the periodic orbit structure of hyperbolic dynamics, Astérisque, Vol. 187-188, 1990.
[30] V. K. Rohatgi, An Introduction to Probability Theory and Mathematical Statistics, New York: Wiley, 1976.
[31] T. Sagawa, Entropy, divergence and majorization in classical and quantum theory, arXiv, 2020.
[32] Y. Suhov and M. Kelbert, Probability and Statistics by Example. II, Cambridge: Cambridge University Press, 2014.
[33] W. von der Linden, V. Dose and U. von Toussaint.