Tests on components of density mixtures
Florent AUTIN∗ (Université Aix-Marseille 1) and Christophe POUET† (Ecole Centrale Marseille)

September 17, 2018

Abstract
This paper deals with statistical tests on the components of mixture densities. We propose to test whether the densities of two independent samples of independent random variables $Y_1, \dots, Y_n$ and $Z_1, \dots, Z_n$ result from the same mixture of $M$ components or not. We provide a test procedure which is proved to be asymptotically optimal in the minimax setting. We extensively discuss the connection between the mixing weights and the performance of the testing procedure and illustrate it with numerical examples. This link had never been clearly exposed until now.

∗ Address: C.M.I., 39 rue F. Joliot-Curie, 13453 Marseille Cedex 13. Université Aix-Marseille 1. FRANCE. Email: [email protected]
† Address: Ecole Centrale Marseille, 38 rue F. Joliot-Curie, 13451 Marseille Cedex 20. FRANCE. Email: [email protected]
2010 Mathematics Subject Classification: Primary: 62C20, 62G10, 62G20; Secondary: 30H25, 42C40.
Key words and phrases: Besov spaces, minimax theory, mixture model, nonparametric tests, wavelet decomposition.

1 Introduction

For more than 20 years, the mixture model has attracted a great deal of attention. This is due to its ease of interpretation: each component can be viewed as a distinct group in the data. This model has been widely applied in several areas such as finance, economics, biology, astronomy and survey methods. Most of the theoretical results in the literature deal with the estimation of the components or of the mixing weights. There are two types of mixture models: the most popular one has fixed mixing weights and the other one has varying mixing weights. On the one hand, many statisticians have been interested in estimating the mixing weights. For example, Hall [12], Titterington [24] and Hall and Titterington [13] have considered nonparametric estimation of the mixing weights.

Let $Y_1, \dots, Y_n$ and $Z_1, \dots, Z_n$ be two independent $n$-samples of independent random variables.
We propose to study in this paper whether these two samples of random variables come from the same mixture of $M$ unknown densities $p_u$ ($1 \le u \le M$) or not. We assume that the mixing weights associated with each observation are available to the statistician. In Butucea and Tribouley [4], procedures are proposed to test whether two $n$-samples of i.i.d. variables have a common probability density. Their setting is equivalent to the case $M = 1$ in our mixture problem. Here the problem appears more complex, since the two samples are not based on random variables with the same marginal densities. Our results show that there is no loss in the minimax rate compared to the simpler case studied by Butucea and Tribouley [4]. In Section 2 we provide an asymptotically minimax test which is based on wavelet methods, and we prove the dependence between the mixing weights and the constants appearing in the definition of the minimax rate of testing. Until now this phenomenon had never been studied; it is extensively discussed in this paper. In addition to our theoretical results, some numerical experiments are given in Section 3 in order to illustrate the strong connection between the mixing weights and the performance of the test. As expected, our test performs very well for various mixture models. Sections 4 and 5 are respectively devoted to possible extensions of this work and to proofs of the main results.

1.2 The wavelet setting

Here we introduce the wavelet framework that will be used. We first recall that wavelets have often been applied in different mathematical fields, such as approximation theory, signal analysis and statistics. In particular, many recent statistical works on estimation (see among others Autin [1], Donoho et al. [9], Cohen et al. [5]) and on hypothesis testing (see Spokoiny [23]) use the wavelet setting to provide efficient estimators and tests.
There are many explanations for the huge interest in the wavelet setting. One of them is that wavelet bases are localized both in frequency and in time, contrary to the classical Fourier basis, which is only localized in frequency. As a consequence, the wavelet setting is well adapted to describing local characteristics of a signal to be reconstructed.

Let $\phi$ and $\psi$ be two compactly supported functions of $L_2(\mathbb{R})$ and denote, for all $j \in \mathbb{N}$, $k \in \mathbb{Z}$ and $x \in \mathbb{R}$,
$$\phi_{jk}(x) = 2^{j/2}\phi(2^j x - k) \quad \text{and} \quad \psi_{jk}(x) = 2^{j/2}\psi(2^j x - k).$$
Suppose that for any $j \in \mathbb{N}$:
• $\{\phi_{jk}, \psi_{j'k};\ j' \ge j;\ k \in \mathbb{Z}\}$ constitutes an orthonormal basis of $L_2(\mathbb{R})$,
• $\mathrm{support}(\phi) \cup \mathrm{support}(\psi) \subset [-L, L[$ for some $L > 0$.
The function $\phi$ is called the scaling function and $\psi$ the associated wavelet. Any function $h$ in $L_2(\mathbb{R})$ can be represented as
$$h(t) = \sum_{k \in \mathbb{Z}} \alpha_{jk}\,\phi_{jk}(t) + \sum_{j' \ge j} \sum_{k \in \mathbb{Z}} \beta_{j'k}\,\psi_{j'k}(t),$$
where for all $j \in \mathbb{N}$, $j' \ge j$, $k \in \mathbb{Z}$:
• $\alpha_{jk} = \int_{I_{jk}} h(t)\,\phi_{jk}(t)\,dt$ and $\beta_{j'k} = \int_{I_{j'k}} h(t)\,\psi_{j'k}(t)\,dt$,
• $I_{jk} = \left\{x \in \mathbb{R};\ -L \le 2^j x - k < L\right\} = \left[\frac{k-L}{2^j}, \frac{k+L}{2^j}\right[$.
Let us now describe the testing problem we focus on.

1.3 Mathematical description of the testing problem
Let $Y_1, \dots, Y_n$ be a sample of independent random variables with unknown marginal densities
$$f_i(\cdot) = \sum_{u=1}^{M} \omega_u(i)\, p_u(\cdot), \quad 1 \le i \le n,$$
and let $Z_1, \dots, Z_n$ be another sample of independent random variables with unknown marginal densities
$$g_i(\cdot) = \sum_{u=1}^{M} \sigma_u(i)\, q_u(\cdot), \quad 1 \le i \le n.$$
We also assume that the two samples are independent. Here and in what follows, we suppose that the mixing weights $(\omega_u(i),\ 1 \le u \le M,\ 1 \le i \le n)$ and $(\sigma_u(i),\ 1 \le u \le M,\ 1 \le i \le n)$ are known to the statistician and satisfy
• for all $(u, i) \in \{1, \dots, M\} \times \{1, \dots, n\}$, $\min(\omega_u(i), \sigma_u(i)) \ge 0$,
• for all $i \in \{1, \dots, n\}$, $\sum_{u=1}^{M} \omega_u(i) = \sum_{u=1}^{M} \sigma_u(i) = 1$,
whereas the densities $p_u$ and $q_u$ ($1 \le u \le M$) are unknown. Let us denote $\vec{p} = (p_1, \dots, p_M)$ and $\vec{q} = (q_1, \dots, q_M)$.

We study in this paper a nonparametric procedure to test whether the samples result from the same mixture of densities. Let $\mathcal{D}$ denote the set of all probability densities with respect to the Lebesgue measure on $\mathbb{R}$. For any real number $R > 0$, we define
$$\Theta_0(R) = \left\{(\vec{p}, \vec{q}) : \forall u \in \{1, \dots, M\},\ p_u = q_u \in S(R)\right\},$$
where $S(R) = \mathcal{D} \cap L_\infty(R) \cap L_2(R)$. We consider the following null hypothesis:
$$H_0 : (\vec{p}, \vec{q}) \in \Theta_0(R).$$
For a given $C > 0$, we define
$$\Theta_1(R, C, n, s) = \Big\{(\vec{p}, \vec{q}) : \forall u \in \{1, \dots, M\},\ p_u - q_u \in B_{s,2,\infty}(R),\ \exists u \in \{1, \dots, M\},\ (p_u, q_u) \in \Lambda_n(R, C)\Big\},$$
where $\Lambda_n(R, C) = \big\{(p, q) \in (\mathcal{D} \cap L_\infty(R))^2,\ \|p - q\|_2 \ge C r_n\big\}$ for a sequence $r_n$ tending to 0 as $n$ goes to infinity, and $B_{s,2,\infty}(R)$ is the $R$-ball of a functional space defined below. We consider the following alternative:
$$H_1 : (\vec{p}, \vec{q}) \in \Theta_1(R, C, n, s).$$
As usual in the nonparametric setting, we focus on a large class of functions having some regularity so as to derive optimal properties. For the chosen wavelet basis, the space $B_{s,2,\infty}(R)$ represents the $R$-ball of the so-called Besov body, which is composed of all functions $h \in L_2(\mathbb{R})$ whose sequence of wavelet coefficients $(\alpha_{jk}, \beta_{j'k},\ j \in \mathbb{N},\ j' \ge j,\ k \in \mathbb{Z})$ satisfies
$$\sup_{j \in \mathbb{N}} 2^{2js} \sum_{j' \ge j} \sum_{k \in \mathbb{Z}} \beta_{j'k}^2 \le R^2.$$

The minimax setting

In this paragraph we recall the minimax approach, which is often used to evaluate the performance of testing procedures. Given the sum of the probability errors, say $\gamma \in [0, 1[$, the aim is to exhibit the smallest possible rate $r_n$ separating the null hypothesis and the alternative. This rate $r_n$ is the best possible rate separating at least one of the $M$ couples of density components $p_u$ and $q_u$. It is usually called the minimax rate. Let us recall the classical definition of the separation rate.

Definition 1.1
Let $0 < \gamma < 1$. We say that $r_n$ is the minimax rate separating $H_0$ and $H_1$ for our testing problem at level $\gamma$ if the two following statements are satisfied:
1. there exist a sequence of test procedures $\Delta_n^*$ and a constant $C_\gamma$ such that
$$\limsup_{n \to \infty} \left( \sup_{(\vec{p},\vec{q}) \in \Theta_0(R)} P_{\vec{p},\vec{q}}(\Delta_n^* = 1) + \sup_{(\vec{p},\vec{q}) \in \Theta_1(R,C,n,s)} P_{\vec{p},\vec{q}}(\Delta_n^* = 0) \right) \le \gamma \quad (1)$$
for all $C > C_\gamma$;
2. there exists a constant $c_\gamma$ such that
$$\liminf_{n \to \infty} \inf_{\Delta} \left( \sup_{(\vec{p},\vec{q}) \in \Theta_0(R)} P_{\vec{p},\vec{q}}(\Delta = 1) + \sup_{(\vec{p},\vec{q}) \in \Theta_1(R,C,n,s)} P_{\vec{p},\vec{q}}(\Delta = 0) \right) > \gamma \quad (2)$$
for all $C < c_\gamma$, where the infimum is taken over all test procedures $\Delta$.

Hypothesis on the model

In our study we suppose that the mixing weights $(\omega_u(i),\ 1 \le u \le M,\ 1 \le i \le n)$ and $(\sigma_u(i),\ 1 \le u \le M,\ 1 \le i \le n)$ satisfy an additional hypothesis. Let us denote by $\Omega = (\Omega)_{u,i}$ the matrix with coefficients $\Omega_{u,i} = \omega_u(i)$ and by $\Sigma = (\Sigma)_{u,i}$ the matrix with coefficients $\Sigma_{u,i} = \sigma_u(i)$.

HYP-1 The smallest eigenvalues of the $(M \times M)$-matrices $\Gamma_n = \Omega\Omega^*$ and $\Gamma_n' = \Sigma\Sigma^*$ are both larger than or equal to $Kn$, with $0 < K < 1$.

We recall the following proposition due to Maiboroda [19].

Proposition 1.1
Suppose that the previous conditions are satisfied by the mixing weights $(\omega_u(i),\ 1 \le u \le M,\ 1 \le i \le n)$ and $(\sigma_u(i),\ 1 \le u \le M,\ 1 \le i \le n)$ associated with the model. Then there exists a solution of the two problems
$$\Big[\text{find } a_l = \{a_l(i),\ i = 1, \dots, n\} \text{ such that } \langle \omega_k, a_l \rangle_n := \frac{1}{n}\sum_{i=1}^{n} \omega_k(i)\, a_l(i) = \delta_{kl}\Big],$$
$$\Big[\text{find } b_l = \{b_l(i),\ i = 1, \dots, n\} \text{ such that } \langle \sigma_k, b_l \rangle_n := \frac{1}{n}\sum_{i=1}^{n} \sigma_k(i)\, b_l(i) = \delta_{kl}\Big],$$
where $\delta_{kl}$ is the Kronecker delta. According to HYP-1 this solution satisfies
$$\sum_{l=1}^{M} \langle a_l, a_l \rangle_n := \frac{1}{n} \sum_{l=1}^{M} \sum_{i=1}^{n} a_l(i)^2 \le \frac{M}{K}, \quad (3)$$
$$\sum_{l=1}^{M} \langle b_l, b_l \rangle_n := \frac{1}{n} \sum_{l=1}^{M} \sum_{i=1}^{n} b_l(i)^2 \le \frac{M}{K}. \quad (4)$$
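Proposition 1.1 is constructive: stacking the mixing weights into the $M \times n$ matrix $\Omega$, the rows of $A = n(\Omega\Omega^*)^{-1}\Omega$ satisfy the defining property $\langle \omega_k, a_l \rangle_n = \delta_{kl}$. A minimal numerical sketch (the block-constant weight values below are invented for illustration and are not taken from the paper):

```python
import numpy as np

M, n = 2, 1000
# Illustrative varying mixing weights: the sample is split into two blocks
# with different (assumed) weight profiles; row u of Omega holds omega_u(i).
Omega = np.zeros((M, n))
Omega[:, : n // 2] = np.array([[0.8], [0.2]])
Omega[:, n // 2 :] = np.array([[0.3], [0.7]])

Gamma = Omega @ Omega.T                   # Gamma_n = Omega Omega^*
K = np.linalg.eigvalsh(Gamma).min() / n   # smallest eigenvalue over n (HYP-1)

# Row l of A is the dual vector a_l: (1/n) sum_i omega_k(i) a_l(i) = delta_kl.
A = n * np.linalg.solve(Gamma, Omega)

check = Omega @ A.T / n                   # should be the identity matrix
norm2 = np.sum(A**2) / n                  # sum_l <a_l, a_l>_n, bounded as in (3)
```

The same construction applied to $\Sigma$ yields the vectors $b_l$ and the bound (4).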
2 The testing procedure

This paragraph deals with the case where the regularity $s$ of the Besov body that appears in $H_1$ is known. From now on we denote by $a_l$ and $b_l$ the $n$-vectors which are the solutions of the two optimization problems appearing in Proposition 1.1. Let us describe the asymptotically minimax decision rule. For each level parameter $j$, we define the test procedure $\Delta_j$ comparing the test statistic
$$T_j = \frac{1}{n^2} \sum_{l=1}^{M} \sum_{k} \sum_{i_1 \neq i_2} \big[a_l(i_1)\phi_{jk}(Y_{i_1}) - b_l(i_1)\phi_{jk}(Z_{i_1})\big]\big[a_l(i_2)\phi_{jk}(Y_{i_2}) - b_l(i_2)\phi_{jk}(Z_{i_2})\big]$$
with a threshold value $t_n = t\, r_n^2$, where $t$ is a constant chosen later. We define
$$\Delta_j = \begin{cases} 1 & \text{if } T_j > t_n, \\ 0 & \text{if } T_j \le t_n. \end{cases}$$
In this section, we provide two propositions which will be crucial when evaluating the performance of our test procedure. They deal with the behaviour of its expectation and its variance.
Proposition 2.1
Let $j$ be any given level parameter. Then
$$E_{\vec{p},\vec{q}}(T_j) = \sum_{l=1}^{M} \sum_{k} \left( \int_{\mathbb{R}} (p_l - q_l)\phi_{jk} \right)^2 - \frac{1}{n^2} \sum_{l=1}^{M} \sum_{k} \sum_{i=1}^{n} \left( \int_{\mathbb{R}} \big(a_l(i) f_i - b_l(i) g_i\big)\phi_{jk} \right)^2.$$

Remark 2.1 For the particular case where the sequences of mixing weights $(\omega_u(i),\ 1 \le u \le M,\ 1 \le i \le n)$ and $(\sigma_u(i),\ 1 \le u \le M,\ 1 \le i \le n)$ are identical, the test statistic $T_j$ is centered under the null hypothesis.

Corollary 2.1
For any $j \in \mathbb{N}$,
$$\left| E_{\vec{p},\vec{q}}(T_j) - \sum_{l=1}^{M} \sum_{k} \left( \int_{\mathbb{R}} (p_l - q_l)\phi_{jk} \right)^2 \right| \le \frac{8LMR^2}{Kn}.$$
Proposition 2.2
There exists a constant $C_T = C_T(R, L, \|\phi\|_\infty) > 0$ such that
$$\mathrm{Var}_{\vec{p},\vec{q}}(T_j) \le C_T \left( \frac{2^j}{n^2} + \frac{1}{n} \sum_{l} \|p_l - q_l\|_2^2 + \frac{\sqrt{2^j}}{n} \sum_{l} \|p_l - q_l\|_2^2 \right) \frac{M^2}{K^2}.$$

Remark 2.2
Under the null hypothesis the variance of the test statistic $T_j$ is less than or equal to $C_T M^2 K^{-2}\, 2^{j} n^{-2}$.

For any $s > 0$, let $(r_n)_{n \in \mathbb{N}}$ be the sequence defined by
$$r_n = n^{-\frac{2s}{1+4s}}, \quad \forall n \in \mathbb{N}^*.$$
The following theorem shows that the test procedure defined in Section 2 provides an accurate upper bound when it is well calibrated.
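For concreteness, the calibration can be computed directly, assuming the usual choices $2^{-j_n} \le n^{-2/(1+4s)}$ and $r_n = n^{-2s/(1+4s)}$ (the rate of Butucea and Tribouley [4]); the function name below is ours:

```python
import math

def calibration(n, s):
    """Smallest integer j_n with 2^(-j_n) <= n^(-2/(1+4s)),
    and the separation rate r_n = n^(-2s/(1+4s))."""
    j_n = math.ceil(2 * math.log2(n) / (1 + 4 * s))
    r_n = n ** (-2 * s / (1 + 4 * s))
    return j_n, r_n
```

For instance, with $s = 4$ (the value used in Section 3) and $n = 1000$, this gives $j_n = 2$.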
Theorem 2.1 (Upper bound)
Fix $0 < \gamma < 1$ and consider the test procedure $\Delta_s^* = \Delta_{j_n}$, where $j_n$ is the smallest integer such that $2^{-j_n} \le n^{-\frac{2}{1+4s}}$. Let $t$ and $C_\gamma$ be the two positive real numbers defined as follows:
$$t = \left( \sqrt{\frac{C_T}{\gamma}} + 8LR^2 \right) \frac{M}{K}, \qquad C_\gamma = \left[ \frac{2}{K} \left( \sqrt{\frac{C_T}{\gamma}} + R^2 + tM \right) \right]^{1/2}.$$
Then
$$\limsup_{n \to \infty} \left( \sup_{(\vec{p},\vec{q}) \in \Theta_0(R)} P_{\vec{p},\vec{q}}(\Delta_s^* = 1) + \sup_{(\vec{p},\vec{q}) \in \Theta_1(R,C,n,s)} P_{\vec{p},\vec{q}}(\Delta_s^* = 0) \right) \le \gamma \quad (5)$$
for all $C > C_\gamma$.

Even if the expression of $C_T$ is very complicated, it can be exactly calculated by following the proofs. Now, let us focus on the lower bound associated with our nonparametric testing problem $H_0$ versus $H_1$. We aim at providing a constant $c_\gamma$ ensuring that no test procedure is able to choose between $H_0$ and $H_1$ with a sum of the probability errors less than $\gamma$ ($0 < \gamma < 1$) as soon as $C < c_\gamma$. The closer $c_\gamma$ and $C_\gamma$, the more accurate our results. The next theorem proves that our test procedure is asymptotically minimax. Similarly to the classical methods for providing lower bounds (see for instance Gayraud and Pouet [21] or Butucea and Tribouley [4]), we shall consider a subspace of $\Lambda_n(R, C)$, that is, for any chosen $C_0 > 0$,
$$\tilde{\Lambda}_n(R, C, C_0) = \left\{ (p, q) \in \Lambda_n(R, C);\ \inf_{z \in [0,1]} \min(p(z), q(z)) \ge C_0 \right\}. \quad (6)$$

Theorem 2.2 (Lower bound)
Let $0 < \gamma < 1$, $s > 0$ and let $c_\gamma > 0$ satisfy
$$c_\gamma = \left( \frac{C_0}{L} \sqrt{K \ln\!\big[4(1-\gamma)^2 + 1\big]} \wedge R \right) \frac{2^{-s}}{\sqrt{M}}.$$
Then for all $C < c_\gamma$,
$$\liminf_{n \to \infty} \inf_{\Delta} \left( \sup_{(\vec{p},\vec{q}) \in \Theta_0(R)} P_{\vec{p},\vec{q}}(\Delta = 1) + \sup_{(\vec{p},\vec{q}) \in \Theta_1(R,C,n,s)} P_{\vec{p},\vec{q}}(\Delta = 0) \right) > \gamma \quad (7)$$
where the infimum is taken over all test procedures $\Delta$.

From Theorems 2.1 and 2.2 we deduce the minimax rate of testing. It is the same as the one found by Butucea and Tribouley [4] when there is only one subgroup. The advances in our results, compared to Butucea and Tribouley [4], are the extension to the varying mixing weights model, which allows non-identically distributed random variables, and the clear exposition of the role played by the mixing weights.
Corollary 2.2
For any $s > 0$, the test procedure $\Delta_s^*$ is asymptotically minimax and the minimax rate separating $H_0$ and $H_1$ is $r_n = n^{-\frac{2s}{1+4s}}$.

Discussion of the constants $c_\gamma$ and $C_\gamma$

In the two previous theorems we exhibited two constants appearing in the upper and the lower bounds. We think that the connection between these constants and the model's parameters $M$ and $K$ is a novelty and really deserves a discussion. Indeed, keep in mind that:
• $C_\gamma$ is the minimal value of $C$ such that our test statistic is able to detect whether all the mixture components are identical in the two populations with the sum of the probability errors not exceeding $\gamma$;
• $c_\gamma$ is the maximal value of $C$ such that no test statistic is able to detect whether all the mixture components are identical in the two populations with the sum of the probability errors not exceeding $\gamma$.
As a consequence, we proved that our test statistic is optimal in the minimax sense, since it attains the minimax rate of convergence separating $H_0$ and $H_1$. According to the definitions of $c_\gamma$ and $C_\gamma$, we let the reader be aware that:
• the smaller the constant $K$, the larger the family of mixing weights satisfying HYP-1;
• the smaller the constant $M$, the bigger (= the worse) the constant $C_\gamma$ and the bigger the constant $c_\gamma$;
• the smaller the constant $K$, the bigger (= the worse) the constant $C_\gamma$ and the bigger the constant $c_\gamma$.
Although the exact separation constant is not established in this study (since $c_\gamma \neq C_\gamma$), we prove that $c_\gamma$ and $C_\gamma$ strongly depend on the smallest eigenvalue of the matrices $\Omega\Omega^*$ and $\Sigma\Sigma^*$.

3 Numerical experiments

The aim of this section is twofold: to illustrate by numerical experiments the good performance of the test procedures based on the statistics $T_{j_n}$, and to show the usefulness of our method on real data. First, two examples of mixture models are given to show the interest of the problem we have considered. Next, we illustrate the behaviour of the test statistics $T_{j_n}$.
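A simulation in the spirit of the examples below can be sketched as follows; the component parameters and the weight profiles are invented for illustration (the paper's exact values are not reproduced here):

```python
import numpy as np

def sample_mixture(weights, samplers, rng):
    """Draw one observation per row of `weights` from a varying-weight mixture."""
    out = np.empty(len(weights))
    for i, w in enumerate(weights):
        u = rng.choice(len(samplers), p=w)   # pick a component for observation i
        out[i] = samplers[u](rng)
    return out

rng = np.random.default_rng(42)
n = 500

# Two components (assumed parameters): uniform on [-2, 2] and N(3, 1).
samplers = [lambda r: r.uniform(-2.0, 2.0), lambda r: r.normal(3.0, 1.0)]

# Same components, but different rank-2 mixing-weight matrices for Y and Z.
wY = np.tile([0.7, 0.3], (n, 1)); wY[n // 2 :] = [0.2, 0.8]
wZ = np.tile([0.5, 0.5], (n, 1)); wZ[n // 2 :] = [0.9, 0.1]

Y = sample_mixture(wY, samplers, rng)
Z = sample_mixture(wZ, samplers, rng)
# Histograms of Y and Z look quite different even though the components match.
```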
Figure 1 [Mixture with two components] considers two populations sampled from the same mixture densities such that
• the size of the two populations $(Y, Z)$ is $n = 500$,
• the ranks of the matrices of the mixing weights $\Omega^*$ and $\Sigma^*$ are 2,
• the two components of the mixtures are a uniform density $U([-\cdot, \cdot])$ and a normal density $N(3, \cdot)$.

Figure 2 [Mixture with three components] considers two populations sampled from the same mixture densities such that
• the size of the two populations $(Y, Z)$ is $n = 500$,
• the ranks of the matrices of the mixing weights $\Omega^*$ and $\Sigma^*$ are 3,
• the three components of the mixtures are the normal densities $N(-\cdot, 1)$, $N(0, 1)$ and $N(2, \cdot)$.

[Figure 1: Histogram (a) of population Y and histogram (b) of population Z.]
[Figure 2: Histogram (a) of population Y and histogram (b) of population Z.]

The histograms of the observations are quite different in Figures 1 and 2, although they correspond to mixture models with the same components. These schemes thus show how hard it is to guess whether the mixture components of the two populations $(Y, Z)$ are exactly the same or not. Hence, the statistician needs an adequate test statistic to decide whether the populations $(Y, Z)$ have the same mixture components or not.

Calibration of the threshold value $t_n$

In the theoretical part of this paper we provide a decision rule to test $H_0$ against $H_1$. This decision rule $\Delta_{j_n}$ relies on the sign of $T_{j_n} - t_n$, where $t_n$ is the threshold value depending on the sum of the errors $\gamma$, and $T_{j_n}$ is the test statistic. In the positive case (resp. in the negative case) $\Delta_{j_n}$ proposes to accept $H_1$ (resp. $H_0$). From the practical point of view, we give some hints to adjust the threshold value $t_n$. Here we use the Haar basis and we set $s = 4$. We consider two different approaches.

The first approach consists in fixing the first type error $\gamma$, $0 < \gamma < 1$, and in choosing $t_n$ as the quantile of order $1 - \gamma$ of the test statistic obtained after 1000 replications of the chosen mixture model. The second approach consists in choosing $t_n$ as the value for which the sum of the two errors is minimal, according to the test statistic obtained after 1000 replications of the chosen mixture model.

Connection between the value of $K$ and the performance of the test procedure

The aim of this paragraph is to illustrate the connection between the value of $K$ and the performance of our test procedure. We provide simulations of Gaussian mixture models and we give, for several values of $n$:
• the value of $t_n$ associated with a first type error equal to 10%,
• the power of the test procedure based on the threshold value $t_n$,
• the minimum of the global error $\gamma_{opt}$ (the sum of the first type and the second type errors) reachable by the test procedure,
• the value $t_{opt}$ which corresponds to the global error $\gamma_{opt}$.
We consider two samples $Y_1, \dots, Y_n$ and $Z_1, \dots, Z_n$ whose two mixture components are such that
• under $H_0$: $p_1(\cdot) = q_1(\cdot) \sim N(-\cdot, 1)$ and $p_2(\cdot) = q_2(\cdot) \sim N(3, \cdot)$;
• under $H_1$: $p_1(\cdot) \sim N(-\cdot, 1)$, $p_2(\cdot) \sim N(3, \cdot)$, $q_1(\cdot) \sim N(0, 1)$ and $q_2(\cdot) \sim N(1, \cdot)$.
The mixing weights of $Y$ and $Z$ are block-constant over each sample. For Gaussian Model 1 they are described in Table 1, and the corresponding values of $t_n$, the power, $\gamma_{opt}$ and $t_{opt}$ for $n \in \{200, 500, 1000\}$ are reported in Table 2; for this model the constant $K$ related to the smallest eigenvalue is very close to 0, so we expect poor results. For Gaussian Model 2 (Tables 3 and 4), the value of $K$ is almost three times the one appearing in Gaussian Model 1, so we expect improved results. For Gaussian Model 3 (Tables 5 and 6), the value of $K$ is more than five times the one appearing in Gaussian Model 1 and more than twice the one appearing in Gaussian Model 2, so we expect better results.

Whatever the sample size $n$, the larger the value of $K$, the better the performance of the test procedure. Indeed, when the first type error is 10%, we see that increasing the value of $K$ increases the power of the test procedure. Moreover, we remark that the optimal global error $\gamma_{opt}$ increases when the value of $K$ decreases. This is not surprising, as this behaviour was predicted by our theoretical results: the smaller the value of $K$, the larger the constant $C_\gamma$ (see Theorem 2.1). In other words, in a mixture model with a small value of $K$, one needs many observations to ensure a good performance of our test procedure.

Application to real data

In this part we apply our results to real data.
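Before turning to the data, the role of $K$ discussed above can be reproduced in a few lines: for block-constant weights, the closer the two weight profiles, the smaller $K = \lambda_{\min}(\Omega\Omega^*)/n$. The profile values below are invented for illustration:

```python
import numpy as np

def smallest_eig_K(profile1, profile2, n):
    """K = lambda_min(Omega Omega^*)/n for weights constant on two half-samples."""
    Omega = np.zeros((len(profile1), n))
    Omega[:, : n // 2] = np.asarray(profile1, dtype=float)[:, None]
    Omega[:, n // 2 :] = np.asarray(profile2, dtype=float)[:, None]
    return np.linalg.eigvalsh(Omega @ Omega.T).min() / n

n = 1000
K_close = smallest_eig_K([0.55, 0.45], [0.45, 0.55], n)  # nearly equal profiles
K_far = smallest_eig_K([0.9, 0.1], [0.1, 0.9], n)        # well-separated profiles
# K_far is much larger than K_close: the testing problem is easier.
```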
The dataset comes from a survey conducted by the French national statistical agency called Institut National de la Statistique et des Études Économiques (abbreviated to INSEE). This survey, called
Déclaration Annuelle des Données Sociales (DADS), provides information about the annual working time of employees. We are interested in comparing:
1. the working time of men in Ile-de-France (abbreviated to $I$ below) and the one done by men in all other regions of France (abbreviated to $P$ below),
2. the working time of women in Ile-de-France and the one done by women in all other regions of France.
In this study we decide to only consider highly skilled workers, such as executive staff and managers. There are two populations:
• commercial and administrative staff (abbreviated to CAd),
• technical staff (abbreviated to Tech).
We restrict to people working more than 1 645 hours per year. The variable of interest is the number of working hours per year divided by 1 645; therefore it is a ratio equal to or greater than 1. Available information about the different subpopulations of $I$ and $P$ is gathered in the following table:

Executive staff | Ile-de-France ($I$) | Other regions ($P$)
 | Men | Women | Men | Women
CAd | 58.99% | 41.01% | 67.96% | 32.04%
Tech | 81.08% | 18.92% | 86.72% | 13.28%

There are 65 558 people in $I$ and 75 062 people in $P$. To begin, we pay attention to the mean of the working-ratio of each population, namely $m_I$ and $m_P$. Although information about sex (men or women) is available in the study conducted by INSEE, we assume that it is unknown, in order to show the interest of our model. Let $\sigma_I$ and $\sigma_P$ denote the standard deviations of populations $I$ and $P$ according to the variable of interest. We suppose that a random sample of size $n = 5\,000$ in each population is available and is drawn as follows:
• 2 500 people living in $I$ are CAd and 2 500 people living in $I$ are Tech,
• 2 500 people living in $P$ are CAd and 2 500 people living in $P$ are Tech.
We are interested in the preliminary testing problem ($T_1$):
$$H_0 : m_I = m_P \quad \text{vs} \quad H_1 : m_I \neq m_P.$$
We decide to address this testing problem by using the test statistic
$$U = \frac{|\hat{m}_I - \hat{m}_P|}{\sqrt{\hat{\sigma}_I^2 + \hat{\sigma}_P^2}},$$
where $\hat{m}_I$ (resp. $\hat{m}_P$) and $\hat{\sigma}_I$ (resp. $\hat{\sigma}_P$) denote the usual estimators of $m_I$ (resp. $m_P$) and $\sigma_I$ (resp. $\sigma_P$) when using stratified random samplings like ours. Under the null hypothesis $H_0$, the random variable $U$ is asymptotically normally distributed with mean 0 and variance 1. Here are the values computed from the samples:

Ile-de-France ($I$): $\hat{m}_I = 1.\cdot$, $\hat{\sigma}_I = 0.\cdot$; Other regions ($P$): $\hat{m}_P = 1.\cdot$, $\hat{\sigma}_P = 0.\cdot$

The observed value of $U$ is $3.\cdot$ and the corresponding $p$-value is close to 0, so we cannot trust the hypothesis $m_I = m_P$. In other words, $H_0$ is rejected. At this stage, a natural question arises: what is the reason for such a difference? Two hypotheses could explain it:
1. the distinct values of $m_I$ and $m_P$ are only related to the different proportions of men (or, analogously, women) in the two populations:

 | Ile-de-France ($I$) | Other regions ($P$)
Men | 68.93% (45 187) | 76.70% (57 575)
Women | 31.07% (20 371) | 23.30% (17 487)

Table 9: Proportions of subpopulations by area and sex

2. the distinct values of $m_I$ and $m_P$ are also related to different distributions of the working-ratio of population $I$ (abbreviated to W.R.($I$)) and of the working-ratio of population $P$ (abbreviated to W.R.($P$)).

Trusting one of these new hypotheses is at first glance difficult to argue for when only considering two random samples of size $n$ in each population without the knowledge of sex (man or woman). Nevertheless, a way to address the testing problem ($T_2$):
$$H_0' : \text{the distributions of W.R.}(I) \text{ and W.R.}(P) \text{ conditionally on sex are identical}$$
versus
$$H_1' : \text{the distributions of W.R.}(I) \text{ and W.R.}(P) \text{ conditionally on sex are different}$$
is to consider our testing procedure. Let $p_1$ and $p_2$ (resp. $q_1$ and $q_2$) denote the density functions of the random variables W.R.($I$)|man and W.R.($I$)|woman (resp. W.R.($P$)|man and W.R.($P$)|woman). The testing problem ($T_2$) can be written as follows:
$$H_0' : p_1 = q_1 \text{ and } p_2 = q_2 \quad \text{vs} \quad H_1' : p_1 \neq q_1 \text{ or } p_2 \neq q_2.$$
Observations of the working-ratio random variables $Y_1, \dots, Y_n$ (resp. $Z_1, \dots, Z_n$) in population $I$ (resp. in $P$) are available. The mixture model we get is the one described in Section 1, with
• $M = 2$ and $n = 5\,000$,
• $(\omega_1(i), \omega_2(i)) = (0.5899, 0.4101)$ for $1 \le i \le n/2$,
• $(\omega_1(i), \omega_2(i)) = (0.8108, 0.1892)$ for $n/2 < i \le n$,
• $(\sigma_1(i), \sigma_2(i)) = (0.6796, 0.3204)$ for $1 \le i \le n/2$,
• $(\sigma_1(i), \sigma_2(i)) = (0.8672, 0.1328)$ for $n/2 < i \le n$.
We take $s = 4$ and choose the usual Haar wavelet to construct our test statistic $T_{j_s}$. The threshold value of the testing procedure is computed according to the following heuristics: $t = \hat{s}\, t_\alpha$, where $t_\alpha$ is the $1 - \alpha$ Gaussian quantile and $\hat{s}$ is the standard deviation of the test statistic estimated by bootstrap (resampling is made 200 times). As we choose $\alpha = 10\%$, we have $t_\alpha = 1.28$. The observed value of $T_{j_s}$ is $t_{j_s} = 0.\cdot$ and the threshold value is $t = 0.\cdot$. Since the value of $t_{j_s}$ is larger than the threshold value $t$, we conclude that there exists a difference between the distributions W.R.($I$) and W.R.($P$) conditionally on sex.
In other words, $H_0'$ is rejected.

In this last paragraph, we study the numerical performances of our testing procedure built from $T_{j_s}$. For several values of $n$, a sample of size $n$ is drawn from $I$ (resp. $P$) and is divided into two subsamples: one subsample of size $n/2$ is drawn from the subpopulation CAd and the other one is drawn from the subpopulation Tech. For each value of $n$, 200 samples are drawn. The results are gathered in the following table:

Sample size $n$ | First type error $E(n)_I$ | First type error $E(n)_P$ | Power
1 000 | 0 | 0 | $\cdot$
⋮ | ⋮ | ⋮ | ⋮

− The first type error $E(n)_I$ is the proportion of observations of $T_{j_s}$ larger than the threshold value when comparing two samples of size $n$ in $I$.
− The first type error $E(n)_P$ is the proportion of observations of $T_{j_s}$ larger than the threshold value when comparing two samples of size $n$ in $P$.
− The power is the proportion of observations of $T_{j_s}$ larger than the threshold value when comparing a sample of size $n$ in $I$ and a sample of size $n$ in $P$.

It appears that the testing procedure with the heuristically chosen threshold is very conservative. This is the only drawback of our methodology. Nevertheless, the behaviour of the testing procedure is as expected: the larger the sample size, the larger the power.

4 Conclusion and possible extensions

As a conclusion, we have provided a statistical procedure for a testing problem on the mixture components of two populations $(
Y, Z)$. This procedure was proved to be optimal in the minimax sense (Theorems 2.1 and 2.2). In addition, we explained how the weights of the mixture model influence the performance of the statistical rule. All these theoretical results are illustrated by our numerical experiments.

It seems important to us to give some hints about possible extensions of this work. From the theoretical and practical points of view, it would be interesting to study the same problem without assuming that the mixing weights are exactly known to the statistician. Several settings can be considered:
• the statistician can estimate the mixing weights for an observation by using covariates and an appropriate predictive model, such as the logistic one;
• a Bayesian approach is chosen for the mixing weights;
• exogenous information allows the statistician to roughly estimate the mixing weights.
In this case several natural questions arise:
• What statistical rule should be considered?
• What kind of performance can be expected for such a rule?
• How much do random mixing weights deteriorate the performance?
Such questions are beyond the scope of this article and their answers certainly involve random matrix theory. Finally, it would be nice to show how to choose an adequate value of $t_n$ in a better way than the complicated one given in Theorem 2.2.

5 Proofs

This section is devoted to the proofs of our results. The proofs often need technical lemmas, which shall be proved in the Appendix. For the sake of simplicity we sometimes omit $\vec{p}$ and $\vec{q}$ in the indices when there is no ambiguity.

5.1 Proofs of Propositions and Corollaries

Proof of Proposition 1.1: We refer to Maiboroda [19]. A solution of the two problems is given, for any $(l, i) \in \{1, \dots, M\} \times \{1, \dots, n\}$, by
$$a_l(i) = \frac{n}{\det(\Gamma_n)} \sum_{u=1}^{M} (-1)^{l+u}\, \gamma_{lu}\, \omega_u(i), \qquad b_l(i) = \frac{n}{\det(\Gamma_n')} \sum_{u=1}^{M} (-1)^{l+u}\, \gamma_{lu}'\, \sigma_u(i),$$
where $\gamma_{lu}$ and $\gamma_{lu}'$ are respectively the $(l, u)$ minor of the matrix $\Gamma_n$ and the $(l, u)$ minor of the matrix $\Gamma_n'$.
Inequalities (3) and (4) are obtained by using Lemma 6.1. □

Proof of Proposition 2.1: Let us evaluate the expectation of $T_j$:
$$E_{\vec{p},\vec{q}}(T_j) = \frac{1}{n^2} \sum_{l=1}^{M} \sum_{k} \sum_{i_1 \neq i_2} E_{\vec{p},\vec{q}}\big[a_l(i_1)\phi_{jk}(Y_{i_1}) - b_l(i_1)\phi_{jk}(Z_{i_1})\big]\, E_{\vec{p},\vec{q}}\big[a_l(i_2)\phi_{jk}(Y_{i_2}) - b_l(i_2)\phi_{jk}(Z_{i_2})\big],$$
since the random variables $(Y_{i_1}, Z_{i_1})$ and $(Y_{i_2}, Z_{i_2})$ are independent. For all $1 \le i \le n$ we have
$$E_{\vec{p},\vec{q}}\big[a_l(i)\phi_{jk}(Y_i) - b_l(i)\phi_{jk}(Z_i)\big] = \int_{\mathbb{R}} \Big( \sum_{u=1}^{M} \big(a_l(i)\,\omega_u(i)\, p_u - b_l(i)\,\sigma_u(i)\, q_u\big) \Big)\, \phi_{jk}.$$
By introducing the diagonal term $i_1 = i_2$ in the sum, we get
$$E_{\vec{p},\vec{q}}(T_j) = \frac{1}{n^2} \sum_{l=1}^{M} \sum_{k} \left( \int_{\mathbb{R}} \phi_{jk} \Big( \sum_{i=1}^{n}\sum_{u=1}^{M} a_l(i)\,\omega_u(i)\, p_u - \sum_{i=1}^{n}\sum_{u=1}^{M} b_l(i)\,\sigma_u(i)\, q_u \Big) \right)^2 - \frac{1}{n^2} \sum_{l=1}^{M} \sum_{k} \sum_{i=1}^{n} \left( \int_{\mathbb{R}} \big(a_l(i) f_i - b_l(i) g_i\big)\phi_{jk} \right)^2 = \sum_{l=1}^{M} \sum_{k} \left( \int_{\mathbb{R}} (p_l - q_l)\phi_{jk} \right)^2 - \frac{1}{n^2} \sum_{l=1}^{M} \sum_{k} \sum_{i=1}^{n} \left( \int_{\mathbb{R}} \big(a_l(i) f_i - b_l(i) g_i\big)\phi_{jk} \right)^2,$$
because of the two properties $\frac{1}{n}\sum_{i=1}^{n} a_l(i)\,\omega_u(i) = \delta_{lu}$ and $\frac{1}{n}\sum_{i=1}^{n} b_l(i)\,\sigma_u(i) = \delta_{lu}$. Thus the result for the expectation is proved. □

Proof of Corollary 2.1: According to Proposition 2.1 we only have to bound the quantity
$$\frac{1}{n^2} \sum_{l=1}^{M} \sum_{k} \sum_{i=1}^{n} \left( \int_{\mathbb{R}} \big(a_l(i) f_i - b_l(i) g_i\big)\phi_{jk} \right)^2.$$
Using the Cauchy-Schwarz inequality and Lemma 6.3, we have
$$\sum_{l=1}^{M} \sum_{k} \sum_{i=1}^{n} \left( \int_{\mathbb{R}} \big(a_l(i) f_i - b_l(i) g_i\big)\phi_{jk} \right)^2 \le \sum_{l=1}^{M} \sum_{k} \sum_{i=1}^{n} \int_{I_{jk}} \big(a_l(i) f_i - b_l(i) g_i\big)^2 \int \phi_{jk}^2 = \sum_{l=1}^{M} \sum_{i=1}^{n} \Big[ \sum_{k} \int_{I_{jk}} \big(a_l(i) f_i - b_l(i) g_i\big)^2 \Big] \le 2 \sum_{l=1}^{M} \sum_{i=1}^{n} \Big[ \sum_{k} \int_{I_{jk}} \big(a_l(i) f_i\big)^2 + \int_{I_{jk}} \big(b_l(i) g_i\big)^2 \Big] \le 4L \left( \sum_{i=1}^{n} \sum_{l=1}^{M} a_l(i)^2\, \|f_i\|_2^2 + \sum_{i=1}^{n} \sum_{l=1}^{M} b_l(i)^2\, \|g_i\|_2^2 \right) \le \frac{8LMR^2 n}{K}.$$
The last inequality is due to Proposition 1.1 and the fact that for all $1 \le i \le n$ the density functions $f_i$ and $g_i$ belong to $L_2(R)$. □

Proof of Proposition 2.2: Let us consider the variance of $T_j$. For all $(i_1, i_2)$, let $h_j(i_1, i_2)$ denote the quantity
$$h_j(i_1, i_2) = \sum_{k} \sum_{l=1}^{M} \big(a_l(i_1)\phi_{jk}(Y_{i_1}) - b_l(i_1)\phi_{jk}(Z_{i_1})\big)\big(a_l(i_2)\phi_{jk}(Y_{i_2}) - b_l(i_2)\phi_{jk}(Z_{i_2})\big).$$
The variance of $T_j$ satisfies
$$n^4\, \mathrm{Var}_{\vec{p},\vec{q}}(T_j) = \mathrm{Var}_{\vec{p},\vec{q}}\left( \sum_{i_1 \neq i_2} h_j(i_1, i_2) \right) = \sum_{i_1 \neq i_2,\ i_3 \neq i_4} \mathrm{Cov}\big(h_j(i_1, i_2), h_j(i_3, i_4)\big) = \sum_{u=1}^{7} A_u,$$
where the sums $A_1, \dots, A_7$ collect the covariances according to the coincidence pattern of the indices $(i_1, i_2, i_3, i_4)$, from $A_1 = \sum_{i_1 \neq i_2} \mathrm{Var}(h_j(i_1, i_2))$ up to $A_7$, the sum over four pairwise distinct indices. Using independence arguments,
$$A_7 = \sum_{i_1 \neq i_2 \neq i_3 \neq i_4} \mathrm{Cov}\big(h_j(i_1, i_2), h_j(i_3, i_4)\big) = 0.$$
It remains to bound the quantities $A_u$ ($1 \le u \le 6$). Since the ways to bound $A_1$ and $A_2$ (resp. $A_3$, $A_4$, $A_5$ and $A_6$) are similar, we only bound $A_1$ and $A_3$. Such bounds are given in Lemmas 6.7 and 6.8. The proof of Proposition 2.2 is a direct consequence of Lemmas 6.7 and 6.8 by taking $C_T = 2\bar{C}_T \vee C_T'$. □

Proof of Theorem 2.1.
Let us fix $0 < \gamma < 1$ and $s > 0$. Under the null hypothesis, we directly use the well-known Bienaymé-Chebyshev inequality:
$$P_{\vec{p},\vec{p}}(\Delta_s^* = 1) = P_{\vec{p},\vec{p}}(T_{j_n} > t_n) \le P_{\vec{p},\vec{p}}\left( T_{j_n} - E(T_{j_n}) > t_n - \frac{8LMR^2}{Kn} \right) \le \frac{\mathrm{Var}_{\vec{p},\vec{p}}(T_{j_n})}{\left( t_n - \frac{8LMR^2}{Kn} \right)^2} \le \frac{C_T M^2\, 2^{j_n}}{n^2 K^2 \left( t - \frac{8LMR^2}{K} \right)^2 r_n^4}.$$
By the choice of $j_n$ and the threshold $t_n$, we have
$$\frac{C_T M^2\, 2^{j_n}}{n^2 K^2 \left( t - \frac{8LMR^2}{K} \right)^2 r_n^4} \le \frac{C_T M^2}{K^2 \left( t - \frac{8LMR^2}{K} \right)^2}.$$
Then $P_{\vec{p},\vec{p}}(\Delta_s^* = 1) \le \gamma$.

Under the alternative, we use the expectation of the test statistic and an approximation argument. The second type error is
$$P_{\vec{p},\vec{q}}(\Delta_s^* = 0) = P_{\vec{p},\vec{q}}\big( -T_{j_n} + E_{\vec{p},\vec{q}}(T_{j_n}) \ge -t_n + E_{\vec{p},\vec{q}}(T_{j_n}) \big).$$
The wavelet expansion in the Besov body $B_{s,2,\infty}$ leads to
$$E_{\vec{p},\vec{q}}(T_{j_n}) - t_n = \sum_{l=1}^{M} \|p_l - q_l\|_2^2 - \sum_{l=1}^{M} \sum_{j \ge j_n} \sum_{k} \left( \int_{\mathbb{R}} (p_l - q_l)\psi_{jk} \right)^2 - \frac{1}{n^2} \sum_{l=1}^{M} \sum_{k} \sum_{i=1}^{n} \left( \int_{\mathbb{R}} \big(a_l(i) f_i - b_l(i) g_i\big)\phi_{j_n k} \right)^2 - t_n \ge \sum_{l=1}^{M} \|p_l - q_l\|_2^2 - MR^2\, 2^{-2 j_n s} - \frac{8LMR^2}{Kn} - t_n \ge \sum_{l=1}^{M} \|p_l - q_l\|_2^2 - 2MR^2\, 2^{-2 j_n s} - t_n,$$
for any $n$ large enough. As a consequence, applying the Bienaymé-Chebyshev inequality leads to
$$P_{\vec{p},\vec{q}}(\Delta_s^* = 0) \le \frac{C_T M^2 \left( \frac{2^{j_n}}{n^2} + \frac{1}{n} \sum_l \|p_l - q_l\|_2^2 + \frac{\sqrt{2^{j_n}}}{n} \sum_l \|p_l - q_l\|_2^2 \right)}{K^2 \left( \sum_{l=1}^{M} \|p_l - q_l\|_2^2 - 2MR^2\, 2^{-2 j_n s} - t\, r_n^2 \right)^2}.$$
Using the choices of $j_n$ and $r_n$, and the fact that the functions belong to the alternative, one gets for $n$ large enough
$$P_{\vec{p},\vec{q}}(\Delta_s^* = 0) \le \frac{C_T M^2 \left( \frac{2^{j_n}}{n^2} + \frac{1}{n} \sum_l \|p_l - q_l\|_2^2 + \frac{\sqrt{2^{j_n}}}{n} \sum_l \|p_l - q_l\|_2^2 \right)}{K^2 \left( 1 - 2MR^2 C^{-2} - tC^{-2} \right)^2 \left( \sum_{l=1}^{M} \|p_l - q_l\|_2^2 \right)^2} \le \frac{C_T M^2}{\left( 1 - 2MR^2 C^{-2} - tC^{-2} \right)^2 K^2 C^2}.$$
For all $C > C_\gamma$, we finally obtain $P_{\vec{p},\vec{q}}(\Delta_s^* = 0) \le \gamma$. The results on the first-type and second-type errors show that if $C > C_\gamma$ the sum of the errors is less than $\gamma$. Therefore the upper bound is proved. □

Proof of Theorem 2.2.
Let $\gamma \in\, ]0, 1[$, $C > 0$ and $C_0 > 0$. We define
$$\tilde\Theta(R, C, C_0, n, s) = \Bigl\{ (\vec p, \vec q) : \forall u \in \{1, \ldots, M\},\ p_u - q_u \in B^s_{2,\infty}(R),\ \exists u_0 \in \{1, \ldots, M\},\ (p_{u_0}, q_{u_0}) \in \tilde\Lambda_n(R, C, C_0) \Bigr\},$$
where $\tilde\Lambda_n(R, C, C_0)$ is defined in (6). It is well known that
$$\inf_\Delta \Bigl( \sup_{(\vec p, \vec q) \in \Theta_0(R)} P_{\vec p, \vec q}(\Delta = 1) + \sup_{(\vec p, \vec q) \in \Theta_1(R, C, n, s)} P_{\vec p, \vec q}(\Delta = 0) \Bigr)$$
$$\ge \inf_\Delta \Bigl( \sup_{(\vec p, \vec q) \in \Theta_0(R)} P_{\vec p, \vec q}(\Delta = 1) + \sup_{(\vec p, \vec q) \in \tilde\Theta(R, C, C_0, n, s)} P_{\vec p, \vec q}(\Delta = 0) \Bigr) \ge 1 - \tfrac{1}{2}\, \bigl\| P_{\vec p, \vec p} - P_\pi \bigr\|_1,$$
where $\|\cdot\|_1$ is the $L_1$-distance and $\pi$ is an a priori probability measure on the set $\tilde\Lambda_n(R, C, C_0)$.

First we define the probability measure $\pi$ and its support. Let $\theta = (\theta_1, \ldots, \theta_M)$ denote an eigenvector associated with the smallest eigenvalue of $\Sigma\Sigma^\star$ (which is $Kn$ according to HYP-1) such that $\|\theta\|_2 = 1$. Recall that here $j_n$ is the same as the one defined in Theorem 2.1. Let $\mathcal T$ be the subset of $\mathbb Z$ containing every integer $k$ satisfying the following properties:
• $k \in \mathcal T \implies \bigl[\, (k-L)2^{-j_n},\ (k+L)2^{-j_n} \,\bigr[\ \subset [0, 1]$;
• $(k, k') \in \mathcal T \times \mathcal T$ with $k \ne k' \implies \bigl[\, (k-L)2^{-j_n},\ (k+L)2^{-j_n} \,\bigr[\ \cap\ \bigl[\, (k'-L)2^{-j_n},\ (k'+L)2^{-j_n} \,\bigr[\ = \emptyset$.
The cardinality of $\mathcal T$ is clearly equal to $T = \lfloor 2^{j_n} / (2L) \rfloor$ and we denote its elements $k_1, \ldots, k_T$. The following parametric family of functions is considered:
$$q_{l,\zeta}(z) = p_l(z) + \frac{2^{s+1} C\, \theta_l}{\sqrt{ML}} \sum_{k \in \mathcal T} \zeta_k\, 2^{-j_n s}\, 2^{-j_n/2}\, \psi_{j_n k}(z),$$
where $\zeta_k = +1$ or $-1$; $\zeta_k$ does not depend on the index $l$. Therefore the density of $Z_i$ is
$$g_{i,\zeta}(z) = \sum_{l=1}^{M} \sigma_l(i)\, p_l(z) + \frac{2^{s+1} C}{\sqrt{ML}} \Bigl( \sum_{l=1}^{M} \sigma_l(i)\, \theta_l \Bigr) \sum_{k \in \mathcal T} \zeta_k\, 2^{-j_n s}\, 2^{-j_n/2}\, \psi_{j_n k}(z).$$
The probability measure $\pi$ is such that the $\zeta_k$'s are independent Rademacher random variables with parameter $\tfrac12$. The function $q_{l,\zeta}$ is a density: indeed, for $n$ large enough, $q_{l,\zeta}$ is non-negative; moreover, as $\psi_{j_n k}$ is a wavelet, we have $\int_{\mathbb R} \psi_{j_n k} = 0$ and therefore $\int_{\mathbb R} q_{l,\zeta} = 1$.
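The mechanism behind this construction (a signed sum of disjoint wavelet bumps perturbs the density locally while keeping it non-negative and integrating to one) can be mimicked numerically. A minimal Python sketch, with the Haar wavelet standing in for the paper's compactly supported wavelet $\psi$, the uniform density standing in for $p_l$, and an illustrative amplitude; all of these are simplifying assumptions, not the setting of the proof:

```python
import math
import random

def haar(x):
    # mother Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

random.seed(1)
j = 4                                  # resolution level (illustrative)
eps = 0.05                             # perturbation amplitude, kept small so q >= 0
zetas = [random.choice([-1.0, 1.0]) for _ in range(2 ** j)]   # Rademacher signs

def q(x):
    # uniform density on [0, 1] plus a signed sum of disjoint Haar bumps
    s = 1.0
    for k, z in enumerate(zetas):
        s += eps * z * (2 ** (j / 2)) * haar((2 ** j) * x - k)
    return s

# midpoint Riemann sum: q still integrates to 1, each bump integrating to 0
N = 2 ** 12
integral = sum(q((i + 0.5) / N) for i in range(N)) / N
q_min = min(q((i + 0.5) / N) for i in range(N))
```

Here `integral` stays within floating-point error of `1.0`, and `q_min` stays positive because the bump amplitude `eps * 2**(j/2)` is below 1, which is the discrete analogue of "for $n$ large enough, $q_{l,\zeta}$ is non-negative".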
If $C < \sqrt{R/M}\; 2^{-(s+2)}$, then $q_{l,\zeta} - p_l$ belongs to the ball of radius $R$ of the Besov body $B^s_{2,\infty}$. There exists $l_0$ such that $M \theta_{l_0}^2 \ge 1$, and for this index
$$\|p_{l_0} - q_{l_0,\zeta}\|_2^2 = T\, \frac{(2^{s+1} C)^2}{ML}\, \theta_{l_0}^2\, 2^{-j_n(2s+1)} \ge C^2\, n^{-4s/(4s+1)},$$
for $n$ large enough. Therefore the probability measure $\pi$ is solely concentrated on the alternative. It is well known that the $L_1$ distance can be bounded by the $L_2$ distance. We have
$$\bigl\| P_{\vec p, \vec p} - P_\pi \bigr\|_1 \le \sqrt{ E_{\vec p, \vec p} \Bigl( \frac{dP_\pi}{dP_{\vec p, \vec p}} \Bigr)^{\!2} - 1 } = \sqrt{ E_{\vec p, \vec p} \Bigl( E_\pi \prod_{i=1}^{n} \frac{g_{i,\zeta}(Z_i)}{g_i(Z_i)} \Bigr)^{\!2} - 1 }. \qquad (8)$$
Therefore it suffices to evaluate the second-order moment of the likelihood ratio:
$$E_{\vec p, \vec p} \Bigl( E_\pi \prod_{i=1}^{n} \frac{g_{i,\zeta}(Z_i)}{g_i(Z_i)} \Bigr)^{\!2} = E_{\vec p, \vec p} \biggl( \int \prod_{i=1}^{n} \Bigl( 1 + \frac{2^{s+1} C}{\sqrt{ML}} \sum_{k \in \mathcal T} \zeta_k\, 2^{-j_n s}\, 2^{-j_n/2}\, \frac{\psi_{j_n k}(Z_i)}{g_i(Z_i)} \sum_{l=1}^{M} \theta_l \sigma_l(i) \Bigr)\, d\pi(\zeta_1, \ldots, \zeta_T) \biggr)^{\!2}.$$
Let us introduce the following random variables:
$$\tilde Z_{ik} = \frac{2^{s+1} C}{\sqrt{ML}}\, 2^{-j_n s}\, 2^{-j_n/2}\, \frac{\psi_{j_n k}(Z_i)}{g_i(Z_i)} \sum_{l=1}^{M} \theta_l \sigma_l(i).$$
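The reduction (8) from total variation to the second moment of the likelihood ratio is a Cauchy–Schwarz consequence: $\|P - Q\|_1 \le \sqrt{E_P (dQ/dP)^2 - 1}$. It can be checked on a toy pair of discrete distributions; the three-point distributions below are arbitrary and purely illustrative:

```python
import math

# toy discrete distributions on {0, 1, 2}
p = [0.5, 0.3, 0.2]
q = [0.4, 0.35, 0.25]

# total variation (L1) distance and chi-square divergence E_p[(q/p)^2] - 1
l1 = sum(abs(pi - qi) for pi, qi in zip(p, q))
chi2 = sum(qi * qi / pi for pi, qi in zip(p, q)) - 1.0
```

On this example `l1` is `0.2` while `math.sqrt(chi2)` is roughly `0.202`, so the bound holds and is quite tight; the proof uses exactly this bound with $Q = P_\pi$ a mixture over the Rademacher signs.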
We have
$$E_{\vec p, \vec p} \Bigl( E_\pi \prod_{i=1}^{n} \frac{g_{i,\zeta}(Z_i)}{g_i(Z_i)} \Bigr)^{\!2} = E_{\vec p, \vec p} \prod_{k \in \mathcal T} \biggl[ \frac12 \prod_{i=1}^{n} \bigl( 1 + \tilde Z_{ik} \bigr) + \frac12 \prod_{i=1}^{n} \bigl( 1 - \tilde Z_{ik} \bigr) \biggr]^2.$$
Expanding the square, each factor of the product over $k \in \mathcal T$ reads
$$\frac14 \prod_{i=1}^{n} \bigl( 1 + 2\tilde Z_{ik} + \tilde Z_{ik}^2 \bigr) + \frac14 \prod_{i=1}^{n} \bigl( 1 - 2\tilde Z_{ik} + \tilde Z_{ik}^2 \bigr) + \frac12 \prod_{i=1}^{n} \bigl( 1 - \tilde Z_{ik}^2 \bigr),$$
up to remainder terms which are sums of products containing at least one factor $\tilde Z_{ik}$ to the first power. As $E_{\vec p, \vec p}(\tilde Z_{ik}) = 0$ and $\tilde Z_{ik} \tilde Z_{ik'} = 0$ for $k \ne k'$ (the wavelets $\psi_{j_n k}$ and $\psi_{j_n k'}$ have disjoint supports for $k, k' \in \mathcal T$), these remainder terms vanish. Thus we are only interested in the first terms. Define for all $k \in \mathcal T$ the quantity $h_l(k)$.
Lemma 6.1
$$\sum_{l=1}^{M} \sum_{i=1}^{n} a_l(i)^2 \le \frac{Mn}{K}, \qquad (10)$$
$$\sum_{l=1}^{M} \sum_{i=1}^{n} b_l(i)^2 \le \frac{Mn}{K}. \qquad (11)$$
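The linear-algebra step behind the proof below is that the left-hand side of (10) is $\mathrm{trace}(AA^*) = \mathrm{trace}(A^*A)$, which is at most $M$ times the largest eigenvalue because $AA^*$ has rank at most $M$. A quick numerical illustration in Python with $M = 2$, where the eigenvalues of the $2 \times 2$ Gram matrix are available in closed form; the random matrix is purely illustrative and only stands in for the weight matrix $A$ of the proof:

```python
import math
import random

def gram_2x2(A):
    # Gram matrix G = A^T A of an (n x 2) matrix A, returned as (g11, g12, g22)
    g11 = sum(row[0] * row[0] for row in A)
    g12 = sum(row[0] * row[1] for row in A)
    g22 = sum(row[1] * row[1] for row in A)
    return g11, g12, g22

def lambda_max_2x2(g11, g12, g22):
    # largest eigenvalue of a symmetric 2x2 matrix, closed form
    t = g11 + g22
    d = g11 * g22 - g12 * g12
    return (t + math.sqrt(max(t * t - 4 * d, 0.0))) / 2

random.seed(0)
n, M = 50, 2
A = [[random.gauss(0, 1) for _ in range(M)] for _ in range(n)]
g11, g12, g22 = gram_2x2(A)
trace = g11 + g22                      # = sum over l, i of a_l(i)^2
lam = lambda_max_2x2(g11, g12, g22)    # = lambda_max(A^T A) = lambda_max(A A^T)
```

With these definitions `trace <= M * lam` always holds, which is inequality (12) of the proof; (13) then converts the eigenvalue bound into the $n/K$ factor via HYP-1.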
Proof of Lemma 6.1: The proofs of (10) and (11) are identical, so we only prove (10). Let $\lambda_{\min}(\Gamma_n)$ be the smallest non-negative eigenvalue of the matrix $\Gamma_n$. Let $A = (A_{j,l})_{1 \le j \le n,\, 1 \le l \le M}$ denote the $(n \times M)$ matrix with coefficients $A_{j,l} = a_l(j)$. Since the matrix $AA^*$ has at most $M$ non-negative eigenvalues, we have
$$\sum_{l=1}^{M} \sum_{i=1}^{n} a_l(i)^2 = \mathrm{trace}(AA^*) \le M\, \lambda_{\max}(AA^*). \qquad (12)$$
Clearly, the following implication holds: if $\lambda$ is a non-negative eigenvalue of $AA^*$, then $n \lambda^{-1}$ is an eigenvalue of $\Gamma_n$. So
$$\lambda_{\max}(AA^*) \le \frac{n}{\lambda_{\min}(\Gamma_n)}. \qquad (13)$$
Lemma 6.1 is proved by inequalities (12) and (13) and under HYP-1. $\Box$

Lemma 6.2
For all $(j, k) \in \mathbb Z \times \mathbb Z$, let us put
$$I_{jk} = \Bigl[\, \frac{k - L}{2^j},\ \frac{k + L}{2^j} \,\Bigr[.$$
Then for any fixed $(j, k)$,
$$\mathrm{Card}\{ k' \in \mathbb Z : I_{jk} \cap I_{jk'} \ne \emptyset \} \le 4L.$$
Proof of Lemma 6.2: Clearly,
$$I_{jk} \cap I_{jk'} = \emptyset \iff k' - L \ge k + L \ \text{ or } \ k' + L \le k - L.$$
Hence,
$$I_{jk} \cap I_{jk'} \ne \emptyset \iff k - 2L < k' < k + 2L.$$
As a consequence, we have
$$\mathrm{Card}\{ k' \in \mathbb Z : I_{jk} \cap I_{jk'} \ne \emptyset \} \le 4L. \qquad \Box$$

Lemma 6.3
For any function $h \in L_1(\mathbb R)$,
$$\sum_k \int_{I_{jk}} |h(x)|\, dx \le 2L\, \|h\|_1.$$
Proof of Lemma 6.3: Let us define, for any $h \in L_1(\mathbb R)$,
$$p_{jk}(h) = \int_{I_{jk}} |h(x)|\, dx, \qquad \forall j \in \mathbb N,\ \forall k \in \mathbb Z.$$
Judging from the definition of the intervals $I_{jk}$ (for fixed $j$ and each $1 \le u \le 2L$, the intervals $I_{j,\,2Li+u}$, $i \in \mathbb Z$, are pairwise disjoint), we easily prove that for any $j \in \mathbb N$,
$$\sum_k p_{jk}(h) = \sum_{u=1}^{2L} \sum_{i \in \mathbb Z} p_{j,\,2Li+u}(h) \le \sum_{u=1}^{2L} \int_{\mathbb R} |h(x)|\, dx = 2L\, \|h\|_1. \qquad \Box$$

Lemma 6.4
Let $W$ be either $Y$ or $Z$. For any $1 \le i \le n$ and any $(j, k)$, we have
$$|E(\phi_{jk}(W_i))| \le \sqrt{2L}\; \sup_l\, (\|p_l\|_\infty \vee \|q_l\|_\infty)\; 2^{-j/2}.$$
Proof of Lemma 6.4: Using the Cauchy–Schwarz inequality and the fact that $f_i$ and $g_i$ are convex combinations of the $p_l$ and $q_l$ respectively, we obtain
$$|E(\phi_{jk}(W_i))| \le \Bigl| \int \phi_{jk} f_i \Bigr| \vee \Bigl| \int \phi_{jk} g_i \Bigr| \le \Bigl( \int_{I_{jk}} |\phi_{jk}| \Bigr) \sup_l \|p_l\|_\infty \vee \Bigl( \int_{I_{jk}} |\phi_{jk}| \Bigr) \sup_l \|q_l\|_\infty \le \sqrt{2L}\; \sup_l\, (\|p_l\|_\infty \vee \|q_l\|_\infty)\; 2^{-j/2},$$
since $\int_{I_{jk}} |\phi_{jk}| \le |I_{jk}|^{1/2} \|\phi_{jk}\|_2 = (2L\, 2^{-j})^{1/2}$. $\Box$

Lemma 6.5. Let $W$ be either $Y$ or $Z$ and $c$ be either $a$ or $b$. For any $1 \le i \le n$ and any $(j, k)$, the following inequalities hold:
$$\sum_{k'} |E(\phi_{jk}(W_i) \phi_{jk'}(W_i))| \le 4L\, \sup_l\, (\|p_l\|_\infty \vee \|q_l\|_\infty),$$
$$\sup_l \Bigl| \sum_k \int \phi_{jk} (p_l - q_l) \Bigr| \le 4L\, \|\phi\|_\infty\, 2^{j/2},$$
$$\sup_l |c_l(i)| \le \sqrt{ n \sum_l \langle c_l, c_l \rangle_n }.$$
Proof of Lemma 6.5: Since the wavelets are compactly supported, for any fixed $k$ the sum over $k'$ has at most $4L$ non-zero terms (see Lemma 6.2). So the Cauchy–Schwarz inequality entails that
$$\sum_{k'} |E(\phi_{jk}(W_i) \phi_{jk'}(W_i))| \le \sum_{k'} \int |\phi_{jk}| |\phi_{jk'}| f_i \vee \sum_{k'} \int |\phi_{jk}| |\phi_{jk'}| g_i \le \sup_l (\|p_l\|_\infty \vee \|q_l\|_\infty) \sum_{k'} \int |\phi_{jk}| |\phi_{jk'}| \le 4L\, \sup_l\, (\|p_l\|_\infty \vee \|q_l\|_\infty),$$
since each of the at most $4L$ non-zero integrals $\int |\phi_{jk}| |\phi_{jk'}|$ is at most $1$, again by the Cauchy–Schwarz inequality. We also have
$$\sup_l \Bigl| \sum_k \int \phi_{jk} (p_l - q_l) \Bigr| \le 2^{j/2} \|\phi\|_\infty\, \sup_l \sum_k \int_{I_{jk}} |p_l - q_l| \le 2L \Bigl( \int p_l + \int q_l \Bigr) \|\phi\|_\infty\, 2^{j/2} = 4L\, \|\phi\|_\infty\, 2^{j/2}.$$
Clearly, for any $1 \le i \le n$,
$$\sup_l |c_l(i)| \le \sup_l \sqrt{ \sum_i c_l(i)^2 } \le \sqrt{ n \sum_l \langle c_l, c_l \rangle_n }. \qquad \Box$$

Lemma 6.6. Let $p_l$, $q_l$, $p_{l'}$ and $q_{l'}$ be four probability densities in $L_2$.
Then, for any $j \in \mathbb N$,
$$\sum_k \Bigl( \int \phi_{jk} p_l - \int \phi_{jk} q_l \Bigr)^2 \le 2L\, \|p_l - q_l\|_2^2;$$
$$\sum_k \sum_{k' :\, I_{jk} \cap I_{jk'} \ne \emptyset} \Bigl| \Bigl( \int \phi_{jk} p_l - \int \phi_{jk} q_l \Bigr) \Bigl( \int \phi_{jk'} p_{l'} - \int \phi_{jk'} q_{l'} \Bigr) \Bigr| \le 8L^2\, \bigl( \|p_l - q_l\|_2^2 + \|p_{l'} - q_{l'}\|_2^2 \bigr).$$
Proof of Lemma 6.6: Using the Cauchy–Schwarz inequality, we have
$$\sum_k \Bigl( \int \phi_{jk} p_l - \int \phi_{jk} q_l \Bigr)^2 \le \sum_k \int_{I_{jk}} (p_l - q_l)^2 \le 2L\, \|p_l - q_l\|_2^2,$$
the last step following from Lemma 6.3 applied to $(p_l - q_l)^2$. Using $2|ab| \le a^2 + b^2$ together with Lemma 6.2, this entails that
$$\sum_k \sum_{k' :\, I_{jk} \cap I_{jk'} \ne \emptyset} \Bigl| \Bigl( \int \phi_{jk} p_l - \int \phi_{jk} q_l \Bigr) \Bigl( \int \phi_{jk'} p_{l'} - \int \phi_{jk'} q_{l'} \Bigr) \Bigr| \le 4L \sum_k \Bigl( \int \phi_{jk} p_l - \int \phi_{jk} q_l \Bigr)^2 + 4L \sum_k \Bigl( \int \phi_{jk} p_{l'} - \int \phi_{jk} q_{l'} \Bigr)^2 \le 8L^2\, \bigl( \|p_l - q_l\|_2^2 + \|p_{l'} - q_{l'}\|_2^2 \bigr). \qquad \Box$$

Lemma 6.7
There exists a constant $\bar C_T = \bar C_T(R, L, \|\phi\|_\infty) > 0$ such that
$$A_1 := \sum_{i_1 \ne i_2} \mathrm{Var}_{\vec p, \vec q}\bigl( h_j(i_1, i_2) \bigr) \le \bar C_T\, \frac{M^2}{K^2}\, 2^j\, n^2.$$
Proof of Lemma 6.7: Let us evaluate each variance:
$$\mathrm{Var}_{\vec p, \vec q}\bigl( h_j(i_1, i_2) \bigr) = \mathrm{Cov}\bigl( h_j(i_1, i_2),\, h_j(i_1, i_2) \bigr).$$
We expand the covariance
$$\mathrm{Cov}\Bigl( \bigl( a_l(i_1)\phi_{jk}(Y_{i_1}) - b_l(i_1)\phi_{jk}(Z_{i_1}) \bigr) \bigl( a_l(i_2)\phi_{jk}(Y_{i_2}) - b_l(i_2)\phi_{jk}(Z_{i_2}) \bigr),\ \bigl( a_{l'}(i_1)\phi_{jk'}(Y_{i_1}) - b_{l'}(i_1)\phi_{jk'}(Z_{i_1}) \bigr) \bigl( a_{l'}(i_2)\phi_{jk'}(Y_{i_2}) - b_{l'}(i_2)\phi_{jk'}(Z_{i_2}) \bigr) \Bigr)$$
into the sixteen signed covariances of the form
$$\pm\, \mathrm{Cov}\bigl( c_l(i_1)\phi_{jk}(W_{i_1})\; \tilde c_l(i_2)\phi_{jk}(\tilde W_{i_2}),\ d_{l'}(i_1)\phi_{jk'}(V_{i_1})\; \tilde d_{l'}(i_2)\phi_{jk'}(\tilde V_{i_2}) \bigr),$$
where each of $c, \tilde c, d, \tilde d$ is $a$ or $b$, the matching variable $W, \tilde W, V, \tilde V$ is $Y$ or $Z$ accordingly, and the sign of each covariance is the product of the signs of its four factors. According to independence arguments, the following terms are clearly equal to zero:
$$\mathrm{Cov}\bigl( a_l(i_1)\phi_{jk}(Y_{i_1})\, a_l(i_2)\phi_{jk}(Y_{i_2}),\ b_{l'}(i_1)\phi_{jk'}(Z_{i_1})\, b_{l'}(i_2)\phi_{jk'}(Z_{i_2}) \bigr),$$
$$\mathrm{Cov}\bigl( a_l(i_1)\phi_{jk}(Y_{i_1})\, b_l(i_2)\phi_{jk}(Z_{i_2}),\ b_{l'}(i_1)\phi_{jk'}(Z_{i_1})\, a_{l'}(i_2)\phi_{jk'}(Y_{i_2}) \bigr),$$
$$\mathrm{Cov}\bigl( b_l(i_1)\phi_{jk}(Z_{i_1})\, a_l(i_2)\phi_{jk}(Y_{i_2}),\ a_{l'}(i_1)\phi_{jk'}(Y_{i_1})\, b_{l'}(i_2)\phi_{jk'}(Z_{i_2}) \bigr),$$
$$\mathrm{Cov}\bigl( b_l(i_1)\phi_{jk}(Z_{i_1})\, b_l(i_2)\phi_{jk}(Z_{i_2}),\ a_{l'}(i_1)\phi_{jk'}(Y_{i_1})\, a_{l'}(i_2)\phi_{jk'}(Y_{i_2}) \bigr).$$
The remaining terms can be split into two types: those involving two different random variables and those involving three different random variables. Let us handle these two cases separately. First, we consider the case with two different random variables. We need to bound terms such as
$$\sum_{i_1 \ne i_2} \sum_{k,k'} \mathrm{Cov}\bigl( a_l(i_1)\phi_{jk}(Y_{i_1})\, a_l(i_2)\phi_{jk}(Y_{i_2}),\ a_{l'}(i_1)\phi_{jk'}(Y_{i_1})\, a_{l'}(i_2)\phi_{jk'}(Y_{i_2}) \bigr)$$
$$= \sum_{i_1 \ne i_2} \sum_{k,k'} a_l(i_1) a_l(i_2) a_{l'}(i_1) a_{l'}(i_2)\, E\bigl( \phi_{jk}(Y_{i_1}) \phi_{jk'}(Y_{i_1}) \bigr)\, E\bigl( \phi_{jk}(Y_{i_2}) \phi_{jk'}(Y_{i_2}) \bigr)$$
$$\quad - \sum_{i_1 \ne i_2} \sum_{k,k'} a_l(i_1) a_l(i_2) a_{l'}(i_1) a_{l'}(i_2)\, E\bigl( \phi_{jk}(Y_{i_1}) \bigr) E\bigl( \phi_{jk'}(Y_{i_1}) \bigr) E\bigl( \phi_{jk}(Y_{i_2}) \bigr) E\bigl( \phi_{jk'}(Y_{i_2}) \bigr).$$
As the wavelets are compactly supported, we get for any $(i_1, i_2)$,
$$\Bigl| \sum_{k,k'} a_l(i_1) a_l(i_2) a_{l'}(i_1) a_{l'}(i_2)\, E(\phi_{jk}(Y_{i_1}) \phi_{jk'}(Y_{i_1}))\, E(\phi_{jk}(Y_{i_2}) \phi_{jk'}(Y_{i_2})) \Bigr| \le 2^{j+3} L^2\, \|\phi\|_\infty^2\, \sup_l (\|p_l\|_\infty \vee \|q_l\|_\infty)\; |a_l(i_1) a_l(i_2) a_{l'}(i_1) a_{l'}(i_2)|.$$
The second sum is much simpler to bound. According to Lemma 6.4 it can be bounded as follows:
$$\Bigl| \sum_{k,k'} a_l(i_1) a_l(i_2) a_{l'}(i_1) a_{l'}(i_2)\, E(\phi_{jk}(Y_{i_1})) E(\phi_{jk'}(Y_{i_1})) E(\phi_{jk}(Y_{i_2})) E(\phi_{jk'}(Y_{i_2})) \Bigr| \le 8 L^3\, \|\phi\|_\infty^2\, \sup_l (\|p_l\|_\infty \vee \|q_l\|_\infty)^2\; |a_l(i_1) a_l(i_2) a_{l'}(i_1) a_{l'}(i_2)|.$$
Let us now focus on the sums over $i_1$, $i_2$, $l$ and $l'$. Using $2|ab| \le a^2 + b^2$,
$$\sum_{i_1 \ne i_2} \sum_{l,l'} |a_l(i_1) a_l(i_2) a_{l'}(i_1) a_{l'}(i_2)| \le \frac12 \sum_{i_1, i_2} \sum_{l,l'} \bigl( a_l(i_1)^2 a_{l'}(i_2)^2 + a_{l'}(i_1)^2 a_l(i_2)^2 \bigr) \le n^2 \sum_{l,l'} \langle a_l, a_l \rangle_n \langle a_{l'}, a_{l'} \rangle_n \le \frac{M^2 n^2}{K^2}.$$
We see that this term behaves like $n^2$. The three other terms featuring only two different random variables are handled in the same way. Therefore it remains to evaluate the eight terms with three different random variables. For example, let us consider
$$\mathrm{Cov}\bigl( a_l(i_1)\phi_{jk}(Y_{i_1})\, a_l(i_2)\phi_{jk}(Y_{i_2}),\ a_{l'}(i_1)\phi_{jk'}(Y_{i_1})\, b_{l'}(i_2)\phi_{jk'}(Z_{i_2}) \bigr),$$
and let us omit for a moment the sums over $i_1$, $i_2$, $k$, $k'$, $l$ and $l'$.
The covariance can be expanded as
$$\mathrm{Cov}\bigl( \phi_{jk}(Y_{i_1}) \phi_{jk}(Y_{i_2}),\ \phi_{jk'}(Y_{i_1}) \phi_{jk'}(Z_{i_2}) \bigr) = E(\phi_{jk'}(Z_{i_2}))\, E(\phi_{jk}(Y_{i_2}))\, \mathrm{Cov}\bigl( \phi_{jk}(Y_{i_1}),\, \phi_{jk'}(Y_{i_1}) \bigr).$$
When we add the sums over $k$ and $k'$, the second term of this covariance is handled exactly as the second term above in the case of two different random variables. Thus it remains to consider the first summand. As above, the compactness of the wavelet support entails that
$$\Bigl| \sum_{k,k'} E(\phi_{jk'}(Z_{i_2}))\, E(\phi_{jk}(Y_{i_2}))\, \mathrm{Cov}\bigl( \phi_{jk}(Y_{i_1}), \phi_{jk'}(Y_{i_1}) \bigr) \Bigr| \le \sum_{k,k'} \bigl| E(\phi_{jk'}(Z_{i_2})) E(\phi_{jk}(Y_{i_2})) E(\phi_{jk}(Y_{i_1})) E(\phi_{jk'}(Y_{i_1})) \bigr| + \sum_{k,k'} \bigl| E(\phi_{jk'}(Z_{i_2})) E(\phi_{jk}(Y_{i_2})) E\bigl( \phi_{jk}(Y_{i_1}) \phi_{jk'}(Y_{i_1}) \bigr) \bigr| = A_1' + A_2'.$$
According to Lemmas 6.4 and 6.5, both $A_1'$ and $A_2'$ are bounded by a constant, depending only on $L$ and $\|\phi\|_\infty$, times $\sup_l (\|p_l\|_\infty \vee \|q_l\|_\infty)^2$. It remains to sum over $i_1$ and $i_2$, as the sums over $l$ and $l'$ are not important (they only change the constant). We have
$$\sum_{i_1 \ne i_2} \sum_{l,l'} |a_l(i_1) a_l(i_2) a_{l'}(i_1) b_{l'}(i_2)| \le \frac12 \sum_{l,l'} \Bigl( \sum_{i_1, i_2} a_l(i_1)^2 b_{l'}(i_2)^2 + \sum_{i_1, i_2} a_{l'}(i_1)^2 a_l(i_2)^2 \Bigr) = \frac{n^2}{2} \sum_{l,l'} \bigl( \langle a_l, a_l \rangle_n \langle b_{l'}, b_{l'} \rangle_n + \langle a_{l'}, a_{l'} \rangle_n \langle a_l, a_l \rangle_n \bigr) \le \frac{M^2 n^2}{K^2}.$$
This term also behaves like $n^2$.
The other covariances involving three random variables are handled in exactly the same way. By combining all the previous bounds, we conclude that
$$A_1 \le \bigl( k_1 + k_2\, 2^j \bigr)\, \frac{M^2 n^2}{K^2},$$
with $k_1 = 224\, RL\, \|\phi\|_\infty$ and $k_2 = 32\, RL\, \|\phi\|_\infty$. As a consequence, if we write $\bar C_T = k_1 + k_2$, one gets
$$A_1 \le \bar C_T\, \frac{M^2}{K^2}\, 2^j\, n^2. \qquad \Box$$

Lemma 6.8
There exists a constant $\tilde C_T = \tilde C_T(R, L, \|\phi\|_\infty) > 0$ such that, for any $j \in \mathbb N$,
$$A_3 := \sum_{i_1 \ne i_2 \ne i_3} \mathrm{Cov}\bigl( h_j(i_1, i_2),\, h_j(i_1, i_3) \bigr) \le \tilde C_T\, \frac{M^2}{K^2} \Bigl[ n^3 \sum_l \|p_l - q_l\|_2^2 + 2^{j/2}\, n^{5/2} \sum_l \|p_l - q_l\|_2 \Bigr].$$
Proof of Lemma 6.8: Since $h_j(i_1, i_2)$ and $h_j(i_1, i_3)$ only share the random variables with index $i_1$, each covariance factorizes, and clearly
$$A_3 = \sum_{i_1 \ne i_2 \ne i_3} \sum_{k,k'} \sum_{l,l'} \mathrm{Cov}\bigl( a_l(i_1)\phi_{jk}(Y_{i_1}) - b_l(i_1)\phi_{jk}(Z_{i_1}),\ a_{l'}(i_1)\phi_{jk'}(Y_{i_1}) - b_{l'}(i_1)\phi_{jk'}(Z_{i_1}) \bigr)\, E\bigl( a_l(i_2)\phi_{jk}(Y_{i_2}) - b_l(i_2)\phi_{jk}(Z_{i_2}) \bigr)\, E\bigl( a_{l'}(i_3)\phi_{jk'}(Y_{i_3}) - b_{l'}(i_3)\phi_{jk'}(Z_{i_3}) \bigr).$$
Replacing the constrained sum over pairwise distinct $(i_1, i_2, i_3)$ by free sums and subtracting the diagonal contributions, we get
$$A_3 = A_{31} - A_{32} - A_{33} + A_{34} \le |A_{31}| + |A_{32}| + |A_{33}| + |A_{34}|,$$
where $A_{31}$ is the free sum over $(i_1, i_2, i_3)$, $A_{32}$ and $A_{33}$ are the sums with one coincidence among the indices, and $A_{34}$ is the fully diagonal sum.
We will separately bound each term. Let us start with $|A_{31}|$. The first step is to expand the covariance:
$$|A_{31}| = n^2 \Bigl| \sum_{i_1} \sum_{k,k'} \sum_{l,l'} \mathrm{Cov}\bigl( a_l(i_1)\phi_{jk}(Y_{i_1}) - b_l(i_1)\phi_{jk}(Z_{i_1}),\ a_{l'}(i_1)\phi_{jk'}(Y_{i_1}) - b_{l'}(i_1)\phi_{jk'}(Z_{i_1}) \bigr) \Bigl( \int \phi_{jk} p_l - \int \phi_{jk} q_l \Bigr) \Bigl( \int \phi_{jk'} p_{l'} - \int \phi_{jk'} q_{l'} \Bigr) \Bigr|,$$
and, by independence of the two samples, the covariance itself expands into
$$E\bigl( a_l(i_1)\phi_{jk}(Y_{i_1})\, a_{l'}(i_1)\phi_{jk'}(Y_{i_1}) \bigr) + E\bigl( b_l(i_1)\phi_{jk}(Z_{i_1})\, b_{l'}(i_1)\phi_{jk'}(Z_{i_1}) \bigr) - E\bigl( a_l(i_1)\phi_{jk}(Y_{i_1}) \bigr) E\bigl( a_{l'}(i_1)\phi_{jk'}(Y_{i_1}) \bigr) - E\bigl( b_l(i_1)\phi_{jk}(Z_{i_1}) \bigr) E\bigl( b_{l'}(i_1)\phi_{jk'}(Z_{i_1}) \bigr).$$
The first two terms involve only one expectation each and can be bounded in the same way. Therefore let us bound the quantity
$$\Bigl| \sum_{i_1} \sum_{k,k'} \sum_{l,l'} E\bigl( a_l(i_1)\phi_{jk}(Y_{i_1})\, a_{l'}(i_1)\phi_{jk'}(Y_{i_1}) \bigr) \Bigl( \int \phi_{jk} p_l - \int \phi_{jk} q_l \Bigr) \Bigl( \int \phi_{jk'} p_{l'} - \int \phi_{jk'} q_{l'} \Bigr) \Bigr|.$$
Clearly $\sum_{i_1} |a_l(i_1) a_{l'}(i_1)| \le n \sqrt{ \langle a_l, a_l \rangle_n \langle a_{l'}, a_{l'} \rangle_n } \le \frac{M}{K}\, n$. Since $|E(\phi_{jk}(Y_{i_1}) \phi_{jk'}(Y_{i_1}))| \le \sup_l (\|p_l\|_\infty \vee \|q_l\|_\infty)$, and since this expectation vanishes when $I_{jk} \cap I_{jk'} = \emptyset$, Lemma 6.6 entails that
$$\sum_k \sum_{k' :\, I_{jk} \cap I_{jk'} \ne \emptyset} \Bigl| \Bigl( \int \phi_{jk} p_l - \int \phi_{jk} q_l \Bigr) \Bigl( \int \phi_{jk'} p_{l'} - \int \phi_{jk'} q_{l'} \Bigr) \Bigr| \le 8L^2 \bigl( \|p_l - q_l\|_2^2 + \|p_{l'} - q_{l'}\|_2^2 \bigr).$$
Then one deduces that, for any $1 \le i \le n$,
$$\sum_{k,k'} \sum_{l,l'} |E(\phi_{jk}(Y_i) \phi_{jk'}(Y_i))| \Bigl| \Bigl( \int \phi_{jk} p_l - \int \phi_{jk} q_l \Bigr) \Bigl( \int \phi_{jk'} p_{l'} - \int \phi_{jk'} q_{l'} \Bigr) \Bigr| \le 16 M L^2\, \sup_l (\|p_l\|_\infty \vee \|q_l\|_\infty) \sum_l \|p_l - q_l\|_2^2.$$
The terms with two expectations are bounded similarly via Lemmas 6.4, 6.5 and 6.6, with $2|ab| \le a^2 + b^2$ applied to the two integral factors, and they again produce a bound proportional to $\sum_l \|p_l - q_l\|_2^2$ with a factor $n$ from the sum over $i_1$. Therefore the last bounds entail that
$$|A_{31}| \le c_1\, \frac{M^2 n^3}{K^2} \sum_l \|p_l - q_l\|_2^2, \qquad c_1 = 4L\sqrt{R}\, \bigl( \sqrt{R} + \sqrt{L}\, \|\phi\|_\infty \bigr).$$
The way to bound $A_{32}$ and $A_{33}$ is trickier.
We have
$$|A_{32}| \le \Bigl| \sum_{l,l'} \sum_{i_1} \sum_{k,k'} \bigl[ a_l(i_1) a_{l'}(i_1)\, \mathrm{Cov}(\phi_{jk}(Y_{i_1}), \phi_{jk'}(Y_{i_1})) + b_l(i_1) b_{l'}(i_1)\, \mathrm{Cov}(\phi_{jk}(Z_{i_1}), \phi_{jk'}(Z_{i_1})) \bigr] \bigl[ a_l(i_1) E(\phi_{jk}(Y_{i_1})) - b_l(i_1) E(\phi_{jk}(Z_{i_1})) \bigr] \Bigl( n \int \phi_{jk'} p_{l'} - n \int \phi_{jk'} q_{l'} \Bigr) \Bigr|.$$
The calculations are rather lengthy and involve eight terms. But the bright side is that the terms can be split into two groups. There are terms involving two expectations, such as
$$\sum_{l,l'} \sum_{i_1} \sum_{k,k'} a_l(i_1) a_{l'}(i_1)\, E(\phi_{jk}(Y_{i_1}) \phi_{jk'}(Y_{i_1}))\, a_l(i_1) E(\phi_{jk}(Y_{i_1})) \Bigl( n \int \phi_{jk'} p_{l'} - n \int \phi_{jk'} q_{l'} \Bigr),$$
and terms involving three expectations, such as
$$\sum_{l,l'} \sum_{i_1} \sum_{k,k'} a_l(i_1) a_{l'}(i_1)\, E(\phi_{jk}(Y_{i_1})) E(\phi_{jk'}(Y_{i_1}))\, a_l(i_1) E(\phi_{jk}(Y_{i_1})) \Bigl( n \int \phi_{jk'} p_{l'} - n \int \phi_{jk'} q_{l'} \Bigr).$$
Still using Lemmas 6.4 and 6.5, each of the eight terms is bounded by a constant multiple of $\frac{M^2}{K^2}\, 2^{j/2}\, n^{5/2} \sum_{l} \|p_l - q_l\|_2$; the eight bounds only differ by the exchange of $a$ with $b$ and of $Y$ with $Z$, hence of $\sup_l \|p_l\|_\infty$ with $\sup_l \|q_l\|_\infty$. All these bounds entail that
$$|A_{32}| \le c_2\, \frac{M^2}{K^2}\, 2^{j/2}\, n^{5/2} \sum_{l=1}^{M} \|p_l - q_l\|_2, \qquad c_2 = 48\sqrt{R}\, L\, \|\phi\|_\infty.$$
As a consequence, one similarly gets
$$|A_{33}| \le c_3\, \frac{M^2}{K^2}\, 2^{j/2}\, n^{5/2} \sum_{l=1}^{M} \|p_l - q_l\|_2, \qquad c_3 = c_2.$$
Let us now consider $|A_{34}|$:
$$|A_{34}| \le \Bigl| \sum_i \sum_{k,k'} \sum_{l,l'} \mathrm{Cov}\bigl( a_l(i)\phi_{jk}(Y_i) - b_l(i)\phi_{jk}(Z_i),\ a_{l'}(i)\phi_{jk'}(Y_i) - b_{l'}(i)\phi_{jk'}(Z_i) \bigr)\, E\bigl( a_l(i)\phi_{jk}(Y_i) - b_l(i)\phi_{jk}(Z_i) \bigr)\, E\bigl( a_{l'}(i)\phi_{jk'}(Y_i) - b_{l'}(i)\phi_{jk'}(Z_i) \bigr) \Bigr|,$$
with
$$\Bigl| \sum_i \sum_{l'} E\bigl( a_{l'}(i)\phi_{jk'}(Y_i) - b_{l'}(i)\phi_{jk'}(Z_i) \bigr) \Bigr| = n \Bigl| \sum_{l'} \int \phi_{jk'}(p_{l'} - q_{l'}) \Bigr| \le n \sum_{l'} \|p_{l'} - q_{l'}\|_2.$$
According to Lemma 6.5, we have for any $1 \le i \le n$ and any $l$,
$$\Bigl| \sum_k E\bigl( a_l(i)\phi_{jk}(Y_i) - b_l(i)\phi_{jk}(Z_i) \bigr) \Bigr| \le \Bigl( |a_l(i)| \Bigl| \sum_k \int \phi_{jk} f_i \Bigr| \Bigr) \vee \Bigl( |b_l(i)| \Bigl| \sum_k \int \phi_{jk} g_i \Bigr| \Bigr) \le 4L\, \bigl( |a_l(i)| \vee |b_l(i)| \bigr)\, 2^{j/2}\, \|\phi\|_\infty \le 4L \sqrt{\frac{M}{K}}\, \|\phi\|_\infty\, 2^{j/2} \sqrt{n}.$$
According to Lemmas 6.4 and 6.5, we have for any fixed $k$,
$$\sum_i \sum_{k'} |a_l(i) a_{l'}(i)| \Bigl( \Bigl| \int \phi_{jk} \phi_{jk'} f_i \Bigr| + \Bigl| \int \phi_{jk} f_i \int \phi_{jk'} f_i \Bigr| \Bigr) \le n \sqrt{ \langle a_l, a_l \rangle_n \langle a_{l'}, a_{l'} \rangle_n } \Bigl( 4L \sup_l (\|p_l\|_\infty \vee \|q_l\|_\infty) + (2L)^2 \|\phi\|_\infty\, \sup_l (\|p_l\|_\infty \vee \|q_l\|_\infty) \Bigr) \le \frac{M}{K} \Bigl( 4L + (2L)^2 \|\phi\|_\infty \Bigr) \sup_l (\|p_l\|_\infty \vee \|q_l\|_\infty)\, n,$$
and the same bound holds with $b_l$, $b_{l'}$ and $g_i$ in place of $a_l$, $a_{l'}$ and $f_i$. Hence
$$|A_{34}| \le c_4\, 2^{j/2}\, \frac{M^2}{K^2}\, n^2 \sum_l \|p_l - q_l\|_2, \qquad c_4 = 4L \|\phi\|_\infty \sqrt{R}\, \bigl( L\sqrt{R} + (2L)^2 \|\phi\|_\infty \bigr).$$
When we carefully look at the bounds of the $|A_{3u}|$ for $u \in \{1, 2, 3, 4\}$, we deduce that
$$A_3 \le \tilde C_T\, \frac{M^2}{K^2} \Bigl[ n^3 \sum_l \|p_l - q_l\|_2^2 + 2^{j/2}\, n^{5/2} \sum_l \|p_l - q_l\|_2 \Bigr], \qquad \tilde C_T = \sum_{u=1}^{4} c_u. \qquad \Box$$

References

[1] Autin, F. (2006). Maxiset for density estimation on R.
Math. Methods Statist., vol. (2), 123-145.
[2] Avellaneda, M. (1999, 2000, 2001). Quantitative Analysis in Financial Markets: Collected Papers of the New York University Mathematical Finance Seminar, Volumes I, II, III. World Scientific.
[3] Bernhard, W., and Leblang, D. (2006). Democratic Processes and Financial Markets. Cambridge University Press, New York.
[4] Butucea, C., and Tribouley, K. (2006). Nonparametric homogeneity tests. J. Statist. Plann. Inference, vol., 597-639.
[5] Cohen, A., DeVore, R., Kerkyacharian, G., and Picard, D. (2001). Maximal spaces with given rate of convergence for thresholding algorithms. Appl. Comput. Harmon. Anal., vol. (2), 167-191.
[6] Cont, R. (2007). Volatility clustering in financial markets: empirical facts and agent-based models. In Long Memory in Economics (eds. A. Kirman and G. Teyssiere), pp. 289-309. Springer, Berlin.
[7] Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM, Philadelphia.
[8] Delmas, C. (2003). On likelihood ratio tests in Gaussian mixture models. Indian J. Statist., vol. (3), 513-531.
[9] Donoho, D., Johnstone, I., Kerkyacharian, G., and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Statist., vol. (2), 508-539.
[10] Garel, B. (2001). Likelihood ratio test for univariate Gaussian mixture. J. Statist. Plann. Inference, vol. (2), 325-350.
[11] Garel, B. (2005). Asymptotic theory of the likelihood ratio test for the identification of a mixture. J. Statist. Plann. Inference, vol. (2), 271-296.
[12] Hall, P. (1981). On the nonparametric estimation of mixture proportions. J. Roy. Statist. Soc. Ser. B, vol.
[13] Hall, P., and Titterington, D.M. Efficient nonparametric estimation of mixture proportions. J. Roy. Statist. Soc. Ser. B, vol. (3), 465-473.
[14] Hall, P., and Zhou, X.H. (2003). Nonparametric estimation of component distributions in a multivariate mixture. Ann. Statist., vol. (1), 201-224.
[15] Hosmer, D.W. (1973). A comparison of iterative maximum likelihood estimates of the parameters of a mixture of two normal distributions under three types of sample. Biometrics, vol., 761-770.
[16] Lodatko, N., and Maiboroda, R. (2007). Estimation of the density of a distribution from observations with an admixture. Theory Probab. Math. Statist., vol., 99-108.
[17] McKnight, P.E., McKnight, K.M., Figueredo, A.J., and Sidani, S. (2007). Missing Data: A Gentle Introduction. Guilford Press, New York.
[18] Maiboroda, R.E. (2000). A homogeneity criterion for mixtures with varying concentrations. Ukrainian Math. J., vol. (8), 1256-1263.
[19] Maiboroda, R.E. (2000). An asymptotically effective estimate for a distribution from a sample with a varying mixture. Theory Probab. Math. Statist., vol., 121-130.
[20] Pokhyl'ko, D. (2005). Wavelet estimators of a density constructed from observations of a mixture. Theory Probab. Math. Statist., vol., 135-145.
[21] Gayraud, G., and Pouet, C. (2005). Adaptive minimax testing in the discrete regression scheme. Probab. Theory Related Fields, vol. (4), 531-558.
[22] Qin, J. (1999). Empirical likelihood ratio based confidence intervals for mixture proportions. Ann. Statist., vol. (4), 1368-1384.
[23] Spokoiny, V.G. (1996). Adaptive hypothesis testing using wavelets. Ann. Statist., vol. (6), 2477-2498.
[24] Titterington, D.M. (1983). Minimum distance nonparametric estimation of mixture proportions. J. Roy. Statist. Soc. Ser. B, vol. (1), 37-46.
[25] van de Geer, S. (1995). Asymptotic normality in mixture models. ESAIM Probab. Statist., vol. 1.