Exponential bounds for minimum contrast estimators
arXiv [math.ST]
Electronic Journal of Statistics
ISSN: 1935-7524
Yuri Golubev
Universit´e de Provence39, rue F. Joliot-Curie13453 Marseille, Francee-mail: [email protected]
Vladimir Spokoiny
Weierstrass-Institute andHumboldt University Berlin,Mohrenstr. 39, 10117 Berlin, Germanye-mail: [email protected]
Abstract:
The paper focuses on general properties of parametric minimum contrast estimators. The quality of estimation is measured in terms of the rate function related to the contrast, which allows us to derive exponential risk bounds invariant with respect to the detailed probabilistic structure of the model. This approach works well for small or moderate samples and covers the case of a misspecified parametric model. Another important feature of the presented bounds is that they may be used when the parametric set is unbounded and non-compact. These bounds do not rely on entropy or covering numbers and can be easily computed. The most important statistical fact resulting from the exponential bounds is a concentration inequality which claims that minimum contrast estimators concentrate with large probability on a level set of the rate function. In typical situations, every such set is a root-n neighborhood of the parameter of interest. We also show that the obtained bounds help in bounding the estimation risk and in constructing confidence sets for the underlying parameters. Our general results are illustrated for the case of an i.i.d. sample. We also consider several popular examples including least absolute deviation estimation and the problem of estimating the location of a change point. What we obtain in these examples differs slightly from the usual asymptotic results presented in the statistical literature. This difference is due to the unboundedness of the parameter set and a possible model misspecification.
AMS 2000 subject classifications:
Primary 62F10; secondary 62J12, 62F25.
Keywords and phrases: exponential risk bounds, rate function, quasi maximum likelihood, smooth contrast.
1. Introduction
One of the most fundamental ideas in statistics is to describe an unknown distribution IP of the observed data Y ∈ IR^n with the help of a simple parametric family (IP_θ, θ ∈ Θ), where Θ is a subset of a finite dimensional space, say, of IR^p. In this situation, the statistical model is characterized by the value of the parameter θ ∈ Θ, and statistical inference about IP reduces to recovering θ. The standard likelihood approach suggests estimating θ by maximizing the corresponding likelihood function. The maximum likelihood estimator can be generalized in several ways, resulting in the so-called minimum contrast and M-estimators; see Huber (1967) and Huber (1981). The main idea behind this generalization is to estimate the underlying parameter θ by minimizing over Θ a contrast function −L(Y, θ):

θ̃ = argmin_{θ ∈ Θ} {−L(Y, θ)} = argmax_{θ ∈ Θ} L(Y, θ).   (1.1)

The negative sign in this notation comes from the main example we have in mind, where L(Y, θ) is the log-likelihood or quasi log-likelihood. A natural condition on the contrast function is that its expectation under the true measure IP_{θ_0} is minimized at the true parameter θ_0, i.e.

θ_0 = argmax_{θ ∈ Θ} IE_{θ_0} L(Y, θ).   (1.2)

If L(Y, θ) is the log-likelihood ratio, that is,

L(Y, θ) = log (dIP_θ/dIP_{θ_0})(Y),

then the value −IE_{θ_0} L(θ, θ_0) coincides with the Kullback-Leibler divergence K(IP_{θ_0}, IP_θ) between IP_{θ_0} and IP_θ. It is well known that K(IP_{θ_0}, IP_θ) is always non-negative and K(IP_{θ_0}, IP_θ) = 0 if and only if IP_{θ_0} = IP_θ.

If the distribution IP does not belong to the parametric family (IP_θ, θ ∈ Θ), then the target of estimation can be naturally defined as the point θ_0 of minimum of −IE L(Y, θ).

imsart-ejs ver. 2008/08/29 file: ejs_2009_352.tex date: October 21, 2018
We will see that this point θ_0 indeed minimizes a special distance between the underlying measure IP and the measures IP_θ from the given parametric family.

The classical parametric statistical theory focuses mostly on asymptotic properties of the difference between θ̃ and the true value θ_0 as the sample size n tends to infinity. There is a vast literature on this issue. We only mention the book Ibragimov and Khas'minskij (1981), which provides a comprehensive study of asymptotic properties of maximum likelihood and Bayesian estimators. Typical results claim that the maximum likelihood and Bayes estimators are asymptotically optimal under certain regularity conditions. Large deviation results for minimum contrast estimators can be found in Jensen and Wood (1998) and Sieders and Dzhaparidze (1987), while subtle small sample size properties of these estimators are presented in Field (1982) and Field and Ronchetti (1990).

Another stream of the literature considers minimum contrast estimators in a general i.i.d. situation, where the parameter set Θ is a subset of some functional space. We mention the papers Van de Geer (1993), Birgé and Massart (1993), Birgé and Massart (1998), Birgé (2006) and references therein. These studies mostly focus on the concentration properties of the maximum max_θ L(Y, θ) rather than on the properties of the estimator θ̃, which is the point of maximum of L(Y, θ). The established results are based on deep probabilistic facts from empirical process theory; see e.g. van der Vaart and Wellner (1996). In this paper we also focus on the properties of the maximum of L(Y, θ) over θ ∈ Θ. However, we do not assume any particular structure of the contrast.
Our basic result claims that if for every θ ∈ Θ the difference L(Y, θ) − L(Y, θ_0) has exponential moments, then under rather general and mild conditions the maximum max_θ {L(Y, θ) − L(Y, θ_0)} has similar exponential moments. In what follows, to keep notation shorter, we omit the argument Y in the contrast function L(Y, θ), writing L(θ) instead of L(Y, θ). However, one has to keep in mind that L(θ) is a random field that depends on the observed data Y. We also denote L(θ, θ_0) = L(θ) − L(θ_0).

To explain the main idea of this paper, introduce the function

M(µ, θ, θ_0) := −log IE exp{µ L(θ, θ_0)}.

Let µ* be a maximizer of this function w.r.t. µ, i.e.

µ*(θ) := argmax_µ M(µ, θ, θ_0).   (1.3)

The rate function is defined via the Legendre transform of L(θ, θ_0):

M*(θ, θ_0) := max_µ M(µ, θ, θ_0) = −log IE exp{µ*(θ) L(θ, θ_0)}.   (1.4)

Similar notions have already appeared in Chernoff (1952) and Bahadur (1960) for studying models with i.i.d. observations.

Obviously M*(θ, θ_0) ≥ M(0, θ, θ_0) = 0. The following identity follows immediately from the above definition:

IE exp{µ*(θ) L(θ, θ_0) + M*(θ, θ_0)} = 1,   θ ∈ Θ.

We aim to extend this pointwise identity to the supremum over θ ∈ Θ, which in particular enables us to replace θ with the estimator θ̃. Unfortunately, in some situations IE exp sup_θ {µ*(θ) L(θ, θ_0) + M*(θ, θ_0)} = ∞. We illustrate this fact by some examples for a simple Gaussian linear model.

To illustrate how the quantities µ*(θ) and M*(θ, θ_0) can be computed, let us consider the simplest case where L(θ, θ_0) is a Gaussian field.

Example 1.1. [Gaussian contrast] Let for each pair θ, θ' ∈ Θ the difference L(θ, θ') = L(θ) − L(θ') be a Gaussian random variable. In this case we call L(θ)
a Gaussian contrast. With M(θ, θ') = −IE L(θ, θ') and D²(θ, θ') = Var L(θ, θ'), the random variable L(θ, θ') is normal N(−M(θ, θ'), D²(θ, θ')). Moreover,

M(µ, θ, θ_0) = −log IE exp{µ L(θ, θ_0)} = µ M(θ, θ_0) − µ² D²(θ, θ_0)/2,

so the quantities µ*(θ) and M*(θ, θ_0) defined in (1.3)–(1.4) can be easily computed:

µ*(θ) = argmax_{µ ≥ 0} {µ M(θ, θ_0) − µ² D²(θ, θ_0)/2} = M(θ, θ_0)/D²(θ, θ_0),

M*(θ, θ_0) = sup_{µ ≥ 0} M(µ, θ, θ_0) = M²(θ, θ_0)/(2 D²(θ, θ_0)).

The formula can be further simplified if L(θ) is a Gaussian log-likelihood.

Example 1.2. [Gaussian model] Let L(θ, θ_0) = log (dIP_θ/dIP_{θ_0})(Y) be a Gaussian random variable for any θ ∈ Θ, and in addition IP = IP_{θ_0} for some θ_0 ∈ Θ. As in the previous example, let −M(θ, θ_0) and D²(θ, θ_0) denote the mean and variance of L(θ, θ_0). The likelihood property implies IE_{θ_0} exp{L(θ, θ_0)} = 1, yielding

M(θ, θ_0) = D²(θ, θ_0)/2,   µ*(θ) ≡ 1/2,   M*(θ, θ_0) = M(θ, θ_0)/4.

Example 1.3. [Linear Gaussian model] Consider the linear model Y = X θ_0 + σε, where Y ∈ IR^n, θ_0 ∈ IR^p, X is a known n × p matrix, and ε is white Gaussian noise in IR^n, i.e. the ε_i are i.i.d. standard normal. Then L(θ) = −||Y − Xθ||²_n/(2σ²), where ||·||_n denotes the standard Euclidean norm in IR^n. Obviously

M(θ, θ_0) = ||X(θ − θ_0)||²_n/(2σ²),   D²(θ, θ_0) = ||X(θ − θ_0)||²_n/σ²,

and thus (see Example 1.2) M*(θ, θ_0) = ||X(θ − θ_0)||²_n/(8σ²). The log-likelihood ratio can be written as

L(θ, θ_0) = ⟨X(θ − θ_0), ε⟩_n/σ − ||X(θ − θ_0)||²_n/(2σ²).

Let k denote the rank of the matrix X⊤X. Obviously k ≤ p, and the vectors X(θ − θ_0) span a linear subspace of IR^n of dimension k. Denote by Π the projector in IR^n onto this subspace.
Then

sup_{θ ∈ IR^p} {µ*(θ) L(θ, θ_0) + M*(θ, θ_0)}
  = sup_{θ ∈ IR^p} { ⟨X(θ − θ_0), ε⟩_n/(2σ) − ||X(θ − θ_0)||²_n/(8σ²) }
  = sup_{u ∈ IR^n} { ⟨Πu, ε⟩_n/(2σ) − ||Πu||²_n/(8σ²) }
  = sup_{u ∈ IR^n} { ⟨Πu, Πε⟩_n/(2σ) − ||Πu||²_n/(8σ²) }
  = ||Πε||²_n/2,

where the maximum is attained at any u ∈ IR^n such that Πu = 2σΠε. It is well known that ||Πε||²_n follows the χ²-distribution with k degrees of freedom, and

IE_{θ_0} exp sup_θ {µ*(θ) L(θ, θ_0) + M*(θ, θ_0)} = IE exp{||Πε||²_n/2} = ∞.

However, for any positive s < 1,

sup_θ {µ*(θ) L(θ, θ_0) + s M*(θ, θ_0)}
  = sup_{u ∈ IR^n} { ⟨Πu, ε⟩_n/(2σ) − (2 − s)||Πu||²_n/(8σ²) }
  = ||Πε||²_n/(4 − 2s),

and thus

IE_{θ_0} exp sup_θ {µ*(θ) L(θ, θ_0) + s M*(θ, θ_0)} = IE exp{||Πε||²_n/(4 − 2s)} = ((2 − s)/(1 − s))^{k/2}.

An important feature of this inequality is that it only involves the effective dimension k of the parameter space and does not depend on the design X, the noise level σ, the sample size n, etc. Later we show that such a behaviour of the log-likelihood is not restricted to Gaussian linear models and can be proved for a quite general statistical set-up.

The examples from Section 1.1 suggest to consider, in the general situation, the maximum of the random field µ*(θ) L(θ, θ_0) + s M*(θ, θ_0) for s < 1. Our basic result claims that, under mild conditions, for some ρ ∈ (0, 1],

IE sup_{θ ∈ Θ} exp{ ρ [µ*(θ) L(θ, θ_0) + s M*(θ, θ_0)] } ≤ C(ρ, s),   (1.5)

where C(ρ, s) is a constant that can be easily controlled in typical examples. This result particularly yields that µ*(θ̃) L(θ̃, θ_0) and M*(θ̃, θ_0) have bounded exponential moments.
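The supremum computed above for Example 1.3 can be verified numerically. The following sketch (the design, noise, and value of s are arbitrary choices of this illustration, not from the paper) evaluates µ*(θ) L(θ, θ_0) + s M*(θ, θ_0) for the linear Gaussian model and checks that its maximum equals ||Πε||²_n/(4 − 2s), attained at X(θ − θ_0) = 2σΠε/(2 − s).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, s, sigma = 40, 3, 0.5, 2.0
X = rng.standard_normal((n, p))                 # full column rank, so k = p
theta0 = rng.standard_normal(p)
eps = rng.standard_normal(n)

def f(theta):
    """mu*(theta) L(theta, theta0) + s M*(theta, theta0) for the linear model."""
    u = X @ (theta - theta0)
    # mu* = 1/2, L(theta, theta0) = <u, eps>/sigma - ||u||^2/(2 sigma^2),
    # M*(theta, theta0) = ||u||^2/(8 sigma^2)
    return u @ eps / (2 * sigma) - (2 - s) * (u @ u) / (8 * sigma**2)

# Stationary point: X(theta - theta0) = (2 sigma/(2 - s)) * Pi eps
v = 2 * sigma / (2 - s) * np.linalg.solve(X.T @ X, X.T @ eps)
Pi_eps = X @ np.linalg.solve(X.T @ X, X.T @ eps)   # projection of eps on col(X)
sup_closed = Pi_eps @ Pi_eps / (4 - 2 * s)

print(f(theta0 + v), sup_closed)                # the two values agree up to rounding
# random search never beats the stationary point
assert all(f(theta0 + v) >= f(theta0 + rng.standard_normal(p)) for _ in range(1000))
```

The quadratic in u is maximized along the direction of Πε, which is why the answer depends on ε only through ||Πε||²_n.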
Another corollary of this fact is that θ̃ concentrates on the sets A(z, θ_0) = {θ : M*(θ, θ_0) ≤ z} for sufficiently large z, in the sense that the probability IP(θ̃ ∉ A(z, θ_0)) is exponentially small in z. Usually every such concentration set is a root-n vicinity of the point θ_0; see Section 2.3 for precise formulations. Ibragimov and Khas'minskij (1981) stated a version of (1.5) for the i.i.d. case and used it to prove consistency of θ̃.

We briefly comment on some useful features of the basic inequality (1.5). First of all, this bound is non-asymptotic and may be used even if the sample size is small or moderate. It is also applicable in the situation when the parametric modeling assumption is misspecified. Our results may be used in such cases as well, with the "true" parameter θ_0 defined as the maximum point of the expected contrast: θ_0 = argmax_θ IE L(θ).

Another interesting question concerns the accuracy of estimation when the parameter set Θ is not compact. The typical results in the classical parametric theory have been established for compact parameter sets, since this assumption considerably simplifies the conditions and the technical tools. There exist very few results for the case of non-compact sets; see Ibragimov and Khas'minskij (1981) for an example. Our conditions are quite mild and, in particular, the parameter set can be non-compact and unbounded. Moreover, we present some examples in Section 4 illustrating that the quality of minimum contrast estimation can heavily depend on topological properties of Θ and on the behavior of the rate function M*(θ, θ_0) for large θ. The corresponding accuracy of estimation can differ from the classical root-n behavior.

The paper is organized as follows. The main result is presented in Section 2. Section 2.3 presents some useful corollaries of (1.5) describing concentration properties of θ̃, some risk bounds, and confidence sets for the target parameter θ_0 based on L(θ̃, θ_0).
Section 2.4 specializes the approach to the important case of a smooth contrast. In this situation the main conditions ensuring (1.5) are substantially simplified. Section 3 illustrates how our approach applies to the classical i.i.d. case, while Section 4 presents some applications of the general exponential bound to three particular problems: estimation of the median, of the scale parameter of an exponential model, and of the change point location. Although these examples have already been studied, the proposed approach reveals some new features of the classical least squares and least absolute deviation estimators in the cases when the parametric assumption is misspecified or the parameter set is not compact. In the case of median estimation the result applies even if the observations do not have a first moment. The last example in this section considers the prominent change point problem. We show in particular that when the size of the jump is completely unknown, the accuracy of estimating its location differs from the well known parametric rate 1/n: it depends on the distance of the change point to the edge of the observation interval and involves an extra iterated-log factor.
2. Risk bound for the minimum contrast
This section presents a general exponential bound on the minimum contrast value in a rather general set-up. Let −L(θ), θ ∈ Θ, be a random contrast function of a finite dimensional parameter θ ∈ Θ ⊂ IR^p given on some probability space (Ω, F, IP). We also assume that L(θ) is a separable random field and that IE L(θ) exists for all θ ∈ Θ. The minimum contrast estimator is defined as a minimizer of −L(θ), and the target of estimation is the value θ_0 which minimizes the expectation −IE L(θ). It is clear that for any θ° ∈ Θ,

θ̃ = argmax_{θ ∈ Θ} L(θ, θ°)   and   θ_0 = argmax_{θ ∈ Θ} IE L(θ, θ°).

Our study focuses on the value of the maximum in θ of the random field L(θ, θ_0):

L(θ̃, θ_0) = sup_{θ ∈ Θ} L(θ, θ_0) = sup_{θ ∈ Θ} {L(θ) − L(θ_0)}.

By definition, L(θ̃, θ_0) is a non-negative random variable.

The main goal of this paper is to obtain exponential bounds for the supremum in θ of the random field L(θ, θ_0) without specifying a particular structure of the model or of the contrast function L(θ). Instead, we impose some conditions of finite exponential moments on the increments L(θ, θ') = L(θ) − L(θ'). With M(µ, θ, θ_0) = −log IE exp{µ L(θ, θ_0)}, the global exponential moment condition reads as follows:

(EG) For any θ ∈ Θ the set Υ(θ) = {µ ∈ (0, ∞) : M(µ, θ, θ_0) < ∞} is non-empty.

Note that Υ(θ) is an interval, because M(µ, θ, θ_0) < ∞ implies M(µ', θ, θ_0) < ∞ for all µ' < µ. Moreover, in the basic example of the log-likelihood contrast it holds

M(1, θ, θ_0) = −log IE_{θ_0} (dIP_θ/dIP_{θ_0}) ≥ 0 for all θ ∈ Θ,

and condition (EG) is fulfilled automatically with (0, 1] ⊂ Υ(θ).

Under condition (EG) the functions µ*(θ) and M*(θ, θ_0) from (1.3)–(1.4) are non-trivial and correctly defined. Usually these functions can be easily evaluated in a small neighborhood of the target parameter θ_0. However, it might be difficult to compute them for all θ ∈ Θ.
Therefore, in the sequel we proceed with another function µ(θ), which can be viewed as a rough approximation of µ*(θ); Section 4 provides some examples. So, let µ(θ) be a given function taking values in Υ(θ). Define

M(θ, θ_0) := M(µ(θ), θ, θ_0) = −log IE exp{µ(θ) L(θ, θ_0)}.

The most important requirement on µ(θ) is that M(θ, θ_0) is positive and increases as θ moves away from θ_0. By definition, for any θ ∈ Θ,

IE exp{ µ(θ) L(θ, θ_0) + M(θ, θ_0) } = 1.   (2.1)

This means that the random function µ(θ) L(θ, θ_0) + M(θ, θ_0) has bounded exponential moments for every θ. We aim to derive a similar fact for the supremum of this function in θ ∈ Θ. More precisely, we are interested in bounding the following value:

Q(ρ, s) := IE sup_{θ ∈ Θ} exp{ ρ [µ(θ) L(θ, θ_0) + s M(θ, θ_0)] },   (2.2)

where ρ, s ∈ [0,
1].

We begin with a rough upper bound for the special case of a discrete parameter set.
Theorem 2.1.
Assume (EG) and let Θ be a discrete set. Then for any s < 1,

Q(1, s) = IE sup_{θ ∈ Θ} exp{ µ(θ) L(θ, θ_0) + s M(θ, θ_0) } ≤ Σ_{θ ∈ Θ} exp{ −(1 − s) M(θ, θ_0) }.   (2.3)

Proof.
Since IE exp{µ(θ) L(θ, θ_0) + s M(θ, θ_0)} = exp{−(1 − s) M(θ, θ_0)} by (2.1), we obviously have

Q(1, s) ≤ Σ_{θ ∈ Θ} IE exp{ µ(θ) L(θ, θ_0) + s M(θ, θ_0) } = Σ_{θ ∈ Θ} exp{ −(1 − s) M(θ, θ_0) }.

Usually the function M(θ, θ_0) grows rapidly as θ moves away from θ_0. This property is often sufficient to bound the sum on the right hand side of (2.3) by a fixed constant.

Although Theorem 2.1 is a rather simple corollary of (2.1), the bound (2.3) yields a number of useful statistical corollaries; some of them are presented in Section 2.3. However, even in the discrete case this bound may be too rough (see the example in Section 4.3). It is also clear that (2.3) is useless in the continuous case. The next section demonstrates how the bound (2.3) can be extended to the case of an arbitrary parameter set.

Here we aim to extend the exponential bound (2.3) from the discrete case to the case of an arbitrary finite dimensional parameter set. We apply the standard approach which evaluates the supremum over the whole parameter set Θ via a weighted sum of local maxima. Define for any θ, θ' ∈ Θ

ζ(θ) := µ(θ) { L(θ, θ_0) − IE L(θ, θ_0) },   ζ(θ, θ') := ζ(θ) − ζ(θ').

Note that the dependence of ζ(θ, θ') on θ_0 disappears if µ(θ) = µ(θ').

Usually the local properties of the centered contrast difference ζ(θ, θ') are controlled by the variance D²(θ, θ') = Var ζ(θ, θ'), which defines a semi-metric on Θ; see, e.g., van der Vaart and Wellner (1996). However, in some cases it is more convenient to deal with a slightly different metric which we denote by S(θ, θ'). This metric usually bounds the standard deviation D(θ, θ') from above. Sections 2.4 and 3 present some typical examples of constructing such a metric.
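Both the pointwise identity (2.1) and the discrete bound (2.3) are easy to check by simulation. The sketch below (the grid, sample size, and s are arbitrary choices of this illustration) uses the Gaussian contrast of Example 1.2 with µ(θ) = 1/2, for which M(θ, θ_0) = n(θ − θ_0)²/8 in closed form.

```python
import numpy as np

rng = np.random.default_rng(3)
n, s = 20, 0.5
grid = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])    # discrete Theta, theta0 = 0

# Gaussian contrast with mu(theta) = 1/2 (Example 1.2):
# L(theta, theta0) = delta*S - n*delta^2/2 with S = sum of the noise eps_i,
# M(theta, theta0) = n*delta^2/8, where delta = theta - theta0
reps = 200_000
S = rng.standard_normal((reps, n)).sum(axis=1)
expo = 0.5 * (np.outer(S, grid) - n * grid**2 / 2) + s * n * grid**2 / 8

# pointwise identity (2.1): E exp{mu L + M} = 1 at every grid point
ident = np.exp(expo + (1 - s) * n * grid**2 / 8).mean(axis=0)
print(ident)                                    # all entries close to 1

# Theorem 2.1: Q(1, s) = E max_theta exp{mu L + s M} <= sum_theta exp{-(1-s) M}
Q_mc = np.exp(expo.max(axis=1)).mean()
bound = np.exp(-(1 - s) * n * grid**2 / 8).sum()
print(Q_mc, bound)
assert Q_mc <= bound
```

As the text notes, the bound is rough: the Monte Carlo value of Q(1, s) sits well below the sum on the right hand side, since the per-point maxima overlap.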
Below in this section we assume that the metric S(·, ·) is given. Define for any point θ° ∈ Θ and radius ε > 0 the ball

B(ε, θ°) = {θ : S(θ, θ°) ≤ ε}.

To control the local behavior of the process L(θ) within any such ball B(ε, θ°), we impose the following local exponential moment condition:

(EL) There exist ε > 0, ν ≥ 1, and λ_0 > 0 such that for any θ° ∈ Θ and all λ ≤ λ_0,

sup_{θ, θ' ∈ B(ε, θ°)} log IE exp{ λ ξ(θ, θ') } ≤ ν² λ²,   where ξ(θ, θ') := ζ(θ, θ')/S(θ, θ').

In fact, this condition only requires that every normalized increment ξ(θ, θ') have a bounded exponential moment for λ ≤ λ_0.

For a fixed θ° ∈ Θ and ε' ≤ ε, denote by N(ε', ε, θ°) the local covering number, defined as the minimal number of balls B(ε', ·) required to cover the ball B(ε, θ°). With this covering number we associate the local entropy

Q(ε, θ°) := Σ_{k=1}^∞ 2^{−k} log N(2^{−k} ε, ε, θ°).

We begin with a local result which bounds the maximum of the process L(θ) over a local ball B(ε, θ°).

Theorem 2.2.
Assume (EG) and (EL) with some ε > 0, ν ≥ 1, and λ_0 > 0. Let also ρ < 1 be such that ρε/(1 − ρ) ≤ λ_0. Then for any θ° ∈ Θ,

log IE sup_{θ ∈ B(ε, θ°)} exp{ ρ [µ(θ) L(θ, θ_0) + M(θ, θ_0)] } ≤ ν² ε² ρ²/(1 − ρ) + (1 − ρ) Q(ε, θ°).

The next theorem is the global bound which generalizes the upper bound from Theorem 2.1.
Theorem 2.3.
Assume (EG) and (EL) for some λ_0, ν, ε, and let π(·) be a σ-finite measure on Θ such that

sup_{θ ∈ B(ε, θ°)} π(B(ε, θ)) / π(B(ε, θ°)) ≤ ν_1   (2.4)

for some ν_1 ∈ [1, ∞). Let for some ρ, s < 1 it hold that ρε/(1 − ρ) ≤ λ_0, and let the function M_ε(θ°, θ_0) = inf_{θ ∈ B(ε, θ°)} M(θ, θ_0) fulfill

H_ε(ρ, s) := log ( ∫_Θ π(B(ε, θ))^{−1} exp{ −ρ(1 − s) M_ε(θ, θ_0) } π(dθ) ) < ∞.   (2.5)

Let finally Q(ε, θ°) ≤ Q(ε) for all θ° ∈ Θ. Then the value Q(ρ, s) from (2.2) satisfies

log Q(ρ, s) ≤ ν² ε² ρ²/(1 − ρ) + (1 − ρ) Q(ε) + log(ν_1) + H_ε(ρ, s).   (2.6)

As in Theorem 2.1, proper growth conditions on the function M(θ, θ_0) ensure that the integral H_ε(ρ, s) in (2.6) is bounded by a fixed constant.

This section demonstrates how Theorems 2.1–2.3 can be used in the statistical analysis of the minimum contrast estimator θ̃ = argmax_{θ ∈ Θ} L(θ). We show that probabilistic properties of this estimator may be easily derived from the following inequality: for prescribed ρ, s < 1,

IE exp{ ρ [µ(θ̃) L(θ̃, θ_0) + s M(θ̃, θ_0)] } ≤ Q(ρ, s),   (2.7)

which obviously follows from Theorem 2.3 and the definition (2.2) of Q(ρ, s).

A first corollary of Theorem 2.1 presents exponential bounds separately for the minimum contrast L(θ̃, θ_0) and for the "natural" loss M(θ̃, θ_0).

Corollary 2.4.
For any ρ, s < 1,

IE exp{ ρ µ(θ̃) L(θ̃, θ_0) } ≤ Q(ρ, 0),   (2.8)

IE exp{ ρ s M(θ̃, θ_0) } ≤ Q(ρ, s).   (2.9)

Substituting s = 0 in (2.7) yields the first bound. To prove the second one, notice that L(θ̃, θ_0) ≥ 0 and that 1{x ≥ 0} ≤ exp(µx) for any µ > 0, so that

IE exp{ ρ s M(θ̃, θ_0) } = IE exp{ ρ s M(θ̃, θ_0) } 1{L(θ̃, θ_0) ≥ 0}
  ≤ IE exp{ ρ s M(θ̃, θ_0) + ρ µ(θ̃) L(θ̃, θ_0) } ≤ Q(ρ, s).

Notice that the exponential bound (2.9) implies a similar risk bound for a polynomial loss |M(θ̃, θ_0)|^r; see Lemma 5.7 for a precise result.

The assertion (2.7) can be used for establishing the concentration property of the estimator θ̃. Consider the sets

A(r, θ_0) := {θ : M(θ, θ_0) ≤ r}

for some r > 0. The next result shows that θ̃ leaves the set A(r, θ_0) with an exponentially small probability of order exp(−ρsr).

Corollary 2.5.
For any ρ, s < 1, it holds

IP( θ̃ ∉ A(r, θ_0) ) ≤ Q(ρ, s) exp(−ρsr).

Proof.
The inequalities L(θ̃, θ_0) ≥ 0 and M(θ̃, θ_0) > r on the event {θ̃ ∉ A(r, θ_0)} imply

exp(ρsr) IP( θ̃ ∉ A(r, θ_0) ) ≤ IE exp{ ρ [µ(θ̃) L(θ̃, θ_0) + s M(θ̃, θ_0)] } ≤ Q(ρ, s),

and the assertion follows.

In typical situations, M(θ, θ_0) is proportional to the sample size n and each set A(r, θ_0) corresponds to a root-n neighborhood of the point θ_0. See Section 3 for applications related to the i.i.d. case.

Next we discuss how the exponential bound (2.7) can be used for constructing confidence sets for the target θ_0 based on the optimized contrast L(θ̃, θ_0). The inequality (2.8) claims that L(θ̃, θ_0) is stochastically bounded. This justifies the following construction of confidence sets:

E(z) = { θ ∈ Θ : L(θ̃, θ) ≤ z }.

To evaluate the coverage probability, consider first the case when µ(θ) ≥ µ_* > 0 for all θ ∈ Θ. The next result claims that E(z) does not cover the true value θ_0 with a probability which decreases exponentially in z.

Corollary 2.6.
Assume that µ(θ) ≥ µ_* > 0. Then for any z > 0 and any ρ < 1,

IP( θ_0 ∉ E(z) ) ≤ Q(ρ,
0) exp{ −ρ µ_* z }.

Proof.
The bound (2.8) implies

IP( θ_0 ∉ E(z) ) = IP( L(θ̃, θ_0) > z ) ≤ IE exp{ −ρ µ(θ̃) z } exp{ ρ µ(θ̃) L(θ̃, θ_0) }
  ≤ exp{ −ρ µ_* z } IE exp{ ρ µ(θ̃) L(θ̃, θ_0) } ≤ Q(ρ,
0) exp{ −ρ µ_* z },

as required.

In the case when the function µ(θ) cannot be uniformly bounded from below by a positive constant, we assume that such a bound exists on every set A(r, θ_0). Denote

µ_*(r) := inf_{θ ∈ A(r, θ_0)} µ(θ).

Then

IP( θ_0 ∉ E(z) ) ≤ IP( θ_0 ∉ E(z), θ̃ ∈ A(r, θ_0) ) + IP( θ̃ ∉ A(r, θ_0) ),

and combining Corollaries 2.5–2.6 yields

Corollary 2.7.
For any z > 0, any ρ, s < 1, and any r > 0,

IP( θ_0 ∉ E(z) ) ≤ Q(ρ,
0) exp{ −ρ µ_*(r) z } + Q(ρ, s) exp{ −ρsr }.

A reasonable choice of r in this bound is given by the balance relation µ_*(r) z = sr. With this choice, the bound of Corollary 2.6 may be replaced by

IP( θ_0 ∉ E(z) ) ≤ 2 Q(ρ, s) exp{ −ρ µ_*(r) z }.

This section deals with the case when the contrast L(θ) is a smooth function of θ. In this situation the local condition (EL) is easy to verify. Moreover, the local balls B(ε, θ) nearly coincide with the usual Euclidean ellipsoids, and the local entropy can be bounded by an absolute constant depending only on the dimensionality p of the parameter space Θ.

Suppose that Θ is a convex set in IR^p and that the function L(θ), along with the scaling factor µ(θ), is differentiable w.r.t. θ. Below, the symbol ∇ stands for the gradient w.r.t. θ. Define

V(θ) := IE ∇ζ(θ) [∇ζ(θ)]^⊤,

H(λ, γ, θ) := log IE exp{ 2λ γ^⊤ ∇ζ(θ) / (γ^⊤ V(θ) γ)^{1/2} }

for every unit vector γ ∈ IR^p. To simplify the presentation, here and in what follows we assume that every matrix V(θ) is non-degenerate. It is easy to see that H(0, γ, θ) = 0, ∂H(0, γ, θ)/∂λ = 0, and

∂²H(λ, γ, θ)/∂λ² |_{λ=0} = 4 γ^⊤ IE ∇ζ(θ)[∇ζ(θ)]^⊤ γ / (γ^⊤ V(θ) γ) = 4.

Therefore, for small λ, H(λ, γ, θ) ≈ 2λ². Below we assume that this property is fulfilled uniformly in θ ∈ Θ and in γ over the unit sphere S_p in IR^p.

(ED) There exists λ_0 > 0 such that for some ν ≥ 1, uniformly in θ ∈ Θ,

sup_{|λ| ≤ λ_0} sup_{γ ∈ S_p} λ^{−2} H(λ, γ, θ) ≤ 2ν².   (2.10)

Now we define the metric S(θ, θ') by

S²(θ, θ') := sup_{t ∈ [0,1]} (θ − θ')^⊤ V((1 − t)θ' + tθ) (θ − θ').
(2.11)

Define also, for every θ° ∈ Θ and ε > 0, the ellipsoid B'(ε, θ°) by

B'(ε, θ°) = { θ : (θ − θ°)^⊤ V(θ°) (θ − θ°) ≤ ε² }.

Obviously B(ε, θ°) ⊆ B'(ε, θ°).

In what follows, we assume that the radius ε can be chosen in such a way that the functions V(θ) and M(θ, θ_0) have bounded fluctuations within the ball B'(ε, θ°) for every θ° ∈ Θ. More precisely, for a given function f(·), define its magnitude over B'(ε, θ°) by

A_ε f(θ°) := sup_{θ, θ' ∈ B'(ε, θ°)} f(θ)/f(θ').

Similarly, the magnitude of the matrix V(θ) over B'(ε, θ°) is computed as follows:

A_ε V(θ°) := sup_{θ, θ' ∈ B'(ε, θ°)} sup_{γ ∈ S_p} γ^⊤ V(θ) γ / (γ^⊤ V(θ') γ).

Notice that under the condition A_ε V(·) ≤ ν_1, the topology induced by the metric S(·, ·) is (locally) equivalent to the Euclidean topology, the set B(ε, θ°) can be well approximated by the ellipsoid B'(ε, θ°), and computing the local entropy Q(ε, ·) can be reduced to the Euclidean case; see Lemma 5.4 for more detail.

Now we are ready to state an exponential bound for the contrast process in the smooth case.

Theorem 2.8.
Assume that (EG) and (ED) hold true with some ν and λ_0 > 0. Suppose that there is a constant ε > 0 such that ερ/(1 − ρ) ≤ λ_0 and, for a fixed ν_1 ≥ 1 and each θ ∈ Θ, it holds

A_ε V(θ) ≤ ν_1.   (2.12)

Let for some ρ, s < 1 the function M_ε(θ°, θ_0) = inf_{θ ∈ B(ε, θ°)} M(θ, θ_0) fulfill

H_ε(ρ, s) := log ( ω_p^{−1} ε^{−p} ∫_Θ (det V(θ))^{1/2} exp{ −ρ(1 − s) M_ε(θ, θ_0) } dθ ) < ∞,

where ω_p is the Lebesgue measure of the unit ball in IR^p. Then it holds

log Q(ρ, s) ≤ (1 − ρ) Q_p + 2 ν² ε² ρ²/(1 − ρ) + 2p log(ν_1) + H_ε(ρ, s).

Remark 2.1.
The conditions of this theorem are very mild. (EG) only requires that L(θ, θ_0) have exponential moments. (ED) requires a similar condition for the centered and normalized gradient of L(θ). The inequality (2.12) is essentially a uniform continuity condition on the function V(θ).

Remark 2.2.
The presented exponential bound requires that the value H_ε(ρ, s) be finite. Fortunately, this can be easily checked in typical situations; a typical example is given in Section 3, which deals with the i.i.d. case.

Our main result controls the risk of the minimum contrast estimator in terms of the rate function M(θ, θ_0). In the case of a smooth contrast, this result may be used to bound the classical estimation loss θ̃ − θ_0. The idea is to bound the rate function M(θ, θ_0) from below by a quadratic function in a vicinity of the point θ_0 and then to make use of the concentration property of θ̃.

Note that for any µ it obviously holds M(µ, θ_0, θ_0) = 0, and simple algebra yields for the gradient of M(µ, θ, θ_0):

∇M(µ, θ, θ_0) |_{θ = θ_0} = −µ IE ∇L(θ) |_{θ = θ_0} = −µ ∇IE L(θ_0) = 0.

So, M(µ, θ, θ_0) can be bounded from below and from above in a vicinity of θ_0 by a second order Taylor expansion. The same behavior can be expected for the optimized rate function M(θ, θ_0). This argument and the concentration property from Corollary 2.5 lead to the following result:

Corollary 2.9.
Suppose that the conditions of Theorem 2.8 are satisfied and that for some r > 0 the function M(θ, θ_0) fulfills

M(θ, θ_0) ≥ (θ − θ_0)^⊤ V_0 (θ − θ_0),   θ ∈ A(r, θ_0),

for some positive matrix V_0. Then for any ρ, s < 1 and z > 0,

IP( ||V_0^{1/2} (θ̃ − θ_0)|| > z ) ≤ Q(ρ, s) exp{ −ρs min(z², r) }.

Proof.
It is obvious that

  { ‖√V (θ̃ − θ₀)‖ > z } ⊆ { ‖√V (θ̃ − θ₀)‖ > z, θ̃ ∈ A(r, θ₀) } ∪ { θ̃ ∉ A(r, θ₀) }
    ⊆ { M(θ̃, θ₀) > z², θ̃ ∈ A(r, θ₀) } ∪ { θ̃ ∉ A(r, θ₀) }
    = { θ̃ ∉ A(r ∧ z², θ₀) },

and the result follows from Corollary 2.7.

In the case of i.i.d. observations, the function M(μ, θ, θ₀) and hence the matrix V are proportional to the sample size n, and the result of Corollary 2.9 automatically yields the root-n consistency of θ̃; see Section 3 for more details.
3. Quasi MLE for i.i.d. data
Let Y = ( Y , . . . , Y n ) be an i.i.d. sample from a distribution P . By IP wedenote the joint distribution of Y . Let also P = ( P θ , θ ∈ Θ ⊂ IR p ) be aparametric family. In contrast to the standard parametric hypothesis whichassumes that P ∈ P , in this section, we focus on the quality of estimationin the case when the underlying measure P does not necessarily belong tothe parametric family P . We will see that in this case the maximum likelihoodmethod estimates the point θ , which minimizes some special distance between P and P θ over θ ∈ Θ .In the rest of this section, the family P and the underlying measure P areassumed to be dominated by a measure P . We denote by p ( y, θ ) and p ( y )the corresponding densities: p ( y, θ ) = dP θ /dP ( y ) , p ( y ) = dP/dP ( y ) . Themaximum likelihood estimator e θ of the underlying parameter θ is computedas follows: e θ = argmax θ ∈ Θ L ( θ ) = argmax θ ∈ Θ n X i =1 ℓ ( Y i , θ ) , where ℓ ( Y, θ ) = log p ( Y, θ ) . Denote ℓ ( Y, θ , θ ′ ) = ℓ ( Y, θ ) − ℓ ( Y, θ ′ ) and m ( µ, θ , θ ) = − log E exp { µℓ ( Y, θ , θ ) } , The i.i.d. structure of the observations Y implies that M ( µ, θ , θ ) = n m ( µ, θ , θ ) . This enables us to redefine the function µ ∗ ( θ ) in terms of the function m ( · , θ , θ )corresponding to the marginal distribution P : µ ∗ ( θ ) = argmax µ m ( µ, θ , θ )and µ ( θ ) can be interpreted as an approximation of µ ∗ ( θ ) . Denote also m ( θ , θ ) = m ( µ ( θ ) , θ , θ ) , and for ζ ( θ ) = µ ( θ ) { ℓ ( Y , θ , θ ) − Eℓ ( Y , θ , θ ) } define v ( θ ) = E ∇ ζ ( θ )[ ∇ ζ ( θ )] ⊤ ,h ( δ, γ ; θ ) = log E exp (cid:26) δ γ ⊤ ∇ ζ ( θ ) p γ ⊤ v ( θ ) γ (cid:27) . Notice that if P coincides with P θ and µ ( θ ) is constant in a vicinity of θ ,then v ( θ ) is the standard Fisher information matrix. One can easily check that h (0 , γ ; θ ) = 0 , ∂h ( δ, γ ; θ ) ∂δ (cid:12)(cid:12)(cid:12)(cid:12) δ =0 = 0 , ∂ h ( δ, γ ; θ ) ∂ δ (cid:12)(cid:12)(cid:12)(cid:12) δ =0 = 4 . imsart-ejs ver. 
2008/08/29 file: ejs_2009_352.tex date: October 21, 2018 olubev, yu. and spokoiny, v. /exponential bounds for minimum contrast estimators It follows from Lemma 5.8 that for any ν > θ ∈ Θ there exists δ ( θ , ν ) > h ( δ, γ ; θ ) ≤ ν δ for all γ ∈ S p and δ ≤ δ ( θ , ν ) . We assume aslightly stronger condition that δ ( θ ) can be taken the same for all θ , i.e.sup θ ∈ Θ sup γ ∈ S p h ( δ, γ ; θ ) ≤ ν δ , δ ≤ δ. (3.1)In some cases, the matrix v ( θ ) should be replaced by its regularization v ( θ )to ensure this property, see Section 4.2 for an example.Independence of the Y i ’s implies that V ( θ ) def = Cov (cid:8) ∇ ζ ( θ ) (cid:9) = nv ( θ ) and H ( λ, γ, θ ) def = log IE exp n λ γ ⊤ ∇ ζ ( θ ) p γ ⊤ V ( θ ) γ o = nh ( n − / λ, γ ; θ )for any λ and any γ ∈ S p . Therefore, if n − / λ ≤ δ , then by (3.1): H ( λ, γ, θ ) ≤ ν λ and the condition ( ED ) is fulfilled with λ ≤ n / δ . Now one can easily refor-mulate Theorem 2.8 in terms of the marginal distribution P . Theorem 3.1.
Assume (3.1) for some δ > 0 and ν₀ ≥ 1. Suppose that there are constants ε > 0 and ν₁ ≥ 1 such that for each θ ∈ Θ,

  A_ε v(θ) ≤ ν₁.  (3.2)

Let also, for some s, ρ < 1 such that ερ/(1 − ρ) ≤ n^{1/2} δ,

  H_ε(ρ, s) := log( ω_p^{−1} ε^{−p} ∫_Θ √det{ nv(θ) } exp{ −ρ(1 − s) n m_ε(θ, θ₀) } dθ ) < ∞,

where m_ε(θ, θ₀) = inf_{θ′ ∈ B(ε, θ)} m(θ′, θ₀). Then the value Q(ρ, s) from (2.2) fulfills

  log Q(ρ, s) ≤ (1 − ρ) Q_p + 2ν₀ ε²ρ²/(1 − ρ) + 2p log(ν₁) + H_ε(ρ, s).

The integral in H_ε(ρ, s) can be easily bounded in typical situations. The result presented below involves some conditions on the marginal rate function m(θ, θ₀). Namely, it is assumed that this function is bounded from below by a quadratic polynomial in a vicinity A(r, θ₀) := { θ: m(θ, θ₀) ≤ r } of the point θ₀ for some fixed r > 0, and that it grows in ‖θ − θ₀‖ outside of this neighborhood. In particular, it is shown in Section 5 that for n sufficiently large,

  H_ε(ρ, s) ≈ log( ω_p^{−1} π^{p/2} | a_r ε² ρ(1 − s) |^{−p/2} ).  (3.3)

Theorem 3.2.
Assume (3.1) and let ρ fulfill ρ/(1 − ρ) ≤ nδ². Suppose that (3.2) holds with ε² = (1 − ρ)/ρ. Let, for some r > 0, there be a positive matrix v and a constant a_r > 0 such that

  v(θ) ≤ v,  m(θ, θ₀) ≥ a_r (θ − θ₀)^⊤ v (θ − θ₀),  ∀θ ∈ A(r, θ₀).

Let, for some β > 0, it hold:

  C_r(β) := ∫_{Θ∖A(r, θ₀)} √det{ v(θ) } exp{ −β m_ε(θ, θ₀) } dθ < ∞.

Finally, let n be sufficiently large to ensure

  b_r(n) := ρ(1 − s) n r − β r − a_r^{−1} ε² − (p/2) log n ≥ 0.  (3.4)

Then for some C depending on a_r, ν₀, ν₁, C_r(β) only, it holds

  log Q(ρ, s) ≤ Cp + p log( |(1 − ρ)(1 − s)|^{−1} ).

This bound together with Corollary 2.9 yields

  IP( n a_r ‖v^{1/2}(θ̃ − θ₀)‖² > z² + pC(ρ, s) ) ≤ exp{ −ρs min{z², r√n} }

with C(ρ, s) = C + log( |(1 − ρ)(1 − s)|^{−1} ). In particular, this quantifies the root-n consistency of θ̃ in a rather strong sense.
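The root-n concentration expressed by this bound can be visualized in the simplest i.i.d. setting. The following sketch (an added illustration with arbitrary toy parameters, not part of the original text) simulates the Gaussian location model, where the MLE is the sample mean and √n(θ̃ − θ₀) is standard normal, so the frequency of exceeding a level z decays like a Gaussian tail:

```python
import math
import random

# Toy Gaussian location model: the MLE is the sample mean, so
# sqrt(n) * (theta_hat - theta0) ~ N(0, 1) and the probability of
# exceeding a level z decays like a Gaussian tail.
random.seed(4)
n, theta0, reps, z = 100, 1.0, 2000, 3.0
hits = 0
for _ in range(reps):
    theta_hat = sum(random.gauss(theta0, 1.0) for _ in range(n)) / n
    if math.sqrt(n) * abs(theta_hat - theta0) > z:
        hits += 1
freq = hits / reps  # should be close to 2 * (1 - Phi(3)), about 0.0027
```

The observed exceedance frequency stays far below any polynomial-in-1/n rate, matching the exponential form of the concentration inequality.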
4. Examples
This section illustrates how the exponential bounds can be applied in some particular situations. To simplify technical details, we do not try to cover the most general case. Rather, we aim to show that our basic conditions can be easily verified in typical situations.
The exponential model assumes that the observations Y = (Y₁, …, Y_n) are i.i.d. random variables from the exponential law P_θ with an unknown parameter θ ∈ IR₊: P_θ(Y_i > y) = exp(−θy). In this example we focus on the classical parametric set-up assuming that the underlying measure IP coincides with the product of IP_{θ₀} for some θ₀ ∈ IR₊. The corresponding maximum likelihood contrast is given by

  L(θ) = Σ_{i=1}^n ℓ(Y_i, θ) = −θ Σ_{i=1}^n Y_i + n log(θ),

yielding

  θ̃ = n / Σ_{i=1}^n Y_i,  L(θ̃, θ₀) = n log(θ̃/θ₀) + n(θ₀/θ̃ − 1) = n K(θ̃, θ₀),

where K(θ, θ′) = θ′/θ − 1 − log(θ′/θ) is the Kullback-Leibler divergence between the exponential laws P_θ and P_{θ′}.

Define h(δ) := log IE exp{ −δ(θ₀Y₁ − 1) }. Then, with u = θ/θ₀ − 1,

  m(μ, θ, θ₀) := −log E_{θ₀} exp{ μℓ(θ, θ₀) } = μ[u − log(1 + u)] − h(μu).

Therefore, with

  m*(u) = max_μ { μ[u − log(1 + u)] − h(μu) },  μ*(u) = argmax_μ { μ[u − log(1 + u)] − h(μu) },

the optimal choice of μ(θ) is given by μ*(θ) = μ*(u), leading to m*(θ, θ₀) = m*(u) for u = θ/θ₀ − 1. Simple algebra yields for Y₁ ∼ Exp(θ₀)

  h(δ) = δ − log(1 + δ),  m(μ, θ, θ₀) = log(1 + μu) − μ log(1 + u),

so that

  μ*(u) = argmax_μ { log(1 + μu) − μ log(1 + u) } = [u − log(1 + u)] / [u log(1 + u)].

To simplify the calculations, we proceed further with the suboptimal choice μ(θ) ≡ μ = 1/2 in place of μ*(θ) = μ*(u), leading to m(θ, θ₀) := m(μ, θ, θ₀) = m(u) with

  m(u) := log(1 + u/2) − (1/2) log(1 + u) = (1/2) log( (1 + u/2)² / (1 + u) )

for u = θ/θ₀ − 1 > −1. One can check that m(u) ≥ c₁u² for |u| ≤ 1 and m(u) ≥ c₂ log(1 + u) for u ≥ 1 with some fixed c₁, c₂ > 0. Further,

  ζ₁(θ) := μ{ ℓ(Y₁, θ) − IEℓ(Y₁, θ) } = −μθ(Y₁ − 1/θ₀),  ∇ζ₁(θ) = −μ(Y₁ − 1/θ₀),

so that with σ² = Var Y₁ = 1/θ₀² it holds

  v(θ) := IE[∇ζ₁(θ)]² ≡ μ²σ² = 1/(4θ₀²),  log IE exp{ δ∇ζ₁(θ)/√v(θ) } ≡ h(δ),

and the condition (3.1) is obviously satisfied with some ν₀ < ∞. Similarly, the conditions (5.4) through (3.4) can be easily verified, and Theorem 3.2 applied with s = 0 yields

  IE exp{ ρL(θ̃, θ₀)/2 } ≡ IE exp{ ρnK(θ̃, θ₀)/2 } ≤ C(1 − ρ)^{−1}.  (4.1)

An important feature of this result is that it applies for the unbounded and non-compact parameter set (0, +∞).
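These formulas are easy to check numerically. The sketch below (an added illustration with made-up sample parameters, not from the paper) computes the quasi-MLE θ̃ = n/ΣY_i for a simulated exponential sample and evaluates the Kullback-Leibler term; the concentration behind (4.1) appears as nK(θ̃, θ₀) remaining bounded while θ̃ approaches θ₀:

```python
import math
import random

def kl_exp(theta, theta_prime):
    # Kullback-Leibler divergence between Exp(theta) and Exp(theta'):
    # K(theta, theta') = theta'/theta - 1 - log(theta'/theta)
    u = theta_prime / theta
    return u - 1.0 - math.log(u)

random.seed(0)
theta0, n = 2.0, 2000
sample = [random.expovariate(theta0) for _ in range(n)]
theta_hat = n / sum(sample)           # quasi-MLE for the exponential rate
loss = n * kl_exp(theta_hat, theta0)  # the contrast L(theta_hat, theta0)
```

Here `loss` stays of order one even though the parameter set (0, +∞) is unbounded, in line with (4.1).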
Another corollary of (4.1) is that the true parameter θ₀ is covered with a high probability by the confidence set E(z) of the form

  E(z) = { θ ∈ Θ: θ/θ̃ − 1 − log(θ/θ̃) ≤ z²/n }

provided that z is sufficiently large.

Median or, more generally, quantile estimation is known to be more robust and stable against outliers, and it is frequently used in econometric studies; see Koenker (2005), Koenker and Xiao (2006). Suppose we are given a sample Y = (Y₁, …, Y_n). In the problem of median estimation, these random variables are assumed i.i.d. and we are interested in estimating the median θ₀, which is a root of the equation P(Y₁ ≤ θ) = P(Y₁ ≥ θ). Alternatively, the median minimizes the value E|Y₁ − θ| provided that the expectation of |Y₁| is finite. This remark leads to the natural estimator θ̃ of the median as the minimizer of the contrast −L(θ) = Σ_{i=1}^n |Y_i − θ|:

  θ̃ = argmax_θ L(θ) = argmin_θ Σ_{i=1}^n |Y_i − θ|.

If the Y_i's are i.i.d. with the Laplace density exp(−|y − θ|)/2, then L(θ) coincides (up to a constant factor) with the log-likelihood. In the general case, L(θ) can be treated as a quasi log-likelihood contrast. Later we also briefly comment on the case when the Y_i's are not i.i.d.

Assume first that Y_i has the density p_θ(y) = p(y − θ), where p(·) is a centrally symmetric function. To simplify the notation, we also assume that θ₀ = 0. The general case can be reduced to this one by a simple change of variables. The density p(y) is supposed to be positive, and for y > 0 define

  λ(y) = −(2y)^{−1} log[ 2P(Y₁ > y) ].

Equivalently, we can write P(Y₁ > y) = e^{−2yλ(y)}/2 for y ≥ 0. The case λ(y) ≥ λ₀ > 0 corresponds to tails decaying at least exponentially fast, while λ(y) → 0 as |y| → ∞ means heavy tails of the distribution P.
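As a hedged illustration of the median contrast (added here; the sample size and the Cauchy noise law are arbitrary choices, not from the paper), the minimizer of Σ|Y_i − θ| is a sample median, which remains stable even under heavy tails where the empirical mean is useless:

```python
import math
import random
import statistics

def lad_estimate(ys):
    # argmin over theta of sum_i |Y_i - theta| is attained at a sample median
    return statistics.median(ys)

random.seed(1)
n, theta0 = 1001, 0.0
# standard Cauchy noise around theta0: lambda(y) -> 0, the mean does not exist,
# but the median is still estimated stably
ys = [theta0 + math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]
theta_med = lad_estimate(ys)
```

With 1001 Cauchy observations the median lands within a few hundredths of θ₀, while the sample mean can be arbitrarily far off.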
Below we focus on the most interesting case when λ(y) is positive and monotonically decreases to zero in y > 0. We also assume that λ(y) is sufficiently regular and its first derivative λ′(y) is uniformly continuous on IR. The assumption of heavy tails implies that [yλ(y)]′ ∈ [0, 1] and hence

  |yλ′(y)| = | [yλ(y)]′ − λ(y) | ≤ 1.

Let m₀(θ) := E|Y₁ − θ| and q(θ) := P(Y₁ ≤ θ) − P(Y₁ > θ). Obviously m₀′(θ) := ∂m₀(θ)/∂θ = q(θ). It is also clear that |q(θ)| ≤ 1. For θ ≥ 0,

  ℓ′(y, θ, θ₀) := (∂/∂y) ℓ(y, θ, θ₀) = 0 for y ∉ [0, θ], and = 2 otherwise,

and ℓ(y, θ, θ₀) = −θ for y < 0. Integration by parts gives

  E e^{μℓ(Y₁,θ,θ₀)} = −∫ e^{μℓ(y,θ,θ₀)} dP(Y₁ > y)
    = e^{−μθ} + ∫ μℓ′(y, θ, θ₀) e^{μℓ(y,θ,θ₀)} P(Y₁ > y) dy
    = e^{−μθ} + 2μ ∫₀^θ e^{μ(2y−θ)} P(Y₁ > y) dy
    = e^{−μθ} + μ e^{−μθ} ∫₀^θ e^{2y[μ − λ(y)]} dy,

and similarly for θ < θ₀. We now fix μ(θ) = λ(θ). Monotonicity of λ(y) implies

  E e^{μ(θ)ℓ(Y₁,θ,θ₀)} = e^{−θλ(θ)} + λ(θ) e^{−θλ(θ)} ∫₀^θ e^{2y[λ(θ) − λ(y)]} dy ≤ { 1 + θλ(θ) } e^{−θλ(θ)}.

Therefore, for θ > 0,

  m(θ, θ₀) ≥ θλ(θ) − log{ 1 + θλ(θ) }.  (4.2)

The same lower bound holds true for θ < 0. If θλ(θ) ≤ 1, then (4.2) implies m(θ, θ₀) ≥ θ²λ²(θ)/4.

Now we check the condition (3.1). Define

  ζ(θ) := E( |Y₁ − θ| − |Y₁| ) − ( |Y₁ − θ| − |Y₁| ).
Thiseasily implies the condition (3.1) for some fixed δ > , ν ≥ v ( θ ) ≡ IE | Y | γ < ∞ for some γ > ρ = s and Corollary 2.4 lead to thebound for the loss e u = | e θ − θ | : IE exp (cid:8) ρ n (cid:2)e uλ ( e u ) − log { e uλ ( e u ) } (cid:3)(cid:9) ≤ Cρ / (1 − ρ ) / with some fixed constant C provided that n exceeds some minimal sample size n .The case of independent but non i.i.d. observations can be again reduced tothe considered case using P = n − P i =1 P i and defining the point θ as a rootof the equation n X i =1 P i ( Y i < θ ) = n X i =1 P i ( Y i > θ ) . Suppose the observations Y = ( Y , . . . , Y n ) follow the change point model: Y i = A ( i ≤ θ ) + σξ i , i = 1 , . . . , n, (4.3)where ξ i is a standard white Gaussian noise. Our goal is to estimate the changepoint location θ ∈ Θ = { , . . . , n − } . The obtained results can be easilyextended to the case of non-Gaussian errors under some exponential momentconditions. imsart-ejs ver. 2008/08/29 file: ejs_2009_352.tex date: October 21, 2018 olubev, yu. and spokoiny, v. /exponential bounds for minimum contrast estimators We begin with the case when the amplitude A is known. To estimate θ , weuse the maximum likelihood estimator e θ A = argmax θ ∈ Θ L A ( θ ) , where the maximum likelihood contrast is given by L A ( θ ) = Aσ θ X i =1 Y i − A σ θ = A σ min( θ, θ ) − A θ σ + Aσ θ X i =1 ξ i . Note that L A ( θ ) is a Gaussian random variable for every θ with M ( θ, θ ) def = − IEL A ( θ ) = A σ | θ − θ | ,D ( θ, θ ) def = Var L A ( θ ) = A σ | θ − θ | = 2 M ( θ, θ ) . This yields for any µ ≥ M ( µ, θ, θ ) = µM ( θ, θ ) − µ D ( θ, θ ) / µ − µ ) M ( θ, θ ) , and the corresponding values µ ∗ ( θ ) , M ∗ ( θ, θ ) can be easily computed: µ ∗ ( θ ) = 1 / , M ∗ ( θ, θ ) = M ( θ, θ ) / . Therefore, for ρ < IE exp n ρ A σ | e θ − θ | o ≤ X θ ∈ Θ exp (cid:8) − ρ (1 − ρ )4 M ( θ, θ ) (cid:9) ≤ ∞ X k =0 exp (cid:26) − ρ (1 − ρ ) A σ k (cid:27) = 21 − C ( ρ )where C ( ρ ) = exp {− ρ (1 − ρ ) A / (8 σ ) } . 
By Lemma 5.7 IE | e θ A − θ | r ≤ C ( r ) (cid:0) σ /A (cid:1) r with some constant C ( r ) .Now we switch to the case when A > L A ( θ ) because it strongly depends on A .To find a reasonable contrast, one can use the maximum likelihood principle.Considering A as a nuisance parameter and maximizing L A ( θ ) w.r.t. A ≥ e θ = argmax θ n max A ≥ L A ( θ ) o = argmax θ σ θ (cid:20) θ X i =1 Y i (cid:21) , imsart-ejs ver. 2008/08/29 file: ejs_2009_352.tex date: October 21, 2018 olubev, yu. and spokoiny, v. /exponential bounds for minimum contrast estimators where [ x ] + = max( x,
0) . In what follows we deal with a slightly modified versionof this estimator e θ = argmax θ ∈ Θ n L ( θ ) , with a new contrast L ( θ ) = 1 σ √ θ θ X i =1 Y i , which is again a Gaussian one. By the model equation (4.3), this contrast canbe represented in the form: L ( θ ) = 1 √ θ θ X i =1 ξ i + A min( θ, θ ) σ √ θ . It is easy to see that the drift M ( θ, θ ) = − IEL ( θ, θ ) satisfies M ( θ, θ ) = a d ( θ, θ )with a = σ − A √ θ and d ( θ, θ ′ ) = 1 − p min { θ/θ ′ , θ ′ /θ } = ( − p θ/θ ′ , θ ≤ θ ′ , − p θ ′ /θ, θ ≥ θ ′ . Similarly, D ( θ, θ ′ ) def = Var L ( θ, θ ′ ) = 2 | θ ′ − θ | ( √ θ + √ θ ′ ) p max( θ, θ ′ ) = 2 d ( θ, θ ′ )and obviously, M ( θ, θ ) = a D ( θ, θ ) / D ( θ, θ ) ≤ θ . As L ( θ )is a Gaussian contrast, it holds µ ∗ ( θ ) = M ( θ, θ ) D ( θ, θ ) = a , M ∗ ( θ, θ ) = a d ( θ, θ );see Example 1.1. Note that for every θ ∈ Θ , the value M ∗ ( θ, θ ) is boundedby a / A θ / (8 σ ) . So, this example is quite special in the sense that theKullback-Leibler divergence between measures IP θ and IP θ does not grow toinfinity with θ . We will see that this fact results in an extra loglog-factor in thebound for the minimum contrast.For given ǫ > θ ◦ ∈ Θ , the local ball B ( ǫ, θ ◦ ) = { D ( θ, θ ◦ ) ≤ ǫ } canbe represented in the form B ( ǫ, θ ◦ ) = (cid:8) θ : θ ◦ (1 − ǫ / ≤ θ ≤ θ ◦ (1 − ǫ / − (cid:9) . and it can be transformed into the usual symmetric interval around log θ ◦ byusing the parameter log θ instead of θ : B ( ǫ, θ ◦ ) = n θ : (cid:12)(cid:12) log θ − log θ ◦ (cid:12)(cid:12) ≤ − − ǫ / o . imsart-ejs ver. 2008/08/29 file: ejs_2009_352.tex date: October 21, 2018 olubev, yu. and spokoiny, v. /exponential bounds for minimum contrast estimators This immediately implies that the local entropy Q ( ǫ, θ ◦ ) is bounded by Q = 1for all θ ◦ ∈ Θ .Let the measure π ( · ) assign the mass 1 to any point θ = 1 , . . . , n . 
Then π(B(ε, θ°)) is equal to the number Π_ε(θ°) of points θ in B(ε, θ°), and it obviously holds Π_ε(θ°) ≈ K(ε)θ° with

  K(ε) = (1 − ε²/2)^{−2} − (1 − ε²/2)² ≥ ε²

for ε ≤ ε₀ = 1/2. Since M(θ, θ₀) ≥ 0, the value H_ε(ρ, s) from (2.5) fulfills for any s < 1

  H_ε(ρ, s) ≤ log( Σ_{θ°=1}^n Π_ε(θ°)^{−1} ) ≤ log( C log n )

for some C > 0. Therefore,

  IE exp{ ρ a² d(θ̃, θ₀)/2 } ≤ C log n.  (4.4)

Combining this with Lemma 5.7 yields

  IE | (A²θ₀/σ²) d(θ̃, θ₀) |^r ≤ C |log log n|^r.

The extra log log-factor in this bound is due to the unbounded parameter set. In the “classical” situation, when the size A of the jump is bounded away from zero and infinity and the true “relative” location θ₀/n is bounded away from the edge 0, similar calculations (not presented here) lead to a bound

  IE exp{ cρ A² |θ̃ − θ₀| } ≤ C

which does not involve any extra log-term; see e.g. Csörgő and Horváth (1997) and references therein for asymptotic versions of this result.

It is also interesting to compare this result with the accuracy of the maximum likelihood method in the case where the magnitude of the jump A is known. One can see that there is a payment for the adaptation to the nuisance parameter A in the form of an extra log log-factor. Another observation is that the accuracy of estimation strongly depends on the true location θ₀, more precisely, on the value a² = A²θ₀/σ². In the “classical” situation this value is of order n, leading to the accuracy of order n^{−1} log log(n). If the value a² is smaller in order than n, then the accuracy becomes worse by the same factor. In particular, if A²θ₀/σ² is of order one, then even consistency of θ̃ cannot be claimed.
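The known-amplitude estimator θ̃_A = argmax_θ L_A(θ) of this section can be sketched in a few lines (an added illustration; the parameter values n, θ₀, A, σ are arbitrary choices, not from the paper):

```python
import random

def change_point_mle(ys, A, sigma):
    # maximize L_A(theta) = (A/sigma^2) * sum_{i<=theta} Y_i - A^2 * theta / (2 sigma^2)
    best_theta, best_val, s = 1, float("-inf"), 0.0
    for theta, y in enumerate(ys, start=1):
        s += y
        val = (A / sigma**2) * s - A**2 * theta / (2.0 * sigma**2)
        if val > best_val:
            best_val, best_theta = val, theta
    return best_theta

random.seed(2)
n, theta0, A, sigma = 400, 150, 2.0, 1.0
# change-point model (4.3): signal A before theta0, none after, Gaussian noise
ys = [(A if i <= theta0 else 0.0) + sigma * random.gauss(0.0, 1.0)
      for i in range(1, n + 1)]
est = change_point_mle(ys, A, sigma)
```

With a signal-to-noise ratio A/σ of order one, the estimation error |θ̃_A − θ₀| stays of order σ²/A², independently of n, as the exponential bound above predicts.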
5. Proofs
This section collects proofs of the main theorems and some auxiliary facts.
Assume that θ° ∈ Θ. First we establish a local bound for the maximum of the process L(θ, θ₀) over the local ball B(ε, θ°) = { θ: S(θ, θ°) ≤ ε }.

Proof.
The main step of the proof is a bound for the stochastic component ζ(θ, θ♯) over the ball B(ε, θ°) for a fixed θ♯ ∈ B(ε, θ°).

Lemma 5.1.
Assume that ζ(θ) is a separable process satisfying, for any given θ° ∈ Θ, the condition (EL). Then for any given θ♯ ∈ B(ε, θ°) and any λ with λ ≤ λ₀,

  log IE exp{ (λ/ε) sup_{θ ∈ B(ε,θ°)} ζ(θ, θ♯) } ≤ Q(ε, θ°) + 2ν₀λ².

Proof.
The proof is based on the standard chaining argument (see e.g. van derVaart and Wellner (1996)). Without loss of generality, we assume that Q ( ǫ, θ ◦ ) < ∞ . Then for any integer k ≥ − k ǫ -net D k ( ǫ, θ ◦ ) in the localball B ( ǫ, θ ◦ ) having the cardinality N (2 − k ǫ, ǫ, θ ◦ ) . Using the nets D k ( ǫ, θ ◦ )with k = 1 , . . . , K − θ in D K ( ǫ, θ ◦ ) and θ ♯ . It means that one can find points θ k ∈ D k ( ǫ, θ ◦ ) ,k = 1 , . . . , K − S ( θ k , θ k − ) ≤ − k +1 ǫ for k = 1 , . . . , K . Here θ K means θ and θ means θ ♯ . Notice that θ k can be constructed recurrently: θ k = τ k ( θ k +1 ) , k = K − , . . . , τ k ( θ ) = argmin θ ′ ∈ D k ( ǫ, θ ◦ ) S ( θ , θ ′ ) . It obviously holds for θ ∈ D K ( ǫ, θ ◦ ) ζ ( θ , θ ♯ ) = K X k =1 ζ ( θ k , θ k − ) . For ξ ( θ k , θ k − ) = ζ ( θ k , θ k − ) / S ( θ k , θ k − ) it holds that ζ ( θ k , θ k − ) = S ( θ k , θ k − ) ξ ( θ k , θ k − ) = 2 ǫ c k ξ ( θ k , θ k − )with c k = c k ( θ , θ ◦ ) = S ( θ k , θ k − ) / (2 ǫ ) ≤ − k , andsup θ ∈ D K ( ǫ, θ ◦ ) ζ ( θ , θ ♯ ) ≤ K X k =1 sup θ ′ ∈ D k ( ǫ, θ ◦ ) ζ ( θ ′ , τ k − ( θ ′ ))= 2 ǫ K X k =1 sup θ ′ ∈ D k ( ǫ, θ ◦ ) c k ξ ( θ ′ , τ k − ( θ ′ )) . imsart-ejs ver. 2008/08/29 file: ejs_2009_352.tex date: October 21, 2018 olubev, yu. and spokoiny, v. /exponential bounds for minimum contrast estimators Since c k ≤ − k , Lemma 5.6 below and condition ( EL ) implylog IE exp (cid:26) λǫ sup θ ∈ D K ( ǫ, θ ◦ ) ζ ( θ , θ ♯ ) (cid:27) ≤ log IE exp (cid:26) λ K X k =1 sup θ ′ ∈ D k ( ǫ, θ ◦ ) c k ξ ( θ ′ , τ k − ( θ ′ )) (cid:27) ≤ K X k =1 − k log (cid:20) IE exp n sup θ ′ ∈ D k ( ǫ, θ ◦ ) k c k λξ ( θ ′ , τ k − ( θ ′ )) o(cid:21) ≤ K X k =1 − k log (cid:20) X θ ′ ∈ D k ( ǫ, θ ◦ ) IE exp (cid:8) k c k λξ ( θ ′ , τ k − ( θ ′ )) (cid:9)(cid:21) ≤ K X k =1 − k (cid:8) log N (2 − k ǫ, ǫ, θ ◦ ) + 2 ν λ (cid:9) . 
These inequalities with the separability of ζ ( θ , θ ♯ ) yieldlog IE exp (cid:26) λǫ sup θ ∈ B ( ǫ, θ ◦ ) ζ ( θ , θ ♯ ) (cid:27) = lim K →∞ log IE exp (cid:26) λǫ sup θ ∈ D K ( ǫ, θ ◦ ) ζ ( θ , θ ♯ ) (cid:27) ≤ ∞ X k =1 − k (cid:8) ν λ + log N (2 − k ǫ, ǫ, θ ◦ ) (cid:9) ≤ ν λ + Q ( ǫ, θ ◦ )which completes the proof of the lemma.Now we are prepared to complete the proof of the theorem. Denote θ ♯ = argmax θ ∈ B ( ǫ, θ ◦ ) (cid:8) µ ( θ ) IEL ( θ , θ ) + M ( θ , θ ) (cid:9) . It is clear that sup θ ∈ B ( ǫ, θ ◦ ) n µ ( θ ) L ( θ , θ ) + M ( θ , θ ) o ≤ µ ( θ ♯ ) L ( θ ♯ , θ ) + M ( θ ♯ , θ ) + sup θ ∈ B ( ǫ, θ ◦ ) ζ ( θ , θ ♯ ) . This yields by the H¨older inequality and Lemma 5.1 with λ = ǫρ/ (1 − ρ ) thatlog IE exp n sup θ ∈ B ( ǫ, θ ◦ ) ρ (cid:2) µ ( θ ) L ( θ , θ ) + M ( θ , θ ) (cid:3)o ≤ log IE exp n ρ (cid:2) µ ( θ ♯ ) L ( θ ♯ , θ ) + M ( θ ♯ , θ ) (cid:3) + ρ sup θ ∈ B ( ǫ, θ ◦ ) ζ ( θ , θ ♯ ) o ≤ ρ log IE exp n µ ( θ ♯ ) L ( θ ♯ , θ ) + M ( θ ♯ , θ ) o + (1 − ρ ) log IE exp n ρ − ρ sup θ ∈ B ( ǫ, θ ◦ ) ζ ( θ , θ ♯ ) o ≤ (1 − ρ ) Q ( ǫ, θ ◦ ) + (1 − ρ )2 ν (cid:12)(cid:12)(cid:12)(cid:12) ǫρ − ρ (cid:12)(cid:12)(cid:12)(cid:12) imsart-ejs ver. 2008/08/29 file: ejs_2009_352.tex date: October 21, 2018 olubev, yu. and spokoiny, v. /exponential bounds for minimum contrast estimators and the result follows. Theorem 2.2 implies a local bound for the process µ ( θ ) L ( θ , θ )+ M ( θ , θ ) overany ball B ( ǫ, θ ◦ ) . To derive a global bound we apply the following general fact: Lemma 5.2.
Let f(θ) be a nonnegative function on Θ ⊂ IR^p and let, for every point θ ∈ Θ, a vicinity U(θ) be fixed such that θ′ ∈ U(θ) implies θ ∈ U(θ′). Let also the measure π(U(θ)) of the set U(θ) fulfill, for every θ° ∈ Θ,

  sup_{θ ∈ U(θ°)} π(U(θ)) / π(U(θ°)) ≤ ν.  (5.1)

Then

  sup_{θ ∈ Θ} f(θ) ≤ ν ∫_Θ f*(θ) [π(U(θ))]^{−1} dπ(θ)  with  f*(θ) := sup_{θ′ ∈ U(θ)} f(θ′).

Proof.
For every θ° ∈ Θ,

  ∫_Θ f*(θ) [π(U(θ))]^{−1} dπ(θ) ≥ ∫_{U(θ°)} f*(θ) [π(U(θ))]^{−1} dπ(θ) ≥ f(θ°) ∫_{U(θ°)} [π(U(θ))]^{−1} dπ(θ),

because θ ∈ U(θ°) implies θ° ∈ U(θ) and hence f(θ°) ≤ f*(θ). Now by (5.1),

  ∫_Θ f*(θ) [π(U(θ))]^{−1} dπ(θ) ≥ [f(θ°)/ν] ∫_{U(θ°)} [π(U(θ°))]^{−1} dπ(θ) = f(θ°)/ν,

as required.

We are going to apply Lemma 5.2 with

  f(θ) = exp{ ρ[ μ(θ)L(θ, θ₀) + sM(θ, θ₀) ] }.

In view of the definition of M_ε(θ°, θ₀) = min_{θ ∈ B(ε,θ°)} M(θ, θ₀), it follows from the local bound of Theorem 2.2 that

  log IE exp{ sup_{θ ∈ B(ε,θ°)} ρ[ μ(θ)L(θ, θ₀) + sM(θ, θ₀) ] } ≤ −ρ(1 − s)M_ε(θ°, θ₀) + (1 − ρ)Q(ε, θ°) + 2ν₀ε²ρ²/(1 − ρ),

and the theorem follows directly from Lemma 5.2.

Below by C_p we denote a generic constant (not necessarily the same) which only depends on the dimensionality p. First we show that the differentiability condition (ED) implies the local moment condition (EL).

Lemma 5.3.
Assume that (ED) holds with some ν₀ and λ₀. Then for any θ, θ′ ∈ Θ and any λ with |λ| ≤ λ₀,

  log IE exp{ λ ζ(θ, θ′) / S(θ, θ′) } ≤ ν₀λ².  (5.2)

Proof.
For θ, θ′ ∈ Θ, denote u = θ′ − θ. With these notations,

  L(θ, θ′) = u^⊤ ∫₀¹ ∇L(θ + tu) dt.

Similar expressions hold for IEL(θ, θ′) and for ζ(θ, θ′) = L(θ, θ′) − IEL(θ, θ′):

  ζ(θ, θ′) = u^⊤ ∫₀¹ ∇ζ(θ + tu) dt.

The definition of S(θ, θ′) implies for any t ∈ [0, 1],

  c(t) := √( u^⊤ V(θ + tu) u ) / S(θ, θ′) ≤ 1,

and therefore Lemma 5.6 and (2.10) with γ = u/‖u‖ yield

  log IE exp{ λ ζ(θ, θ′)/S(θ, θ′) } = log IE exp{ λ ∫₀¹ c(t) γ^⊤∇ζ(θ + tu) / √(γ^⊤V(θ + tu)γ) dt }
    ≤ ∫₀¹ c(t) log IE exp{ λ γ^⊤∇ζ(θ + tu) / √(γ^⊤V(θ + tu)γ) } dt ≤ ν₀λ²,

as required.

Due to the next lemma, the smoothness of the contrast implies that the topology induced by the metric S(·, ·) is locally equivalent to the Euclidean topology, and computing the local entropy Q(ε, ·) can be reduced to the Euclidean case. Recall the notation

  B′(ε, θ°) = { θ: (θ − θ°)^⊤ V(θ°) (θ − θ°) ≤ ε² }.

The definition of B(ε, θ°) implies that B(ε, θ°) ⊆ B′(ε, θ°).

Lemma 5.4.
Assume ( ED ) with some λ , and let, for some fixed ν ≥ , ǫ > A ǫ V ( θ ) ≤ ν , θ ∈ Θ. (5.3) Then • ( EL ) is fulfilled for λ ≤ λ , i.e. (5.2) holds for all λ ≤ λ . • sup θ ∈ Θ Q ( ǫ, θ ) ≤ Q p + p log( ν ) , where Q p is the entropy of the unit ballin IR p in the Euclidean topology.Proof. The first claim is an immediate corollary of Lemma 5.3. Fix any θ ◦ ∈ Θ .Linear transformation with the matrix V − ( θ ◦ ) reduces the situation to thecase when V ( θ ◦ ) ≡ I and B ′ ( ǫ, θ ◦ ) is a usual Euclidean ball for any ǫ ≤ ǫ .Moreover, by (5.3), each elliptic set B ′ ( ǫ , θ ) for θ ∈ B ( ǫ, θ ◦ ) is nearly anEuclidean ball in the sense that the ratio of its largest and smallest axes (whichis the ratio of the largest and smallest eigenvalues of V − ( θ ◦ ) V ( θ ) V − ( θ ◦ ) ) isbounded by ν . Therefore, for any ǫ ≤ ǫ , a Euclidean net D e ( ǫ /ν ) with thestep ǫ /ν ensures a covering of B ( ǫ, θ ◦ ) by the sets B ( ǫ , θ ◦ ) , θ ◦ ∈ D e ( ǫ ) .Therefore, the corresponding covering number is bounded by ( ν ǫ/ǫ ) p yieldingthe claimed bound for the local entropy.Now we are ready to proceed with the proof of Theorem 2.8. We make useof the following technical result which helps to bound the global supremum ofa random function over an integral of local maxima.Consider the ellipsoid B ′ ( ǫ, θ ◦ ) = { θ : ( θ − θ ◦ ) ⊤ V ( θ ◦ ) ( θ − θ ◦ ) ≤ ǫ } . ItsLebesgue measure fulfills π ( B ′ ( ǫ, θ ◦ )) = ω p ǫ p (cid:14)p det { V ( θ ◦ ) } where ω p is thevolume of the unit ball in IR p . Condition (2.12) implies (5.1) with ν = ν p for π ( U ( θ )) = π ( B ′ ( ǫ, θ )) and the Lebesgue measure π . Now the result followsfrom Theorem 2.3. We start with some technical lemmas.
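One of these technical tools, the moment bound of Lemma 5.7 below, admits a quick numerical sanity check (added here, not part of the original argument), using ξ ∼ Exp(1), for which both the moments and the Laplace transform are explicit:

```python
import math

# Check of (E xi^r)^{1/r} <= inf_{lam: phi(lam) >= r} phi(lam)/lam
# for xi ~ Exp(1): E xi^r = Gamma(r+1), phi(lam) = log E e^{lam*xi} = -log(1-lam), lam < 1.
r = 2.0
moment = math.gamma(r + 1.0) ** (1.0 / r)   # (E xi^2)^{1/2} = sqrt(2)
lam_star = 1.0 - math.exp(-r)               # smallest lam with phi(lam) >= r
# phi(lam)/lam is nondecreasing (phi is convex with phi(0) = 0), so the infimum
# over the feasible set is attained at lam_star, where phi(lam_star) = r
bound = r / lam_star
```

The closed-form moment √2 indeed sits below the Laplace-transform bound r/λ* ≈ 2.31.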
Lemma 5.5.
Suppose that for some r > 0, there are a positive matrix v and a constant a_r > 0 such that

  v(θ) ≤ v,  m(θ, θ₀) ≥ a_r (θ − θ₀)^⊤ v (θ − θ₀),  θ ∈ A(r, θ₀).  (5.4)

Then for any η > 0,

  ∫_{A(r,θ₀)} √det{ nv(θ) } exp{ −η n m_ε(θ, θ₀) } dθ ≤ a_r^{−p/2} ( ω_p ε^p + |π/η|^{p/2} ).

Proof.
The conditions of the lemma imply that for θ ∈ A(r, θ₀),

  √( n m_ε(θ, θ₀) ) ≥ [ √(n a_r) ‖v^{1/2}(θ − θ₀)‖ − ε ]₊.
2) log n and (5.5) follows by det (cid:8) nv ( θ ) (cid:9) = n p det (cid:8) v ( θ ) (cid:9) .Lemma 5.5 with η = ρ (1 − s ) , (5.5), and b r ( n ) ≤ H ǫ ( ρ, s ) ≤ a − pr (cid:16) ω − p π p/ | ǫ ρ (1 − s ) | p/ (cid:17) + C r ( β ) / ( ω p ǫ p ) . To finalize the proof, we apply Theorem 3.1 with ǫ defined by the equation ǫ = (1 − ρ ) /ρ .log Q ( ρ, s ) ≤ (1 − ρ ) Q p + 2 ν ρ + 2 p log( ν )+ log (cid:16) ω − p π p a − pr | (1 − ρ )(1 − s ) | p/ + ω − p C r ( β ) ρ p/ (1 − s ) p/ (cid:17) ≤ Cp + p (cid:0) | (1 − ρ )(1 − s ) | − (cid:1) where C is a constant whose value depends on a r , ν , ν , and C r ( β ) . It isalso used that Q p ≤ Cp and log ω − p ≤ Cp . Lemma 5.6.
For any r.v.'s ξ_k and any nonnegative λ_k such that Λ = Σ_k λ_k ≤ 1,

  log IE exp( Σ_k λ_k ξ_k ) ≤ Σ_k λ_k log IE e^{ξ_k}.  (5.6)

Proof.
Convexity of e^x and concavity of x^Λ imply

  IE exp{ Σ_k λ_k ( ξ_k − log IE e^{ξ_k} ) } = IE [ exp{ Λ^{−1} Σ_k λ_k ( ξ_k − log IE e^{ξ_k} ) } ]^Λ
    ≤ [ IE exp{ Λ^{−1} Σ_k λ_k ( ξ_k − log IE e^{ξ_k} ) } ]^Λ
    ≤ [ Λ^{−1} Σ_k λ_k IE exp( ξ_k − log IE e^{ξ_k} ) ]^Λ = 1.

Lemma 5.7.
Let ξ be a nonnegative random variable and φ(λ) = log IE exp(λξ). Then for any r > 0,

  ( IEξ^r )^{1/r} ≤ inf_{λ: φ(λ) ≥ r} φ(λ)/λ.  (5.7)

In particular, if φ(λ) ≤ a + σ²λ²/2 for some a, σ ≥ 0, then

  ( IEξ^r )^{1/r} ≤ σ √( 2 max{a, r/2} ).  (5.8)

Proof.
Consider the following function:

  f(x) = log^r(x) for x ≥ e^r,  f(x) = x r^r / e^r for x ≤ e^r.

A simple algebra reveals that for x > e^r,

  f′(x) = r x^{−1} log^{r−1}(x),
  f″(x) = r(r − 1) x^{−2} log^{r−2}(x) − r x^{−2} log^{r−1}(x) = r x^{−2} [ r − 1 − log(x) ] log^{r−2}(x) < 0.

Since the function f(x) is linear for x ≤ e^r, it is concave for all x > 0. Moreover, [log(x)]₊^r ≤ f(x), because for x ≤ e^r the function f(x) coincides with the tangent of log^r(x) at x = e^r. Therefore,

  x^r = λ^{−r} log^r( e^{λx} ) ≤ λ^{−r} f( e^{λx} ),

and the Jensen inequality implies for any λ > 0,

  IEξ^r ≤ λ^{−r} IE f( e^{λξ} ) ≤ λ^{−r} f( IE e^{λξ} ) = λ^{−r} f( e^{φ(λ)} ).  (5.9)

If φ(λ) ≥ r, then f( e^{φ(λ)} ) = log^r( e^{φ(λ)} ) = φ^r(λ), and (5.7) follows from (5.9). To prove (5.8), it remains to notice that the monotonicity of f(·) implies, in view of (5.9), that

  ( IEξ^r )^{1/r} ≤ inf_{λ: a + σ²λ²/2 ≥ r} { a/λ + σ²λ/2 }
    = σr ( 2(r − a) )^{−1/2} if a < r/2,  σ√(2a) if a ≥ r/2
    ≤ σ√r if a < r/2,  σ√(2a) if a ≥ r/2
    ≤ σ √( 2 max{a, r/2} ).

Lemma 5.8.
Let a r.v. $\xi$ fulfill $IE \xi = 0$, $IE \xi^2 = 1$ and $IE \exp( \lambda_1 |\xi| ) = \varkappa < \infty$ for some $\lambda_1 > 0$. Then for any $\rho < 1$ there is a constant $C_1$ depending on $\varkappa$, $\lambda_1$ and $\rho$ only such that for $\lambda < \rho \lambda_1$
$$
\log IE e^{\lambda \xi} \le C_1 \lambda^2 / 2.
$$
Moreover, there is a constant $\lambda_0 > 0$ such that for all $\lambda \le \lambda_0$
$$
\log IE e^{\lambda \xi} \ge \rho \lambda^2 / 2.
$$

Proof.
Define $g(x) = (\lambda - \lambda_1) x + m \log(x)$ for $m \ge 1$ and $\lambda < \lambda_1$. It is easy to see by simple algebra that
$$
\max_{x \ge 0} g(x) = -m + m \log \frac{m}{\lambda_1 - \lambda}.
$$
Therefore, for any $x \ge 0$
$$
\lambda x + m \log(x) \le \lambda_1 x + \log \Bigl( \frac{m}{e (\lambda_1 - \lambda)} \Bigr)^m.
$$
This implies for all $\lambda < \lambda_1$
$$
IE |\xi|^m \exp( \lambda |\xi| ) \le \Bigl( \frac{m}{e (\lambda_1 - \lambda)} \Bigr)^m IE \exp( \lambda_1 |\xi| ).
$$
Suppose now that for some $\lambda_1 > 0$, $IE \exp( \lambda_1 |\xi| ) = \varkappa(\lambda_1) < \infty$. Then the function $h(\lambda) = IE \exp(\lambda \xi)$ fulfills $h(0) = 1$, $h'(0) = IE \xi = 0$, $h''(0) = IE \xi^2 = 1$, and for $\lambda < \lambda_1$,
$$
h''(\lambda) = IE \xi^2 e^{\lambda \xi} \le IE \xi^2 e^{\lambda |\xi|} \le \Bigl( \frac{2}{e (\lambda_1 - \lambda)} \Bigr)^2 IE \exp( \lambda_1 |\xi| ).
$$
This implies by the Taylor expansion for $\lambda < \rho \lambda_1$ that $h(\lambda) \le 1 + C_1 \lambda^2 / 2$ with $C_1 = 4 \varkappa(\lambda_1) / \bigl\{ e^2 \lambda_1^2 (1 - \rho)^2 \bigr\}$, and hence $\log h(\lambda) \le C_1 \lambda^2 / 2$.
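The bounds above lend themselves to direct numerical verification. As a first sanity check on Lemma 5.6 (an illustrative sketch, not part of the paper), inequality (5.6) can be evaluated exactly for a small discrete joint distribution; note that the lemma does not require the $\xi_k$ to be independent.

```python
import math

# Joint distribution of (xi_1, xi_2) on four points; dependence is allowed,
# since Lemma 5.6 holds for arbitrary (possibly dependent) r.v.'s.
support = [(-1.0, 2.0), (0.5, -1.0), (1.0, 0.0), (2.0, 1.5)]
probs = [0.1, 0.4, 0.3, 0.2]
lambdas = [0.3, 0.6]  # nonnegative, Lambda = 0.9 <= 1

# Left-hand side of (5.6): log IE exp(sum_k lambda_k xi_k)
lhs = math.log(sum(p * math.exp(lambdas[0] * x1 + lambdas[1] * x2)
                   for (x1, x2), p in zip(support, probs)))

# Right-hand side: sum_k lambda_k log IE e^{xi_k}
log_mgf1 = math.log(sum(p * math.exp(x1) for (x1, _), p in zip(support, probs)))
log_mgf2 = math.log(sum(p * math.exp(x2) for (_, x2), p in zip(support, probs)))
rhs = lambdas[0] * log_mgf1 + lambdas[1] * log_mgf2

print(lhs, rhs)
assert lhs <= rhs
```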
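Bound (5.8) of Lemma 5.7 can likewise be checked for $\xi = |Z|$ with $Z$ standard normal: here $\varphi(\lambda) = \log IE e^{\lambda|Z|} \le \log 2 + \lambda^2/2$, so one may take $a = \log 2$ and $\sigma = 1$ (an illustrative choice of constants, not taken from the paper). The absolute moments $IE |Z|^r = 2^{r/2}\Gamma((r+1)/2)/\sqrt{\pi}$ are known in closed form.

```python
import math

# Constants for xi = |Z|, Z ~ N(0,1): phi(lambda) <= log 2 + lambda^2/2,
# i.e. a = log 2, sigma = 1 (illustrative, not from the paper).
a, sigma = math.log(2.0), 1.0

def abs_gauss_moment(r):
    # IE |Z|^r = 2^(r/2) * Gamma((r+1)/2) / sqrt(pi)
    return 2 ** (r / 2) * math.gamma((r + 1) / 2) / math.sqrt(math.pi)

for r in [1, 2, 4, 6, 10]:
    lhs = abs_gauss_moment(r) ** (1.0 / r)           # (IE xi^r)^{1/r}
    rhs = sigma * math.sqrt(2 * max(a, r / 2))       # bound (5.8)
    print(r, round(lhs, 4), round(rhs, 4))
    assert lhs <= rhs
```

For instance, $r = 4$ gives $(IE Z^4)^{1/4} = 3^{1/4} \approx 1.32$ against the bound $\sqrt{2 \cdot 2} = 2$.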
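Finally, Lemma 5.8 can be illustrated with a Rademacher $\xi$ ($IP(\xi = \pm 1) = 1/2$), for which $IE\xi = 0$, $IE\xi^2 = 1$ and $\log IE e^{\lambda\xi} = \log\cosh(\lambda)$. The constants $C_1 = 1$, $\rho = 0.9$ and $\lambda_0 = 0.5$ below are illustrative choices for this particular distribution, not those produced by the lemma's proof.

```python
import math

# For Rademacher xi: log IE exp(lambda*xi) = log cosh(lambda).
# Upper bound log cosh(lambda) <= lambda^2/2 holds for all lambda (C_1 = 1);
# the lower bound log cosh(lambda) >= rho*lambda^2/2 holds for small lambda.
C1, rho, lambda0 = 1.0, 0.9, 0.5

for i in range(1, 51):
    lam = i / 100.0                      # lambda in (0, lambda0]
    log_mgf = math.log(math.cosh(lam))
    assert log_mgf <= C1 * lam ** 2 / 2  # upper bound of Lemma 5.8
    assert log_mgf >= rho * lam ** 2 / 2 # lower bound of Lemma 5.8
```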