Efficiency requires innovation
Abram Kagan
Department of Mathematics, University of Maryland
College Park, MD 20742, USA
Abstract
In estimating a parameter $\theta \in \mathbb{R}$ from a sample $(x_1, \ldots, x_n)$ from a population $P_\theta$, a simple way of incorporating a new observation $x_{n+1}$ into an estimator $\tilde{\theta}_n = \tilde{\theta}_n(x_1, \ldots, x_n)$ is transforming $\tilde{\theta}_n$ to what we call the jackknife extension $\tilde{\theta}^{(e)}_{n+1} = \tilde{\theta}^{(e)}_{n+1}(x_1, \ldots, x_n, x_{n+1})$,
$$
\tilde{\theta}^{(e)}_{n+1} = \{\tilde{\theta}_n(x_1, \ldots, x_n) + \tilde{\theta}_n(x_{n+1}, x_2, \ldots, x_n) + \ldots + \tilde{\theta}_n(x_1, \ldots, x_{n-1}, x_{n+1})\}/(n+1).
$$
Though $\tilde{\theta}^{(e)}_{n+1}$ lacks the innovation the statistician could expect from a larger data set, it is still better than $\tilde{\theta}_n$:
$$
\mathrm{var}(\tilde{\theta}^{(e)}_{n+1}) \le \frac{n}{n+1}\,\mathrm{var}(\tilde{\theta}_n).
$$
However, an estimator obtained by jackknife extension for all $n$ is asymptotically efficient only for samples from exponential families. For a general $P_\theta$, asymptotically efficient estimators require innovation when a new observation is added to the data. Some examples illustrate the concept.

1 Introduction
Let $\tilde{\theta}_n = \tilde{\theta}_n(x_1, \ldots, x_n)$ be an estimator of $\theta$ based on a sample of size $n$ from a population $P_\theta$ with $\theta \in \mathbb{R}$ as a parameter. If another observation $x_{n+1}$ is added to the data, a simple way of incorporating it in the existing estimator is by what we call the jackknife extension,
$$
\tilde{\theta}^{(e)}_{n+1} = \tilde{\theta}^{(e)}_{n+1}(x_1, \ldots, x_n, x_{n+1}) = (\tilde{\theta}_{n,1} + \ldots + \tilde{\theta}_{n,n+1})/(n+1) \tag{1}
$$
where
$$
\tilde{\theta}_{n,i} = \tilde{\theta}_n(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{n+1}), \quad i = 1, \ldots, n+1,
$$
so that $\tilde{\theta}_{n,n+1} = \tilde{\theta}_n(x_1, \ldots, x_n)$. Plainly, $E_\theta(\tilde{\theta}^{(e)}_{n+1}) = E_\theta(\tilde{\theta}_n)$, and if $\tilde{\theta}_n$ is symmetric in its arguments (as is usually the case) the jackknife extension is symmetric in $x_1, \ldots, x_n, x_{n+1}$.
If $\mathrm{var}_\theta(\tilde{\theta}_n) < \infty$, then not only $\mathrm{var}_\theta(\tilde{\theta}^{(e)}_{n+1}) < \mathrm{var}_\theta(\tilde{\theta}_n)$ but a stronger inequality holds:
$$
(n+1)\,\mathrm{var}_\theta(\tilde{\theta}^{(e)}_{n+1}) \le n\,\mathrm{var}_\theta(\tilde{\theta}_n). \tag{2}
$$
The inequality (2) is a direct corollary of a special case of the so-called variance drop lemma due to Artstein et al. (2004).
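As a quick numerical check of (1) and (2), the following Python sketch forms the jackknife extension of an estimator by averaging over the leave-one-out subsamples and compares the two sides of (2) by simulation; the sample median as $\tilde{\theta}_n$ and the normal population are arbitrary choices made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def jackknife_extension(theta_n, sample):
    """Average theta_n over the leave-one-out subsamples of `sample`, as in (1)."""
    return np.mean([theta_n(np.delete(sample, i)) for i in range(len(sample))])

theta_n = np.median          # illustrative estimator: the sample median

n, reps = 5, 20000
ext_vals, base_vals = [], []
for _ in range(reps):
    x = rng.normal(size=n + 1)   # illustrative population
    ext_vals.append(jackknife_extension(theta_n, x))
    base_vals.append(theta_n(x[:n]))

# Inequality (2): (n+1) var(extension) <= n var(theta_n)
print((n + 1) * np.var(ext_vals), "<=", n * np.var(base_vals))
```

With these choices the left-hand side comes out visibly smaller than the right-hand side, in line with (2).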
Lemma 1. Let $X_1, \ldots, X_n, X_{n+1}$ be independent identically distributed random variables and $\psi(X_1, \ldots, X_n)$ a function with $E\{\psi(X_1, \ldots, X_n)^2\} < \infty$. Set
$$
\psi_1 = \psi(X_2, \ldots, X_{n+1}), \quad \psi_i = \psi(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_{n+1}), \ i = 2, \ldots, n+1.
$$
Then
$$
\mathrm{var}\Bigl(\sum_{i=1}^{n+1} \psi_i\Bigr) \le n \sum_{i=1}^{n+1} \mathrm{var}(\psi_i). \tag{3}
$$
Note that with $n+1$ instead of $n$ on the right hand side of (3), the inequality becomes a trivial corollary of
$$
\Bigl(\sum_{i=1}^{n+1} a_i\Bigr)^2 \le (n+1) \sum_{i=1}^{n+1} a_i^2
$$
holding for any numbers $a_1, \ldots, a_{n+1}$. For an extension of the variance drop lemma see Madiman and Barron (2007).

Suppose that starting with $n = m$ and $\tilde{\theta}_m(x_1, \ldots, x_m)$, the statistician constructs the jackknife extension $\tilde{\theta}^{(e)}_{m+1}$ of $\tilde{\theta}_m(x_1, \ldots, x_m)$, then the jackknife extension $\tilde{\theta}^{(e)}_{m+2}$ of $\tilde{\theta}^{(e)}_{m+1}$, and so on. One can easily see that for $n \ge m$ the estimator $\tilde{\theta}^{(e)}_n(x_1, \ldots, x_n)$ thus obtained is a classical $U$-statistic with the kernel $\tilde{\theta}_m(x_1, \ldots, x_m)$:
$$
\tilde{\theta}^{(e)}_n(x_1, \ldots, x_n) = \frac{1}{\binom{n}{m}} \sum_{1 \le i_1 < \ldots < i_m \le n} \tilde{\theta}_m(x_{i_1}, \ldots, x_{i_m}). \tag{4}
$$
Hoeffding initiated the study of $U$-statistics back in 1948. The variance of $\tilde{\theta}^{(e)}_n$ can be explicitly expressed in terms of $\tilde{\theta}_m$. Set
$$
\tilde{\theta}_{m|k}(x_1, \ldots, x_k) = E_\theta\{\tilde{\theta}_m(X_1, \ldots, X_m) \mid X_1 = x_1, \ldots, X_k = x_k\}.
$$
The following formula due to Hoeffding (1948) expresses $\mathrm{var}(\tilde{\theta}^{(e)}_n)$ via $v_k(\theta) = \mathrm{var}(\tilde{\theta}_{m|k}(X_1, \ldots, X_k))$, $k = 1, \ldots, m$:
$$
\mathrm{var}_\theta(\tilde{\theta}^{(e)}_n) = \frac{1}{\binom{n}{m}} \sum_{k=1}^{m} \binom{m}{k} \binom{n-m}{m-k} v_k(\theta). \tag{5}
$$
Assume that the distributions $P_\theta$ are given by a density $p(x;\theta)$ (with respect to a measure $\mu$), differentiable in $\theta$, with the Fisher information
$$
I(\theta) = \int \Bigl(\frac{\partial \log p(x;\theta)}{\partial \theta}\Bigr)^2 p(x;\theta)\, d\mu(x)
$$
well defined and finite. If $E_\theta(\tilde{\theta}_m) = \gamma(\theta)$, by the Cramér-Rao inequality
$$
\mathrm{var}_\theta(\tilde{\theta}_m) \ge \frac{|\gamma'(\theta)|^2}{m I(\theta)}.
$$
In particular, if $\tilde{\theta}_m$ is an unbiased estimator of $\theta$, then $\mathrm{var}_\theta(\tilde{\theta}_m) \ge 1/(m I(\theta))$. Furthermore, if $\mathrm{var}_\theta(\tilde{\theta}_m) < \infty$, then $\mathrm{var}_\theta(\tilde{\theta}^{(e)}_n) < \infty$ for all $n > m$ and the following lemma holds.

Lemma 2 (Hoeffding 1948). As $n \to \infty$, $\sqrt{n}\,(\tilde{\theta}^{(e)}_n - \gamma(\theta))$ is asymptotically normal $N(0, m^2 v_1(\theta))$.

On the other hand, by the Cramér-Rao inequality, for any unbiased estimator $\gamma_n$ of $\gamma(\theta)$ based on a sample $(x_1, \ldots, x_n)$ from a population with Fisher information $I(\theta)$,
$$
\mathrm{var}_\theta(\gamma_n) \ge \frac{|\gamma'(\theta)|^2}{n I(\theta)}. \tag{6}
$$
Combining (6) with Lemma 2 leads to a formula for the asymptotic efficiency of $\tilde{\theta}^{(e)}_n$:
$$
\mathrm{aseff}(\tilde{\theta}^{(e)}_n) = \frac{|\gamma'(\theta)|^2 / I(\theta)}{m^2 v_1(\theta)}. \tag{7}
$$
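The identity between the iterated jackknife extension and the $U$-statistic (4) is easy to verify numerically. In the Python sketch below the kernel (the variance kernel $\tilde{\theta}_2(x_1, x_2) = (x_1 - x_2)^2/2$ with $m = 2$) and the normal sample are assumptions made only for the illustration; the extension is iterated from $m$ up to $n$ and compared with the direct average over all $m$-subsets.

```python
import numpy as np
from itertools import combinations

def jackknife_extension_step(est):
    """Turn an estimator for samples of size k into its jackknife extension for size k+1."""
    return lambda s: np.mean([est(np.delete(s, i)) for i in range(len(s))])

def u_statistic(kernel, m, sample):
    """Classical U-statistic (4): average of the kernel over all m-subsets of the sample."""
    return np.mean([kernel(np.array(c)) for c in combinations(sample, m)])

m = 2
kernel = lambda x: 0.5 * (x[0] - x[1]) ** 2   # illustrative symmetric kernel (variance kernel)

rng = np.random.default_rng(1)
x = rng.normal(size=6)

est = kernel
for _ in range(len(x) - m):            # iterate the extension: m -> m+1 -> ... -> n
    est = jackknife_extension_step(est)

print(est(x), u_statistic(kernel, m, x))   # the two values agree, as claimed in (4)
```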
Lemma 3. Let $X$ be a random element, $X \sim p(x;\theta)$, with finite Fisher information $I(\theta)$. If $h(X)$ is a (scalar valued) function with $E_\theta(h(X)) = \mu(\theta)$ differentiable and $\mathrm{var}_\theta(h(X)) = \sigma^2(\theta) < \infty$, then
$$
I(\theta) \ge \frac{|\mu'(\theta)|^2}{\sigma^2(\theta)}. \tag{8}
$$
Proof. Take the projection of the Fisher score $J(X;\theta) = p'(x;\theta)/p(x;\theta)$ onto the subspace $\mathrm{span}\{1, h(X)\}$ of the Hilbert space of functions $g(X)$ with $E_\theta(|g(X)|^2) < \infty$:
$$
\hat{E}_\theta\{J(X;\theta) \mid 1, h(X)\} = \hat{J}(X;\theta) = a(\theta)(h(X) - \mu(\theta)). \tag{9}
$$
Multiplying both sides by $h(X) - \mu(\theta)$ and taking expectations results in $a(\theta) = \mu'(\theta)/\sigma^2(\theta)$, due to the property $E_\theta(J(X;\theta)h(X)) = \mu'(\theta)$ of the Fisher score. Hence
$$
I(\theta) = \mathrm{var}_\theta(J(X;\theta)) \ge \mathrm{var}_\theta(\hat{J}(X;\theta)) = \frac{|\mu'(\theta)|^2}{\sigma^2(\theta)},
$$
which is exactly (8). The equality sign in (8) is attained if and only if, with $P_\theta$-probability one, the relation
$$
\frac{p'(x;\theta)}{p(x;\theta)} = a(\theta)(h(x) - \gamma(\theta)) \tag{10}
$$
holds for some $a(\theta)$.

From $E_\theta(\tilde{\theta}_{m|1}) = \gamma(\theta)$, $v_1 = \mathrm{var}(\tilde{\theta}_{m|1})$ and (8) one gets
$$
\mathrm{aseff}(\tilde{\theta}^{(e)}_n) \le 1/m^2. \tag{11}
$$
Thus, a necessary condition for the asymptotic efficiency of $\tilde{\theta}^{(e)}_n$ is $m = 1$ and, by virtue of (4),
$$
\tilde{\theta}^{(e)}_n(x_1, \ldots, x_n) = (h(x_1) + \ldots + h(x_n))/n \tag{12}
$$
for some $h(x)$ with $E_\theta\{h(X)\} = \gamma(\theta)$.

By Lemma 3, the estimator (12) is an asymptotically efficient estimator of $\gamma(\theta)$ if and only if the relation (10) holds, implying that the family is exponential,
$$
p(x;\theta) = \exp\{A(\theta)h(x) + B(\theta) + g(x)\} \tag{13}
$$
where the functions in the exponent are such that $E_\theta(h(X)) = \gamma(\theta)$.

From (10) one can see that the maximum likelihood equation for $\theta$ based on a sample $(x_1, \ldots, x_n)$ from population (13) is
$$
(h(x_1) + \ldots + h(x_n))/n = \gamma(\theta) \tag{14}
$$
and $\tilde{\theta}^{(e)}_n(x_1, \ldots, x_n) = (h(x_1) + \ldots + h(x_n))/n$, as the maximum likelihood estimator of $\gamma(\theta)$, is asymptotically efficient.
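For a concrete standard instance of (13) and (14), one can take the Poisson family with mean $\theta$:
$$
p(x;\theta) = e^{-\theta}\frac{\theta^x}{x!} = \exp\{x \log\theta - \theta - \log x!\},
$$
so that $h(x) = x$, $A(\theta) = \log\theta$, $B(\theta) = -\theta$, $g(x) = -\log x!$ and $\gamma(\theta) = E_\theta(h(X)) = \theta$. Here the score is
$$
\frac{p'(x;\theta)}{p(x;\theta)} = \frac{x}{\theta} - 1 = \frac{1}{\theta}\,(h(x) - \gamma(\theta)),
$$
exactly of the form (10) with $a(\theta) = 1/\theta$, and the likelihood equation (14) becomes $\bar{x} = \theta$: the sample mean, i.e. the jackknife extension of $\tilde{\theta}_1(x_1) = x_1$, is the efficient (here even exactly, not only asymptotically) estimator of $\theta$.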
We summarize the above as a theorem.

Theorem 1. Under the regularity-type conditions of the theory of maximum likelihood estimators, the jackknife extension estimators are asymptotically efficient if and only if they are arithmetic means based on samples from exponential families.
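A small simulation illustrates the "only if" part; the Laplace location family, the sample size and the seed below are arbitrary choices for the sketch. The standard double exponential (Laplace) density $\frac{1}{2}e^{-|x-\theta|}$ is not of the form (13) with $h(x) = x$, its Fisher information per observation equals 1, and the maximum likelihood estimator of $\theta$ is the sample median. The sample mean, i.e. the jackknife extension of $\tilde{\theta}_1(x_1) = x_1$, then has asymptotic efficiency $1/2$, while the median, an estimator that is recomputed afresh for every $n$, attains the Cramér-Rao benchmark.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 5000      # illustrative sample size and number of replications

means, medians = [], []
for _ in range(reps):
    x = rng.laplace(loc=0.0, scale=1.0, size=n)   # standard Laplace location family, theta = 0
    means.append(x.mean())        # jackknife extension of theta_1(x_1) = x_1
    medians.append(np.median(x))  # MLE of the location parameter

print("n * var(mean)   ~", n * np.var(means))    # ~ 2, efficiency ~ 1/2
print("n * var(median) ~", n * np.var(medians))  # ~ 1, the Cramer-Rao benchmark 1/I(theta)
```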
The jackknife extension lacks innovation. A jackknife extension estimator based on $(x_1, \ldots, x_{n+1})$ differs from the estimator based on $(x_1, \ldots, x_n)$ only by the sample size. In a sense, it is an extensive vs. intensive use of the data, when the main factor is quantity vs. quality.

Nonparametric estimators of population characteristics such as the empirical distribution function, the sample mean and the sample variance are jackknife extensions. Their main goal is to be universal rather than optimal for individual populations.

An interesting statistic is the sample median $\tilde{\mu}_n = \tilde{\mu}_n(x_1, \ldots, x_n)$ constructed from a sample from a continuous population. Without loss of generality, one may assume $x_1 < \ldots < x_n$. For $n = 2m+1$, $\tilde{\mu}_n = x_{m+1}$. If $x_{n+1} < x_1$ or $x_{n+1} > x_n$, one can easily see that
$$
\tilde{\mu}^{(e)}_{n+1} = (x'_{m+1} + x'_{m+2})/2
$$
where $x'_{m+1}$ and $x'_{m+2}$ are the $(m+1)$st and $(m+2)$nd elements of the ordered sample $(x_1, \ldots, x_{n+1})$. Thus, the median of a sample of an even size is a jackknife extension, though one should keep in mind that the definitions of the median in samples of even and odd size are different and it is not clear if the inequality (2) holds.

For $n = 2m$ the jackknife extension of $\tilde{\mu}_n = (x_m + x_{m+1})/2$ is not simply $x'_{m+1}$. Let us start with the simple cases of $m = 2$ and $m = 3$. In the first case,
$$
\tilde{\mu}^{(e)}_5 = (1.5\,x'_2 + 2\,x'_3 + 1.5\,x'_4)/5,
$$
a weighted average of $x'_3$ and its nearest neighbors. The same holds in the second case, with $x'_4$ instead of $x'_3$ and different weights:
$$
\tilde{\mu}^{(e)}_7 = (2\,x'_3 + 3\,x'_4 + 2\,x'_5)/7.
$$
It seems likely that the extrapolation to an arbitrary $n = 2m$ will result in
$$
\tilde{\mu}^{(e)}_{n+1} = \frac{((m+1)/2)\,x'_m + m\,x'_{m+1} + ((m+1)/2)\,x'_{m+2}}{n+1}. \tag{16}
$$
Though (16) is a reasonable estimator of the median, it is not clear how it behaves for $n = 2m$ in small and large samples compared to the standard $\tilde{\mu}_{n+1} = x'_{m+1}$.
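The weights in (16) can be checked numerically. The short Python sketch below (the normal sample and the value of $m$ are arbitrary choices for the illustration) computes the jackknife extension of the even-sample median directly and compares it with the closed form (16).

```python
import numpy as np

rng = np.random.default_rng(3)

def jackknife_extension(est, sample):
    """Average `est` over the leave-one-out subsamples, as in (1)."""
    return np.mean([est(np.delete(sample, i)) for i in range(len(sample))])

m = 5
n = 2 * m                              # even size of the base sample
x = np.sort(rng.normal(size=n + 1))    # x[0] < ... < x[n] play the role of x'_1 < ... < x'_{n+1}

direct = jackknife_extension(np.median, x)
closed_form = ((m + 1) / 2 * x[m - 1] + m * x[m] + (m + 1) / 2 * x[m + 1]) / (n + 1)

print(direct, closed_form)             # the two values coincide, in agreement with (16)
```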
References

Artstein, S., Ball, K. M., Barthe, F., Naor, A. (2004). Solution of Shannon's problem on the monotonicity of entropy. J. Amer. Math. Soc., 17, 975-982.

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat., 19, 293-325.

J. Math. Sci., 2, 202-214.

Madiman, M., Barron, A. (2007). Generalized entropy power inequalities and monotonicity properties of information. IEEE Transactions on Information Theory, 53, 2317-2329.