A Note on Taylor's Expansion and Mean Value Theorem With Respect to a Random Variable

Abstract

We introduce a stochastic version of Taylor's expansion and the Mean Value Theorem, originally proved by Aliprantis and Border (1999), and extend it to the multivariate case. In the univariate case, the theorem asserts: "Suppose a real-valued function f has a continuous derivative f' on a closed interval I and X is a random variable on a probability space (\Omega, \mathcal{F}, P). Fix a \in I. Then there exists a \textit{random variable} \xi such that \xi(\omega) \in I for every \omega \in \Omega and f(X(\omega)) = f(a) + f'(\xi(\omega))(X(\omega) - a)." The proof is not trivial. Applying these results in statistics can simplify some details in proofs of the Delta method or of the asymptotic properties of a maximum likelihood estimator. In particular, when a proof says "there exists \theta^* between \hat{\theta} (a maximum likelihood estimator) and \theta_0 (the true value)", the stochastic version of the Mean Value Theorem guarantees that \theta^* is a random variable (or a random vector).


arXiv [stat.OT]

Yifan Yang¹ and Xiaoyu Zhou²

¹,² Department of Mathematics, University of Maryland, College Park, MD, 20742, U.S.A.

Keywords. Taylor's expansion, Mean Value Theorem, Measurable function

Taylor's expansion and the Mean Value Theorem (MVT) are widely used in the statistical literature. In particular, they are powerful tools for proving the Delta method and the asymptotic normality of maximum likelihood (ML) estimators. For example, let $\log L$ denote the log-likelihood function with respect to an unknown parameter $\theta$, and let $\hat{\theta}$ denote an ML estimator of the true value $\theta_0$. Under regularity conditions, to explore the asymptotic behavior of $\hat{\theta}$, Kendall and Stuart (1961) obtained the following result by applying Taylor's expansion (or MVT) to $\partial \log L / \partial \theta$:

\[
\left. \frac{\partial \log L}{\partial \theta} \right|_{\hat{\theta}}
= \left. \frac{\partial \log L}{\partial \theta} \right|_{\theta_0}
+ \left( \hat{\theta} - \theta_0 \right) \left. \frac{\partial^2 \log L}{\partial \theta^2} \right|_{\theta^*},
\tag{1.1}
\]

where $\theta^*$ is some value between $\hat{\theta}$ and $\theta_0$.

A technical question arises from $\theta^*$ because the meaning of "value" is vague. In other words, is $\theta^*$ a number or a random variable? To answer this question clearly, we need to check how Taylor's expansion (or MVT) is applied.

One may view $\hat{\theta}$ as a real number or vector, i.e. $\hat{\theta}(x) \in \mathbb{R}^p$ for $p \ge 1$, where $x$ is a realization of a random sample $X$, then apply Taylor's expansion to $\partial \log L / \partial \theta$ around $\theta_0$ and obtain a real number $\theta^*$ between $\theta_0$ and $\hat{\theta}(x)$ satisfying formula (1.1). However, in a discussion of the asymptotic properties of $\hat{\theta}$, it is more precise to view $\hat{\theta}$ as an ML estimator rather than an ML estimate, i.e., $\hat{\theta} \equiv \hat{\theta}(X)$, a statistic (random variable) based on $X$. Only then is it meaningful to discuss whether $\hat{\theta}$ has an asymptotic normal distribution.

Unfortunately, the classical Taylor's expansion (or MVT) cannot be applied directly to a function of a random variable. In other words, we require a version of Taylor's expansion (or MVT) as follows: suppose a real-valued function $f$ has a continuous derivative $f'$ on a closed interval $I$ and $X$ is a random variable on a probability space $(\Omega, \mathcal{F}, P)$. Fix $a \in I$. Then there exists a random variable $\xi$ such that $\xi(\omega) \in I$ for every $\omega \in \Omega$ and

\[
f(X) = f(a) + f'(\xi)(X - a).
\]

This statement is not trivial. For every $\omega \in \Omega$, $X(\omega)$ is a real number, so applying the classical Taylor's expansion (or MVT) to $f$ yields a (possibly non-unique) $\xi(\omega)$ between $a$ and $X(\omega)$ satisfying $f(X(\omega)) = f(a) + f'(\xi(\omega))(X(\omega) - a)$. Suppose $\xi(\omega)$ is unique for every $\omega$. Running over $\omega \in \Omega$, we obtain a map $\xi : \Omega \to \mathbb{R}$, $\omega \mapsto \xi(\omega)$. But we cannot conclude that this map is measurable. Moreover, in general, $\xi(\omega)$ might not be unique for some $\omega \in \Omega$. For instance, with $f(x) = x^3$, $a = -1$ and $X(\omega) = 1$, both $\xi(\omega) = 1/\sqrt{3}$ and $\xi(\omega) = -1/\sqrt{3}$ satisfy the equation. Both issues (uniqueness of $\xi(\omega)$ and measurability of $\xi$) make the statement nontrivial.

Is it worth considering this question? One may avoid mentioning $\theta^*$ by using Taylor's expansion with the Peano remainder or in integral form, and that is entirely correct. But once this question is settled, many proofs in statistical monographs are justified as written and become much easier to understand, including Kendall and Stuart (1961, (18.31), p.44; (18.44), p.49; (18.60), p.55), Bhattacharya et al. (2016, p.57; Proof of Theorem 6.1, (6.55), p.131; Proof of Theorem 7.2, (7.8), p.168; Proof of Theorem 7.3, (7.31), p.173; Proof of Theorem 7.8, (7.130), p.192), Bickel and Doksum (2015, Proof of Theorem 2.3.1, (5.3.2), p.307; Proof of A.14.17 Corollary, p.468), Cox and Hinkley (1974, (16), p.294; p.313), Serfling (2002, Proof of Theorem C, p.82; Proof of Lemma B, p.153), van der Vaart (2000, p.51; Proof of Theorem 5.41, p.68; Proof of Theorem 5.42, p.69; p.134; p.230; Proof of Theorem 23.5, p.331), Lehmann (2004, (4.2.13), p.233), Kosorok (2007, p.360; p.416; (21.28), p.421), Jiang (2010, (3.25), p.74; p.88; p.113; p.119; p.416), van der Vaart and Wellner (1996, p.313), Giné and Nickl (2016, Proof of Lemma 7.2.13, p.560; Proof of Proposition 7.2.19, p.567).

In the rest of this note, we recall the classical Taylor's expansion with Lagrange remainder and MVT in Section 2. A stochastic version of Taylor's theorem is introduced and extended to the multivariate case in Section 3. Further discussion is presented in Section 4.

Classical statements of Taylor's theorem and MVT can be found in standard texts on mathematical analysis. For example, see Theorem 5.10, Theorem 5.15 and Problem 30 of Rudin (1976, p.108; p.110; p.243), or Theorem 4.2 and Taylor's Formula of Lang (1993, p.341; p.349). We restate them in Theorem 1.

Theorem 1 (deterministic Taylor's theorems with Lagrange remainder). Let $n$ denote a positive integer.

(1) (univariate case) Suppose $f(x)$ is a real-valued function that possesses a continuous $(n-1)$-th derivative, denoted by $f^{(n-1)}(x)$, on a closed interval $I \subset \mathbb{R}$, and $f^{(n)}(x)$ exists for every interior point $x$ of $I$. Fix $a \in I$, and suppose $h \in \mathbb{R}$ is such that $\{ p(t) \equiv a + th : t \in [0, 1] \} \subset I$. Then there exists $0 < \theta < 1$ such that

\[
f(a + h) = f(a) + \sum_{k=1}^{n-1} \frac{f^{(k)}(a)}{k!} h^k + \frac{f^{(n)}(a + \theta h)}{n!} h^n .
\]

(2) (multivariate case) Suppose $f(x)$ is a real-valued function that possesses a continuous $n$-th order derivative on an open convex set $E \subset \mathbb{R}^p$. Fix $a \in E$, and suppose $h = (h_1, h_2, \ldots, h_p)^\top \in \mathbb{R}^p$ is so close to $0$ that the points $p(t) \equiv a + th$ lie in $E$ whenever $0 \le t \le 1$, i.e. $\{ p(t) \equiv a + th : t \in [0, 1] \} \subset E$. Then there exists $0 < \theta < 1$ such that

\[
f(a + h) = f(a) + \sum_{k=1}^{n-1} \frac{1}{k!} \left( \sum_{q=1}^{p} h_q \frac{\partial}{\partial x_q} \right)^{k} f(a)
+ \frac{1}{n!} \left( \sum_{q=1}^{p} h_q \frac{\partial}{\partial x_q} \right)^{n} f(a + \theta h),
\]

where the operator is short for

\[
\left( \sum_{q=1}^{p} h_q \frac{\partial}{\partial x_q} \right)^{n}
= \sum_{\sum_{q=1}^{p} i_q = n} \frac{n!}{i_1! \, i_2! \cdots i_p!} \,
h_1^{i_1} h_2^{i_2} \cdots h_p^{i_p} \,
\frac{\partial^n}{\partial x_1^{i_1} \partial x_2^{i_2} \cdots \partial x_p^{i_p}} .
\]

(3) (vector-valued case) Suppose $f(x)$ is a vector-valued function that possesses a continuous $n$-th order derivative from an open convex set $E \subset \mathbb{R}^p$ to an open set $F \subset \mathbb{R}^{p'}$. Fix $a \in E$ and suppose $h = (h_1, h_2, \ldots, h_p)^\top \in \mathbb{R}^p$ is so close to $0$ that the points $p(t) \equiv a + th$ lie in $E$ whenever $0 \le t \le 1$. Denote by $h^{(k)}$ the $k$-tuple $(h, h, \ldots, h)$. Then the integral form is

\[
f(a + h) = f(a) + \sum_{k=1}^{n-1} \frac{1}{k!} D^k f(a) \, h^{(k)}
+ \frac{1}{(n-1)!} \int_0^1 (1 - t)^{n-1} D^n f(a + th) \, h^{(n)} \, dt,
\]

where $D^k f(x) = D\left( D^{k-1} f \right)(x)$ for a positive integer $k$ (Lang, 1993, p.346) and

\[
D f(x) =
\begin{pmatrix}
\frac{\partial}{\partial x_1} f_1(x) & \frac{\partial}{\partial x_2} f_1(x) & \cdots & \frac{\partial}{\partial x_p} f_1(x) \\
\frac{\partial}{\partial x_1} f_2(x) & \frac{\partial}{\partial x_2} f_2(x) & \cdots & \frac{\partial}{\partial x_p} f_2(x) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial}{\partial x_1} f_{p'}(x) & \frac{\partial}{\partial x_2} f_{p'}(x) & \cdots & \frac{\partial}{\partial x_p} f_{p'}(x)
\end{pmatrix}
\]

denotes the $(p' \times p)$-dimensional Jacobian matrix at $x$, and $f_{q'} : \mathbb{R}^p \to \mathbb{R}$ denotes the $q'$-th component of $f$ for $1 \le q' \le p'$.

The proof of Theorem 1 can be found in Rudin (1976, Theorem 5.10, p.108; Theorem 5.15, p.110; Problem 30, p.243) and Lang (1993, Theorem 4.2, p.341; Taylor's Formula, p.349); we omit it here. Setting $n = 1$ in Theorem 1 gives the deterministic versions of MVT stated in Corollary 1.
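As a numerical sanity check of part (1) of Theorem 1, the following sketch solves for the intermediate point $\theta$ in the Lagrange remainder. It is only an illustration under stated assumptions: the helper name `lagrange_theta`, the choice $f = \exp$, $a = 0$, $h = 1$, $n = 3$, and the use of bisection (which requires a sign change of the residual on $[0, 1]$) are not part of the note.

```python
import math


def lagrange_theta(derivs, a, h, n, tol=1e-12):
    """Solve for theta in [0, 1] in the Lagrange-remainder identity.

    derivs[k] must be a callable evaluating the k-th derivative of f
    (derivs[0] is f itself). Bisection is used, so this sketch assumes the
    remainder residual changes sign between theta = 0 and theta = 1.
    """
    # Taylor polynomial of degree n - 1 around a, evaluated at a + h.
    poly = sum(derivs[k](a) * h**k / math.factorial(k) for k in range(n))
    target = derivs[0](a + h) - poly  # value the remainder term must equal

    def residual(theta):
        return derivs[n](a + theta * h) * h**n / math.factorial(n) - target

    lo, hi = 0.0, 1.0
    if residual(lo) * residual(hi) > 0:
        raise ValueError("no sign change on [0, 1]; bisection does not apply")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(lo) * residual(mid) <= 0:
            hi = mid  # a root lies in [lo, mid]
        else:
            lo = mid  # a root lies in [mid, hi]
    return 0.5 * (lo + hi)


# Illustrative choice (not from the note): f = exp, a = 0, h = 1, n = 3.
exp_derivs = [math.exp] * 4  # every derivative of exp is exp
theta = lagrange_theta(exp_derivs, a=0.0, h=1.0, n=3)
print(theta)
```

For this particular choice the identity reads $e = 1 + 1 + 1/2 + e^{\theta}/6$, so $\theta = \log(6(e - 2.5))$, which the bisection result should match up to the tolerance.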

Corollary 1 (deterministic mean value theorems). Under the assumptions of Theorem 1, let $n = 1$.

(1) (univariate case) There exists $\xi \equiv a + \theta h$ for $0 < \theta < 1$ such that

\[
f(a + h) = f(a) + f'(\xi) h .
\]

(2) (multivariate case) There exists $\xi \equiv a + \theta h$ for $0 < \theta < 1$ such that

\[
f(a + h) = f(a) + \nabla f(\xi) h ,
\]

where $\nabla f(x) \equiv \left( \frac{\partial}{\partial x_1} f(x), \frac{\partial}{\partial x_2} f(x), \ldots, \frac{\partial}{\partial x_p} f(x) \right)$ denotes the row vector of the gradient at $x$.

(3) (vector-valued case) The integral form is

\[
f(a + h) = f(a) + \int_0^1 D f(a + th) \, h \, dt. \tag{2.1}
\]

Feng et al. (2013) discussed MVT for the vector-valued case. Although there is no such "mean value" when $f$ is vector-valued, they suggested using the integral form (2.1) or Peano's form $f(a + h) = f(a) + D f(a) h + o(\| h \|)$; see the discussion in Feng et al. (2013, (15), p.247).

Let us go back to the original question in Section 1: suppose a real-valued function $f$ has a continuous derivative $f'$ on a closed interval $I$ and $X$ is a random variable on a probability space $(\Omega, \mathcal{F}, P)$. Fix $a \in I$. Does there exist a random variable $\xi$ such that $\xi(\omega) \in I$ for every $\omega \in \Omega$ and

\[
f(X) = f(a) + f'(\xi)(X - a) ?
\]

A first attempt is to impose more conditions on $f$. For example, we may assume that $f'$ has a measurable inverse function $(f')^{-1}$ on $I$. Then there exists a random variable $\xi = (f')^{-1}\left[ (f(X) - f(a)) / (X - a) \right]$, where $f(X)$ is a function of the random variable $X$.

A second attempt is to consider the implicit form $f'(\xi) = [f(X) - f(a)] / (X - a)$. The right-hand side is a measurable function, and $f'$ is continuous, hence measurable. Then $\xi$ would be measurable if the following conjecture held: "Suppose $g$ is a measurable function and the composition $g(f)$ is a measurable function; then $f$ is measurable." This conjecture is false in general (for example, when $g$ is constant), so this route does not work without further assumptions.

A third attempt is to consider $\xi(\omega)$ for each $\omega$. Theorem 1 guarantees that for every $\omega \in \Omega$, the set

\[
\Xi(X(\omega), a) \equiv \{ \xi(\omega) \text{ lying between } a \text{ and } X(\omega) : f(X(\omega)) = f(a) + f'(\xi(\omega))(X(\omega) - a) \}
\]

is not empty. Since $\Xi(X(\omega), a)$ is bounded, there exists $\xi^*(\omega)$ such that: (1) $\xi(\omega) \le \xi^*(\omega)$ for every $\xi(\omega) \in \Xi(X(\omega), a)$; and (2) $f(X(\omega)) = f(a) + f'(\xi^*(\omega))(X(\omega) - a)$. This holds because there must be a sequence $\{ \xi_k(\omega) \} \subset \Xi(X(\omega), a)$ such that $\lim_{k \to \infty} \xi_k(\omega) = \sup \Xi(X(\omega), a) \equiv \xi^*(\omega)$, and since $f'$ is continuous, the limit can be passed through $f'$. However, although this defines a map $\xi^* : \Omega \to \mathbb{R}$ by the unique $\xi^*(\omega)$, we cannot immediately obtain the measurability of $\xi^*$.

We reviewed some related literature, but the ideas there cannot be applied here in a straightforward way. For example, Benichou and Gail (1989) discussed random vectors defined by an implicit function by applying the implicit function theorem (Taylor and Mann, 1983, Theorem 1, p.225). But this approach does not carry over easily, because the box-like region defined in that proof depends on $\omega$ when we consider $X(\omega)$ for a fixed $\omega$. Massey and Whitt (1993) provided a similar version of Taylor's expansion under expectation, and their proof depends heavily on the expectation.

Aliprantis and Border (1999) directly provided an elegant proof of a stochastic version of Taylor's theorem, which asserts that we can assign a value $\xi(\omega)$ to each $\omega$ in a measurable way.

Theorem 2 (stochastic Taylor's theorem with Lagrange remainder (univariate case)). Let $n$ denote a positive integer. Suppose $f(x)$ is a real-valued function that possesses a continuous $n$-th order derivative on a closed interval $I \subset \mathbb{R}$. Fix $a \in I$ and let $X$ be a random variable on the probability space $(\Omega, \mathcal{F}, P)$ such that $a + X(\omega)$ belongs to $I$ for all $\omega$. Then there is a measurable function $\xi$ such that $\xi(\omega)$ lies in the closed interval with endpoints $0$ and $X(\omega)$ for each $\omega \in \Omega$, and

\[
f(a + X(\omega)) = f(a) + \sum_{k=1}^{n-1} \frac{f^{(k)}(a)}{k!} X^k(\omega) + \frac{f^{(n)}(a + \xi(\omega))}{n!} X^n(\omega). \tag{3.1}
\]

Proof.

We sketch the proof; the complete one can be found in Aliprantis and Border (1999, Proof of Theorem 18.18, p.604). The following theorems and pages refer to this monograph.

Define a correspondence $\varphi : \Omega \twoheadrightarrow \mathbb{R}$ (see the definition on p.4) by

\[
\varphi(\omega) =
\begin{cases}
[0, X(\omega)], & X(\omega) > 0, \\
[X(\omega), 0], & X(\omega) < 0, \\
\{ 0 \}, & X(\omega) = 0,
\end{cases}
\]

and let $A = X^{-1}((0, \infty))$, $B = X^{-1}((-\infty, 0))$ and $C = X^{-1}(\{ 0 \})$. Thus $A, B, C \in \mathcal{F}$. Moreover, $\varphi$ is weakly measurable (see the definition on p.431) and has compact values (that is, $\varphi(\omega)$ is a compact set for each $\omega$). Consider the distance function associated with $\varphi$ (see the definition on p.595), defined by

\[
\delta(\omega, x) = \left[ (-x)^+ + [x - X(\omega)]^+ \right] \chi_A(\omega)
+ \left[ x^+ + [X(\omega) - x]^+ \right] \chi_B(\omega)
+ |x| \, \chi_C(\omega),
\]

where $x^+ = \max\{ x, 0 \}$ and $|x|$ denote the positive part and the absolute value of $x$ respectively, and $\chi_A(\omega)$ denotes the indicator that takes value 1 if $\omega \in A$ and value 0 otherwise. So $\delta$ is a Carathéodory function (see the definition on p.153) by Theorem 18.5 (see p.595). Then, let

\[
\pi(\omega) = f(a + X(\omega)) - f(a) - \sum_{k=1}^{n-1} \frac{1}{k!} f^{(k)}(a) X^k(\omega), \qquad
g(\omega, x) = \frac{1}{n!} f^{(n)}(a + x) X^n(\omega),
\]

in which $g : \Omega \times [I(l) - a, I(r) - a] \to \mathbb{R}$ is a Carathéodory function because $f^{(n)}$ is continuous on $I$, where $I(l)$ and $I(r)$ denote the left and right endpoints of $I$, i.e. $I = [I(l), I(r)]$. By Theorem 1, the function $\pi : \Omega \to \mathbb{R}$ is a measurable selector (see the definition on p.600) from the range of $g$ on $\varphi$. Finally, by Filippov's implicit function theorem (see p.603), there is a measurable function $\xi : \Omega \to \mathbb{R}$ such that $\xi(\omega) \in \varphi(\omega)$ and $g(\omega, \xi(\omega)) = \pi(\omega)$ for all $\omega \in \Omega$. $\blacksquare$

Although Aliprantis and Border (1999) do not mention it, we may extend the result to the multivariate case by checking each line of the argument in the above proof. To display the proof clearly, we consider $p = 2$; the proof for any positive integer $p$ is similar.

Corollary 2 (stochastic Taylor's theorem with Lagrange remainder (multivariate case)). Let $n$ denote a positive integer. Suppose $f(x)$ is a real-valued function that possesses a continuous $n$-th order derivative on an open convex set $E \subset \mathbb{R}^p$. Fix $a \in E$ and let $X = (X_1, X_2, \ldots, X_p)^\top$ be a $p$-dimensional random vector on the probability space $(\Omega^p, \mathcal{F}^p, P)$ such that $\{ p(t, \omega) \equiv a + t X(\omega) : t \in [0, 1] \} \subset E$ for all $\omega$. Then there is a $p$-dimensional random vector $\xi$ such that $\xi(\omega)$ lies in $\{ p(t, \omega) - a : t \in [0, 1] \}$ for each $\omega \in \Omega^p$, and

\[
f(a + X(\omega)) = f(a) + \sum_{k=1}^{n-1} \frac{1}{k!} \left( \sum_{q=1}^{p} X_q(\omega) \frac{\partial}{\partial x_q} \right)^{k} f(a)
+ \frac{1}{n!} \left( \sum_{q=1}^{p} X_q(\omega) \frac{\partial}{\partial x_q} \right)^{n} f(a + \xi(\omega)). \tag{3.2}
\]

Proof.

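We check the arguments of the proof for the univariate case and let $p = 2$. Define $\varphi : \Omega^2 \twoheadrightarrow \mathbb{R}^2$ by

\[
\varphi\left( (\omega_1, \omega_2)^\top \right) =
\begin{cases}
[0, X_1(\omega_1)] \times [0, X_2(\omega_2)], & X_1(\omega_1) > 0, \ X_2(\omega_2) > 0, \\
[0, X_1(\omega_1)] \times [X_2(\omega_2), 0], & X_1(\omega_1) > 0, \ X_2(\omega_2) < 0, \\
[0, X_1(\omega_1)] \times \{ 0 \}, & X_1(\omega_1) > 0, \ X_2(\omega_2) = 0, \\
[X_1(\omega_1), 0] \times [0, X_2(\omega_2)], & X_1(\omega_1) < 0, \ X_2(\omega_2) > 0, \\
[X_1(\omega_1), 0] \times [X_2(\omega_2), 0], & X_1(\omega_1) < 0, \ X_2(\omega_2) < 0, \\
[X_1(\omega_1), 0] \times \{ 0 \}, & X_1(\omega_1) < 0, \ X_2(\omega_2) = 0, \\
\{ 0 \} \times [0, X_2(\omega_2)], & X_1(\omega_1) = 0, \ X_2(\omega_2) > 0, \\
\{ 0 \} \times [X_2(\omega_2), 0], & X_1(\omega_1) = 0, \ X_2(\omega_2) < 0, \\
\{ 0 \} \times \{ 0 \}, & X_1(\omega_1) = 0, \ X_2(\omega_2) = 0.
\end{cases}
\]

By definition, $\varphi$ is a correspondence because $\varphi(\omega) \subset \mathbb{R}^2$ for every $\omega \in \Omega^2$ (see the definition on p.4). Let

\[
\begin{aligned}
I_{11} &\equiv \{ (\omega_1, \omega_2)^\top \in \Omega^2 : X_1(\omega_1) > 0, \ X_2(\omega_2) > 0 \}, &
I_{12} &\equiv \{ (\omega_1, \omega_2)^\top \in \Omega^2 : X_1(\omega_1) > 0, \ X_2(\omega_2) < 0 \}, \\
I_{13} &\equiv \{ (\omega_1, \omega_2)^\top \in \Omega^2 : X_1(\omega_1) > 0, \ X_2(\omega_2) = 0 \}, &
I_{21} &\equiv \{ (\omega_1, \omega_2)^\top \in \Omega^2 : X_1(\omega_1) < 0, \ X_2(\omega_2) > 0 \}, \\
I_{22} &\equiv \{ (\omega_1, \omega_2)^\top \in \Omega^2 : X_1(\omega_1) < 0, \ X_2(\omega_2) < 0 \}, &
I_{23} &\equiv \{ (\omega_1, \omega_2)^\top \in \Omega^2 : X_1(\omega_1) < 0, \ X_2(\omega_2) = 0 \}, \\
I_{31} &\equiv \{ (\omega_1, \omega_2)^\top \in \Omega^2 : X_1(\omega_1) = 0, \ X_2(\omega_2) > 0 \}, &
I_{32} &\equiv \{ (\omega_1, \omega_2)^\top \in \Omega^2 : X_1(\omega_1) = 0, \ X_2(\omega_2) < 0 \}, \\
I_{33} &\equiv \{ (\omega_1, \omega_2)^\top \in \Omega^2 : X_1(\omega_1) = 0, \ X_2(\omega_2) = 0 \}.
\end{aligned}
\]

So $I_{ij} \in \mathcal{F}^2$ for every $i$ and $j$. Moreover, for each open subset $G$ of $\mathbb{R}^2$, the lower inverse is defined by $\varphi^l(G) = \{ \omega \in \Omega^2 : \varphi(\omega) \cap G \ne \emptyset \}$ (see the definition on p.557). Thus $\varphi$ is weakly measurable because $\varphi^l(G) \in \mathcal{F}^2$ (see Definition 18.1, p.592), and it has compact values because $\varphi(\omega)$ is a compact set for each $\omega$.

Consider the distance function $\delta : \Omega^2 \times \mathcal{X} \to \mathbb{R}$ associated with $\varphi$, where $\mathcal{X} = \{ x = (x_1, x_2)^\top \}$, defined by

\[
S_1(i) = (-x_i)^+ + [x_i - X_i(\omega_i)]^+, \quad
S_2(i) = x_i^+ + [X_i(\omega_i) - x_i]^+, \quad
S_3(i) = |x_i|, \quad i = 1, 2,
\]

\[
\begin{aligned}
\delta(\omega, x) ={}& [S_1(1) + S_1(2)] \chi_{I_{11}}(\omega) + [S_1(1) + S_2(2)] \chi_{I_{12}}(\omega) + [S_1(1) + S_3(2)] \chi_{I_{13}}(\omega) \\
&+ [S_2(1) + S_1(2)] \chi_{I_{21}}(\omega) + [S_2(1) + S_2(2)] \chi_{I_{22}}(\omega) + [S_2(1) + S_3(2)] \chi_{I_{23}}(\omega) \\
&+ [S_3(1) + S_1(2)] \chi_{I_{31}}(\omega) + [S_3(1) + S_2(2)] \chi_{I_{32}}(\omega) + [S_3(1) + S_3(2)] \chi_{I_{33}}(\omega),
\end{aligned}
\]

which is a Carathéodory function (see the definition on p.153) by the Weak Measurability and Distance Functions theorem (see Theorem 18.5 on p.595). Consider the functions

\[
\pi(\omega) = f(a + X(\omega)) - f(a) - \sum_{k=1}^{n-1} \frac{1}{k!} \left( \sum_{q=1}^{2} X_q(\omega) \frac{\partial}{\partial x_q} \right)^{k} f(a), \qquad
g(\omega, x) = \frac{1}{n!} \left( \sum_{q=1}^{2} X_q(\omega) \frac{\partial}{\partial x_q} \right)^{n} f(a + x).
\]

The function $g : \Omega^2 \times \{ p(t, \omega) - a \} \to \mathbb{R}$ is a Carathéodory function because $g(\cdot, x)$ is measurable for each $x \in \{ p(t, \omega) - a \}$ and $g(\omega, \cdot)$ is continuous for each $\omega \in \Omega^2$. By Theorem 1, the function $\pi : \Omega^2 \to \mathbb{R}$ is a measurable selector from the range of $g$ on $\varphi$. Applying Filippov's theorem (see Theorem 18.17 on p.603), there exists a measurable function $\xi : \Omega^2 \to \mathbb{R}^2$ satisfying $\xi(\omega) \in \varphi(\omega)$ and $g(\omega, \xi(\omega)) = \pi(\omega)$ for all $\omega \in \Omega^2$. $\blacksquare$

The stochastic versions of MVT stated in Corollary 3 follow immediately.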
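To make the univariate identity (3.1) concrete for $n = 1$, the sketch below evaluates the supremum selection $\xi^*(\omega)$ from the discussion preceding Theorem 2 on simulated draws of $X$. Everything specific here is an illustrative assumption and not part of the note: the function $f(x) = x^3$, the point $a = -1$, the interval $I = [-3, 3]$, the uniform sampling of $X$, and the helper name `xi_sup`. The code does not prove measurability, and Theorem 2 only asserts that some measurable $\xi$ exists, not that it coincides with this selection. In this example, however, the selection is an explicit piecewise continuous function of $X(\omega)$ with a jump at $X(\omega) = 1.5$.

```python
import math
import random

A = -1.0  # expansion point a (illustrative choice, not from the note)


def f(x):
    return x**3


def f_prime(x):
    return 3.0 * x**2


def xi_sup(x):
    """Largest xi between 0 and x with f(A + x) = f(A) + f'(A + xi) * x.

    For f(x) = x**3 and A = -1 the identity reduces, when x != 0, to
    3 * (xi - 1)**2 = x**2 - 3*x + 3, which has the two roots 1 -/+ s.
    """
    if x == 0.0:
        return 0.0  # the interval degenerates to {0}
    s = math.sqrt((x * x - 3.0 * x + 3.0) / 3.0)
    lo, hi = min(0.0, x), max(0.0, x)
    # Keep the roots inside the closed interval, with a small float tolerance.
    roots = [r for r in (1.0 - s, 1.0 + s) if lo - 1e-12 <= r <= hi + 1e-12]
    return max(roots)


# Draw X so that A + X stays in I = [-3, 3] (assumed interval).
rng = random.Random(0)
samples = [rng.uniform(-2.0, 4.0) for _ in range(1000)]

# Check f(a + X) = f(a) + f'(a + xi) X on every draw.
max_residual = max(
    abs(f(A + x) - f(A) - f_prime(A + xi_sup(x)) * x) for x in samples
)
print(max_residual)
```

The jump at $X(\omega) = 1.5$ arises because a second admissible root enters the interval there, which illustrates why continuity of the selection in $X$ cannot be expected in general.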

Corollary 3 (stochastic mean value theorem). Let $n = 1$.

(1) (univariate case) Under the assumptions of Theorem 2, there exists a measurable function $\xi$ such that $\xi(\omega)$ lies in the closed interval with endpoints $a$ and $a + X(\omega)$ for every $\omega \in \Omega$ and

\[
f(a + X(\omega)) = f(a) + f'(\xi(\omega)) X(\omega).
\]

(2) (multivariate case) Under the assumptions of Corollary 2, there exists a $p$-dimensional random vector $\xi$ such that $\xi(\omega)$ lies in $\{ p(t, \omega) : t \in [0, 1] \}$ for every $\omega \in \Omega^p$ and

\[
f(a + X(\omega)) = f(a) + \nabla f(\xi(\omega)) X(\omega).
\]

We have introduced a stochastic version of Taylor's expansion and MVT for both the univariate and multivariate cases. By applying them, one can state clearly that there exists a random variable (or random vector) $\theta^*$ between an ML estimator $\hat{\theta}$ and the true value $\theta_0$ when exploring the asymptotic behavior of $\hat{\theta}$. As stated earlier, this is a small technical detail that does not affect the correctness of the statistical arguments. But the existence of a random variable (or random vector) $\theta^*$ can make proofs easier to understand and more concise.

Finally, we ask whether there is a stochastic version of Taylor's expansion or MVT for two random variables. For example, let $f$ have a continuous derivative $f'$ on a closed interval $I$ and let $X$ and $Y$ be two random variables taking values in $I$ that share the same probability space $(\Omega, \mathcal{F}, P)$. Then, does there exist a random variable $\xi$ such that

\[
f(X(\omega)) = f(Y(\omega)) + f'(\xi(\omega))(X(\omega) - Y(\omega)) \tag{4.1}
\]

for every $\omega \in \Omega$? A motivating example comes from Shao (2003, Proof of Theorem 5.17, p.377), in proving that the jackknife variance estimator is strongly consistent. To the best of our knowledge, Crescenzo (1999) provides an expectation version, that is,

\[
E(f(Y)) - E(f(X)) = E(f'(Z)) \left[ E(Y) - E(X) \right],
\]

for a non-negative random variable $Z$ under some conditions. But the proof does not seem easy to extend to formula (4.1).

References

Aliprantis, C. and Border, K. (1999). Infinite Dimensional Analysis: A Hitchhiker's Guide. Studies in Economic Theory. Springer.

Benichou, J. and Gail, M. H. (1989). A delta method for implicitly defined random variables. The American Statistician.

Bhattacharya et al. (2016). A course in mathematical statistics and large sample theory. Springer.

Bickel, P. J. and Doksum, K. A. (2015). Mathematical statistics: basic ideas and selected topics, volume 1. CRC Press, 2 edition.

Cox, D. R. and Hinkley, D. V. (1974). Theoretical statistics. Springer-Science Business Media, B.V.

Crescenzo, A. D. (1999). A probabilistic analogue of the mean value theorem and its applications to reliability theory. Journal of Applied Probability.

Feng et al. (2013). The American Statistician.

Giné and Nickl (2016). Mathematical foundations of infinite-dimensional statistical models. Number 40. Cambridge University Press.

Jiang, J. (2010). Large sample techniques for statistics. Springer Science & Business Media.

Kendall, M. G. and Stuart, A. (1961). The Advanced Theory of Statistics, volume 2. Hafner Publishing Company - New York, 3 edition.

Kosorok, M. R. (2007). Introduction to empirical processes and semiparametric inference. Springer Science & Business Media.

Lang, S. (1993). Real and functional analysis, volume 142. Springer Science & Business Media.

Lehmann, E. L. (2004). Elements of large-sample theory. Springer Science & Business Media.

Massey, W. A. and Whitt, W. (1993). A probabilistic generalization of Taylor's theorem. Statistics & Probability Letters.

Rudin (1976). Principles of mathematical analysis. McGraw-Hill New York, 3 edition.

Serfling, R. J. (2002). Approximation theorems of mathematical statistics. John Wiley & Sons.

Shao, J. (2003). Mathematical Statistics. Springer Texts in Statistics. Springer, 2 edition.

Taylor, A. E. and Mann, W. R. (1983). Advanced calculus. John Wiley & Sons, INC., 605 Third Ave., New York, NY 10158, USA.

van der Vaart, A. (2000). Asymptotic Statistics. Cambridge University Press.

van der Vaart, A. and Wellner, J. (1996).