Extreme(ly) Mean(ingful): Sequential Formation of a Quality Group

The Annals of Applied Probability
Institute of Mathematical Statistics, 2010

By Abba M. Krieger, Moshe Pollak and Ester Samuel-Cahn
University of Pennsylvania, Hebrew University and Hebrew University
The present paper studies the limiting behavior of the average score of a sequentially selected group of items or individuals, the underlying distribution of which, F, belongs to the Gumbel domain of attraction of extreme value distributions. This class contains the Normal, Lognormal, Gamma, Weibull and many other distributions. The selection rules are the "better than average" (β = 1) and the "β-better than average" rule, defined as follows. After the first item is selected, another item is admitted into the group if and only if its score is greater than β times the average score of those already selected. Denote by Y_k the average of the k first selected items, and by T_k the time it takes to amass them. Some of the key results obtained are: under mild conditions, for the better than average rule, Y_k less a suitably chosen function of log k converges almost surely to a finite random variable. When 1 − F(x) = e^{−[x^α + h(x)]}, α > 0 and h(x)/x^α → 0 as x → ∞, then T_k is of approximate order k². When β > 1, the asymptotic results for Y_k are of a completely different order of magnitude. Interestingly, for a class of distributions, T_k, suitably normalized, asymptotically approaches 1, almost surely for relatively small β ≥ 1, in probability for moderate sized β and in distribution when β is large.
1. Introduction and summary.
Individuals are observed sequentially. The problem of whether to accept an individual at the time that she is observed has a rich literature. The most celebrated version is the "Secretary Problem," where the criterion is to select one individual and the objective is to maximize the probability that the best individual is chosen. This setting has been extended in various ways, including selecting a limited number
Received May 2009; revised March 2010. Supported by funds from the Marcy Bogen Chair of Statistics at the Hebrew University of Jerusalem. Supported by the Israel Science Foundation Grant 467/04.
AMS 2000 subject classifications.
Primary 62G99; secondary 62F07, 60F15.
Key words and phrases.
Selection rules, averages, better than average, sequential observations, asymptotics.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Probability, 2010, Vol. 20, No. 6, 2261–2294. This reprint differs from the original in pagination and typographic detail.
of individuals and basing the reward on the rank or score of individual(s) selected.

Another extension that has received recent attention is to select a group of "quality members." This might occur when a team of highly qualified professionals is assembled, for example, in an academic department or a consulting group in a specialized area. The goal is to find good rules for either accepting or rejecting each additional individual into the group at the time that the individual is observed.

One such rule that has been studied is to add a new member to the group only if this will not decrease the average quality of the group, termed in the literature the "better than average selection rule." This tacitly assumes that "quality" is measurable. A generalization of the rule would be to only admit a new member whose score is, say, 5% higher than the current average. We term the extended rules "β-better than average rules." These rules reduce to the better than average rule, first considered by Preater [5], when β = 1, but allow, say, for β = 1.05 to produce a group that is even more progressively selective than when β = 1.

The assumption that is commonly made is that the qualities of the individuals are mutually independent from a common distribution. As the horizon, n, tends to infinity, we study the asymptotic behavior of the average quality of the group and the rate at which the group grows for the β-better than average rules.

The β-better than average rules are considered in Krieger, Pollak and Samuel-Cahn [4], and the present paper, which can be read independently, can be considered its natural continuation. Sequential selection of a "good" group, based only upon the relative ranks of the observations, is considered in Krieger, Pollak and Samuel-Cahn [3]. It should be noticed that the rules considered here can be implemented without knowledge of the underlying distribution, though their asymptotic behavior depends strongly on that distribution. For convenience, we assume that the first item is always selected. However, all asymptotic results remain correct if the selection process is adopted only after a core group of members already exists. Also, the random variables are assumed to be nonnegative (or the process begins with the first nonnegative observation), because negative averages multiplied by β > 1 would make the cutoff for acceptance less stringent.

The quantities of interest are the average, Y_k, of the group after k items have been retained, and T_k, the time (in terms of the number of observed items) it takes to amass a group of size k. Our interest is in the asymptotics of these quantities as k → ∞. This paper, unlike [4], considers F belonging to the extreme value domain of attraction of the Gumbel distribution exp{−e^{−x}} only. Write 1 − F(x) = exp{−H(x)}. Emphasis is given to a subset of these distributions, which are also "stretch exponential" distributions, where H(x) = x^α + h(x) for all x > x_0, for some x_0, with α > 0 and h(x)/x^α → 0 as x → ∞. This class includes the Gamma and Normal distributions as particular cases.

The "expected overshoot," f(x) = E(X − x | X > x), plays an essential role. Our main findings are: for the "better than average" rule (β = 1), under some mild conditions, the quantity Y_k − G^{−1}(log k) converges a.s. to a finite random variable, where G(x) = ∫_{x_0}^x [1/f(u)] du. These mild conditions are satisfied in particular by the stretch exponential distributions with α ≥ 1. The functions G(x) and H(x) are close to each other in that G(x) = (1 + o(1))H(x). The convergence of Y_k − G^{−1}(log k) is shown in Section 3, where also the convergence of the sequence of expected values and variances of {Y_k − G^{−1}(log k)} is established. The behavior of Y_k for β > 1 is studied in Section 4, where it is shown that Y_k/k^{β−1} converges a.s. to a finite positive random variable.

The behavior of T_k is discussed in Section 5. It is shown that for stretch exponential random variables with α > 1, β = 1 and every ε > 0, T_k/k^{2−ε} → ∞ a.s. as k → ∞, while T_k/k^{2+ε} → 0 a.s. For α = 1, T_k/k² converges to a finite positive random variable. The "standardized" variable
T*_k = T_k / Σ_{j=1}^{k−1} [1 − F(βY_j)]^{−1}
is studied for β ≥ 1 and α > 0. For different values of β we obtain different asymptotic behavior: we show that for 1 ≤ β < 1 + 1/(2α) the random variable T*_k converges to 1 a.s. For 1 + 1/(2α) ≤ β < 1 + 1/α it converges to 1 in probability. For β > 1 + 1/α the random variable T*_k converges in distribution to an exponential mean one distribution, while for β = 1 + 1/α the convergence in distribution is to a sum of conditionally independent exponential random variables. We conclude with Section 6, which contains further comments and remarks. Section 2 contains some preliminaries. Proofs are relegated to the Appendix in order to highlight the results in the paper.
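The β-better than average rule is easy to exercise numerically. The following sketch is our own illustration (the function name and the choice of an Exp(1) score distribution are ours, not the paper's); it runs the rule on a simulated stream and records the group averages Y_k and the retention times T_k:

```python
import random

def beta_better_than_average(scores, beta=1.0):
    """Apply the beta-better than average rule to a sequence of scores.

    The first item is always selected (T_1 = 1); thereafter an observation
    is admitted iff it exceeds beta times the current group average.
    Returns (Y, T): Y[k-1] is the group average after k retentions, and
    T[k-1] is the 1-based index of the observation at which the k-th item
    was retained.
    """
    Y, T, total = [], [], 0.0
    for i, x in enumerate(scores, start=1):
        if not Y or x > beta * Y[-1]:
            total += x
            Y.append(total / (len(Y) + 1))
            T.append(i)
    return Y, T

random.seed(0)
stream = [random.expovariate(1.0) for _ in range(100_000)]
Y, T = beta_better_than_average(stream, beta=1.0)
print(len(Y), Y[-1], T[-1])
```

For β = 1 every admission strictly raises the running average, so Y is nondecreasing, and T_k ≥ k by construction.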
2. Mathematical preliminaries.
The observations are denoted by X_1, X_2, . . . and are i.i.d. random variables from a common absolutely continuous distribution F. We assume that 1 − F(x) > 0 for all x < ∞, unless stated otherwise.

The behavior of rules will be characterized by considering two quantities:
• T_k = the number of observations inspected until the kth item is retained (including that item).
• Y_k = the average score of the first k items that are retained.

The β-better than average rule is defined as follows: for fixed β (which is suppressed in the notation) and T_k defined above as the number of items observed until the kth item is selected, let T_1 = 1 and Y_1 = X_1. Define T_k and Y_k inductively by
T_{k+1} = inf{i > T_k : X_i > βY_k}, k = 1, 2, . . . ,
Y_{k+1} = [kY_k + X_{T_{k+1}}]/(k + 1), k = 1, 2, . . . .
It is clear that Y_k increases in k for β = 1. If β > 1, we consider only nonnegative X_i, to avoid the situation that if Y_k were negative then the cutoff to retain an observation would become less stringent.

2.1. Theorems on almost sure convergence.
In this subsection we present two theorems from the literature which will be useful in proving asymptotic results for the quantities of interest. First, we shall need the following result, due to Robbins and Siegmund [6], quoted as follows:
Theorem 2.1.
Let (Ω, F, P) be a probability space and F_1 ⊂ F_2 ⊂ · · · a sequence of sub-σ-algebras of F. For each n = 1, 2, . . . , let z_n, β_n, ξ_n and ζ_n be nonnegative F_n-measurable random variables such that
E(z_n | F_{n−1}) ≤ z_{n−1}(1 + β_{n−1}) + ξ_{n−1} − ζ_{n−1}.
Then lim_{n→∞} z_n exists and is finite and Σ_{n=1}^∞ ζ_n < ∞ a.s. on {Σ_{n=1}^∞ β_n < ∞, Σ_{n=1}^∞ ξ_n < ∞}.

Corollary 2.1.
Let z_n, β_n, ξ_n and ζ_n be nonnegative sequences of constants such that Σ β_n and Σ ξ_n converge, and
z_n ≤ z_{n−1}(1 + β_{n−1}) + ξ_{n−1} − ζ_{n−1}.
Then lim_{n→∞} z_n exists and is finite and Σ_{n=1}^∞ ζ_n < ∞.

Proof.
This follows trivially from Theorem 2.1. □
We also need the following theorem that appears in Feller [2], page 239.
Theorem 2.2.
Let Q_1, Q_2, . . . be independent r.v.s with E(Q_n) = 0, and let S_n = Σ_{i=1}^n Q_i. If:
(1) 0 < b_1 < b_2 < · · · → ∞ are constants and
(2) Σ_{n=1}^∞ E(Q_n²)/b_n² < ∞,
then b_n^{−1} S_n → 0 a.s. as n → ∞.

2.2. Classes of distributions.
Preater [5] showed that when F is exponential and β = 1, Y_k − log k converges almost surely to a Gumbel-distributed random variable. Krieger, Pollak and Samuel-Cahn [4] extended this result in several ways. The asymptotic behavior of other quantities, such as T_k, was obtained, values of β > 1 were considered, and other distributions F, such as the Pareto and Beta, were analyzed.

An interesting question is how the rules behave for other distributions F. This depends on the behavior of the overshoot, X − a | X > a, and its expectation f(a),
f(a) := E(X − a | X > a). (2.1)
Let x_F = sup{x : F(x) < 1}.

Definition 2.1 (See [7], Section 1.1). A distribution function F, belonging to the domain of attraction of the Gumbel extreme value distribution Λ(x) = exp{−e^{−x}}, −∞ < x < ∞, is called a Von Mises function (VM) if there exists x_0 such that for x_0 < x < x_F and some r > 0,
1 − F(x) = r exp{−∫_{x_0}^x [1/f_A(u)] du} := e^{−H(x)}, (2.2)
where f_A(u) > 0 for x_0 < u < x_F and f_A is absolutely continuous on (x_0, x_F) with derivative f′_A(u) and lim_{u↗x_F} f′_A(u) = 0.

Note that Definition 2.1 is identical to definition (1.3) in Section 1.1 of [7], except that we refer to the auxiliary function by f_A instead of f, to distinguish it from the expected overshoot function. To link the two functions, we define g(u) = f(u)/f_A(u). The representation of a VM distribution function F in (2.2) is equivalent to
1 − F(x) = r exp{−∫_{x_0}^x [g(u)/f(u)] du} := e^{−H(x)}. (2.3)
It is shown in [7] that a twice differentiable VM distribution satisfies
lim_{x↗x_F} F″(x)(1 − F(x))/[F′(x)]² = −1. (2.4)
It can be shown that lim_{u↗x_F} g(u) = 1 follows from (2.4).
Let G(x) be defined by
e^{−G(x)} = r exp{−∫_{x_0}^x [1/f(u)] du}. (2.5)
Thus,
f(x) = 1/G′(x). (2.6)
Note that
(d/dx) G^{−1}(x) = f(G^{−1}(x)). (2.7)
It is clear from this definition that H(x) = (1 + o(1))G(x) as x ↗ x_F. We shall consider only such VM for which x_F = ∞. (But see Remark 3.1.) Some of our results hold for all VM distributions, but most of the results pertain to a rich subclass. Specifically:

Definition 2.2. F is a generalized stretched exponential distribution if it is VM with H(x) = cx^α + h(x), where h″(x) exists, c > 0, α > 0,
lim_{x→∞} h(x)/x^α = 0 (2.8)
and
lim_{x→∞} h′(x)/x^{α−1} = 0. (2.9)
This class of distributions is denoted by G_α. By the change of variables y = c^{1/α}x, it suffices in the sequel to consider only c = 1.

The reason for extending the stretched exponential by adding h(x) is to include many of the classical families of distributions, such as the Normal, Gamma, Lognormal and Weibull. For example, the right-hand tail probability of the standard normal behaves like φ(x)/x by Mills' ratio, where φ(x) is the standard normal density. Hence the standard normal belongs to G_2 with h(x) = log(x).
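The Mills-ratio claim for the normal tail can be checked directly. The sketch below is our own numerical illustration (the function names are ours); it compares the exact tail 1 − Φ(x), computed via the complementary error function, with φ(x)/x:

```python
import math

def normal_tail(x: float) -> float:
    """Exact standard normal tail 1 - Phi(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mills_approx(x: float) -> float:
    """Mills-ratio approximation phi(x)/x of the standard normal tail."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return phi / x

for x in (4.0, 6.0, 8.0):
    print(x, normal_tail(x) / mills_approx(x))
```

The classical bounds φ(x)(1/x − 1/x³) < 1 − Φ(x) < φ(x)/x pin the ratio into (1 − 1/x², 1), so it tends to 1 as x → ∞, which is all that membership in G_2 requires.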
3. Average, when β = 1.

In this section we consider the behavior of Y_k, the average after k items are retained, using the better than average rule. The emphasis is on random variables that are generated from a VM distribution. In the first subsection we consider the almost sure behavior, and in the ensuing subsection results for the expectation and variance of Y_k are presented.
Let Z_k = X_{T_k} − Y_{k−1}, the "overshoot" over Y_{k−1}. The results are based on the following relationship:
Y_k = [(k − 1)Y_{k−1} + Y_{k−1} + Z_k]/k = Y_{k−1} + Z_k/k = Y_{k−1} + Z(Y_{k−1})/k, (3.1)
where Z(a) is distributed like X − a | X > a.
The results depend on the expected overshoot f(a) = E[Z(a)]. We shall use the following lemma later, which gives the expected overshoot and squared overshoot for F in G_α for large values of a. Specifically:

Lemma 3.1.
If the underlying distribution is in G_α, α > 0, then
lim_{a→∞} EZ(a)/(a^{1−α}/α) = 1 (3.2)
and
lim_{a→∞} EZ²(a)/(2a^{2(1−α)}/α²) = 1. (3.3)

The proof of the results uses l'Hôpital's rule on E(Z(a)) = ∫_0^∞ (1 − F_{Z(a)}(y)) dy for the expected overshoot and on E(Z²(a)) = 2∫_0^∞ y(1 − F_{Z(a)}(y)) dy.
This result implies that f(a) = (a^{1−α}/α)[1 + o(1)]. In some instances we need a more refined result on the rate, that is, on the o(1) term, which depends on h(x). An easy case, as shown in the proof of Corollary 3.1, is when h(x) = 0, in which case the rate of o(1) is 1/a^α.
In the more general case we want to include h(x). The point of adding h(x) is to extend our results to known distributions such as the Normal. The role that h(x) plays is that it is small relative to x^α. The following lemma provides a handle on the overshoot.

Lemma 3.2. If F ∈ G_α, then
f(a) = [1/H′(a)](1 + O(1/a^α)).
Furthermore, if h′(x)/x^{α−ε−1} goes to 0, where 0 < ε < α, we have that
f(a) = (a^{1−α}/α)[1 + o(1/a^ε)]. (3.4)

These conditions on h(x) and its derivatives are hardly restrictive, as the intent is for h(x) to be small. In particular, if h(x) is x^γ for γ < α, then all of the above conditions hold.
The details of the proofs throughout this and the remaining sections of the paper appear in the Appendix.

3.1. Results on almost sure convergence of the mean.
The main result in this subsection is that under mild conditions Y_k − G^{−1}(log k) converges almost surely to a finite random variable (Theorem 3.2). This is an extension of the result in [5] that Y_k − log k converges a.s. to a Gumbel-distributed random variable when observations are generated from an exponential distribution. Theorem 3.1, which is simpler than Theorem 3.2, considers only the G_α class of distributions. This theorem standardizes Y_k by dividing it by a function of k. Theorem 3.2, however, provides a stronger result, which for the G_α class of distributions is applicable when α ≥ 1.
Theorem 3.1.
If the underlying distribution function is in G_α, where α > 0, and
lim_{x→∞} h′(x)/x^{α−ε−1} = 0 for some ε > 0 (3.5)
and lim_{x→∞} h″(x)/x^{α−2} = 0, then
lim_{k→∞} Y_k/(log k)^{1/α} = lim_{k→∞} Y_k/G^{−1}(log k) = lim_{k→∞} Y_k/H^{−1}(log k) = 1 a.s. (3.6)

The proof considers S_k = (A_k − 1)², where A_k = Y_k/(log k)^{1/α}. Theorem 2.1 is used to show that S_k converges almost surely. We do not believe that the strengthening of condition (2.9) by (3.5) is necessary for the conclusion to hold, though we use it in the proof. We know from Theorem 3.2 that it is not needed for α ≥ 1. Theorem 3.2 shows that Y_k − G^{−1}(log k) converges a.s. to a finite random variable as k → ∞. The conditions for this result are different from those of Theorem 3.1, but distributions in G_α with α ≥ 1 satisfy them without the additional conditions on h made in Theorem 3.1.

Theorem 3.2. Let β = 1 and F be a VM distribution. Then under conditions:
(A) EZ²(a) < a^γ for some 0 < γ < ∞ and all a > a_0, and
(B) f′(a) ≤ 0 for all a ≥ a_0, for some a_0 < ∞,
Y_k − G^{−1}(log k) converges a.s. to a finite random variable as k → ∞.

The core of the proof is to show that [Y_k − G^{−1}(log k)]² converges almost surely, using Theorem 2.1. Conditions (A) and (B) are usually satisfied for F a VM distribution when G(x) increases fast enough. In particular they hold for F ∈ G_α with α ≥ 1. Condition (A) (for α > 0) follows from Lemma 3.1. Condition (B) holds since here f(x) = 1/G′(x) = (1 + o(1)){αx^{α−1}[1 + h′(x)/(αx^{α−1})]}^{−1}, so from (2.9) f(x) is eventually decreasing. The case α = 1 holds when h(x) is increasing. If F has increasing failure rate (IFR), that is, satisfies "new better than used," then condition (B) is satisfied.

Remark 3.1. If x_F < ∞, it is easy to see that lim_{k→∞}[Y_k − G^{−1}(log k)] = 0 a.s. An example F of a VM distribution with x_F < ∞ is 1 − F(x) = e^{1/x} I(x < 0).

Corollary 3.1.
Let F ∈ G_α with α ≥ 1 and h(x) = 0. Then Y_k − (log k)^{1/α} converges a.s. to a finite random variable as k → ∞.

Remark 3.2.
The conclusion of Theorem 3.2 does not hold for all F ∈ G_α, α > 0, thus not for all VM; for example, it fails for H(x) = x^{1/2}. We omit the proof.

3.2. Results on convergence of moments.
Theorem 3.3.
If conditions (A) and (B) given in Theorem 3.2 hold, then there exist constants 0 < b_1, b_2, b_3 < ∞ such that E[Y_k − G^{−1}(log k)] → b_1, E[Y_k − G^{−1}(log k)]² → b_2 and hence Var[Y_k − G^{−1}(log k)] → b_3.
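The normalization of Theorem 3.1 can be watched in simulation for the tail 1 − F(x) = e^{−x^α}. The sketch below is our own illustration (the inverse-transform overshoot sampler and the choices α = 2, k = 20,000 are ours); it iterates the recursion (3.1), sampling the overshoot Z(a) from its conditional tail e^{−[(a+z)^α − a^α]}:

```python
import math
import random

def simulate_Y(k: int, alpha: float, rng: random.Random) -> float:
    """Y_k under the better-than-average rule (beta = 1) for 1 - F(x) = e^{-x^alpha}.

    Uses Y_j = Y_{j-1} + Z(Y_{j-1})/j, with the overshoot Z(a) sampled by
    inverse transform from its conditional tail e^{-((a+z)^alpha - a^alpha)}.
    """
    y = (-math.log(rng.random())) ** (1.0 / alpha)   # first retained score, a draw from F
    for j in range(2, k + 1):
        z = (y ** alpha - math.log(rng.random())) ** (1.0 / alpha) - y
        y += z / j
    return y

rng = random.Random(3)
k = 20_000
ratio = simulate_Y(k, alpha=2.0, rng=rng) / math.log(k) ** 0.5
print(ratio)
```

By (3.6) the printed ratio drifts toward 1 as k grows; at moderate k it is merely of order 1, since the additive limit in Theorem 3.2 is still visible relative to (log k)^{1/α}.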
4. Average, when β > 1.

In this section we consider the behavior of Y_k under the more stringent condition that an observation is retained only if it exceeds β times the previous average, where β > 1. The main result is that Y_k must be standardized by an entirely different quantity, namely k^{β−1}, in order to get a.s. convergence. For F ∈ G_α this standardization is correct for all α > 0. The result depends on the following relationship:
Y_k = Y_{k−1} + (β − 1)Y_{k−1}/k + Z(βY_{k−1})/k. (4.1)
The result concerns B_k = Y_k/k^{β−1}. Let F_k denote the σ-field generated by Y_1, . . . , Y_k. It follows by dividing both sides of (4.1) by k^{β−1} that
E(B_k | F_{k−1}) = B_{k−1}(1 + O(1/k²)) + E(Z(βY_{k−1})/k^β | F_{k−1}) (4.2)
 = B_{k−1}(1 + O(1/k²)) + f(βY_{k−1})/k^β.
Hence if the expected overshoot is bounded, it follows from Theorem 2.1 that B_k converges almost surely. A more refined result appears in the next subsection, followed by remarks on special cases. The section ends with results showing that under some conditions the expected value and variance of B_k also converge.
4.1. Almost sure convergence of the mean.
We first show that B_k converges almost surely under more general conditions in the following:

Theorem 4.1.
Assume F is a VM distribution. Let B_k = Y_k/k^{β−1} and f(x) = E(X − x | X > x).
(i) If f(x) < cx/(log x)^{1+ε}, where c > 0 and ε > 0, then B_k converges a.s. to a nondegenerate positive random variable.
(ii) If B_k converges a.s., f is monotone and lim_{k→∞} E(B_k) < ∞, then for some constant x_0 > 0,
∫_{x_0}^∞ [f(x)/x²] dx < ∞. (4.3)

Remark 4.1. (a) Note that the sufficient condition (i) of Theorem 4.1 can hold also for distributions that are not VM. An example is the Geometric distribution.
(b) Equation (4.3) does not have a β in the expression. Also, under the more restrictive condition of bounded expected overshoot that was used to introduce this section (which led to an easy proof of almost sure convergence of the desired quantity), (4.3) holds.

The following is a general statement about convergence of Y_k for the stretched exponential family of distributions.

Corollary 4.1.
Let F ∈ G_α, α > 0, β > 1. Then there exists a random variable 0 < W_β < ∞ such that Y_k(β)/k^{β−1} → W_β a.s. as k → ∞.

Proof.
Since f(x)/(x^{1−α}/α) → 1, f(x) < cx^{1−α} for some constant c > 0 and all x > x_0, for a suitable choice of x_0. Hence the condition in (i) of Theorem 4.1 holds. □

There exist VM distributions for which B_k fails to converge a.s. to a finite limit. The proposition below provides a general result for when B_k does not converge to a finite limit a.s.

Proposition 4.1.
Let Ψ(a) be an increasing positive function of a such that
∫_{x_0}^∞ [Ψ(x)/x²] dx = ∞. (4.4)
Let B_k = Y_k/k^{β−1}, Z(a) ∼ X − a | X ≥ a, and define Z*(a) = Z(a)/Ψ(a). If there exist a constant a_0 and a nonnegative random variable V, not identically zero, such that for all a ≥ a_0, V is stochastically smaller than Z*(a), then B_k → ∞ a.s. as k → ∞.

Example 4.1.
Let 1 − F_X(x) = e^{−(log x)²/2}, which is easily seen to be a VM distribution. Let Ψ(a) = a/log(a). We shall show that the conditions (and hence the conclusions) of Proposition 4.1 hold for this example.

Proof of Proposition 4.1.
It is immediate that ∫_a^∞ [Ψ(x)/x²] dx = ∞. Furthermore,
1 − F_{Z*(a)}(x) = [1 − F_X(a + xa/log a)]/[1 − F_X(a)]
 = exp{−[log a + log(1 + x/log a)]²/2}/exp{−(log a)²/2}
 = exp{−(log a) log(1 + x/log a) − [log(1 + x/log a)]²/2}
 > exp{−x − x²/(2 log² a)}
 > exp{−x − x²/2}
for all a > e. Hence, if V is such that 1 − F_V(x) = e^{−x−x²/2} (x ≥ 0), then V is stochastically smaller than Z*(a) for all a > e. Furthermore,
lim_{n→∞} Σ_{k=1}^n Ψ(γk^{β−1})/k^β = lim_{n→∞} Σ_{k=1}^n γk^{β−1}/[k^β log(γk^{β−1})] = lim_{n→∞} γ Σ_{k=1}^n [k(log γ + (β − 1) log k)]^{−1} = ∞. □

Note that since here f(x) = [1 + o(1)] x/log x, it follows that one cannot take ε = 0 in Theorem 4.1(i).

4.2. Convergence of moments.
We now turn to showing that the expectation of Y_k, suitably normalized, converges to a finite limit for all random variables that belong to the stretch exponential family. We first consider EB_k and Var B_k in a general setting.

Theorem 4.2.
Under the following three conditions:
(a) Var X < ∞;
(b) f(a) is nonincreasing for a > a_0;
(c) EZ²(a) < ca for some c > 0 and a > a_0;
EB_k and Var B_k converge to a finite limit.
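For the h(x) ≡ 0 member of G_α, conditions (b) and (c) can be probed numerically: the expected overshoot is f(a) = ∫_a^∞ e^{−x^α} dx / e^{−a^α}, which Lemma 3.1 predicts to behave like a^{1−α}/α. The sketch below is our own check (the trapezoid quadrature, truncation width and evaluation points are illustrative choices of ours):

```python
import math

def overshoot_mean(a: float, alpha: float, width: float = 10.0, n: int = 100_000) -> float:
    """Trapezoid-rule value of f(a) = int_a^inf e^{-x^alpha} dx / e^{-a^alpha}.

    The tail integral is truncated at a + width, which is ample here because
    the integrand e^{a^alpha - x^alpha} decays extremely fast.
    """
    h = width / n
    vals = [math.exp(a ** alpha - (a + i * h) ** alpha) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

alpha = 2.0
f5, f10 = overshoot_mean(5.0, alpha), overshoot_mean(10.0, alpha)
print(f5, 5.0 ** (1 - alpha) / alpha)    # computed f(a) vs the a^{1-alpha}/alpha prediction
print(f10, 10.0 ** (1 - alpha) / alpha)
```

For α = 2 the computed values sit within a few percent of a^{1−α}/α and decrease in a, in line with condition (b) for α > 1.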
Remark 4.2.
Condition (a) always holds for nonnegative X with F a VM distribution (see Exercise 1.1.1(a) of [7]). Lemma 3.1 implies that (c) holds for any F ∈ G_α with α ≥ 1/2. Condition (b) holds for all F ∈ G_α, α > 1, as well as for X ∼ Exp(1).

The above theorem does not apply for F ∈ G_α with α ≤ 1. Nevertheless, EB_k converges in this case, as shown in the following theorem.

Theorem 4.3.
Let B_k = Y_k/k^{β−1}. If F ∈ G_α, α > 0 and β > 1, then EB_k converges to a finite limit.

Proof.
By Theorem 4.2 we need only consider the case α ≤ 1. From Lemma 3.1 it follows that for some c and k large enough,
f(βY_{k−1}) < cY_{k−1}^{1−α} = c[(k − 1)^{β−1}B_{k−1}]^{1−α} = c(k − 1)^{(β−1)(1−α)}B_{k−1}^{1−α} ≤ c(k − 1)^{(β−1)(1−α)}[1 + B_{k−1}].
Substituting this into (4.2) yields
E(B_k | F_{k−1}) ≤ B_{k−1}[1 + O(1/k^{min(2, β−(β−1)(1−α))})] + O(1/k^{β−(β−1)(1−α)}). (4.5)
Taking expectations on both sides of (4.5) and using Corollary 2.1 yields the result. □
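The almost sure stabilization of B_k = Y_k/k^{β−1} (Corollary 4.1, Theorem 4.3) is visible in simulation. The sketch below is our own illustration (the sampler and the choices α = 1, β = 1.5, k = 5000 are ours); only accepted scores matter for Y_k, and each accepted score is the cutoff βY_{k−1} plus an overshoot drawn from its conditional tail:

```python
import math
import random

def simulate_B(k: int, alpha: float, beta: float, rng: random.Random):
    """Track B_j = Y_j / j^(beta - 1) for the tail 1 - F(x) = e^{-x^alpha}, beta > 1.

    An accepted score equals a + Z(a) with a = beta * Y_{j-1}; the overshoot
    Z(a) is sampled by inverse transform from its tail e^{-((a+z)^alpha - a^alpha)}.
    """
    total = (-math.log(rng.random())) ** (1.0 / alpha)   # first retained score
    B = [total]
    for j in range(2, k + 1):
        a = beta * total / (j - 1)                       # cutoff beta * Y_{j-1}
        total += (a ** alpha - math.log(rng.random())) ** (1.0 / alpha)  # a + Z(a)
        B.append((total / j) / j ** (beta - 1.0))
    return B

rng = random.Random(7)
B = simulate_B(k=5_000, alpha=1.0, beta=1.5, rng=rng)
print(B[-1])
```

For α = 1 the overshoot is exactly Exp(1) by lack of memory, so this run also matches the exponential case treated in [4]; the trajectory settles down, with only small changes over the last fifth of the run.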
5. Time until k items are kept.

5.1. Discussion of the problem.
In this section we turn to the second quantity of interest, T_k, the number of items that are observed until k items are retained. Unfortunately, it is generally impossible to normalize T_k by a function of k and achieve almost sure convergence to a nondegenerate random variable. Instead we consider the following quantity:
T*_k = T_k / Σ_{j=1}^{k−1} [1 − F(βY_j)]^{−1}, (5.1)
which depends on the averages {Y_j}; its expectation tends to 1. The results are obtained for the G_α, α > 0, distributions and β ≥ 1. When β is relatively small, 1 ≤ β < 1 + 1/(2α), the convergence is almost sure to 1. When β is moderate in size, 1 + 1/(2α) ≤ β < 1 + 1/α, the convergence is to 1, in probability. Finally, if β is large, β ≥ 1 + 1/α, the convergence is in distribution, to an exponential or a sum of conditionally independent exponential random variables with means summing up to 1.

5.2. Almost sure convergence, when β = 1.

Theorem 5.1.
Let β = 1 and X_i ∼ F, where F is G_α, α > 0. Then
T*_k = T_k / Σ_{j=1}^{k−1} [1 − F(Y_j)]^{−1} → 1 almost surely.

The proof uses Theorem 2.2 by conditioning on the responses {Y_k}, letting P_j = 1 − F(Y_{j−1}), b_j = Σ_{i=1}^j P_i^{−1} and Q_i = T_i − T_{i−1} − P_i^{−1}, with T_0 = 0.
Though Theorem 5.1 gives no explicit order of magnitude of the convergence of T_k in terms of k, we get an idea of this magnitude in the following:

Corollary 5.1.
For any δ > 0 and F ∈ G_α, α > 1, β = 1,
lim T_k/k^{2−δ} = ∞ and lim T_k/k^{2+δ} = 0 a.s.
For the exponential distribution, T_k/k² converges a.s. to a limit, as shown in [4].

5.3. Asymptotic results when β > 1.

The focus is on T*_k, the number of observations that are observed until k items are retained, suitably normalized as defined in (5.1). For the sake of clarity, we consider in the continuation only F ∈ G_α, α > 0, with h(x) ≡ 0, that is, H(x) = x^α.

Theorem 5.2.
Let X ∼ F where 1 − F(x) = e^{−x^α} and α > 0. Then as k → ∞:
(i) T*_k → 1 a.s. for 1 < β < 1 + 1/(2α),
(ii) T*_k → 1 in probability for 1 + 1/(2α) ≤ β < 1 + 1/α,
(iii) T*_k → Exp(1) in distribution, and T_k e^{−(βY_{k−1})^α} → Exp(1) in distribution, for β > 1 + 1/α.

The result for β = 1 + 1/α is of a different nature, and hence is treated separately in Theorem 5.3. To prove parts (ii) and (iii) of Theorem 5.2, we compute the limiting generating function of T_k, suitably standardized, and are able to recognize the distribution for which this limit is the generating function. The results then follow from the Continuity theorem. This line of reasoning is also used in proving Theorem 5.3. Note that for U ∼ Geo(p),
Ee^{−tU} = 1/[1 + (1 − e^{−t})/(pe^{−t})].
We ignore the first observation, which adds one to T_k (this will have no effect on the limiting distribution). Hence the resulting random part of T_k (which we refer to as T̃_k) is, conditionally on {Y_j}_{j=1}^∞, the sum of independent geometric random variables with p_j = e^{−(βY_{j−1})^α}. We have, conditionally on {Y_j},
E(e^{−tγ(k)T̃_k}) = Π_{j=2}^k [1 + (1 − e^{−tγ(k)})/e^{−(βY_{j−1})^α − tγ(k)}]^{−1},
where the sequence γ(k) is positive, will be defined as a function of the given {Y_k} according to the need in the proof for each particular instance, but always tends to 0. Thus
log Ee^{−tγ(k)T̃_k} = −Σ_{j=2}^k log(1 + [(1 − e^{−tγ(k)})/(tγ(k))] tγ(k) e^{tγ(k)+(βY_{j−1})^α}) (5.2)
 = −Σ_{j=2}^k log[1 + (1 + o_k(1)) tγ(k) e^{(βY_{j−1})^α}].
For (ii) we let γ(k) = 1/Σ_{j=1}^{k−1} e^{(βY_j)^α} and for (iii) we let γ(k) = e^{−(βY_{k−1})^α}.
We now turn to the case where β = 1 + 1/α, so that β − 1 = 1/α. This is the only case where conditioning on the sequence {Y_k} plays a role in the limiting distribution obtained. We know from Theorem 4.1 that there exists a random variable W, 0 < W < ∞, such that Y_k/k^{1/α} → W a.s. as k → ∞. Our result will be stated in terms of the value of W.

Theorem 5.3.
Let X ∼ F where 1 − F(x) = e^{−x^α}, α > 0 and β = 1 + 1/α. Let W = lim Y_k/k^{1/α}. Then
T*_k → Σ_{j=1}^∞ R_j in distribution as k → ∞,
where, conditionally on W = w, the R_j are independently, exponentially distributed with means µ_j, where
µ_j = [exp((βw)^α) − 1] exp[−j(βw)^α].
Note that the µ_j sum to 1.
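The geometric waiting-time structure used in the proofs lends itself to direct simulation of T*_k without generating the full observation stream. The sketch below is our own illustration (the choices α = 2, β = 1.1, k = 400 are ours and place us in the a.s. regime of Theorem 5.2(i)); waiting times between retentions are Geometric(1 − F(βY_{j−1})), and accepted scores are the cutoff plus an inverse-transform overshoot:

```python
import math
import random

def simulate_T_star(k: int, alpha: float, beta: float, rng: random.Random) -> float:
    """Simulate T*_k of (5.1) for the tail 1 - F(x) = e^{-x^alpha}.

    Conditionally on the averages, the gaps between retention times are
    Geometric(p_j) with p_j = 1 - F(beta * Y_{j-1}); an accepted score is the
    cutoff plus an overshoot sampled by inverse transform.
    """
    total = (-math.log(rng.random())) ** (1.0 / alpha)   # first retained score, T_1 = 1
    t, norm = 1, 0.0
    for j in range(2, k + 1):
        a = beta * total / (j - 1)                       # cutoff beta * Y_{j-1}
        p = math.exp(-a ** alpha)                        # acceptance probability 1 - F(a)
        norm += 1.0 / p                                  # denominator of T*_k
        t += 1 + int(math.log(rng.random()) / math.log1p(-p))   # Geometric(p) gap
        total += (a ** alpha - math.log(rng.random())) ** (1.0 / alpha)  # a + Z(a)
    return t / norm

rng = random.Random(1)
vals = [simulate_T_star(k=400, alpha=2.0, beta=1.1, rng=rng) for _ in range(50)]
print(sum(vals) / len(vals))
```

Averaged over independent runs, the printed value should be near 1, in line with Theorem 5.2(i); for β above 1 + 1/α, single runs would instead scatter like Exp(1), per part (iii).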
6. Concluding remarks.
The present paper extends the results in [4], where the Exponential, Beta and Pareto distributions are considered in detail, to other distributions, including the Normal, Gamma and Weibull. The results on the special distributions considered in [4] are "invertible" in the sense that rates of convergence for Y_k and T_k imply rates of convergence for the number of items that are kept and the average of the items kept after n items are observed. The results obtained for the distributions considered here are in general not invertible in this way.

Preater in [5] considered the behavior of the average of the first k items that are kept, Y_k, when the distribution generating the observations is exponential and β = 1 in the β-better than average rule. He observed that Y_k − log k converges a.s. and in L² to a Gumbel-distributed random variable. The behavior of this quantity for β ≥ 1 for the exponential distribution is considered in [4]: for β = 1, Y_k − log k converges a.s.; when β > 1, Y_k/k^{β−1} converges a.s. In addition, the rate k^{β−1} when β > 1 does not depend on the underlying distribution, whereas the normalization of Y_k when β = 1 depends on the distribution.

There are two interesting mathematical observations. First, it is not surprising that there should be some relationship between the domain of attraction to which the extremal distribution of F belongs and the limiting distribution of Y_k, since the Y_k process will, on the average, select larger and larger items. Preater in [5] shows that Y_k − log k and max{X_1, . . . , X_k} − log k have the exact same limiting Gumbel distribution when the observations are i.i.d. from an exponential distribution (though Y_k converges a.s. and in L² while the maximum converges only in distribution). Will the limiting distributions of Y_k and M_k = max{X_1, . . . , X_k} always agree, or at least have the same rate of convergence? From the general theory of extreme values it follows that
[1/f(H^{−1}(log k))](M_k − H^{−1}(log k)) → U = Gumbel in distribution as k → ∞. (6.1)
This should be compared with our result for β = 1 (under the appropriate conditions of Theorem 3.2),
Y_k − G^{−1}(log k) → some finite random variable a.s. as k → ∞. (6.2)
The "normalization" is the same in (6.1) and (6.2) if and only if f(x) is constant, that is, if and only if X is exponential.

The second interesting mathematical observation is that for the Beta and Pareto distributions, discussed in [4], we get the same kind of a.s. convergence for T_k, after normalization (depending on β), for all β ≥ 1. In the families of distributions considered in the present paper, different kinds of asymptotic convergence hold for different values of β. Specifically, when β is relatively small, the normalized quantity converges almost surely. When β is in the middle range, the convergence is in probability. For large values of β the convergence is in distribution.
APPENDIX
A.1. Proofs for Section 3.
Proof of Lemma 3.1.
Consider (3.3). For any nonnegative random variable Q, EQ² = 2∫_0^∞ y(1 − F_Q(y)) dy. Thus
EZ²(a) = 2∫_a^∞ (x − a)e^{−H(x)} dx / e^{−H(a)}.
So
lim_{a→∞} EZ²(a)/(2a^δ) = lim_{a→∞} ∫_a^∞ (x − a)e^{−(x^α+h(x))} dx / [a^δ e^{−(a^α+h(a))}]
(l'Hôpital) = lim_{a→∞} −∫_a^∞ e^{−(x^α+h(x))} dx / (e^{−(a^α+h(a))}{δa^{δ−1} − [αa^{α−1} + h′(a)]a^δ}) (A.1)
 = lim_{a→∞} ∫_a^∞ e^{−(x^α+h(x))} dx / (e^{−(a^α+h(a))} αa^{α+δ−1}[1 + h′(a)/(αa^{α−1})])
 = lim_{a→∞} ∫_a^∞ e^{−(x^α+h(x))} dx / (e^{−(a^α+h(a))} αa^{α+δ−1})
by (2.9). Using l'Hôpital's rule once more, we get that the value in (A.1) equals
lim_{a→∞} −e^{−(a^α+h(a))} / (αe^{−(a^α+h(a))}{(α + δ − 1)a^{α+δ−2} − [αa^{α−1} + h′(a)]a^{α+δ−1}}) = lim_{a→∞} (1/α²) a^{2−2α−δ}.
Thus if we take δ = 2(1 − α), the above limit is 1/α² and (3.3) follows.
The proof for E(Z(a)) = ∫_a^∞ e^{−H(x)} dx / e^{−H(a)} follows in a similar manner. □

Proof of Lemma 3.2.
Through integration by parts,
f(a) = ∫_a^∞ e^{−H(x)} dx / e^{−H(a)} = ∫_a^∞ [e^{−H(x)}H′(x)/H′(x)] dx / e^{−H(a)}
 = [−e^{−H(x)}/H′(x)]_a^∞ / e^{−H(a)} − ∫_a^∞ e^{−H(x)}H″(x)/[H′(x)]² dx / e^{−H(a)}.
Note that H″(x)/[H′(x)]² tends to 0 for a VM distribution. Now, to get the rate, consider H(x) = x^α + h(x), where lim_{x→∞} h′(x)/x^{α−1} = 0 by (2.9), and assume lim_{x→∞} h″(x)/x^{α−2} = 0. This implies that H″(x)/[H′(x)]² = O(1/x^α). Since the first term is 1/H′(a), using l'Hôpital's rule on the second term yields
f(a) = [1/H′(a)](1 + O(1/a^α)).
Finally, to get the rate at which f(a)/(a^{1−α}/α) goes to 1, we need the rate at which h′(x)/x^{α−1} goes to 0. If we assume that h′(x)/x^{α−ε−1} goes to 0, where 0 < ε < α, we have (3.4). □

Proof of Theorem 3.1.
Let $A_k = Y_k/(\log k)^{1/\alpha}$ and $S_k = (A_k-1)^2$. Note that $\log(k-1)/\log k = 1 - 1/(k\log k) + O(1/k^2)$, thus
$$\left(\frac{\log(k-1)}{\log k}\right)^{1/\alpha} = 1 - \frac{1}{\alpha k\log k} + O\!\left(\frac{1}{k^2}\right).$$
Hence,
$$S_k = \left(A_{k-1}\left(\frac{\log(k-1)}{\log k}\right)^{1/\alpha} + \frac{Z_k}{k(\log k)^{1/\alpha}} - 1\right)^2 = \left[(A_{k-1}-1) - A_{k-1}\left(\frac{1}{\alpha k\log k}+O\!\left(\frac{1}{k^2}\right)\right) + \frac{Z_k}{k(\log k)^{1/\alpha}}\right]^2$$
$$= (A_{k-1}-1)^2 + A_{k-1}^2\left(\frac{1}{\alpha^2k^2(\log k)^2}+O\!\left(\frac{1}{k^3}\right)\right) + \frac{Z_k^2}{k^2(\log k)^{2/\alpha}} + 2(A_{k-1}-1)\left[\frac{Z_k}{k(\log k)^{1/\alpha}} - A_{k-1}\left(\frac{1}{\alpha k\log k}+O\!\left(\frac{1}{k^2}\right)\right)\right] - \frac{2Z_k}{k(\log k)^{1/\alpha}}A_{k-1}\left(\frac{1}{\alpha k\log k}+O\!\left(\frac{1}{k^2}\right)\right).$$
Taking conditional expectations on both sides, using (2.1), we therefore get
$$E(S_k\mid\mathcal F_{k-1}) \le S_{k-1} + 2(S_{k-1}+1)\left(\frac{1}{\alpha^2k^2(\log k)^2}+O\!\left(\frac{1}{k^3}\right)\right) + \frac{E(Z^2(Y_{k-1})\mid\mathcal F_{k-1})}{k^2(\log k)^{2/\alpha}} + 2(A_{k-1}-1)\left[\frac{f(Y_{k-1})}{k(\log k)^{1/\alpha}} - A_{k-1}\left(\frac{1}{\alpha k\log k}+O\!\left(\frac{1}{k^2}\right)\right)\right] \qquad (A.2)$$
since $A_{k-1}^2 \le 2(S_{k-1}+1)$. The first two terms in (A.2) therefore cause no problem in the application of Theorem 2.1 to $S_k$. By Lemma 3.1, for all $k$ sufficiently large,
$$\frac{E(Z^2(Y_{k-1})\mid\mathcal F_{k-1})}{k^2(\log k)^{2/\alpha}} < \frac{2(1+\varepsilon)}{\alpha^2Y_{k-1}^{2(\alpha-1)}k^2(\log k)^{2/\alpha}} = \frac{2(1+\varepsilon)}{\alpha^2A_{k-1}^{2(\alpha-1)}(\log(k-1))^{2(\alpha-1)/\alpha}k^2(\log k)^{2/\alpha}} < \frac{2(1+\varepsilon)^2}{\alpha^2A_{k-1}^{2(\alpha-1)}k^2(\log k)^2} < \frac{2(1+\varepsilon)^2}{\alpha^2}\,\frac{A_{k-1}^2+1}{k^2(\log k)^2} < \frac{2(1+\varepsilon)^2}{\alpha^2}\left[\frac{2S_{k-1}}{k^2(\log k)^2} + \frac{3}{k^2(\log k)^2}\right],$$
so the second term in the last expression is summable, and, again factoring out $S_{k-1}$, the first term is also summable.

It remains to deal with the last term in (A.2). From (3.4),
$$f(Y_{k-1}) = \frac{Y_{k-1}^{1-\alpha}[1+o(1/Y_{k-1}^{\varepsilon})]}{\alpha}.$$
Thus the first term in the square brackets in (A.2) satisfies
$$\frac{f(Y_{k-1})}{k(\log k)^{1/\alpha}} = \frac{Y_{k-1}^{1-\alpha}[1+o(1/Y_{k-1}^{\varepsilon})]}{\alpha k(\log k)^{1/\alpha}} = \frac{A_{k-1}^{1-\alpha}(\log(k-1))^{(1-\alpha)/\alpha}[1+o(1/(A_{k-1}(\log(k-1))^{1/\alpha})^{\varepsilon})]}{\alpha k(\log k)^{1/\alpha}} = \frac{A_{k-1}^{1-\alpha}[1+o(A_{k-1}^{-\varepsilon}/(\log k)^{\omega})]}{\alpha k\log k},$$
where $\omega = \varepsilon/\alpha$.

The last term in (A.2) can therefore be rewritten as
$$-\frac{2(A_{k-1}-1)A_{k-1}[1 - A_{k-1}^{-\alpha} + O(\log k/k) + A_{k-1}^{-\alpha}\,o(A_{k-1}^{-\varepsilon}/(\log k)^{\omega})]}{\alpha k\log k}. \qquad (A.3)$$
We want to study when (A.3) is positive for large $k$. This depends on the term in brackets, which to simplify notation we denote by $R(x)$, where $x = A_{k-1}$ and the dependence of $R(x)$ on $k$ is implicit. Note that for $k$ sufficiently large, $O(\log k/k) < \delta_k \equiv 1/(\log k)^{\omega}$. Also note that the coefficient $\nu_k$ bounding the last term in the brackets satisfies $\nu_k < Y_{k-1}^{-\varepsilon}\delta_k \to 0$ as $k\to\infty$, and that $\nu_k < x^{-\varepsilon}\delta_k$ if $A_{k-1} > x > 0$. Hence when $k$ is sufficiently large, $R_1(x) \le R(x) \le R_2(x)$, where
$$R_1(x) = 1 - x^{-\alpha} - \delta_k - \nu_kx^{-\alpha} \qquad (A.4)$$
and
$$R_2(x) = 1 - x^{-\alpha} + \delta_k + \nu_kx^{-\alpha}. \qquad (A.5)$$
The aim is to show that (A.3) is positive only when $1 - c_1\delta_k \le x \le 1 + c_2\delta_k$ for suitably chosen constants $0 < c_1, c_2 < \infty$. We consider two cases:

(i) Assume $x > 1$. Then (A.3) is positive for the values of $x$ such that $R(x) < 0$. Since $R_1(x) \le R(x)$, the values of $x$ such that $R(x) < 0$, or, equivalently, the values of $x$ such that $x^{\alpha}R(x) < 0$, are contained in the set of $x$ such that $R_1(x) < 0$. It suffices to consider
$$x^{\alpha} - 1 - \delta_kx^{\alpha} - \nu_k < 0. \qquad (A.6)$$
The set of $x$ such that (A.6) holds is equivalent to the set of $x$ such that
$$x < \left(\frac{1+\nu_k}{1-\delta_k}\right)^{1/\alpha} < 1 + c_2\delta_k$$
for $k$ large, for a suitably chosen constant $0 < c_2 < \infty$.

(ii) Assume $x < 1$. Then (A.3) is positive for the values of $x$ such that $R(x) > 0$. Since $R(x) \le R_2(x)$, the values of $x$ such that $R(x) > 0$, or, equivalently, the values of $x$ such that $x^{\alpha}R(x) > 0$, are contained in the set of $x$ such that $R_2(x) > 0$. Hence, we want to consider when
$$x^{\alpha} - 1 + \delta_kx^{\alpha} + \nu_k > 0. \qquad (A.7)$$
Since $\delta_k$ and $\nu_k$ are arbitrarily small for $k$ sufficiently large, (A.7) is equivalent to
$$x > \left(\frac{1-\nu_k}{1+\delta_k}\right)^{1/\alpha} > 1 - c_1\delta_k \qquad (A.8)$$
for $k$ sufficiently large, for a suitably chosen constant $0 < c_1 < \infty$.

The above analysis shows that (A.3) can be bounded from above by zero when $A_{k-1}$ is outside the interval $1 \pm c\delta_k$. When it is inside, (A.3) is bounded by $O(1/[k(\log k)^{1+\omega}])$. Hence the positive part of (A.3) is summable. Thus $S_k$ converges a.s. by Theorem 2.1. If $S_k$ converged to a value different from 0 this would lead to a contradiction, as the sum of the terms in (A.3) would go to minus infinity, while $S_k$ is nonnegative. Hence $A_k$ tends to 1 a.s.

Note that when $H(x) = x^{\alpha}+h(x)$ and (2.8) holds, then necessarily $H^{-1}(x) = x^{1/\alpha} + h^*(x)$, where $h^*(x)/x^{1/\alpha} \to 0$ as $x\to\infty$. Thus $H^{-1}(\log k)/(\log k)^{1/\alpha} \to 1$ as $k\to\infty$. Since $G^{-1}(x) = H^{-1}(x)[1+o(1)]$, the middle term in (3.6) also converges a.s. to 1. □
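Theorem 3.1 can be illustrated numerically. The following is a minimal Monte Carlo sketch, not from the paper, of the better-than-average rule ($\beta = 1$) for the tail $1-F(x) = e^{-x^{\alpha}}$, $x \ge 0$. To avoid scanning the many rejected items, only accepted items are simulated: given the current average $a$, the admitted value is $a + Z$, where $P(Z > z) = e^{-((a+z)^{\alpha}-a^{\alpha})}$ can be sampled by inversion.

```python
import math, random

def better_than_average(k_target, alpha, seed=0):
    # Better-than-average rule for 1 - F(x) = exp(-x^alpha), x >= 0.
    # Given threshold a, the overshoot satisfies P(Z > z) = exp(-((a+z)^alpha - a^alpha)),
    # so Z = (a^alpha - log U)^(1/alpha) - a with U uniform on (0, 1].
    rng = random.Random(seed)
    total = (-math.log(1.0 - rng.random())) ** (1.0 / alpha)  # first item, unconditional
    k = 1
    while k < k_target:
        a = total / k                                           # current average Y_k
        z = (a ** alpha - math.log(1.0 - rng.random())) ** (1.0 / alpha) - a
        total += a + z                                          # admit a + z
        k += 1
    return total / k

alpha, k = 2.0, 100000
print(better_than_average(k, alpha) / math.log(k) ** (1.0 / alpha))  # Theorem 3.1: near 1
```

Convergence is at a $\log k$ scale, so the ratio is only moderately close to 1 even for $k = 10^5$.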
Proof of Theorem 3.2.
The first step in the proof is to show that $(Y_k - G^{-1}(\log k))^2$ converges a.s. to a finite random variable as $k\to\infty$. Since $Y_k\to\infty$, there will be a (possibly random) $k_0$ such that for all $k > k_0$ everything written below holds. Consider $k > k_0$ only. Let $c_k = G^{-1}(\log k)$. Then, by (2.7) and the boundedness of $f$,
$$c_k - c_{k-1} = (\log k - \log(k-1))[G^{-1}(u_k)]' = -\log\!\left(1-\frac{1}{k}\right)f(G^{-1}(u_k)) = \frac{f(G^{-1}(u_k))}{k} + O\!\left(\frac{1}{k^2}\right), \qquad (A.9)$$
where the $O(1/k^2)$ term is positive and
$$\log(k-1) \le u_k \le \log k. \qquad (A.10)$$
Note that the last equality in (A.9) follows since $f$ is bounded, by condition (B). Now write
$$(Y_k-c_k)^2 = \left[(Y_{k-1}-c_{k-1}) + \frac{Z(Y_{k-1})}{k} + (c_{k-1}-c_k)\right]^2 = (Y_{k-1}-c_{k-1})^2 + \frac{Z^2(Y_{k-1})}{k^2} + (c_{k-1}-c_k)^2 + \frac{2Z(Y_{k-1})}{k}(c_{k-1}-c_k) + 2(Y_{k-1}-c_{k-1})\left[\frac{Z(Y_{k-1})}{k} + (c_{k-1}-c_k)\right].$$
Taking conditional expectation, conditional on $\mathcal F_{k-1}$, yields
$$E[(Y_k-c_k)^2\mid\mathcal F_{k-1}] = \underbrace{(Y_{k-1}-c_{k-1})^2}_{\rm (i)} + \underbrace{\frac{E[Z^2(Y_{k-1})\mid\mathcal F_{k-1}]}{k^2}}_{\rm (ii)} + \underbrace{(c_{k-1}-c_k)^2}_{\rm (iii)} + \underbrace{\frac{2f(Y_{k-1})}{k}(c_{k-1}-c_k)}_{\rm (iv)} + \underbrace{2(Y_{k-1}-c_{k-1})\left[\frac{f(Y_{k-1})}{k}+(c_{k-1}-c_k)\right]}_{\rm (v)}. \qquad (A.11)$$
We shall show that the conditions for Theorem 2.1 hold. We shall examine each term in (A.11) separately. We first show that for any $\omega > 0$,
$$Y_k/k^{\omega} \to 0 \qquad\text{as } k\to\infty \text{ a.s.} \qquad (A.12)$$
Let $W_k(\omega) = Y_k/k^{\omega}$. Then clearly $W_k(\omega) > 0$ and
$$E[W_k(\omega)\mid\mathcal F_{k-1}] = \left(\frac{k-1}{k}\right)^{\omega}W_{k-1}(\omega) + \frac{f(Y_{k-1})}{k^{\omega+1}} < W_{k-1}(\omega) + \frac{B}{k^{\omega+1}},$$
since $f$ is bounded (where we have denoted its bound by $B$). It follows that $W_k(\omega)$ converges a.s. to a finite limit, $L(\omega) \ge 0$. Then also $W_k(\omega/2) \to L(\omega/2)$ a.s. But $W_k(\omega) = W_k(\omega/2)/k^{\omega/2}$, thus the limit must be 0 for all $\omega$.

Now consider term (ii) of (A.11). By condition (A) and (A.12), for all $k$ sufficiently large,
$$\frac{E[Z^2(Y_{k-1})\mid\mathcal F_{k-1}]}{k^2} < \frac{Y_{k-1}^{\gamma}}{k^2} < \frac{\varepsilon k^{\omega\gamma}}{k^2} \qquad\text{a.s.} \qquad (A.13)$$
Choose $\omega < 1/\gamma$ and write $1-\omega\gamma = \delta$. The rightmost expression in (A.13) is then $\varepsilon/k^{1+\delta}$, which clearly is summable.

Term (iii) is summable by (A.9) and the boundedness of $f$.

Term (iv) is negative, and hence causes no problem.

Term (v): note first that by (A.9),
$$\frac{f(Y_{k-1})}{k} + (c_{k-1}-c_k) = \frac{f(Y_{k-1}) - f(G^{-1}(u_k))}{k} + O\!\left(\frac{1}{k^2}\right) = \frac{(Y_{k-1}-G^{-1}(u_k))f'(d_k)}{k} + O\!\left(\frac{1}{k^2}\right), \qquad (A.14)$$
where $d_k$ is a value between $Y_{k-1}$ and $G^{-1}(u_k)$. Since $G^{-1}$ is increasing, it follows from (A.10) that
$$c_{k-1} \le G^{-1}(u_k) \le c_k. \qquad (A.15)$$
Consider two cases:

(a) $Y_{k-1}-c_{k-1} \le 0$. Then by (A.15) also $Y_{k-1}-G^{-1}(u_k) \le 0$, and the $O(1/k^2)$ term is positive, so (v) causes no problem.

(b) $Y_{k-1}-c_{k-1} > 0$. If also $Y_{k-1}-G^{-1}(u_k) \ge 0$, the previous argument goes through, except that we still must show that $2(Y_{k-1}-c_{k-1})I(Y_{k-1}-c_{k-1} > 0)O(1/k^2)$ is summable. Now write $(Y_{k-1}-c_{k-1})I(Y_{k-1}-c_{k-1} > 0) < (Y_{k-1}-c_{k-1})^2 + 1$. Thus
$$(Y_{k-1}-c_{k-1})I(Y_{k-1}-c_{k-1} > 0)/k^2 \le \frac{(Y_{k-1}-c_{k-1})^2}{k^2} + \frac{1}{k^2}. \qquad (A.16)$$
The first term on the right-hand side of (A.16) can be combined with (i) in (A.11), and the second is clearly summable.

Now suppose $Y_{k-1}-G^{-1}(u_k) < 0 < Y_{k-1}-c_{k-1}$. Then
$$c_{k-1} < Y_{k-1} < G^{-1}(u_k) \le c_k. \qquad (A.17)$$
Since both $|Y_{k-1}-c_{k-1}|$ and $|Y_{k-1}-G^{-1}(u_k)|$ are less than $c_k-c_{k-1}$, it follows from (A.9) that (v) is summable.

It follows that in all cases we can write $E[(Y_k-c_k)^2\mid\mathcal F_{k-1}] \le (Y_{k-1}-c_{k-1})^2(1+B_{k-1}) + D_{k-1} - V_{k-1}$, where $B_k$, $D_k$ and $V_k$ are nonnegative random variables, and $B_k$ and $D_k$ are summable. Thus by Theorem 2.1,
$$(Y_k-c_k)^2 \xrightarrow{k\to\infty} W \qquad\text{a.s.}, \qquad (A.18)$$
where $0 \le W < \infty$ is a random variable. Thus $|Y_k-c_k| \to \sqrt W$ a.s. as $k\to\infty$.

It remains to show that when $W \ne 0$, $Y_k-c_k$ cannot jump between $\sqrt W$ and $-\sqrt W$ an infinite number of times. It will then follow that the limit exists and is either $\sqrt W$ or $-\sqrt W$. Recall that $Y_k - Y_{k-1} = Z_k/k$, and that by (A.9), $0 < c_k-c_{k-1} < \gamma_0/k$ for some $\gamma_0 > 0$. Also,
$$P\{Y_k-Y_{k-1} > \varepsilon\} = P\left\{\frac{Z_k}{k} > \varepsilon\right\} = P\left\{\frac{Z_k^2}{k^2} > \varepsilon^2\right\} \le \frac{C_{\varepsilon}}{k^{1+\delta}}.$$
Thus by the Borel–Cantelli lemma, $P\{Y_k-Y_{k-1} > \varepsilon\text{ infinitely often}\} = 0$. This implies $P\{|(Y_k-c_k)-(Y_{k-1}-c_{k-1})| > \varepsilon\text{ infinitely often}\} = 0$. Thus, if $\sqrt W > \varepsilon$, $Y_k-c_k$ cannot jump between $\sqrt W$ and $-\sqrt W$ an infinite number of times; that is, $Y_k-c_k$ will converge a.s. to $\sqrt W$ or $-\sqrt W$. Since for $W > 0$ one can choose $\varepsilon$ with $\sqrt W - \varepsilon > 0$, it follows that $Y_k-c_k$ converges. Clearly on the set where $\{W = 0\}$ the statement $(Y_k-c_k)^2 \to 0$ implies $Y_k-c_k \to 0$. □

Proof of Corollary 3.1.
The expected overshoot given $X > a$ is
$$f(a) = \frac{\int_a^\infty e^{-x^{\alpha}}\,dx}{e^{-a^{\alpha}}} = \frac{1}{\alpha}\,\frac{\int_{a^{\alpha}}^\infty y^{1/\alpha-1}e^{-y}\,dy}{e^{-a^{\alpha}}}.$$
The right-hand side follows by the change of variables $y = x^{\alpha}$. But in Abramowitz and Stegun [1], page 263,
$$\frac{\int_x^\infty t^{\nu-1}e^{-t}\,dt}{e^{-x}} = x^{\nu-1}\left[1 + \frac{\nu-1}{x} + O\!\left(\frac{1}{x^2}\right)\right] \qquad\text{as } x\to\infty.$$
This implies that
$$f(a) = \frac{1}{\alpha}a^{1-\alpha}\left[1 + \frac{1/\alpha-1}{a^{\alpha}} + O\!\left(\frac{1}{a^{2\alpha}}\right)\right] \qquad\text{as } a\to\infty.$$
Equation (2.6) implies
$$G'(a) = \frac{1}{f(a)} = \alpha a^{\alpha-1}\left[1 - \frac{1/\alpha-1}{a^{\alpha}} + O\!\left(\frac{1}{a^{2\alpha}}\right)\right] \qquad\text{as } a\to\infty.$$
Integrating both sides results in $G(a) = a^{\alpha} + (\alpha-1)\log a + O(1)$ for large $a$, since the remainder term $O(a^{-(\alpha+1)})$ has finite integral.

Now for any $\varepsilon > 0$, if $a$ is sufficiently large, then $G(a^{1/\alpha}+\varepsilon) = (a^{1/\alpha}+\varepsilon)^{\alpha} + (\alpha-1)\log(a^{1/\alpha}+\varepsilon) + O(1)$. But $(a^{1/\alpha}+\varepsilon)^{\alpha} = a\{1+\varepsilon a^{-1/\alpha}\}^{\alpha} = a + \alpha\varepsilon a^{1-1/\alpha} + \text{smaller order terms}$. Hence if $\alpha > 1$ and $a$ is sufficiently large, then $G(a^{1/\alpha}+\varepsilon) > a$. A similar argument shows that if $\alpha > 1$ and $a$ is sufficiently large, then $G(a^{1/\alpha}-\varepsilon) < a$. Therefore $\lim_{a\to\infty}[G^{-1}(a) - a^{1/\alpha}] = 0$. □

Proof of Theorem 3.3.
Upon taking expectations on both sides of (A.11) we obtain
$$E[(Y_k-c_k)^2] = \underbrace{E[(Y_{k-1}-c_{k-1})^2]}_{\rm (i)} + \underbrace{E\!\left(\frac{E[Z^2(Y_{k-1})\mid\mathcal F_{k-1}]}{k^2}\right)}_{\rm (ii)} + \underbrace{(c_{k-1}-c_k)^2}_{\rm (iii)} + \underbrace{\frac{2E[f(Y_{k-1})]}{k}(c_{k-1}-c_k)}_{\rm (iv)} + \underbrace{2E\!\left((Y_{k-1}-c_{k-1})\left[\frac{f(Y_{k-1})}{k}+(c_{k-1}-c_k)\right]\right)}_{\rm (v)}. \qquad (A.19)$$
All we need to do is subtract $E[(Y_{k-1}-c_{k-1})^2]$ on both sides and sum. If a term remaining on the right-hand side is positive then we need to show that it is summable. If a term is negative it must be summable, as the term on the left-hand side is nonnegative. Hence we see that terms (ii), (iii) and (iv) cause no trouble. The only term of concern is (v). But the expectation (integral over the density of $Y_{k-1}$) can be divided into an integral over three regions: (i) $Y_{k-1} \le c_{k-1}$, (ii) $Y_{k-1} \ge G^{-1}(u_k)$ and (iii) $c_{k-1} < Y_{k-1} < G^{-1}(u_k)$. As in the proof of Theorem 3.2, the integrand for regions (i) and (ii) is negative, and over the third region it is positive but can be dealt with in the same way as in the proof of Theorem 3.2, by use of Corollary 2.1. The last two statements of the theorem follow. □

A.2. Proofs for Section 4.
Proof of Theorem 4.1.
Proof of (i).
$$E(B_k\mid\mathcal F_{k-1}) = B_{k-1}\left[\left(\frac{k-1}{k}\right)^{\beta-1}\left(1+\frac{\beta-1}{k}\right)\right] + \frac{f(\beta Y_{k-1})}{k^{\beta}}. \qquad (A.20)$$
Thus
$$E(B_k\mid\mathcal F_{k-1}) = B_{k-1}\left(1+O\!\left(\frac{1}{k^2}\right)\right) + \frac{f(\beta(k-1)^{\beta-1}B_{k-1})}{k^{\beta}}, \qquad (A.21)$$
where $O(1/k^2) > 0$. Thus $E(B_k\mid\mathcal F_{k-1}) > B_{k-1}(1+O(1/k^2))$, which implies that $B_k$ converges, to a finite or infinite limit.

Suppose first that the limit is infinite. Then there exist $k_0$ and $D > 1/\beta$ such that for all $k > k_0$, $B_{k-1} > D$. But then, since by assumption $f(y) \le cy/(\log y)^{1+\varepsilon}$ for all large $y$, for $k > k_0$,
$$\frac{f(\beta(k-1)^{\beta-1}B_{k-1})}{k^{\beta}} < \frac{B_{k-1}c\beta(k-1)^{\beta-1}}{k^{\beta}[\log(\beta B_{k-1})+(\beta-1)\log(k-1)]^{1+\varepsilon}} < \frac{B_{k-1}c\beta}{k[\log(\beta D)+(\beta-1)\log(k-1)]^{1+\varepsilon}} < \frac{B_{k-1}c\beta}{(\beta-1)^{1+\varepsilon}k[\log(k-1)]^{1+\varepsilon}}.$$
But the term multiplying $B_{k-1}$ on the right is summable, which implies that (A.21) satisfies the condition of Theorem 2.1, and hence $B_k$ converges to a finite limit. This contradiction implies that $B_k$ converges to a finite r.v. a.s.

Proof of (ii). Now suppose that $B_k$ converges a.s. and $\lim EB_k < \infty$. It follows from (4.1), and as in (A.20), that $B_k$ can be written as
$$B_k = B_{k-1}\left[1+O\!\left(\frac{1}{k^2}\right)\right] + \frac{Z(\beta Y_{k-1})}{k^{\beta}},$$
where $O(1/k^2)$ is positive. It follows that $B_k > B_{k-1}$, so that the limit is positive. Since the support of the observations is not bounded, in a different realization one could obtain a higher value. Hence the limit is a nondegenerate positive random variable. Set $B_1 = 0$. Then
$$B_k = \sum_{j=2}^k(B_j - B_{j-1}) = O(1)\sum_{j=2}^k\frac{B_{j-1}}{j^2} + \sum_{j=2}^k\frac{Z(\beta Y_{j-1})}{j^{\beta}}. \qquad (A.22)$$
Since $B_k$ converges a.s., $\lim B_k$ exists and is finite a.s. Taking expectations and limits as $k\to\infty$ on both sides of (A.22), and noting that $EB_k$ is assumed to be bounded, implies that $\sum_{j=2}^{\infty}Z(\beta Y_{j-1})/j^{\beta} < \infty$. This in turn implies $\sum_{j=2}^{\infty}f(\beta Y_{j-1})/j^{\beta} < \infty$. Since $B_k$ converges a.s. to a random variable $W_{\beta}$, for $0 < \varepsilon < W_{\beta}$ and a (random) $k_0$ we have, for all $k-1 > k_0$,
$$(W_{\beta}-\varepsilon)(k-1)^{\beta-1} < Y_{k-1} < (W_{\beta}+\varepsilon)(k-1)^{\beta-1}.$$
If $f$ is increasing,
$$\infty > \sum_{k=k_0}^{\infty}\frac{f(\beta Y_{k-1})}{k^{\beta}} > \sum_{k=k_0}^{\infty}\frac{f(\beta(W_{\beta}-\varepsilon)(k-1)^{\beta-1})}{k^{\beta}} > \left(\frac{1}{2}\right)^{\beta}\sum_{k=k_0}^{\infty}\frac{f(A(k-1)^{\beta-1})}{(k-1)^{\beta}}, \qquad (A.23)$$
where the last inequality follows since $((k-1)/k)^{\beta} > (1/2)^{\beta}$, and where we have written $A = \beta(W_{\beta}-\varepsilon)$.

Finally,
$$\sum_{k=k_0}^{\infty}\frac{f(A(k-1)^{\beta-1})}{(k-1)^{\beta}} > \int_{k_0}^{\infty}\frac{f(Ax^{\beta-1})}{(x+1)^{\beta}}\,dx > \left(\frac{k_0-1}{k_0}\right)^{\beta}\int_{k_0}^{\infty}\frac{f(Ax^{\beta-1})}{x^{\beta}}\,dx.$$
By the change of variable $y = Ax^{\beta-1}$ the integral on the right-hand side becomes $\frac{A}{\beta-1}\int_{Ak_0^{\beta-1}}^{\infty}\frac{f(y)}{y^2}\,dy$. This integral is therefore finite by (A.23). □

Proof of Proposition 4.1.
In a manner similar to the end of the proof of Theorem 4.1, it can be shown that if $\int_C^{\infty}\Psi(y)/y^2\,dy$ diverges, then $\lim_{n\to\infty}\sum_{k=k_0}^n\Psi(\gamma k^{\beta-1})/k^{\beta}$ also diverges.

Note that
$$B_k = B_{k-1}\left[1+O\!\left(\frac{1}{k^2}\right)\right] + \frac{Z^*(\beta Y_{k-1})}{k^{\beta}\Psi(\beta Y_{k-1})}. \qquad (A.24)$$
Let $F_k$ be the c.d.f. of $Z^*(\beta Y_{k-1})$ conditional on $Y_{k-1}$. Let $F_V$ be the c.d.f. of $V$. Let $U_1, U_2, \ldots \sim U[0,1]$ i.i.d. Define $V_k = F_V^{-1}(U_k)$ (so the $V_i$ are i.i.d. with c.d.f. $F_V$). Clearly, $V_k \le F_k^{-1}(U_k)$ conditional on $Y_{k-1}$ once $\beta Y_{k-1} \ge a_0$ (which will happen with probability 1).

It follows that one can imbed the sequence $Y_1, Y_2, \ldots$ in a probability space where $V_1, V_2, \ldots$ are i.i.d. with c.d.f. $F_V$ and $V_i \le Z^*(\beta Y_{i-1})$ for all $i$ such that $\beta Y_{i-1} \ge a_0$. Define $V_i^* = cI(V_i > c)$ for some $c$ such that $P(V_i > c) > 0$. Clearly, $V_i^* \le Z^*(\beta Y_{i-1})$. Note that $V_i^*$ is $c$ times a Bernoulli random variable. Now
$$\frac{Z^*(\beta Y_{k-1})}{k^{\beta}\Psi(\beta Y_{k-1})} \ge \frac{V_k^*}{\Psi(\beta Y_{k-1})k^{\beta}}. \qquad (A.25)$$
Recall that $Y_k \ge \frac{k-1+\beta}{k}Y_{k-1}$, so that for a constant $a_1$ that is independent of $Y_1$ and $k$,
$$Y_k \ge Y_1\prod_{j=2}^k\frac{j-1+\beta}{j} \ge Y_1a_1k^{\beta-1}. \qquad (A.26)$$
Hence (for $k$ such that $\beta Y_{k-1} \ge a_0$), since $\Psi(a)$ increases in $a$,
$$\frac{Z^*(\beta Y_{k-1})}{k^{\beta}\Psi(\beta Y_{k-1})} \ge \frac{V_k^*}{\Psi(\beta Y_1a_1(k-1)^{\beta-1})k^{\beta}}. \qquad (A.27)$$
Finally, condition on $Y_1$ and denote
$$c_k = \frac{1}{\Psi(\beta Y_1a_1(k-1)^{\beta-1})k^{\beta}}.$$
By (4.4) and what we showed above, $\lim_{n\to\infty}\sum_{k=1}^nc_k = \infty$. It is a straightforward application of Kolmogorov's three-series theorem (cf. Feller [2], page 317) that
$$\lim_{n\to\infty}\sum_{k=1}^nV_k^*c_k = \infty \qquad\text{a.s.} \qquad (A.28)$$
Putting (A.27) and (A.28) together obtains that $Z^*(\beta Y_{k-1})/(k^{\beta}\Psi(\beta Y_{k-1}))$ is not summable. This and (A.24) imply that $\lim_{k\to\infty}B_k = \infty$ a.s. □

Proof of Theorem 4.2.
$EB_k$ converges to a finite limit by (4.2), since $f$ is bounded by assumption (b). Further,
$$\operatorname{Var}B_k = \operatorname{Var}\!\left(\frac{k-1+\beta}{k^{\beta}}Y_{k-1} + \frac{Z(\beta Y_{k-1})}{k^{\beta}}\right) = \left[\frac{(k-1+\beta)^2(k-1)^{2(\beta-1)}}{k^{2\beta}}\right]\operatorname{Var}B_{k-1} + \frac{\operatorname{Var}(Z(\beta Y_{k-1}))}{k^{2\beta}} + \frac{2(k-1+\beta)}{k^{2\beta}}\operatorname{Cov}(Y_{k-1},Z(\beta Y_{k-1})). \qquad (A.29)$$
We shall treat each of the three terms in (A.29) separately.

(i) It is easily seen (by taking logarithms) that the value in the square brackets is $1+O(1/k^2)$.

(ii) From condition (c) and the convergence of $EB_k$ to a finite limit,
$$\frac{\operatorname{Var}(Z(\beta Y_{k-1}))}{k^{2\beta}} < \frac{EZ^2(\beta Y_{k-1})}{k^{2\beta}} < \frac{c\beta EY_{k-1}}{k^{2\beta}} < \frac{c\beta(\lim EB_k+\varepsilon)}{k^{\beta+1}}.$$
Thus the second term on the right-hand side of (A.29) is summable.

(iii) We now show that the third term on the right-hand side of (A.29) is negative or 0:
$$\operatorname{Cov}(Y_{k-1},Z(\beta Y_{k-1})) = E(Y_{k-1}Z(\beta Y_{k-1})) - E(Y_{k-1})E(Z(\beta Y_{k-1})) = E[Y_{k-1}E(Z(\beta Y_{k-1})\mid\mathcal F_{k-1})] - E(Y_{k-1})E[E(Z(\beta Y_{k-1})\mid\mathcal F_{k-1})] = \frac{1}{\beta}E[\beta Y_{k-1}f(\beta Y_{k-1})] - \frac{1}{\beta}E(\beta Y_{k-1})Ef(\beta Y_{k-1}) = \frac{1}{\beta}\operatorname{Cov}(\beta Y_{k-1},f(\beta Y_{k-1})) \le 0,$$
where the last inequality follows from (b). It follows that (A.29) satisfies the condition in Corollary 2.1 with $z_n = \operatorname{Var}B_n$, and the result follows. □
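The convergence of $B_k = Y_k/k^{\beta-1}$ asserted in Theorem 4.1 can be illustrated by simulation. The following is a minimal sketch, not from the paper, again assuming the pure tail $1-F(x) = e^{-x^{\alpha}}$ and sampling only the accepted items (the overshoot above a threshold $a$ is drawn by inversion from $P(Z > z) = e^{-((a+z)^{\alpha}-a^{\alpha})}$):

```python
import math, random

def beta_rule_B(k_target, alpha, beta, seed=1):
    # beta-better-than-average rule: the k-th admitted item is beta*Y_{k-1} + Z,
    # with Z the overshoot above the threshold a = beta * (current average).
    rng = random.Random(seed)
    total = (-math.log(1.0 - rng.random())) ** (1.0 / alpha)  # first item
    k, B = 1, []
    while k < k_target:
        a = beta * total / k
        z = (a ** alpha - math.log(1.0 - rng.random())) ** (1.0 / alpha) - a
        total += a + z
        k += 1
        B.append((total / k) / k ** (beta - 1.0))  # B_k = Y_k / k^(beta-1)
    return B

B = beta_rule_B(20000, 2.0, 1.25)
print(B[999], B[-1])  # the normalized average settles down to a positive limit
```

Because the overshoots enter only through $Z_k/k^{\beta}$, the normalized sequence stabilizes quickly, which is what Theorem 4.1(i) asserts.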
Proof of Theorem 5.1.
Let P j = 1 − F ( Y j − ). We shall use Theorem2.2 conditionally on the sequence { Y k } . Let b j = P ji =1 P − i and Q i = T i − T i − − P − i with T ≡
0. Obviously, the sequence { b j } ∞ j =1 satisfies the firstcondition of Theorem 2.2.Note that conditional on the sequence { P j } the distribution of T i − T i − isGeometric ( P i ) and these differences are conditionally independent of eachother. Hence { Q n } ∞ n =1 is a sequence of conditionally independent randomvariables with zero expectation and variance (1 − P n ) /P n . We shall showthat the second condition of Theorem 2.2 holds ∞ X n =1 E ( Q n /b n ) = ∞ X n =1 − P n P n (cid:30) n X j =1 P − j ! < ∞ X n =1 P n (cid:30) n X j =1 P − j ! . It therefore suffices to show that for all n ≥ n n X j =0 P n +1 P j +1 ≥ An / log n (A.30)for some A >
0. We shall actually show that for any 0 < ε < / j such that for all n ≥ j ≥ j P n +1 P j +1 > j − ε n ε . (A.31)From (A.31) it is immediate that (A.30) holds, since n X j =0 P n +1 P j +1 > n X j = j P n +1 P j +1 ≥ n ε n X j = j j − ε ≥ D n − ε − j − ε n ε > An − ε . Note that for H ( x ) = x α + h ( x ) for α > h that satisfies (2.8), we haveby Theorem 3.1, Y j = (log j ) /α (1 + ε j ) with ε j j →∞ −→
0. Thus H ( Y j ) = (log j )(1 + ε j ) α (cid:20) h ((log j ) /α (1 + ε j ))(log j )(1 + ε j ) α (cid:21) A. M. KRIEGER, M. POLLAK AND E. SAMUEL-CAHN and since h ( x ) /x α x →∞ −→ ε > j suchthat for all j > j (1 + ε ) log j > H ( Y j ) > (1 − ε ) log j, which implies, since [1 − F ( Y j )] − = exp H ( Y j ), that j ε > [1 − F ( Y j )] − > j − ε . (A.32)Thus (A.31) follows.Note that here S n of Theorem 2.2 equals T n − P ni =1 P − i , thus b − n S n → T ∗ n − → { Y k } , it holds unconditionally. (cid:3) Proof of Corollary 5.1.
In (A.32) take any $\varepsilon > 0$. Hence for some positive constants $c_1, c_2, c_1^*, c_2^*$ and all $k$ large enough,
$$c_2^*k^{2+\varepsilon} > c_2\sum_{j=1}^kj^{1+\varepsilon} > \sum_{j=1}^k[1-F(Y_j)]^{-1} > c_1\sum_{j=1}^kj^{1-\varepsilon} > c_1^*k^{2-\varepsilon}.$$
Since $T_k/\sum_{j=1}^k[1-F(Y_j)]^{-1} \to 1$ a.s., for $k$ large enough and $c^{**}$ a positive constant,
$$\frac{T_k}{k^{2-\delta}} = \frac{T_k}{\sum_{j=1}^k[1-F(Y_j)]^{-1}}\cdot\frac{\sum_{j=1}^k[1-F(Y_j)]^{-1}}{k^{2-\delta}} > c^{**}k^{\delta-\varepsilon} \to \infty \qquad\text{a.s. if } \delta > \varepsilon.$$
The proof for $T_k/k^{2+\delta} \to 0$ follows in a similar manner. □

Proof of Theorem 5.2.

Proof of (i). We shall (again) use Theorem 2.2 and show (A.30), where now $P_j = 1-F(\beta Y_{j-1})$. Assume that $k_0$ (random) is such that for all $k \ge k_0$, $Z_k < \gamma Y_{k-1}$. Such a $k_0$ exists with probability one by Lemma A.1. Then for $k > k_0$,
$$Y_k = Y_{k-1} + \frac{Z_k+(\beta-1)Y_{k-1}}{k} \le Y_{k-1}\left(1+\frac{\gamma+\beta-1}{k}\right).$$
Thus
$$Y_k^{\alpha} \le Y_{k-1}^{\alpha}\left(1+\frac{\gamma+\beta-1}{k}\right)^{\alpha} \le Y_{k-1}^{\alpha}\left(1+\frac{d}{k}\right), \qquad (A.33)$$
where $d = (\gamma+\beta-1)\rho_{\alpha}^u$, and $\rho_{\alpha}^u$ (and, for later purposes, $\rho_{\alpha}^l$) is defined by
$$1+\rho_{\alpha}^lx \le (1+x)^{\alpha} \le 1+\rho_{\alpha}^ux \qquad\text{for } 0 \le x \le 1, \qquad (A.34)$$
where $\rho_{\alpha}^l = \alpha$ and $\rho_{\alpha}^u = 2^{\alpha}-1$ when $\alpha \ge 1$, while $\rho_{\alpha}^l = 2^{\alpha}-1$ and $\rho_{\alpha}^u = \alpha$ when $\alpha < 1$. We have used the inequality $(1+x)^{\alpha} \le 1+(2^{\alpha}-1)x$, valid for all $\alpha \ge 1$ and $0 \le x \le 1$. We can therefore write, using (A.33),
$$\frac{P_{k+1}}{P_k} = \exp\{-\beta^{\alpha}(Y_k^{\alpha}-Y_{k-1}^{\alpha})\} \ge \exp\left\{-\beta^{\alpha}\frac{d}{k}Y_{k-1}^{\alpha}\right\}. \qquad (A.35)$$
Now let $k_1 \ge k_0$ be so large that for all $k \ge k_1$, $Y_k < (W+\varepsilon)k^{\beta-1}$, which exists by Theorem 4.1. Then we can continue the inequality in (A.35) by
$$\frac{P_{k+1}}{P_k} > \exp\left\{-\beta^{\alpha}\frac{d}{k}(W+\varepsilon)^{\alpha}k^{\alpha(\beta-1)}\right\} = \exp\{-Bk^{\alpha(\beta-1)-1}\}.$$
To simplify notation let
$$\tau = \alpha(\beta-1)-1, \qquad (A.36)$$
thus $\tau > -1$. For $n \ge j > k_1$ we have
$$\frac{P_{n+1}}{P_{j+1}} = \prod_{k=j+1}^n\frac{P_{k+1}}{P_k} > \exp\left\{-B\sum_{k=j+1}^nk^{\tau}\right\} > \exp\left\{-\frac{B}{\tau+1}(n^{\tau+1}-(j+1)^{\tau+1})\right\}.$$
Thus
$$\sum_{j=1}^n\frac{P_{n+1}}{P_{j+1}} > \sum_{j=k_1}^n\frac{P_{n+1}}{P_{j+1}} > e^{-[B/(\tau+1)]n^{\tau+1}}\sum_{j=k_1}^ne^{[B/(\tau+1)](j+1)^{\tau+1}}.$$
But
$$\sum_{j=k_1}^ne^{[B/(\tau+1)](j+1)^{\tau+1}} > \int_{k_1+1}^ne^{[B/(\tau+1)]x^{\tau+1}}\,dx,$$
thus
$$\sum_{j=1}^n\frac{P_{n+1}}{P_{j+1}} > \int_{k_1+1}^ne^{[B/(\tau+1)]x^{\tau+1}}\,dx\Big/e^{[B/(\tau+1)]n^{\tau+1}}. \qquad (A.37)$$
We would like the right-hand side of (A.37), divided by $n^{1/2+\varepsilon}$ for some (small) $\varepsilon > 0$, to tend to a nonzero limit, in order for (A.30) to hold. Thus consider, by use of l'Hôpital's rule, the limit as $y\to\infty$ of
$$q(y) = \frac{\int_{k_1+1}^ye^{Ax^{\tau+1}}\,dx}{y^{\delta}e^{Ay^{\tau+1}}}, \quad A > 0:\qquad \lim_{y\to\infty}q(y) = \lim_{y\to\infty}\frac{e^{Ay^{\tau+1}}}{e^{Ay^{\tau+1}}(\delta y^{\delta-1}+A(\tau+1)y^{\tau+\delta})},$$
which is finite when $\tau+\delta = 0$ and tends to $\infty$ when $\tau+\delta < 0$. Now for $\delta = 1/2$, by (A.36) we get a finite limit when $\alpha(\beta-1)-1 = -1/2$, that is, $\beta = 1+1/(2\alpha)$. Thus for $\beta < 1+1/(2\alpha)$ there will exist an $\varepsilon > 0$ such that $\sum_{j=0}^nP_{n+1}/P_{j+1} > n^{1/2+\varepsilon}$, and the result (i) follows.

Proof of (ii). Let
$$\gamma(k) = \frac{1}{\sum_{j=1}^{k-1}e^{(\beta Y_j)^{\alpha}}} \qquad (A.38)$$
in (5.2). Then clearly $\gamma(k) \to 0$. We shall show later that the term $[(1+o_k(1))t\gamma(k)e^{(\beta Y_{j-1})^{\alpha}}]$ of (5.2) is arbitrarily close to 0 for $2 \le j \le k$, for all sufficiently large $k$ and $\beta < 1+1/\alpha$. It suffices to show this for $j = k$. We can then write, using (5.2) and (A.38),
$$-(1+\varepsilon)t = -(1+\varepsilon)\sum_{j=2}^kt\gamma(k)e^{(\beta Y_{j-1})^{\alpha}} < \log Ee^{-t\gamma(k)\tilde T_k} < -(1-\varepsilon)\sum_{j=2}^kt\gamma(k)e^{(\beta Y_{j-1})^{\alpha}} = -(1-\varepsilon)t.$$
It follows that $\lim_{k\to\infty}E(e^{-t\gamma(k)\tilde T_k}) = e^{-t}$, which is the desired result. We still must show that $[(1+o_k(1))t\gamma(k)e^{(\beta Y_{j-1})^{\alpha}}]$ of (5.2) is arbitrarily close to 0 for $j = k$, for all sufficiently large $k$ and $\beta < 1+1/\alpha$. Let $\rho_{\alpha}^l$ be defined by (A.34). Then
$$Y_j^{\alpha}-Y_{j-1}^{\alpha} > \rho_{\alpha}^l(\beta-1)\frac{Y_{j-1}^{\alpha}}{j} > \rho_{\alpha}^l(\beta-1)\frac{j^{(\beta-1)\alpha}W^{\alpha}(1-\varepsilon)}{j}$$
for all $j$ sufficiently large, where by Theorem 4.1, $\lim Y_{j-1}/(j-1)^{\beta-1} = W > 0$. Thus
$$\gamma(k)e^{(\beta Y_{k-1})^{\alpha}} = \frac{1}{\sum_{j=1}^{k-1}e^{-\beta^{\alpha}(Y_{k-1}^{\alpha}-Y_j^{\alpha})}} = \frac{1}{\sum_{j=1}^{k-1}e^{-\beta^{\alpha}\sum_{i=j+1}^{k-1}(Y_i^{\alpha}-Y_{i-1}^{\alpha})}} < \frac{1}{\sum_{j=j_0}^{k-1}e^{-D\sum_{i=j+1}^{k-1}i^{(\beta-1)\alpha-1}}} \to 0,$$
for suitably large $j_0$, as long as $(\beta-1)\alpha-1 < 0$, that is, $\beta < 1+1/\alpha$ [where we have let $D = \rho_{\alpha}^l\beta^{\alpha}(\beta-1)W^{\alpha}(1-\varepsilon)$].

Proof of (iii). Here let $\gamma(k) = e^{-\beta^{\alpha}Y_{k-1}^{\alpha}}$. With this $\gamma(k)$, (5.2) becomes
$$\log Ee^{-t\gamma(k)\tilde T_k} = -\sum_{j=2}^k\log[1+(1+o_k(1))te^{-\beta^{\alpha}(Y_{k-1}^{\alpha}-Y_{j-1}^{\alpha})}] = -\log[1+(1+o_k(1))t] - \sum_{j=2}^{k-1}\log[1+(1+o_k(1))te^{-\beta^{\alpha}\sum_{i=j}^{k-1}(Y_i^{\alpha}-Y_{i-1}^{\alpha})}]. \qquad (A.39)$$
Now for some $D > 0$ (depending on $\{Y_k\}$),
$$0 < e^{-\beta^{\alpha}(Y_i^{\alpha}-Y_{i-1}^{\alpha})} < e^{-\beta^{\alpha}\rho_{\alpha}^l(\beta-1)Y_{i-1}^{\alpha}/i} < e^{-Di^{\alpha(\beta-1)-1}}.$$
Thus
$$e^{-\beta^{\alpha}(Y_{k-1}^{\alpha}-Y_{j-1}^{\alpha})} < e^{-D\int_j^{k-1}x^{\nu}\,dx} = e^{-[D/(\nu+1)][(k-1)^{\nu+1}-j^{\nu+1}]},$$
where $\nu = \alpha(\beta-1)-1 > 0$, that is, $\beta > 1+1/\alpha$. But
$$\lim_{k\to\infty}\sum_{j=2}^{k-1}e^{-[D/(\nu+1)][(k-1)^{\nu+1}-j^{\nu+1}]} = \lim_{k\to\infty}\frac{\int_2^ke^{[D/(\nu+1)]x^{\nu+1}}\,dx}{e^{[D/(\nu+1)]k^{\nu+1}}} \overset{\text{l'H\^opital}}{=} \lim_{k\to\infty}\frac{1}{Dk^{\nu}} = 0.$$
Since the sum on the right-hand side of (A.39) tends to 0,
$$\lim_{k\to\infty}E(e^{-t\gamma(k)\tilde T_k}) = \frac{1}{1+t},$$
which is $Ee^{-tQ}$ where $Q \sim \mathrm{Exp}(1)$, so $\tilde T_ke^{-\beta^{\alpha}Y_{k-1}^{\alpha}}$ tends in distribution to an exponential distribution. The above proof shows that $\sum_{j=1}^{k-1}e^{(\beta Y_j)^{\alpha}}\big/e^{(\beta Y_{k-1})^{\alpha}} \xrightarrow{\text{a.s.}} 1$, thus also $T_k^* \xrightarrow{D} \mathrm{Exp}(1)$. □
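The time normalization of Theorem 5.1 ($T_k/\sum_jP_j^{-1} \to 1$ a.s. for $\beta = 1$) can also be checked by simulation: since $T_k - T_{k-1}$ is Geometric$(P_k)$ given the past, the waits can be drawn directly by inversion instead of scanning rejected items. A minimal sketch, not from the paper, for the tail $1-F(x) = e^{-x^{\alpha}}$:

```python
import math, random

def waiting_times(k_target, alpha, seed=2):
    # beta = 1 rule. Acceptance probability after k items is p = exp(-Y_k^alpha);
    # a Geometric(p) wait is ceil(log(U) / log(1-p)) with U uniform on (0, 1].
    rng = random.Random(seed)
    total = (-math.log(1.0 - rng.random())) ** (1.0 / alpha)  # first item at time 1
    k, t, b = 1, 1, 0.0
    while k < k_target:
        a = total / k                                      # current average Y_k
        p = min(math.exp(-a ** alpha), 1.0 - 1e-12)        # clip away from 1
        t += max(1, math.ceil(math.log(1.0 - rng.random()) / math.log(1.0 - p)))
        z = (a ** alpha - math.log(1.0 - rng.random())) ** (1.0 / alpha) - a
        total += a + z                                     # admit a + z
        k += 1
        b += 1.0 / p                                       # accumulate sum of P_j^{-1}
    return t, b

t, b = waiting_times(5000, 2.0)
print(t / b)   # Theorem 5.1: T_k / sum_j P_j^{-1} is close to 1
```

Consistent with Corollary 5.1, the total time $t$ in this run is of rough order $k^2$, so drawing the waits geometrically (rather than item by item) is what makes the simulation feasible.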
Lemma A.1.
Let $X_1, X_2, \ldots \sim F$, where $F$ is in $\mathcal G_{\alpha}$ with $\alpha > 0$. Let $\beta > 1$ and let $Z_k$ be the random "overshoot" over $\beta Y_{k-1}$. For any $\gamma > 0$ and any $0 \le \delta < (\beta-1)\alpha$,
$$P(Z_k > \gamma Y_{k-1}/k^{\delta}\ \text{infinitely often}) = 0.$$

Proof.
Consider the event $A = \{Y_k/k^{\beta-1} \to W,\ 0 < W < \infty\}$. We know by Theorem 4.1 that $P(A) = 1$, and hence we shall assume that $A$ occurs. Let $A_k = \{Z_k > \gamma Y_{k-1}/k^{\delta}\}$. We shall show that $\sum_{k=1}^{\infty}P(A_k) < \infty$, so that the result will follow from the Borel–Cantelli lemma. Now
$$P(A_k\mid Y_{k-1}) = \exp\{-[(\gamma/k^{\delta}+\beta)^{\alpha}-\beta^{\alpha}]Y_{k-1}^{\alpha} - [h((\gamma/k^{\delta}+\beta)Y_{k-1})-h(\beta Y_{k-1})]\} = \exp\left\{-\beta^{\alpha}\left(\left(1+\frac{\gamma/\beta}{k^{\delta}}\right)^{\alpha}-1\right)Y_{k-1}^{\alpha} - h'(Q_k)\frac{\gamma}{k^{\delta}}Y_{k-1}\right\},$$
where $\beta Y_{k-1} \le Q_k \le (\beta+\gamma/k^{\delta})Y_{k-1}$. Write, by (2.9),
$$\left|-h'(Q_k)\frac{\gamma}{k^{\delta}}Y_{k-1}\right| = \left|-\frac{h'(Q_k)}{Q_k^{\alpha-1}}\,Q_k^{\alpha-1}\frac{\gamma}{k^{\delta}}Y_{k-1}\right| = \left|o_k\frac{Y_{k-1}^{\alpha}}{k^{\delta}}\right|,$$
where $|o_k| < \varepsilon$ for $k \ge k_0$, with $k_0$ large enough and $\varepsilon > 0$ arbitrary; $k_0$ can be chosen to depend on $Y_1$ only. For $\varepsilon$ small enough this implies
$$P(A_k\mid Y_{k-1}) \le \exp\left\{-\beta^{\alpha}\left(\frac{\alpha\gamma/\beta}{k^{\delta}}+o\!\left(\frac{1}{k^{\delta}}\right)\right)Y_{k-1}^{\alpha} + \frac{|o_k|}{k^{\delta}}Y_{k-1}^{\alpha}\right\} \le \exp\left\{-[\beta^{\alpha-1}\alpha\gamma-2\varepsilon]\frac{1}{k^{\delta}}Y_{k-1}^{\alpha}\right\} \le \exp\left\{-\frac{\beta^{\alpha-1}\alpha\gamma}{2}\,\frac{(cY_1k^{\beta-1})^{\alpha}}{k^{\delta}}\right\} \le \exp\{-dY_1^{\alpha}k^{(\beta-1)\alpha-\delta}\}.$$
The next-to-last inequality follows from (A.26). Hence $P(A_k\mid Y_1) \le \exp\{-dY_1^{\alpha}k^{(\beta-1)\alpha-\delta}\}$ for $k \ge k_0 = k_0(Y_1)$. If $\delta < (\beta-1)\alpha$, then $\sum_{k=1}^{\infty}P(A_k\mid Y_1) < \infty$, so, by the Borel–Cantelli lemma, conditional on $Y_1$, $P(A_k\ \text{i.o.}\mid Y_1) = 0$. But this is true for all $Y_1$. Hence $P(A_k\ \text{i.o.}) = 0$. □

Proof of Theorem 5.3.
Let $\gamma(k) = 1/\sum_{j=1}^{k-1}e^{dY_j^{\alpha}}$, where $d = \beta^{\alpha}$. We write (5.2) as
$$\log E(e^{-t\gamma(k)\tilde T_k}) = -\sum_{j=1}^{k-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}]. \qquad (A.40)$$
Since when $R$ is exponentially distributed with mean $\mu$, $\log E(e^{-tR}) = -\log(1+\mu t)$, it is sufficient to show that the right-hand side of (A.40) converges, as $k\to\infty$, to $-\sum_{j=1}^{\infty}\log[1+t\mu_j]$.

First consider
$$\gamma(k)e^{dY_{k-j}^{\alpha}} = \frac{1}{S_{j,k}+T_{j,k}}, \qquad (A.41)$$
where $S_{j,k} = \sum_{i=k-j}^{k-1}e^{d(Y_i^{\alpha}-Y_{k-j}^{\alpha})}$ and $T_{j,k} = \sum_{i=1}^{k-j-1}e^{-d(Y_{k-j}^{\alpha}-Y_i^{\alpha})}$. Note that the $i$th item that is kept equals $Z_i+\beta Y_{i-1}$, where $Z_i$ is the amount above $\beta Y_{i-1}$. Hence,
$$Y_i = \frac{(i-1)Y_{i-1}+Z_i+\beta Y_{i-1}}{i} = Y_{i-1} + \frac{Z_i+Y_{i-1}/\alpha}{i}, \qquad\text{because } \beta-1 = 1/\alpha.$$
By Lemma A.1, for all $i$ sufficiently large,
$$Y_i^{\alpha} = Y_{i-1}^{\alpha}\left(1+\frac{Z_i}{iY_{i-1}}+\frac{1}{\alpha i}\right)^{\alpha} = Y_{i-1}^{\alpha}\left(1+\frac{1}{i}+\text{smaller order terms}\right).$$
Let $w = \lim_{k\to\infty}Y_k/k^{1/\alpha}$. Therefore, $\lim_{i\to\infty}(Y_i^{\alpha}-Y_{i-1}^{\alpha}) = w^{\alpha}$ and, for fixed $b$, $\lim_{i\to\infty}(Y_{i+b}^{\alpha}-Y_i^{\alpha}) = bw^{\alpha}$. This implies
$$\lim_{k\to\infty}S_{j,k} = \lim_{k\to\infty}\sum_{i=k-j}^{k-1}e^{d(Y_i^{\alpha}-Y_{k-j}^{\alpha})} = \lim_{k\to\infty}\sum_{l=0}^{j-1}e^{d(Y_{k-j+l}^{\alpha}-Y_{k-j}^{\alpha})} = \sum_{l=0}^{j-1}e^{dlw^{\alpha}} = \frac{e^{dw^{\alpha}j}-1}{e^{dw^{\alpha}}-1}. \qquad (A.42)$$
For any $\varepsilon > 0$ there is $m$ such that $(1-\varepsilon)w^{\alpha} \le Y_{i+1}^{\alpha}-Y_i^{\alpha} \le (1+\varepsilon)w^{\alpha}$ for all $i \ge m$. This implies
$$\lim_{k\to\infty}T_{j,k} = \lim_{k\to\infty}\sum_{i=1}^{m-1}e^{-d(Y_{k-j}^{\alpha}-Y_i^{\alpha})} + \lim_{k\to\infty}\sum_{i=m}^{k-j-1}e^{-d(Y_{k-j}^{\alpha}-Y_i^{\alpha})}.$$
Fix $m$. Then the first limit on the right-hand side is clearly zero, since $Y_{k-j}\to\infty$ as $k\to\infty$. Consider the second term:
$$\limsup_{k\to\infty}\sum_{i=m}^{k-j-1}e^{-d(Y_{k-j}^{\alpha}-Y_i^{\alpha})} \le \lim_{k\to\infty}\sum_{l=1}^{k-j-m}e^{-dlw^{\alpha}(1-\varepsilon)} = \frac{1}{e^{d(1-\varepsilon)w^{\alpha}}-1}.$$
Similarly,
$$\liminf_{k\to\infty}\sum_{i=m}^{k-j-1}e^{-d(Y_{k-j}^{\alpha}-Y_i^{\alpha})} \ge \frac{1}{e^{d(1+\varepsilon)w^{\alpha}}-1}.$$
Hence,
$$\lim_{k\to\infty}T_{j,k} = \frac{1}{e^{dw^{\alpha}}-1} \qquad\text{for fixed } j. \qquad (A.43)$$
Substituting the results (A.42) and (A.43) into (A.41) yields
$$\lim_{k\to\infty}\gamma(k)e^{dY_{k-j}^{\alpha}} = \frac{1}{(e^{dw^{\alpha}j}-1)/(e^{dw^{\alpha}}-1)+1/(e^{dw^{\alpha}}-1)} = \frac{e^{dw^{\alpha}}-1}{e^{dw^{\alpha}j}} = \mu_j \qquad\text{for fixed } j. \qquad (A.44)$$
Returning to (A.40), fix $n$:
$$-\sum_{j=1}^{k-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}] = -\sum_{j=1}^{n-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}] - \sum_{j=n}^{k-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}]. \qquad (A.45)$$
Equation (A.44) implies that each term in the sum of the first expression on the right-hand side converges to $\log(1+t\mu_j)$ as $k\to\infty$. We need to show that $\lim_{k\to\infty}\sum_{j=n}^{k-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}]$ can be made arbitrarily small by choosing $n$ to be sufficiently large (all terms in the sum are positive). Note that $\gamma(k)e^{dY_{k-j}^{\alpha}} < 1/S_{j,k}$. For any $\varepsilon > 0$ choose $n$ large enough so that $Y_{i+1}^{\alpha}-Y_i^{\alpha} \ge (1-\varepsilon)w^{\alpha}$ for all $i \ge n$. For $j \ge n$,
$$S_{j,k} = \sum_{i=k-j}^{k-1}e^{d(Y_i^{\alpha}-Y_{k-j}^{\alpha})} = \sum_{l=0}^{j-1}e^{d(Y_{k-j+l}^{\alpha}-Y_{k-j}^{\alpha})} \ge \sum_{l=0}^{j-1}e^{dlw^{\alpha}(1-\varepsilon)} = \frac{e^{dw^{\alpha}j(1-\varepsilon)}-1}{e^{dw^{\alpha}(1-\varepsilon)}-1}.$$
Hence,
$$\gamma(k)e^{dY_{k-j}^{\alpha}} < \frac{e^{dw^{\alpha}(1-\varepsilon)}-1}{e^{dw^{\alpha}j(1-\varepsilon)}-1} < e^{-dw^{\alpha}(j-1)(1-\varepsilon)}.$$
Choose $k$ large enough so that $o_k(1) < \varepsilon$. Then
$$\lim_{k\to\infty}\sum_{j=n}^{k-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}] \le \lim_{k\to\infty}\sum_{j=n}^{k-1}t(1+\varepsilon)e^{-dw^{\alpha}(j-1)(1-\varepsilon)} < \frac{t(1+\varepsilon)e^{-dw^{\alpha}(n-2)(1-\varepsilon)}}{e^{dw^{\alpha}(1-\varepsilon)}-1}.$$
Since the right-hand side goes to zero as $n\to\infty$, the second term in the sum in (A.45) can be made arbitrarily small by choosing $n$ sufficiently large. □

REFERENCES

[1]
Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series. Dover, New York. MR0167642
[2] Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. II, 2nd ed. Wiley, New York. MR0270403
[3] Krieger, A. M., Pollak, M. and Samuel-Cahn, E. (2007). Select sets: Rank and file. Ann. Appl. Probab.
[4] Krieger, A. M., Pollak, M. and Samuel-Cahn, E. (2008). Beat the mean: Sequential selection by better than average rules. J. Appl. Probab.
[5] Preater, J. (2000). Sequential selection with a better-than-average rule. Statist. Probab. Lett.
[6] Robbins, H. and Siegmund, D. (1971). A convergence theorem for nonnegative almost supermartingales and some applications. In Optimizing Methods in Statistics (Proc. Sympos., Ohio State Univ., Columbus, Ohio, 1971).
[7] Resnick, S. I. (1987). Extreme Values, Regular Variation, and Point Processes. Applied Probability. A Series of the Applied Probability Trust. Springer, New York. MR900810

A. M. Krieger
Department of Statistics
The Wharton School
University of Pennsylvania
Philadelphia, Pennsylvania 19104
USA
E-mail: [email protected]