Extreme(ly) Mean(ingful): Sequential Formation of a Quality Group

The Annals of Applied Probability
Institute of Mathematical Statistics, 2010

By Abba M. Krieger, Moshe Pollak and Ester Samuel-Cahn
University of Pennsylvania, Hebrew University and Hebrew University
The present paper studies the limiting behavior of the average score of a sequentially selected group of items or individuals, the underlying distribution of which, F, belongs to the Gumbel domain of attraction of extreme value distributions. This class contains the Normal, Lognormal, Gamma, Weibull and many other distributions. The selection rules are the "better than average" (β = 1) and the "β-better than average" rule, defined as follows. After the first item is selected, another item is admitted into the group if and only if its score is greater than β times the average score of those already selected. Denote by Y_k the average of the k first selected items, and by T_k the time it takes to amass them. Some of the key results obtained are: under mild conditions, for the better than average rule, Y_k less a suitably chosen function of log k converges almost surely to a finite random variable. When 1 − F(x) = e^{−[x^α + h(x)]}, α > 0 and h(x)/x^α → 0 as x → ∞, then T_k is of approximate order k². When β > 1, the asymptotic results for Y_k are of a completely different order of magnitude. Interestingly, for a class of distributions, T_k, suitably normalized, asymptotically approaches 1, almost surely for relatively small β ≥ 1, in probability for moderate sized β and in distribution when β is large.
1. Introduction and summary.
Individuals are observed sequentially. The problem of whether to accept an individual at the time that she is observed has a rich literature. The most celebrated version is the "Secretary Problem," where the criterion is to select one individual and the objective is to maximize the probability that the best individual is chosen. This setting has been extended in various ways, including selecting a limited number
Received May 2009; revised March 2010. Supported by funds from the Marcy Bogen Chair of Statistics at the Hebrew University of Jerusalem. Supported by the Israel Science Foundation Grant 467/04.
AMS 2000 subject classifications.
Primary 62G99; secondary 62F07, 60F15.
Key words and phrases.
Selection rules, averages, better than average, sequential observations, asymptotics.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Probability, 2010, Vol. 20, No. 6, 2261–2294. This reprint differs from the original in pagination and typographic detail.
of individuals and basing the reward on the rank or score of individual(s) selected.

Another extension that has received recent attention is to select a group of "quality members." This might occur when a team of highly qualified professionals is assembled, for example, in an academic department or a consulting group in a specialized area. The goal is to find good rules for either accepting or rejecting each additional individual into the group at the time that the individual is observed.

One such rule that has been studied is to add a new member to the group only if this will not decrease the average quality of the group, termed in the literature the "better than average selection rule." This tacitly assumes that "quality" is measurable. A generalization of the rule would be to only admit a new member whose score is, say, 5% higher than the current average. We term the extended rules "β-better than average rules." These rules reduce to the better than average rule, first considered by Preater [5], when β = 1, but allow, say, for β = 1.05 to produce a group that is even more progressively selective than when β = 1.

The assumption that is commonly made is that the qualities of the individuals are mutually independent from a common distribution. As the horizon, n, tends to infinity, we study the asymptotic behavior of the average quality of the group and the rate at which the group grows for the β-better than average rules.

The β-better than average rules are considered in Krieger, Pollak and Samuel-Cahn [4], and the present paper, which can be read independently, can be considered its natural continuation. Sequential selection of a "good" group, based only upon the relative ranks of the observations, is considered in Krieger, Pollak and Samuel-Cahn [3]. It should be noticed that the rules considered here can be implemented without knowledge of the underlying distribution, though their asymptotic behavior depends strongly on that distribution. For convenience, we assume that the first item is always selected. However, all asymptotic results remain correct if the selection process is adopted only after a core group of members already exists. Also, the random variables are assumed to be nonnegative (or the process begins with the first nonnegative observation), because negative averages multiplied by β > 1 would make the cutoff for acceptance less stringent.

The quantities of interest are the average, Y_k, of the group after k items have been retained, and T_k, the time (in terms of the number of observed items) it takes to amass a group of size k. Our interest is in the asymptotics of these quantities as k → ∞. This paper, unlike [4], considers F belonging to the extreme value domain of attraction of the Gumbel distribution exp{−e^{−x}} only. Write 1 − F(x) = exp{−H(x)}. Emphasis is given to a subset of these distributions, which are also "stretch exponential" distributions, where H(x) = x^α + h(x) for all x > x_0, for some x_0, with α > 0 and h(x)/x^α → 0 as x → ∞. This class includes the Gamma and Normal distributions as particular cases.

The "expected overshoot," f(x) = E(X − x | X > x), plays an essential role. Our main findings are: for the "better than average" rule (β = 1), under some mild conditions, the quantity Y_k − G^{−1}(log k) converges a.s. to a finite random variable, where G(x) = ∫_{x_0}^x [1/f(u)] du. These mild conditions are satisfied in particular by the stretch exponential distributions with α ≥ 1. The functions G(x) and H(x) are close to each other in that G(x) = (1 + o(1))H(x). The convergence of Y_k − G^{−1}(log k) is shown in Section 3, where also the convergence of the sequence of expected values and variances of {Y_k − G^{−1}(log k)} is established. The behavior of Y_k for β > 1 is studied in Section 4, where it is shown that Y_k/k^{β−1} converges a.s. to a finite positive random variable.

The behavior of T_k is discussed in Section 5. It is shown that for stretch exponential random variables with α > 1, β = 1 and every ε > 0, T_k/k^{2−ε} → ∞ a.s. as k → ∞, while T_k/k^{2+ε} → 0 a.s. For α = 1, T_k/k² converges to a finite positive random variable. The "standardized" variable
T*_k = T_k / Σ_{j=1}^{k−1} [1 − F(βY_j)]^{−1}
is studied for β ≥ 1 and α > 0. For different values of β we obtain different asymptotic behavior: we show that for 1 ≤ β < 1 + 1/(2α) the random variable T*_k converges to 1 a.s. For 1 + 1/(2α) ≤ β < 1 + 1/α it converges to 1 in probability. For β > 1 + 1/α the random variable T*_k converges in distribution to an exponential mean one distribution, while for β = 1 + 1/α the convergence in distribution is to a sum of conditionally independent exponential random variables. We conclude with Section 6, which contains further comments and remarks. Section 2 contains some preliminaries. Proofs are relegated to the Appendix in order to highlight the results in the paper.
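The β-better than average rule is easy to exercise numerically. The following sketch is our own illustration (the function name and the choice of an Exp(1) score distribution are ours, not the paper's); it runs the rule on a simulated stream and records the group averages Y_k and the retention times T_k:

```python
import random

def beta_better_than_average(scores, beta=1.0):
    """Apply the beta-better than average rule to a sequence of scores.

    The first item is always selected (T_1 = 1); thereafter an observation
    is admitted iff it exceeds beta times the current group average.
    Returns (Y, T): Y[k-1] is the group average after k retentions, and
    T[k-1] is the 1-based index of the observation at which the k-th item
    was retained.
    """
    Y, T, total = [], [], 0.0
    for i, x in enumerate(scores, start=1):
        if not Y or x > beta * Y[-1]:
            total += x
            Y.append(total / (len(Y) + 1))
            T.append(i)
    return Y, T

random.seed(0)
stream = [random.expovariate(1.0) for _ in range(100_000)]
Y, T = beta_better_than_average(stream, beta=1.0)
print(len(Y), Y[-1], T[-1])
```

For β = 1 every admission strictly raises the running average, so Y is nondecreasing, and T_k ≥ k by construction.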
2. Mathematical preliminaries.
The observations are denoted by X_1, X_2, . . . and are i.i.d. random variables from a common absolutely continuous distribution F. We assume that 1 − F(x) > 0 for all x < ∞, unless stated otherwise.

The behavior of rules will be characterized by considering two quantities:
• T_k = the number of observations inspected until the kth item is retained (including that item).
• Y_k = the average score of the first k items that are retained.

The β-better than average rule is defined as follows: for fixed β (which is suppressed in the notation) and T_k defined above as the number of items observed until the kth item is selected, let T_1 = 1 and Y_1 = X_1. Define T_k and Y_k inductively by
T_{k+1} = inf{i > T_k : X_i > βY_k}, k = 1, 2, . . . ,
Y_{k+1} = [kY_k + X_{T_{k+1}}]/(k + 1), k = 1, 2, . . . .
It is clear that Y_k increases in k for β = 1. If β > 1, we consider only nonnegative X_i, to avoid the situation that if Y_k were negative then the cutoff to retain an observation would become less stringent.

2.1. Theorems on almost sure convergence.
In this subsection we present two theorems from the literature which will be useful in proving asymptotic results for the quantities of interest. First, we shall need the following result, due to Robbins and Siegmund [6], quoted as follows:
Theorem 2.1.
Let (Ω, F, P) be a probability space and F_1 ⊂ F_2 ⊂ · · · a sequence of sub-σ-algebras of F. For each n = 1, 2, . . . , let z_n, β_n, ξ_n and ζ_n be nonnegative F_n-measurable random variables such that
E(z_n | F_{n−1}) ≤ z_{n−1}(1 + β_{n−1}) + ξ_{n−1} − ζ_{n−1}.
Then lim_{n→∞} z_n exists and is finite and Σ_{n=1}^∞ ζ_n < ∞ a.s. on {Σ_{n=1}^∞ β_n < ∞, Σ_{n=1}^∞ ξ_n < ∞}.

Corollary 2.1.
Let z_n, β_n, ξ_n and ζ_n be nonnegative sequences of constants such that Σ β_n and Σ ξ_n converge, and
z_n ≤ z_{n−1}(1 + β_{n−1}) + ξ_{n−1} − ζ_{n−1}.
Then lim_{n→∞} z_n exists and is finite and Σ_{n=1}^∞ ζ_n < ∞.

Proof.
This follows trivially from Theorem 2.1. □
We also need the following theorem that appears in Feller [2], page 239.
Theorem 2.2.
Let Q_1, Q_2, . . . be independent r.v.s with E(Q_n) = 0, and let S_n = Σ_{i=1}^n Q_i. If:
(1) 0 < b_1 < b_2 < · · · → ∞ are constants and
(2) Σ_{n=1}^∞ E(Q_n²)/b_n² < ∞,
then b_n^{−1} S_n → 0 a.s. as n → ∞.

2.2. Classes of distributions.
Preater [5] showed that when F is exponential and β = 1, Y_k − log k converges almost surely to a Gumbel-distributed random variable. Krieger, Pollak and Samuel-Cahn [4] extended this result in several ways. The asymptotic behavior of other quantities, such as T_k, was obtained, values of β > 1 were considered, and other distributions F, such as the Pareto and Beta, were analyzed.

An interesting question is how the rules behave for other distributions F. This depends on the behavior of the overshoot, X − a | X > a, and its expectation f(a),
f(a) := E(X − a | X > a). (2.1)
Let x_F = sup{x : F(x) < 1}.

Definition 2.1 (See [7], Section 1.1). A distribution function F, belonging to the domain of attraction of the Gumbel extreme value distribution Λ(x) = exp{−e^{−x}}, −∞ < x < ∞, is called a Von Mises function (VM) if there exists x_0 such that for x_0 < x < x_F and some r > 0,
1 − F(x) = r exp{−∫_{x_0}^x [1/f_A(u)] du} := e^{−H(x)}, (2.2)
where f_A(u) > 0 for x_0 < u < x_F and f_A is absolutely continuous on (x_0, x_F) with derivative f′_A(u) and lim_{u↗x_F} f′_A(u) = 0.

Note that Definition 2.1 is identical to definition (1.3) in Section 1.1 of [7], except that we refer to the auxiliary function by f_A instead of f, to distinguish it from the expected overshoot function. To link the two functions, we define g(u) = f(u)/f_A(u). The representation of a VM distribution function F in (2.2) is equivalent to
1 − F(x) = r exp{−∫_{x_0}^x [g(u)/f(u)] du} := e^{−H(x)}. (2.3)
It is shown in [7] that a twice differentiable VM distribution satisfies
lim_{x↗x_F} F″(x)(1 − F(x))/[F′(x)]² = −1. (2.4)
It can be shown that lim_{u↗x_F} g(u) = 1 follows from (2.4).
Let G(x) be defined by
e^{−G(x)} = r exp{−∫_{x_0}^x [1/f(u)] du}. (2.5)
Thus,
f(x) = 1/G′(x). (2.6)
Note that
(d/dx) G^{−1}(x) = f(G^{−1}(x)). (2.7)
It is clear from this definition that H(x) = (1 + o(1))G(x) as x ↗ x_F. We shall consider only such VM for which x_F = ∞. (But see Remark 3.1.) Some of our results hold for all VM distributions, but most of the results pertain to a rich subclass. Specifically:

Definition 2.2. F is a generalized stretched exponential distribution if it is VM with H(x) = cx^α + h(x), where h″(x) exists, c > 0, α > 0,
lim_{x→∞} h(x)/x^α = 0 (2.8)
and
lim_{x→∞} h′(x)/x^{α−1} = 0. (2.9)
This class of distributions is denoted by G_α. By the change of variables y = c^{1/α}x, it suffices in the sequel to consider only c = 1.

The reason for extending the stretched exponential by adding h(x) is to include many of the classical families of distributions, such as the Normal, Gamma, Lognormal and Weibull. For example, the right-hand tail probability of the standard normal behaves like φ(x)/x by Mills' ratio, where φ(x) is the standard normal density. Hence the standard normal belongs to G_2 with h(x) = log(x).
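The Mills-ratio claim for the normal tail can be checked directly. The sketch below is our own numerical illustration (the function names are ours); it compares the exact tail 1 − Φ(x), computed via the complementary error function, with φ(x)/x:

```python
import math

def normal_tail(x: float) -> float:
    """Exact standard normal tail 1 - Phi(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mills_approx(x: float) -> float:
    """Mills-ratio approximation phi(x)/x of the standard normal tail."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return phi / x

for x in (4.0, 6.0, 8.0):
    print(x, normal_tail(x) / mills_approx(x))
```

The classical bounds φ(x)(1/x − 1/x³) < 1 − Φ(x) < φ(x)/x pin the ratio into (1 − 1/x², 1), so it tends to 1 as x → ∞, which is all that membership in G_2 requires.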
3. Average, when β = 1.

In this section we consider the behavior of Y_k, the average after k items are retained, using the better than average rule. The emphasis is on random variables that are generated from a VM distribution. In the first subsection we consider the almost sure behavior, and in the ensuing subsection results for the expectation and variance of Y_k are presented.
Let Z_k = X_{T_k} − Y_{k−1}, the "overshoot" over Y_{k−1}. The results are based on the following relationship:
Y_k = [(k − 1)Y_{k−1} + Y_{k−1} + Z_k]/k = Y_{k−1} + Z_k/k = Y_{k−1} + Z(Y_{k−1})/k, (3.1)
where Z(a) is distributed like X − a | X > a.
The results depend on the expected overshoot f(a) = E[Z(a)]. We shall use the following lemma later, which gives the expected overshoot and squared overshoot for F in G_α for large values of a. Specifically:

Lemma 3.1.
If the underlying distribution is in G_α, α > 0, then
lim_{a→∞} EZ(a)/(a^{1−α}/α) = 1 (3.2)
and
lim_{a→∞} EZ²(a)/(2a^{2(1−α)}/α²) = 1. (3.3)

The proof of the results uses l'Hôpital's rule on E(Z(a)) = ∫_0^∞ (1 − F_{Z(a)}(y)) dy for the expected overshoot and on E(Z²(a)) = 2∫_0^∞ y(1 − F_{Z(a)}(y)) dy.
This result implies that f(a) = (a^{1−α}/α)[1 + o(1)]. In some instances we need a more refined result on the rate, that is, on the o(1) term, which depends on h(x). An easy case, as shown in the proof of Corollary 3.1, is when h(x) = 0, in which case the rate of o(1) is 1/a^α.
In the more general case we want to include h(x). The point of adding h(x) is to extend our results to known distributions such as the Normal. The role that h(x) plays is that it is small relative to x^α. The following lemma provides a handle on the overshoot.

Lemma 3.2. If F ∈ G_α, then
f(a) = [1/H′(a)](1 + O(1/a^α)).
Furthermore, if h′(x)/x^{α−ε−1} goes to 0, where 0 < ε < α, we have that
f(a) = (a^{1−α}/α)[1 + o(1/a^ε)]. (3.4)

These conditions on h(x) and its derivatives are hardly restrictive, as the intent is for h(x) to be small. In particular, if h(x) is x^γ for γ < α, then all of the above conditions hold.
The details of the proofs throughout this and the remaining sections of the paper appear in the Appendix.

3.1. Results on almost sure convergence of the mean.
The main result in this subsection is that under mild conditions Y_k − G^{−1}(log k) converges almost surely to a finite random variable (Theorem 3.2). This is an extension of the result in [5] that Y_k − log k converges a.s. to a Gumbel-distributed random variable when observations are generated from an exponential distribution. Theorem 3.1, which is simpler than Theorem 3.2, considers only the G_α class of distributions. This theorem standardizes Y_k by dividing it by a function of k. Theorem 3.2, however, provides a stronger result, which for the G_α class of distributions is applicable when α ≥ 1.
Theorem 3.1.
If the underlying distribution function is in G_α, where α > 0, and
lim_{x→∞} h′(x)/x^{α−ε−1} = 0 for some ε > 0 (3.5)
and lim_{x→∞} h″(x)/x^{α−2} = 0, then
lim_{k→∞} Y_k/(log k)^{1/α} = lim_{k→∞} Y_k/G^{−1}(log k) = lim_{k→∞} Y_k/H^{−1}(log k) = 1 a.s. (3.6)

The proof considers S_k = (A_k − 1)², where A_k = Y_k/(log k)^{1/α}. Theorem 2.1 is used to show that S_k converges almost surely. We do not believe that the strengthening of condition (2.9) by (3.5) is necessary for the conclusion to hold, though we use it in the proof. We know from Theorem 3.2 that it is not needed for α ≥ 1. Theorem 3.2 shows that Y_k − G^{−1}(log k) converges a.s. to a finite random variable as k → ∞. The conditions for this result are different from those of Theorem 3.1, but distributions in G_α with α ≥ 1 satisfy them without the additional conditions on h made in Theorem 3.1.

Theorem 3.2. Let β = 1 and F be a VM distribution. Then under conditions:
(A) EZ²(a) < a^γ for some 0 < γ < ∞ and all a > a_0, and
(B) f′(a) ≤ 0 for all a ≥ a_0, for some a_0 < ∞,
Y_k − G^{−1}(log k) converges a.s. to a finite random variable as k → ∞.

The core of the proof is to show that [Y_k − G^{−1}(log k)]² converges almost surely, using Theorem 2.1. Conditions (A) and (B) are usually satisfied for F a VM distribution when G(x) increases fast enough. In particular they hold for F ∈ G_α with α ≥ 1. Condition (A) (for α > 0) follows from Lemma 3.1. Condition (B) holds since here f(x) = 1/G′(x) = (1 + o(1)){αx^{α−1}[1 + h′(x)/(αx^{α−1})]}^{−1}, so from (2.9) f(x) is eventually decreasing. The case α = 1 holds when h(x) is increasing. If F has increasing failure rate (IFR), that is, satisfies "new better than used," then condition (B) is satisfied.

Remark 3.1. If x_F < ∞, it is easy to see that lim_{k→∞}[Y_k − G^{−1}(log k)] = 0 a.s. An example F of a VM distribution with x_F < ∞ is 1 − F(x) = e^{1/x} I(x < 0).

Corollary 3.1.
Let F ∈ G_α with α ≥ 1 and h(x) = 0. Then Y_k − (log k)^{1/α} converges a.s. to a finite random variable as k → ∞.

Remark 3.2.
The conclusion of Theorem 3.2 does not hold for all F ∈ G_α, α > 0, thus not for all VM; for example, it fails for H(x) = x^{1/2}. We omit the proof.

3.2. Results on convergence of moments.
Theorem 3.3.
If conditions (A) and (B) given in Theorem 3.2 hold, then there exist constants 0 < b_1, b_2, b_3 < ∞ such that E[Y_k − G^{−1}(log k)] → b_1, E[Y_k − G^{−1}(log k)]² → b_2 and hence Var[Y_k − G^{−1}(log k)] → b_3.
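The normalization of Theorem 3.1 can be watched in simulation for the tail 1 − F(x) = e^{−x^α}. The sketch below is our own illustration (the inverse-transform overshoot sampler and the choices α = 2, k = 20,000 are ours); it iterates the recursion (3.1), sampling the overshoot Z(a) from its conditional tail e^{−[(a+z)^α − a^α]}:

```python
import math
import random

def simulate_Y(k: int, alpha: float, rng: random.Random) -> float:
    """Y_k under the better-than-average rule (beta = 1) for 1 - F(x) = e^{-x^alpha}.

    Uses Y_j = Y_{j-1} + Z(Y_{j-1})/j, with the overshoot Z(a) sampled by
    inverse transform from its conditional tail e^{-((a+z)^alpha - a^alpha)}.
    """
    y = (-math.log(rng.random())) ** (1.0 / alpha)   # first retained score, a draw from F
    for j in range(2, k + 1):
        z = (y ** alpha - math.log(rng.random())) ** (1.0 / alpha) - y
        y += z / j
    return y

rng = random.Random(3)
k = 20_000
ratio = simulate_Y(k, alpha=2.0, rng=rng) / math.log(k) ** 0.5
print(ratio)
```

By (3.6) the printed ratio drifts toward 1 as k grows; at moderate k it is merely of order 1, since the additive limit in Theorem 3.2 is still visible relative to (log k)^{1/α}.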
4. Average, when β > 1.

In this section we consider the behavior of Y_k under the more stringent condition that an observation is retained only if it exceeds β times the previous average, where β > 1. The main result is that Y_k must be standardized by an entirely different quantity, namely k^{β−1}, in order to get a.s. convergence. For F ∈ G_α this standardization is correct for all α > 0. The result depends on the following relationship:
Y_k = Y_{k−1} + (β − 1)Y_{k−1}/k + Z(βY_{k−1})/k. (4.1)
The result concerns B_k = Y_k/k^{β−1}. Let F_k denote the σ-field generated by Y_1, . . . , Y_k. It follows by dividing both sides of (4.1) by k^{β−1} that
E(B_k | F_{k−1}) = B_{k−1}(1 + O(1/k²)) + E(Z(βY_{k−1})/k^β | F_{k−1}) (4.2)
 = B_{k−1}(1 + O(1/k²)) + f(βY_{k−1})/k^β.
Hence if the expected overshoot is bounded, it follows from Theorem 2.1 that B_k converges almost surely. A more refined result appears in the next subsection, followed by remarks on special cases. The section ends with results showing that under some conditions the expected value and variance of B_k also converge.
4.1. Almost sure convergence of the mean.
We first show that B_k converges almost surely under more general conditions in the following:

Theorem 4.1.
Assume F is a VM distribution. Let B_k = Y_k/k^{β−1} and f(x) = E(X − x | X > x).
(i) If f(x) < cx/(log x)^{1+ε}, where c > 0 and ε > 0, then B_k converges a.s. to a nondegenerate positive random variable.
(ii) If B_k converges a.s., f is monotone and lim_{k→∞} E(B_k) < ∞, then for some constant x_0 > 0,
∫_{x_0}^∞ [f(x)/x²] dx < ∞. (4.3)

Remark 4.1. (a) Note that the sufficient condition (i) of Theorem 4.1 can hold also for distributions that are not VM. An example is the Geometric distribution.
(b) Equation (4.3) does not have a β in the expression. Also, under the more restrictive condition of bounded expected overshoot that was used to introduce this section (which led to an easy proof of almost sure convergence of the desired quantity), (4.3) holds.

The following is a general statement about convergence of Y_k for the stretched exponential family of distributions.

Corollary 4.1.
Let F ∈ G_α, α > 0, β > 1. Then there exists a random variable 0 < W_β < ∞ such that Y_k(β)/k^{β−1} → W_β a.s. as k → ∞.

Proof.
Since f(x)/(x^{1−α}/α) → 1, f(x) < cx^{1−α} for some constant c > 0 and all x > x_0, for a suitable choice of x_0. Hence the condition in (i) of Theorem 4.1 holds. □

There exist VM distributions for which B_k fails to converge a.s. to a finite limit. The proposition below provides a general result for when B_k does not converge to a finite limit a.s.

Proposition 4.1.
Let Ψ(a) be an increasing positive function of a such that
∫_{x_0}^∞ [Ψ(x)/x²] dx = ∞. (4.4)
Let B_k = Y_k/k^{β−1}, Z(a) ∼ X − a | X ≥ a, and define Z*(a) = Z(a)/Ψ(a). If there exist a constant a_0 and a nonnegative random variable V, not identically zero, such that for all a ≥ a_0, V is stochastically smaller than Z*(a), then B_k → ∞ a.s. as k → ∞.

Example 4.1.
Let 1 − F_X(x) = e^{−(log x)²/2}, which is easily seen to be a VM distribution. Let Ψ(a) = a/log(a). We shall show that the conditions (and hence the conclusions) of Proposition 4.1 hold for this example.

Proof of Proposition 4.1.
It is immediate that ∫_a^∞ [Ψ(x)/x²] dx = ∞. Furthermore,
1 − F_{Z*(a)}(x) = [1 − F_X(a + xa/log a)]/[1 − F_X(a)]
 = exp{−[log a + log(1 + x/log a)]²/2}/exp{−(log a)²/2}
 = exp{−(log a) log(1 + x/log a) − [log(1 + x/log a)]²/2}
 > exp{−x − x²/(2 log² a)}
 > exp{−x − x²/2}
for all a > e. Hence, if V is such that 1 − F_V(x) = e^{−x−x²/2} (x ≥ 0), then V is stochastically smaller than Z*(a) for all a > e. Furthermore,
lim_{n→∞} Σ_{k=1}^n Ψ(γk^{β−1})/k^β = lim_{n→∞} Σ_{k=1}^n γk^{β−1}/[k^β log(γk^{β−1})] = lim_{n→∞} γ Σ_{k=1}^n [k(log γ + (β − 1) log k)]^{−1} = ∞. □

Note that since here f(x) = [1 + o(1)] x/log x, it follows that one cannot take ε = 0 in Theorem 4.1(i).

4.2. Convergence of moments.
We now turn to showing that the expectation of Y_k, suitably normalized, converges to a finite limit for all random variables that belong to the stretch exponential family. We first consider EB_k and Var B_k in a general setting.

Theorem 4.2.
Under the following three conditions:
(a) Var X < ∞;
(b) f(a) is nonincreasing for a > a_0;
(c) EZ²(a) < ca for some c > 0 and a > a_0;
EB_k and Var B_k converge to a finite limit.
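For the h(x) ≡ 0 member of G_α, conditions (b) and (c) can be probed numerically: the expected overshoot is f(a) = ∫_a^∞ e^{−x^α} dx / e^{−a^α}, which Lemma 3.1 predicts to behave like a^{1−α}/α. The sketch below is our own check (the trapezoid quadrature, truncation width and evaluation points are illustrative choices of ours):

```python
import math

def overshoot_mean(a: float, alpha: float, width: float = 10.0, n: int = 100_000) -> float:
    """Trapezoid-rule value of f(a) = int_a^inf e^{-x^alpha} dx / e^{-a^alpha}.

    The tail integral is truncated at a + width, which is ample here because
    the integrand e^{a^alpha - x^alpha} decays extremely fast.
    """
    h = width / n
    vals = [math.exp(a ** alpha - (a + i * h) ** alpha) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

alpha = 2.0
f5, f10 = overshoot_mean(5.0, alpha), overshoot_mean(10.0, alpha)
print(f5, 5.0 ** (1 - alpha) / alpha)    # computed f(a) vs the a^{1-alpha}/alpha prediction
print(f10, 10.0 ** (1 - alpha) / alpha)
```

For α = 2 the computed values sit within a few percent of a^{1−α}/α and decrease in a, in line with condition (b) for α > 1.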
Remark 4.2.
Condition (a) always holds for nonnegative X with F a VM distribution (see Exercise 1.1.1(a) of [7]). Lemma 3.1 implies that (c) holds for any F ∈ G_α with α ≥ 1/2. Condition (b) holds for all F ∈ G_α, α > 1, as well as for X ∼ Exp(1).

The above theorem does not apply for F ∈ G_α with α ≤ 1. Nevertheless, EB_k converges in this case, as shown in the following theorem.

Theorem 4.3.
Let B_k = Y_k/k^{β−1}. If F ∈ G_α, α > 0 and β > 1, then EB_k converges to a finite limit.

Proof.
By Theorem 4.2 we need only consider the case α ≤ 1. From Lemma 3.1 it follows that for some c and k large enough,
f(βY_{k−1}) < cY_{k−1}^{1−α} = c[(k − 1)^{β−1}B_{k−1}]^{1−α} = c(k − 1)^{(β−1)(1−α)}B_{k−1}^{1−α} ≤ c(k − 1)^{(β−1)(1−α)}[1 + B_{k−1}].
Substituting this into (4.2) yields
E(B_k | F_{k−1}) ≤ B_{k−1}[1 + O(1/k^{min(2, β−(β−1)(1−α))})] + O(1/k^{β−(β−1)(1−α)}). (4.5)
Taking expectations on both sides of (4.5) and using Corollary 2.1 yields the result. □
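The almost sure stabilization of B_k = Y_k/k^{β−1} (Corollary 4.1, Theorem 4.3) is visible in simulation. The sketch below is our own illustration (the sampler and the choices α = 1, β = 1.5, k = 5000 are ours); only accepted scores matter for Y_k, and each accepted score is the cutoff βY_{k−1} plus an overshoot drawn from its conditional tail:

```python
import math
import random

def simulate_B(k: int, alpha: float, beta: float, rng: random.Random):
    """Track B_j = Y_j / j^(beta - 1) for the tail 1 - F(x) = e^{-x^alpha}, beta > 1.

    An accepted score equals a + Z(a) with a = beta * Y_{j-1}; the overshoot
    Z(a) is sampled by inverse transform from its tail e^{-((a+z)^alpha - a^alpha)}.
    """
    total = (-math.log(rng.random())) ** (1.0 / alpha)   # first retained score
    B = [total]
    for j in range(2, k + 1):
        a = beta * total / (j - 1)                       # cutoff beta * Y_{j-1}
        total += (a ** alpha - math.log(rng.random())) ** (1.0 / alpha)  # a + Z(a)
        B.append((total / j) / j ** (beta - 1.0))
    return B

rng = random.Random(7)
B = simulate_B(k=5_000, alpha=1.0, beta=1.5, rng=rng)
print(B[-1])
```

For α = 1 the overshoot is exactly Exp(1) by lack of memory, so this run also matches the exponential case treated in [4]; the trajectory settles down, with only small changes over the last fifth of the run.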
5. Time until k items are kept.

5.1. Discussion of the problem.
In this section we turn to the second quantity of interest, T_k, the number of items that are observed until k items are retained. Unfortunately, it is generally impossible to normalize T_k by a function of k and achieve almost sure convergence to a nondegenerate random variable. Instead we consider the following quantity:
T*_k = T_k / Σ_{j=1}^{k−1} [1 − F(βY_j)]^{−1}, (5.1)
which depends on the averages {Y_j}; its expectation tends to 1. The results are obtained for the G_α, α > 0, distributions and β ≥ 1. When β is relatively small, 1 ≤ β < 1 + 1/(2α), the convergence is almost sure to 1. When β is moderate in size, 1 + 1/(2α) ≤ β < 1 + 1/α, the convergence is to 1, in probability. Finally, if β is large, β ≥ 1 + 1/α, the convergence is in distribution, to an exponential or a sum of conditionally independent exponential random variables with means summing up to 1.

5.2. Almost sure convergence, when β = 1.

Theorem 5.1.
Let β = 1 and X_i ∼ F, where F is G_α, α > 0. Then
T*_k = T_k / Σ_{j=1}^{k−1} [1 − F(Y_j)]^{−1} → 1 almost surely.

The proof uses Theorem 2.2 by conditioning on the responses {Y_k}, letting P_j = 1 − F(Y_{j−1}), b_j = Σ_{i=1}^j P_i^{−1} and Q_i = T_i − T_{i−1} − P_i^{−1}, with T_0 = 0.
Though Theorem 5.1 gives no explicit order of magnitude of the convergence of T_k in terms of k, we get an idea of this magnitude in the following:

Corollary 5.1.
For any δ > 0 and F ∈ G_α, α > 1, β = 1,
lim T_k/k^{2−δ} = ∞ and lim T_k/k^{2+δ} = 0 a.s.
For the exponential distribution, T_k/k² converges a.s. to a limit, as shown in [4].

5.3. Asymptotic results when β > 1.

The focus is on T*_k, the number of observations that are observed until k items are retained, suitably normalized as defined in (5.1). For the sake of clarity, we consider in the continuation only F ∈ G_α, α > 0, with h(x) ≡ 0, that is, H(x) = x^α.

Theorem 5.2.
Let X ∼ F where 1 − F(x) = e^{−x^α} and α > 0. Then as k → ∞:
(i) T*_k → 1 a.s. for 1 < β < 1 + 1/(2α),
(ii) T*_k → 1 in probability for 1 + 1/(2α) ≤ β < 1 + 1/α,
(iii) T*_k → Exp(1) in distribution, and T_k e^{−(βY_{k−1})^α} → Exp(1) in distribution, for β > 1 + 1/α.

The result for β = 1 + 1/α is of a different nature, and hence is treated separately in Theorem 5.3. To prove parts (ii) and (iii) of Theorem 5.2, we compute the limiting generating function of T_k, suitably standardized, and are able to recognize the distribution for which this limit is the generating function. The results then follow from the Continuity theorem. This line of reasoning is also used in proving Theorem 5.3. Note that for U ∼ Geo(p),
Ee^{−tU} = 1/[1 + (1 − e^{−t})/(pe^{−t})].
We ignore the first observation, which adds one to T_k (this will have no effect on the limiting distribution). Hence the resulting random part of T_k (which we refer to as T̃_k) is, conditionally on {Y_j}_{j=1}^∞, the sum of independent geometric random variables with p_j = e^{−(βY_{j−1})^α}. We have, conditionally on {Y_j},
E(e^{−tγ(k)T̃_k}) = Π_{j=2}^k [1 + (1 − e^{−tγ(k)})/e^{−(βY_{j−1})^α − tγ(k)}]^{−1},
where the sequence γ(k) is positive, will be defined as a function of the given {Y_k} according to the need in the proof for each particular instance, but always tends to 0. Thus
log Ee^{−tγ(k)T̃_k} = −Σ_{j=2}^k log(1 + [(1 − e^{−tγ(k)})/(tγ(k))] tγ(k) e^{tγ(k)+(βY_{j−1})^α}) (5.2)
 = −Σ_{j=2}^k log[1 + (1 + o_k(1)) tγ(k) e^{(βY_{j−1})^α}].
For (ii) we let γ(k) = 1/Σ_{j=1}^{k−1} e^{(βY_j)^α} and for (iii) we let γ(k) = e^{−(βY_{k−1})^α}.
We now turn to the case where β = 1 + 1/α, so that β − 1 = 1/α. This is the only case where conditioning on the sequence {Y_k} plays a role in the limiting distribution obtained. We know from Theorem 4.1 that there exists a random variable W, 0 < W < ∞, such that Y_k/k^{1/α} → W a.s. as k → ∞. Our result will be stated in terms of the value of W.

Theorem 5.3.
Let X ∼ F where 1 − F(x) = e^{−x^α}, α > 0 and β = 1 + 1/α. Let W = lim Y_k/k^{1/α}. Then
T*_k → Σ_{j=1}^∞ R_j in distribution as k → ∞,
where, conditionally on W = w, the R_j are independently, exponentially distributed with means µ_j, where
µ_j = [exp((βw)^α) − 1] exp[−j(βw)^α].
Note that the µ_j sum to 1.
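The geometric waiting-time structure used in the proofs lends itself to direct simulation of T*_k without generating the full observation stream. The sketch below is our own illustration (the choices α = 2, β = 1.1, k = 400 are ours and place us in the a.s. regime of Theorem 5.2(i)); waiting times between retentions are Geometric(1 − F(βY_{j−1})), and accepted scores are the cutoff plus an inverse-transform overshoot:

```python
import math
import random

def simulate_T_star(k: int, alpha: float, beta: float, rng: random.Random) -> float:
    """Simulate T*_k of (5.1) for the tail 1 - F(x) = e^{-x^alpha}.

    Conditionally on the averages, the gaps between retention times are
    Geometric(p_j) with p_j = 1 - F(beta * Y_{j-1}); an accepted score is the
    cutoff plus an overshoot sampled by inverse transform.
    """
    total = (-math.log(rng.random())) ** (1.0 / alpha)   # first retained score, T_1 = 1
    t, norm = 1, 0.0
    for j in range(2, k + 1):
        a = beta * total / (j - 1)                       # cutoff beta * Y_{j-1}
        p = math.exp(-a ** alpha)                        # acceptance probability 1 - F(a)
        norm += 1.0 / p                                  # denominator of T*_k
        t += 1 + int(math.log(rng.random()) / math.log1p(-p))   # Geometric(p) gap
        total += (a ** alpha - math.log(rng.random())) ** (1.0 / alpha)  # a + Z(a)
    return t / norm

rng = random.Random(1)
vals = [simulate_T_star(k=400, alpha=2.0, beta=1.1, rng=rng) for _ in range(50)]
print(sum(vals) / len(vals))
```

Averaged over independent runs, the printed value should be near 1, in line with Theorem 5.2(i); for β above 1 + 1/α, single runs would instead scatter like Exp(1), per part (iii).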
6. Concluding remarks.
The present paper extends the results in [4], where the Exponential, Beta and Pareto distributions are considered in detail, to other distributions, including the Normal, Gamma and Weibull. The results on the special distributions considered in [4] are "invertible" in the sense that rates of convergence for Y_k and T_k imply rates of convergence for the number of items that are kept and the average of the items kept after n items are observed. The results obtained for the distributions considered here are in general not invertible in this way.

Preater in [5] considered the behavior of the average of the first k items that are kept, Y_k, when the distribution generating the observations is exponential and β = 1 in the β-better than average rule. He observed that Y_k − log k converges a.s. and in L² to a Gumbel-distributed random variable. The behavior of this quantity for β ≥ 1 for the exponential distribution is considered in [4]: for β = 1, Y_k − log k converges a.s.; when β > 1, Y_k/k^{β−1} converges a.s. In addition, the rate k^{β−1} when β > 1 does not depend on the underlying distribution, whereas the normalization of Y_k when β = 1 depends on the distribution.

There are two interesting mathematical observations. First, it is not surprising that there should be some relationship between the domain of attraction to which the extremal distribution of F belongs and the limiting distribution of Y_k, since the Y_k process will, on the average, select larger and larger items. Preater in [5] shows that Y_k − log k and max{X_1, . . . , X_k} − log k have the exact same limiting Gumbel distribution when the observations are i.i.d. from an exponential distribution (though Y_k converges a.s. and in L² while the maximum converges only in distribution). Will the limiting distributions of Y_k and M_k = max{X_1, . . . , X_k} always agree, or at least have the same rate of convergence? From the general theory of extreme values it follows that
[1/f(H^{−1}(log k))](M_k − H^{−1}(log k)) → U = Gumbel in distribution as k → ∞. (6.1)
This should be compared with our result for β = 1 (under the appropriate conditions of Theorem 3.2),
Y_k − G^{−1}(log k) → some finite random variable a.s. as k → ∞. (6.2)
The "normalization" is the same in (6.1) and (6.2) if and only if f(x) is constant, that is, if and only if X is exponential.

The second interesting mathematical observation is that for the Beta and Pareto distributions, discussed in [4], we get the same kind of a.s. convergence for T_k, after normalization (depending on β), for all β ≥ 1. In the families of distributions considered in the present paper, different kinds of asymptotic convergence hold for different values of β. Specifically, when β is relatively small, the normalized quantity converges almost surely. When β is in the middle range, the convergence is in probability. For large values of β the convergence is in distribution.
APPENDIX
A.1. Proofs for Section 3.
Proof of Lemma 3.1.
Consider (3.3). For any nonnegative random variable Q, EQ² = 2∫_0^∞ y(1 − F_Q(y)) dy. Thus
EZ²(a) = 2∫_a^∞ (x − a)e^{−H(x)} dx / e^{−H(a)}.
So
lim_{a→∞} EZ²(a)/(2a^δ) = lim_{a→∞} ∫_a^∞ (x − a)e^{−(x^α+h(x))} dx / [a^δ e^{−(a^α+h(a))}]
(l'Hôpital) = lim_{a→∞} −∫_a^∞ e^{−(x^α+h(x))} dx / (e^{−(a^α+h(a))}{δa^{δ−1} − [αa^{α−1} + h′(a)]a^δ}) (A.1)
 = lim_{a→∞} ∫_a^∞ e^{−(x^α+h(x))} dx / (e^{−(a^α+h(a))} αa^{α+δ−1}[1 + h′(a)/(αa^{α−1})])
 = lim_{a→∞} ∫_a^∞ e^{−(x^α+h(x))} dx / (e^{−(a^α+h(a))} αa^{α+δ−1})
by (2.9). Using l'Hôpital's rule once more, we get that the value in (A.1) equals
lim_{a→∞} −e^{−(a^α+h(a))} / (αe^{−(a^α+h(a))}{(α + δ − 1)a^{α+δ−2} − [αa^{α−1} + h′(a)]a^{α+δ−1}}) = lim_{a→∞} (1/α²) a^{2−2α−δ}.
Thus if we take δ = 2(1 − α), the above limit is 1/α² and (3.3) follows.
The proof for E(Z(a)) = ∫_a^∞ e^{−H(x)} dx / e^{−H(a)} follows in a similar manner. □

Proof of Lemma 3.2.
Through integration by parts,
f(a) = ∫_a^∞ e^{−H(x)} dx / e^{−H(a)} = ∫_a^∞ [e^{−H(x)}H′(x)/H′(x)] dx / e^{−H(a)}
 = [−e^{−H(x)}/H′(x)]_a^∞ / e^{−H(a)} − ∫_a^∞ e^{−H(x)}H″(x)/[H′(x)]² dx / e^{−H(a)}.
Note that H″(x)/[H′(x)]² tends to 0 for a VM distribution. Now, to get the rate, consider H(x) = x^α + h(x), where lim_{x→∞} h′(x)/x^{α−1} = 0 by (2.9), and assume lim_{x→∞} h″(x)/x^{α−2} = 0. This implies that H″(x)/[H′(x)]² = O(1/x^α). Since the first term is 1/H′(a), using l'Hôpital's rule on the second term yields
f(a) = [1/H′(a)](1 + O(1/a^α)).
Finally, to get the rate at which f(a)/(a^{1−α}/α) goes to 1, we need the rate at which h′(x)/x^{α−1} goes to 0. If we assume that h′(x)/x^{α−ε−1} goes to 0, where 0 < ε < α, we have (3.4). □

Proof of Theorem 3.1.
Let $A_k = Y_k/(\log k)^{1/\alpha}$ and $S_k = (A_k-1)^2$. Note that $\log(k-1)/\log k = 1 - 1/(k\log k) + O(1/k^2)$, thus
$$\left(\frac{\log(k-1)}{\log k}\right)^{1/\alpha} = 1 - \frac{1}{\alpha k\log k} + O\!\left(\frac{1}{k^2}\right).$$
Hence,
$$S_k = \left(A_{k-1}\left(\frac{\log(k-1)}{\log k}\right)^{1/\alpha} + \frac{Z_k}{k(\log k)^{1/\alpha}} - 1\right)^2 = \left[(A_{k-1}-1) - A_{k-1}\left(\frac{1}{\alpha k\log k}+O\!\left(\frac{1}{k^2}\right)\right) + \frac{Z_k}{k(\log k)^{1/\alpha}}\right]^2$$
$$= (A_{k-1}-1)^2 + A_{k-1}^2\left(\frac{1}{\alpha^2k^2(\log k)^2}+O\!\left(\frac{1}{k^3}\right)\right) + \frac{Z_k^2}{k^2(\log k)^{2/\alpha}} + 2(A_{k-1}-1)\left[\frac{Z_k}{k(\log k)^{1/\alpha}} - A_{k-1}\left(\frac{1}{\alpha k\log k}+O\!\left(\frac{1}{k^2}\right)\right)\right] - \frac{2Z_k}{k(\log k)^{1/\alpha}}A_{k-1}\left(\frac{1}{\alpha k\log k}+O\!\left(\frac{1}{k^2}\right)\right).$$
Taking conditional expectations on both sides, using (2.1), we therefore get
$$E(S_k\mid\mathcal F_{k-1}) \le S_{k-1} + 2(S_{k-1}+1)\left(\frac{1}{\alpha^2k^2(\log k)^2}+O\!\left(\frac{1}{k^3}\right)\right) + \frac{E(Z^2(Y_{k-1})\mid\mathcal F_{k-1})}{k^2(\log k)^{2/\alpha}} + 2(A_{k-1}-1)\left[\frac{f(Y_{k-1})}{k(\log k)^{1/\alpha}} - A_{k-1}\left(\frac{1}{\alpha k\log k}+O\!\left(\frac{1}{k^2}\right)\right)\right] \qquad (A.2)$$
since $A_{k-1}^2 \le 2(S_{k-1}+1)$. The first two terms in (A.2) therefore cause no problem in the application of Theorem 2.1 to $S_k$. By Lemma 3.1, for all $k$ sufficiently large,
$$\frac{E(Z^2(Y_{k-1})\mid\mathcal F_{k-1})}{k^2(\log k)^{2/\alpha}} < \frac{2(1+\varepsilon)}{\alpha^2Y_{k-1}^{2(\alpha-1)}k^2(\log k)^{2/\alpha}} = \frac{2(1+\varepsilon)}{\alpha^2A_{k-1}^{2(\alpha-1)}(\log(k-1))^{2(\alpha-1)/\alpha}k^2(\log k)^{2/\alpha}} < \frac{2(1+\varepsilon)^2}{\alpha^2A_{k-1}^{2(\alpha-1)}k^2(\log k)^2} < \frac{2(1+\varepsilon)^2}{\alpha^2}\,\frac{A_{k-1}^2+1}{k^2(\log k)^2} < \frac{2(1+\varepsilon)^2}{\alpha^2}\left[\frac{2S_{k-1}}{k^2(\log k)^2} + \frac{3}{k^2(\log k)^2}\right],$$
so the second term in the last expression is summable, and, again factoring out $S_{k-1}$, the first term is also summable.

It remains to deal with the last term in (A.2). From (3.4),
$$f(Y_{k-1}) = \frac{Y_{k-1}^{1-\alpha}[1+o(1/Y_{k-1}^{\varepsilon})]}{\alpha}.$$
Thus the first term in the square brackets in (A.2) satisfies
$$\frac{f(Y_{k-1})}{k(\log k)^{1/\alpha}} = \frac{Y_{k-1}^{1-\alpha}[1+o(1/Y_{k-1}^{\varepsilon})]}{\alpha k(\log k)^{1/\alpha}} = \frac{A_{k-1}^{1-\alpha}(\log(k-1))^{(1-\alpha)/\alpha}[1+o(1/(A_{k-1}(\log(k-1))^{1/\alpha})^{\varepsilon})]}{\alpha k(\log k)^{1/\alpha}} = \frac{A_{k-1}^{1-\alpha}[1+o(A_{k-1}^{-\varepsilon}/(\log k)^{\omega})]}{\alpha k\log k},$$
where $\omega = \varepsilon/\alpha$.

The last term in (A.2) can therefore be rewritten as
$$-\frac{2(A_{k-1}-1)A_{k-1}[1 - A_{k-1}^{-\alpha} + O(\log k/k) + A_{k-1}^{-\alpha}\,o(A_{k-1}^{-\varepsilon}/(\log k)^{\omega})]}{\alpha k\log k}. \qquad (A.3)$$
We want to study when (A.3) is positive for large $k$. This depends on the term in brackets, which to simplify notation we denote by $R(x)$, where $x = A_{k-1}$ and the dependence of $R(x)$ on $k$ is implicit. Note that for $k$ sufficiently large, $O(\log k/k) < \delta_k \equiv 1/(\log k)^{\omega}$. Also note that the coefficient $\nu_k$ bounding the last term in the brackets satisfies $\nu_k < Y_{k-1}^{-\varepsilon}\delta_k \to 0$ as $k\to\infty$, and that $\nu_k < x^{-\varepsilon}\delta_k$ if $A_{k-1} > x > 0$. Hence when $k$ is sufficiently large, $R_1(x) \le R(x) \le R_2(x)$, where
$$R_1(x) = 1 - x^{-\alpha} - \delta_k - \nu_kx^{-\alpha} \qquad (A.4)$$
and
$$R_2(x) = 1 - x^{-\alpha} + \delta_k + \nu_kx^{-\alpha}. \qquad (A.5)$$
The aim is to show that (A.3) is positive only when $1 - c_1\delta_k \le x \le 1 + c_2\delta_k$ for suitably chosen constants $0 < c_1, c_2 < \infty$. We consider two cases:

(i) Assume $x > 1$. Then (A.3) is positive for the values of $x$ such that $R(x) < 0$. Since $R_1(x) \le R(x)$, the values of $x$ such that $R(x) < 0$, or, equivalently, the values of $x$ such that $x^{\alpha}R(x) < 0$, are contained in the set of $x$ such that $R_1(x) < 0$. It suffices to consider
$$x^{\alpha} - 1 - \delta_kx^{\alpha} - \nu_k < 0. \qquad (A.6)$$
The set of $x$ such that (A.6) holds is equivalent to the set of $x$ such that
$$x < \left(\frac{1+\nu_k}{1-\delta_k}\right)^{1/\alpha} < 1 + c_2\delta_k$$
for $k$ large, for a suitably chosen constant $0 < c_2 < \infty$.

(ii) Assume $x < 1$. Then (A.3) is positive for the values of $x$ such that $R(x) > 0$. Since $R(x) \le R_2(x)$, the values of $x$ such that $R(x) > 0$, or, equivalently, the values of $x$ such that $x^{\alpha}R(x) > 0$, are contained in the set of $x$ such that $R_2(x) > 0$. Hence, we want to consider when
$$x^{\alpha} - 1 + \delta_kx^{\alpha} + \nu_k > 0. \qquad (A.7)$$
Since $\delta_k$ and $\nu_k$ are arbitrarily small for $k$ sufficiently large, (A.7) is equivalent to
$$x > \left(\frac{1-\nu_k}{1+\delta_k}\right)^{1/\alpha} > 1 - c_1\delta_k \qquad (A.8)$$
for $k$ sufficiently large, for a suitably chosen constant $0 < c_1 < \infty$.

The above analysis shows that (A.3) can be bounded from above by zero when $A_{k-1}$ is outside the interval $1 \pm c\delta_k$. When it is inside, (A.3) is bounded by $O(1/[k(\log k)^{1+\omega}])$. Hence the positive part of (A.3) is summable. Thus $S_k$ converges a.s. by Theorem 2.1. If $S_k$ converged to a value different from 0 this would lead to a contradiction, as the sum of the terms in (A.3) would go to minus infinity, while $S_k$ is nonnegative. Hence $A_k$ tends to 1 a.s.

Note that when $H(x) = x^{\alpha}+h(x)$ and (2.8) holds, then necessarily $H^{-1}(x) = x^{1/\alpha} + h^*(x)$, where $h^*(x)/x^{1/\alpha} \to 0$ as $x\to\infty$. Thus $H^{-1}(\log k)/(\log k)^{1/\alpha} \to 1$ as $k\to\infty$. Since $G^{-1}(x) = H^{-1}(x)[1+o(1)]$, the middle term in (3.6) also converges a.s. to 1. □
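Theorem 3.1 can be illustrated numerically. The following is a minimal Monte Carlo sketch, not from the paper, of the better-than-average rule ($\beta = 1$) for the tail $1-F(x) = e^{-x^{\alpha}}$, $x \ge 0$. To avoid scanning the many rejected items, only accepted items are simulated: given the current average $a$, the admitted value is $a + Z$, where $P(Z > z) = e^{-((a+z)^{\alpha}-a^{\alpha})}$ can be sampled by inversion.

```python
import math, random

def better_than_average(k_target, alpha, seed=0):
    # Better-than-average rule for 1 - F(x) = exp(-x^alpha), x >= 0.
    # Given threshold a, the overshoot satisfies P(Z > z) = exp(-((a+z)^alpha - a^alpha)),
    # so Z = (a^alpha - log U)^(1/alpha) - a with U uniform on (0, 1].
    rng = random.Random(seed)
    total = (-math.log(1.0 - rng.random())) ** (1.0 / alpha)  # first item, unconditional
    k = 1
    while k < k_target:
        a = total / k                                           # current average Y_k
        z = (a ** alpha - math.log(1.0 - rng.random())) ** (1.0 / alpha) - a
        total += a + z                                          # admit a + z
        k += 1
    return total / k

alpha, k = 2.0, 100000
print(better_than_average(k, alpha) / math.log(k) ** (1.0 / alpha))  # Theorem 3.1: near 1
```

Convergence is at a $\log k$ scale, so the ratio is only moderately close to 1 even for $k = 10^5$.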
Proof of Theorem 3.2.
The first step in the proof is to show that $(Y_k - G^{-1}(\log k))^2$ converges a.s. to a finite random variable as $k\to\infty$. Since $Y_k\to\infty$, there will be a (possibly random) $k_0$ such that for all $k > k_0$ everything written below holds. Consider $k > k_0$ only. Let $c_k = G^{-1}(\log k)$. Then, by (2.7) and the boundedness of $f$,
$$c_k - c_{k-1} = (\log k - \log(k-1))[G^{-1}(u_k)]' = -\log\!\left(1-\frac{1}{k}\right)f(G^{-1}(u_k)) = \frac{f(G^{-1}(u_k))}{k} + O\!\left(\frac{1}{k^2}\right), \qquad (A.9)$$
where the $O(1/k^2)$ term is positive and
$$\log(k-1) \le u_k \le \log k. \qquad (A.10)$$
Note that the last equality in (A.9) follows since $f$ is bounded, by condition (B). Now write
$$(Y_k-c_k)^2 = \left[(Y_{k-1}-c_{k-1}) + \frac{Z(Y_{k-1})}{k} + (c_{k-1}-c_k)\right]^2 = (Y_{k-1}-c_{k-1})^2 + \frac{Z^2(Y_{k-1})}{k^2} + (c_{k-1}-c_k)^2 + \frac{2Z(Y_{k-1})}{k}(c_{k-1}-c_k) + 2(Y_{k-1}-c_{k-1})\left[\frac{Z(Y_{k-1})}{k} + (c_{k-1}-c_k)\right].$$
Taking conditional expectation, conditional on $\mathcal F_{k-1}$, yields
$$E[(Y_k-c_k)^2\mid\mathcal F_{k-1}] = \underbrace{(Y_{k-1}-c_{k-1})^2}_{\rm (i)} + \underbrace{\frac{E[Z^2(Y_{k-1})\mid\mathcal F_{k-1}]}{k^2}}_{\rm (ii)} + \underbrace{(c_{k-1}-c_k)^2}_{\rm (iii)} + \underbrace{\frac{2f(Y_{k-1})}{k}(c_{k-1}-c_k)}_{\rm (iv)} + \underbrace{2(Y_{k-1}-c_{k-1})\left[\frac{f(Y_{k-1})}{k}+(c_{k-1}-c_k)\right]}_{\rm (v)}. \qquad (A.11)$$
We shall show that the conditions for Theorem 2.1 hold. We shall examine each term in (A.11) separately. We first show that for any $\omega > 0$,
$$Y_k/k^{\omega} \to 0 \qquad\text{as } k\to\infty \text{ a.s.} \qquad (A.12)$$
Let $W_k(\omega) = Y_k/k^{\omega}$. Then clearly $W_k(\omega) > 0$ and
$$E[W_k(\omega)\mid\mathcal F_{k-1}] = \left(\frac{k-1}{k}\right)^{\omega}W_{k-1}(\omega) + \frac{f(Y_{k-1})}{k^{\omega+1}} < W_{k-1}(\omega) + \frac{B}{k^{\omega+1}},$$
since $f$ is bounded (where we have denoted its bound by $B$). It follows that $W_k(\omega)$ converges a.s. to a finite limit, $L(\omega) \ge 0$. Then also $W_k(\omega/2) \to L(\omega/2)$ a.s. But $W_k(\omega) = W_k(\omega/2)/k^{\omega/2}$, thus the limit must be 0 for all $\omega$.

Now consider term (ii) of (A.11). By condition (A) and (A.12), for all $k$ sufficiently large,
$$\frac{E[Z^2(Y_{k-1})\mid\mathcal F_{k-1}]}{k^2} < \frac{Y_{k-1}^{\gamma}}{k^2} < \frac{\varepsilon k^{\omega\gamma}}{k^2} \qquad\text{a.s.} \qquad (A.13)$$
Choose $\omega < 1/\gamma$ and write $1-\omega\gamma = \delta$. The rightmost expression in (A.13) is then $\varepsilon/k^{1+\delta}$, which clearly is summable.

Term (iii) is summable by (A.9) and the boundedness of $f$.

Term (iv) is negative, and hence causes no problem.

Term (v): note first that by (A.9),
$$\frac{f(Y_{k-1})}{k} + (c_{k-1}-c_k) = \frac{f(Y_{k-1}) - f(G^{-1}(u_k))}{k} + O\!\left(\frac{1}{k^2}\right) = \frac{(Y_{k-1}-G^{-1}(u_k))f'(d_k)}{k} + O\!\left(\frac{1}{k^2}\right), \qquad (A.14)$$
where $d_k$ is a value between $Y_{k-1}$ and $G^{-1}(u_k)$. Since $G^{-1}$ is increasing, it follows from (A.10) that
$$c_{k-1} \le G^{-1}(u_k) \le c_k. \qquad (A.15)$$
Consider two cases:

(a) $Y_{k-1}-c_{k-1} \le 0$. Then by (A.15) also $Y_{k-1}-G^{-1}(u_k) \le 0$, and the $O(1/k^2)$ term is positive, so (v) causes no problem.

(b) $Y_{k-1}-c_{k-1} > 0$. If also $Y_{k-1}-G^{-1}(u_k) \ge 0$, the previous argument goes through, except that we still must show that $2(Y_{k-1}-c_{k-1})I(Y_{k-1}-c_{k-1} > 0)O(1/k^2)$ is summable. Now write $(Y_{k-1}-c_{k-1})I(Y_{k-1}-c_{k-1} > 0) < (Y_{k-1}-c_{k-1})^2 + 1$. Thus
$$(Y_{k-1}-c_{k-1})I(Y_{k-1}-c_{k-1} > 0)/k^2 \le \frac{(Y_{k-1}-c_{k-1})^2}{k^2} + \frac{1}{k^2}. \qquad (A.16)$$
The first term on the right-hand side of (A.16) can be combined with (i) in (A.11), and the second is clearly summable.

Now suppose $Y_{k-1}-G^{-1}(u_k) < 0 < Y_{k-1}-c_{k-1}$. Then
$$c_{k-1} < Y_{k-1} < G^{-1}(u_k) \le c_k. \qquad (A.17)$$
Since both $|Y_{k-1}-c_{k-1}|$ and $|Y_{k-1}-G^{-1}(u_k)|$ are less than $c_k-c_{k-1}$, it follows from (A.9) that (v) is summable.

It follows that in all cases we can write $E[(Y_k-c_k)^2\mid\mathcal F_{k-1}] \le (Y_{k-1}-c_{k-1})^2(1+B_{k-1}) + D_{k-1} - V_{k-1}$, where $B_k$, $D_k$ and $V_k$ are nonnegative random variables, and $B_k$ and $D_k$ are summable. Thus by Theorem 2.1,
$$(Y_k-c_k)^2 \xrightarrow{k\to\infty} W \qquad\text{a.s.}, \qquad (A.18)$$
where $0 \le W < \infty$ is a random variable. Thus $|Y_k-c_k| \to \sqrt W$ a.s. as $k\to\infty$.

It remains to show that when $W \ne 0$, $Y_k-c_k$ cannot jump between $\sqrt W$ and $-\sqrt W$ an infinite number of times. It will then follow that the limit exists and is either $\sqrt W$ or $-\sqrt W$. Recall that $Y_k - Y_{k-1} = Z_k/k$, and that by (A.9), $0 < c_k-c_{k-1} < \gamma_0/k$ for some $\gamma_0 > 0$. Also,
$$P\{Y_k-Y_{k-1} > \varepsilon\} = P\left\{\frac{Z_k}{k} > \varepsilon\right\} = P\left\{\frac{Z_k^2}{k^2} > \varepsilon^2\right\} \le \frac{C_{\varepsilon}}{k^{1+\delta}}.$$
Thus by the Borel–Cantelli lemma, $P\{Y_k-Y_{k-1} > \varepsilon\text{ infinitely often}\} = 0$. This implies $P\{|(Y_k-c_k)-(Y_{k-1}-c_{k-1})| > \varepsilon\text{ infinitely often}\} = 0$. Thus, if $\sqrt W > \varepsilon$, $Y_k-c_k$ cannot jump between $\sqrt W$ and $-\sqrt W$ an infinite number of times; that is, $Y_k-c_k$ will converge a.s. to $\sqrt W$ or $-\sqrt W$. Since for $W > 0$ one can choose $\varepsilon$ with $\sqrt W - \varepsilon > 0$, it follows that $Y_k-c_k$ converges. Clearly on the set where $\{W = 0\}$ the statement $(Y_k-c_k)^2 \to 0$ implies $Y_k-c_k \to 0$. □

Proof of Corollary 3.1.
The expected overshoot given $X > a$ is
$$f(a) = \frac{\int_a^\infty e^{-x^{\alpha}}\,dx}{e^{-a^{\alpha}}} = \frac{1}{\alpha}\,\frac{\int_{a^{\alpha}}^\infty y^{1/\alpha-1}e^{-y}\,dy}{e^{-a^{\alpha}}}.$$
The right-hand side follows by the change of variables $y = x^{\alpha}$. But in Abramowitz and Stegun [1], page 263,
$$\frac{\int_x^\infty t^{\nu-1}e^{-t}\,dt}{e^{-x}} = x^{\nu-1}\left[1 + \frac{\nu-1}{x} + O\!\left(\frac{1}{x^2}\right)\right] \qquad\text{as } x\to\infty.$$
This implies that
$$f(a) = \frac{1}{\alpha}a^{1-\alpha}\left[1 + \frac{1/\alpha-1}{a^{\alpha}} + O\!\left(\frac{1}{a^{2\alpha}}\right)\right] \qquad\text{as } a\to\infty.$$
Equation (2.6) implies
$$G'(a) = \frac{1}{f(a)} = \alpha a^{\alpha-1}\left[1 - \frac{1/\alpha-1}{a^{\alpha}} + O\!\left(\frac{1}{a^{2\alpha}}\right)\right] \qquad\text{as } a\to\infty.$$
Integrating both sides results in $G(a) = a^{\alpha} + (\alpha-1)\log a + O(1)$ for large $a$, since the remainder term $O(a^{-(\alpha+1)})$ has finite integral.

Now for any $\varepsilon > 0$, if $a$ is sufficiently large, then $G(a^{1/\alpha}+\varepsilon) = (a^{1/\alpha}+\varepsilon)^{\alpha} + (\alpha-1)\log(a^{1/\alpha}+\varepsilon) + O(1)$. But $(a^{1/\alpha}+\varepsilon)^{\alpha} = a\{1+\varepsilon a^{-1/\alpha}\}^{\alpha} = a + \alpha\varepsilon a^{1-1/\alpha} + \text{smaller order terms}$. Hence if $\alpha > 1$ and $a$ is sufficiently large, then $G(a^{1/\alpha}+\varepsilon) > a$. A similar argument shows that if $\alpha > 1$ and $a$ is sufficiently large, then $G(a^{1/\alpha}-\varepsilon) < a$. Therefore $\lim_{a\to\infty}[G^{-1}(a) - a^{1/\alpha}] = 0$. □

Proof of Theorem 3.3.
Upon taking expectations on both sides of (A.11) we obtain
$$E[(Y_k-c_k)^2] = \underbrace{E[(Y_{k-1}-c_{k-1})^2]}_{\rm (i)} + \underbrace{E\!\left(\frac{E[Z^2(Y_{k-1})\mid\mathcal F_{k-1}]}{k^2}\right)}_{\rm (ii)} + \underbrace{(c_{k-1}-c_k)^2}_{\rm (iii)} + \underbrace{\frac{2E[f(Y_{k-1})]}{k}(c_{k-1}-c_k)}_{\rm (iv)} + \underbrace{2E\!\left((Y_{k-1}-c_{k-1})\left[\frac{f(Y_{k-1})}{k}+(c_{k-1}-c_k)\right]\right)}_{\rm (v)}. \qquad (A.19)$$
All we need to do is subtract $E[(Y_{k-1}-c_{k-1})^2]$ on both sides and sum. If a term remaining on the right-hand side is positive then we need to show that it is summable. If a term is negative it must be summable, as the term on the left-hand side is nonnegative. Hence we see that terms (ii), (iii) and (iv) cause no trouble. The only term of concern is (v). But the expectation (integral over the density of $Y_{k-1}$) can be divided into an integral over three regions: (i) $Y_{k-1} \le c_{k-1}$, (ii) $Y_{k-1} \ge G^{-1}(u_k)$ and (iii) $c_{k-1} < Y_{k-1} < G^{-1}(u_k)$. As in the proof of Theorem 3.2, the integrand for regions (i) and (ii) is negative, and over the third region it is positive but can be dealt with in the same way as in the proof of Theorem 3.2, by use of Corollary 2.1. The last two statements of the theorem follow. □

A.2. Proofs for Section 4.
Proof of Theorem 4.1.
Proof of (i).
$$E(B_k\mid\mathcal F_{k-1}) = B_{k-1}\left[\left(\frac{k-1}{k}\right)^{\beta-1}\left(1+\frac{\beta-1}{k}\right)\right] + \frac{f(\beta Y_{k-1})}{k^{\beta}}. \qquad (A.20)$$
Thus
$$E(B_k\mid\mathcal F_{k-1}) = B_{k-1}\left(1+O\!\left(\frac{1}{k^2}\right)\right) + \frac{f(\beta(k-1)^{\beta-1}B_{k-1})}{k^{\beta}}, \qquad (A.21)$$
where $O(1/k^2) > 0$. Thus $E(B_k\mid\mathcal F_{k-1}) > B_{k-1}(1+O(1/k^2))$, which implies that $B_k$ converges, to a finite or infinite limit.

Suppose first that the limit is infinite. Then there exist $k_0$ and $D > 1/\beta$ such that for all $k > k_0$, $B_{k-1} > D$. But then, since by assumption $f(y) \le cy/(\log y)^{1+\varepsilon}$ for all large $y$, for $k > k_0$,
$$\frac{f(\beta(k-1)^{\beta-1}B_{k-1})}{k^{\beta}} < \frac{B_{k-1}c\beta(k-1)^{\beta-1}}{k^{\beta}[\log(\beta B_{k-1})+(\beta-1)\log(k-1)]^{1+\varepsilon}} < \frac{B_{k-1}c\beta}{k[\log(\beta D)+(\beta-1)\log(k-1)]^{1+\varepsilon}} < \frac{B_{k-1}c\beta}{(\beta-1)^{1+\varepsilon}k[\log(k-1)]^{1+\varepsilon}}.$$
But the term multiplying $B_{k-1}$ on the right is summable, which implies that (A.21) satisfies the condition of Theorem 2.1, and hence $B_k$ converges to a finite limit. This contradiction implies that $B_k$ converges to a finite r.v. a.s.

Proof of (ii). Now suppose that $B_k$ converges a.s. and $\lim EB_k < \infty$. It follows from (4.1), and as in (A.20), that $B_k$ can be written as
$$B_k = B_{k-1}\left[1+O\!\left(\frac{1}{k^2}\right)\right] + \frac{Z(\beta Y_{k-1})}{k^{\beta}},$$
where $O(1/k^2)$ is positive. It follows that $B_k > B_{k-1}$, so that the limit is positive. Since the support of the observations is not bounded, in a different realization one could obtain a higher value. Hence the limit is a nondegenerate positive random variable. Set $B_1 = 0$. Then
$$B_k = \sum_{j=2}^k(B_j - B_{j-1}) = O(1)\sum_{j=2}^k\frac{B_{j-1}}{j^2} + \sum_{j=2}^k\frac{Z(\beta Y_{j-1})}{j^{\beta}}. \qquad (A.22)$$
Since $B_k$ converges a.s., $\lim B_k$ exists and is finite a.s. Taking expectations and limits as $k\to\infty$ on both sides of (A.22), and noting that $EB_k$ is assumed to be bounded, implies that $\sum_{j=2}^{\infty}Z(\beta Y_{j-1})/j^{\beta} < \infty$. This in turn implies $\sum_{j=2}^{\infty}f(\beta Y_{j-1})/j^{\beta} < \infty$. Since $B_k$ converges a.s. to a random variable $W_{\beta}$, for $0 < \varepsilon < W_{\beta}$ and a (random) $k_0$ we have, for all $k-1 > k_0$,
$$(W_{\beta}-\varepsilon)(k-1)^{\beta-1} < Y_{k-1} < (W_{\beta}+\varepsilon)(k-1)^{\beta-1}.$$
If $f$ is increasing,
$$\infty > \sum_{k=k_0}^{\infty}\frac{f(\beta Y_{k-1})}{k^{\beta}} > \sum_{k=k_0}^{\infty}\frac{f(\beta(W_{\beta}-\varepsilon)(k-1)^{\beta-1})}{k^{\beta}} > \left(\frac{1}{2}\right)^{\beta}\sum_{k=k_0}^{\infty}\frac{f(A(k-1)^{\beta-1})}{(k-1)^{\beta}}, \qquad (A.23)$$
where the last inequality follows since $((k-1)/k)^{\beta} > (1/2)^{\beta}$, and where we have written $A = \beta(W_{\beta}-\varepsilon)$.

Finally,
$$\sum_{k=k_0}^{\infty}\frac{f(A(k-1)^{\beta-1})}{(k-1)^{\beta}} > \int_{k_0}^{\infty}\frac{f(Ax^{\beta-1})}{(x+1)^{\beta}}\,dx > \left(\frac{k_0-1}{k_0}\right)^{\beta}\int_{k_0}^{\infty}\frac{f(Ax^{\beta-1})}{x^{\beta}}\,dx.$$
By the change of variable $y = Ax^{\beta-1}$ the integral on the right-hand side becomes $\frac{A}{\beta-1}\int_{Ak_0^{\beta-1}}^{\infty}\frac{f(y)}{y^2}\,dy$. This integral is therefore finite by (A.23). □

Proof of Proposition 4.1.
In a manner similar to the end of the proof of Theorem 4.1, it can be shown that if $\int_C^{\infty}\Psi(y)/y^2\,dy$ diverges, then $\lim_{n\to\infty}\sum_{k=k_0}^n\Psi(\gamma k^{\beta-1})/k^{\beta}$ also diverges.

Note that
$$B_k = B_{k-1}\left[1+O\!\left(\frac{1}{k^2}\right)\right] + \frac{Z^*(\beta Y_{k-1})}{k^{\beta}\Psi(\beta Y_{k-1})}. \qquad (A.24)$$
Let $F_k$ be the c.d.f. of $Z^*(\beta Y_{k-1})$ conditional on $Y_{k-1}$. Let $F_V$ be the c.d.f. of $V$. Let $U_1, U_2, \ldots \sim U[0,1]$ i.i.d. Define $V_k = F_V^{-1}(U_k)$ (so the $V_i$ are i.i.d. with c.d.f. $F_V$). Clearly, $V_k \le F_k^{-1}(U_k)$ conditional on $Y_{k-1}$ once $\beta Y_{k-1} \ge a_0$ (which will happen with probability 1).

It follows that one can imbed the sequence $Y_1, Y_2, \ldots$ in a probability space where $V_1, V_2, \ldots$ are i.i.d. with c.d.f. $F_V$ and $V_i \le Z^*(\beta Y_{i-1})$ for all $i$ such that $\beta Y_{i-1} \ge a_0$. Define $V_i^* = cI(V_i > c)$ for some $c$ such that $P(V_i > c) > 0$. Clearly, $V_i^* \le Z^*(\beta Y_{i-1})$. Note that $V_i^*$ is $c$ times a Bernoulli random variable. Now
$$\frac{Z^*(\beta Y_{k-1})}{k^{\beta}\Psi(\beta Y_{k-1})} \ge \frac{V_k^*}{\Psi(\beta Y_{k-1})k^{\beta}}. \qquad (A.25)$$
Recall that $Y_k \ge \frac{k-1+\beta}{k}Y_{k-1}$, so that for a constant $a_1$ that is independent of $Y_1$ and $k$,
$$Y_k \ge Y_1\prod_{j=2}^k\frac{j-1+\beta}{j} \ge Y_1a_1k^{\beta-1}. \qquad (A.26)$$
Hence (for $k$ such that $\beta Y_{k-1} \ge a_0$), since $\Psi(a)$ increases in $a$,
$$\frac{Z^*(\beta Y_{k-1})}{k^{\beta}\Psi(\beta Y_{k-1})} \ge \frac{V_k^*}{\Psi(\beta Y_1a_1(k-1)^{\beta-1})k^{\beta}}. \qquad (A.27)$$
Finally, condition on $Y_1$ and denote
$$c_k = \frac{1}{\Psi(\beta Y_1a_1(k-1)^{\beta-1})k^{\beta}}.$$
By (4.4) and what we showed above, $\lim_{n\to\infty}\sum_{k=1}^nc_k = \infty$. It is a straightforward application of Kolmogorov's three-series theorem (cf. Feller [2], page 317) that
$$\lim_{n\to\infty}\sum_{k=1}^nV_k^*c_k = \infty \qquad\text{a.s.} \qquad (A.28)$$
Putting (A.27) and (A.28) together obtains that $Z^*(\beta Y_{k-1})/(k^{\beta}\Psi(\beta Y_{k-1}))$ is not summable. This and (A.24) imply that $\lim_{k\to\infty}B_k = \infty$ a.s. □

Proof of Theorem 4.2.
$EB_k$ converges to a finite limit by (4.2), since $f$ is bounded by assumption (b). Further,
$$\operatorname{Var}B_k = \operatorname{Var}\!\left(\frac{k-1+\beta}{k^{\beta}}Y_{k-1} + \frac{Z(\beta Y_{k-1})}{k^{\beta}}\right) = \left[\frac{(k-1+\beta)^2(k-1)^{2(\beta-1)}}{k^{2\beta}}\right]\operatorname{Var}B_{k-1} + \frac{\operatorname{Var}(Z(\beta Y_{k-1}))}{k^{2\beta}} + \frac{2(k-1+\beta)}{k^{2\beta}}\operatorname{Cov}(Y_{k-1},Z(\beta Y_{k-1})). \qquad (A.29)$$
We shall treat each of the three terms in (A.29) separately.

(i) It is easily seen (by taking logarithms) that the value in the square brackets is $1+O(1/k^2)$.

(ii) From condition (c) and the convergence of $EB_k$ to a finite limit,
$$\frac{\operatorname{Var}(Z(\beta Y_{k-1}))}{k^{2\beta}} < \frac{EZ^2(\beta Y_{k-1})}{k^{2\beta}} < \frac{c\beta EY_{k-1}}{k^{2\beta}} < \frac{c\beta(\lim EB_k+\varepsilon)}{k^{\beta+1}}.$$
Thus the second term on the right-hand side of (A.29) is summable.

(iii) We now show that the third term on the right-hand side of (A.29) is negative or 0:
$$\operatorname{Cov}(Y_{k-1},Z(\beta Y_{k-1})) = E(Y_{k-1}Z(\beta Y_{k-1})) - E(Y_{k-1})E(Z(\beta Y_{k-1})) = E[Y_{k-1}E(Z(\beta Y_{k-1})\mid\mathcal F_{k-1})] - E(Y_{k-1})E[E(Z(\beta Y_{k-1})\mid\mathcal F_{k-1})] = \frac{1}{\beta}E[\beta Y_{k-1}f(\beta Y_{k-1})] - \frac{1}{\beta}E(\beta Y_{k-1})Ef(\beta Y_{k-1}) = \frac{1}{\beta}\operatorname{Cov}(\beta Y_{k-1},f(\beta Y_{k-1})) \le 0,$$
where the last inequality follows from (b). It follows that (A.29) satisfies the condition in Corollary 2.1 with $z_n = \operatorname{Var}B_n$, and the result follows. □
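The convergence of $B_k = Y_k/k^{\beta-1}$ asserted in Theorem 4.1 can be illustrated by simulation. The following is a minimal sketch, not from the paper, again assuming the pure tail $1-F(x) = e^{-x^{\alpha}}$ and sampling only the accepted items (the overshoot above a threshold $a$ is drawn by inversion from $P(Z > z) = e^{-((a+z)^{\alpha}-a^{\alpha})}$):

```python
import math, random

def beta_rule_B(k_target, alpha, beta, seed=1):
    # beta-better-than-average rule: the k-th admitted item is beta*Y_{k-1} + Z,
    # with Z the overshoot above the threshold a = beta * (current average).
    rng = random.Random(seed)
    total = (-math.log(1.0 - rng.random())) ** (1.0 / alpha)  # first item
    k, B = 1, []
    while k < k_target:
        a = beta * total / k
        z = (a ** alpha - math.log(1.0 - rng.random())) ** (1.0 / alpha) - a
        total += a + z
        k += 1
        B.append((total / k) / k ** (beta - 1.0))  # B_k = Y_k / k^(beta-1)
    return B

B = beta_rule_B(20000, 2.0, 1.25)
print(B[999], B[-1])  # the normalized average settles down to a positive limit
```

Because the overshoots enter only through $Z_k/k^{\beta}$, the normalized sequence stabilizes quickly, which is what Theorem 4.1(i) asserts.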
Proof of Theorem 5.1.
Let P j = 1 − F ( Y j − ). We shall use Theorem2.2 conditionally on the sequence { Y k } . Let b j = P ji =1 P − i and Q i = T i − T i − − P − i with T ≡
0. Obviously, the sequence { b j } ∞ j =1 satisfies the firstcondition of Theorem 2.2.Note that conditional on the sequence { P j } the distribution of T i − T i − isGeometric ( P i ) and these differences are conditionally independent of eachother. Hence { Q n } ∞ n =1 is a sequence of conditionally independent randomvariables with zero expectation and variance (1 − P n ) /P n . We shall showthat the second condition of Theorem 2.2 holds ∞ X n =1 E ( Q n /b n ) = ∞ X n =1 − P n P n (cid:30) n X j =1 P − j ! < ∞ X n =1 P n (cid:30) n X j =1 P − j ! . It therefore suffices to show that for all n ≥ n n X j =0 P n +1 P j +1 ≥ An / log n (A.30)for some A >
0. We shall actually show that for any 0 < ε < / j such that for all n ≥ j ≥ j P n +1 P j +1 > j − ε n ε . (A.31)From (A.31) it is immediate that (A.30) holds, since n X j =0 P n +1 P j +1 > n X j = j P n +1 P j +1 ≥ n ε n X j = j j − ε ≥ D n − ε − j − ε n ε > An − ε . Note that for H ( x ) = x α + h ( x ) for α > h that satisfies (2.8), we haveby Theorem 3.1, Y j = (log j ) /α (1 + ε j ) with ε j j →∞ −→
0. Thus H ( Y j ) = (log j )(1 + ε j ) α (cid:20) h ((log j ) /α (1 + ε j ))(log j )(1 + ε j ) α (cid:21) A. M. KRIEGER, M. POLLAK AND E. SAMUEL-CAHN and since h ( x ) /x α x →∞ −→ ε > j suchthat for all j > j (1 + ε ) log j > H ( Y j ) > (1 − ε ) log j, which implies, since [1 − F ( Y j )] − = exp H ( Y j ), that j ε > [1 − F ( Y j )] − > j − ε . (A.32)Thus (A.31) follows.Note that here S n of Theorem 2.2 equals T n − P ni =1 P − i , thus b − n S n → T ∗ n − → { Y k } , it holds unconditionally. (cid:3) Proof of Corollary 5.1.
In (A.32) take any $\varepsilon > 0$. Hence for some positive constants $c_1, c_2, c_1^*, c_2^*$ and all $k$ large enough,
$$c_2^*k^{2+\varepsilon} > c_2\sum_{j=1}^kj^{1+\varepsilon} > \sum_{j=1}^k[1-F(Y_j)]^{-1} > c_1\sum_{j=1}^kj^{1-\varepsilon} > c_1^*k^{2-\varepsilon}.$$
Since $T_k/\sum_{j=1}^k[1-F(Y_j)]^{-1} \to 1$ a.s., for $k$ large enough and $c^{**}$ a positive constant,
$$\frac{T_k}{k^{2-\delta}} = \frac{T_k}{\sum_{j=1}^k[1-F(Y_j)]^{-1}}\cdot\frac{\sum_{j=1}^k[1-F(Y_j)]^{-1}}{k^{2-\delta}} > c^{**}k^{\delta-\varepsilon} \to \infty \qquad\text{a.s. if } \delta > \varepsilon.$$
The proof for $T_k/k^{2+\delta} \to 0$ follows in a similar manner. □

Proof of Theorem 5.2.

Proof of (i). We shall (again) use Theorem 2.2 and show (A.30), where now $P_j = 1-F(\beta Y_{j-1})$. Assume that $k_0$ (random) is such that for all $k \ge k_0$, $Z_k < \gamma Y_{k-1}$. Such a $k_0$ exists with probability one by Lemma A.1. Then for $k > k_0$,
$$Y_k = Y_{k-1} + \frac{Z_k+(\beta-1)Y_{k-1}}{k} \le Y_{k-1}\left(1+\frac{\gamma+\beta-1}{k}\right).$$
Thus
$$Y_k^{\alpha} \le Y_{k-1}^{\alpha}\left(1+\frac{\gamma+\beta-1}{k}\right)^{\alpha} \le Y_{k-1}^{\alpha}\left(1+\frac{d}{k}\right), \qquad (A.33)$$
where $d = (\gamma+\beta-1)\rho_{\alpha}^u$, and $\rho_{\alpha}^u$ (and, for later purposes, $\rho_{\alpha}^l$) is defined by
$$1+\rho_{\alpha}^lx \le (1+x)^{\alpha} \le 1+\rho_{\alpha}^ux \qquad\text{for } 0 \le x \le 1, \qquad (A.34)$$
where $\rho_{\alpha}^l = \alpha$ and $\rho_{\alpha}^u = 2^{\alpha}-1$ when $\alpha \ge 1$, while $\rho_{\alpha}^l = 2^{\alpha}-1$ and $\rho_{\alpha}^u = \alpha$ when $\alpha < 1$. We have used the inequality $(1+x)^{\alpha} \le 1+(2^{\alpha}-1)x$, valid for all $\alpha \ge 1$ and $0 \le x \le 1$. We can therefore write, using (A.33),
$$\frac{P_{k+1}}{P_k} = \exp\{-\beta^{\alpha}(Y_k^{\alpha}-Y_{k-1}^{\alpha})\} \ge \exp\left\{-\beta^{\alpha}\frac{d}{k}Y_{k-1}^{\alpha}\right\}. \qquad (A.35)$$
Now let $k_1 \ge k_0$ be so large that for all $k \ge k_1$, $Y_k < (W+\varepsilon)k^{\beta-1}$, which exists by Theorem 4.1. Then we can continue the inequality in (A.35) by
$$\frac{P_{k+1}}{P_k} > \exp\left\{-\beta^{\alpha}\frac{d}{k}(W+\varepsilon)^{\alpha}k^{\alpha(\beta-1)}\right\} = \exp\{-Bk^{\alpha(\beta-1)-1}\}.$$
To simplify notation let
$$\tau = \alpha(\beta-1)-1, \qquad (A.36)$$
thus $\tau > -1$. For $n \ge j > k_1$ we have
$$\frac{P_{n+1}}{P_{j+1}} = \prod_{k=j+1}^n\frac{P_{k+1}}{P_k} > \exp\left\{-B\sum_{k=j+1}^nk^{\tau}\right\} > \exp\left\{-\frac{B}{\tau+1}(n^{\tau+1}-(j+1)^{\tau+1})\right\}.$$
Thus
$$\sum_{j=1}^n\frac{P_{n+1}}{P_{j+1}} > \sum_{j=k_1}^n\frac{P_{n+1}}{P_{j+1}} > e^{-[B/(\tau+1)]n^{\tau+1}}\sum_{j=k_1}^ne^{[B/(\tau+1)](j+1)^{\tau+1}}.$$
But
$$\sum_{j=k_1}^ne^{[B/(\tau+1)](j+1)^{\tau+1}} > \int_{k_1+1}^ne^{[B/(\tau+1)]x^{\tau+1}}\,dx,$$
thus
$$\sum_{j=1}^n\frac{P_{n+1}}{P_{j+1}} > \int_{k_1+1}^ne^{[B/(\tau+1)]x^{\tau+1}}\,dx\Big/e^{[B/(\tau+1)]n^{\tau+1}}. \qquad (A.37)$$
We would like the right-hand side of (A.37), divided by $n^{1/2+\varepsilon}$ for some (small) $\varepsilon > 0$, to tend to a nonzero limit, in order for (A.30) to hold. Thus consider, by use of l'Hôpital's rule, the limit as $y\to\infty$ of
$$q(y) = \frac{\int_{k_1+1}^ye^{Ax^{\tau+1}}\,dx}{y^{\delta}e^{Ay^{\tau+1}}}, \quad A > 0:\qquad \lim_{y\to\infty}q(y) = \lim_{y\to\infty}\frac{e^{Ay^{\tau+1}}}{e^{Ay^{\tau+1}}(\delta y^{\delta-1}+A(\tau+1)y^{\tau+\delta})},$$
which is finite when $\tau+\delta = 0$ and tends to $\infty$ when $\tau+\delta < 0$. Now for $\delta = 1/2$, by (A.36) we get a finite limit when $\alpha(\beta-1)-1 = -1/2$, that is, $\beta = 1+1/(2\alpha)$. Thus for $\beta < 1+1/(2\alpha)$ there will exist an $\varepsilon > 0$ such that $\sum_{j=0}^nP_{n+1}/P_{j+1} > n^{1/2+\varepsilon}$, and the result (i) follows.

Proof of (ii). Let
$$\gamma(k) = \frac{1}{\sum_{j=1}^{k-1}e^{(\beta Y_j)^{\alpha}}} \qquad (A.38)$$
in (5.2). Then clearly $\gamma(k) \to 0$. We shall show later that the term $[(1+o_k(1))t\gamma(k)e^{(\beta Y_{j-1})^{\alpha}}]$ of (5.2) is arbitrarily close to 0 for $2 \le j \le k$, for all sufficiently large $k$ and $\beta < 1+1/\alpha$. It suffices to show this for $j = k$. We can then write, using (5.2) and (A.38),
$$-(1+\varepsilon)t = -(1+\varepsilon)\sum_{j=2}^kt\gamma(k)e^{(\beta Y_{j-1})^{\alpha}} < \log Ee^{-t\gamma(k)\tilde T_k} < -(1-\varepsilon)\sum_{j=2}^kt\gamma(k)e^{(\beta Y_{j-1})^{\alpha}} = -(1-\varepsilon)t.$$
It follows that $\lim_{k\to\infty}E(e^{-t\gamma(k)\tilde T_k}) = e^{-t}$, which is the desired result. We still must show that $[(1+o_k(1))t\gamma(k)e^{(\beta Y_{j-1})^{\alpha}}]$ of (5.2) is arbitrarily close to 0 for $j = k$, for all sufficiently large $k$ and $\beta < 1+1/\alpha$. Let $\rho_{\alpha}^l$ be defined by (A.34). Then
$$Y_j^{\alpha}-Y_{j-1}^{\alpha} > \rho_{\alpha}^l(\beta-1)\frac{Y_{j-1}^{\alpha}}{j} > \rho_{\alpha}^l(\beta-1)\frac{j^{(\beta-1)\alpha}W^{\alpha}(1-\varepsilon)}{j}$$
for all $j$ sufficiently large, where by Theorem 4.1, $\lim Y_{j-1}/(j-1)^{\beta-1} = W > 0$. Thus
$$\gamma(k)e^{(\beta Y_{k-1})^{\alpha}} = \frac{1}{\sum_{j=1}^{k-1}e^{-\beta^{\alpha}(Y_{k-1}^{\alpha}-Y_j^{\alpha})}} = \frac{1}{\sum_{j=1}^{k-1}e^{-\beta^{\alpha}\sum_{i=j+1}^{k-1}(Y_i^{\alpha}-Y_{i-1}^{\alpha})}} < \frac{1}{\sum_{j=j_0}^{k-1}e^{-D\sum_{i=j+1}^{k-1}i^{(\beta-1)\alpha-1}}} \to 0,$$
for suitably large $j_0$, as long as $(\beta-1)\alpha-1 < 0$, that is, $\beta < 1+1/\alpha$ [where we have let $D = \rho_{\alpha}^l\beta^{\alpha}(\beta-1)W^{\alpha}(1-\varepsilon)$].

Proof of (iii). Here let $\gamma(k) = e^{-\beta^{\alpha}Y_{k-1}^{\alpha}}$. With this $\gamma(k)$, (5.2) becomes
$$\log Ee^{-t\gamma(k)\tilde T_k} = -\sum_{j=2}^k\log[1+(1+o_k(1))te^{-\beta^{\alpha}(Y_{k-1}^{\alpha}-Y_{j-1}^{\alpha})}] = -\log[1+(1+o_k(1))t] - \sum_{j=2}^{k-1}\log[1+(1+o_k(1))te^{-\beta^{\alpha}\sum_{i=j}^{k-1}(Y_i^{\alpha}-Y_{i-1}^{\alpha})}]. \qquad (A.39)$$
Now for some $D > 0$ (depending on $\{Y_k\}$),
$$0 < e^{-\beta^{\alpha}(Y_i^{\alpha}-Y_{i-1}^{\alpha})} < e^{-\beta^{\alpha}\rho_{\alpha}^l(\beta-1)Y_{i-1}^{\alpha}/i} < e^{-Di^{\alpha(\beta-1)-1}}.$$
Thus
$$e^{-\beta^{\alpha}(Y_{k-1}^{\alpha}-Y_{j-1}^{\alpha})} < e^{-D\int_j^{k-1}x^{\nu}\,dx} = e^{-[D/(\nu+1)][(k-1)^{\nu+1}-j^{\nu+1}]},$$
where $\nu = \alpha(\beta-1)-1 > 0$, that is, $\beta > 1+1/\alpha$. But
$$\lim_{k\to\infty}\sum_{j=2}^{k-1}e^{-[D/(\nu+1)][(k-1)^{\nu+1}-j^{\nu+1}]} = \lim_{k\to\infty}\frac{\int_2^ke^{[D/(\nu+1)]x^{\nu+1}}\,dx}{e^{[D/(\nu+1)]k^{\nu+1}}} \overset{\text{l'H\^opital}}{=} \lim_{k\to\infty}\frac{1}{Dk^{\nu}} = 0.$$
Since the sum on the right-hand side of (A.39) tends to 0,
$$\lim_{k\to\infty}E(e^{-t\gamma(k)\tilde T_k}) = \frac{1}{1+t},$$
which is $Ee^{-tQ}$ where $Q \sim \mathrm{Exp}(1)$, so $\tilde T_ke^{-\beta^{\alpha}Y_{k-1}^{\alpha}}$ tends in distribution to an exponential distribution. The above proof shows that $\sum_{j=1}^{k-1}e^{(\beta Y_j)^{\alpha}}\big/e^{(\beta Y_{k-1})^{\alpha}} \xrightarrow{\text{a.s.}} 1$, thus also $T_k^* \xrightarrow{D} \mathrm{Exp}(1)$. □
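The time normalization of Theorem 5.1 ($T_k/\sum_jP_j^{-1} \to 1$ a.s. for $\beta = 1$) can also be checked by simulation: since $T_k - T_{k-1}$ is Geometric$(P_k)$ given the past, the waits can be drawn directly by inversion instead of scanning rejected items. A minimal sketch, not from the paper, for the tail $1-F(x) = e^{-x^{\alpha}}$:

```python
import math, random

def waiting_times(k_target, alpha, seed=2):
    # beta = 1 rule. Acceptance probability after k items is p = exp(-Y_k^alpha);
    # a Geometric(p) wait is ceil(log(U) / log(1-p)) with U uniform on (0, 1].
    rng = random.Random(seed)
    total = (-math.log(1.0 - rng.random())) ** (1.0 / alpha)  # first item at time 1
    k, t, b = 1, 1, 0.0
    while k < k_target:
        a = total / k                                      # current average Y_k
        p = min(math.exp(-a ** alpha), 1.0 - 1e-12)        # clip away from 1
        t += max(1, math.ceil(math.log(1.0 - rng.random()) / math.log(1.0 - p)))
        z = (a ** alpha - math.log(1.0 - rng.random())) ** (1.0 / alpha) - a
        total += a + z                                     # admit a + z
        k += 1
        b += 1.0 / p                                       # accumulate sum of P_j^{-1}
    return t, b

t, b = waiting_times(5000, 2.0)
print(t / b)   # Theorem 5.1: T_k / sum_j P_j^{-1} is close to 1
```

Consistent with Corollary 5.1, the total time $t$ in this run is of rough order $k^2$, so drawing the waits geometrically (rather than item by item) is what makes the simulation feasible.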
Lemma A.1.
Let $X_1, X_2, \ldots \sim F$, where $F$ is in $\mathcal G_{\alpha}$ with $\alpha > 0$. Let $\beta > 1$ and let $Z_k$ be the random "overshoot" over $\beta Y_{k-1}$. For any $\gamma > 0$ and any $0 \le \delta < (\beta-1)\alpha$,
$$P(Z_k > \gamma Y_{k-1}/k^{\delta}\ \text{infinitely often}) = 0.$$

Proof.
Consider the event $A = \{Y_k/k^{\beta-1} \to W,\ 0 < W < \infty\}$. We know by Theorem 4.1 that $P(A) = 1$, and hence we shall assume that $A$ occurs. Let $A_k = \{Z_k > \gamma Y_{k-1}/k^{\delta}\}$. We shall show that $\sum_{k=1}^{\infty}P(A_k) < \infty$, so that the result will follow from the Borel–Cantelli lemma. Now
$$P(A_k\mid Y_{k-1}) = \exp\{-[(\gamma/k^{\delta}+\beta)^{\alpha}-\beta^{\alpha}]Y_{k-1}^{\alpha} - [h((\gamma/k^{\delta}+\beta)Y_{k-1})-h(\beta Y_{k-1})]\} = \exp\left\{-\beta^{\alpha}\left(\left(1+\frac{\gamma/\beta}{k^{\delta}}\right)^{\alpha}-1\right)Y_{k-1}^{\alpha} - h'(Q_k)\frac{\gamma}{k^{\delta}}Y_{k-1}\right\},$$
where $\beta Y_{k-1} \le Q_k \le (\beta+\gamma/k^{\delta})Y_{k-1}$. Write, by (2.9),
$$\left|-h'(Q_k)\frac{\gamma}{k^{\delta}}Y_{k-1}\right| = \left|-\frac{h'(Q_k)}{Q_k^{\alpha-1}}\,Q_k^{\alpha-1}\frac{\gamma}{k^{\delta}}Y_{k-1}\right| = \left|o_k\frac{Y_{k-1}^{\alpha}}{k^{\delta}}\right|,$$
where $|o_k| < \varepsilon$ for $k \ge k_0$, with $k_0$ large enough and $\varepsilon > 0$ arbitrary; $k_0$ can be chosen to depend on $Y_1$ only. For $\varepsilon$ small enough this implies
$$P(A_k\mid Y_{k-1}) \le \exp\left\{-\beta^{\alpha}\left(\frac{\alpha\gamma/\beta}{k^{\delta}}+o\!\left(\frac{1}{k^{\delta}}\right)\right)Y_{k-1}^{\alpha} + \frac{|o_k|}{k^{\delta}}Y_{k-1}^{\alpha}\right\} \le \exp\left\{-[\beta^{\alpha-1}\alpha\gamma-2\varepsilon]\frac{1}{k^{\delta}}Y_{k-1}^{\alpha}\right\} \le \exp\left\{-\frac{\beta^{\alpha-1}\alpha\gamma}{2}\,\frac{(cY_1k^{\beta-1})^{\alpha}}{k^{\delta}}\right\} \le \exp\{-dY_1^{\alpha}k^{(\beta-1)\alpha-\delta}\}.$$
The next-to-last inequality follows from (A.26). Hence $P(A_k\mid Y_1) \le \exp\{-dY_1^{\alpha}k^{(\beta-1)\alpha-\delta}\}$ for $k \ge k_0 = k_0(Y_1)$. If $\delta < (\beta-1)\alpha$, then $\sum_{k=1}^{\infty}P(A_k\mid Y_1) < \infty$, so, by the Borel–Cantelli lemma, conditional on $Y_1$, $P(A_k\ \text{i.o.}\mid Y_1) = 0$. But this is true for all $Y_1$. Hence $P(A_k\ \text{i.o.}) = 0$. □

Proof of Theorem 5.3.
Let $\gamma(k) = 1/\sum_{j=1}^{k-1}e^{dY_j^{\alpha}}$, where $d = \beta^{\alpha}$. We write (5.2) as
$$\log E(e^{-t\gamma(k)\tilde T_k}) = -\sum_{j=1}^{k-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}]. \qquad (A.40)$$
Since when $R$ is exponentially distributed with mean $\mu$, $\log E(e^{-tR}) = -\log(1+\mu t)$, it is sufficient to show that the right-hand side of (A.40) converges, as $k\to\infty$, to $-\sum_{j=1}^{\infty}\log[1+t\mu_j]$.

First consider
$$\gamma(k)e^{dY_{k-j}^{\alpha}} = \frac{1}{S_{j,k}+T_{j,k}}, \qquad (A.41)$$
where $S_{j,k} = \sum_{i=k-j}^{k-1}e^{d(Y_i^{\alpha}-Y_{k-j}^{\alpha})}$ and $T_{j,k} = \sum_{i=1}^{k-j-1}e^{-d(Y_{k-j}^{\alpha}-Y_i^{\alpha})}$. Note that the $i$th item that is kept equals $Z_i+\beta Y_{i-1}$, where $Z_i$ is the amount above $\beta Y_{i-1}$. Hence,
$$Y_i = \frac{(i-1)Y_{i-1}+Z_i+\beta Y_{i-1}}{i} = Y_{i-1} + \frac{Z_i+Y_{i-1}/\alpha}{i}, \qquad\text{because } \beta-1 = 1/\alpha.$$
By Lemma A.1, for all $i$ sufficiently large,
$$Y_i^{\alpha} = Y_{i-1}^{\alpha}\left(1+\frac{Z_i}{iY_{i-1}}+\frac{1}{\alpha i}\right)^{\alpha} = Y_{i-1}^{\alpha}\left(1+\frac{1}{i}+\text{smaller order terms}\right).$$
Let $w = \lim_{k\to\infty}Y_k/k^{1/\alpha}$. Therefore, $\lim_{i\to\infty}(Y_i^{\alpha}-Y_{i-1}^{\alpha}) = w^{\alpha}$ and, for fixed $b$, $\lim_{i\to\infty}(Y_{i+b}^{\alpha}-Y_i^{\alpha}) = bw^{\alpha}$. This implies
$$\lim_{k\to\infty}S_{j,k} = \lim_{k\to\infty}\sum_{i=k-j}^{k-1}e^{d(Y_i^{\alpha}-Y_{k-j}^{\alpha})} = \lim_{k\to\infty}\sum_{l=0}^{j-1}e^{d(Y_{k-j+l}^{\alpha}-Y_{k-j}^{\alpha})} = \sum_{l=0}^{j-1}e^{dlw^{\alpha}} = \frac{e^{dw^{\alpha}j}-1}{e^{dw^{\alpha}}-1}. \qquad (A.42)$$
For any $\varepsilon > 0$ there is $m$ such that $(1-\varepsilon)w^{\alpha} \le Y_{i+1}^{\alpha}-Y_i^{\alpha} \le (1+\varepsilon)w^{\alpha}$ for all $i \ge m$. This implies
$$\lim_{k\to\infty}T_{j,k} = \lim_{k\to\infty}\sum_{i=1}^{m-1}e^{-d(Y_{k-j}^{\alpha}-Y_i^{\alpha})} + \lim_{k\to\infty}\sum_{i=m}^{k-j-1}e^{-d(Y_{k-j}^{\alpha}-Y_i^{\alpha})}.$$
Fix $m$. Then the first limit on the right-hand side is clearly zero, since $Y_{k-j}\to\infty$ as $k\to\infty$. Consider the second term:
$$\limsup_{k\to\infty}\sum_{i=m}^{k-j-1}e^{-d(Y_{k-j}^{\alpha}-Y_i^{\alpha})} \le \lim_{k\to\infty}\sum_{l=1}^{k-j-m}e^{-dlw^{\alpha}(1-\varepsilon)} = \frac{1}{e^{d(1-\varepsilon)w^{\alpha}}-1}.$$
Similarly,
$$\liminf_{k\to\infty}\sum_{i=m}^{k-j-1}e^{-d(Y_{k-j}^{\alpha}-Y_i^{\alpha})} \ge \frac{1}{e^{d(1+\varepsilon)w^{\alpha}}-1}.$$
Hence,
$$\lim_{k\to\infty}T_{j,k} = \frac{1}{e^{dw^{\alpha}}-1} \qquad\text{for fixed } j. \qquad (A.43)$$
Substituting the results (A.42) and (A.43) into (A.41) yields
$$\lim_{k\to\infty}\gamma(k)e^{dY_{k-j}^{\alpha}} = \frac{1}{(e^{dw^{\alpha}j}-1)/(e^{dw^{\alpha}}-1)+1/(e^{dw^{\alpha}}-1)} = \frac{e^{dw^{\alpha}}-1}{e^{dw^{\alpha}j}} = \mu_j \qquad\text{for fixed } j. \qquad (A.44)$$
Returning to (A.40), fix $n$:
$$-\sum_{j=1}^{k-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}] = -\sum_{j=1}^{n-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}] - \sum_{j=n}^{k-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}]. \qquad (A.45)$$
Equation (A.44) implies that each term in the sum of the first expression on the right-hand side converges to $\log(1+t\mu_j)$ as $k\to\infty$. We need to show that $\lim_{k\to\infty}\sum_{j=n}^{k-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}]$ can be made arbitrarily small by choosing $n$ to be sufficiently large (all terms in the sum are positive). Note that $\gamma(k)e^{dY_{k-j}^{\alpha}} < 1/S_{j,k}$. For any $\varepsilon > 0$ choose $n$ large enough so that $Y_{i+1}^{\alpha}-Y_i^{\alpha} \ge (1-\varepsilon)w^{\alpha}$ for all $i \ge n$. For $j \ge n$,
$$S_{j,k} = \sum_{i=k-j}^{k-1}e^{d(Y_i^{\alpha}-Y_{k-j}^{\alpha})} = \sum_{l=0}^{j-1}e^{d(Y_{k-j+l}^{\alpha}-Y_{k-j}^{\alpha})} \ge \sum_{l=0}^{j-1}e^{dlw^{\alpha}(1-\varepsilon)} = \frac{e^{dw^{\alpha}j(1-\varepsilon)}-1}{e^{dw^{\alpha}(1-\varepsilon)}-1}.$$
Hence,
$$\gamma(k)e^{dY_{k-j}^{\alpha}} < \frac{e^{dw^{\alpha}(1-\varepsilon)}-1}{e^{dw^{\alpha}j(1-\varepsilon)}-1} < e^{-dw^{\alpha}(j-1)(1-\varepsilon)}.$$
Choose $k$ large enough so that $o_k(1) < \varepsilon$. Then
$$\lim_{k\to\infty}\sum_{j=n}^{k-1}\log[1+t(1+o_k(1))\gamma(k)e^{dY_{k-j}^{\alpha}}] \le \lim_{k\to\infty}\sum_{j=n}^{k-1}t(1+\varepsilon)e^{-dw^{\alpha}(j-1)(1-\varepsilon)} < \frac{t(1+\varepsilon)e^{-dw^{\alpha}(n-2)(1-\varepsilon)}}{e^{dw^{\alpha}(1-\varepsilon)}-1}.$$
Since the right-hand side goes to zero as $n\to\infty$, the second term in the sum in (A.45) can be made arbitrarily small by choosing $n$ sufficiently large. □

REFERENCES

[1]
Abramowitz, M. and Stegun, I. A. (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series. Dover, New York. MR0167642
[2] Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. II, 2nd ed. Wiley, New York. MR0270403
[3] Krieger, A. M., Pollak, M. and Samuel-Cahn, E. (2007). Select sets: Rank and file. Ann. Appl. Probab.
[4] Krieger, A. M., Pollak, M. and Samuel-Cahn, E. (2008). Beat the mean: Sequential selection by better than average rules. J. Appl. Probab.
[5] Preater, J. (2000). Sequential selection with a better-than-average rule. Statist. Probab. Lett.
[6] Robbins, H. and Siegmund, D. (1971). A convergence theorem for nonnegative almost supermartingales and some applications. In Optimizing Methods in Statistics (Proc. Sympos., Ohio State Univ., Columbus, Ohio, 1971).
[7] Resnick, S. I. (1987). Extreme Values, Regular Variation, and Point Processes. Applied Probability. A Series of the Applied Probability Trust. Springer, New York. MR900810

A. M. Krieger
Department of Statistics
The Wharton School
University of Pennsylvania
Philadelphia, Pennsylvania 19104
USA
E-mail: [email protected]