How to beat the 1/e-strategy of best choice (the random arrivals problem)
Alexander Gnedin ∗

February 23, 2021
Abstract
In the best choice problem with random arrivals, an unknown number $n$ of rankable items arrive at times sampled from the uniform distribution. As is well known, a real-time player can ensure stopping at the overall best item with probability at least $1/e$, by waiting until time $1/e$ and then selecting the first relatively best item to appear (if any). This paper discusses the issue of dominance in a wide class of stopping strategies of best choice, and argues that in fact the player faces a trade-off between success probabilities for various values of $n$. We argue that the $1/e$-strategy is not a unique minimax strategy and that it can be improved in various ways.

In a version of the familiar best choice problem, $n$ items ranked $1, 2, \cdots, n$ from best to worst arrive at independent, uniformly distributed random times on $[0,1]$. Alice, a real-time player, observes arrivals in chronological order and ranks each item relative to all others seen so far. She can stop at any time, winning if the stopping occurs at the best of the $n$ items and losing in any other case. The question is about good stopping strategies when the player does not know $n$.

An item is said to be a record if no better item arrived before it. The first observed item is a record, and so is any subsequent arrival with relative rank one. The overall best item appears as the last record, therefore strategies ever stopping on non-records should be discarded straight away.

If Alice knew $n$ she could achieve the maximum winning probability using the classic $d^*$-strategy: wait until $d^*-1$ items pass by (where $d^* = d^*(n)$ is about $n/e$), then stop at the first subsequent record, if available.
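As a numerical illustration of the $d^*$-strategy, the following sketch evaluates the classical winning probability of the rule "skip $d-1$ items, then stop at the first record", namely $\frac{d-1}{n}\sum_{j=d}^{n}\frac{1}{j-1}$ for $d>1$ and $1/n$ for $d=1$, and locates the best $d$ by brute force. The function names `s` and `d_star` are chosen here for illustration only.

```python
import math

def s(d, n):
    """Winning probability on n items of the rule: skip d - 1 items, then take the first record."""
    if d == 1:
        return 1.0 / n
    return (d - 1) / n * sum(1.0 / (j - 1) for j in range(d, n + 1))

def d_star(n):
    """A maximiser of s(d, n) over d, found by brute force (first maximiser encountered)."""
    return max(range(1, n + 1), key=lambda d: s(d, n))

for n in (4, 10, 100):
    d = d_star(n)
    # columns: n, d*(n), winning probability, n/e for comparison
    print(n, d, round(s(d, n), 4), round(n / math.e, 2))
```

The printed threshold can be compared with $n/e$, and the winning probability with the benchmark $1/e$.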
For every $n$, the winning probability of the $d^*$-strategy exceeds $1/e$, and approaches this lower bound monotonically as $n$ increases. When $n$ is known with certainty, observing the arrival times is of no avail, because by exchangeability they are independent of the relative ranks.

∗ Queen Mary, University of London

However, if $n$ is unknown, no strategy based only on counting and relative ranking of the arrivals can ensure a winning probability bounded away from zero for all $n$ [1]. Starting from [24], such strategies were studied under the assumption that $n$ is drawn from some distribution known or partly known to the player, or under an upper constraint on the number of items [22, 21].

At first glance, if the player is completely ignorant about $n$ there is no useful alternative, but this is where the time factor steps in. For $n$ large, the number of items arrived by time $t$ is close to $nt$, thus the $1/e$-strategy, which prescribes to wait until time $1/e$ and then stop at the first subsequent record (if any), will have about the same effect as the $d^*$-strategy that uses the full knowledge of $n$. Intuitively, in the random arrivals model, the lack of information about $n$ is compensated by the proportionate growth of the number of observations.

Looking back in history, the $1/e$-strategy seems to have made its first appearance in [15], where asymptotic optimality was noticed for the model with items observed at times of the Poisson process. The exact optimality was shown in Rubin's 'secretary problem' with infinitely many arrivals [2, 17, 26], and in a Bayesian setting with $n$ sampled from the improper uniform distribution on integers [29], but such limit forms of the problem are very different in spirit and will not be touched upon here.
Stewart [29], assuming i.i.d. exponentially distributed arrival times, proved asymptotic optimality of the strategy with the cutoff time chosen to be the $1/e$ quantile of the exponential distribution, and showed numerical evidence that the strategy performs well also for small values of $n$. Bruss [6] gave the strategy its name and made the key observations that the distribution of arrival times is not important (provided it is continuous and known to the player) and that the benchmark winning probability $1/e$ appears as the sharp lower bound, approached monotonically as $n$ increases.

The findings from [6] taken together with the upper bound of the $d^*$-strategy imply that the $1/e$-strategy is minimax. This naturally raises two questions: is the $1/e$-strategy the unique minimax strategy, and is there something better? Corollary 2 from [6] stated the uniqueness within a class of strategies that wait a certain fixed time and then stop at the $r$th record to follow (though the second part of the definition of the class on p. 883 is controversial). Subsequent work citing the result upgraded it to a proof of the overall uniqueness of the minimax strategy, and added further optimality properties, although no rigorous arguments were given (see [8] p. 313, [12] p. 3259). This made its way to surveys and popular sites, forming a consensus in the mathematical community that no strategy could be better. In this paper we disprove these uniqueness and optimality claims from the position of game-theoretic dominance.
Concretely, we show that

(i) the $1/e$-strategy is never optimal if $n$ is drawn from some probability distribution known to Alice,
(ii) there are many strategies that achieve the benchmark $1/e$ asymptotically,
(iii) for every $u$ there exist strategies that strictly improve the $1/e$-strategy simultaneously for all $n \le u$,
(iv) there exists a simple strategy outperforming the $1/e$-strategy simultaneously for all $n > 1$ (strictly for $n > 2$),
(v) there exist more complex strategies strictly outperforming the $1/e$-strategy simultaneously for all $n > 2$,
(vi) for every $\ell \ge 1$ there exist strategies that guarantee the winning probability at least $1/e$ for all $n$, and outperform the $1/e$-strategy simultaneously for all $n > \ell$.

In practical terms, for $\ell = 1$ Alice faces a trade-off between some advantage when the choice occurs from many opportunities and a higher risk of going away empty-handed if just a sole item shows up.

Regarding dominance in the opposite direction, we argue that

(vii) there exist minimax strategies (hence winning with chance above $1/e$), which are worse than the $1/e$-strategy simultaneously for all $n$.

Although the game has a value (see Corollary 3 below), by (i) there is no worst-case distribution for the number of items, which paves the way to (vii). This extends the list of known dominance paradoxes in the optimal stopping games of best choice [18, 19, 20].

A question which we find difficult and leave open here is the existence of an unconstrained dominating strategy ($\ell = 0$ in (vi)). If the $1/e$-strategy turns out to be undominated, (vi) would mean that the source of rigidity lies in the trivial case $n = 1$.

We shall make use of strategies that base decisions at record times also on the number of arrivals seen so far. Such strategies, called 'nonstationary' in [11], have been used in selection problems with a random number of items [4, 14, 23, 30].
Exploring this large class, we further construct counter-examples to the assertion of risk monotonicity relative to the stochastic ordering on distributions of the number of items (see Theorem 2.3 in [10], Equation (36) in [27], Equation (76) in [28]). Failure of monotonicity of this kind was observed long ago in the rank-based optimal stopping problems with random sample size and fixed arrival times [16].

Following a suggestion from [6], we adopt here the paradigm of game theory, hence assuming that an antagonist player, Pierre, is in charge of the variable $n$.

Pierre's strategies are easy to describe. A pure strategy is just a positive integer, and a mixed strategy is a probability distribution $\nu = (\nu_1, \nu_2, \cdots)$.

When Pierre plays $\nu$, Alice sees the items coming at epochs of a random counting process $N = (N_t,\ t \in [0,1])$, where $N_1$ follows distribution $\nu$. By the assumption of uniform arrival times, $N$ has the order statistics property: conditional on $N_{t-} = k$ the configuration of the first $k$ ordered arrivals is uniformly distributed on the $k$-simplex $\{(t_1, \cdots, t_k) : 0 < t_1 < \cdots < t_k < t\}$. Equivalently, $N$ can be regarded as a Markovian, inhomogeneous birth process with $N_0 = 0$ and the jump rates readily computable in terms of the probability generating function of $\nu$, see [25]. The observed $k$th arrival is ranked $R_k$th relative to the items seen so far, where the relative ranks $R_1, R_2, \cdots$ are independent and independent of $N$, with $R_k$ being uniform on $\{1, \cdots, k\}$. Thus the process counting record times is derived from $N$ by thinning, such that the $k$th arrival is kept with probability $1/k$ independently of anything else [5].

Remark. The best choice problem with arrivals coming by a point process is a well studied topic. To adjust the bulk of past work [4, 9, 14, 15, 30] to the setting of this paper, the processes need to be conditioned on a nonzero number of arrivals.
Equivalently, we may allow for extended distributions on nonnegative integers, of the form $(1-c, c\nu_1, c\nu_2, \cdots)$, $0 < c \le 1$. Whatever $c$, from the first arrival on the conditional distribution of $N$ is the same. For instance, in the setting of arrivals by a Poisson process, the game against the Poisson distribution is the same as against the zero-truncated Poisson distribution.

The space of Alice's strategies is immense. The generic pure strategy $\tau$ is a stopping time, which is a certain choice (if any) from the set of record times. The eventually observed sample data include $n$ and a sequence $(t_1, r_1), \cdots, (t_n, r_n)$, where $t_k \in (0,1)$ are the increasing arrival times and $r_k \in \{1, \cdots, k\}$ the relative ranks of items. The function $\tau$ chooses one of the $t_k$'s or $1$ (no stop) subject to the following conditions. Firstly, if $t_k$ is chosen then the same choice is valid for any other $n' \ge k$ and possible 'future' data $(t'_{k+1}, r'_{k+1}), \cdots, (t'_{n'}, r'_{n'})$. Secondly, $t_k$ can only be chosen if $r_k = 1$ (record arrival). A win with $\tau$ occurs in the event that $\tau = t_k$ (so, $r_k = 1$) and $r_{k+1} > 1, \cdots, r_n > 1$ for some $k$.

Let $W(\tau, \nu)$ denote the winning probability when Alice plays stopping strategy $\tau$, and write $W(\tau, n)$ if Pierre's strategy is pure, thus

$$W(\tau, \nu) = \sum_{n \ge 1} \nu_n W(\tau, n).$$

Definition 1.
Stopping strategy $\tau$ is called

(a) $n$-optimal if it achieves $\max W(\cdot, n)$,
(b) $\nu$-optimal if it achieves $\max W(\cdot, \nu)$,
(c) asymptotically optimal if $W(\tau, n) \to 1/e$ as $n \to \infty$,
(d) dominated (respectively, dominated on a subset of positive integers $Z$) if there is another strategy $\tau'$ with $W(\tau', n) \ge W(\tau, n)$ for all $n$ (respectively, for $n \in Z$), where at least one of the inequalities is strict,
(e) strongly dominated on $Z$ if $W(\tau', n) > W(\tau, n)$, $n \in Z$.

In statistical decision theory, an undominated strategy is also called 'admissible' [3].

The discrete-time theory of optimal stopping ensures existence of maximisers in (b): these are best-response counter-strategies of Alice. By the nature of the arrivals process $N$, the sufficiency principle allows one to seek maximisers within a reduced class of Markovian stopping times, which make the decision at a record time $t$ taking into account only $t$ and the number of arrivals $N_t$. If $t$ is a record time, we call the number of observed items $N_t$ the index of the record. In particular, the earliest arrival is a record of index $1$.

We will restrict further consideration to Markovian strategies of the special form

$$\tau = \min\{t : N_t - N_{t-} = 1,\ R_{N_t} = 1,\ t \ge a_{N_t}\}$$

($\min \varnothing = 1$), where $0 \le a_k \le 1$, $k = 1, 2, \cdots$. The cutoff $a_k$ specifies the earliest time when a record with index $k$ can be accepted.

d-strategies

The classic $d$-strategy stops at the first record of index at least $d$ regardless of the arrival time. This has only extreme cutoffs

$$a_k = \begin{cases} 1, & k < d,\\ 0, & k \ge d. \end{cases}$$

The winning probability on $n$ items is

$$s(d,n) = \frac{d-1}{n}\, h(d,n) \quad \text{with} \quad h(d,n) := \sum_{j=d}^{n} \frac{1}{j-1}, \quad d > 1, \qquad (1)$$

and $s(1,n) = 1/n$. The $n$-optimal strategy has

$$d^* = \max\{d : h(d,n) \ge 1\}, \qquad (2)$$

as is well known. For $n \ne 2$ the maximiser is unique. For $n = 2$, $s(1,2) = s(2,2) = 1/2$; the non-uniqueness in this case leads to an interesting dominance phenomenon related to the Blackwell-Cover paradox [18, 19].

x-strategies

For $0 \le x \le 1$ the strategy with identical cutoffs $a_k \equiv x$,

$$\tau_x := \min\{t : t \ge x,\ R_{N_t} = 1\},$$

stops at the first record arriving after time $x$ regardless of the index. In particular, $\tau_{1/e}$ is the $1/e$-strategy.

The winning probability is a polynomial $p_n(x) := W(\tau_x, n)$, which has several useful representations

$$p_n(x) = \sum_{k=0}^{n-1} \binom{n}{k} x^k (1-x)^{n-k} s(k+1,n) = \sum_{k=1}^{n-1} \frac{x(1-x)^k}{k} + \frac{(1-x)^n}{n} = 1 - x - \sum_{k=2}^{n} \frac{(1-x)^k}{k(k-1)}.$$

The first is obtained by conditioning on $N_x = k$ and is analogous to Stewart's formula for exponential arrivals [29]. The second is obtained by conditioning on the event that the $k$ top items arrive after $x$ and the $(k+1)$st before $x$ [6].

The third can be argued by coupling over the sample size, as follows. Suppose absolute ranks $1, \cdots, n-1$ appear as a winning (respectively, losing) configuration in the problem with $n-1$ items. Adding the worst, $n$th, arrival does not change the status quo unless none of the arrivals falls in $[0,x]$, the best is the first appearing after $x$, and the worst falls between $x$ and the best arrival. The formula then follows from the recursion

$$p_n(x) = p_{n-1}(x) - \frac{(1-x)^n}{n(n-1)}.$$

Figure 1: $p_n(x)$ for several values of $n$ (starting from $n = 1$) and $-x \log x$.

Note that $p_n(0) = 1/n$. From the third representation it is easiest to see that [6]

(a) $p_n(x) \downarrow -x \log x$ as $n \uparrow \infty$,
(b) the $p_n$'s are unimodal on $[0,1]$, with maximum points $x_n$ satisfying $x_1 = x_2 < x_3 < \cdots \to 1/e$,
(c) $p_1$ and $p_2$ are strictly decreasing.

If Pierre plays the geometric distribution $\nu_n = \alpha(1-\alpha)^{n-1}$, the $\nu$-optimal strategy is $\tau_x$ with

$$x = \left(\frac{1}{e} - \frac{\alpha}{1-\alpha}\left(1 - \frac{1}{e}\right)\right)_+, \qquad (3)$$

see [4, 7, 11].
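Formula (3) can be checked numerically against the polynomial representation of $p_n$. The sketch below sums $\nu_n p_n(x)$ for the geometric distribution, using the recursion $p_n = p_{n-1} - (1-x)^n/(n(n-1))$ with $p_1(x) = 1-x$, and compares a grid maximiser with the cutoff from (3). The truncation level `n_max`, the grid step and the value $\alpha = 0.2$ are assumptions made here for illustration, as are the function names.

```python
import math

def cutoff(alpha):
    """Cutoff x from formula (3) for the geometric distribution with parameter alpha."""
    return max(0.0, 1 / math.e - alpha / (1 - alpha) * (1 - 1 / math.e))

def W_geometric(x, alpha, n_max=400):
    """W(tau_x, nu) for nu_n = alpha (1 - alpha)^(n - 1), truncated at n_max (an assumption)."""
    u = 1.0 - x
    p_n = u            # p_1(x) = 1 - x
    weight = alpha     # nu_1
    total = weight * p_n
    for n in range(2, n_max + 1):
        p_n -= u**n / (n * (n - 1))   # recursion for p_n(x)
        weight *= 1.0 - alpha         # nu_n
        total += weight * p_n
    return total

alpha = 0.2
grid = [i / 1000 for i in range(1001)]
best = max(grid, key=lambda x: W_geometric(x, alpha))
print(best, cutoff(alpha))
```

The grid maximiser should agree with (3) up to the grid resolution; only $x$-strategies are compared here, so the check does not by itself establish $\nu$-optimality among all stopping strategies.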
Note that $x = 0$ for $\alpha \ge 1/e$.

For Alice the game is the same as with arrivals coming by a mixed Poisson process $N$ whose rate is exponentially distributed with mean $(1-\alpha)/\alpha$. Equivalently, $N$ is a time-changed Yule process with birth rate

$$\frac{(k+1)(1-\alpha)}{\alpha + t(1-\alpha)}$$

in state $N_t = k$. Since the next arrival is a record with probability $1/(k+1)$, the cancellation of the state variable shows that the record-counting process has a deterministic compensator, hence is a Poisson process by Watanabe's theorem. See [4, 11, 9] for characterisation of this 'stationary' case.

Basic dominance features of the $x$-strategies are drawn straight from the monotonicity properties of the $p_n$'s.

Proposition 2.
The $x$-strategies satisfy

(i) $\tau_x$ dominates $\tau_y$ for $1/e \le x < y \le 1$,
(ii) $\tau_x$ is dominated by $\tau_y$ on $\{n, n+1, \cdots\}$ for $0 \le x < y \le x_n$,
(iii) $\tau_x$ is undominated for $0 \le x < 1/e$.

Proof. The first two assertions are clear from the properties of the $p_n$'s. The third follows from $\nu$-optimality, for $\nu$ the geometric distribution with parameter found from (3).

Corollary 3.
The stopping game has the following features:

(i) the value of the game is $1/e$,
(ii) Pierre has no minimax strategy and so the game has no saddle point,
(iii) the $1/e$-strategy is never $\nu$-optimal.

Proof. We have $W(\tau_{1/e}, \nu) > 1/e$, but Pierre can keep the winning probability below $1/e + \varepsilon$ by playing large $n$ or a stochastically large $\nu$. This gives (i).

Since $p_n(1/e) > 1/e$, also $W(\tau_{1/e}, \nu) > 1/e$ for any $\nu$, whence (ii).

To argue (iii), note that $p_n'(1/e) < 0$ implies $\sum_n \nu_n p_n'(1/e) < 0$, thus $x \mapsto W(\tau_x, \nu)$ is strictly decreasing in a vicinity of $1/e$. Thus for every fixed $\nu$ there exists a $\tau_x$ strategy with winning probability strictly higher than $W(\tau_{1/e}, \nu)$.

Proposition 4.
Suppose strategy $\tau$ has cutoffs satisfying $a_k \to x$ and $a_1 > 0$. Then $W(\tau, n) \to -x \log x$ as $n \to \infty$. In particular, for $x = 1/e$ the strategy is asymptotically optimal.

Proof. Suppose $0 < x < 1$ and fix $\delta > 0$. All cutoffs with index bigger than some $k$ are within $(x-\delta, x+\delta)$. Since the number of arrivals on $[0, a_1]$ goes to infinity with $n$, the probability to stop before $x - \delta$ approaches $0$. The probability to stop within $(x-\delta, x+\delta)$ has an asymptotic upper bound of order $\delta/x$, since the process of record times converges to a Poisson process with intensity $1/t$. Finally, on the event $\tau > x + \delta$ the strategy coincides with $\tau_{x+\delta}$, and the result follows by sending $\delta \to 0$. The extreme values $x = 0$ and $1$ are treated similarly.

Analysis of stopping strategies with multiple cutoffs is much more involved, but a key idea is seen from the classic best choice problem. Fix $k$. For $n < k$ the value of cutoff $a_k$ for $n$-optimality does not matter. As $n$ increases, it is optimal to keep $a_k = 0$ as long as $k \ge d^*(n)$, switching to $a_k = 1$ for $n$ about $ke$. The next lemma quantifies the winning probability as the cutoffs vary between the extreme positions.

Let $w_n(a)$ be $W(\tau, n)$ for $\tau$ the strategy with cutoffs $a = (a_1, a_2, \cdots)$. Note that $w_n$ only depends on $a_1, \cdots, a_n$.

Lemma 5.
Suppose $a_k \le \min(a_1, \cdots, a_{k-1})$, then for $n \ge k$

$$\frac{\partial w_n(a)}{\partial a_k} = \binom{n-1}{k-1} a_k^{k-1} (1-a_k)^{n-k}\, [h(k+1,n) - 1]. \qquad (4)$$

Proof.
Compare strategy $\tau$ with cutoffs $a$ and $\tau'$ with the $k$th cutoff changed to $a_k - \delta$, all others being unaltered. The strategies make different choices only if there is a record with index $k$ and arrival time between $a_k - \delta$ and $a_k$. In that event both strategies do not stop before $a_k - \delta$, because the preceding records have index below $k$ and by the assumption on cutoffs could not be accepted. Then $\tau'$ stops and wins with probability $k/n$, while $\tau$ picks the next record (check the acceptance condition!) and wins with probability $s(k+1,n)$. The formula follows from the formula for the density of the $k$th uniform order statistic and (1).

Note that the only zero derivative in (4) is $\partial w_2/\partial a_1 = 0$, with all others being sign-definite, a property which will turn out to be important for constructing constrained dominating strategies. For simplicity of exposition, we choose all cutoffs to the left of $1/e$.

Proposition 6.
For every finite set of integers $Z$ there exists a strategy strongly dominating the $1/e$-strategy on $Z$.

Proof. Fix any large enough $n_0$, and set iteratively $n_j = d^*(n_{j-1}) - 1$ until $n_m = 0$ for some $m$, resolving the ambiguity for $n = 2$ arbitrarily. We introduce a sequence of strategies $\tau_0, \cdots, \tau_m$, where $\tau_j$ dominates the $1/e$-strategy for $n \le n_0$, while dominating strongly for $n_{j+1} < n \le n_j$.

Define $\tau_0$ by setting $a_n = 0$ for $n_1 < n \le n_0$, and leaving the remaining cutoffs at $1/e$. According to (4) this gives some improvement. Over this range of $n$, let $\alpha_0$ be the minimum advantage over the $1/e$-strategy, $W(\tau_0, n) - W(\tau_{1/e}, n)$. For smaller values of $n$ the difference is zero.

Inductively, with $\tau_{j-1}$ defined, let $\alpha_{j-1}$ be the minimum advantage of this strategy for $n > n_j$. To define $\tau_j$, set the cutoffs $a_n$ for $n_{j+1} < n \le n_j$ equal to some value $a_{n_j}$, leaving the other cutoffs the same as for $\tau_{j-1}$. As $a_{n_j}$ decreases from $1/e$, a strict advantage for $n_{j+1} < n \le n_j$ is gained, but the disadvantage for $n > n_j$ increases. Choose $a_{n_j}$ in such a way that $a_{n_{j-1}} < a_{n_j} < 1/e$ and the advantage for $n > n_j$ is still at least $\alpha_{j-1}/2$, say.

If the sequence of cutoffs is nonincreasing, there is an integral counterpart of (4)

$$w_n(a) = \sum_{k=0}^{n-1} \pi_{n,k}(a)\, s(k+1,n), \qquad (5)$$

where for $0 < k < n$

$$\pi_{n,k}(a) = n \binom{n-1}{k-1} \int_{a_{k+1}}^{a_k} x^{k-1}(1-x)^{n-k}\, dx + \binom{n}{k} a_{k+1}^{k} (1-a_{k+1})^{n-k}, \qquad (6)$$

and $\pi_{n,0}(a) = (1-a_1)^n$.

More detailed discussion of the cutoff monotonicity and derivation of (5) will appear elsewhere. In general, however, a $\nu$-optimal strategy need not have nonincreasing cutoffs, as the next example demonstrates. The possibility of such irregularity (analogous to stopping islands in [24]) was mentioned in [10], p. 827.

Example 7.
Consider a two-point distribution $\nu_{10} = \nu_{100} = 1/2$. Since $d^*(10) = 4$ and $d^*(100) = 38$, a $\nu$-optimal strategy will have $a_1 = a_2 = a_3 > a_4 > \cdots > a_{10} > 0$. Then $a_{11} = \cdots = a_{37} = 1$, $a_{38} = \cdots = a_{100} = 0$, because after the 11th arrival is observed, the only remaining option is $N_1 = 100$.

4.1 Dominance over the $1/e$-strategy for $n > 1$

For $0 < x < a_1 \le 1$, define $\tau$ by choosing cutoffs $a_1$ and $a_k = x$ for $k > 1$. In the event $N_x > 0$ the strategy coincides with $\tau_x$. If $N_x = 0$ and the first arrival occurs before $a_1$, the strategy selects the second record (if any).

In the latter event $\tau$ skips the first arrival and wins with probability $s(2,n)$, while $\tau_x$ stops and wins with probability $s(1,n)$. From (5)

$$W(\tau, n) - W(\tau_x, n) = [(1-x)^n - (1-a_1)^n]\,(s(2,n) - s(1,n)),$$

where

$$s(2,n) - s(1,n) = \begin{cases} -1, & n = 1,\\ 0, & n = 2,\\ h(3,n)/n, & n > 2. \end{cases}$$

Thus $\tau$ dominates $\tau_x$ for $n > 1$ (that is, on the set $Z = \{2, 3, \cdots\}$), and strongly dominates for $n > 2$. For $n$ large, the advantage is asymptotic to $(1-x)^n (\log n)/n$.

In particular, setting $a_1 = 1$, $x = 1/e$ the strategies compare for $n = 1$ as $0$ to $1 - 1/e$, for $n = 2$ both win with probability $(1 - e^{-2})/2$, and for $n = 3$ the advantage over the $1/e$-strategy is $(1 - 1/e)^3/6 = 0.042\cdots$.

Setting $a_1 = 1 - 1/e$, $x = 1/e$ yields a strategy that dominates $\tau_{1/e}$ for $n \ge 2$ and dominates strongly for $n > 2$, while for $n = 1$ its winning probability is $1/e$. Clearly, this strategy is minimax.

We see that if Pierre is prohibited to play $n = 1$, or if the game starts with an item that sets a standard but is not choosable (like in the models in [15, 13]), then there is a good reason to discard the $1/e$-strategy.

4.2 Dominance for $n > 2$

We introduce next non-Markovian strategies to strongly dominate some $x$-strategies (including the $1/e$-strategy) for $n > 2$. A key observation is that the $p_n$'s for $n > 2$ are strictly increasing on $[0, c]$, where $c = 2 - \sqrt{3} = 0.267\cdots$ is the maximum point of $p_3$.

Fix $x > c$ and choose $y$ in the range

$$\left(\frac{x-c}{1-c}\right)_+ < y \le x.$$
For $x' = c(1-y) + y$ let

$$\tau = \begin{cases} \tau_x, & \text{if } N_y > 0,\\ \tau_{x'}, & \text{if } N_y = 0. \end{cases}$$

Given $N_y = 0$, the arrivals sequence can be identified with order statistics from the uniform distribution on $[y, 1]$, hence $\tau$ wins with probability $p_n(c)$, and $\tau_x$ with probability $p_n((x-y)/(1-y))$, which gives

$$W(\tau, n) - W(\tau_x, n) = (1-y)^n \left[ p_n(c) - p_n\!\left(\frac{x-y}{1-y}\right) \right],$$

which is strictly positive for $n > 2$.

In particular, for $x = 1/e$ choosing $y = x$ the advantage over the $1/e$-strategy for $n > 2$ exceeds

$$(1 - 1/e)^n\, (|c \log c| - 1/n),$$

where $|c \log c| = 0.352\cdots$. The surplus results from the event that the first arrival appears after $1/e$. The exponential factor can be increased by tuning the parameter $y$; for instance, taking $y = 1/(2e)$ gives

$$W(\tau, n) - W(\tau_{1/e}, n) = \left(1 - \frac{1}{2e}\right)^n \left[ p_n(c) - p_n\!\left(\frac{1}{2e-1}\right) \right],$$

where for $n \to \infty$

$$p_n(c) - p_n\!\left(\frac{1}{2e-1}\right) \to 0.017\cdots.$$

So the advantage is of higher order than for the strategies in Section 4.1.
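The advantage formula of this section is easy to tabulate. The sketch below evaluates $(1-y)^n [p_n(c) - p_n((x-y)/(1-y))]$ for $x = 1/e$ and the two choices $y = x$ and $y = 1/(2e)$, with $p_n$ computed from the representation $p_n(x) = 1 - x - \sum_{k=2}^{n} (1-x)^k/(k(k-1))$. The names `p`, `C` and `advantage` are introduced here for illustration.

```python
import math

def p(n, x):
    """p_n(x) = W(tau_x, n), via 1 - x - sum_{k=2}^n (1-x)^k / (k(k-1))."""
    u = 1.0 - x
    return u - sum(u**k / (k * (k - 1)) for k in range(2, n + 1))

C = 2 - math.sqrt(3)   # maximum point of p_3

def advantage(n, x, y):
    """W(tau, n) - W(tau_x, n) for the strategy switching to tau_{x'} when N_y = 0."""
    return (1 - y)**n * (p(n, C) - p(n, (x - y) / (1 - y)))

x = 1 / math.e
for n in (1, 2, 3, 5, 10, 20):
    # columns: n, advantage with y = x, advantage with y = 1/(2e)
    print(n, advantage(n, x, x), advantage(n, x, 1 / (2 * math.e)))
```

The sign of the output for $n = 1, 2$ versus $n > 2$ reflects the trade-off described above, and for larger $n$ the slower decay of $(1 - 1/(2e))^n$ compared with $(1 - 1/e)^n$ becomes visible.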
A class of minimax strategies is obtained by shifting the cutoffs to the right, the direction opposite to that used in Proposition 6.

Formula (4) says that it is advantageous to shift cutoff $a_k$ to the right, provided the number of items is big enough to satisfy $d^*(n) > k$; that is, for $n$ such that the $n$-optimal strategy would not stop at a record with index $k$. Since $\tau_{1/e}$ has winning probability well above $1/e$ for small $n$, there is some room to trade the winning chance for smaller $n$ against some advantage for larger $n$.

In Section 4.1 we deformed the $1/e$-strategy by choosing $a_1 = 1 - 1/e$, thus reducing $W(\tau_{1/e}, 1) = 1 - 1/e = 0.632\cdots$ to $1/e = 0.367\cdots$, but gaining for $n \ge 2$ (strictly for $n > 2$).

More generally, think of $1/e$ as the initial position of all cutoffs. Choose $1/e < a_1 \le 1 - 1/e$, then increase $a_2$ to a position so as to keep $a_2 < a_1$ (the remaining cutoffs staying at $1/e$) and the winning probability for $n = 2, 3, 4$ strictly above $1/e$. Since $d^*(5) = 3$ this improves the chances for $n > 4$. Then $a_3$ can be increased subject to similar constraints for $n = 3, 4, 5, 6, 7$ ($d^*(8) = 4$), and so on. Every step gives a minimax strategy that dominates $\tau_{1/e}$ strongly on $\{n, n+1, \cdots\}$ for some $n$, but is dominated by $\tau_{1/e}$ on the first $n$ integers. Infinitely many steps result in a minimax strategy dominated by $\tau_{1/e}$.

Computation with (5) shows that for the first three cutoffs the only active constraints are the winning probabilities for $n = 1, 2, 3$. Pushing these to the limits, we get

$$a_1 = 1 - \frac{1}{e} = 0.632\cdots,$$
$$a_2 = \left(1 - \frac{2}{e}\right)^{1/2} = 0.514\cdots,$$
$$a_3 = \left[\left(1 - \frac{2}{e}\right)^{3/2} - \frac{1}{2e^3}\right]^{1/3} = 0.480\cdots,$$

which defines a strategy which has

$$w_n(a_1, a_2, a_3, 1/e, 1/e, \cdots) = 1/e = 0.367\cdots \quad \text{for } n = 1, 2, 3$$

exactly, and beats $\tau_{1/e}$ starting from $n = 5$ (though the theory above guaranteed improvement only for $n \ge 8$).

Table: $n$, $w_n$, $p_n(1/e)$.

Remark
It is not hard to see that a strategy with nonincreasing cutoffs allocated on both sides of $1/e$ cannot dominate $\tau_{1/e}$. Such strategies are more difficult to evaluate and to use for verifying the dominance conjecture stated in the Introduction.

The winning probability $W(\tau, n)$ decreases with $n$ for $\tau = \tau_x$, but may fluctuate in general (the last example). Also, note that for a Markovian strategy with nonincreasing cutoffs, $\tau$ is nonincreasing with $n$, as is seen by coupling different sizes.

The monotonicity of the winning probability holds for the $d^*$-strategy, and this is very intuitive as the strategy depends on $n$ and the choice problem becomes harder. For random sample size the natural sense of monotonicity is relative to the (partial) stochastic order $\succ$ on distributions. Recall that $\nu' \succ \nu$ if $\nu'$ has heavier tails.

Theorem 2.3 from [10], when adjusted to the best-choice context, asserts that the relation $\nu' \succ \nu$ implies $\max W(\cdot, \nu') \le \max W(\cdot, \nu)$, in line with the fixed number of items case. We will disprove this by counter-examples. A minor delicate point is that in [10], p. 825, the payoff of non-stopping in case $n = 1$ is $1$ (note that in this setting Pierre will never play $n = 1$, and the $1/e$-strategy will be dominated, see Section 4.1). The monotonicity claim is re-stated in [11] under the opposite convention that the non-stopping in the $n = 1$ case is assessed as $0$. The first, simpler, example to follow is a counter-example under the second convention, and the second works for both.

Example 2
Suppose Pierre plays a two-point distribution $\nu_1 = 1 - p$, $\nu_4 = p$. The distribution is strictly increasing in the stochastic order as $p$ increases from $0$ to $1$.

We have $d^*(1) = 1$, $d^*(4) = 2$. If the first arrival is not chosen, then regardless of the time the best way to proceed is to stop at the next record (with a hope that $n = 4$). Thus $a_2 = a_3 = a_4 = 0$, and the strategies failing the condition can be discarded by dominance. We leave to the reader a rigorous proof that the optimal acceptance region $A$ is an interval, but this is intuitively obvious in this simple situation. Thus the only indeterminate of the stopping strategy is the cutoff $a_1$. Given the first item comes before the cutoff, Alice proceeds with the $2$-strategy, otherwise with the $1$-strategy. Changing the variable as $b = 1 - a_1$ for shorthand, the total winning probability is computed as

$$w(b, p) := (1-p)\, b + p \left( (1 - b^4)\, s(2,4) + \frac{b^4}{4} \right) = (1-p)\, b + \frac{p}{24}\,(11 - 5b^4), \quad (b, p) \in [0,1]^2,$$

where we used $s(2,4) = 11/24$. For $p \le 6/11$ this increases in $b$, hence the best response has $a_1 = 1 - b = 0$ and the strategy always stops at the first arrival. For $6/11 < p < 1$ the function $w(\cdot, p)$ has a single mode inside $(0,1)$. The saddle point is

$$b^* = 0.449\cdots, \quad p^* = 0.929\cdots,$$

where $b^*$ is a root of

$$-b + \frac{1}{24}\,(11 - 5b^4) = 0.$$

This is an equaliser, hence $w(b^*, p) = b^*$, $p \in [0,1]$. Thus if Pierre plays $\nu_1 = 1 - p^*$, $\nu_4 = p^*$, Alice can only achieve $b^*$, and so for $p \in (p^*, 1]$

$$b^* = w(b^*, p) < \max_b w(b, p) \le w(0, 1) = s(2,4) = \frac{11}{24} = 0.458\cdots,$$

where the second term increases in $p$. The conclusion is that a lottery on $\{1, 4\}$ may turn out less favourable for Alice than certain $n = 4$.

Example 3

Suppose Pierre plays $\nu_3 = 1 - p$, $\nu_6 = p$. Since $d^*(3) = 2$, $d^*(6) = 3$, Alice will play $a_1 = 1$, $a_3 = a_4 = a_5 = a_6 = 0$ and some $a_2$. Let $b = 1 - a_2$ for shorthand, and

$$\beta(k, n) := \binom{n}{k} (1-b)^k\, b^{n-k}.$$

If $n = 3$ Alice wins with probability

$$f_3(b) := [\beta(0,3) + \beta(1,3)]\, s(2,3) + [\beta(2,3) + \beta(3,3)]\, s(3,3),$$

and if $n = 6$ with probability

$$f_6(b) := [\beta(0,6) + \beta(1,6)]\, s(2,6) + [\beta(2,6) + \beta(3,6) + \beta(4,6) + \beta(5,6) + \beta(6,6)]\, s(3,6).$$

The game on the unit square has the payoff 'matrix'

$$w(b, p) := (1-p)\, f_3(b) + p\, f_6(b).$$

With the experience of the previous example, we look for an equalising strategy. Equation $f_3(b) = f_6(b)$ becomes

$$\frac{17}{72}\, b^6 - \frac{17}{60}\, b^5 + \frac{1}{3}\, b^3 - \frac{1}{2}\, b^2 + \frac{17}{180} = 0,$$

and has a unique suitable root $b^* = 0.520\cdots$, with $w(b^*, p) = 0.421\cdots$ for all $p \in [0,1]$.

To find a best response to $p$, we observe that $w(b, p)$ is maximal at $b(p) = 1$ for $p \le 0.413\cdots$, and at

$$b(p) = \left( \frac{12(1-p)}{17p} \right)^{1/3}$$

for $p > 0.413\cdots$, as found by solving

$$\frac{\partial w(b, p)}{\partial b} = (1-p)\, b(1-b) - \frac{17}{12}\, p\, b^4 (1-b) = 0.$$

Minimising $w(b(p), p)$ returns $p^* = 0.833\cdots$, so $(b^*, p^*)$ is a saddle point. The $\nu$-optimal winning probability $w(b(p), p)$ is strictly increasing on $[p^*, 1]$. But the larger $p$, the larger $\nu$ stochastically.

Remark
A gap in [10] (Theorem 2.3) appeared in the short argument on p. 827, where dependence of the stopping and continuation risks on $N_t$ was ignored. The asserted parallel with Theorem 2.1 of [10] is not relevant here, as the result concerns strategies that base decisions on the arrival times and the relative ranks only, while even the classic $d$-strategies (embedded in continuous time) are not of this kind. Nevertheless, the implication $\nu' \succ \nu \Rightarrow \max W(\cdot, \nu') \le \max W(\cdot, \nu)$ does hold under the additional assumption that $\nu'$ is a convolution of $\nu$ with another distribution on nonnegative integers; in that case the proof follows by coupling exactly as in the fixed sample size problems [2, 16].

References

[1] Abdel-Hamid, A.R., Bather, J.A. and Trustrum, G.B. (1982) The secretary problem with an unknown number of candidates. J. Appl. Probab., 619–630.
[2] Berezovsky, B.A. and Gnedin, A.V. The best choice problem. Moscow, Nauka, 1984.
[3] Blackwell, D. and Girshick, M.A. Theory of games and statistical decisions. John Wiley and Sons, Inc., New York; Chapman and Hall, Ltd., London, 1954. xi+355 pp.
[4] Browne, S. (1993) Records, mixed Poisson processes and optimal selection: an intensity approach. Preprint.
[5] Browne, S. and Bunge, J. (1995) Random record processes and state dependent thinning. Stoch. Proc. Appl., 131–142.
[6] Bruss, F.T. (1984) A unified approach to a class of best choice problems with an unknown number of options. Ann. Probab., 882–889.
[7] Bruss, F.T. (1987) On an optimal selection problem of Cowan and Zabczyk. J. Appl. Probab., 918–928.
[8] Bruss, F.T. (1988) Invariant record processes and applications to best choice modelling. Stochastic Process. Appl., 303–316.
[9] Bruss, F.T. and Rogers, L.C.G. (1991) Embedding optimal selection problems in a Poisson process. Stochastic Process. Appl., 267–278.
[10] Bruss, F.T. and Samuels, S.M. (1987) A unified approach to a class of optimal selection problems with an unknown number of options. Ann. Probab., 824–830.
[11] Bruss, F.T. and Samuels, S.M. (1990) Conditions for quasi-stationarity of the Bayes rule in selection problems with an unknown number of rankable options. Ann. Probab., 877–886.
[12] Bruss, F.T. and Yor, M. (2012) Stochastic processes with proportional increments and the last-arrival problem. Stochastic Process. Appl., 3239–3261.
[13] Campbell, G.C. and Samuels, S.M. (1981) Choosing the best from the current crop. Adv. Appl. Probab.
[14] ... Theory Probab. Appl., 584–592.
[15] Gaver, D.P. (1976) Random record models. J. Appl. Probability, 538–547.
[16] Gianini-Pettitt, J. (1979) Optimal selection based on relative ranks with a random number of individuals. Adv. Appl. Prob., 720–736.
[17] Gianini, J. and Samuels, S. (1976) The infinite secretary problem. Ann. Probab., 418–432.
[18] Gnedin, A. (1994) A solution to the game of googol. Ann. Probab., 1588–1595.
[19] Gnedin, A. (2016) Guess the larger number. Math. Appl. (Warsaw), 183–207.
[20] Gnedin, A. and Krengel, U. (1995) A stochastic game of optimal stopping and order selection. Ann. Appl. Probab., 310–321.
[21] Hill, T.P. and Kennedy, D. (1994) Minimax-optimal strategies for the best-choice problem when a bound is known for the expected number of objects. SIAM J. Control and Optimization, 937–951.
[22] Hill, T.P. and Krengel, U. (1991) Minimax-optimal stop rules and distributions in secretary problems. Ann. Probab., 342–353.
[23] Kurushima, A. and Ano, K. (2003) A note on the full-information Poisson arrival selection problem. J. Applied Probab., 1147–1154.
[24] Presman, E. and Sonin, I. (1972) The best choice problem for a random number of objects. Theor. Probab. Appl., 657–668.
[25] Puri, P.S. (1982) On the characterization of point processes with the order statistic property without the moment condition. J. Appl. Prob., 39–51.
[26] Rubin, H. (1966) The "secretary" problem. Ann. Math. Statist., 544.
[27] Samuels, S.M. Secretary problems as a source of benchmark bounds. Stochastic inequalities (Seattle, WA, 1991), 371–387, IMS Lecture Notes Monogr. Ser., 22, Inst. Math. Statist., Hayward, CA, 1992.
[28] Samuels, S.M. Secretary problems. Handbook of sequential analysis, 381–405, Statist. Textbooks Monogr., 118, Dekker, New York, 1991.
[29] Stewart, T.J. (1981) The secretary problem with an unknown number of options.