A study of counts of Bernoulli strings via conditional Poisson processes
FRED W. HUFFER, JAYARAM SETHURAMAN, AND SUNDER SETHURAMAN
Abstract.
A sequence of random variables, each taking values 0 or 1, is called a Bernoulli sequence. We say that a string of length $d$ occurs in a Bernoulli sequence if a success is followed by exactly $(d-1)$ failures before the next success. The counts of such $d$-strings are of interest, and in specific independent Bernoulli sequences are known to correspond to asymptotic $d$-cycle counts in random permutations.

In this note, we give a new framework, in terms of conditional Poisson processes, which allows for a quick characterization of the joint distribution of the counts of all $d$-strings, in a general class of Bernoulli sequences, as certain mixtures of the product of Poisson measures. In particular, this general class includes all Bernoulli sequences considered in the literature, as well as a host of new sequences.

Introduction
In this note, we study the joint distribution of the counts of certain $d$-strings of all orders $d \ge 1$ in a Bernoulli sequence $Y = \{Y_n\}_{n \ge 1}$, that is, a sequence of $\{0,1\}$-valued random variables. For $d \ge 1$, we say that a $d$-string occurs if a 1 is followed by exactly $(d-1)$ 0's before the next 1. More precisely, a $d$-string occurs at time $n \ge 1$ if $Y_{n,d} = 1$, where

$$Y_{n,d} = \begin{cases} Y_n Y_{n+1} & \text{for } d = 1\\ Y_n (1 - Y_{n+1}) \cdots (1 - Y_{n+d-1}) Y_{n+d} & \text{for } d \ge 2, \end{cases}$$

that is, if $\langle Y_n, \ldots, Y_{n+d} \rangle = \langle 1, \underbrace{0, \ldots, 0}_{d-1}, 1 \rangle$. Let $Z_d = \sum_{n \ge 1} Y_{n,d}$ be the count of all $d$-strings, for $d \ge 1$, and $Z = \langle Z_d : d \ge 1 \rangle$ be the "count vector" of strings. [In general, $Z$ may have divergent components, but for the Bernoulli sequences considered in this article it is easily shown (by taking expectations) that all components $Z_k$ are finite with probability 1.]

Mathematics Subject Classification: primary 60C05; secondary 60K99.
Key words and phrases: Bernoulli, cycles, strings, spacings, nonhomogeneous, Poisson processes, random permutations.
Research partially supported by ARO-W911NF-04-1-0333, NSA-H982300510041, and NSF-DMS-0504193.

In this notation, the general problem is to understand the distribution of $Z$ and its connection to the underlying sequence $Y$. Aside from the problem's basic interest, $d$-strings and their counts from specific independent Bernoulli sequences have interpretations with respect to random permutations, record values, Bayesian nonparametrics, and species allocation models through the Ewens sampling formula.

We will use "$\stackrel{d}{=}$" to signify "equals in distribution," and $\mathcal{L}(X)$ to denote the law or distribution of the random variable $X$. Denote also $\mathrm{Po}(\lambda)$ as the Poisson measure with intensity $\lambda$, and $I(B)$ as the indicator of a set $B$.
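As a concrete illustration of these definitions (our own sketch, not part of the original text; the function name is ours), the count vector of a finite 0/1 sequence can be computed by noting that a $d$-string is exactly a gap of length $d$ between consecutive 1's:

```python
from typing import List

def string_counts(y: List[int], max_d: int) -> List[int]:
    """Count d-strings (a 1 followed by exactly d-1 zeros, then a 1)
    in a finite 0/1 sequence y, for d = 1, ..., max_d."""
    counts = [0] * (max_d + 1)  # counts[d] will hold Z_d
    ones = [i for i, v in enumerate(y) if v == 1]  # positions of successes
    # a d-string corresponds to two consecutive 1's at distance d
    for a, b in zip(ones, ones[1:]):
        d = b - a
        if d <= max_d:
            counts[d] += 1
    return counts[1:]

# The sequence 1,0,0,1,1,0,1 contains one 3-string, one 1-string, one 2-string.
print(string_counts([1, 0, 0, 1, 1, 0, 1], 3))  # [1, 1, 1]
```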
Example 1.1.
Let $S_n = \{1, 2, \ldots, n\}$, and consider the Feller algorithm to generate a permutation $\pi: S_n \to S_n$ uniformly among the $n!$ choices (cf. Feller (1945)):

1. Draw an element uniformly from $S_n$, and call it $\pi(1)$. If $\pi(1) = 1$, a 1-cycle is completed. If $\pi(1) \ne 1$, make another draw uniformly from $S_n \setminus \{\pi(1)\}$, and call it $\pi(\pi(1))$. Continue drawing from $S_n \setminus \{\pi(1), \pi(\pi(1))\}, \ldots$, naming them $\pi(\pi(\pi(1)))$, and so on, until a cycle (of some length) is finished.
2. From the elements left in $S_n \setminus \{\pi(1), \pi(\pi(1)), \ldots\}$ after the first cycle is completed, follow the process in step 1 with the smallest remaining number taking the role of "1" to finish a second cycle. Repeat until all elements of $S_n$ are exhausted.

Let $I^{(n)}_k$ be the indicator that a cycle is completed at the $k$th Feller draw from $S_n$. A moment's thought convinces that $\{I^{(n)}_k\}_{k=1}^n$ are independent Bernoulli random variables with $P(I^{(n)}_k = 1) = 1/(n-k+1)$ as, independent of the past, exactly one choice at time $1 \le k \le n$ from the remaining $n-k+1$ members left in $S_n$ completes the cycle. Denote $C^{(n)}_k$ as the number of $k$-cycles in $\pi$,

$$C^{(n)}_k = \begin{cases} I^{(n)}_1 + \sum_{i=1}^{n-1} I^{(n)}_i I^{(n)}_{i+1} & \text{for } k = 1\\ \prod_{l=1}^{k-1}(1 - I^{(n)}_l)\, I^{(n)}_k + \sum_{i=1}^{n-k} I^{(n)}_i \prod_{l=i+1}^{i+k-1}(1 - I^{(n)}_l)\, I^{(n)}_{i+k} & \text{for } 2 \le k \le n. \end{cases}$$

Now let $Y$ be the independent sequence where $P(Y_k = 1) = 1/k$ for $k \ge 1$, so that $Y_k \stackrel{d}{=} I^{(n)}_{n-k+1}$ for $1 \le k \le n$. Then, as $Y_n$, and $Y_{n-k+1} \prod_{l=n-k+2}^{n} (1 - Y_l)$ for $2 \le k \le n$, all vanish in probability as $n \uparrow \infty$, we conclude for each $k \ge 1$ that $\lim_{n \to \infty} C^{(n)}_k \stackrel{d}{=} Z_k$. Finally, as is well known, the asymptotic cycle counts $\{\lim_n C^{(n)}_k\}_{k \ge 1}$ are distributed as independent Poisson random variables with respective means $1/k$ for $k \ge 1$; that is, $Z \stackrel{d}{=} \prod_{k \ge 1} \mathrm{Po}(1/k)$. [Example 2.1, in section 2, gives a derivation in our Poisson process framework. See also Arratia-Barbour-Tavaré (1992, 2003) for more discussion with the Ewens sampling formula.]
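A quick simulation of the Feller indicators (our own illustrative sketch; names hypothetical) recovers the cycle counts as gaps between completed-cycle draws; note that $E[C^{(n)}_1] = 1/n + \sum_{i=1}^{n-1} \frac{1}{(n-i+1)(n-i)} = 1$ exactly, for every $n$:

```python
import random

def feller_cycle_counts(n: int, rng: random.Random) -> dict:
    """Simulate independent indicators I_k with P(I_k = 1) = 1/(n-k+1);
    gaps between successive completed-cycle draws are the cycle lengths."""
    I = [1 if rng.random() < 1.0 / (n - k + 1) else 0 for k in range(1, n + 1)]
    counts = {}
    last = 0  # draw index of the previous cycle completion (0 = start)
    for k, ind in enumerate(I, start=1):
        if ind:
            length = k - last
            counts[length] = counts.get(length, 0) + 1
            last = k
    return counts  # counts[k] estimates C^{(n)}_k

rng = random.Random(0)
trials, n = 2000, 200
mean_c1 = sum(feller_cycle_counts(n, rng).get(1, 0) for _ in range(trials)) / trials
print(round(mean_c1, 2))  # close to E[C_1] = 1
```

The last draw always completes a cycle ($P(I^{(n)}_n = 1) = 1$), so the gaps partition $\{1, \ldots, n\}$, matching the displayed formula for $C^{(n)}_k$.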
Example 1.2.
Consider the standard nonparametric problem of estimating the unknown distribution function $F$ from independent and identically distributed observations $\{X_i\}_{i \ge 1}$. A Bayesian may place on $F$ a Dirichlet prior with parameters $a\mu$, where $a > 0$ and $\mu$ is a non-atomic probability measure. Let $Y_1 = 1$ and, for $n \ge 2$, $Y_n = 1$ if $X_n$ is a new observation, that is if $X_n \notin \{X_1, \ldots, X_{n-1}\}$, and $Y_n = 0$ otherwise. Then, it can be shown that $Y$ is an independent Bernoulli sequence with $P(Y_n = 1) = a/(a + n - 1)$ for $n \ge 1$, and that $(\log n)^{-1} \sum_{i=1}^n Y_i \to a$ a.s. The latter result can be interpreted in terms of counts of strings in this Bernoulli sequence. See Korwar-Hollander (1973) for more details, and also Ghosh-Ramamoorthi (2003).

In the literature, to our knowledge, only the count vectors of the following class of underlying independent Bernoulli sequences have been investigated. Denote the independent Bernoulli sequence $Y$ where $P(Y_n = 1) = a/(a + b + n - 1)$ for $n \ge 1$ as $Y = \mathrm{Bern}(a, b)$. The case $a = 1$, $b = 0$ is Example 1.1 (see also Arratia-Tavaré (1992)). The case $a > 0$, $b = 0$ is Example 1.2. For this case, Arratia-Barbour-Tavaré (1992) observe that the associated $Z \stackrel{d}{=} \prod_{k \ge 1} \mathrm{Po}(a/k)$ through connections with the Ewens sampling formula. When $a = 1$, $b > 0$, Sethuraman-Sethuraman (2004), employing factorial moments, show that, given the value $x$ of a $\mathrm{Beta}(b, 1)$ random variable, $Z \stackrel{d}{=} \prod_{k \ge 1} \mathrm{Po}((1 - x^k)/k)$. Such a distribution will be called a "mixture of independent Poisson factors." When $a > 0$, $b > 0$, Holst (2007) shows that, given the value $x$ of a $\mathrm{Beta}(b, a)$ random variable, $Z \stackrel{d}{=} \prod_{k \ge 1} \mathrm{Po}(a(1 - x^k)/k)$, again a mixture of independent Poisson factors. We note also that several interesting studies of 1-strings preceded some of the above work, e.g. an unpublished manuscript of Diaconis, Chern-Hwang-Yeh (2000), Móri (2001), Joffe-Marchand-Perron-Popadiuk (2004), and references therein in these and the above papers.

With this background, our main idea is that it is easier to study $Z$ starting from an extrinsic "conditional marked Poisson process model" (CMPP) rather than directly from the Bernoulli sequence. Namely, we prove that when the underlying Bernoulli sequence $Y$ is generated through a CMPP model, the count vector $Z$ is distributed as a mixture of independent Poisson factors in terms of model parameters (Theorem 2.2). As remarked earlier, the Poisson process techniques used here are different from previous methods and allow quick derivations. Perhaps interestingly, the sequences $Y$ found in our model include many dependent Bernoulli sequences (some explicit examples are in section 5), as well as the most general sequence studied till now, the independent sequence $\mathrm{Bern}(a, b)$ with $a > 0$ and $b > 0$.

We also consider a closely related sequence $\mathrm{Bern}^*(a, b)$. Denote the independent Bernoulli sequence $Y$ where $P(Y_1 = 1) = 1$ and $P(Y_n = 1) = a/(a + b + n - 2)$ for $n \ge 2$ as $Y = \mathrm{Bern}^*(a, b)$. The $\mathrm{Bern}^*(a, b)$ sequence appends a 1 to the $\mathrm{Bern}(a, b)$ sequence, and picks up one more $d$-string contributed by any leading 0's in $\mathrm{Bern}(a, b)$. We show that the distribution of the count vector $Z$ for $\mathrm{Bern}^*(a, b)$ is again a mixture of independent Poisson factors when $a > 0$ and $b \ge 1$ (Proposition 4.1 and Remark 4.2). This is not so when $0 \le b < 1$, and in this case even the distribution of $Z_1$, the count of 1-strings in $\mathrm{Bern}^*(a, b)$, is not a mixture of Poisson distributions (Proposition 4.5). However, the distribution of $Z$ in $\mathrm{Bern}^*(a, b)$ can be expressed through a recurrence relation for all values of $b$, including $0 \le b < 1$ (Proposition 4.3). Sections 3 and 4 treat $\mathrm{Bern}(a, b)$ and $\mathrm{Bern}^*(a, b)$ respectively. Last, in section 5, two explicit dependent Bernoulli sequences, arising from the CMPP model, are given.

CMPP models
The following "Poisson process" derivation of the distribution of $Z$ with respect to $\mathrm{Bern}(1, 0)$ (cf. Example 1.1) motivates the subsequent development.
Example 2.1.
Consider the following standard way to generate a $\mathrm{Bern}(1, 0)$ sequence. Let $\{\beta_i\}_{i \ge 1}$ be independent, identically distributed (iid) Uniform$[0, 1]$ random variables, and define $Y_n = I(\beta_n \text{ is a record})$, $n \ge 1$. Rényi's theorem shows that $\{Y_n\}_{n \ge 1}$ are independent and $P(Y_n = 1) = 1/n$ for $n \ge 1$, that is $Y = \mathrm{Bern}(1, 0)$. Let $\{X_i\}_{i \ge 1}$ be the record values among $\{\beta_i\}_{i \ge 1}$. Notice that the point process $N$ on $[0, 1]$ defined by $N(A) = \sum_{i \ge 1} \delta_{X_i}(A)$ is a nonhomogeneous Poisson process on $[0, 1]$ with intensity $1/(1-x)$ (cf. Resnick (1994)). To each point $X_i$, we can associate a Geometric$(1 - X_i)$ variable $L_i$ (a "mark") corresponding to the number of uniform random variables in $\{\beta_i\}_{i \ge 1}$ to the next record. Then, by thinning decompositions, $Z_k = \sum_{i \ge 1} I(L_i = k) = \sum_{i \ge 1} \delta_{X_i}([0,1])\, I(L_i = k)$, for $k \ge 1$, are independent Poisson random variables with respective means $\int_0^1 (1-x)^{-1} x^{k-1}(1-x)\, dx = 1/k$.

Our method reverses this derivation: starting from a marked Poisson process and the count vector $Z$ it induces, we then see what associated Bernoulli sequence $Y$ arises. Consider a sequence of random variables $(X, L) = \{(X_i, L_i)\}_{i \ge 0}$ on $\mathbb{R} \times \mathbb{N}$, where $\mathbb{N} = \{1, 2, \ldots\}$, and the point process $N$ on $\mathbb{R}$ given by $N(A) = \sum_{i \ge 1} \delta_{X_i}(A)$. Let also $g: \mathbb{R} \to [0, \infty)$ be a probability density function (pdf); for each $x \in \mathbb{R}$, let $r(x, \cdot), q(x, \cdot): \mathbb{N} \to [0, 1]$ be probability mass functions, and $\lambda_x(\cdot): \mathbb{R} \to [0, \infty)$ be an intensity function. Then, we say $(X, L)$ is the conditional marked Poisson process $M(g, r, \lambda, q)$ if the following hold:

1. $X_0$ has pdf $g$,
2. conditional on $X_0 = x_0$, $N$ is a nonhomogeneous Poisson process with intensity function $\lambda_{x_0}(\cdot)$,
3. $P(L_0 = k \mid X) = r(X_0, k)$ for $k \ge 1$, and
4. $P(L_n = k \mid X, L_0, L_1, \ldots, L_{n-1}) = q(X_n, k)$ for $k, n \ge 1$.

Define $L^*_0 = L_0$, and $L^*_r = L^*_{r-1} + L_r$ for $r \ge 1$. We now define a Bernoulli sequence $Y$ based on $(X, L)$ as follows: $Y_n = 1$ if $n$ is of the form $L^*_r$ for some $r \ge 0$, and $Y_n = 0$ otherwise. Another way to say this is

$$Y_n = \begin{cases} 0 & \text{when } n < L^*_0, \text{ or } L^*_r < n < L^*_{r+1} \text{ for } r \ge 0\\ 1 & \text{when } n = L^*_r \text{ for some } r \ge 0. \end{cases} \qquad (2.1)$$

Then, the count vector $Z$ is given by

$$Z_k = \sum_{n \ge 1} I(L_n = k), \quad \text{for } k \ge 1. \qquad (2.2)$$

We note the zeroth mark $L_0$ is not included in the above summation since any $Y_i$ with $i < L^*_0$ is part of an initial segment of zeros of the sequence not preceded by a 1, and so does not contribute to any $d$-string, for $d \ge 1$.
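The record-value construction of Example 2.1 can be simulated directly (our own illustrative sketch, under truncation to finitely many draws): gaps between consecutive record times play the role of the marks $L_i$, $i \ge 1$, and the 1-strings are the marks equal to 1.

```python
import random

def record_marks(n_draws: int, rng: random.Random):
    """Record positions of an iid Uniform[0,1] sequence; the marks
    L_i (i >= 1) are the gaps between consecutive record times."""
    beta = [rng.random() for _ in range(n_draws)]
    records = []
    best = -1.0
    for pos, b in enumerate(beta):
        if b > best:
            best = b
            records.append(pos)
    # drop the last, open-ended gap (its mark is censored by truncation)
    return [t2 - t1 for t1, t2 in zip(records, records[1:])]

rng = random.Random(1)
trials = 3000
z1 = 0
for _ in range(trials):
    z1 += sum(1 for L in record_marks(500, rng) if L == 1)
mean_z1 = z1 / trials
print(round(mean_z1, 2))  # E[Z_1] = 1 in the limit of infinitely many draws
```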
Theorem 2.2.
Suppose $\int \lambda_w(x)\, q(x, k)\, dx < \infty$ for all $w \in \mathbb{R}$ and $k \ge 1$. Then, the count vector $Z$ associated with the sequence $Y$, defined through the CMPP $(X, L) = M(g, r, \lambda, q)$, is distributed as follows. Given the value $X_0 = x_0$,

$$Z \stackrel{d}{=} \prod_{k \ge 1} \mathrm{Po}\Big( \int \lambda_{x_0}(x)\, q(x, k)\, dx \Big).$$
Remark 2.3.
The distribution of $Z$ does not depend on the transition function $r$, consistent with the discussion of $L_0$ before the theorem. Also, for a given $k \ge 1$, $Z_k$ is infinite with positive probability exactly when there is a set $B$ such that $P(X_0 \in B) > 0$ and $\int \lambda_w(x)\, q(x, k)\, dx = \infty$ for $w \in B$.
Proof of Theorem 2.2.
Recall the count vector representation (2.2). Conditional on $X_0 = x_0$, the point process $M$ on $\mathbb{R} \times \mathbb{N}$ given by $M(A \times \{k\}) = \sum_{i \ge 1} \delta_{X_i}(A)\, I(L_i = k)$ is a Poisson process on $\mathbb{R} \times \mathbb{N}$ with intensity function $\lambda_{x_0}(x)\, q(x, k)$ (cf. Proposition 4.10.1(b) of Resnick (1994)). Hence, it follows that, given $X_0 = x_0$, the variables $M(\mathbb{R} \times \{k\}) = \sum_{n \ge 1} I(L_n = k) = Z_k$ are independent Poisson variables with respective means $\int \lambda_{x_0}(x)\, q(x, k)\, dx$, for $k \ge 1$. $\square$

The sequence Bern(a, b)

We now derive the count vector distribution for the sequence $\mathrm{Bern}(a, b)$ using a CMPP model. Denote, as usual, for $\alpha, \beta > 0$, the Beta function

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}, \qquad (3.1)$$

and let

1. $\bar{g}(x) = x^{b-1}(1-x)^{a-1}/B(b, a)$ on $0 < x < 1$, the $\mathrm{Beta}(b, a)$ pdf,
2. $\bar{r}(x, k) = x^{k-1}(1-x)$ for $k \ge 1$,
3. $\bar{\lambda}_w(x) = [a/(1-x)]\, I(w < x < 1)$,
4. $\bar{q}(x, k) = x^{k-1}(1-x)$ for $k \ge 1$.
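As a numerical sanity check on this model (our own, not part of the derivation), simulating $\mathrm{Bern}(a, b)$ directly reproduces the mixed-Poisson mean asserted in Proposition 3.1 below; for $k = 1$ this mean is $E[a(1 - X)] = a^2/(a+b)$ with $X \sim \mathrm{Beta}(b, a)$.

```python
import random

def bern_ab(a: float, b: float, n: int, rng: random.Random):
    """First n terms of the independent Bern(a, b) sequence,
    P(Y_m = 1) = a/(a + b + m - 1)."""
    return [1 if rng.random() < a / (a + b + m - 1) else 0 for m in range(1, n + 1)]

def count_1strings(y):
    return sum(y[i] * y[i + 1] for i in range(len(y) - 1))

rng = random.Random(2)
a, b, trials, n = 2.0, 1.0, 4000, 400
mean_z1 = sum(count_1strings(bern_ab(a, b, n, rng)) for _ in range(trials)) / trials
# mixed-Poisson mean: a^2/(a+b) = 4/3 for a = 2, b = 1
print(round(mean_z1, 2))
```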
Proposition 3.1.
The model $(X, L) = M(\bar{g}, \bar{r}, \bar{\lambda}, \bar{q})$ produces an independent Bernoulli sequence $Y \stackrel{d}{=} \mathrm{Bern}(a, b)$ for $a > 0$ and $b > 0$, whose count vector $Z$, conditional on the value $x_0$ of a $\mathrm{Beta}(b, a)$ random variable, is distributed as $\prod_{k \ge 1} \mathrm{Po}(a(1 - x_0^k)/k)$.

Remark 3.2.
As a corollary, by taking $b \downarrow 0$, we recover the count vector distribution for $\mathrm{Bern}(a, 0)$ already considered in the literature as simply $Z \stackrel{d}{=} \prod_{k \ge 1} \mathrm{Po}(a/k)$. Note that $(X_0, L_0) \to (0, 1)$ in distribution as $b \downarrow 0$.

Also, we note the Poisson process with intensity $\bar{\lambda}_w(\cdot)$ can be generated in the following way. First, the point process formed by the record values from an iid sequence of $\mathrm{Beta}(1, a)$ random variables is a Poisson process with intensity $a/(1-x)$, the $\mathrm{Beta}(1, a)$ failure rate (cf. Resnick (1994) Proposition 4.11.1(b)). Next, we thin this process as follows. Let $X_0 \stackrel{d}{=} \mathrm{Beta}(b, a)$, and let $\{X_i\}_{i \ge 1}$ be the record values from an iid sequence of $\mathrm{Beta}(1, a)$ random variables, subject to $X_i > X_0$ for $i \ge 1$. Then, conditional on $X_0 = x_0$, the point process $\bar{N}$ defined by $\bar{N}(A) = \sum_{i \ge 1} \delta_{X_i}(A)$ is the desired Poisson process with intensity function $\bar{\lambda}_{x_0}(x) = [a/(1-x)]\, I(x_0 < x < 1)$.

Proof of Proposition 3.1.
The second part, on the count vector distribution, follows from Theorem 2.2, noting for $k \ge 1$ that

$$\int \bar{\lambda}_{x_0}(x)\, \bar{q}(x, k)\, dx = \int_{x_0}^1 a x^{k-1}\, dx = \frac{a(1 - x_0^k)}{k}. \qquad (3.2)$$

For the first part, we observe that the distribution of $\{Y_i\}_{i \ge 1}$ given through (2.1) is uniquely determined by the probabilities of cylinder sets of the form

$$E(k_0, k_1, \ldots, k_n) = (L_0 = k_0, L_1 = k_1, \ldots, L_n = k_n) \qquad (3.3)$$
$$= \big( Y_t = 1 \text{ for } t \in \{K_0, K_1, \ldots, K_n\}, \text{ and } Y_t = 0 \text{ otherwise for } 1 \le t \le K_n \big),$$

where $k_0, k_1, \ldots, k_n$ are positive integers, and $K_0 = k_0$, $K_1 = K_0 + k_1$, $\ldots$, $K_n = K_{n-1} + k_n$ are their partial sums. If the probability of sets of the form $E \stackrel{def}{=} E(k_0, \ldots, k_n)$ is a product of appropriate marginal probabilities, then $\{Y_n, n \ge 1\}$ will be the Bernoulli sequence $\mathrm{Bern}(a, b)$. We will proceed to establish this.

Let $A_n = \{0 < x_0 < x_1 < \cdots < x_n < 1\}$. Using the Beta variables representation in Remark 3.2, write

$$P(E) = \int_{A_n} \bar{g}(x_0)\, \bar{r}(x_0, k_0) \prod_{i=1}^n \big[ P(X_i \in dx_i \mid X_i > x_{i-1})\, \bar{q}(x_i, k_i) \big]\, dx_0.$$

Since $P(X_i \in dx_i \mid X_i > x_{i-1}) = a(1-x_i)^{a-1}/(1-x_{i-1})^a\, dx_i$ for $1 \le i \le n$, we have further that the last line equals

$$\frac{a^n}{B(b, a)} \int_{A_n} x_0^{b + k_0 - 2} \prod_{i=1}^n x_i^{k_i - 1}\, (1 - x_n)^a\, dx_0 \cdots dx_n = \frac{B(b + K_n - 1, a + 1)}{B(b, a)} \cdot \frac{a^n}{\prod_{s=0}^{n-1}(b + K_s - 1)}. \qquad (3.4)$$

It follows, with (3.1) and $\alpha\,\Gamma(\alpha) = \Gamma(\alpha + 1)$, that (3.4) becomes

$$\frac{\prod_{r=0}^{K_n - 2}(b + r)}{\prod_{r=0}^{K_n - 1}(a + b + r)} \cdot \frac{a^{n+1}}{\prod_{s=0}^{n-1}(b + K_s - 1)} = \prod_{i=1}^{K_n} \frac{b + i - 1}{a + b + i - 1}\; \prod_{r=0}^{n} \frac{a}{b + K_r - 1},$$

which equals $\prod_{i=1}^{K_n} P(Y_i = 0) \prod_{r=0}^{n} [P(Y_{K_r} = 1)/P(Y_{K_r} = 0)]$ with $Y$ specified as $\mathrm{Bern}(a, b)$. $\square$
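The final identity in the proof can be verified exactly for particular cylinders with rational arithmetic (our own verification script; integer $a, b$ are used so that exact fractions apply):

```python
from fractions import Fraction
from itertools import accumulate
from math import prod

def cylinder_prob_direct(a, b, ks):
    """P(E(k_0,...,k_n)) from the independent Bern(a,b) marginals:
    Y_t = 1 exactly at the partial sums K_0,...,K_n, for t <= K_n."""
    K = set(accumulate(ks))
    last = max(K)
    p = lambda t: Fraction(a, a + b + t - 1)  # P(Y_t = 1)
    return prod(p(t) if t in K else 1 - p(t) for t in range(1, last + 1))

def cylinder_prob_closed(a, b, ks):
    """Closed form from the proof of Proposition 3.1:
    prod_{i<=K_n} P(Y_i=0) * prod_r a/(b + K_r - 1)."""
    K = list(accumulate(ks))
    num = prod(Fraction(b + r) for r in range(K[-1]))      # (b)...(b+K_n-1)
    den = prod(Fraction(a + b + r) for r in range(K[-1]))  # (a+b)...(a+b+K_n-1)
    return num / den * prod(Fraction(a, b + Kr - 1) for Kr in K)

ks = (2, 1, 3)
print(cylinder_prob_direct(2, 3, ks) == cylinder_prob_closed(2, 3, ks))  # True
```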
The sequence Bern*(a, b)

We will derive the count vector distribution for the sequence $\mathrm{Bern}^*(a, b)$, and show a dichotomy depending on whether $b \ge 1$ or $0 \le b < 1$. We first consider the case where $a > 0$ and $b > 1$. Define

1. $g^*(x) = x^{b-2}(1-x)^a/B(b-1, a+1)$ on $0 < x < 1$, the $\mathrm{Beta}(b-1, a+1)$ pdf,
2. $r^*(x, 1) = 1$,
3. $\lambda^*_w(x) = [a/(1-x)]\, I(w < x < 1)$,
4. $q^*(x, k) = x^{k-1}(1-x)$ for $k \ge 1$.

Proposition 4.1.
The CMPP model $(X, L) = M(g^*, r^*, \lambda^*, q^*)$ produces an independent Bernoulli sequence $Y \stackrel{d}{=} \mathrm{Bern}^*(a, b)$ for $a > 0$ and $b > 1$, and, conditional on the value $x_0$ of a $\mathrm{Beta}(b-1, a+1)$ variable $X_0$, the distribution of its count vector $Z$ is $\prod_{k \ge 1} \mathrm{Po}(a(1 - x_0^k)/k)$.

Remark 4.2.
As a corollary, by taking $b \downarrow 1$, we find the count vector distribution for $\mathrm{Bern}^*(a, 1)$ to be simply $Z \stackrel{d}{=} \prod_{k \ge 1} \mathrm{Po}(a/k)$. [In fact, $\mathrm{Bern}^*(a, 1)$ coincides with the sequence $\mathrm{Bern}(a, 0)$ mentioned earlier in Remark 3.2.]

Also, we note the Poisson process in the above CMPP model with intensity $\lambda^*$ can be generated, as in Proposition 3.1, by taking $X_0 \stackrel{d}{=} \mathrm{Beta}(b-1, a+1)$, and $\{X_i\}_{i \ge 1}$ as the sequence of records from an iid sequence of $\mathrm{Beta}(1, a)$ random variables, subject to the condition $X_1 > X_0$.

Proof of Proposition 4.1.
We need only establish the distribution of $Y$, as the last statement follows from Theorem 2.2 and the computation (3.2). The calculations are similar to the proof of Proposition 3.1. Let $k_0 = 1, k_1, k_2, \ldots, k_n$ be positive integers, and $K_0 = k_0 = 1$, $K_1 = K_0 + k_1$, $\ldots$, $K_n = K_{n-1} + k_n$ be their partial sums. Recall the cylinder set defined in (3.3) and let

$$E \stackrel{def}{=} E(1, k_1, \ldots, k_n) = (L_0 = 1, L_1 = k_1, \ldots, L_n = k_n),$$

and set $A_n = \{0 < x_0 < x_1 < \cdots < x_n < 1\}$. Write, using the construction in Remark 4.2, that

$$P(E) = \frac{1}{B(b-1, a+1)} \int_{A_n} \big[x_0^{b-2}(1-x_0)^a\big] \prod_{i=1}^n \big[a(1-x_i)^{a-1}/(1-x_{i-1})^a\big]\big[x_i^{k_i-1}(1-x_i)\big]\, dx_0 \cdots dx_n$$
$$= \frac{a^n}{B(b-1, a+1)} \int_{A_n} x_0^{b-2} \prod_{i=1}^n x_i^{k_i - 1}\, (1-x_n)^a\, dx_0 \cdots dx_n.$$

Then, with (3.1) and $\alpha\,\Gamma(\alpha) = \Gamma(\alpha+1)$, the last line equals

$$\frac{B(b + K_n - 2, a + 1)}{B(b-1, a+1)} \cdot \frac{a^n}{(b-1)\prod_{s=1}^{n-1}(b + K_s - 2)} = \frac{\prod_{r=0}^{K_n-2}(b - 1 + r)}{\prod_{r=0}^{K_n-2}(a + b + r)} \cdot \frac{a^n}{(b-1)\prod_{s=1}^{n-1}(b + K_s - 2)}$$
$$= \prod_{i=2}^{K_n} \frac{b + i - 2}{a + b + i - 2}\; \prod_{r=1}^{n} \frac{a}{b + K_r - 2},$$

which equals $P(Y_1 = 1) \prod_{i=2}^{K_n} P(Y_i = 0) \prod_{r=1}^{n} [P(Y_{K_r} = 1)/P(Y_{K_r} = 0)]$ with $Y$ specified as $\mathrm{Bern}^*(a, b)$. $\square$

We now give the distribution of the count vector under $\mathrm{Bern}^*(a, b)$ for all $a > 0$ and $b \ge 0$. Denote $Z(a, b)$ as the count vector with respect to $\mathrm{Bern}^*(a, b)$ for $a > 0$, $b \ge 0$. Let $W_n$ be the sequence whose $n$th coordinate is 1 and all the other coordinates are zero, for $n \ge 1$. Let also

$$p_n = \begin{cases} \dfrac{a}{a+b} & \text{for } n = 2\\[4pt] \dfrac{a}{a+b+n-2}\, \displaystyle\prod_{r=0}^{n-3} \frac{b+r}{a+b+r} & \text{for } n \ge 3; \end{cases}$$

that is, $p_n$ is the probability that the second 1 in $\mathrm{Bern}^*(a, b)$ occurs at time $n \ge 2$, and note $\sum_{n \ge 2} p_n = 1$.

Proposition 4.3.
For $a > 0$ and $b \ge 0$, we have

$$\mathcal{L}(Z(a, b)) = \sum_{n \ge 2} p_n\, \mathcal{L}\big(Z(a, b + n - 1) + W_{n-1}\big), \qquad (4.1)$$

and $Z(a, b + n - 1)$, conditional on the value $x$ of a $\mathrm{Beta}(b + n - 2, a + 1)$ random variable, is distributed as $\prod_{k \ge 1} \mathrm{Po}(a(1 - x^k)/k)$, for $b > 0$ and $n \ge 2$.
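That the $p_n$ sum to 1 can be checked numerically (our own sketch; the function name is ours):

```python
def p_n(n, a, b):
    """p_n = P(second 1 in Bern*(a, b) occurs at time n), for n >= 2."""
    val = a / (a + b + n - 2)
    for r in range(n - 2):  # factors P(Y_i = 0), i = 2, ..., n-1
        val *= (b + r) / (a + b + r)
    return val

a, b = 3.0, 2.0
partial = sum(p_n(n, a, b) for n in range(2, 2000))
print(round(partial, 3))  # partial sums approach 1
```

Here $p_n$ decays like $n^{-(a+1)}$, so the truncated sum is already extremely close to 1.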
Remark 4.4.
The special case $b = 0$ is interesting. The sequence $\mathrm{Bern}^*(a, 0)$ is the independent sequence where $Y_1 = Y_2 = 1$ and $P(Y_n = 1) = a/(a + n - 2)$ for $n \ge 3$. That is, starting from time $n = 2$, the sequence is $\mathrm{Bern}^*(a, 1) = \mathrm{Bern}(a, 0)$. Thus, $Z(a, 0)$ is distributed as $\hat{Z} + W_1$, where $\hat{Z} \stackrel{d}{=} \prod_{k \ge 1} \mathrm{Po}(a/k)$ is the count vector for $\mathrm{Bern}(a, 0)$. This also follows from (4.1), since $p_2 = 1$ (when $b = 0$) and $Z(a, 1) = \hat{Z}$.

Proof of Proposition 4.3.
The distribution of $Z(a, b)$ follows by conditioning on the first time that $Y_n = 1$ for $n \ge 2$. The distributions of $Z(a, b + n - 1)$ are completely specified by Proposition 4.1 and Remark 4.2, since $b + n - 1 \ge 1$ when $b \ge 0$ and $n \ge 2$. $\square$

From (4.1), it is not clear whether or not the distribution of $Z(a, b)$ is a mixture of product Poisson factors for $0 \le b < 1$. We show now that even the first component $Z_1(a, b)$ is not a mixture of Poissons when $0 \le b < 1$.

Proposition 4.5.
The distribution of $Z_1 \equiv Z_1(a, b)$, the count of 1-strings in the $\mathrm{Bern}^*(a, b)$ sequence, is not a mixture of Poissons when $0 \le b < 1$; that is, there is no measure $\mu$ on $[0, \infty)$ such that

$$E\big[\exp\{t Z_1\}\big] = \int_{[0, \infty)} e^{v(e^t - 1)}\, d\mu(v). \qquad (4.2)$$

Proof.
It is well known that when (4.2) holds, the variable $Z_1$ is over-dispersed, that is, $O(Z_1) \stackrel{def}{=} \mathrm{Var}(Z_1) - E(Z_1) \ge 0$. The proof now follows from the expression for $O(Z_1)$ in (4.4) below. Let $Y = \mathrm{Bern}^*(a, b)$. Then,

$$Z_1 = Y_2 + \hat{Z}_1 = Y_2 + Y_2 Y_3 + Z_1^{+1}, \qquad (4.3)$$

where $\hat{Z}_1 = \sum_{i \ge 2} Y_i Y_{i+1}$ and $Z_1^{+1} = \sum_{i \ge 3} Y_i Y_{i+1}$, and the latter is independent of $Y_2$. Furthermore, $\hat{Z}_1$, $Z_1^{+1}$ are the counts of strings of order 1 from $\mathrm{Bern}(a, b)$, $\mathrm{Bern}(a, b+1)$, respectively, and their distributions are known from Proposition 3.1. Hence, by easy calculations,

$$E(\hat{Z}_1) = \frac{a^2}{a+b}, \quad E(Z_1^{+1}) = \frac{a^2}{a+b+1}, \quad E(\hat{Z}_1^2) = \frac{a^3(a+1)}{(a+b)(a+b+1)} + \frac{a^2}{a+b}.$$

From the identities in (4.3), we have

$$E(Z_1) = \frac{a(a+1)}{a+b}, \quad E(Z_1^2) = \frac{a(a+1)}{a+b} + \frac{a^2(a+1)(a+2)}{(a+b)(a+b+1)}.$$

This leads to

$$O(Z_1) = \frac{a^2(a+1)(b-1)}{(a+b)^2(a+b+1)}, \qquad (4.4)$$

which is negative for $b < 1$, and positive for $b > 1$. $\square$
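The under-dispersion for $0 \le b < 1$ is easy to observe numerically (our own check, not from the paper); for example, with $a = 1$, $b = 0$, formula (4.4) gives $O(Z_1) = -1$:

```python
import random

def bern_star(a, b, n, rng):
    """First n terms of Bern*(a, b): Y_1 = 1, P(Y_m = 1) = a/(a+b+m-2), m >= 2."""
    return [1] + [1 if rng.random() < a / (a + b + m - 2) else 0
                  for m in range(2, n + 1)]

rng = random.Random(3)
a, b, trials, n = 1.0, 0.0, 10000, 300
samples = []
for _ in range(trials):
    y = bern_star(a, b, n, rng)
    samples.append(sum(y[i] * y[i + 1] for i in range(n - 1)))
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(round(var - mean, 2))  # near the predicted O(Z_1) = -1
```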
Some dependent Bernoulli sequences

Two examples of dependent Bernoulli sequences, arising in CMPP models with simple structures, whose count vector distributions are mixtures of independent Poisson factors, are given.
First Sequence.
For $a > 0$ and $b > 0$, denote $P_{a,b}$ as the probability distribution of the CMPP $M(\bar{g}, \bar{r}, \bar{\lambda}, \bar{q})$ described in Proposition 3.1, which gives rise to the Bernoulli sequence $\mathrm{Bern}(a, b)$. Let now $r^+(x, k) = k x^{k-1}(1-x)^2$ for $k \ge 1$, and consider the CMPP model $M(\bar{g}, r^+, \bar{\lambda}, \bar{q})$ with $\bar{g}, \bar{\lambda}, \bar{q}$ the same as in Proposition 3.1. Denote the probability measure under this model as $P^+ = P^+_{a,b}$. Note that $r^+(x, k) = k[\bar{r}(x, k) - \bar{r}(x, k+1)]$, where $\bar{r}(x, k) = x^{k-1}(1-x)$. Recall the cylinder set $E \stackrel{def}{=} E(k_0, \ldots, k_n)$ from (3.3), where $k_0, k_1, \ldots, k_n$ are positive integers and $K_0, K_1, \ldots, K_n$ their partial sums. It is easy to see that

$$P^+(E) = k_0 \Big[ P_{a,b}\big(E(k_0, k_1, \ldots, k_n)\big) - P_{a,b}\big(E(k_0 + 1, k_1, \ldots, k_n)\big) \Big].$$

From this expression, the distribution of $Y$ can be recovered, and shown to be not that of independent Bernoulli variables. For instance,

$$P^+(Y_1 = 1) = P_{a,b}(Y_1 = 1) - P_{a,b}(Y_1 = 0, Y_2 = 1) = \frac{a(a+1)}{(a+b)(a+b+1)},$$

and analogously

$$P^+(Y_2 = 1) = \frac{a\big(a(a+2) + 2b(a+1)\big)}{(a+b)(a+b+1)(a+b+2)}.$$

Thus

$$P^+(Y_1 = 1)\, P^+(Y_2 = 1) = \frac{a^2(a+1)\big(a^2 + 2a + 2ab + 2b\big)}{(a+b)^2(a+b+1)^2(a+b+2)},$$

which does not match

$$P^+(Y_1 = 1, Y_2 = 1) = \frac{a^2(a+2)}{(a+b)(a+b+1)(a+b+2)}$$

for $a, b > 0$. On the other hand, by construction, the count vectors under $P_{a,b}$ and $P^+$ have the same distribution and, by Proposition 3.1, conditional on the value $x_0$ of a $\mathrm{Beta}(b, a)$ variable, are distributed as $\prod_{k \ge 1} \mathrm{Po}(a(1 - x_0^k)/k)$.
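The displayed probabilities give a quick exact check of the dependence under $P^+$ (our own script encoding the formulas above with rational arithmetic):

```python
from fractions import Fraction as F

def pplus_y1(a, b):
    # P+(Y_1 = 1) = a(a+1) / ((a+b)(a+b+1))
    return F(a * (a + 1), (a + b) * (a + b + 1))

def pplus_y2(a, b):
    # P+(Y_2 = 1) = a(a(a+2) + 2b(a+1)) / ((a+b)(a+b+1)(a+b+2))
    return F(a * (a * (a + 2) + 2 * b * (a + 1)),
             (a + b) * (a + b + 1) * (a + b + 2))

def pplus_y1y2(a, b):
    # P+(Y_1 = 1, Y_2 = 1) = a^2 (a+2) / ((a+b)(a+b+1)(a+b+2))
    return F(a * a * (a + 2), (a + b) * (a + b + 1) * (a + b + 2))

a, b = 1, 1
print(pplus_y1(a, b) * pplus_y2(a, b) != pplus_y1y2(a, b))  # True: dependence
```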
Consider P , , the measure for the CMPP model discussedin Example 2.1 and Remark 3.2, with respect to Bernoulli sequence Bern(1 , X , L ) ≡ (0 , { X i } i ≥ are the records from an iid Uniform[0 ,
1] sequence,and L i are Geometric(1 − X i ) for i ≥ P ′ stand for the measure under the “switched” CMPP model where ( X , L )and ( X , L ) are interchanged. The probabilities of Y on cylinder sets (cf. (3.3),under P ′ , is given by P ′ (cid:16) E (1 , k , . . . , k n ) (cid:17) = P ′ ( L = k , . . . , L n = k n )= P , ( L = k , L = k , and L i = k i for 3 ≤ i ≤ n )for positive integers k = 1 , k , . . . , k n , with K = 1 , K = K + k , . . . , K n = K n − + k n as their partial sums. Under both models P , and P ′ , as only twoterms ( L , L ) exchange places, the associated count vectors are the same, and byProposition 3.1 distributed as Q k ≥ Po(1 /k ).We now show that { Y i } i ≥ is not an independent sequence under P ′ . From thecalculation in (3.4) with ( X , L ) ≡ (0 , Y ≡ r ( x,
1) = 1 (take b ↓ a = 1, we can write P ′ ( Y = 1) = P , ( L = 1) = X k ≥ P , ( L = k, L = 1)= X k ≥ Z 36. However, P ′ ( Y = 1) P ′ ( Y = 1) = 11 / =1 / P ′ ( Y = 1 , Y = 1). References [1] Arratia, R., Barbour, A.D. and Tavar´e, S. (1992) Poisson process approximations for theEwens sampling formula. Ann. Appl. Probab. Logarithmic Combinatorial Structures: AProbabilistic Approach. European Mathematical Society, Z¨urich.[3] Arratia, R., and Tavar´e, S. (1992) The cycle structure of random permutations. Ann. Probab. Random Structures and Algorithms Bull. Amer. Math. Soc. Bayesian Nonparametrics , Springer Verlag, NewYork.[7] Holst, Lars (2007) Counts of failure strings in certain Bernoulli sequences. to appear in J.Appl. Probab. [8] Joffe, A., Marchand, E., Perron, F. and Popadiuk, P. (2004) On sums of products of Bernoullivariables and random permutations. Journal of Theoretical Probability Theory Probab. Appl. Ann. Probab. Acta ScientariumMathematica (Szeged) Adventures in Stochastic Processes. Second Ed. Birkh¨auser, Boston.[13] Sethuraman, Jayaram and Sethuraman, Sunder (2004) On counts of Bernoulli strings andconnections to rank orders and random permutations. In A festschrift for Herman Rubin.IMS Lecture Notes Monograph Series Department of Statistics, Florida State University, Tallahassee, FL 32306.e-mail: [email protected] Department of Statistics, Florida State University, Tallahassee, FL 32306.e-mail: [email protected]