MAXIMAL CLADES IN RANDOM BINARY SEARCH TREES
SVANTE JANSON
Abstract.
We study maximal clades in random phylogenetic trees with the Yule–Harding model or, equivalently, in binary search trees. We use probabilistic methods to reprove and extend earlier results on moment asymptotics and asymptotic normality. In particular, we give an explanation of the curious phenomenon observed by Drmota, Fuchs and Lee (2014) that asymptotic normality holds, but one should normalize using half the variance.

1. Introduction
Recall that there are two types of binary trees; we fix the notation as follows. A full binary tree is a rooted tree where each node has either 0 or 2 children; in the latter case the two children are designated as left child and right child. A binary tree is a rooted tree where each node has 0, 1 or 2 children; moreover, each child is designated as either left child or right child, and each node has at most one child of each type. (Both versions can be regarded as ordered trees, with the left child before the right when there are two children.) It is convenient to regard also the empty tree ∅ as a binary tree (but not as a full binary tree). In a full binary tree, the leaves (nodes with no children) are called external nodes; the other nodes (having 2 children) are internal nodes. There is a simple, well-known bijection between full binary trees and binary trees: given a full binary tree, its internal nodes form a binary tree; this is a bijection, with inverse given by adding, to any given binary tree, external nodes as children at all free places. Note that a full binary tree with n internal nodes has n + 1 external nodes, and thus 2n + 1 nodes in total. In particular, the bijection just described yields a bijection between the full binary trees with 2n + 1 nodes and the binary trees with n nodes.

If T is a binary, or full binary, tree, we let T_L and T_R be the subtrees rooted at the left and right child of the root, with T_L = ∅ [T_R = ∅] if the root has no left [right] child.

Date: 27 August, 2014.
2010
Mathematics Subject Classification.

A phylogenetic tree is the same as a full binary tree. In this context, the clade of an external node v is defined to be the set of external nodes that are descendants of the parent of v. (This is called a minimal clade by Blum and François [3] and Chang and Fuchs [6].) Note that two clades are either nested or disjoint; furthermore, each external node belongs to some clade (for example its own). Hence, the set of maximal clades forms a partition of the set of external nodes. We let F(T) denote the number of maximal clades of a phylogenetic tree T. (Except that for technical reasons, see Section 2, we define F(T) = 0 for a phylogenetic tree T with only one external node. Obviously, this does not affect asymptotics.) The maximal clades, and the number of them, were introduced by Durand, Blum and François [11], together with a biological motivation, and further studied by Drmota, Fuchs and Lee [10].

The phylogenetic trees that we consider are random; more precisely, we consider the Yule–Harding model of a random phylogenetic tree T̄_n with a given number n of internal, and thus n + 1 external, nodes. These can be defined recursively, with T̄_0 the unique phylogenetic tree with 1 node (the root), and T̄_{n+1} obtained from T̄_n (n ≥
0) by choosing an external node uniformly at random and converting it to an internal node with two external children. (Alternatively, we obtain the same random model by constructing the tree bottom-up by Kingman's coalescent [17]; see further Aldous [2], Blum and François [3] and Chang and Fuchs [6].) Recall that, for any n ≥
1, the number of internal nodes in the left subtree T̄_{n,L} (or the right subtree T̄_{n,R}) is uniformly distributed on {0, ..., n − 1}, and that conditioned on this number being m, T̄_{n,L} has the same distribution as T̄_m; see also Remark 5.1. Under the bijection above, the Yule–Harding random tree T̄_n corresponds to the random binary search tree T_n with n nodes; see e.g. Blum, François and Janson [4] and Drmota [9].

The random variable that we study is thus X_n := F(T̄_n), the number of maximal clades in the Yule–Harding model. It was proved by Durand and François [12] that the mean number of maximal clades E X_n ∼ αn, where

    α = (1 − e^{−2})/4.    (1.1)

This was reproved by Drmota, Fuchs and Lee [10], in a sharper form:

Theorem 1.1 ([12; 10]).

    E X_n = E F(T_n) = αn + O(1),    (1.2)

where α is given by (1.1).

Moreover, Drmota, Fuchs and Lee [10] also found corresponding results for the variance and higher central moments:
Theorem 1.2 ([10]). As n → ∞,

    E(X_n − E X_n)² ∼ 4α² n log n,    (1.3)

and for any fixed integer k ≥ 3,

    E(X_n − E X_n)^k ∼ (−1)^k [2k/(k − 2)] α^k n^{k−1}.    (1.4)

As a consequence of (1.3)–(1.4), the limit distribution of F(T̄_n) (after centering and normalization) cannot be found by the method of moments. Nevertheless, [10] further proved asymptotic normality, where, unusually, the normalization uses (the square root of) half the variance:

Theorem 1.3 ([10]). As n → ∞,

    (X_n − E X_n)/√(2α² n log n) →d N(0, 1).    (1.5)

Here and below, →d denotes convergence in distribution; similarly, →p will denote convergence in probability. Unspecified limits (including implicit ones such as ∼ and o(1)) will be as n → ∞. Furthermore, Y_n = o_p(a_n), for random variables Y_n and positive numbers a_n, means Y_n/a_n →p
0. We let
C, C₁, C₂, ... denote some unspecified positive constants.

The purpose of the present paper is to use probabilistic methods to reprove these theorems, together with some further results; we hope that this can give additional insight, and it might perhaps also suggest future generalizations to other types of random trees.

In particular, we can explain the appearance of half the variance in Theorem 1.3 as follows. Fix a sequence of numbers N = N(n), and say that a clade is small if it has at most N + 1 elements, and large otherwise. (We use N + 1 in the definition only for later notational convenience; the subtree corresponding to a small clade has at most N internal nodes.) Let X_n^N be the number of maximal small clades, i.e., the small clades that are not contained in any other small clade. It turns out that a suitable choice of N is about √n; we give two versions in the next theorem.

Theorem 1.4. (i)
Let N := √n. Then Var(X_n^N) ∼ 2α² n log n and

    (X_n^N − E X_n^N)/√(Var X_n^N) →d N(0, 1).    (1.6)

Furthermore, X_n − X_n^N = o_p(√(Var X_n^N)) and E X_n − E X_n^N = o(√(Var X_n^N)), so we may replace X_n^N by X_n in the numerator of (1.6). However,

    Var(X_n − X_n^N) ∼ Var(X_n^N) ∼ 2α² n log n.    (1.7)

(ii) Let √n ≪ N ≪ √(n log n), for example N := √n log log n. Then the conclusions of (i) still hold; moreover, P(X_n = X_n^N) → 1.

The theorem thus shows that the large clades are rare, and do not contribute to the asymptotic distribution; however, when they appear, the large clades give a large (actually negative) contribution to X_n, and as a result, half the variance of X_n comes from the large clades. (When there is a large clade, there is less room for other clades, so X_n tends to be smaller than usual. See also (2.4) and (2.2) below.)

For higher moments, the large clades play a similar, but even more extreme, role. Note that (for n ≥
2) with probability 2/n, the root of T̄_n has one internal and one external child, and then there is a clade consisting of all external nodes; this is obviously the unique maximal clade, and thus X_n = 1. Since E X_n = αn + O(1) by Theorem 1.1, we thus have X_n − E X_n = −αn + O(1) with probability 2/n, and this single exceptional event gives a contribution ∼ (−1)^k 2α^k n^{k−1} to E(X_n − E X_n)^k, which explains a fraction (k − 2)/k of the moment (1.4); in particular, this explains why the moment is of order n^{k−1}.

We shall see later that, roughly speaking, the moment asymptotic in (1.4) is completely explained by extremely large clades of size Θ(n), which appear in the O(1) first generations of the tree. This will also lead to a version of (1.4) for absolute central moments:

Theorem 1.5.
For any fixed real p > 2, as n → ∞,

    E|X_n − E X_n|^p ∼ [2p/(p − 2)] α^p n^{p−1}.    (1.8)

In Section 2, we transfer the problem from random phylogenetic trees to random binary search trees, which we shall use in the proofs. The theorems above are proved in Sections 3–7.

2. Binary trees
We find it technically convenient to work with binary trees instead of full binary trees (phylogenetic trees), so we use the bijection in Section 1 to define F(T) also for binary trees T. (We use the same notation F; this should not cause any confusion.) With this translation, our problem is thus to study X_n := F(T_n), where T_n is the binary search tree with n nodes.

The clades in a phylogenetic tree correspond to the internal nodes that have at least one external child, i.e., the nodes in the corresponding binary tree that have outdegree at most 1. We call such nodes green. For a binary tree T, the number F(T) is thus the number of maximal green nodes, i.e., the number of green nodes that have no green ancestor. (This holds also for the phylogenetic tree T with a single node, and thus for the empty binary tree, with our definition F(T) = 0 in this case.)

It follows that, for any binary tree T,

    F(T) := { 1,                 T has a green root,
            { F(T_L) + F(T_R),   otherwise.    (2.1)
Define, for a binary tree T,

    f(T) := F(T) − F(T_L) − F(T_R) = { 1 − F(T_R),  T_L = ∅, T ≠ ∅,
                                      { 1 − F(T_L),  T_R = ∅, T ≠ ∅,
                                      { 0,           otherwise.    (2.2)

Then F(T) is given by the recursion

    F(T) = F(T_L) + F(T_R) + f(T),    (2.3)

and thus

    F(T) = Σ_{v ∈ T} f(T_v),    (2.4)

where T_v is the subtree rooted at v, consisting of v and all its descendants. In other words, F(T) is the additive functional defined by the toll function f(T). The advantage of this point of view is that we have eliminated the maximality condition and now sum over all subtrees T_v, and that we can use general results for this type of sums; see Holmgren and Janson [16].

We let T denote the random binary search tree with a random number of elements such that P(|T| = n) = 2/((n + 1)(n + 2)), n ≥
1. The random binary tree T can be constructed by a continuous-time branching process: let (T̃_t)_{t≥0} be the growing tree that starts with an isolated root at time t = 0 and such that each existing node gets a left and a right child after random waiting times that are independent and Exp(1); we stop the process at a random time τ ∼ Exp(1), independent of everything else, and can take T = T̃_τ; see Aldous [1] (where it is also proved that T is the limit in distribution of a random fringe tree in a binary search tree).

3. The mean
Recall that T_n is the random binary search tree with n nodes. Define ν_n := E F(T_n) and µ_n := E f(T_n), with F and f as in Section 2. (In particular, ν_0 = µ_0 = 0, while ν_1 = µ_1 = 1 since F(T_1) = f(T_1) = 1.) For n ≥ 2, T_{n,L} is empty with probability 1/n, and conditioned on this event, T_{n,R} has the same distribution as T_{n−1}. The same holds if we interchange L and R. Hence, taking the expectation in (2.2),

    µ_n = (2/n)(1 − E F(T_{n−1})) = (2/n)(1 − ν_{n−1}),  n ≥ 2.    (3.1)

Furthermore, we see that (2.2) implies

    P(f(T_n) ≠ 0) ≤ 2/n.    (3.2)

Since obviously 0 ≤ F(T) ≤ |T|, we have by (2.2) also

    −|T| ≤ f(T) ≤ 1 and |f(T)| ≤ |T|    (3.3)

for any binary tree T. In particular, this and (3.2) yield

    |µ_n| ≤ E|f(T_n)| ≤ n P(f(T_n) ≠ 0) ≤ 2.    (3.4)
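Taking expectations in (2.3), with the root split uniform as recalled in the introduction, gives ν_n = (2/n) Σ_{j<n} ν_j + µ_n; together with (3.1) this determines every ν_n and µ_n. The following numerical sketch (ours; all variable names are our own, not from the paper) computes them and compares with the constant α of (1.1):

```python
from math import exp

# nu[n] = E F(T_n), mu[n] = E f(T_n).
# mu[n] = (2/n)(1 - nu[n-1]) is (3.1); nu[n] = (2/n)*sum_{j<n} nu[j] + mu[n]
# follows by taking expectations in (2.3) with a uniform root split.
NMAX = 20000
nu = [0.0] * (NMAX + 1)
mu = [0.0] * (NMAX + 1)
nu[1] = mu[1] = 1.0
acc = nu[0] + nu[1]          # running sum nu[0] + ... + nu[n-1]
for n in range(2, NMAX + 1):
    mu[n] = (2.0 / n) * (1.0 - nu[n - 1])
    nu[n] = (2.0 / n) * acc + mu[n]
    acc += nu[n]

# Partial sum of the series for alpha = E f(T), using P(|T| = k) = 2/((k+1)(k+2)).
alpha_series = sum(2.0 * mu[n] / ((n + 1) * (n + 2)) for n in range(1, NMAX + 1))
alpha = (1.0 - exp(-2.0)) / 4.0   # the closed form claimed in (1.1)

print(nu[3], nu[4])               # exact values 4/3 and 3/2
print(nu[NMAX] / NMAX, alpha)     # nu_n / n approaches alpha, by (3.5)
print(alpha_series)
```

Both ν_n/n and the truncated series agree with (1 − e^{−2})/4 ≈ 0.2162 to within roughly 10^{−4} at this truncation, consistent with the O(1) error term in (3.5).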
It is now a simple consequence of general results that ν_n := E F(T_n) is asymptotically linear in n. Recall the random binary tree T defined in Section 2.

Lemma 3.1.

    ν_n := E F(T_n) = nα + O(1),    (3.5)

where

    α := E f(T) = Σ_{n=1}^∞ [2/((n + 1)(n + 2))] E f(T_n) = Σ_{n=1}^∞ [2/((n + 1)(n + 2))] µ_n
       = 1/3 + Σ_{n=2}^∞ [4/(n(n + 1)(n + 2))] (1 − ν_{n−1}).    (3.6)

Proof.
An instance of Holmgren and Janson [16, Theorem 3.8]. More explicitly, see [16, Theorem 3.4],

    E F(T_n) = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] µ_k + µ_n,    (3.7)

which implies the result by (3.4) and (3.1). □

In order to prove Theorem 1.1, it remains to show that α defined in (3.6) equals (1 − e^{−2})/4.

Lemma 3.2.

    E f(T) = (1 − e^{−2})/4.    (3.8)

We can prove Lemma 3.2 by probabilistic methods, using the construction of T by a branching process in Section 2. However, this proof is considerably longer than the proof of Theorem 1.1 by singularity analysis of generating functions in [12] and [10]; we nevertheless find the probabilistic proof interesting, and perhaps useful for future generalizations, but since the methods in it are not needed for other results in the present paper, we postpone our proof of Lemma 3.2 to Section 7.

4. Variance
Let γ_n := Var(f(T_n)) and σ²_n := Var(F(T_n)). Then γ_0 = γ_1 = σ²_0 = σ²_1 = 0 and, for n ≥
2, using (2.2),

    γ_n = E f(T_n)² − µ²_n = (2/n) E(F(T_{n−1}) − 1)² − µ²_n ≤ (2/n) n² = 2n.    (4.1)

Before proving the variance asymptotics in (1.3), we begin with a weaker estimate.

Lemma 4.1.
For n ≥ 1,

    σ²_n := Var F(T_n) = O(n log² n).    (4.2)
Proof.
By [16, Theorem 3.9], where it suffices to sum to n since we may replace f(T) by 0 for |T| > n without changing F(T_n),

    σ²_n ≤ Cn [(Σ_{k=1}^n γ_k^{1/2} k^{−3/2})² + sup_{k≤n} γ_k/k + (Σ_{k=1}^n |µ_k|/k)²] = O(n log² n),    (4.3)

using (4.1) and (3.4), provided n ≥
2. The case n = 1 is trivial. □

Write f(T) = g(T) + h(T), where

    g(T) := { 1 − ν_{|T|−1},  T_L = ∅, T ≠ ∅, or T_R = ∅, T ≠ ∅,
            { 0,              otherwise,    (4.4)

and thus, see (2.2),

    h(T) := { ν_{|T_R|} − F(T_R),  T_L = ∅, T ≠ ∅,
            { ν_{|T_L|} − F(T_L),  T_R = ∅, T ≠ ∅,
            { 0,                   otherwise.    (4.5)

Then g(T_1) = 1, h(T_1) = 0, and, for k ≥
2, using (3.1) and (3.4),

    E g(T_k) = (2/k)(1 − ν_{k−1}) = µ_k = O(1),    (4.6)

    E h(T_k) = (2/k) E(ν_{k−1} − F(T_{k−1})) = 0,    (4.7)

and, using Lemma 4.1,

    Var h(T_k) = (2/k) E(ν_{k−1} − F(T_{k−1}))² = (2/k) σ²_{k−1} = O(log² k).    (4.8)

Let, for an arbitrary binary tree T,

    G(T) := Σ_{v ∈ T} g(T_v) and H(T) := Σ_{v ∈ T} h(T_v),    (4.9)

so by (2.4),

    F(T) = G(T) + H(T).    (4.10)

Lemma 4.2.
For n ≥ 1,

    E G(T_n) = ν_n,    (4.11)
    E H(T_n) = 0,    (4.12)
    Var H(T_n) = O(n).    (4.13)

Proof.
By [16, Theorem 3.4], cf. (3.7), and (4.7),

    E H(T_n) = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] E h(T_k) + E h(T_n) = 0,    (4.14)

which proves (4.12). This implies (4.11), since by (4.10),

    E G(T_n) = E F(T_n) − E H(T_n) = ν_n.    (4.15)
Similarly, by [16, Theorem 3.9], cf. (4.3), and (4.7)–(4.8),

    Var H(T_n) ≤ Cn [(Σ_{k=1}^∞ (log k) k^{−3/2})² + sup_{k≥1} (log² k)/k + 0] = O(n).  □

We shall see that this means that H(T_n) is asymptotically negligible, and thus it suffices to consider G(T_n). Note that g(T) depends only on the sizes |T_L| and |T_R|. This enables us to easily estimate the variance of G(T_n).

Theorem 4.3.
For all n ≥ 1,

    Var G(T_n) = 4α² n log n + O(n).    (4.16)

Proof.
Write g(T) = g(|T|, |T_L|, |T_R|). (We only care about g(k, j, l) when j + l = k − 1, but use three arguments for emphasis.) Thus g(k, 0, k − 1) = g(k, k − 1, 0) = 1 − ν_{k−1}, and otherwise g(k, j, k − j − 1) = 0. Let, as in [16, Theorem 1.29], I_k be uniformly distributed on {0, ..., k − 1} and

    ψ_k := E(ν_{I_k} + ν_{k−1−I_k} + g(k, I_k, k − 1 − I_k) − ν_k)²
         = (1/k) Σ_{j=1}^{k−2} (ν_j + ν_{k−1−j} − ν_k)² + (2/k)(ν_{k−1} + 1 − ν_{k−1} − ν_k)²
         = (1/k) Σ_{j=1}^{k−2} (ν_j + ν_{k−1−j} − ν_k)² + (2/k)(ν_k − 1)²
         = O(1) + (2/k)(αk + O(1))² = 2α² k + O(1),    (4.17)

where we used that ν_j = αj + O(1) by Theorem 1.1. By [16, Lemma 7.1], then

    Var G(T_n) = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] ψ_k + ψ_n
               = (n + 1) Σ_{k=1}^{n−1} [4α² k + O(1)]/((k + 1)(k + 2)) + O(n)
               = (n + 1) Σ_{k=1}^{n−1} 4α²/k + O(n)
               = 4α² n log n + O(n).    (4.18)  □

We can now prove (1.3) in Theorem 1.2. (Higher moments are treated in Section 6.)
Theorem 4.4.
As n → ∞,

    Var F(T_n) = 4α² n log n + o(n log n).    (4.19)

This follows from (4.10), (4.16) and (4.13) by Minkowski's inequality (the triangle inequality for √Var).
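The recursion (2.1), combined with the uniform root split, determines the exact distribution of F(T_n): for n ≥ 2 the root is green with probability 2/n and then F(T_n) = 1; otherwise F(T_n) = F(T_j) + F(T_{n−1−j}) with j uniform on {1, ..., n − 2} and independent subtrees. A small dynamic-programming sketch (ours; names are not from the paper) computes this distribution exactly and compares its mean and variance with (1.2) and (4.19):

```python
import math

# dist[n][k] = P(F(T_n) = k), computed exactly via (2.1): with
# probability 2/n the root split is extreme, the root is green and
# F = 1; otherwise F is the sum of two independent copies for the
# subtree sizes j and n-1-j, with j uniform on {1, ..., n-2}.
NMAX = 100
dist = [dict() for _ in range(NMAX + 1)]
dist[0] = {0: 1.0}
dist[1] = {1: 1.0}
for n in range(2, NMAX + 1):
    d = {1: 2.0 / n}
    for j in range(1, n - 1):
        for a, pa in dist[j].items():
            for b, pb in dist[n - 1 - j].items():
                d[a + b] = d.get(a + b, 0.0) + pa * pb / n
    dist[n] = d

def moments(n):
    m1 = sum(k * p for k, p in dist[n].items())
    m2 = sum(k * k * p for k, p in dist[n].items())
    return m1, m2 - m1 * m1

alpha = (1.0 - math.exp(-2.0)) / 4.0        # the constant of (1.1)
mean, var = moments(NMAX)
print(mean / NMAX, alpha)                   # mean is approximately alpha * n
print(var / (4 * alpha**2 * NMAX * math.log(NMAX)))  # cf. (4.19)
```

The mean is within a few percent of αn already at n = 100, while the variance ratio printed last converges much more slowly, as the o(n log n) term in (4.19) suggests.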
5. Asymptotic normality
We prove the central limit theorem, Theorem 1.3, by a martingale central limit theorem for a suitable martingale that we construct in this section.

Consider the infinite binary tree T_∞, where each node has two children, and denote its root by o. We may regard any binary tree T as a subtree of T_∞ with the same root o. (In the general sense that the node set V(T) is a subset of V_∞ := V(T_∞), and that the left and right children are the same as in T_∞, when they exist.) In particular we regard the random binary search tree T_n as a subtree of T_∞.

Order the nodes in T_∞ in breadth-first order as v(1) = o, v(2), ..., and let V_j := {v(1), ..., v(j)} be the set of the first j nodes. Let F_j be the σ-field generated by the sizes |T_{n,v,L}| and |T_{n,v,R}| of the two child subtrees of T_n at each node v ∈ V_j. Equivalently, we may regard V_j as the internal nodes in a full binary tree; let ∂V_j be the corresponding set of j + 1 external nodes. Then F_j is generated by the subtree sizes |T_{n,v}| for all v ∈ ∂V_j, together with the indicators 1{v ∈ T_n}, v ∈ V_j, that describe T_n ∩ V_j. (We regard the subtree T_{n,v} as defined for all v ∈ V_∞, with T_{n,v} = ∅ if v ∉ T_n.) Then, conditioned on F_j, T_n consists of some given subtree of V_j together with attached subtrees T_{n,v} at all nodes v ∈ ∂V_j; these are independent binary search trees of some given orders.

We allow here j = 0; V_0 = ∅ and F_0 is the trivial σ-field.

Remark 5.1.
As is well known, see e.g. [9], another construction of the random binary search tree T_n (n ≥
1) is to let the random variable I_n be uniformly distributed on {0, ..., n − 1}, and to let T_n be defined recursively such that, given I_n, T_{n,L} and T_{n,R} are independent binary search trees with |T_{n,L}| = I_n and |T_{n,R}| = n − 1 − I_n. (When the tree is used to sort n keys, I_n tells how many of the keys are assigned to the left subtree.) The pair (I_n, n − 1 − I_n) thus tells how the tree is split at the root, and there is a similar pair for each node. Then F_j is generated by these pairs (i.e., splits) for the nodes v(1), ..., v(j).

Recall that g(T) by (4.4) depends only on the sizes |T_L| and |T_R|. Hence, F_j specifies the value of g(T_{n,v}) for every v ∈ V_j, and it follows that

    E(G(T_n) | F_j) = E(Σ_{v ∈ V_∞} g(T_{n,v}) | F_j) = Σ_{v ∈ V_j} g(T_{n,v}) + Σ_{v ∈ ∂V_j} ν_{|T_{n,v}|}.    (5.1)

Since the sequence of σ-fields (F_j)_{j=0}^∞ is increasing, the sequence M_{n,j} := E(G(T_n) | F_j), j ≥
0, is a martingale (for any fixed n). It follows from (5.1) that the martingale differences are

    ΔM_{n,j} := M_{n,j} − M_{n,j−1} = g(T_{n,v(j)}) + ν_{|T_{n,v(j)L}|} + ν_{|T_{n,v(j)R}|} − ν_{|T_{n,v(j)}|},    (5.2)

where v(j)L and v(j)R are the children of v(j). It follows easily that, with ψ_k defined in (4.17),

    E(|ΔM_{n,j}|² | F_{j−1}) = E(|ΔM_{n,j}|² | |T_{n,v(j)}|) = ψ_{|T_{n,v(j)}|}.    (5.3)

Consequently, the conditional square function is given by

    W_n := Σ_{j=1}^∞ E(|ΔM_{n,j}|² | F_{j−1}) = Σ_{v ∈ V_∞} ψ_{|T_{n,v}|} = Σ_{v ∈ T_n} ψ_{|T_{n,v}|}.    (5.4)

(It suffices to sum over v ∈ T_n, since ψ_0 = 0.) This is again a sum of the same type as (2.4) and (4.9), for the random tree T_n. (Note that the toll function ψ_{|T|} here depends only on the size of T.) In particular, [16, Theorem 3.4] applies (in this case we can also use [7], [8] or [13]); this yields

    E W_n = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] ψ_k + ψ_n.    (5.5)

If j is large enough, say j ≥ 2^n, then V(T_n) ⊆ V_j and thus M_{n,j} = G(T_n). In particular, G(T_n) = M_{n,∞}. Thus, by a standard (and simple) martingale identity, Var G(T_n) = Var M_{n,∞} = E W_n; hence (5.5) yields the first equality in (4.18). (This is no coincidence; the proof just given of (5.5) is essentially the same as the proof of [16, Lemma 7.1] that was used in (4.18), but stated in martingale formulation.)

We now split the sum G(T_n) into two parts, roughly corresponding to small and large clades. We fix a cut-off N = N(n); for definiteness and simplicity we choose N = N(n) := √n, but we note that the arguments below hold with a few minor modifications for any N ≥ √n with N = o(√(n log n)). We then define, for binary trees T,

    g′(T) := g(T) · 1{|T| ≤ N},    (5.6)
    g″(T) := g(T) · 1{|T| > N} = g(T) − g′(T).
(5.7)

In analogy with (2.4) and (4.9), we define further

    G′(T) := Σ_{v ∈ T} g′(T_v) and G″(T) := Σ_{v ∈ T} g″(T_v);    (5.8)

thus G(T) = G′(T) + G″(T). We shall see that, asymptotically, both G′(T_n) and G″(T_n) contribute to the variance with equal amounts, but nevertheless G″(T_n) is negligible (in probability).

We begin with the main term G′(T_n).

Lemma 5.2. As n → ∞,

    Var(G′(T_n)) = 2α² n log n + O(n),    (5.9)

    (G′(T_n) − E G′(T_n))/√(2α² n log n) →d N(0, 1).    (5.10)

Proof.
We define ν′_n := E G′(T_n). Note that g′(T) depends only on the sizes |T_L| and |T_R|. Hence we can repeat the argument above and define a martingale M′_{n,j} := E(G′(T_n) | F_j), j ≥
0, with G′(T_n) = M′_{n,∞} and martingale differences

    ΔM′_{n,j} = φ′(T_{n,v(j)}),    (5.11)

where we define, cf. (5.2),

    φ′(T) := g′(T) + ν′_{|T_L|} + ν′_{|T_R|} − ν′_{|T|}.    (5.12)

By [16, Theorem 3.4] again, cf. (3.7) and (5.5), using E g(T_k) = µ_k = O(1) by (4.6),

    ν′_m = (m + 1) Σ_{k=1}^{m−1} [2/((k + 1)(k + 2))] E g′(T_k) + E g′(T_m)
         = (m + 1) Σ_{k=1}^{(m−1)∧N} [2/((k + 1)(k + 2))] E g(T_k) + O(1)
         = (m + 1) Σ_{k=1}^N [2/((k + 1)(k + 2))] µ_k + O(1).    (5.13)

Hence, (5.12) yields, after cancellations,

    φ′(T) = g′(T) + O(1) = { g(T) + O(1),  |T| ≤ N,
                           { O(1),         |T| > N.    (5.14)

Let

    ψ′_k := E|φ′(T_k)|².    (5.15)

Then, by (5.14), (4.4) and (3.5), cf. (4.17),

    ψ′_k = { E(g(T_k) + O(1))² = 2α² k + O(1),  k ≤ N,
           { O(1),                              k > N.    (5.16)

Furthermore, by (5.11) and (5.15),

    E(|ΔM′_{n,j}|² | F_{j−1}) = E(|φ′(T_{n,v(j)})|² | |T_{n,v(j)}|) = ψ′_{|T_{n,v(j)}|}.    (5.17)

Hence, the conditional square function of (M′_{n,j})_j is

    W′_n := Σ_{j=1}^∞ E(|ΔM′_{n,j}|² | F_{j−1}) = Σ_{v ∈ V_∞} ψ′_{|T_{n,v}|} = Σ_{v ∈ T_n} ψ′_{|T_{n,v}|}.    (5.18)

Yet another application of [16, Theorem 3.4] yields, using (5.16),

    E W′_n = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] ψ′_k + ψ′_n
           = (n + 1) Σ_{k=1}^N 4α² k/((k + 1)(k + 2)) + O(n)
           = 4α² n log N + O(n) = 2α² n log n + O(n).    (5.19)

Since Var G′(T_n) = Var(M′_{n,∞}) = E W′_n, (5.9) follows from (5.19).

Moreover, the representation (5.18) and [16, Theorem 3.9] (again summing only to n, as we may) yield, noting that the toll function ψ′_{|T|} depends only on the size of T, using (5.16),

    Var(W′_n) ≤ Cn Σ_{k=1}^n (ψ′_k)²/k² ≤ C₁ n Σ_{k=1}^N 1 + C₁ n Σ_{k=N+1}^n k^{−2} = O(nN) = O(n^{3/2}).    (5.20)

Hence, Var(W′_n/(n log n)) → 0 as n → ∞, which together with (5.19) implies

    W′_n/(n log n) →p 2α².
(5.21)

Note also that g(T) = O(|T|) by (4.4) and (3.5), and thus (5.14) implies φ′(T) = O(N) for all trees T. Thus (5.11) yields

    sup_j |ΔM′_{n,j}|/√(n log n) = O(N/√(n log n)) = o(1).    (5.22)

We now apply the central limit theorem for martingale triangular arrays, in the form in [5, Corollary 1] (see also [15, Theorem 3.1]), which shows that (5.21) and (5.22) together imply

    (G′(T_n) − E G′(T_n))/√(n log n) = (M′_{n,∞} − E M′_{n,∞})/√(n log n) →d N(0, 2α²).    (5.23)

(Actually, [5, Corollary 1] assumes instead of (5.22) only a conditional Lindeberg condition, which is a trivial consequence of the uniform bound (5.22).) □

Remark 5.3.
We used the breadth-first order above as just one convenient order. It is perhaps more natural to consider, instead of the sets V_j, arbitrary node sets V of (finite) subtrees of T_∞ that include the root o. This would give us, instead of (M_{n,j})_j, a martingale indexed by binary trees. However, we have no use for this exotic object here, and use instead the standard martingales above.

Lemma 5.4.

    E|G″(T_n)| = O(√n),    (5.24)
    Var(G″(T_n)) = 2α² n log n + O(n).    (5.25)

Proof.
By (5.7), (4.4) and (4.6),

    E|g″(T_k)| = |E g(T_k)| · 1{k > N} = O(1) · 1{k > N}    (5.26)

and thus, using the triangle inequality and [16, Theorem 3.4],

    E|G″(T_n)| ≤ (n + 1) Σ_{k=N}^{n−1} [2/((k + 1)(k + 2))] E|g″(T_k)| + E|g″(T_n)| = O(n/N),

yielding (5.24).
For the variance, we use either [16, Theorem 1.29] as in the proof of Theorem 4.3, or the (essentially equivalent) martingale argument in (5.11)–(5.19), and conclude that, with some ψ″_k satisfying

    ψ″_k = { O(1),                              k ≤ N,
           { E(g(T_k) + O(1))² = 2α² k + O(1),  k > N,    (5.27)

we have

    Var G″(T_n) = (n + 1) Σ_{k=1}^{n−1} [2/((k + 1)(k + 2))] ψ″_k + ψ″_n
                = (n + 1) Σ_{k=⌊N⌋+1}^{n−1} 4α²/k + O(n)
                = 4α² n log(n/N) + O(n) = 2α² n log n + O(n).  □

Proof of Theorem 1.3.
It follows from (5.24) that

    (G″(T_n) − E G″(T_n))/√(2α² n log n) →p 0,    (5.28)

which together with (5.10) yields

    (G(T_n) − E G(T_n))/√(2α² n log n) →d N(0, 1).    (5.29)

Similarly, (4.13) implies

    (H(T_n) − E H(T_n))/√(2α² n log n) →p 0,    (5.30)

which together with (5.29) yields (1.5), recalling X_n = F(T_n) = G(T_n) + H(T_n) by (4.10). □

Proof of Theorem 1.4. (i). Define, similarly to (5.6)–(5.7),

    f′(T) := f(T) · 1{|T| ≤ N},  f″(T) := f(T) · 1{|T| > N},    (5.31)
    h′(T) := h(T) · 1{|T| ≤ N},  h″(T) := h(T) · 1{|T| > N},    (5.32)

and corresponding sums F′(T) := Σ_{v ∈ T} f′(T_v) and similarly F″(T), H′(T), H″(T). The argument in (2.1)–(2.4) is easily modified and shows that

    X_n^N = F′(T_n) = G′(T_n) + H′(T_n).    (5.33)

The same proof as for Lemma 4.2 yields also

    Var H′(T_n) = O(n) and Var H″(T_n) = O(n).    (5.34)

Hence, (1.6) follows from Lemma 5.2 and (5.33). Furthermore,

    X_n − X_n^N = F″(T_n) = G″(T_n) + H″(T_n).    (5.35)

By (5.33) and (5.35), (1.7) follows from (5.9) and (5.25), using (5.34) and Minkowski's inequality. Similarly,

    E|X_n − X_n^N| ≤ E|G″(T_n)| + E|H″(T_n)| = O(√n),    (5.36)

using (5.24), (5.34) and Hölder's inequality, together with E H″(T_n) = 0, which is proved as (4.12).

(ii). The conclusions of (i) hold by the same proofs (with some minor modifications in some estimates). Moreover, let Z_{n,k} be the number of clades of size k + 1. Then, for n ≥ 2,

    E Z_{n,k} = { 2n/(k(k + 1)(k + 2)),  k < n,
                { 2/n,                   k = n,
                { 0,                     k > n,    (5.37)

see [6, Theorem 1]. (This can be seen as another example of [16, Theorem 3.4].) Consequently,

    P(X_n ≠ X_n^N) ≤ P(Σ_{k>N} Z_{n,k} ≥ 1) ≤ E Σ_{k>N} Z_{n,k}
                   = Σ_{k=⌊N⌋+1}^{n−1} 2n/(k(k + 1)(k + 2)) + 2/n
                   = O(n/N²) + O(1/n) = o(1),    (5.38)

which completes the proof. □

6. Higher moments
We begin the proof of Theorem 1.5 by proving a weaker estimate. We let ‖X‖_p := (E|X|^p)^{1/p} for any random variable X. Recall that ν_n := E F(T_n).

Lemma 6.1.
For any fixed real p > 2, and all n ≥ 0,

    E|F(T_n) − ν_n|^p ≤ C(p) n^{p−1}.    (6.1)

Equivalently,

    ‖F(T_n) − ν_n‖_p = O(n^{1−1/p}).    (6.2)

Proof.
Fix p > 2 and an integer m ≥ 1, to be chosen later. (The constants C_i below may depend on p but not on m.) Let V_j and F_j be as in Section 5, and write V′_m := V_{2^m−1}, F′_m := F_{2^m−1}. Thus ∂V′_m consists of the 2^m nodes in T_∞ of depth m, and V′_m consists of the 2^m − 1 nodes of depth less than m. Then, for any binary tree T,

    F(T) = Σ_{v ∈ V′_m} f(T_v) + Σ_{v ∈ ∂V′_m} F(T_v).    (6.3)
Furthermore, by (1.2),

    Σ_{v ∈ ∂V′_m} ν_{|T_v|} = Σ_{v ∈ ∂V′_m} (α|T_v| + O(1)) = α Σ_{v ∈ ∂V′_m} |T_v| + O(2^m)
                            = α|T| + O(2^m) = ν_{|T|} + O(2^m).    (6.4)

Hence, by combining (6.3) and (6.4),

    F(T) − ν_{|T|} = Σ_{v ∈ V′_m} f(T_v) + Σ_{v ∈ ∂V′_m} (F(T_v) − ν_{|T_v|}) + O(2^m).    (6.5)

We shall use this decomposition for the binary search tree T_n. Note first that by (3.2)–(3.3),

    E|f(T_n)|^p ≤ n^p P(f(T_n) ≠ 0) ≤ 2n^{p−1}.    (6.6)

(This holds for any p ≥ 1; cf. (3.4) for the case p = 1.) Hence, for any v ∈ V_∞,

    E(|f(T_{n,v})|^p | |T_{n,v}|) ≤ 2|T_{n,v}|^{p−1} ≤ 2n^{p−1},    (6.7)

and thus

    E|f(T_{n,v})|^p ≤ 2n^{p−1}.    (6.8)

Let Y := Σ_{v ∈ V′_m} f(T_{n,v}) be the first sum in (6.5) for T = T_n. By Minkowski's inequality and (6.8),

    ‖Y‖_p ≤ Σ_{v ∈ V′_m} ‖f(T_{n,v})‖_p ≤ 2^m 2^{1/p} n^{(p−1)/p}.    (6.9)

Let Z := Σ_{v ∈ ∂V′_m} (F(T_{n,v}) − ν_{|T_{n,v}|}) be the second sum in (6.5) for T = T_n. The σ-field F′_m specifies the sizes of the subtrees T_{n,v} for v ∈ ∂V′_m, and conditioned on F′_m, these subtrees are independent and distributed as binary search trees of the given sizes. Hence, conditionally on F′_m, the terms in the sum Z are independent and have means zero, so we can apply Rosenthal's inequality [14, Theorem 3.9.1], which yields

    E(|Z|^p | F′_m) ≤ C₁ Σ_{v ∈ ∂V′_m} E(|F(T_{n,v}) − ν_{|T_{n,v}|}|^p | F′_m)
                    + C₁ (Σ_{v ∈ ∂V′_m} E(|F(T_{n,v}) − ν_{|T_{n,v}|}|² | F′_m))^{p/2}.    (6.10)

We note first that by (1.3),

    E(|F(T_{n,v}) − ν_{|T_{n,v}|}|² | F′_m) ≤ C₂ |T_{n,v}| log |T_{n,v}| ≤ C₂ |T_{n,v}| log n,    (6.11)

and thus

    Σ_{v ∈ ∂V′_m} E(|F(T_{n,v}) − ν_{|T_{n,v}|}|² | F′_m) ≤ C₂ log n Σ_{v ∈ ∂V′_m} |T_{n,v}| ≤ C₂ n log n.    (6.12)
(6.13)Let A n := E | F ( T n ) − ν n | p . We can write (6.5) for T = T n as F ( T n ) − ν n = Y + Z + O (2 m ) . (6.14)Thus, by Minkowski’s inequality, (6.9) and (6.13), A n = E (cid:12)(cid:12) Y + Z + O (2 m ) (cid:12)(cid:12) p p (cid:0) E | Y | p + E | Z | p + O (2 m ) (cid:1) C mp n p − + C E | Z | p + C m C E | Z | p + C mp n p − . (6.15)Furthermore, (6.13) can be written E | Z | p C X v ∈ ∂V ′ m E A |T n,v | + C ( n log n ) p/ . (6.16)We prove the lemma by induction, and assume that A k Ck p − forall k < n . Since |T n,v | < n for every v ∈ ∂V ′ m , (6.16) and the inductivehypothesis yield E | Z | p C C X v ∈ ∂V ′ m E |T n,v | p − + C ( n log n ) p/ . (6.17)If v is a child of the root, then |T n,v | is uniformly distributed on { , . . . , n − } ,so |T n,v | d = ⌊ nU ⌋ nU , where U ∼ U (0 ,
1) is uniformly distributed on [0, 1]. Iterating this m times, it follows that for any v ∈ ∂V′_m,

    |T_{n,v}| ≤ n Π_{i=1}^m U_i (stochastically),    (6.18)

with U_1, ..., U_m independent and U(0, 1). Hence,

    E|T_{n,v}|^{p−1} ≤ E(n^{p−1} Π_{i=1}^m U_i^{p−1}) = n^{p−1} Π_{i=1}^m E U_i^{p−1} = n^{p−1} (1/p)^m,    (6.19)

since E U_i^{p−1} = ∫₀¹ u^{p−1} du = 1/p. There are 2^m nodes in ∂V′_m, and thus (6.17) yields

    E|Z|^p ≤ C₁ C (2/p)^m n^{p−1} + C₃ (n log n)^{p/2},    (6.20)

which together with (6.15) yields, since (n log n)^{p/2} = O(n^{p−1}) when p > 2,

    A_n ≤ C₅ C₁ C (2/p)^m n^{p−1} + C₅ C₃ (n log n)^{p/2} + C₇ 2^{mp} n^{p−1}
        ≤ C₅ C₁ C (2/p)^m n^{p−1} + C₈ 2^{mp} n^{p−1}.    (6.21)

Now choose m such that (2/p)^m C₅ C₁ < 1/2; this is possible since p > 2. Then choose C := 2^{mp+1} C₈. With these choices, (6.21) yields

    A_n ≤ (C/2) n^{p−1} + (C/2) n^{p−1} = C n^{p−1}.    (6.22)
In other words, we have proved the inductive step: A_k ≤ C k^{p−1} for k < n implies A_n ≤ C n^{p−1}. Consequently, this is true for all n ≥
0, i.e., (6.1) holds. (The initial cases n = 0 and n = 1 are trivial, since A_0 = A_1 = 0.) □

Lemma 6.2.
For any fixed real p > 2, as n → ∞,

    ‖F(T_n)‖_p ∼ αn,    (6.23)
    ‖f(T_n)‖_p ∼ 2^{1/p} α n^{1−1/p}.    (6.24)

Proof.
By Minkowski's inequality, (6.2) and (1.2),

    ‖F(T_n)‖_p = |E F(T_n)| + O(n^{1−1/p}) = αn + O(n^{1−1/p}) ∼ αn,    (6.25)

which is (6.23). For n ≥
2, it follows from (2.2) that E | f ( T n ) | p = 2 n E | − F ( T n − ) | p = 2 n k F ( T n − ) − k pp ∼ α p n p − , (6.26)since (6.23) obviously implies also k F ( T n ) − k p ∼ αn . (cid:3) The idea in the proof of Theorem 1.5 is to approximate E | X n − E X n | p = E (cid:12)(cid:12)P v (cid:0) f ( T n,v ) − E f ( T n,v ) (cid:1)(cid:12)(cid:12) p by E P v (cid:12)(cid:12) f ( T n,v ) − E f ( T n,v ) (cid:12)(cid:12) p , or simpler by E P v (cid:12)(cid:12) f ( T n,v ) (cid:12)(cid:12) p = P v E (cid:12)(cid:12) f ( T n,v ) (cid:12)(cid:12) p . The heuristic reason for this is that themoment E (cid:12)(cid:12)P v (cid:0) f ( T n,v ) − E f ( T n,v ) (cid:1)(cid:12)(cid:12) p is dominated by the event when thereis one large term (corresponding to one large clade, cf. the discussion beforeTheorem 1.5), and then (cid:12)(cid:12)(cid:12)X v (cid:0) f ( T n,v ) − E f ( T n,v ) (cid:1)(cid:12)(cid:12)(cid:12) p ≈ X v (cid:12)(cid:12) f ( T n,v ) − E f ( T n,v ) (cid:12)(cid:12) p ≈ X v | f ( T n,v ) | p . (6.27)We shall justify this in several steps. We begin by finding the expectationof the final sum in (6.27), cf. the sought result (1.8). Lemma 6.3. As n → ∞ , E X v ∈T n | f ( T n,v ) | p ∼ pp − α p n p − . (6.28) Proof.
We apply again [16, Theorem 3.4] and obtain

E Σ_{v∈T_n} |f(T_{n,v})|^p = (n+1) Σ_{k=1}^{n−1} (2/((k+1)(k+2))) E|f(T_k)|^p + E|f(T_n)|^p. (6.29)

By (6.26),

(2/((k+1)(k+2))) E|f(T_k)|^p ∼ (2/k^2) · 2 α^p k^{p−1} = 4 α^p k^{p−3} (6.30)

as k → ∞, and it follows that, as n → ∞, using p > 2,

E Σ_{v∈T_n} |f(T_{n,v})|^p ∼ (n+1) Σ_{k=1}^{n−1} 4 α^p k^{p−3} + 2 α^p n^{p−1} ∼ n · (4 α^p/(p−2)) n^{p−2} + 2 α^p n^{p−1} = (2p/(p−2)) α^p n^{p−1}. □

Next we take again some m ≥ 1. By (6.20) and Lemma 6.1, since p > 2,

||Z||_p ≤ C (2/p)^{m/p} n^{1−1/p} + O((n log n)^{1/2}) = C (2/p)^{m/p} n^{1−1/p} + o(n^{1−1/p}). (6.31)

Consequently, by (6.14) and Minkowski's inequality,

| ||F(T_n) − ν_n||_p − ||Y||_p | ≤ ||Z||_p + O(2^m) = C (2/p)^{m/p} n^{1−1/p} + o(n^{1−1/p}). (6.32)

In particular, (6.32) and (6.2) imply ||Y||_p = O(n^{1−1/p}). By the mean value theorem,

|x^p − y^p| ≤ p |x − y| max{x^{p−1}, y^{p−1}} (6.33)

for any x, y ≥ 0; hence (6.32) implies, using also (6.2) again,

E|F(T_n) − ν_n|^p − E|Y|^p = O( (2/p)^{m/p} n^{p−1} ) + o(n^{p−1}). (6.34)

Let δ > 0, and let J_v be the indicator of the event that v is green and |T_{n,v}| ≥ δn. (The idea is that the significant contributions only come from nodes v with J_v = 1.)

Lemma 6.4.
For each fixed m ≥ 1 and δ > 0, and all n ≥ 1,

P( Σ_{v∈V'_m} J_v ≥ 1 ) ≤ 2^{m+1} δ^{−1} n^{−1} = O(n^{−1}), (6.35)

P( Σ_{v∈V'_m} J_v ≥ 2 ) ≤ 2^{2m+1} δ^{−2} n^{−2} = O(n^{−2}). (6.36)

Proof. We use again the σ-fields F_j from Section 5. Since F_{j−1} specifies |T_{n,v_j}|, but not how this subtree is split at v_j, we have

P( J_{v_j} = 1 | F_{j−1} ) ≤ (2/|T_{n,v_j}|) 1{|T_{n,v_j}| ≥ δn} ≤ 2/(δn), (6.37)

and thus, by taking the expectation, P(J_{v_j} = 1) ≤ 2/(δn). Since there are fewer than 2^m nodes in V'_m, (6.35) follows.

Furthermore, for any two nodes v_i and v_j with i < j, J_{v_i} is determined by F_{j−1}, and (6.37) thus gives also

P( J_{v_i} J_{v_j} = 1 | F_{j−1} ) = E( J_{v_i} J_{v_j} | F_{j−1} ) = J_{v_i} P( J_{v_j} = 1 | F_{j−1} ) ≤ (2/(δn)) J_{v_i}. (6.38)

Thus, by taking the expectation and using (6.37) again, P( J_{v_i} J_{v_j} = 1 ) ≤ 4/(δn)^2. Summing over the fewer than 2^{2m−1} pairs (v_i, v_j) with v_i, v_j ∈ V'_m yields (6.36). □

Proof of Theorem 1.5.
We show this in several steps.
Step 1.
Define

Y₁ := Σ_{v∈V'_m} J_v f(T_{n,v}). (6.39)

Since f(T_{n,v}) = 0 unless v is green, we have

Y − Y₁ = Σ_{v∈V'_m} (1 − J_v) f(T_{n,v}) = Σ_{v∈V'_m} f(T_{n,v}) 1{|T_{n,v}| < δn}. (6.40)

For each v, it follows from (6.6) by conditioning on |T_{n,v}| that

E| f(T_{n,v}) 1{|T_{n,v}| < δn} |^p ≤ C (δn)^{p−1}. (6.41)

Hence, (6.40) and Minkowski's inequality yield

| ||Y||_p − ||Y₁||_p | ≤ ||Y − Y₁||_p ≤ Σ_{v∈V'_m} || f(T_{n,v}) 1{|T_{n,v}| < δn} ||_p ≤ 2^m C^{1/p} (δn)^{1−1/p}. (6.42)

Thus ||Y₁||_p = O(n^{1−1/p}) + O(2^m δ^{1−1/p} n^{1−1/p}), and (6.33) yields

E|Y|^p − E|Y₁|^p = O( (2^m δ^{1−1/p} + 2^{mp} δ^{p−1}) n^{p−1} ). (6.43)

Step 2.
Similarly, using (6.41) again,

E( Σ_{v∈V'_m} |f(T_{n,v})|^p − Σ_{v∈V'_m} J_v |f(T_{n,v})|^p ) = Σ_{v∈V'_m} E( |f(T_{n,v})|^p 1{|T_{n,v}| < δn} ) ≤ 2^m C (δn)^{p−1}. (6.44)

Step 3.
By (6.39), |Y₁|^p − Σ_{v∈V'_m} |J_v f(T_{n,v})|^p = 0 unless Σ_{v∈V'_m} J_v ≥ 2, and in the latter case we have by (3.3) the trivial bounds |Y₁|^p ≤ (2^m n)^p and Σ_{v∈V'_m} |J_v f(T_{n,v})|^p ≤ 2^m n^p, and thus | |Y₁|^p − Σ_{v∈V'_m} |J_v f(T_{n,v})|^p | ≤ 2^{mp+1} n^p. Consequently, by (6.36),

E| |Y₁|^p − Σ_{v∈V'_m} |J_v f(T_{n,v})|^p | ≤ 2^{mp+1} n^p P( Σ_{v∈V'_m} J_v ≥ 2 ) = O(n^{p−2}). (6.45)

Thus, for fixed m ≥ 1 and δ > 0,

E|Y₁|^p − Σ_{v∈V'_m} E|J_v f(T_{n,v})|^p = O(n^{p−2}) = o(n^{p−1}). (6.46)

Step 4.
Define F^{(p)}(T) := Σ_{v∈T} |f(T_v)|^p. Then, in analogy with (6.3),

F^{(p)}(T) = Σ_{v∈V'_m} |f(T_v)|^p + Σ_{v∈∂V'_m} F^{(p)}(T_v). (6.47)

Note that Lemma 6.3 implies E F^{(p)}(T_n) = O(n^{p−1}). Hence, by first conditioning on F'_m, and using (6.19),

E Σ_{v∈∂V'_m} F^{(p)}(T_{n,v}) ≤ C E Σ_{v∈∂V'_m} |T_{n,v}|^{p−1} ≤ C (2/p)^m n^{p−1}. (6.48)

Taking T = T_n in (6.47) and taking the expectation, we thus find

E Σ_{v∈T_n} |f(T_{n,v})|^p − E Σ_{v∈V'_m} |f(T_{n,v})|^p = O( (2/p)^m n^{p−1} ). (6.49)

Step 5.
Finally, combining (6.34), (6.43), (6.46), (6.44), (6.49) and (6.28), we obtain

E|F(T_n) − ν_n|^p = (2p/(p−2)) α^p n^{p−1} + O( (2/p)^{m/p} n^{p−1} ) + O( 2^m δ^{1−1/p} n^{p−1} ) + O( 2^{mp} δ^{p−1} n^{p−1} ) + o(n^{p−1}). (6.50)

For any ε > 0, we can make each of the error terms on the right-hand side less than ε n^{p−1} by first choosing m large, then δ small, and finally n large. Consequently, E|F(T_n) − ν_n|^p = (2p/(p−2)) α^p n^{p−1} + o(n^{p−1}). □

Proof of (1.4). Now p = k is an integer. If k is even, then (1.4) is the same as (1.8), so we may assume that p = k ≥ 3 is odd. Note that (6.33) holds in the form |x^p − y^p| ≤ p |x − y| ( |x|^{p−1} + |y|^{p−1} ) for arbitrary real x, y. Thus for any random variables X and Y, using also Hölder's inequality,

E|X^p − Y^p| ≤ p E( |X − Y| |X|^{p−1} + |X − Y| |Y|^{p−1} ) ≤ p ||X − Y||_p ( ||X||_p^{p−1} + ||Y||_p^{p−1} ). (6.51)

It is now easy to modify the proof of Theorem 1.5 and obtain

E( F(T_n) − ν_n )^p = E Σ_{v∈T_n} f(T_{n,v})^p + o(n^{p−1}). (6.52)

Furthermore, it follows from (2.2) that f(T) > 0 only when |T| = 1, and then f(T) = 1. Hence,

Σ_{v∈T_n} f(T_{n,v})^p = − Σ_{v∈T_n} |f(T_{n,v})|^p + O(n). (6.53)

The estimate (1.4) now follows from (6.52), (6.53) and (6.28). □

7. Proof of Lemma 3.2
Define a chain of length k in a (binary) tree T to be a sequence of k nodes v_1 ⋯ v_k such that v_{i+1} is a (strict) descendant of v_i for each i = 1, ..., k−1. In other words, v_1, ..., v_k are some nodes (in order) on some path from the root. We say that the chain v_1 ⋯ v_k is green if all the nodes v_1, ..., v_k are green. (The nodes between the v_i's may have any colour.)
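Chains of green nodes are easy to count mechanically, which gives a direct sanity check on the inclusion-exclusion identities proved in Lemma 7.1 below. The following sketch relies on our reading of definitions given earlier in the paper (not shown in this section), so it is an illustration rather than a definitive implementation: we take a node to be green when it has at most one child, F(T) to be the number of green nodes with no green proper ancestor, and, as we read (2.2), f(T) = 1{root green}·(1 − F(T_L) − F(T_R)).

```python
def trees(n):
    """All binary trees with n nodes, as nested (left, right) tuples; None is the empty tree."""
    if n == 0:
        return [None]
    out = []
    for i in range(n):
        for L in trees(i):
            for R in trees(n - 1 - i):
                out.append((L, R))
    return out

def green(t):
    """Assumed reading: a node is green when it has at most one child."""
    return t[0] is None or t[1] is None

def F(t):
    """Number of maximal green nodes (green, with no green proper ancestor)."""
    if t is None:
        return 0
    if green(t):
        return 1  # every green node below t has the green ancestor t
    return F(t[0]) + F(t[1])

def f(t):
    """Root functional, as we read (2.2): 1{root green} * (1 - F(T_L) - F(T_R))."""
    if t is None or not green(t):
        return 0
    return 1 - F(t[0]) - F(t[1])

def chain_counts(t):
    """counts[k] = number of green chains v_1 ... v_k in t (any starting node)."""
    counts = {}
    def walk(node, ends):
        # ends[j] = number of green chains of length j ending at a proper ancestor
        if node is None:
            return
        if green(node):
            here = {1: 1}  # green chains ending at this node, by length
            for j, c in ends.items():
                here[j + 1] = here.get(j + 1, 0) + c
            for j, c in here.items():
                counts[j] = counts.get(j, 0) + c
            ends = {j: ends.get(j, 0) + here.get(j, 0) for j in set(ends) | set(here)}
        walk(node[0], ends)
        walk(node[1], ends)
    walk(t, {})
    return counts

def root_chain_counts(t):
    """Chains forced to start at the root, via f_k(T) = F_k(T) - F_k(T_L) - F_k(T_R), cf. (7.1)."""
    tot = dict(chain_counts(t))
    for sub in (t[0], t[1]):
        for j, c in chain_counts(sub).items():
            tot[j] = tot.get(j, 0) - c
    return tot

# Exhaustive check of the alternating-sum identities on all small binary trees.
for n in range(1, 7):
    for t in trees(n):
        assert F(t) == sum((-1) ** (k - 1) * c for k, c in chain_counts(t).items())
        assert f(t) == sum((-1) ** (k - 1) * c for k, c in root_chain_counts(t).items())
print("identities (7.2)-(7.3) hold on all binary trees with at most 6 nodes")
```

For instance, the path of three nodes has chain counts {1: 3, 2: 3, 3: 1}, and 3 − 3 + 1 = 1 matches F = 1 (only the root is a maximal green node).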
For a binary tree T and k >
1, let F k ( T ) be the number of green chains v · · · v k in T , and let f k ( T ) be the number of such chains where v is theroot. Obviously, cf. (2.4), F k ( T ) = X v ∈ T f k ( T v ) . (7.1)These functionals are useful to us because of the following simple relations,that are cases of inclusion-exclusion. Lemma 7.1.
For any binary tree T , f ( T ) = ∞ X k =1 ( − k − f k ( T ) , (7.2) F ( T ) = ∞ X k =1 ( − k − F k ( T ) . (7.3) Proof.
Let v be a node in T and consider the contribution to the sum in(7.3) of all chains with final node v k = v . This is clearly 0 if 1 if v is notgreen, and it is 1 if v is a maximal green node; furthermore, if v is greenbut has j > P ji =0 (cid:0) ji (cid:1) ( − i = (1 − j = 0. Hence the right-hand side of (7.3) is thenumber of maximal green nodes, i.e., F ( T ).For (7.2) we can argue similarly: Both sides are 0 unless the root o isgreen. If it is, the chain o gives contribution 1, and by inclusion-exclusion,the chains with a given final node v = o yield together a ycontribution − v is green and there are no green nodes between v and o , and 0otherwise. Hence the sum equals f ( T ) by (2.2). (Alternatively, (7.2) followsby induction from (7.3), (2.4) and (7.1).) (cid:3) Lemma 7.2.
For every k ≥ 1,

E f_k(T) = ( k(k+3)/((k+1)(k+2)) ) · 2^{k−1}/k! = 2^{k−1}/k! − 2^k/(k+2)!. (7.4)

Proof.
We use the construction of T = T̃_τ in Section 2, which we formulate as follows. Consider again the infinite binary tree T_∞, and grow T̃_t as a subtree of T_∞, cf. Section 5. To do this, we equip each node v in T_∞ with two clocks C_L(v) and C_R(v). These are started when v is added to the growing tree T̃_t, and each chimes after a random time with an exponential distribution with mean 1; when the clock chimes we add a left or right child, respectively, to v. There is also a doomsday clock C_0, started at 0 and with the same Exp(1) distribution; when it chimes (at time τ), the process is stopped and the tree T̃_τ is output. All clocks are independent of each other.

Fix a chain v_1 ⋯ v_k in the infinite tree T_∞, with v_1 = o, the root. Let ℓ_i ≥ 0 be the number of nodes (strictly) between v_i and v_{i+1}. We compute the probability that v_1 ⋯ v_k is a green chain in T = T̃_τ by following the construction of T̃_t as time progresses, checking in several steps whether v_1 ⋯ v_k is still a candidate for a green chain, and computing the probability of this. (We use throughout the proof the Markov property and the memoryless property of the exponential distribution.) We assume for notational convenience that the path from v_1 to v_k always uses the left child of each node. (By symmetry, this does not affect the result.)

1. If k > 1, we first need that v_1 = o has a left child but no right child (in order to be green); in particular, of the three clocks C_L(v_1), C_R(v_1), C_0 that run from the beginning, C_L(v_1) has to chime first. This has probability 1/3.

2. Suppose that this happens, so that v_1 gets a left child w. If ℓ_1 > 0, we need a left child of w, and still no right child at v_1. (But we do not care whether we get a right child at w or not.) Hence we need that C_L(w) chimes first among the three clocks C_L(w), C_R(v_1), C_0 (ignoring all other clocks). This has probability 1/3. The same applies at each of the ℓ_1 nodes between v_1 and v_2; thus, the total probability that steps 1 and 2 succeed is 3^{−(ℓ_1+1)}.

3. This takes us to v_2. If k > 2, we need a left child but no right child at v_2, and still no right child at v_1. Hence, the next chime from the four clocks C_L(v_2), C_R(v_2), C_R(v_1), C_0 has to come from C_L(v_2). This has probability 1/4.

4. Step 3 is repeated at each of the ℓ_2 nodes between v_2 and v_3; again the probability of success at each of these nodes is 1/4. Hence the probability that steps 3 and 4 succeed is 4^{−(ℓ_2+1)}.

5. Steps 3 and 4 are repeated for v_i for each i < k, yielding a probability (i+2)^{−(ℓ_i+1)} of success for each i.

6. Finally, we have obtained v_k, and wait for the doomsday clock. Until it chimes, we must not get any right child at v_1, ..., v_{k−1}, and we must get at most one child at v_k. Hence, among the k+2 clocks C_R(v_1), ..., C_R(v_{k−1}), C_L(v_k), C_R(v_k), C_0, the next chime must be either from C_0 (probability 1/(k+2)), or from C_L(v_k) or C_R(v_k), followed by C_0 (probability (2/(k+2)) · (1/(k+1))). The probability of success in this step is thus

1/(k+2) + (2/(k+2)) · (1/(k+1)) = (k+3)/((k+1)(k+2)). (7.5)

Combining the six steps above, we see that the probability that v_1 ⋯ v_k is a green chain in T̃_τ is

( (k+3)/((k+1)(k+2)) ) Π_{i=1}^{k−1} (1/(i+2))^{ℓ_i+1}. (7.6)

Given ℓ_1, ..., ℓ_{k−1}, there are Π_{i=1}^{k−1} 2^{ℓ_i+1} choices of the chain v_1 ⋯ v_k, all with the same probability, so summing over all ℓ_1, ..., ℓ_{k−1} ≥ 0, we obtain

E f_k(T) = ( (k+3)/((k+1)(k+2)) ) Π_{i=1}^{k−1} Σ_{ℓ_i=0}^∞ (2/(i+2))^{ℓ_i+1} = ( (k+3)/((k+1)(k+2)) ) Π_{i=1}^{k−1} (2/i) = ( (k+3)/((k+1)(k+2)) ) · 2^{k−1}/(k−1)! = ( k(k+3)/((k+1)(k+2)) ) · 2^{k−1}/k!. □

Proof of Lemma 3.2.
By Lemmas 7.1 and 7.2, and a simple calculation,

E f(T) = Σ_{k=1}^∞ (−1)^{k−1} E f_k(T) = Σ_{k=1}^∞ ( (−1)^{k−1} 2^{k−1}/k! + (−1)^k 2^k/(k+2)! ) = (1 − e^{−2})/4,

noting that we may take the expectation inside the sum since it also follows from Lemma 7.2 that Σ_{k=1}^∞ E|f_k(T)| = Σ_{k=1}^∞ E f_k(T) < ∞. □

Recall that this, together with Lemma 3.1, completes our probabilistic proof of Theorem 1.1.
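The calculation above is easy to check numerically. The following sketch verifies the partial-fraction form in (7.4) exactly with rational arithmetic, sums the alternating series from the proof of Lemma 3.2, and compares with the closed form (1 − e^{−2})/4; as a further consistency check, at rate λ = 1 the expression (7.8) of the remark below, as we read it, must reduce to the same value, since ₁F₁(1; 1; z) = e^z.

```python
from fractions import Fraction
from math import exp, factorial

# Exact check of the two forms in (7.4):
#   k(k+3)/((k+1)(k+2)) * 2^(k-1)/k!  ==  2^(k-1)/k! - 2^k/(k+2)!
for k in range(1, 40):
    lhs = Fraction(k * (k + 3), (k + 1) * (k + 2)) * Fraction(2 ** (k - 1), factorial(k))
    rhs = Fraction(2 ** (k - 1), factorial(k)) - Fraction(2 ** k, factorial(k + 2))
    assert lhs == rhs

# The alternating series from the proof of Lemma 3.2, summed to k = 30
# (the terms decay factorially, so the truncation error is negligible).
series = sum((-1) ** (k - 1) * 2 ** (k - 1) / factorial(k)
             + (-1) ** k * 2 ** k / factorial(k + 2)
             for k in range(1, 31))
closed = (1 - exp(-2)) / 4

# Consistency with (7.8), as we read it, at lambda = 1, where 1F1(1; 1; z) = e^z:
lam = 1.0
from_remark = 0.25 + 1 / (2 * lam) - 1 / (lam * (lam + 1)) - 0.25 * exp(-2.0)

print(series, closed, from_remark)  # all three agree to machine precision
```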
Remark 7.3.
If we in the proof above change the doomsday clock and let it have an arbitrary rate λ > 0, and denote the resulting random binary tree by T^{(λ)}, then the same argument yields

E f_k(T^{(λ)}) = ( (k+λ+2)/((k+λ)(k+λ+1)) ) Π_{i=1}^{k−1} Σ_{ℓ_i=0}^∞ (2/(i+λ+1))^{ℓ_i+1} = ( (k+λ+2)/((k+λ)(k+λ+1)) ) Π_{i=1}^{k−1} (2/(i+λ−1))

= ( (k+λ−1)(k+λ+2)/((k+λ)(k+λ+1)) ) · 2^{k−1}/(λ)_k = 2^{k−1}/(λ)_k − 2^k/(λ)_{k+2}, (7.7)

where (λ)_k := λ(λ+1) ⋯ (λ+k−1). Thus by Lemma 7.1, letting ₁F₁ denote the confluent hypergeometric function, see e.g. [18],

E f(T^{(λ)}) = Σ_{k=1}^∞ (−1)^{k−1} E f_k(T^{(λ)}) = Σ_{k=1}^∞ ( (−1)^{k−1} 2^{k−1}/(λ)_k + (−1)^k 2^k/(λ)_{k+2} )

= −(1/2)( ₁F₁(1; λ; −2) − 1 ) + (1/4)( ₁F₁(1; λ; −2) − ( 1 − 2/λ + 4/(λ(λ+1)) ) )

= 1/4 + 1/(2λ) − 1/(λ(λ+1)) − (1/4) ₁F₁(1; λ; −2). (7.8)

Furthermore, if λ > 1, we can compute E F(T^{(λ)}) by the same method; the only difference is that we also allow a path of length ℓ_0 ≥ 0 from the root to v_1, which gives an additional probability factor (1+λ)^{−ℓ_0} for each of the 2^{ℓ_0} such paths, leading to

E F_k(T^{(λ)}) = Σ_{ℓ_0=0}^∞ (2/(λ+1))^{ℓ_0} E f_k(T^{(λ)}) = ( (λ+1)/(λ−1) ) E f_k(T^{(λ)}), (7.9)

and hence, using both parts of Lemma 7.1,

E F(T^{(λ)}) = Σ_{k=1}^∞ (−1)^{k−1} E F_k(T^{(λ)}) = ( (λ+1)/(λ−1) ) E f(T^{(λ)}). (7.10)

Moreover, a simple argument shows that, for any n ≥ 1,

P( |T^{(λ)}| = n ) = Π_{i=2}^{n} ( i/(i+λ) ) · λ/(n+1+λ) = λ · n!/(2+λ)_n, (7.11)

and conditioned on |T^{(λ)}| = n, T^{(λ)} has the same distribution as T_n, i.e., ( T^{(λ)} | |T^{(λ)}| = n ) =^d T_n. Hence,

E F(T^{(λ)}) = Σ_{n=1}^∞ ( λ · n!/(2+λ)_n ) ν_n, (7.12)

which can be interpreted as an unusual type of generating function for the sequence (ν_n); note that (7.10) and (7.8) yield an explicit expression for it.

References

[1] David Aldous, Asymptotic fringe distributions for general families of random trees.
Ann. Appl. Probab. (1991), no. 2, 228–266.

[2] David Aldous, Probability distributions on cladograms. Random Discrete Structures (Minneapolis, MN, 1993), 1–18, IMA Vol. Math. Appl., 76, Springer, New York, 1996.

[3] Michael G. B. Blum and Olivier François, Minimal clade size and external branch length under the neutral coalescent. Adv. in Appl. Probab. (2005), no. 3, 647–662.

[4] Michael G. B. Blum, Olivier François and Svante Janson, The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance. Ann. Appl. Probab. (2006), no. 4, 2195–2214.

[5] B. M. Brown and G. K. Eagleson, Martingale convergence to infinitely divisible laws with finite variances. Trans. Amer. Math. Soc. (1971), 449–453.

[6] Huilan Chang and Michael Fuchs, Limit theorems for patterns in phylogenetic trees. J. Math. Biol. (2010), no. 4, 481–512.

[7] Luc Devroye, Limit laws for local counters in random binary search trees. Random Structures Algorithms (1991), no. 3, 303–315.

[8] Luc Devroye, Limit laws for sums of functions of subtrees of random binary search trees. SIAM J. Comput. (2002/03), no. 1, 152–171.

[9] Michael Drmota, Random Trees. Springer, Vienna, 2009.

[10] Michael Drmota, Michael Fuchs and Yi-Wen Lee, Limit laws for the number of groups formed by social animals under the extra clustering model. (Extended abstract.) Proceedings, 2014 Conference on Analysis of Algorithms, AofA '14 (Paris, 2014), DMTCS Proceedings, 2014.

[11] Eric Durand, Michael G. B. Blum and Olivier François, Prediction of group patterns in social mammals based on a coalescent model. J. Theoret. Biol. (2007), no. 2, 262–270.

[12] Eric Durand and Olivier François, Probabilistic analysis of a genealogical model of animal group patterns. J. Math. Biol. (2010), no. 3, 451–468.

[13] Philippe Flajolet, Xavier Gourdon and Conrado Martínez, Patterns in random binary search trees. Random Structures Algorithms (1997), no. 3, 223–244.

[14] Allan Gut, Probability: A Graduate Course, 2nd ed., Springer, New York, 2013.

[15] P. Hall and C. C. Heyde, Martingale Limit Theory and its Application. Academic Press, New York, 1980.

[16] Cecilia Holmgren and Svante Janson, Limit laws for functions of fringe trees for binary search trees and recursive trees. Preprint, 2014. arXiv:1406.6883v1

[17] J. F. C. Kingman, The coalescent. Stochastic Process. Appl. (1982), no. 3, 235–248.

[18] NIST Handbook of Mathematical Functions. Edited by Frank W. J. Olver, Daniel W. Lozier, Ronald F. Boisvert and Charles W. Clark. Cambridge Univ. Press, 2010. Also available as NIST Digital Library of Mathematical Functions, http://dlmf.nist.gov/

Department of Mathematics, Uppsala University, PO Box 480, SE-751 06 Uppsala, Sweden

E-mail address: [email protected]